#sparsetransformers search results

Does weight-sparse training make LLM circuits interpretable? (2511.13653) [Paper Explanation Series] Weight-sparse transformers have interpretable circuits. Leo Gao, et al. youtu.be/48BsKIZhh4M?si… via @YouTube

compassinai's tweet card.


Weight-sparse transformers have interpretable circuits [Gao+, 2025] Training a Transformer with a fixed L0 norm yields disentangled circuits. The role of each weight can be identified and visualized for simple tasks like closing quotation marks. arxiv.org/abs/2511.13653 #NowReading

shion_honda's tweet image.

Weight-sparse transformers have interpretable circuits [Gao+, 2025] Training a Transformer with its L0 norm fixed, so that almost all weights are zero, yields disentangled circuits. The role of each weight can be identified and visualized for simple tasks such as opening and closing quotation marks. arxiv.org/abs/2511.13653 #NowReading

shion_honda's tweet image.

A weight-sparse transformer could crack AI's black box, if it tackles hallucinations without sacrificing utility. Deployment will prove it. #Interpretability


Weight-sparse transformers have interpretable circuits

gottapatchemall's tweet image.

New paper! (with @OpenAI) We trained weight-sparse models (transformers with almost all of their weights set to zero) on code: we found that their circuits become naturally interpretable! Our models seem to learn extremely simple, disentangled, internal mechanisms!
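These tweets only state the core idea: train a transformer under a hard L0 budget so that almost all weights are zero. The paper's actual training procedure is not described here, so the snippet below is just a minimal sketch of one way to enforce such a budget, projecting every weight matrix back to its top-k entries by magnitude after each optimizer step; the function, the keep fraction, and the projection scheme are illustrative assumptions, not the authors' method.

```python
import torch

@torch.no_grad()
def project_to_l0_budget(model: torch.nn.Module, keep_fraction: float = 0.001) -> None:
    """Zero out all but the largest-magnitude entries of each weight matrix.

    Hypothetical illustration of a fixed-L0 constraint: after every optimizer
    step, each 2-D parameter is projected back onto the set of matrices with
    roughly `keep_fraction` nonzero entries (ties at the threshold may keep a
    few extra entries).
    """
    for param in model.parameters():
        if param.dim() < 2:            # leave biases and norm scales dense
            continue
        k = max(1, int(keep_fraction * param.numel()))
        # Magnitude of the k-th largest entry becomes the keep threshold.
        threshold = param.abs().flatten().kthvalue(param.numel() - k + 1).values
        param.mul_((param.abs() >= threshold).to(param.dtype))

# Inside a training loop (model, optimizer, loss are placeholders):
#   loss.backward()
#   optimizer.step()
#   project_to_l0_budget(model, keep_fraction=0.001)
```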


Sparse autoencoder after being fed vectors from the final hidden state of transformers trained on each author with reconstruction + contrastive loss

Sauers_'s tweet image.
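The tweet gives only an outline: final-hidden-state vectors from transformers trained on each author are fed to a sparse autoencoder with a reconstruction plus contrastive loss. As a hedged sketch under those assumptions, the toy module below pairs an L1-sparse reconstruction objective with an InfoNCE-style supervised contrastive term over author labels; the class, loss weights, and the exact contrastive formulation are all hypothetical.

```python
import torch
import torch.nn.functional as F

class SparseAutoencoder(torch.nn.Module):
    """Toy SAE: ReLU code with an L1 penalty to encourage sparsity."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = torch.nn.Linear(d_model, d_hidden)
        self.decoder = torch.nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        code = F.relu(self.encoder(x))
        return self.decoder(code), code

def sae_loss(x, recon, code, author_ids, l1_coef=1e-3, temp=0.1):
    """Reconstruction + L1 sparsity + supervised contrastive term over authors."""
    recon_loss = F.mse_loss(recon, x)
    sparsity = l1_coef * code.abs().mean()

    z = F.normalize(code, dim=-1)
    sim = z @ z.t() / temp                               # pairwise similarities
    eye = torch.eye(len(author_ids), dtype=torch.bool, device=x.device)
    pos = (author_ids.unsqueeze(0) == author_ids.unsqueeze(1)) & ~eye
    log_prob = sim.masked_fill(eye, float("-inf")).log_softmax(dim=-1)
    # Average log-probability assigned to same-author (non-self) pairs.
    picked = torch.where(pos, log_prob, torch.zeros_like(log_prob))
    contrastive = -(picked.sum(1) / pos.sum(1).clamp(min=1)).mean()
    return recon_loss + sparsity + contrastive
```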

tiny soundwave and prowl 💙🤍! #transformers #maccadam #soundwave #prowl

steelsuit's tweet image.

Leaner Transformers: More Heads, Less Depth [arxiv:2505.20802]

chrisoffner3d's tweet image.

The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs

_akhaliq's tweet image.

drawing a bunch of stickers that'll compress to bits when it's printed d/op shenanigans (づ๑•ᴗ•๑)づ┬─┬ノ( º _ ºノ) #transformersone #megop

harquebuse's tweet image.

"Energy continuously flows from being concentrated, to becoming dispersed, spread out, wasted and useless." ⚡➡️🌬️ Sharing our work on the inability of softmax in Transformers to _robustly_ learn sharp functions out-of-distribution. Together w/ @cperivol_ @fedzbar & Razvan!

PetarV_93's tweet image.
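The claim in this thread is that softmax attention cannot stay robustly sharp out of distribution: with the logit gap bounded, the weight on the one "correct" key gets diluted as the number of competing keys grows. A tiny numerical illustration of that dispersion effect (the gap and lengths below are made up, not taken from the paper):

```python
import torch

def max_attention_weight(num_keys: int, logit_gap: float = 5.0) -> float:
    """Softmax weight on a single sharp key against num_keys - 1 distractors
    whose logits are all logit_gap lower."""
    logits = torch.zeros(num_keys)
    logits[0] = logit_gap
    return torch.softmax(logits, dim=0)[0].item()

for n in (16, 256, 4096, 65536):
    # With the gap fixed, the winning weight decays roughly like 1 / n, so
    # attention that looked sharp at training lengths disperses at longer ones.
    print(n, round(max_attention_weight(n), 4))
```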

A tweak in the architecture of #Transformers can significantly boost accuracy! With direct access to all previous blocks’ outputs, a 48-block #DenseFormer outperforms a 72-block Transformer, with faster inference! A work with @akmohtashami_a,@francoisfleuret, Martin Jaggi. 1/🧵

MatPagliardini's tweet image.
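The tweet states the architectural change: each block gets direct access to the outputs of all previous blocks. DenseFormer does this with learned depth-weighted averages; the sketch below shows that wiring with a generic block stand-in, and the initialization and exact placement of the averaging weights are assumptions rather than the paper's precise recipe.

```python
import torch

class DepthWeightedStack(torch.nn.Module):
    """Stack where each block's input is a learned weighted average of the
    embeddings and every previous block's output (DenseFormer-style sketch)."""
    def __init__(self, blocks):
        super().__init__()
        self.blocks = torch.nn.ModuleList(blocks)
        # One weight vector per block over all earlier representations,
        # initialized to pass through the most recent output unchanged.
        self.alphas = torch.nn.ParameterList()
        for i in range(len(blocks)):
            w = torch.zeros(i + 2)
            w[-1] = 1.0
            self.alphas.append(torch.nn.Parameter(w))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        history = [x]                                   # embeddings are depth 0
        current = x
        for i, block in enumerate(self.blocks):
            history.append(block(current))
            stacked = torch.stack(history, dim=0)       # (depth, batch, seq, d)
            w = self.alphas[i].view(-1, *([1] * (stacked.dim() - 1)))
            current = (w * stacked).sum(dim=0)          # depth-weighted average
        return current

# e.g. DepthWeightedStack([torch.nn.TransformerEncoderLayer(512, 8, batch_first=True)
#                          for _ in range(12)])
```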

Our new Sparse Universal Transformer is both parameter-efficient and computation-efficient compared to the Transformer, and it's better at compositional generalization! paper: arxiv.org/abs/2310.07096

Yikang_Shen's tweet image.

From Sparse to Soft Mixtures of Experts. Proposes Soft MoE, a fully-differentiable sparse Transformer that addresses the usual challenges of hard expert routing while maintaining the benefits of MoEs. arxiv.org/abs/2308.00951

arankomatsuzaki's tweet image.
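Soft MoE removes hard routing: every expert slot receives a softmax-weighted mixture of all tokens, and every token's output is a softmax-weighted mixture of all slot outputs, so the whole layer stays differentiable. A minimal sketch of that dispatch/combine pattern follows; the expert MLPs and dimensions are placeholders, not the paper's exact configuration.

```python
import torch

class SoftMoE(torch.nn.Module):
    """Soft mixture-of-experts layer: soft dispatch to slots, soft combine back."""
    def __init__(self, d_model: int, num_experts: int, slots_per_expert: int):
        super().__init__()
        self.slots_per_expert = slots_per_expert
        num_slots = num_experts * slots_per_expert
        self.phi = torch.nn.Parameter(torch.randn(d_model, num_slots) * d_model**-0.5)
        # Placeholder experts: a small MLP each.
        self.experts = torch.nn.ModuleList(
            torch.nn.Sequential(
                torch.nn.Linear(d_model, 4 * d_model),
                torch.nn.GELU(),
                torch.nn.Linear(4 * d_model, d_model),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (batch, tokens, d)
        logits = x @ self.phi                               # (batch, tokens, slots)
        dispatch = logits.softmax(dim=1)                    # normalize over tokens
        combine = logits.softmax(dim=2)                     # normalize over slots
        slots = torch.einsum("btd,bts->bsd", x, dispatch)   # slot = mix of tokens
        chunks = slots.split(self.slots_per_expert, dim=1)  # one chunk per expert
        outs = torch.cat([e(c) for e, c in zip(self.experts, chunks)], dim=1)
        return torch.einsum("bsd,bts->btd", outs, combine)  # token = mix of slots
```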

Reading the sparse transformer arxiv.org/abs/1904.10509… Great paper; it seems like it doesn't get as much attention as it should. The common refrain is "No one has figured out how to scale attention sub-quadratically," but the sparse transformer's right here.

finbarrtimbers's tweet image.
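The Sparse Transformer gets sub-quadratic attention by restricting each query to a fixed sparse pattern instead of all previous positions. The helper below builds one such causal pattern, a local window plus periodic strided columns; it is an illustrative variant of the paper's factorized patterns, not its exact kernels.

```python
import torch

def strided_sparse_mask(seq_len: int, window: int, stride: int) -> torch.Tensor:
    """Boolean (seq_len, seq_len) mask: True where query i may attend to key j.

    Each position attends causally to (a) the previous `window` positions and
    (b) every `stride`-th earlier position (periodic "summary" columns).
    Illustrative variant of the patterns in arxiv.org/abs/1904.10509.
    """
    i = torch.arange(seq_len).unsqueeze(1)   # query positions
    j = torch.arange(seq_len).unsqueeze(0)   # key positions
    causal = j <= i
    local = (i - j) < window
    strided = (j % stride) == (stride - 1)
    return causal & (local | strided)

# Each row has O(window + seq_len / stride) allowed keys, so with
# stride ~ sqrt(seq_len) attention costs roughly O(n * sqrt(n)) instead of O(n^2).
mask = strided_sparse_mask(seq_len=16, window=4, stride=4)
```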

Takara releases the 'Smallest Transforming Transformers' line, consisting of downsized G1 toys. (2003)

TF_Moments's tweet image.

Transformers are huge. They are not efficient in deployment. But no worries. You can sparsify them with a few lines of code using SparseML: github.com/neuralmagic/sp… Result? More compression and better inference performance at the same accuracy. P.S. Same goes for CV models!

RedHat_AI's tweet image.
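The tweet points to SparseML for this; as a library-agnostic illustration of the same magnitude-pruning idea, the snippet below uses PyTorch's built-in pruning utilities on a Hugging Face model. The model name and sparsity level are arbitrary examples, and this is not the SparseML workflow.

```python
import torch
from torch.nn.utils import prune
from transformers import AutoModel   # assumes the `transformers` package is installed

# Load any Hugging Face transformer (the checkpoint here is just an example).
model = AutoModel.from_pretrained("distilbert-base-uncased")

# Zero out 90% of the weights in every linear layer by magnitude.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.9)
        prune.remove(module, "weight")   # bake the zeros into the weight tensor

# One-shot pruning like this usually needs fine-tuning to recover accuracy, and
# dense kernels don't get faster from zeros alone; a sparsity-aware runtime
# (the point of the SparseML/DeepSparse stack) is what turns sparsity into speed.
```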

New pencil delivered last night! #transformersreanimated The point of developing combiners is not their size or strength; it is more about their incredible speed and flexibility. In the TFRA universe, a mid-size combiner like Menasor can move as swiftly as a single Stunticon.

Kamitoge's tweet image.
