
Tensor-Slayer

@TensorSlay

張量殺手 (Tensor Slayer)

Pinned

Exploring Direct Tensor Manipulation in Language Models: A Case Study in Binary-Level Model Enhancement: areu01or00.github.io/Tensor-Slayer.…
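Since the pinned write-up is about patching model weights directly on disk, a minimal sketch of the core idea may help. This is not the write-up's actual tooling; the file path and tensor name below are hypothetical.

```python
# Minimal sketch of direct tensor manipulation: patch a model's weights
# on disk, no training loop involved. Path and tensor name are hypothetical.
from safetensors.torch import load_file, save_file

path = "model.safetensors"
tensors = load_file(path)

name = "model.layers.10.mlp.down_proj.weight"  # hypothetical target tensor
tensors[name] = tensors[name] * 1.05           # e.g. scale it by 5%

save_file(tensors, path)
```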


> …by US company
> base : Deepseek

Today, we are releasing the best open-weight LLM by a US company: Cogito v2.1 671B. On most industry benchmarks and our internal evals, the model performs competitively with frontier closed and open models, while being ahead of any US open model (such as the best versions of…



grifter mucus bullying should be a competitive sport



Tensor-Slayer reposted
SchmidhuberAI: [image]

AI is compression and correlation



🤗

Use your favourite AI coding agent to create AI frames. What if you could connect everything (your PDFs, videos, notes, code, and research) into one seamless flow that actually makes sense? AI-Frames: Open Source Knowledge-to-Action Platform: timecapsule.bubblspace.com ✨ Annotate •…




python is good enough 90% of the time



Is he Chonky?

multimodal foundation agent



I just think fp16 is better than bf16
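For context on the take, a quick sketch of the actual tradeoff: fp16 spends its bits on mantissa (10 bits vs. bf16's 7), so it is more precise near 1.0, while bf16 keeps fp32's exponent range and survives large values. Neither is simply "better".

```python
# fp16 vs. bf16: precision vs. range.
import torch

x = torch.tensor(1.0 + 1/512)
print(x.to(torch.float16))    # 1.0020: exact, fp16 has 10 mantissa bits
print(x.to(torch.bfloat16))   # 1.0: rounded away, bf16 has only 7

big = torch.tensor(70000.0)
print(big.to(torch.float16))  # inf: fp16 max is ~65504
print(big.to(torch.bfloat16)) # 70144: bf16 keeps fp32's exponent range
```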

This post is unavailable.

Tensor-Slayer reposted
> its here

seconds0.substack.com/p/heres-whats-…

I am so close to shipping this



Tensor-Slayer reposted

To find layers most responsible for attention sinks, we set the V vector of the sink token to be 0 at particular layers, so that there is no update from the sink token at that layer.

Unexpected findings:

- Zeroing out layer 0 lowers attention to token 0 by half, but did not…

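The ablation described above is easy to reproduce at small scale. A minimal sketch, assuming GPT-2's fused QKV projection (`attn.c_attn`) as the hook point; this is not the authors' code, and the layer and measurement choices are illustrative:

```python
# Zero the sink token's V vector at one layer, then compare attention
# to token 0 at a later layer. Assumes GPT-2's c_attn emits [Q | K | V].
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
H = model.config.n_embd  # c_attn projects to 3*H channels

def zero_sink_v(layer_idx):
    def hook(module, inputs, output):
        out = output.clone()
        out[:, 0, 2 * H:] = 0.0  # last third is V; token 0 is the sink
        return out
    return model.transformer.h[layer_idx].attn.c_attn.register_forward_hook(hook)

ids = tok("The quick brown fox jumps over the lazy dog",
          return_tensors="pt").input_ids
with torch.no_grad():
    base = model(ids, output_attentions=True).attentions
    handle = zero_sink_v(0)  # ablate the sink's value update at layer 0
    ablated = model(ids, output_attentions=True).attentions
    handle.remove()

layer = 5  # measure downstream attention mass on token 0
print("attn to token 0, base:   ", base[layer][0, :, :, 0].mean().item())
print("attn to token 0, ablated:", ablated[layer][0, :, :, 0].mean().item())
```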

Tensor-Slayer reposted

Breaking: we release a fully synthetic generalist dataset for pretraining, SYNTH, and two new SOTA reasoning models exclusively trained on it. Despite having seen only 200 billion tokens, Baguettotron is currently best-in-class in its size range.


Bubbles burst when they're consumer-facing. The "AI bubble" is largely limited to the B2B supply chain starting from ASML > TSMC > NVDA > Frontier labs. There is no scapegoat, hence the bubble will keep bubbling.

Is it because Bill Gates keeps making contradictory statements, or is it just that journalists pick out the bits that suit them? It's exhausting. One day Bill Gates says that soon we'll only have to work two days a week, and the next day he says we're actually in a giant bubble…



Tensor-Slayer reposted

Our assembly lessons are trending on @github! We have nearly 10k stars.


FFmpeg makes extensive use of hand-written assembly code for huge (10-50x) speed increases and so we are providing assembly lessons to teach a new generation of assembly language programmers. Learn more here: github.com/FFmpeg/asm-les…



This solidifies previous work showing that factual associations, and by extension memorisation, are properties of MLP layers. There are multiple ways you could manipulate this, which can lead to fact distortion, poisoning, or making the model learn new facts. No training. For example:

LLMs memorize a lot of training data, but memorization is poorly understood. Where does it live inside models? How is it stored? How much is it involved in different tasks? @jack_merullo_ & @srihita_raju's new paper examines all of these questions using loss curvature! (1/7)

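Picking up the "For example:" above: one such no-training manipulation is a rank-one edit to an MLP down-projection, in the spirit of ROME. This is a toy illustration on a random matrix, not the paper's method; the key and value directions here are hypothetical stand-ins.

```python
# Toy rank-one edit to an MLP down-projection: make W map a chosen
# "key" activation k (what the MLP sees for some fact) to a chosen
# "value" direction v (what we want written into the residual stream).
import torch

d_mlp, d_model = 3072, 768
W = torch.randn(d_model, d_mlp) / d_mlp**0.5  # stand-in down-projection

k = torch.randn(d_mlp)    # hypothetical key activation for the fact
v = torch.randn(d_model)  # hypothetical target output direction

# Rank-one update: W' = W + (v - W k) k^T / (k^T k), so that W' k = v.
delta = torch.outer(v - W @ k, k) / (k @ k)
W_edited = W + delta

print(torch.allclose(W_edited @ k, v, atol=1e-4))  # True: fact "rewritten"
```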


It's really weird too. Libraries like smolagents' CodeAgent have been thriving since December 2024.

So it seems Anthropic just rediscovered CodeAct 😄 arxiv.org/abs/2402.01030

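For anyone who hasn't seen CodeAct: the whole idea is that the model emits executable Python as its action instead of a JSON tool call. A minimal sketch, with `llm()` as a hypothetical stand-in for any chat-completion call:

```python
# Code-as-action in miniature: the model's output is Python, and the
# harness executes it. `llm()` is a hypothetical stand-in model call.
def llm(prompt: str) -> str:
    # Hypothetical model output: an executable action, not a JSON blob.
    return "result = sum(x * x for x in range(10))"

def run_code_action(task: str) -> dict:
    action = llm(f"Write Python to solve: {task}")
    namespace: dict = {}
    exec(action, {}, namespace)  # real systems sandbox this step
    return namespace

print(run_code_action("sum of squares below 10"))  # {'result': 285}
```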


Tensor-Slayer reposted

Kimi-K2-Thinking with the same pricing as Kimi-K2 *cough* (looking at you OpenAI and Google you greedy piggies) *cough*

BREAKING 🚨: @Kimi_Moonshot is preparing to announce "kimi-k2-thinking" and "kimi-k2-thinking-turbo" as these models appear on the API Playground.



HYPE

Kimi-K2 Reasoning is coming very soon
just got merged into vLLM

LETS FUCKING GOOOO
im so hyped im so hyped im so hyped

github.com/vllm-project/v…



Me wondering where did the pizza toppings go

Woah, that's freaking amazing: Agile and cooperative aerial manipulation of a cable-suspended load!



Tensor-Slayer reposted

Now in effect: Mergekit has been re-licensed under GNU LGPL v3, restoring clarity and flexibility for users and contributors. Read more about our decision in the blog. arcee.ai/blog/mergekit-…

