
anton

@abacaj

Code & LLMs

Have nvidia say we’re launching GPUs in space

Ok now have google release Gemini 3

Now nuke it all and release llama 5


anton reposted

We present Olmo 3, our next family of fully open, leading language models. This family of 7B and 32B models represents:

1. The best 32B base model.
2. The best 7B Western thinking & instruct models.
3. The first 32B (or larger) fully open reasoning model.

This is a big…


Gemini 3 release is exactly why I just waited instead of training my own models


Very impressive numbers; it also looks like this is the SOTA multimodal model now

Introducing Gemini 3 Pro, the world's most intelligent model that can help you bring anything to life.

It is state of the art across most benchmarks, but really comes to life across our products (AI Studio, the Gemini API, Gemini App, etc) 🤯



Opus 4.1 is still really good. I don't think any new model has replaced it yet for me


I was a year early on this but seems like it is possible now


We believe this is the first documented case of a large-scale AI cyberattack executed without substantial human intervention. It has significant implications for cybersecurity in the age of AI agents. Read more: anthropic.com/news/disruptin…



Mistral peaked at their first 7B torrent drop

The French government created an LLM leaderboard akin to lmarena, but rigged it so that Mistral Medium 3.1 would be at the top

Mistral 3.1 Medium > Claude 4.5 Sonnet
or
Gemma3-4B and a bunch of Mistral models > GPT-5

???????????????????

LMAO



“Open source is now only 12 months behind”


I’ve had very little success fine-tuning gpt-oss models and a lot of success with qwen3 models (even the instruct versions). Not sure if this is a skill issue or what, but the gpt-oss models are not as friendly to tuning
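
For reference, a minimal sketch of this kind of fine-tuning run, assuming a Qwen3 checkpoint and the Hugging Face peft/trl stack; the checkpoint, dataset, and LoRA hyperparameters here are illustrative, not the exact setup from the tweet:

```python
# Minimal LoRA SFT sketch (transformers + peft + trl).
# Checkpoint, dataset, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # any chat-format SFT set

peft_config = LoraConfig(
    r=16,            # adapter rank
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-8B",  # swap in any qwen3 base or instruct checkpoint
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(output_dir="qwen3-8b-sft"),
)
trainer.train()
```

Swapping the checkpoint string for a gpt-oss model is the variant the tweet reports struggling with.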


anton reposted

FP16 can have a smaller training-inference gap than BFloat16, and thus fits RL better. Even the difference between RL algorithms vanishes once FP16 is adopted. Surprising!

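A quick way to see the precision side of this, sketched in PyTorch (an illustrative check, not the experiment behind the claim): fp16 stores 10 mantissa bits to bf16's 7, so the same values round about 8x more finely, shrinking the numerical mismatch between training and inference kernels.

```python
import torch

# fp16 keeps 10 stored mantissa bits, bf16 only 7, so fp16
# resolves values near 1.0 about 8x more finely.
print(torch.finfo(torch.float16).eps)   # 0.000977
print(torch.finfo(torch.bfloat16).eps)  # 0.0078125

# Rounding the same activations to each format shows the gap:
x = torch.randn(10_000)
err_fp16 = (x - x.to(torch.float16).float()).abs().mean()
err_bf16 = (x - x.to(torch.bfloat16).float()).abs().mean()
print(err_fp16.item(), err_bf16.item())  # fp16 rounding error is ~8x smaller
```

The trade-off is range: bf16 keeps fp32's exponent width, while fp16 overflows past ~65k, which is why pretraining defaults to bf16 in the first place.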

anton reposted

Training LLMs end to end is hard. Very excited to share our new blog (book?) that covers the full pipeline: pre-training, post-training and infra. 200+ pages of what worked, what didn’t, and how to make it run reliably huggingface.co/spaces/Hugging…


Run gpt-oss-20b on openrouter: 32/100 on the benchmark. Run gpt-oss-20b on vllm with H200s: 83/100. What are these providers doing? Deepinfra has terrible results
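
For anyone reproducing the comparison, a minimal sketch of the self-hosted side, assuming the openai/gpt-oss-20b checkpoint and vLLM's OpenAI-compatible server; the endpoint, port, and prompt are illustrative:

```python
# First, in a separate shell on the H200 box:
#   vllm serve openai/gpt-oss-20b
# Then run the same benchmark prompts against the local endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Reverse a linked list in Python."}],
    temperature=0.0,  # keep sampling as deterministic as possible across runs
)
print(resp.choices[0].message.content)
```

Pointing the identical client and prompts at a hosted provider's base URL isolates the serving stack as the only variable, which is how a 32 vs 83 gap becomes attributable to the provider.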


Why is everyone working on VLMs? Did something change


anton reposted

The @karpathy interview

0:00:00 – AGI is still a decade away
0:30:33 – LLM cognitive deficits
0:40:53 – RL is terrible
0:50:26 – How do humans learn?
1:07:13 – AGI will blend into 2% GDP growth
1:18:24 – ASI
1:33:38 – Evolution of intelligence & culture
1:43:43 – Why self…


> The dgx spark is useless, you can’t train a big enough model and inference is too slow
> You can’t do that on your MacBook either
> Have you ever trained a llm?
> no, you?
> no

