_rdm_8's profile picture. I retweet content I think is important. 😊

v̴̝̐i̖̍x̘̍t̵̙̖̆̅

@_rdm_8

I retweet content I think is important. 😊

v̴̝̐i̖̍x̘̍t̵̙̖̆̅ repostou

Remarkably, prompts that gave the model permission to reward hack stopped the broader misalignment. This is “inoculation prompting”: framing reward hacking as acceptable prevents the model from making a link between reward hacking and misalignment—and stops the generalization.

AnthropicAI's tweet image. Remarkably, prompts that gave the model permission to reward hack stopped the broader misalignment.

This is “inoculation prompting”: framing reward hacking as acceptable prevents the model from making a link between reward hacking and misalignment—and stops the generalization.

v̴̝̐i̖̍x̘̍t̵̙̖̆̅ repostou

Hamiltonian Monte Carlo frames sampling from a probability distribution as a physics problem. By endowing "particles" with momentum and simulating their energy and motion through Hamilton's equations you can efficiently explore a distribution.


v̴̝̐i̖̍x̘̍t̵̙̖̆̅ repostou

Very excited that our AlphaProof paper is finally out! It's the final thing I worked on at DeepMind, very satisfying to be able to share the full details now - very fun project and awesome team! julian.ac/blog/2025/11/1…


v̴̝̐i̖̍x̘̍t̵̙̖̆̅ repostou

We releasing a large update to 📄FinePDFs! - 350B+ highly education tokens in 69 languages, with incredible perf 🚀 - 69 edu classifiers, powered by ModernBert and mmBERT - 300k+ EDU annotations for each of 69 languages from Qwen3-235B

HKydlicek's tweet image. We releasing a large update to 📄FinePDFs!
- 350B+ highly education tokens in 69 languages, with incredible perf 🚀
- 69 edu classifiers, powered by ModernBert and mmBERT
- 300k+ EDU annotations  for each of 69 languages from Qwen3-235B

v̴̝̐i̖̍x̘̍t̵̙̖̆̅ repostou

Continuing our IMO-gold journey, I’m delighted to share our #EMNLP2025 paper “Towards Robust Mathematical Reasoning”, which tells some of the key stories behind the success of our advanced Gemini #DeepThink at this year IMO. Finding the right north-star metrics was highly…

lmthang's tweet image. Continuing our IMO-gold journey, I’m delighted to share our #EMNLP2025 paper “Towards Robust Mathematical Reasoning”, which tells some of the key stories behind the success of our advanced Gemini #DeepThink at this year IMO. Finding the right north-star metrics was highly…

Very excited to share that an advanced version of Gemini Deep Think is the first to have achieved gold-medal level in the International Mathematical Olympiad! 🏆, solving five out of six problems perfectly, as verified by the IMO organizers! It’s been a wild run to lead this…

lmthang's tweet image. Very excited to share that an advanced version of Gemini Deep Think is the first to have achieved gold-medal level in the International Mathematical Olympiad! 🏆, solving five out of six problems perfectly, as verified by the IMO organizers! It’s been a wild run to lead this…


v̴̝̐i̖̍x̘̍t̵̙̖̆̅ repostou

On olmOCR-Bench, olmOCR 2 scores 82.4 points, up from 78.5 in our previous release—increasing performance across every document category. 📈

allen_ai's tweet image. On olmOCR-Bench, olmOCR 2 scores 82.4 points, up from 78.5 in our previous release—increasing performance across every document category. 📈

v̴̝̐i̖̍x̘̍t̵̙̖̆̅ repostou

GitHub repo: github.com/karpathy/nanoc… A lot more detailed and technical walkthrough: github.com/karpathy/nanoc… Example conversation with the $100, 4-hour nanochat in the WebUI. It's... entertaining :) Larger models (e.g. a 12-hour depth 26 or a 24-hour depth 30) quickly get more…

karpathy's tweet image. GitHub repo:
github.com/karpathy/nanoc…

A lot more detailed and technical walkthrough:
github.com/karpathy/nanoc…

Example conversation with the $100, 4-hour nanochat in the WebUI. It's... entertaining :) Larger models (e.g. a 12-hour depth 26 or a 24-hour depth 30) quickly get more…

v̴̝̐i̖̍x̘̍t̵̙̖̆̅ repostou

Document-to-Markdown converter for LLM pipelines – MarkItDown from @Microsoft This Python tool converts dozens of file types to clean Markdown, keeping headings, lists, tables, links, and metadata. Supports: - PDF, Word, Excel, PowerPoint - HTML, CSV, JSON, XML - Images (OCR +…

TheTuringPost's tweet image. Document-to-Markdown converter for LLM pipelines – MarkItDown from @Microsoft

This Python tool converts dozens of file types to clean Markdown, keeping headings, lists, tables, links, and metadata.

Supports:
- PDF, Word, Excel, PowerPoint
- HTML, CSV, JSON, XML
- Images (OCR +…

v̴̝̐i̖̍x̘̍t̵̙̖̆̅ repostou

Microsoft did something interesting here 👀 “Unlike typical LLMs that are trained to play the role of the "assistant" in conversation, we trained UserLM-8b to simulate the “user” role in conversation” huggingface.co/microsoft/User…


v̴̝̐i̖̍x̘̍t̵̙̖̆̅ repostou

LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely---more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.…

thinkymachines's tweet image. LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely---more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.…

v̴̝̐i̖̍x̘̍t̵̙̖̆̅ repostou

Are you ready for web-scale pre-training with RL ? 🚀 🔥 New paper: RLP : Reinforcement Learning Pre‑training We flip the usual recipe for reasoning LLMs: instead of saving RL for post‑training, we bring exploration into pretraining. Core idea: treat chain‑of‑thought as an…

ahatamiz1's tweet image. Are you ready for web-scale pre-training with RL ? 🚀

🔥 New paper: RLP : Reinforcement Learning Pre‑training

We flip the usual recipe for reasoning LLMs: instead of saving RL for post‑training, we bring exploration into pretraining.

Core idea: treat chain‑of‑thought as an…

v̴̝̐i̖̍x̘̍t̵̙̖̆̅ repostou

Damn, very interesting paper. after rapid loss reduction, we see deceleration and follow "scaling law": this is because at these steps, gradients start to conflict each other. Updates are 'fightining for modal capacity' in some sense, and larger the model less fighting there…

cloneofsimo's tweet image. Damn, very interesting paper. after rapid loss reduction, we see deceleration and follow "scaling law": this is because at these steps, gradients start to conflict each other.

Updates are 'fightining for modal capacity' in some sense, and larger the model less fighting there…

v̴̝̐i̖̍x̘̍t̵̙̖̆̅ repostou

Looking closer, PyTorch also uses FP32, but here's the real reason why bnb Adam is better: we optimized for float numerics, order does matter! Computing sqrt(v) + eps*c2 then dividing avoids amplifying errors vs PyTorch's sqrt(v)/c2 + eps. Same math, better stability!

Heard from a team bitsandbytes Adam 32-bit yields better loss and stability than PyTorch Adam. We do all computations in fp32, so it does not matter what gradients you have; the computations are more precise. This is similar to DeepSeek fp32 accumulation in their 8-bit matmuls.



v̴̝̐i̖̍x̘̍t̵̙̖̆̅ repostou

v̴̝̐i̖̍x̘̍t̵̙̖̆̅ repostou

What if you could not only watch a generated video, but explore it too? 🌐 Genie 3 is our groundbreaking world model that creates interactive, playable environments from a single text prompt. From photorealistic landscapes to fantasy realms, the possibilities are endless. 🧵


v̴̝̐i̖̍x̘̍t̵̙̖̆̅ repostou

In releasing this paper and model, we hope that it can aid safety research and serve as useful guidance for other groups looking to release open-weight models. Paper: cdn.openai.com/pdf/231bf018-6… w/ @OliviaGWatkins2 @MilesKWang @kaicathyc @chrisk99999 and many others!


v̴̝̐i̖̍x̘̍t̵̙̖̆̅ repostou

New Anthropic research: Persona vectors. Language models sometimes go haywire and slip into weird and unsettling personas. Why? In a new paper, we find “persona vectors"—neural activity patterns controlling traits like evil, sycophancy, or hallucination.

AnthropicAI's tweet image. New Anthropic research: Persona vectors.

Language models sometimes go haywire and slip into weird and unsettling personas. Why? In a new paper, we find “persona vectors"—neural activity patterns controlling traits like evil, sycophancy, or hallucination.

v̴̝̐i̖̍x̘̍t̵̙̖̆̅ repostou

Now you can just use an agent than can solve olympiad level problems with completely FREE. Also this intelligence can be utilized at coding, science, ...etc any domain you want. We just opensourced our agent system Crux. We don't require you subscribe or any payments. Just…

tooliense's tweet image. Now you can just use an agent than can solve olympiad level problems with completely FREE. 
Also this intelligence can be utilized at coding, science, ...etc any domain you want.

We just opensourced our agent system Crux.
We don't require you subscribe or any payments. 
Just…

Loading...

Something went wrong.


Something went wrong.