
Nimit Kalra @ ICML 2025
@qw3rtman
research @haizelabs @columbia, prev @citadel @utaustin currently feynman technique-ing my way through life
You might like
Our paper "ChartMuseum 🖼️" is now accepted to #NeurIPS2025 Datasets and Benchmarks Track! Even the latest models, such as GPT-5 and Gemini-2.5-Pro, still cannot do well on challenging 📉chart understanding questions, especially on those that involve visual reasoning 👀!

Introducing ChartMuseum🖼️, testing visual reasoning with diverse real-world charts!
✍🏻 Entirely human-written questions by 13 CS researchers
👀 Emphasis on visual reasoning – hard to be verbalized via text CoTs
📉 Humans reach 93% but 63% from Gemini-2.5-Pro & 38% from Qwen2.5-72B


New Anthropic Research: Towards Safeguarding LLM Fine-tuning APIs against Cipher Attacks. Fine-tuning LLMs through APIs can be harmful even if the data used for fine-tuning does not appear to be, often because the data encodes a hidden message.
born to do research forced to build b2b saas.... luckily at haize, you can do both (we are hiring)

JetBrains is no longer behind. @firebender_com just launched the first-ever background coding agents for all JetBrains IDEs. These coding agents are incredibly intelligent, have isolated workspaces, and don’t require any cloud setup.
.@RoundtableHQ_'s Proof of Human uses behavioral biometrics to stop bots and AI spam. With 87% accuracy (vs. Google’s 69% and Cloudflare’s 33%), it gives you frictionless, real-time authentication with a one-line API. ycombinator.com/launches/OEh-r… Congrats on the launch @_magrawal &…
Are we really running out of data??? No. We're just not using it correctly. The solution: let the model learn which data it needs to learn!!! 1/n

GPT-5 is now live in Firebender for Android Studio, free for a limited time 🚀 It’s definitely the best coding model I’ve ever used. Try it today and tell us what you think.
Android engineers have access to GPT-5 through Firebender. Enjoy
Flying out to #ICML2025 tonight! Always down to chat about unverifiable domains, evals, red-teaming, safeguards, or just meet cool people. I’ll be a panelist at the Methods and Opportunities at Small Scale workshop, sharing our work on tiny generalist reward models…

The current state of the ecosystem for post-training using GRPO w/ vllm + flash attention is frustratingly brittle.
- The most recent vllm only supports PyTorch==2.7.0
- vllm requires xformers, but specifically only v0.0.30 is supported for torch 2.7.0. Any prior version of…
can’t even escape the arxiv speak in the group chat

Vogent has a fantastic battle-tested inference stack, glad to see they opened it up + already have a finetuning product. From what I've seen, open-source voice models solve the 0 → 1 quite well but require a lot of post-hoc tuning to get right
chart crime so bad you gotta transcribe the values by hand and plot it yourself
evals evals evals
Mercor (@mercor_ai) is now working with 6 out of the Magnificent 7, all of the top 5 AI labs, and most of the top application layer companies. One trend is common across every customer: we are entering The Era of Evals. RL is becoming so effective that models will be able to…
New open-source alert! spoken: a unified abstraction over realtime speech-to-speech foundation models. Run any S2S model from OpenAI, Google, Amazon — one interface with one line of code.