Nimit Kalra @ ICML 2025

@qw3rtman

research @haizelabs @columbia, prev @citadel @utaustin currently feynman technique-ing my way through life

Nimit Kalra @ ICML 2025 reposted

Our paper "ChartMuseum 🖼️" is now accepted to #NeurIPS2025 Datasets and Benchmarks Track! Even the latest models, such as GPT-5 and Gemini-2.5-Pro, still cannot do well on challenging 📉chart understanding questions, especially on those that involve visual reasoning 👀!

Introducing ChartMuseum🖼️, testing visual reasoning with diverse real-world charts!

✍🏻Entirely human-written questions by 13 CS researchers
👀Emphasis on visual reasoning – hard to be verbalized via text CoTs
📉Humans reach 93% but 63% from Gemini-2.5-Pro & 38% from Qwen2.5-72B


Nimit Kalra @ ICML 2025 reposted

New Anthropic Research: Towards Safeguarding LLM Fine-tuning APIs against Cipher Attacks. Fine-tuning LLMs through APIs can be harmful even if the data used for fine-tuning does not appear to be, often because the data encodes a hidden message.


Nimit Kalra @ ICML 2025 reposted

born to do research forced to build b2b saas.... luckily at haize, you can do both (we are hiring)

Nimit Kalra @ ICML 2025 reposted

JetBrains is no longer behind. @firebender_com just launched the first-ever background coding agents for all JetBrains IDEs. These coding agents are incredibly intelligent, have isolated workspaces, and don’t require any cloud setup.


Nimit Kalra @ ICML 2025 reposted

.@RoundtableHQ_'s Proof of Human uses behavioral biometrics to stop bots and AI spam. With 87% accuracy (vs. Google’s 69% and Cloudflare’s 33%), it gives you frictionless, real-time authentication with a one-line API. ycombinator.com/launches/OEh-r… Congrats on the launch @_magrawal &…


Nimit Kalra @ ICML 2025 reposted

Are we really running out of data??? No. We're just not using it correctly. The solution: let the model learn which data it needs to learn!!! 1/n

Nimit Kalra @ ICML 2025 reposted

GPT-5 is now live in Firebender for Android Studio, free for a limited time 🚀 It’s definitely the best coding model I’ve ever used. Try it today and tell us what you think.


Nimit Kalra @ ICML 2025 reposted

Android engineers have access to GPT 5 through Firebender. Enjoy


Flying out to #ICML2025 tonight! Always down to chat about unverifiable domains, evals, red-teaming, safeguards, or just meet cool people. I’ll be a panelist at the Methods and Opportunities at Small Scale workshop, sharing our work on tiny generalist reward models…

Nimit Kalra @ ICML 2025 reposted

The current state of the ecosystem for post-training using GRPO w/ vllm + flash attention is frustratingly brittle.
- The most recent vllm only supports PyTorch==2.7.0
- vllm requires xformers, but specifically only v0.0.30 is supported for torch 2.7.0. Any prior version of…


can’t even escape the arxiv speak in the group chat

Vogent has a fantastic battle-tested inference stack, glad to see they opened it up + already have a finetuning product. From what I've seen, open-source voice models solve the 0 → 1 quite well but require a lot of post-hoc tuning to get right

Today we're launching Vogent Voicelab: an optimized API to run top open-source voice models, like Sesame's CSM-1B, Dia, Orpheus, and more.



chart crime so bad you gotta transcribe the values by hand and plot it yourself


evals evals evals

Mercor (@mercor_ai) is now working with 6 out of the Magnificent 7, all of the top 5 AI labs, and most of the top application layer companies. One trend is common across every customer: we are entering The Era of Evals. RL is becoming so effective that models will be able to…



Nimit Kalra @ ICML 2025 reposted

New open-source alert! spoken: a unified abstraction over realtime speech-to-speech foundation models. Run any S2S model from OpenAI, Google, Amazon — one interface with one line of code.

