Shashank Gupta

@shashank_bits

Researcher at Ai2 || Work on NLP, LLMs, Reasoning, Agents, AI4Code || Prev: Microsoft AI, Univ. of Illinois (UIUC), Max Planck, IIT-Bombay || @shashanknlp 🟦sky

Seattle, WA

shashankgupta.info

Joined December 2010

606 posts · 472 followers · 1K following

You might like

@zhaofeng_wu

@gu_yuling

@DanielKhashabi

@DanRothNLP

@anmarasovic

@wellecks

@JonathanBerant

@scottyih

@mandarjoshi_

@universeinanegg

@soshsihao

@VictoriaLinML

@harsh3vedi

@valentina__py

@b_niranjan

Shashank Gupta reposted

Deedy

@deedydas

Sep 27

🚨 DeepMind finally dropped the Veo3 paper which shows what we all realize from playing with video-gen models.

Just like LLMs, visual reasoning is an emergent property of training on tons of video. It can solve tasks not explicitly in training data.

"Veo 3 is the GPT-3…

Shashank Gupta reposted

anandmaj

@Almondgodd

Sep 25

I spent the past month reimplementing DeepMind’s Genie 3 world model from scratch. Ended up making TinyWorlds, a 3M parameter world model capable of generating playable game environments. Demo below + everything I learned in thread (full repo at the end) 👇🏼

Shashank Gupta reposted

Ai2

@allen_ai

Aug 26

As part of Asta, our initiative to accelerate science with trustworthy AI agents, we built AstaBench—the first comprehensive benchmark to compare them. ⚖️

Shashank Gupta reposted

Ai2

@allen_ai

Aug 26

Introducing Asta—our bold initiative to accelerate science with trustworthy, capable agents, benchmarks, & developer resources that bring clarity to the landscape of scientific AI + agents. 🧵

Shashank Gupta reposted

Vaish Shrivastava

@VaishShrivas

Aug 14

Test-time scaling w/ GRPO boosts accuracy, but also adds “filler tokens” increasing length w/o real progress.
We present Group Filtered Policy Optimization (GFPO):🧵
1️⃣ Sample more per prompt
2️⃣ Rank by token efficiency (reward ÷ length)
3️⃣ Train on top-k
4️⃣ 🚀 Cut 80% of…
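
The filtering steps listed in the post above (sample a group, rank by reward ÷ length, keep the top-k) can be sketched roughly as follows. This is only an illustration of the ranking step as the post words it, not the authors' implementation: the `Sample` class, the function names, and the toy numbers are all hypothetical, and the policy-optimization update itself is not shown.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One sampled response for a prompt (hypothetical container)."""
    text: str
    reward: float
    num_tokens: int


def token_efficiency(sample: Sample) -> float:
    """Reward divided by length, guarding against zero-length responses."""
    return sample.reward / max(sample.num_tokens, 1)


def filter_top_k(samples: list[Sample], k: int) -> list[Sample]:
    """Keep the k samples with the highest token efficiency."""
    ranked = sorted(samples, key=token_efficiency, reverse=True)
    return ranked[:k]


# Toy group of four responses to the same prompt (made-up values).
group = [
    Sample("a", reward=1.0, num_tokens=100),
    Sample("b", reward=1.0, num_tokens=400),
    Sample("c", reward=0.0, num_tokens=50),
    Sample("d", reward=1.0, num_tokens=200),
]
kept = filter_top_k(group, k=2)
print([s.text for s in kept])  # ['a', 'd']
```

In this toy group the short correct response "a" and the medium-length correct response "d" survive, while the long correct response "b" and the incorrect response "c" are dropped before any training step would see them.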

Shashank Gupta reposted

Ai2

@allen_ai

Aug 14

With fresh support of $75M from @NSF and $77M from @NVIDIA, we’re set to scale our open model ecosystem, bolster the infrastructure behind it, and fast‑track reproducible AI research to unlock the next wave of scientific discovery. 💡

Shashank Gupta reposted

Dimitris Papailiopoulos

@DimitrisPapail

Aug 13

Thinking Less at test-time requires Sampling More at training-time!

GFPO, a new, cool, and simple Policy Opt algorithm, is coming to your RL Gym tonite, led by @VaishShrivas and our MSR group:

Group Filtered PO (GFPO) trades off training-time with test-time compute, in order…

Shashank Gupta reposted

Ai2

@allen_ai

Jun 30

🚨 We're hiring a #ResearchScientist in #AI for Scientific Discovery at Ai2!

Are you passionate about intelligent agents, data-driven discovery, and AI systems that accelerate science? Join us in shaping the future of research. 🧬🧠

Apply now: job-boards.greenhouse.io/thealleninstit…

Shashank Gupta reposted

Ai2

@allen_ai

Jul 1

Introducing SciArena, a platform for benchmarking models across scientific literature tasks. Inspired by Chatbot Arena, SciArena applies a crowdsourced LLM evaluation approach to the scientific domain. 🧵

Shashank Gupta reposted

Ai2

@allen_ai

Jun 27

Today we’re releasing a prototype of Genesys, an autonomous multi-agent LLM discovery system that aims to discover new types of language model architectures. We found Genesys can discover novel architectures competitive with the industry-standard transformer. 🧵

Shashank Gupta reposted

Unnat Jain

@unnatjain2010

Jun 10

✨New edition of our community-building workshop series!✨ Tomorrow at @CVPR, we invite speakers to share their stories, values, and approaches for navigating a crowded and evolving field, especially for early-career researchers. Cheeky title🤭: How to Stand Out in the…

Anand Bhattad

@anand_bhattad

Jun 10

In this #CVPR2025 edition of our community-building workshop series, we focus on supporting the growth of early-career researchers.

Join us tomorrow (Jun 11) at 12:45 PM in Room 209

Schedule: sites.google.com/view/standoutc…

We have an exciting lineup of invited talks and candid…

Shashank Gupta reposted

Pushmeet Kohli

@pushmeet

May 14

Excited to announce AlphaEvolve: a powerful AI coding agent developed by our team in @GoogleDeepMind that is able to discover impactful new algorithms for important problems in Maths and Computing by combining the creativity of large language models with automated evaluators.

Shashank Gupta reposted

Greg Brockman

@gdb

May 12

We've just released HealthBench — a new eval for AI systems for health. Developed with 262 physicians who have practiced in 60 countries.

OpenAI

@OpenAI

May 12

Evaluations are essential to understanding how models perform in health settings. HealthBench is a new evaluation benchmark, developed with input from 250+ physicians from around the world, now available in our GitHub repository. openai.com/index/healthbe…

HealthBench is a new evaluation benchmark for AI in healthcare which evaluates models in realistic scenarios. Built with input from 250+ physicians, it aims to provide a shared standard for model...

Introducing HealthBench

Source: openai.com

Shashank Gupta reposted

Parshin Shojaee

@ParshinShojaee

Apr 16

Scientific discovery with LLMs has so much potential yet is underexplored. Our new benchmark **LLM-SRBench** enables rigorous evaluations of equation discovery with LLMs!

🧠Key takeaway: Even SOTA discovery models with strong LLM backbones still fail to discover mathematical…

Shashank Gupta reposted

Naman Jain

@StringChaos

Apr 9

Excited to release R2E-Gym
- 🔥 8.1K executable environments using synthetic data
- 🧠 Hybrid verifiers for enhanced inference-time scaling
- 📈 51% success-rate on the SWE-Bench Verified
- 🤗 Open Source Data + Models + Trajectories

1/

Shashank Gupta reposted

Ai2

@allen_ai

Mar 13

Announcing OLMo 2 32B: the first fully open model to beat GPT 3.5 & GPT-4o mini on a suite of popular, multi-skill benchmarks. Comparable to best open-weight models, but a fraction of training compute. When you have a good recipe, ✨ magical things happen when you scale it up!

Shashank Gupta reposted

Ai2

@allen_ai

Jan 30

Here is Tülu 3 405B 🐫 our open-source post-training model that surpasses the performance of DeepSeek-V3! The last member of the Tülu 3 family demonstrates that our recipe, which includes Reinforcement Learning from Verifiable Rewards (RLVR), scales to 405B - with performance on…

Shashank Gupta reposted

Devi Parikh

@deviparikh

Jan 23

Excited to share a sneak peek into what we've been building at Yutori! What you see below is our trained model and internal prototype — multiple agents running in parallel in the background, completing tasks of varying complexity, relevant information and cues to step in being…

Shashank Gupta reposted

Nouha Dziri

@nouhadziri

Jan 22

Interested in knowing more about LLM agents and in contributing to this topic?🚀

📢We're thrilled to announce REALM: The first Workshop for Research on Agent Language Models 🤖 #ACL2025NLP in Vienna 🎻
We have an exciting lineup of speakers

🗓️ Submit your work by *March 1st*

Shashank Gupta reposted

Ai2

@allen_ai

Jan 21

Can AI really help with literature reviews? 🧐

Meet Ai2 ScholarQA, an experimental solution that allows you to ask questions that require multiple scientific papers to answer. It gives more in-depth, detailed, and contextual answers with table comparisons, expandable sections…

Luca Soldaini 🎀

@soldni

Swaroop Mishra

@Swarooprm7

Sarah Wiegreffe

@sarahwiegreffe

Andrew Drozdov

@mrdrozdov

Faeze Brahman

@faeze_brh

Yao Fu

@Francis_YAO_

Matthew Finlayson

@mattf1n

Nouha Dziri

@nouhadziri

Weijia Shi @ COLM🐑

@WeijiaShi2

Valentina Pyatkin

@valentina__py

Yizhong Wang

@yizhongwyz

Yanai Elazar

@yanaiela

Harsh Trivedi

@harsh3vedi

Shaily

@shaily99

Eric Zelikman

@ericzelikman

Toby Olmedo

@OlmedoToby32805

George Notwell

@animalfarmchina

NancyLew

@3Am443VXCYzMZZN

ELONMUSKTESLA

@elonneurlink

Tegan Jegede

@jegede_tegan

Nikola Savic

@BrenSton3210

Yqeaxer

@Yqeaxer55607

Rahel Jhirad

@RahelJhirad

wilko volts hogan

@wilkovolts

Wenceslas

@n5809079088339

Florent Pelsy

@PelsyPelsy

4am

@adososerious

Valentina Tardelli

@ValentinaT32922

New. Robato

@NewRobato19707

Oleg Zendel

@OlegZendel

Exergy

@Muizz_999

Shaun

@shaunshib96

Matt Ramage

@ramagetime

Dara

@dara_tourt

WebAgentlab

@webagentlab

Alex Wettig

@_awettig

Angela

@Kira29209779441

Aniruddha Mukherjee

@annimukh

professor_icebear

@Mohamma48320786

EileenWebster

@il4Qh2fLqqJg6Tq

Thijs Bergkamp

@ThijsBergkamp

Tuan Van Vo

@tuanvo08041999

Suren T.

@therealthapa

Anirudh Thatipelli

@AThatipelli

Ghazal Khalighinejad

@ghazalkhn

Raven

@1h25sl93nDqCxJ

Griffin Adams

@GriffinAdams92

Yakmaz

@yYakmazz

Agnes

@Agnes44523

Pratyay Banerjee (নীল) 

@Neilzblaze007

Wanru Zhao

@Renee42581826

Tamoghno Kandar

@tamo_kandar

DeepBrain AI

@DeepBrain_ai

Siddharth Betala

@SiddharthBetal

Abhinav Chinta

@AbhinavChinta10

Luca Weihs

@LucaWeihs

Shiolez

@ShiolezdPZHoHF

Boyu Gou

@BoyuGouNLP

Woody Lee

@writerwoody

Krish Perumal

@krishperumal11

Shayekh Islam

@shayekhbinislam

Kumar

@kumar__nn

Zhili Feng

@zhilifeng

Diksha Shrivastava

@Diksha1713

arion das

@ArionDas

Yann LeCun

@ylecun

(((ل()(ل() 'yoav))))👾

@yoavgo

William Wang

@WilliamWangNLP

AK

@_akhaliq

Ai2

@allen_ai

Jason Wei

@_jasonwei

Yi Tay

@YiTayML

Yoav Artzi

@yoavartzi

Percy Liang

@percyliang

Riley Goodside

@goodside

Sam Altman

@sama

Kyunghyun Cho

@kchonyc

Andrej Karpathy

@karpathy

rishi

@RishiBommasani

François Chollet

@fchollet

Luca Soldaini 🎀

@soldni

Mark Dredze

@mdredze

hardmaru

@hardmaru

Graham Neubig

@gneubig

Pan Lu

@lupantech

Yiling Lou

@yiling__LOU

Periodic Labs

@periodiclabs

Cancer AI Alliance

@CAIAorg

CBP

@CBP

USCIS

@USCIS

Naman Goyal

@NamanGoyal21

Alexander Kirillov

@_alex_kirillov_

Yutori

@yutori_ai

Aditya Grover

@adityagrover_

Inception

@_inception_ai

Scaled Cognition

@ScaledCognition

Alex Wettig

@_awettig

Thinking Machines

@thinkymachines
