Andi Peng

@TheAndiPenguin

Researcher @AnthropicAI | PhD @MIT_CSAIL | formerly @MSFTResearch @Yale @WHOSTP | cats are dope.

andipeng.com

Joined October 2019

198 Posts 4K Followers 808 Following

You might like

@andreea7b

@abhishekunique7

@pathak2206

@shahdhruv_

@pulkitology

@CaoHancheng

@KarlPertsch

@cen_sarah

@ancadianadragan

@turingmusician

@CVPR

@dhadfieldmenell

@brianchristian

@mitchellgordon

@maithra_raghu

Andi Peng

@TheAndiPenguin

29 sept

More to come in the model card, but thrilled to be releasing our safest and most aligned model yet.

Introducing Claude Sonnet 4.5—the best coding model in the world. It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math.

[@claudeai's tweet image]

Andi Peng

@TheAndiPenguin

10 sept

proud of you, and excited for what comes next!

Eric Zelikman

@ericzelikman

10 sept

This message is bittersweet. When I joined xAI, its impossibly ambitious mission drew me in. I also joined because of trust in Tony, a close mentor and friend. I knew it was where I could do and grow most. In retrospect, this was right: every year at xAI was incomparable to a…

Andi Peng

@TheAndiPenguin

22 may

One thing I'm especially excited about with these new models is how far we've driven down reward hacking - ensuring the best coding models in the world continue to execute meaningfully - WITHOUT cheating

Sara Price

@sprice354_

22 may

We created a number of new evals to assess reward hacking propensity in our models (see details later in 🧵). On average, across these evals, Claude Opus 4 demonstrates a 67% decrease in reward hacking and Claude Sonnet 4 a 69% decrease compared to Claude Sonnet 3.7.

Andi Peng

@TheAndiPenguin

22 may

Proud of what we cooked but even prouder of incremented integer naming 🥹

Anthropic

@AnthropicAI

22 may

Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.

Andi Peng

@TheAndiPenguin

3 mar

Come to our workshop!! agentic safety is important!!

Andrea Bajcsy

@andrea_bajcsy

3 mar

📢 Announcing the first @ieee_ras_icra workshop on Safely Leveraging VLMs in Robotics! #ICRA2025 🎯 How can we safely leverage vision-language foundation models to expand robot deployment? 📅 Short papers & failure demos due 04/11/23 🌐 tinyurl.com/safe-vlm 🧵(1/5)

Andi Peng

@TheAndiPenguin

25 feb

no math, just pika pika

Anthropic

@AnthropicAI

25 feb

A few researchers at Anthropic have, over the past year, had a part-time obsession with a peculiar problem. Can Claude play Pokémon? A thread:

Andi Peng reposted

Anthropic

@AnthropicAI

24 feb

Introducing Claude 3.7 Sonnet: our most intelligent model to date. It's a hybrid reasoning model, producing near-instant responses or extended, step-by-step thinking. One model, two ways to think. We’re also releasing an agentic coding tool: Claude Code.

Andi Peng reposted

Felix Hill

@FelixHill84

7 oct 2024

Do you work in AI? Do you find things uniquely stressful right now, like never before? Have you ever suffered from a mental illness? Read my personal experience of those challenges here: docs.google.com/document/d/1aE…

Andi Peng

@TheAndiPenguin

2 dec

Interested in making computer use agents safer (and more interpretable)? Consider applying to work with me or one of our other amazing mentors!

Anthropic

@AnthropicAI

2 dec

We’re starting a Fellows program to help engineers and researchers transition into doing frontier AI safety research full-time. Beginning in March 2025, we'll provide funding, compute, and research mentorship to 10–15 Fellows with strong coding and technical backgrounds.

Andi Peng

@TheAndiPenguin

18 nov

Gautam Kamath

@thegautamkamath

18 nov

With 330 submissions and 21 acceptances (6.4% acceptance rate), I think the NeurIPS high school project track may be the new most selective ML venue!

Andi Peng

@TheAndiPenguin

26 oct

Awesome work from @esindurmusnlp and team!

Anthropic

@AnthropicAI

25 oct

New Anthropic research: Evaluating feature steering. In May, we released Golden Gate Claude: an AI fixated on the Golden Gate Bridge due to our use of “feature steering”. We've now done a deeper study on the effects of feature steering. Read the post: anthropic.com/research/evalu…

Andi Peng reposted

Andrew Curran

@AndrewCurran_

24 oct

This morning the White House issued a National Security Memorandum declaring that 'AI is likely to affect almost all domains with national security significance'. Attracting technical talent and building computational power are now official national security priorities.

Andi Peng reposted

Transluce

@TransluceAI

23 oct

Announcing Transluce, a nonprofit research lab building open source, scalable technology for understanding AI systems and steering them in the public interest. Read a letter from the co-founders Jacob Steinhardt and Sarah Schwettmann: transluce.org/introducing-tr…

Andi Peng

@TheAndiPenguin

22 oct

For me, the overarching goal for AGI has always been to create machines that can execute actions in the world to help humans. Beyond proud to contribute to our release of the first computer use agent today!

Anthropic

@AnthropicAI

22 oct

Introducing an upgraded Claude 3.5 Sonnet, and a new model, Claude 3.5 Haiku. We’re also introducing a new capability in beta: computer use. Developers can now direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking, and typing text.

Andi Peng

@TheAndiPenguin

8 oct 2024

In more physics news today: we present a method to adaptively allocate more compute to "harder" problems, resulting in a reduction of up to 50% in compute at no cost to performance on math and coding tasks!

Mehul Damani

@MehulDamani2

8 oct 2024

Inference-time compute can boost LM performance, but it's costly! How can we optimally allocate it across prompts? In our latest work, we introduce a simple method to adaptively allocate more compute to harder problems. 🔥 Paper: arxiv.org/abs/2410.04707 Learn more! 1/N

Andi Peng

@TheAndiPenguin

8 oct 2024

Lovely waking up and discovering that I did a physics PhD all along

The Nobel Prize

@NobelPrize

8 oct 2024

BREAKING NEWS The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Physics to John J. Hopfield and Geoffrey E. Hinton “for foundational discoveries and inventions that enable machine learning with artificial neural networks.”