Tony Chen
@tonychenxyz
CS PhD @princetonCS. Prev: @togethercompute, Undergrad @columbia.
Excited to share our lab’s first open-source release: LLM-Distillation-JAX supports practical knowledge distillation configurations (distillation strength, temperature, top-k/top-p) and is built on MaxText, designed for reproducible JAX/Flax training on both TPUs and GPUs.
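(For a sense of how those three knobs fit together, here's a minimal JAX sketch of a distillation loss; the function name, arguments, and top-k masking scheme are my own illustration, not the library's actual API.)

```python
import jax
import jax.numpy as jnp

def distillation_loss(student_logits, teacher_logits, labels,
                      alpha=0.5, temperature=2.0, top_k=None):
    """Illustrative KD loss: hard-label CE mixed with a softened teacher KL.

    alpha:       distillation strength (0 = pure CE, 1 = pure distillation)
    temperature: softens both distributions for the KD term
    top_k:       if set, only distill over the teacher's top-k logits
    """
    # Hard-label cross-entropy on the student's raw logits.
    ce = -jnp.take_along_axis(
        jax.nn.log_softmax(student_logits, axis=-1),
        labels[..., None], axis=-1).squeeze(-1)

    # Optionally restrict the teacher signal to its top-k tokens.
    kd_teacher, kd_student = teacher_logits, student_logits
    if top_k is not None:
        kth = jnp.sort(teacher_logits, axis=-1)[..., -top_k][..., None]
        mask = teacher_logits >= kth
        neg = jnp.finfo(teacher_logits.dtype).min  # effectively -inf
        kd_teacher = jnp.where(mask, teacher_logits, neg)
        kd_student = jnp.where(mask, student_logits, neg)

    # KL(teacher || student) at temperature T, scaled by T^2 so gradient
    # magnitudes stay comparable across temperatures.
    t_logp = jax.nn.log_softmax(kd_teacher / temperature, axis=-1)
    s_logp = jax.nn.log_softmax(kd_student / temperature, axis=-1)
    kd = jnp.sum(jnp.exp(t_logp) * (t_logp - s_logp), axis=-1) * temperature**2

    return jnp.mean((1.0 - alpha) * ce + alpha * kd)
```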
I always think about the story engine in Westworld S4, where Dolores speaks her idea for a character out loud and the story plays out in 3D in real time. We're getting close to that...
We raised $28M seed from Threshold Ventures, AIX Ventures, and NVentures (Nvidia's venture capital arm) —alongside 10+ unicorn founders and top AI researchers— to build reasoning models that generate real-time simulations and games. Models are bottlenecked by practical…
This is amazing! Very excited to see how SelfIE would work with more modern models!
Thread 🧵 Just shipped something I'm excited about: a model-agnostic version of the "selfie" neural network interpretation library!
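(For context on what SelfIE does: it reads out a model's hidden embedding by patching it into the forward pass of an interpretation prompt and letting the model describe it in words. A rough sketch of that core trick; the hook placement, layer path, and every name here are my assumptions, not the library's API.)

```python
import torch

@torch.no_grad()
def interpret_embedding(model, tokenizer, embedding, layer_idx=2,
                        prompt="The meaning of X is:", max_new_tokens=20):
    """SelfIE-style sketch: overwrite the hidden state at the placeholder
    token "X" with the embedding we want to probe, then generate.
    Assumes an HF-style decoder with layers at `model.model.layers`.
    """
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    # Id of the placeholder token (tokenizer-dependent; " X" vs "X").
    x_id = tokenizer.encode(" X", add_special_tokens=False)[0]
    pos = (ids[0] == x_id).nonzero()[0].item()

    def patch(module, args):
        hidden = args[0]
        if hidden.shape[1] > pos:        # only during the prompt pass
            hidden = hidden.clone()
            hidden[:, pos] = embedding   # inject the probed embedding
        return (hidden,) + args[1:]

    handle = model.model.layers[layer_idx].register_forward_pre_hook(patch)
    try:
        out = model.generate(ids, max_new_tokens=max_new_tokens)
    finally:
        handle.remove()
    return tokenizer.decode(out[0, ids.shape[1]:], skip_special_tokens=True)
```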
Announcing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-32B. DeepSWE achieves 59% on SWEBench-Verified with test-time scaling (and 42.2% Pass@1), topping the SWEBench leaderboard for open-weight models. Built in…
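(Quick aside on the metric, since the two numbers measure different things: Pass@1 is single-attempt success, while test-time scaling draws more samples per task. Pass@k is commonly computed with the unbiased estimator from the Codex paper; a sketch for reference, not DeepSWE's code.)

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): probability that at
    least one of k samples, drawn without replacement from n attempts of
    which c are correct, solves the task.
    """
    if n - c < k:
        return 1.0  # every size-k draw contains at least one success
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 42.2% Pass@1 means a single attempt solves 42.2% of tasks on average.
print(pass_at_k(8, 3, 1))  # 0.375
```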
Check out our new paper “Generative Modeling of Weights: Generalization or Memorization?” — we find that current diffusion-based neural network weight generators often memorize training checkpoints rather than learning a truly generalizable weight distribution!
Can diffusion models appear to be learning, when they’re actually just memorizing the training data? We show and investigate this phenomenon in the context of neural network weight generation, in our recent paper “Generative Modeling of Weights: Generalization or Memorization?”
It's exciting to apply diffusion models to new domains! But it requires careful evaluation, esp. regarding memorization. Our paper highlights this need. Shout-out to @zeng_boya for leading this work. paper: arxiv.org/abs/2506.07998 project page: boyazeng.github.io/weight_memoriz… video:…
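(The operative test behind that claim is comparing generated weights against the training checkpoints. A minimal sketch of one such check, nearest-neighbor distance in flattened weight space; this is my illustration of the idea, not the paper's exact metric.)

```python
import numpy as np

def nearest_checkpoint_distance(generated, checkpoints):
    """For each generated weight vector, the L2 distance to its nearest
    training checkpoint. A cluster of near-zero minima suggests the
    generator reproduces checkpoints rather than sampling novel weights.

    generated:   (n_samples, n_params) flattened generated weights
    checkpoints: (n_ckpts,   n_params) flattened training checkpoints
    """
    # Pairwise squared distances via ||a - b||^2 = ||a||^2 - 2ab + ||b||^2.
    d2 = (np.sum(generated**2, axis=1, keepdims=True)
          - 2.0 * generated @ checkpoints.T
          + np.sum(checkpoints**2, axis=1))
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives from rounding
    nn_idx = np.argmin(d2, axis=1)
    return nn_idx, np.sqrt(d2[np.arange(len(generated)), nn_idx])
```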
Can GPT, Claude, and Gemini play video games like Zelda, Civ, and Doom II? 𝗩𝗶𝗱𝗲𝗼𝗚𝗮𝗺𝗲𝗕𝗲𝗻𝗰𝗵 evaluates VLMs on Game Boy & MS-DOS games given only raw screen input, just like how a human would play. The best model (Gemini) completes just 0.48% of the benchmark! 🧵👇
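(The setup implies an agent loop of roughly this shape: raw screen in, one button press out, repeated until the game ends. The `emulator` and `vlm` objects below are hypothetical stand-ins, not VideoGameBench's actual interface.)

```python
import base64
import io

from PIL import Image

def frame_to_png_b64(frame):
    """Encode a raw RGB frame (numpy array) as a base64 PNG for the VLM."""
    buf = io.BytesIO()
    Image.fromarray(frame).save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode()

def play_episode(emulator, vlm, max_steps=5000):
    """Hypothetical screen-only agent loop: the model sees exactly what a
    human player would (pixels), and replies with a controller input.
    """
    frame = emulator.reset()
    for _ in range(max_steps):
        action = vlm.ask(
            image_b64=frame_to_png_b64(frame),
            prompt=("You are playing this game. Respond with exactly one "
                    "button: UP, DOWN, LEFT, RIGHT, A, B, START, SELECT."))
        frame, done, progress = emulator.step(action.strip().upper())
        if done:
            break
    return progress  # fraction of the game completed, the benchmark score
```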
Conference reviewing should take the form of annotating the paper directly, like Google Doc comments. That way you know reviewers actually read the paper, and you get fewer over-general AI-generated comments. It's also a smoother experience writing reviews, without jumping between the paper and a separate form.