
Jaehoon Lee

@hoonkp

Researcher in machine learning with background in physics; Member of Technical Staff @AnthropicAI; Prev. Research scientist @GoogleDeepMind/@GoogleBrain.

Claude 4 models are here 🎉 From research to engineering, safety to product - this launch showcases what's possible when the entire Anthropic team comes together. Honored to be part of this journey! Claude has been transforming my daily workflow, hope it does the same for you!

Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.



Jaehoon Lee reposted

@ethansdyer and I have started a new team at @AnthropicAI — and we’re hiring! Our team is organized around the north star goal of building an AI scientist: a system capable of solving the long-term reasoning challenges and core capabilities needed to push the scientific…


Jaehoon Lee reposted

woot woot 🎉


Going to be at ICLR next week in Singapore! Excited to see everyone there! If you are interested in test-time scaling, come check out some of our recent work with @sea_snell, @hoonkp, @aviral_kumar2 during Oral Session 1A! iclr.cc/virtual/2025/o…



Tour de force led by @_katieeverett investigating the interplay between neural network parameterization and optimizers; the thread/paper includes a lot of gems (theory insights, extensive empirics, and cool new tricks)!

This Tweet is no longer available.

Jaehoon Lee reposted

It was a pleasure working on Gemma 2. The team is relatively small but very capable. Glad to see it get released. On the origin of techniques: 'like Grok', 'like Mistral', etc. is a weird way to describe them as they all originated at Google Brain/DeepMind and the way they ended…

Just analyzed Google's new Gemma 2 release! The base and instruct for 9B & 27B are here! 1. Pre & Post Layernorms = x2 more LNs like Grok 2. Uses Grok softcapping! Attn logits capped to (-50, 50) & final logits (-30, 30) 3. Alternating 4096 sliding window like Mistral & 8192 global…

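For readers unfamiliar with soft-capping: below is a minimal JAX sketch of the tanh-based cap described above. The cap values match the Gemma 2 report; the function name is illustrative, and this is not Gemma's actual implementation.

```python
import jax.numpy as jnp

def soft_cap(logits, cap):
    # Smoothly squash values into (-cap, cap) with tanh instead of hard
    # clipping, so gradients stay nonzero everywhere.
    return cap * jnp.tanh(logits / cap)

# Gemma 2 caps attention logits at 50.0 and final logits at 30.0.
attn_scores = jnp.array([-120.0, 0.0, 120.0])
print(soft_cap(attn_scores, 50.0))  # ~[-49.2, 0.0, 49.2]
```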


Jaehoon Lee reposted

We recently open-sourced a relatively minimal implementation example of Transformer language model training in JAX, called NanoDO. If you stick to vanilla JAX components, the code is relatively straightforward to read -- the model file is <150 lines. We found it useful as a…
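For a flavor of the vanilla-JAX style described above, here is a minimal single-head causal self-attention function. It is a sketch only, not the actual NanoDO code (no batching, no multi-head, illustrative initialization).

```python
import jax
import jax.numpy as jnp

def causal_self_attention(params, x):
    # x: [seq_len, d_model]; params holds the q/k/v projection matrices.
    q, k, v = (x @ params[name] for name in ('wq', 'wk', 'wv'))
    scores = q @ k.T / jnp.sqrt(q.shape[-1])
    mask = jnp.tril(jnp.ones_like(scores))  # causal: attend to the past only
    scores = jnp.where(mask == 1, scores, -jnp.inf)
    return jax.nn.softmax(scores, axis=-1) @ v

key = jax.random.PRNGKey(0)
d = 16
params = {name: 0.02 * jax.random.normal(k, (d, d))
          for name, k in zip(('wq', 'wk', 'wv'), jax.random.split(key, 3))}
x = jax.random.normal(key, (8, d))
print(causal_self_attention(params, x).shape)  # (8, 16)
```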


Jaehoon Lee reposted

Ever wonder why we don’t train LLMs over highly compressed text? Turns out it’s hard to make it work. Check out our paper for some progress that we’re hoping others can build on. arxiv.org/abs/2404.03626 With @blester125, @hoonkp, @alemi, Jeffrey Pennington, @ada_rob, @jaschasd


Jaehoon Lee reposted

Is Kevin onto something? We found that LLMs can struggle to understand compressed text, unless you do some specific tricks. Check out arxiv.org/abs/2404.03626 and help @hoonkp, @alemi, Jeffrey Pennington, @ada_rob, @jaschasd, @noahconst and me make Kevin's dream a reality.
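To make the difficulty concrete, here is a toy sketch of the naive setup: compress text, then treat each compressed byte as a token. (The paper studies neural compressors and specific tricks to make training work; zlib here is purely illustrative.)

```python
import zlib

text = "the quick brown fox jumps over the lazy dog. " * 4
compressed = zlib.compress(text.encode("utf-8"))

# Naive tokenization: one token per compressed byte (vocab size 256).
tokens = list(compressed)
print(len(text.encode("utf-8")), "bytes ->", len(tokens), "tokens")
# The compressed stream is near-random by construction, which is exactly
# what makes it hard for an LLM to model without extra structure.
```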


Amazing progress made by @thtrieu_, @Yuhu_ai_! Congratulations!!

Proud of this work. Here's my 22min video explanation of the paper: youtube.com/watch?v=TuZhU1…




This is an awesome opportunity to work with strong collaborators on an impactful science problem! Highly recommended!

Materials Discovery team at Google DeepMind is hiring. If interested, please apply via the link below: boards.greenhouse.io/deepmind/jobs/…



Analyzing training instabilities in Transformers is made more accessible by awesome work from @Mitchnw during his internship at @GoogleDeepMind! We encourage you to think more about the fundamental causes and effects of training instabilities as models scale up!

Sharing some highlights from our work on small-scale proxies for large-scale Transformer training instabilities: arxiv.org/abs/2309.14322 With fantastic collaborators @peterjliu, @Locchiu, @_katieeverett, many others (see final tweet!), @hoonkp, @jmgilmer, @skornblith! (1/15)

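One of the mitigations examined in the paper is the z-loss auxiliary term (introduced for PaLM), which penalizes the log-partition function of the output softmax so the logits cannot drift to large magnitudes. A minimal JAX sketch, with an illustrative coefficient and shapes:

```python
import jax
import jax.numpy as jnp

def z_loss(logits, coeff=1e-4):
    # log Z = logsumexp over the vocabulary; penalizing its square keeps
    # output logits from growing without bound during long training runs.
    log_z = jax.nn.logsumexp(logits, axis=-1)
    return coeff * jnp.mean(log_z ** 2)

logits = jax.random.normal(jax.random.PRNGKey(0), (4, 32000))
print(z_loss(logits))
```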


This is an amazing opportunity to work on impactful problems in Large Language Models with cool people! Highly recommended!

Interested in Reasoning with Large Language Models? We are hiring! Internship: forms.gle/fZzFhsy5yVH6R9… Full-Time Research Scientist: forms.gle/9NB5LaCHjQgXR1… Full-Time Research Engineer: forms.gle/rCRnh5Q1nWmoAK… Learn more about Blueshift Team: research.google/teams/blueshif…



Jaehoon Lee reposted

Jasper @latentjasper talking about the ongoing journey towards BIG Gaussian processes! A team effort with @hoonkp, Ben Adlam, @shreyaspadhy and @zacharynado. Join us at NeurIPS GP workshop neurips.cc/virtual/2022/w…


Today at 11am CT, Hall J #806 we are presenting our paper on infinite-width neural network kernels! We have methods to compute NTK/NNGP for an extended set of activations + sketched embeddings for efficient approximation (100x) for compute-intensive conv kernels! See you there!

Most infinitely wide NTK and NNGP kernels are based on the ReLU activation. In arxiv.org/pdf/2209.04121…, we propose a method of computing neural kernels with *general* activations. For homogeneous activations, we approximate the kernel matrices by linear-time sketching algorithms.
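For context, this is the kind of closed-form kernel computation the paper extends; a minimal sketch using the open-source neural-tangents library (assuming `pip install neural-tangents`; the paper's general activations and sketching-based approximations are not shown):

```python
import jax
from neural_tangents import stax

# An infinitely wide two-layer fully connected ReLU network.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(), stax.Dense(1))

x1 = jax.random.normal(jax.random.PRNGKey(0), (4, 8))
x2 = jax.random.normal(jax.random.PRNGKey(1), (3, 8))

# Closed-form NNGP and NTK kernel matrices between the two batches.
kernel = kernel_fn(x1, x2, ('nngp', 'ntk'))
print(kernel.nngp.shape, kernel.ntk.shape)  # (4, 3) (4, 3)
```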



Jaehoon Lee reposted

Tired of tuning your neural network optimizer? Wish there was an optimizer that just worked? We’re excited to release VeLO 🚲, the first hyperparameter-free learned optimizer that outperforms hand-designed optimizers on real-world problems: velo-code.github.io 🧵


Very interesting paper by @jamiesully2, @danintheory and Alex Maloney investigating the theoretical origin of neural scaling laws! Happy to read the 97-page paper and learn about new tools in random matrix theory (RMT) and insights into how the statistics of natural datasets translate into power-law scaling.

New work on the origin of @OpenAI's neural scaling laws w/ Alex Maloney and @jamiesully2: we solve a simplified model of scaling laws to gain insight into how scaling behavior arises and to probe its behavior in regimes where scaling laws break down. arxiv.org/abs/2210.16859 1/
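A toy numerical illustration of the core mechanism, power-law data statistics producing power-law error decay: min-norm linear regression on features with a power-law spectrum. This is a simplified sketch under illustrative assumptions, not the paper's actual random-feature model.

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
d = 2000
lam = jnp.arange(1.0, d + 1) ** -1.5   # power-law feature spectrum
w_star = jax.random.normal(key, (d,))  # teacher weights

def test_mse(n, key):
    X = jax.random.normal(key, (n, d)) * jnp.sqrt(lam)
    y = X @ w_star
    w_hat, *_ = jnp.linalg.lstsq(X, y)           # min-norm interpolator
    return jnp.sum(lam * (w_hat - w_star) ** 2)  # population test error

for n in (50, 100, 200, 400):
    print(n, float(test_mse(n, jax.random.fold_in(key, n))))
# Test error falls roughly as a power law in the number of samples n.
```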



Jaehoon Lee reposted

the deadline for applying to the OpenAI residency is tomorrow. if you are an engineer or researcher from any field who wants to start working on AI, please consider applying. many of our best people have come from this program! boards.greenhouse.io/openai/jobs/46… boards.greenhouse.io/openai/jobs/46…


Jaehoon Lee reposted

🧮 I finally spent some time learning what exactly the Neural Tangent Kernel (NTK) is and went through some mathematical proofs. Hopefully after reading this, you will not feel that all the math behind NTK is that scary, but rather quite intuitive. lilianweng.github.io/posts/2022-09-…
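The object at the heart of that post fits in a few lines of JAX: the empirical NTK is the Gram matrix of per-example parameter gradients, Θ(x, x') = ∇_θ f(x) · ∇_θ f(x'). A toy sketch with an illustrative two-layer network:

```python
import jax
import jax.numpy as jnp

def f(params, x):
    # Toy two-layer network, scalar output per example.
    w1, w2 = params
    return jnp.tanh(x @ w1) @ w2

k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
params = (jax.random.normal(k1, (8, 16)), jax.random.normal(k2, (16,)))
x = jax.random.normal(k3, (5, 8))

# Per-example Jacobians w.r.t. all parameters, flattened and concatenated.
jac_tree = jax.jacobian(f)(params, x)
jac = jnp.concatenate(
    [j.reshape(x.shape[0], -1) for j in jax.tree_util.tree_leaves(jac_tree)],
    axis=1)
ntk = jac @ jac.T  # empirical NTK Gram matrix, shape (5, 5)
print(ntk.shape)
```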


Jaehoon Lee reposted

1/ Super excited to introduce #Minerva 🦉(goo.gle/3yGpTN7). Minerva was trained on math and science found on the web and can solve many multi-step quantitative reasoning problems.


Very excited to present Minerva🦉: a language model capable of solving mathematical questions using step-by-step natural language reasoning. Combining scale, data, and other techniques dramatically improves performance on the STEM benchmarks MATH and MMLU-STEM. goo.gle/3yGpTN7


