
Jaehoon Lee
@hoonkp
Researcher in machine learning with background in physics; Member of Technical Staff @AnthropicAI; Prev. Research scientist @GoogleDeepMind/@GoogleBrain.
Claude 4 models are here 🎉 From research to engineering, safety to product - this launch showcases what's possible when the entire Anthropic team comes together. Honored to be part of this journey! Claude has been transforming my daily workflow, hope it does the same for you!
Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.

@ethansdyer and I have started a new team at @AnthropicAI — and we’re hiring! Our team is organized around the north star goal of building an AI scientist: a system capable of tackling the long-term reasoning challenges and building the core capabilities needed to push the scientific…
woot woot 🎉

Going to be at ICLR next week in Singapore! Excited to see everyone there! If you are interested in test-time scaling, come check out some of our recent work with @sea_snell, @hoonkp, @aviral_kumar2 during the Oral Session 1A! iclr.cc/virtual/2025/o…
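For context on what spending test-time compute looks like in practice, here is a toy sketch of the simplest verifier-based strategy, best-of-N sampling. The names `generate` and `score` are hypothetical stand-ins rather than anything from the paper's codebase, and the work above studies more compute-optimal strategies than this baseline.

```python
# Illustrative baseline only: spend extra test-time compute by sampling n
# candidate solutions and keeping the one a verifier/reward model scores highest.
def best_of_n(prompt, generate, score, n=16):
    """`generate` samples one candidate answer; `score` is a hypothetical verifier."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)
```

Doubling n doubles inference compute without touching model size, which is the kind of knob this line of work analyzes.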
A tour de force led by @_katieeverett investigating the interplay between neural network parameterization and optimizers; the thread/paper includes a lot of gems (theory insights, extensive empirics, and cool new tricks)!
It was a pleasure working on Gemma 2. The team is relatively small but very capable. Glad to see it get released. On the origin of techniques: 'like Grok', 'like Mistral', etc. is a weird way to describe them as they all originated at Google Brain/DeepMind and the way they ended…
Just analyzed Google's new Gemma 2 release! The base and instruct models for 9B & 27B are here!
1. Pre & post layernorms = 2x more LNs, like Grok
2. Uses Grok-style softcapping! Attention logits capped to (-50, 50) & final logits to (-30, 30)
3. Alternating 4096 sliding window like Mistral & 8192 global…
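For readers unfamiliar with softcapping: it is typically implemented as a smooth tanh squashing of the logits rather than a hard clip. A minimal JAX sketch, assuming the cap * tanh(x / cap) form and the cap values quoted above:

```python
import jax.numpy as jnp

def softcap(logits, cap):
    # Smoothly bound values to the open interval (-cap, cap) while leaving
    # small logits nearly unchanged (unlike a hard clip, this stays differentiable).
    return cap * jnp.tanh(logits / cap)

scores = jnp.array([-120.0, -10.0, 0.0, 10.0, 120.0])
print(softcap(scores, 50.0))  # attention-logit cap
print(softcap(scores, 30.0))  # final (vocab) logit cap
```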

We recently open-sourced a relatively minimal implementation example of Transformer language model training in JAX, called NanoDO. If you stick to vanilla JAX components, the code is relatively straightforward to read -- the model file is <150 lines. We found it useful as a…
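Not the NanoDO code itself, but a rough sketch of what a "vanilla JAX" language-model training step looks like: pure functions, jax.value_and_grad, and jax.jit, with a trivial stand-in for the Transformer so the snippet runs on its own.

```python
import jax
import jax.numpy as jnp

def model_apply(params, tokens):
    # Toy stand-in for the Transformer: embed tokens, then project to vocab logits.
    h = params["embed"][tokens]          # [batch, seq, d]
    return h @ params["unembed"]         # [batch, seq, vocab]

def loss_fn(params, tokens):
    # Next-token cross-entropy over a batch of token sequences.
    logits = model_apply(params, tokens[:, :-1])
    targets = tokens[:, 1:]
    logp = jax.nn.log_softmax(logits)
    nll = -jnp.take_along_axis(logp, targets[..., None], axis=-1)
    return nll.mean()

@jax.jit
def train_step(params, tokens, lr=1e-3):
    loss, grads = jax.value_and_grad(loss_fn)(params, tokens)
    new_params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return new_params, loss

key = jax.random.PRNGKey(0)
vocab, d = 256, 32
params = {
    "embed": jax.random.normal(key, (vocab, d)) * 0.02,
    "unembed": jax.random.normal(key, (d, vocab)) * 0.02,
}
tokens = jax.random.randint(key, (4, 16), 0, vocab)
params, loss = train_step(params, tokens)
```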
Ever wonder why we don’t train LLMs over highly compressed text? Turns out it’s hard to make it work. Check out our paper for some progress that we’re hoping others can build on. arxiv.org/abs/2404.03626 With @blester125, @hoonkp, @alemi, Jeffrey Pennington, @ada_rob, @jaschasd
Is Kevin onto something? We found that LLMs can struggle to understand compressed text, unless you do some specific tricks. Check out arxiv.org/abs/2404.03626 and help @hoonkp, @alemi, Jeffrey Pennington, @ada_rob, @jaschasd, @noahconst and me make Kevin’s dream a reality.
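A toy illustration of why this is hard (my own example, not the paper's setup): if you compress text with a standard codec and chunk the bitstream into fixed-width "tokens", each token is a slice of an entangled bitstream rather than a stable piece of text, so the mapping a model has to learn is extremely non-local.

```python
import zlib

def bit_tokens(text, bits_per_token=8):
    # Compress the text, then slice the resulting bitstream into fixed-width tokens.
    data = zlib.compress(text.encode("utf-8"))
    bits = "".join(f"{byte:08b}" for byte in data)
    return [bits[i:i + bits_per_token] for i in range(0, len(bits), bits_per_token)]

text = "the cat sat on the mat. " * 8
toks = bit_tokens(text)
print(len(text), "characters ->", len(toks), "bit-tokens")
print(toks[:5])  # token boundaries have no stable alignment with characters or words
```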
Proud of this work. Here's my 22-min video explanation of the paper (AlphaGeometry): youtube.com/watch?v=TuZhU1…
This is an awesome opportunity to work with strong collaborators on an impactful science problem! Highly recommended!
Materials Discovery team at Google DeepMind is hiring. If interested, please apply via the link below: boards.greenhouse.io/deepmind/jobs/…
Analyzing training instabilities in Transformers is made more accessible by this awesome work from @Mitchnw during his internship at @GoogleDeepMind! We encourage you to think more about the fundamental causes and effects of training instabilities as models scale up!
Sharing some highlights from our work on small-scale proxies for large-scale Transformer training instabilities: arxiv.org/abs/2309.14322 With fantastic collaborators @peterjliu, @Locchiu, @_katieeverett, many others (see final tweet!), @hoonkp, @jmgilmer, @skornblith! (1/15)
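One of the mitigations studied in the paper is qk-layernorm: normalizing queries and keys right before the attention logits so the logits cannot grow without bound as activations blow up. A rough JAX sketch (shapes and constants are mine, not the paper's):

```python
import jax
import jax.numpy as jnp

def layernorm(x, eps=1e-6):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / jnp.sqrt(var + eps)

def attention_logits(q, k, qk_layernorm=True):
    # Normalizing q and k bounds the scale of the dot-product logits.
    if qk_layernorm:
        q, k = layernorm(q), layernorm(k)
    return q @ k.T / jnp.sqrt(q.shape[-1])

key = jax.random.PRNGKey(0)
q = 50.0 * jax.random.normal(key, (8, 64))   # deliberately large-scale activations
k = 50.0 * jax.random.normal(key, (8, 64))
print(jnp.abs(attention_logits(q, k, qk_layernorm=False)).max())  # huge logits
print(jnp.abs(attention_logits(q, k, qk_layernorm=True)).max())   # O(1) logits
```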

This is an amazing opportunity to work on impactful problems in Large Language Models with cool people! Highly recommended!
Interested in Reasoning with Large Language Models? We are hiring! Internship: forms.gle/fZzFhsy5yVH6R9… Full-Time Research Scientist: forms.gle/9NB5LaCHjQgXR1… Full-Time Research Engineer: forms.gle/rCRnh5Q1nWmoAK… Learn more about Blueshift Team: research.google/teams/blueshif…
Jasper @latentjasper talking about the ongoing journey towards BIG Gaussian processes! A team effort with @hoonkp, Ben Adlam, @shreyaspadhy and @zacharynado. Join us at NeurIPS GP workshop neurips.cc/virtual/2022/w…

Today at 11am CT, Hall J #806, we are presenting our paper on infinite-width neural network kernels! We have methods to compute NTK/NNGP for an extended set of activations + sketched embeddings for efficient approximation (100x) of compute-intensive conv kernels! See you there!
Most infinitely wide NTK and NNGP kernels are based on the ReLU activation. In arxiv.org/pdf/2209.04121…, we propose a method of computing neural kernels with *general* activations. For homogeneous activations, we approximate the kernel matrices by linear-time sketching algorithms.
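For reference, the ReLU special case mentioned above has a well-known closed form (the arc-cosine kernel), and the NNGP layer recursion is short to write down; the paper's actual contribution, general activations and sketched approximations for conv kernels, is not reproduced here. A small JAX sketch, with He-style weight variance as my own choice:

```python
import jax.numpy as jnp

def relu_nngp_layer(K, weight_var=2.0):
    # Closed-form E[relu(u) relu(v)] for (u, v) ~ N(0, K): the arc-cosine kernel.
    d = jnp.sqrt(jnp.diag(K))
    cos_t = jnp.clip(K / jnp.outer(d, d), -1.0, 1.0)
    theta = jnp.arccos(cos_t)
    ev = (jnp.outer(d, d) / (2 * jnp.pi)) * (jnp.sin(theta) + (jnp.pi - theta) * jnp.cos(theta))
    return weight_var * ev

def relu_nngp(x, depth=3):
    K = x @ x.T / x.shape[-1]   # input-layer covariance
    for _ in range(depth):
        K = relu_nngp_layer(K)
    return K

x = jnp.array([[1.0, 0.0, 2.0], [0.5, -1.0, 0.0], [0.0, 1.0, 1.0]])
print(relu_nngp(x))
```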
Tired of tuning your neural network optimizer? Wish there was an optimizer that just worked? We’re excited to release VeLO 🚲, the first hyperparameter-free learned optimizer that outperforms hand-designed optimizers on real-world problems: velo-code.github.io 🧵
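This is not VeLO's actual architecture (which is considerably more involved), just a toy sketch of the learned-optimizer idea it builds on: replace a hand-designed update rule with a small network that maps per-parameter features to updates, and meta-train that network's weights across many tasks.

```python
import jax
import jax.numpy as jnp

def learned_update(meta_params, grad, momentum):
    # A tiny per-parameter MLP: features in, parameter update out.
    feats = jnp.stack([grad, momentum], axis=-1)       # [..., 2]
    h = jnp.tanh(feats @ meta_params["w1"])            # [..., 16]
    return (h @ meta_params["w2"]).squeeze(-1) * 0.01  # small output scale

key = jax.random.PRNGKey(0)
meta_params = {  # these would be meta-trained over many optimization problems
    "w1": jax.random.normal(key, (2, 16)) * 0.1,
    "w2": jax.random.normal(key, (16, 1)) * 0.1,
}
grad, momentum, theta = jnp.ones((5,)), jnp.zeros((5,)), jnp.zeros((5,))
theta = theta - learned_update(meta_params, grad, momentum)
```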

Very interesting paper by @jamiesully2, @danintheory and Alex Maloney investigating the theoretical origin of neural scaling laws! Happy to read the 97-page paper and learn about new tools in RMT and insights into how the statistics of natural datasets translate into power-law scaling.
New work on the origin of @OpenAI's neural scaling laws w/ Alex Maloney and @jamiesully2: we solve a simplified model of scaling laws to gain insight into how scaling behavior arises and to probe its behavior in regimes where scaling laws break down. arxiv.org/abs/2210.16859 1/
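A quick numerical illustration of the kind of mechanism studied here (my own toy, far simpler than the paper's model): when the data covariance has a power-law spectrum, the loss of a linear student fit on its first n features also falls off roughly as a power law in n.

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
D, P = 1024, 2048                                 # ambient dim, number of samples
alpha = 1.5                                       # spectral decay exponent
scales = jnp.arange(1.0, D + 1) ** (-alpha / 2)   # eigenvalue_i ~ i^-alpha
X = jax.random.normal(key, (P, D)) * scales       # data with power-law covariance
w_true = jax.random.normal(jax.random.PRNGKey(1), (D,))
y = X @ w_true

for n in [8, 32, 128, 512]:
    Xn = X[:, :n]                                 # student sees only n features
    w_hat, *_ = jnp.linalg.lstsq(Xn, y)
    loss = jnp.mean((Xn @ w_hat - y) ** 2)        # in-sample loss; ~population loss since P >> n
    print(n, float(loss))
```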
the deadline for applying to the OpenAI residency is tomorrow. if you are an engineer or researcher from any field who wants to start working on AI, please consider applying. many of our best people have come from this program! boards.greenhouse.io/openai/jobs/46… boards.greenhouse.io/openai/jobs/46…
🧮 I finally spent some time learning what exactly the Neural Tangent Kernel (NTK) is and went through some mathematical proofs. Hopefully after reading this, you will not feel that all the math behind NTK is that scary, but rather quite intuitive. lilianweng.github.io/posts/2022-09-…
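To make the object concrete, here is a small sketch computing the *empirical* (finite-width) NTK of a toy MLP directly from its parameter Jacobian, Theta(x1, x2) = J(x1) J(x2)^T; the infinite-width kernels discussed above are the limit of this quantity as the hidden layers get wide.

```python
import jax
import jax.numpy as jnp

def mlp(params, x):
    h = jnp.tanh(x @ params["w1"])
    return (h @ params["w2"]).squeeze(-1)      # scalar output per input

def flatten_jac(j):
    # Flatten a pytree of per-example Jacobians into a [n, num_params] matrix.
    leaves = jax.tree_util.tree_leaves(j)
    return jnp.concatenate([a.reshape(a.shape[0], -1) for a in leaves], axis=1)

def empirical_ntk(params, x1, x2):
    j1 = jax.jacobian(mlp)(params, x1)         # gradients w.r.t. all parameters
    j2 = jax.jacobian(mlp)(params, x2)
    return flatten_jac(j1) @ flatten_jac(j2).T # [n1, n2]

key = jax.random.PRNGKey(0)
params = {"w1": jax.random.normal(key, (3, 64)) / jnp.sqrt(3.0),
          "w2": jax.random.normal(key, (64, 1)) / jnp.sqrt(64.0)}
x = jax.random.normal(key, (5, 3))
print(empirical_ntk(params, x, x).shape)       # (5, 5)
```

For exact infinite-width NTK/NNGP kernels, the neural_tangents library provides ready-made implementations.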
1/ Super excited to introduce #Minerva 🦉(goo.gle/3yGpTN7). Minerva was trained on math and science found on the web and can solve many multi-step quantitative reasoning problems.

Very excited to present Minerva🦉: a language model capable of solving mathematical questions using step-by-step natural language reasoning. Combining scale, data, and other techniques dramatically improves performance on the STEM benchmarks MATH and MMLU-STEM. goo.gle/3yGpTN7
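One inference-time ingredient reported for Minerva is majority voting over sampled solutions: sample many step-by-step solutions and return the most common final answer. A tiny sketch with hypothetical `sample_solution` / `extract_answer` stand-ins (not Minerva's actual API):

```python
from collections import Counter

def majority_vote(question, sample_solution, extract_answer, k=16):
    # Sample k chain-of-thought solutions and keep the most frequent final answer.
    answers = [extract_answer(sample_solution(question)) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]
```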
