Katrina Drozdov (Evtimova)

@stochasticdoggo

AI researcher | PhD from @NYUDataScience | Bulgarian yogurt, prime numbers, and dogs bring me joy | she/her

Katrina Drozdov (Evtimova) reposted

After spending billions of dollars of compute, GPT-5 learned that the most effective use of its token budget is to give itself a little pep talk every time it figures something out. Maybe you should do the same.

What?



Katrina Drozdov (Evtimova) reposted

Tinker provides the right abstraction layer for post-training R&D -- it's the infrastructure I've always wanted. I'm excited to see what people build with it. "Civilization advances by extending the number of important operations which we can perform without…

Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!…



Katrina Drozdov (Evtimova) reposted

The application for a research fellowship at the Flatiron Institute in the Center for Computational Math is now live! This includes positions for ML and stats. The deadline is Dec 1. Links below with more details.


Finally dipped my toes into RL post-training. I trained a code generation LLM with GRPO using open-r1. Here are my 9 takeaways: kevtimova.github.io/posts/grpo/
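
For context, a minimal sketch of the group-relative advantage at the heart of GRPO, assuming one scalar reward per sampled completion (the function name and shapes here are illustrative, not open-r1's API):

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # rewards: [num_prompts, samples_per_prompt], one score per completion.
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    # No learned value baseline: each completion is scored against the
    # statistics of its own group of samples for the same prompt.
    return (rewards - mean) / (std + 1e-8)

print(grpo_advantages(torch.tensor([[1.0, 0.0, 0.5, 1.0]])))
```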


Katrina Drozdov (Evtimova) reposted

This simple PyTorch trick will cut your GPU memory use in half / double your batch size (for real). Instead of adding the losses and then calling backward once, compute the backward on each loss as you go (which frees its computational graph). The results will be exactly identical.

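A minimal sketch of the trick, with a hypothetical stand-in model and loss; gradients accumulate in `.grad` across backward() calls, which is why the two versions match:

```python
import torch

# Hypothetical stand-ins: any model and any per-chunk losses behave the same.
model = torch.nn.Linear(10, 1)
chunks = [torch.randn(8, 10) for _ in range(4)]

# Memory-hungry version: summing keeps all four graphs alive until backward().
#   total = sum(model(x).pow(2).mean() for x in chunks)
#   total.backward()

# The trick: call backward() per loss, freeing each graph immediately.
# Gradients accumulate in .grad, so the final grads are exactly identical.
for x in chunks:
    loss = model(x).pow(2).mean()
    loss.backward()
```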

Katrina Drozdov (Evtimova) reposted

excited to finally share on arXiv what we've known for a while now: All Embedding Models Learn The Same Thing. embeddings from different models are SO similar that we can map between them based on structure alone, without *any* paired data. feels like magic, but it's real: 🧵

this is sick. all i'll say is that these GIFs are proof that the biggest bet of my research career is gonna pay off. excited to say more soon
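
Not the paper's method, but a toy illustration of what "based on structure alone" means: if two models embed the same inputs (here, model B is just a noisy random rotation of model A), their pairwise-similarity matrices nearly coincide even though the raw vectors live in different spaces:

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine_sims(E):
    # Row-normalize, then take all pairwise cosine similarities.
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    return E @ E.T

# Stand-ins for two models' embeddings of the same 100 inputs: model B is a
# random orthogonal rotation of model A plus noise, so only geometry is shared.
A = rng.normal(size=(100, 64))
Q, _ = np.linalg.qr(rng.normal(size=(64, 64)))
B = A @ Q + 0.05 * rng.normal(size=(100, 64))

corr = np.corrcoef(cosine_sims(A).ravel(), cosine_sims(B).ravel())[0, 1]
print(f"similarity-structure correlation: {corr:.3f}")  # close to 1.0
```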



Katrina Drozdov (Evtimova) reposted

it's been more than a decade since KD was proposed, and i've been using it all along .. but why does it work? too many speculations but no simple explanation. @_sungmin_cha and i decided to see if we can come up with the simplest working description of KD in this work. we ended…
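
For readers who haven't used it, a minimal sketch of the standard Hinton-style KD objective the thread is asking about (the temperature T and T² scaling follow the usual convention, not necessarily this paper's formulation):

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T: float = 2.0):
    # Soften both distributions with temperature T, then match the student
    # to the teacher under KL; the T**2 factor keeps gradient scale stable.
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T**2
```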


Katrina Drozdov (Evtimova) reposted

CDS PhD Vlad Sobal (@vlad_is_ai) and Courant PhD Wancong (Kevin) Zhang show that when good data is scarce, planning beats traditional reinforcement learning. With @kchonyc, @timrudner, and @ylecun. nyudatascience.medium.com/when-good-data…


We’re researching and designing world models. In the meantime, you definitely need RAG, and FreshStack will help.

Existing IR/RAG benchmarks are unrealistic: they’re often derived from easily retrievable topics, rather than grounded in solving real user problems. 🧵Introducing 𝐅𝐫𝐞𝐬𝐡𝐒𝐭𝐚𝐜𝐤, a challenging RAG benchmark on niche, recent topics. Work done during intern @databricks 🧱



Katrina Drozdov (Evtimova) reposted

No labels. No problems. 😎 Check out TAO, an impactful new approach from the Mosaic Research Team!

The hardest part about finetuning LLMs is that people generally don't have high-quality labeled data. Today, @databricks introduced TAO, a new finetuning method that only needs inputs, no labels necessary. Best of all, it actually beats supervised finetuning on labeled data.



Huge congratulations on the launch! @reflection_ai has an incredible team and an ambitious mission—excited to follow your progress!

Today I’m launching @reflection_ai with my friend and co-founder @real_ioannis. Our team pioneered major advances in RL and LLMs, including AlphaGo and Gemini. At Reflection, we're building superintelligent autonomous systems. Starting with autonomous coding.



Katrina Drozdov (Evtimova) reposted

Today I’m launching @reflection_ai with my friend and co-founder @real_ioannis. Our team pioneered major advances in RL and LLMs, including AlphaGo and Gemini. At Reflection, we're building superintelligent autonomous systems. Starting with autonomous coding.


Katrina Drozdov (Evtimova) reposted

I shared a controversial take the other day at an event and I decided to write it down in a longer format: I’m afraid AI won't give us a "compressed 21st century". The "compressed 21st century" comes from Dario's "Machine of Loving Grace" and if you haven’t read it, you probably…


I asked ChatGPT, Gemini, and Claude for a clever joke. They all gave me the same one. Either AI is merging into a hive mind… or humor has officially been solved mathematically!


Katrina Drozdov (Evtimova) reposted

VideoJAM is our new framework for improved motion generation from @AIatMeta. We show that video generators struggle with motion because the training objective favors appearance over dynamics. VideoJAM directly addresses this **without any extra data or scaling** 👇🧵


Katrina Drozdov (Evtimova) reposted

The buzz over DeepSeek this week crystallized, for many people, a few important trends that have been happening in plain sight: (i) China is catching up to the U.S. in generative AI, with implications for the AI supply chain. (ii) Open weight models are commoditizing the…


The principle of least effort, from psychology, describes how we favor efficiency over effort. It aligns with System 1 (fast, intuitive) vs. System 2 (slow, deliberate) reasoning. AI faces a similar challenge: knowing when to rely on heuristics vs. deeper reasoning.


Katrina Drozdov (Evtimova) reposted

The recording of the GAN test of time talk by @dwf is now publicly available: neurips.cc/virtual/2024/t…


Katrina Drozdov (Evtimova) reposted

Got a diffusion model? What if there were a way to: - Get SOTA text-to-image prompt fidelity, with no extra training! - Steer continuous and discrete (e.g. text) diffusions - Beat larger models using less compute - Outperform fine-tuning - And keep your stats friends happy !?


Katrina Drozdov (Evtimova) reposted

So…world model = video model?

