The ultimate guide to fine-tuning LLMs. This free 115-page book on arXiv covers all the theory you need for fine-tuning: > NLP and LLM fundamentals > PEFT, LoRA, QLoRA > Mixture of Experts (MoE) > the seven-stage fine-tuning pipeline > data prep, eval, best practices
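As a taste of the PEFT/LoRA material the book covers, here is a minimal sketch using Hugging Face's peft library; the base model name and hyperparameters are illustrative choices, not taken from the book:

```python
# Minimal LoRA fine-tuning setup with Hugging Face peft.
# Model name and hyperparameters are illustrative, not from the book.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # attach adapters to attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()        # typically well under 1% of all weights
```

QLoRA applies the same adapter config on top of a 4-bit quantized base model (via bitsandbytes), which is what makes fine-tuning large models feasible on a single GPU.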
ever wondered what an RLVR environment is? in 27 min I’ll show you: - what they’re made of - how RLVR differs from RLHF - the performance gains it gives small models - and a walkthrough of the verifiers spec for defining them. by the end you’ll be able to make your own 👺🦋
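The core distinction in one function, as a minimal sketch (names and the answer format are mine, not from the video): in RLVR the reward comes from a deterministic programmatic check against ground truth, not from a learned preference model as in RLHF.

```python
# RLVR in one function: reward is computed by a deterministic verifier,
# not by a learned reward model as in RLHF. Names here are illustrative.
import re

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Extract the model's final answer and check it against ground truth."""
    match = re.search(r"\\boxed\{(.+?)\}", completion)
    if match is None:
        return 0.0                      # no parseable answer: no reward
    answer = match.group(1).strip()
    return 1.0 if answer == ground_truth.strip() else 0.0

print(verifiable_reward(r"... so the result is \boxed{42}", "42"))  # 1.0
```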
🕳️🐇Into the Rabbit Hull – Part I (Part II tomorrow) An interpretability deep dive into DINOv2, one of vision’s most important foundation models. And today is Part I, buckle up, we're exploring some of its most charming features.
MapAnything's evil sibling also supports flexible inputs but triples down on redundant outputs, estimating point maps, depths, and 3D Gaussians. x.com/chrisoffner3d/…
Is the reign of terror of redundant scene representations ending? Where VGGT, CUT3R, and other recent models relied on godless redundant outputs (depth+points+pose) without guaranteeing internal prediction consistency, MapAnything and DepthAnything 3 are now heroically pushing back.
After building some mathematical foundations for transformers, we’re on to the next foundational paper, Toy Models of Superposition, at the ML understanding group of @Cohere_Labs. This paper explores how networks pack many features into fewer dimensions, forming…
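The paper's setup is small enough to fit in a few lines. A minimal PyTorch sketch (dimensions and sparsity level are illustrative): n sparse features are squeezed through an m < n bottleneck and reconstructed with a ReLU; training this reconstruction loss is where superposition emerges.

```python
# Toy Models of Superposition setup, minimal sketch: n sparse features are
# compressed into m < n hidden dims, then reconstructed through a ReLU.
import torch
import torch.nn as nn

n_features, m_hidden = 20, 5

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.W = nn.Parameter(torch.randn(m_hidden, n_features) * 0.1)
        self.b = nn.Parameter(torch.zeros(n_features))

    def forward(self, x):
        h = self.W @ x.T                              # compress: (m, batch)
        return torch.relu((self.W.T @ h).T + self.b)  # reconstruct with ReLU

# Sparse synthetic features: each one is active with low probability.
x = torch.rand(256, n_features) * (torch.rand(256, n_features) < 0.05)
model = ToyModel()
loss = ((model(x) - x) ** 2).mean()  # train this to watch features share dims
```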
“CS 224V: Conversational Virtual Assistants with Deep Learning” is a good course; it’s a pity there are no lecture videos. Its core goal is building LLM-based chat assistants, covering RAG, agents, and real-time voice. web.stanford.edu/class/cs224v/s…
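The RAG loop the course covers, as a minimal sketch; embed() and llm() are hypothetical stand-ins for whatever embedding and chat models you plug in:

```python
# Minimal RAG loop: embed, retrieve by cosine similarity, condition the LLM
# on the retrieved passages. embed() and llm() are hypothetical stand-ins.
import numpy as np

def retrieve(query: str, docs: list[str], embed, k: int = 3) -> list[str]:
    """Return the k docs whose embeddings are most similar to the query's."""
    q = embed(query)
    scores = [np.dot(q, embed(d)) / (np.linalg.norm(q) * np.linalg.norm(embed(d)))
              for d in docs]
    top = np.argsort(scores)[-k:][::-1]
    return [docs[i] for i in top]

def answer(query: str, docs: list[str], embed, llm) -> str:
    context = "\n\n".join(retrieve(query, docs, embed))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return llm(prompt)
```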
Holy shit. MIT just built an AI that can rewrite its own code to get smarter 🤯 It’s called SEAL (Self-Adapting Language Models). Instead of humans fine-tuning it, SEAL reads new info, rewrites it in its own words, and runs gradient updates on itself, literally performing…
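As I understand the idea, the inner loop is: generate a "self-edit" (a restatement of the new information), then fine-tune on it. A minimal sketch with a Hugging Face causal LM; the prompt wording, stand-in model, and hyperparameters are my guesses, not the paper's actual recipe:

```python
# Sketch of a SEAL-style inner loop: the model restates new information in its
# own words, then takes gradient steps on that restatement. Prompt wording and
# hyperparameters are illustrative guesses, not the paper's recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # stand-in model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

def self_adapt(new_info: str, steps: int = 3):
    # 1. The model rewrites the new info in its own words (the "self-edit").
    prompt = f"Restate the following in your own words:\n{new_info}\n"
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=64, do_sample=True)
    self_edit = tok.decode(out[0][inputs["input_ids"].shape[1]:])

    # 2. It then trains on its own restatement.
    batch = tok(self_edit, return_tensors="pt")
    for _ in range(steps):
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        opt.step()
        opt.zero_grad()
```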
Can LLMs reason like a student? 👩🏻🎓📚✏️ For educational tools like AI tutors, modeling how students make mistakes is crucial. But current LLMs are much worse at simulating student errors ❌ than performing correct ✅ reasoning. We try to fix that with our method MISTAKE 🤭👇
Introducing 𝐓𝐡𝐢𝐧𝐤𝐢𝐧𝐠 𝐰𝐢𝐭𝐡 𝐂𝐚𝐦𝐞𝐫𝐚📸, a unified multimodal model that integrates camera-centric spatial intelligence to interpret and create scenes from arbitrary viewpoints. Project Page: kangliao929.github.io/projects/puffi… Code: github.com/KangLiao929/Pu…
MIT's 6.851: Advanced Data Structures (Spring '21) courses.csail.mit.edu/6.851/spring21/ This has been on my recommendation list for a while, and the memory-hierarchy discussions are great in the context of cache-oblivious algorithms.
"Cache‑Oblivious Algorithms and Data Structures" by Erik D. Demaine erikdemaine.org/papers/BRICS20… This is a foundational survey on designing cache‑oblivious algorithms and data structures that perform as well as cache‑aware approaches that require hardcoding cache size (M) and block…
📢📢📢We've released the ScanNet++ Novel View Synthesis Benchmark for iPhone data! 🥳 Test your models on RGBD video featuring real-world challenges like exposure changes & motion blur! Download the newest iPhone NVS test split and submit your results! ⬇️ scannetpp.mlsg.cit.tum.de/scannetpp/benc…
New inference method TAG fights diffusion model hallucinations. Introducing Tangential Amplifying Guidance (TAG): a training-free, plug-and-play method for diffusion models that significantly reduces hallucinations and boosts sample quality by steering generation to…
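The geometric idea the name suggests (my reading of the abstract, not the paper's actual algorithm): decompose a guidance update into its component along the current sample and the tangential remainder, then amplify the tangential part. A sketch of that decomposition:

```python
# Sketch of the geometry suggested by the name "Tangential Amplifying
# Guidance" -- my interpretation, NOT the paper's exact algorithm: split a
# guidance update into radial (along x) and tangential parts, amplify the latter.
import torch

def tangential_amplify(x: torch.Tensor, update: torch.Tensor, scale: float = 1.5):
    x_flat, u_flat = x.flatten(), update.flatten()
    radial = (u_flat @ x_flat) / (x_flat @ x_flat) * x_flat  # projection onto x
    tangential = u_flat - radial                             # orthogonal remainder
    return (radial + scale * tangential).view_as(update)
```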
🚀 Dear Future 𝗔𝗜 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿, If you want to break into AI in 2025, stop chasing trends and start mastering the fundamentals. I've curated a list of must-read books 📚 that every successful AI Engineer swears by, from Machine Learning to LLMs & MLOps. Ready to level…
CLoD-GS: Continuous Level-of-Detail via 3D Gaussian Splatting Abstract (excerpt): Traditionally, Level of Detail (LoD) is implemented using discrete levels (DLoD), where multiple, distinct versions of a model are swapped out at different distances. This long-standing paradigm,…
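The contrast the abstract sets up, as a toy sketch; the thresholds and falloff function below are made up for illustration and are not the paper's method:

```python
# Discrete vs. continuous LoD, as contrasted in the abstract. The thresholds
# and falloff are made-up illustrations, not CLoD-GS's actual mechanism.
def dlod_select(distance: float, versions: list):
    """Classic DLoD: swap between distinct pre-built model versions."""
    thresholds = [10.0, 50.0, 200.0]       # hypothetical swap distances
    for t, version in zip(thresholds, versions):
        if distance < t:
            return version
    return versions[-1]

def continuous_lod(distance: float) -> float:
    """Continuous LoD: one smooth parameter in [0, 1] drives detail everywhere,
    with no popping at hard threshold boundaries."""
    return max(0.0, min(1.0, 1.0 - distance / 200.0))
```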
Mamba3 just silently dropped on ICLR 🤯 A faster, longer-context, and more scalable LLM architecture than Transformers. A few years ago, some researchers started rethinking sequence modeling from a different angle. Instead of stacking more attention layers, they went back to an…
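The angle they went back to is the linear state-space recurrence, which runs in O(L) time with constant state per step instead of building an L×L attention matrix. A minimal non-selective sketch (Mamba's innovation is making A, B, C input-dependent; dimensions here are arbitrary):

```python
# The core of SSM-style sequence modeling (Mamba's lineage): a linear recurrence
# h_t = A h_{t-1} + B x_t, y_t = C h_t, processed in O(L) time with O(1) state.
# Plain non-selective version; Mamba makes A, B, C depend on the input.
import numpy as np

def ssm_scan(x, A, B, C):
    """x: (L, d_in); A: (n, n); B: (n, d_in); C: (d_out, n)."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                 # one step per token: no L x L attention matrix
        h = A @ h + B @ x_t
        ys.append(C @ h)
    return np.stack(ys)

L, d, n = 8, 4, 16
y = ssm_scan(np.random.randn(L, d), np.eye(n) * 0.9,
             np.random.randn(n, d) * 0.1, np.random.randn(2, n))
```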
"Introduction to Machine Learning Systems" - FREE from MIT Press - Authored by Harvard Professor - 2048 Pages To Get It Simply: 1. Retweet & Reply "ML" 2. Follow so that I will DM you.
Our general-reasoner (arxiv.org/abs/2505.14652) came out in March this year and has been accepted by NeurIPS. We are among the first few works to extract QA from pre-training data for RL. No comparison, no citation to our paper at all 😂
🚀 Scaling RL to Pretraining Levels with Webscale-RL. RL for LLMs has been bottlenecked by tiny datasets (<10B tokens) vs. pretraining (>1T). Our Webscale-RL pipeline converts pretraining text into diverse RL-ready QA data — scaling RL to pretraining levels! All code and…
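The general shape of such a pipeline, as a sketch; llm() is a hypothetical helper and this is not the released Webscale-RL code:

```python
# General shape of a pretraining-text -> RL-ready QA pipeline. llm() is a
# hypothetical helper; this is NOT the released Webscale-RL code.
def text_to_qa(passage: str, llm) -> dict | None:
    """Ask an LLM to mine a question with a short, checkable answer."""
    prompt = (
        "From the passage below, write one question whose answer is a short, "
        "verifiable fact stated in the passage, then the answer.\n"
        f"Passage: {passage}\nFormat: Q: ... A: ..."
    )
    out = llm(prompt)
    if "Q:" not in out or "A:" not in out:
        return None                     # drop unparseable generations
    q, a = out.split("A:", 1)
    return {"question": q.replace("Q:", "").strip(), "answer": a.strip()}
```

The verifiable answers are what make the result RL-ready: each QA pair can drive an RLVR-style reward check.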
This survey paper argues small language models can handle most agent tasks, with big models stepping in only when needed. This setup cuts costs by 10x to 30x for common tool tasks. Agent work is mostly calling tools and producing structured outputs, not recalling vast facts. So a…
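The routing pattern this describes, as a minimal sketch; the models and confidence threshold are stand-ins, not from the survey:

```python
# Routing pattern from the survey's argument: try the small model first and
# escalate only when it isn't confident. Models and threshold are stand-ins.
def route(task: str, small_llm, big_llm, threshold: float = 0.8) -> str:
    """small_llm/big_llm are hypothetical callables returning (answer, confidence)."""
    answer, confidence = small_llm(task)
    if confidence >= threshold:
        return answer                  # cheap path covers most tool calls
    answer, _ = big_llm(task)          # escalate the hard minority
    return answer
```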
Segment Anything 3 just silently dropped on ICLR 🤯 The first SAM let you click on an object to segment it. SAM 2 added video and memory. Now SAM 3 says: just describe what you want — “yellow school bus”, “striped cat”, “red apple” — and it will find and segment every instance…
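What that interaction looks like to call, as an entirely hypothetical interface sketch (I have not seen SAM 3's real API; this only illustrates text in, one mask per matching instance out):

```python
# Hypothetical interface sketch for concept-prompted segmentation. This is
# NOT SAM 3's actual API; it only illustrates the interaction pattern.
def segment_concept(image, concept: str, model) -> list:
    """model stands in for any text-promptable segmenter; returns binary masks,
    one per detected instance of the concept."""
    return [det["mask"] for det in model.predict(image=image, text=concept)]

# masks = segment_concept(img, "yellow school bus", model)  # one mask per bus
```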
🚀Try out rCM—the most advanced diffusion distillation! ✅First to scale up sCM/MeanFlow to 10B+ video models ✅Open-sourced FlashAttention-2 JVP kernel & FSDP/CP support ✅High-quality, diverse videos in 2–4 steps Paper: arxiv.org/abs/2510.08431 Code: github.com/NVlabs/rcm
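What makes 2–4 step generation possible is the consistency-model sampler: the distilled network jumps straight from a noise level to a clean estimate. A sketch of the generic multistep scheme (the standard recipe, not rCM's exact code; the sigma schedule is illustrative):

```python
# Generic consistency-model multistep sampling (the standard scheme, not rCM's
# exact code): f(x, sigma) jumps straight from noise level sigma to a clean
# estimate, so 2-4 evaluations replace ~50 ODE-solver steps.
import torch

def multistep_sample(f, shape, sigmas=(80.0, 2.0, 0.5)):
    """f(x, sigma) -> denoised estimate; sigmas: decreasing noise levels."""
    x = torch.randn(shape) * sigmas[0]        # start from pure noise
    x0 = f(x, sigmas[0])                      # one jump to a clean estimate
    for sigma in sigmas[1:]:
        x = x0 + sigma * torch.randn(shape)   # re-noise to a lower level
        x0 = f(x, sigma)                      # refine with another jump
    return x0
```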