
Sourav

@srvmshr

ML University of Tokyo. Prev: Microsoft Research RF, @virginia_tech. Personal opinions. Coasting life with @jnchrltte

The #Gemini3 model is so good at coding. Pairing #codex and Gemini together, I could squash almost all my bugs in at most two passes. Thanks @GoogleDeepMind. I was a non-believer in LLM-based coding for serious work, but I'm rapidly changing my opinion about it. 🥲🥲


RIP @burgerbecky You'll be missed, and the world is forever a little darker now that your light is gone 😔 May your next journey be as exciting as an infinite scroll of Space Invaders


Beyond all the other failures this week, VS @code deprecating IntelliCode in favor of @GitHubCopilot is the saddest thing to happen 😔 Not everything needed a subscription account - IntelliCode had a long and healthy run bundled as the default


I'd love to try this out on some math-heavy papers, but it makes me uncomfortable to know that hallucinations could paint a completely different picture from what the paper claims. Does anyone have a better recommendation? Wasn't there something in this direction from @allen_ai?

Introducing quickarXiv. Papers are often written in convoluted language that is hard to understand. We're fixing that. Swap arxiv → quickarxiv in any paper URL to get an instant blog with figures, insights, and explanations. Now extracted with DeepSeek OCR 🚀
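The URL swap the quoted tweet describes can be sketched in a few lines. This is a minimal illustration, assuming the service simply mirrors arXiv paths on a `quickarxiv.org` host (the exact domain is my assumption, not stated in the tweet):

```python
# Minimal sketch of the "swap arxiv → quickarxiv" trick: rewrite the
# host while leaving the paper path untouched. The quickarxiv.org
# domain is an assumption for illustration.
from urllib.parse import urlsplit, urlunsplit

def to_quickarxiv(url: str) -> str:
    parts = urlsplit(url)
    host = parts.netloc.replace("arxiv.org", "quickarxiv.org")
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))

print(to_quickarxiv("https://arxiv.org/abs/2506.13018"))
# → https://quickarxiv.org/abs/2506.13018
```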



I want to migrate away from GH to @codeberg_org, but the biggest hurdle is the dozens of integrations & the overall ecosystem access. Codeberg is a solid choice & feels like the good ol' GitHub without the bloat & the incessant spam/scam vectors


Sourav reposted

Interesting thread on 6 months of "hardcore" usage of coding agents (rewriting ~300k LOC). The meta-learning is ironic: the user stopped hard "vibe coding" and returned to disciplined context engineering. reddit.com/r/ClaudeAI/com…


Sourav reposted

Very cool blog by @character_ai diving into how they trained their proprietary model Kaiju (13B, 34B, 110B) before switching to OSS models, and spoiler: it has Noam Shazeer written all over it. Most of the choices for model design (MQA, SWA, KV Cache, Quantization) are not to…


💯 FOSS models are the way to go. Making things accessible & pushing the boundaries of what is doable

for every closed model, there's an open source alternative
sonnet 4.5 → glm 4.6 / minimax m2
grok code fast → gpt-oss 120b / qwen 3 coder
gpt 5 → kimi k2 / kimi k2 thinking
gemini 2.5 flash → qwen 2.5 image
gemini 2.5…



Sourav reposted

Gemma3n was released a few months ago. I wasn't able to find much info on it, and I found it a *very interesting* architecture with a lot of innovations (Matryoshka Transformer, MobileNetV5, etc.), so I decided to dig further. Here are the slides of this talk: drive.google.com/file/d/15hbh03…


I struggled to explain agents & tools in lay terms. This one does a fairly good job covering the concepts with examples fly.io/blog/everyone-…


Sourav reposted

How is memorized data stored in a model? We disentangle MLP weights in LMs and ViTs into rank-1 components based on their curvature in the loss, and find representational signatures of both generalizing structure and memorized training data

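The decomposition step the tweet mentions can be made concrete. This is an illustrative sketch only: the paper selects rank-1 components by their curvature in the loss, while here I just show what "disentangling a weight matrix into rank-1 components" means, via a plain SVD:

```python
# Illustrative only: an MLP weight matrix W can be written as a sum of
# rank-1 components W = sum_i s_i * u_i v_i^T. The paper ranks such
# components by loss curvature; this sketch only verifies the
# decomposition itself on a random matrix.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))            # stand-in for an MLP weight

U, s, Vt = np.linalg.svd(W, full_matrices=False)
components = [s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s))]

reconstruction = sum(components)           # summing the rank-1 pieces
print(np.allclose(reconstruction, W))      # → True
```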

Sourav reposted

There’s lots of symmetry in neural networks! 🔍 We survey where they appear, how they shape loss landscapes and learning dynamics, and applications in optimization, weight space learning, and much more. ➡️ Symmetry in Neural Network Parameter Spaces arxiv.org/abs/2506.13018

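The simplest of the symmetries the survey covers is permutation symmetry: relabeling the hidden units of an MLP (and applying the inverse relabeling to the next layer's columns) leaves the network's function unchanged. A small numerical check, with arbitrary shapes and a ReLU activation chosen for illustration:

```python
# Permutation symmetry of a 2-layer MLP: permuting hidden units in
# (W1, b1) and the matching columns of W2 gives different parameters
# but the identical input-output function.
import numpy as np

rng = np.random.default_rng(42)
W1, b1 = rng.standard_normal((16, 5)), rng.standard_normal(16)
W2 = rng.standard_normal((3, 16))

def mlp(x, W1, b1, W2):
    # ReLU hidden layer; ReLU is elementwise, so it commutes with permutation
    return W2 @ np.maximum(W1 @ x + b1, 0.0)

P = rng.permutation(16)                    # relabeling of the 16 hidden units
x = rng.standard_normal(5)

out_original = mlp(x, W1, b1, W2)
out_permuted = mlp(x, W1[P], b1[P], W2[:, P])

print(np.allclose(out_original, out_permuted))  # → True
```

This is why loss landscapes contain many functionally identical minima: any of the 16! hidden-unit orderings here gives the same function.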

HLE of 45%. Wow! 💣

🚀 Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here. 🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%) 🔹 Executes up to 200 – 300 sequential tool calls without human interference 🔹 Excels in reasoning, agentic search, and coding 🔹 256K context window Built…



I jumped into the @windsurf camp today. Some of the features they integrate are so cool - like DeepWiki and codemaps. Only feedback: please integrate other models, e.g. via OpenRouter. BYOK is great but limited to Anthropic only


Sourav reposted

Claude Code's native installer is now generally available. It's simpler, more stable, and doesn't require Node.js. We recommend this as the default installation method for all Claude Code users going forward.


Sourav reposted

Hot take: DAgger (Ross 2011) should be the first paper you read to get into RL, instead of Sutton's book. Maybe also read scheduled sampling (Bengio 2015). And before RL, study supervised learning thoroughly.

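The core DAgger idea (Ross et al., 2011) is a short loop: roll out the *current* policy, ask the expert to label the states the learner actually visits, aggregate those labels into the dataset, and retrain. A didactic toy sketch, not the paper's setup — the 1-D environment, the threshold expert, and the least-squares "policy" are all my own simplifications:

```python
# Toy DAgger loop: a 1-D state drifts; the expert's action sign(s)
# pushes it toward 0. The learner is a 1-parameter classifier
# a = sign(w * s), refit by least squares on the aggregated dataset.
import numpy as np

rng = np.random.default_rng(0)

def expert(s):
    return np.sign(s)                      # expert labels for any state

def policy_action(w, s):
    return np.sign(w * s)

def rollout(w, s0=1.5, steps=30):
    """Collect the states visited under the CURRENT policy (DAgger's key idea)."""
    s, visited = s0, []
    for _ in range(steps):
        visited.append(s)
        s = s - 0.1 * policy_action(w, s) + 0.05 * rng.standard_normal()
    return np.array(visited)

w = 1.0                                    # initial policy guess
states = rollout(w)                        # bootstrap rollout
actions = expert(states)
for _ in range(5):                         # dataset-aggregation iterations
    new_states = rollout(w)                # states from the learner's own behavior
    states = np.concatenate([states, new_states])
    actions = np.concatenate([actions, expert(new_states)])
    w = np.sum(states * actions) / np.sum(states * states)  # least-squares refit

test_states = np.array([-2.0, -0.5, 0.7, 3.0])
print(np.array_equal(policy_action(w, test_states), expert(test_states)))  # → True
```

The point the tweet makes survives even in this toy: each round trains on the state distribution the learner itself induces, which is exactly the supervised-learning framing of sequential decision making.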

Sourav reposted

Ever wondered about graph learning? Watch Ameya Velingker (@ameya_pa) and Haggai Maron (@HaggaiMaron) give a masterful introduction at the Simons Institute's workshop on Graph Learning Meets Theoretical Computer Science. Video: simons.berkeley.edu/talks/ameya-ve…


Tired of going back to the original papers again and again? Our monograph: a systematic and fundamental recipe you can rely on! 📘 We're excited to release 《The Principles of Diffusion Models》— with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core…



Guide-coded a high-throughput document pipeline using @allen_ai olmOCR + LayoutLMv3 today. Combining these two proved sticky - especially for parallel runs. LayoutLM has a natural proclivity toward easyocr/tesseract-style OCR engines. Too many rough edges to patch over


Sourav reposted

DeepSeek finally released a new model and paper. And because this DeepSeek-OCR release is a bit different from what everyone expected, and DeepSeek releases are generally a big deal, I wanted to do a brief explainer of what it is all about. In short, they explore how vision…

