GEPA is featured in @OpenAI and @BainandCompany's new cookbook tutorial, showing how to build self-evolving agents that move beyond static prompts. See how GEPA lets agents reflect, learn from feedback, and evolve their own instructions.
Breaking: we release SYNTH, a fully synthetic generalist dataset for pretraining, and two new SOTA reasoning models trained exclusively on it. Despite having seen only 200 billion tokens, Baguettotron is currently best-in-class in its size range.
I'm constantly irritated that I don't have time to read the torrent of cool papers coming faster and faster from amazing people in relevant fields. Other scientists have the same issue and have no time to read most of my lengthy conceptual papers either. So whom are we writing…
wtf, an 80-layer 321M model???
Synthetic playgrounds enabled a series of controlled experiments that led us to favor an extreme-depth design. We selected an 80-layer architecture for Baguettotron, with improvements across the board on memorization of logical reasoning: huggingface.co/PleIAs/Baguett…
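A back-of-the-envelope sketch of why that depth is plausible at this scale: the hidden size, vocab size, and MLP ratio below are assumptions picked only to show that an 80-layer stack can land near ~321M parameters, not the actual Baguettotron config.

```python
# Back-of-the-envelope transformer parameter count.
# All dimensions below are illustrative assumptions, not the released Baguettotron config.

def approx_params(n_layers: int, d_model: int, vocab: int, mlp_ratio: int = 4) -> int:
    attn = 4 * d_model * d_model              # Q, K, V, O projections
    mlp = 2 * mlp_ratio * d_model * d_model   # up + down projections
    per_layer = attn + mlp
    embeddings = vocab * d_model              # tied input/output embeddings
    return n_layers * per_layer + embeddings

# A deep-narrow design: many layers, small width.
print(f"{approx_params(n_layers=80, d_model=512, vocab=65_536):,}")   # ~285M
# A shallow-wide design with a similar budget: fewer layers, larger width.
print(f"{approx_params(n_layers=20, d_model=1024, vocab=65_536):,}")  # ~319M
```

Both toy configs land around 300M parameters; the deep-narrow one simply trades width for depth, which is the design the thread credits for the gains.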
ERNIE-5.0-Preview-1022 shines in Creative Writing, Longer Query & Instruction Following. Click through to see all the leaderboard details by key categories: lmarena.ai/leaderboard/te…
LLMSys I would rather call NvidiaSys: whoever masters the NV toolchain stack masters LLMs. But NVIDIA has already peaked; before long it will be all downhill.
Edison Scientific (a brand-new company spun out of FutureHouse) releases Kosmos: An AI Scientist for Autonomous Discovery "Our beta users estimate that Kosmos can do in one day what would take them 6 months, and we find that 79.4% of its conclusions are accurate." The paper…
Important research paper from Google... "LLMs don't just memorize, they build a geometric map that helps them reason." According to the paper, the model:
– builds a global map from only local pairs
– plans full unseen paths when knowledge is in weights; fails in context
– turns a…
1/2) Subham Sahoo (@ssahoo_, @cornell PhD) presented his amazing work on "The Diffusion Duality" in our Generative Memory Lab channel. A must-watch for people interested in discrete diffusion! Link below:
The Illustrated NeurIPS 2025: A Visual Map of the AI Frontier New blog post! NeurIPS 2025 papers are out—and it’s a lot to take in. This visualization lets you explore the entire research landscape interactively, with clusters, summaries, and @cohere LLM-generated explanations…
Holy shit... this might be the next big paradigm shift in AI. 🤯 Tencent + Tsinghua just dropped a paper called Continuous Autoregressive Language Models (CALM) and it basically kills the “next-token” paradigm every LLM is built on. Instead of predicting one token at a time,…
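For readers who want the gist in code: a minimal sketch of what "continuous autoregression" can look like, assuming the model regresses the next chunk-level vector instead of classifying the next token. The chunk size, the GRU stand-in backbone, and the losses below are illustrative assumptions, not the CALM paper's actual architecture.

```python
# Minimal sketch of "continuous autoregression": predict the next *vector*
# (summarizing a chunk of K tokens) instead of the next token.
# Shapes, losses, and modules are assumptions for illustration, not CALM's design.
import torch
import torch.nn as nn

K, vocab, d = 4, 1000, 256  # chunk size, vocab, latent width (illustrative)

encode = nn.Sequential(nn.Embedding(vocab, d), nn.Flatten(1), nn.Linear(K * d, d))  # K tokens -> 1 vector
decode = nn.Linear(d, K * vocab)                                                     # 1 vector -> K token logits
backbone = nn.GRU(d, d, batch_first=True)                                            # stand-in for the transformer

tokens = torch.randint(0, vocab, (2, 8 * K))               # batch of 2 sequences, 8 chunks each
chunks = tokens.view(2, 8, K)
z = torch.stack([encode(chunks[:, i]) for i in range(8)], dim=1)  # (2, 8, d) chunk vectors

h, _ = backbone(z[:, :-1])                                  # predict vector t+1 from vectors <= t
loss_latent = ((h - z[:, 1:]) ** 2).mean()                  # continuous regression loss (illustrative)
logits = decode(h).view(2, 7, K, vocab)                     # map predicted vectors back to token logits
loss_tokens = nn.functional.cross_entropy(
    logits.reshape(-1, vocab), chunks[:, 1:].reshape(-1))
print(loss_latent.item(), loss_tokens.item())
```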
Recently, Japanese House of Councillors member Seki Hei (石平) spent 29 minutes on his personal YouTube channel berating Chinese President Xi Jinping! Highlights included "elementary schooler", "dumbass", "stupid pig", and "a worthless piece of work"! Firepower this fierce is truly rare for a sitting Japanese councillor! Video link is in the first comment below 👇👇👇
Laughing my head off. Recently, Japanese House of Councillors member Seki Hei (石平) went all out on Xi Jinping on his personal YouTube channel for a full 29 minutes! Highlights included "elementary schooler", "dumbass", and "stupid pig"! Well said. Posting it to give the little pinks a meltdown 🤪
Samsung's Tiny Recursive Model (TRM) masters complex reasoning With just 7M parameters, TRM outperforms large LLMs on hard puzzles like Sudoku & ARC-AGI. This "Less is More" approach redefines efficiency in AI, using less than 0.01% of competitors' parameters!
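A minimal sketch of the "Less is More" recipe as I read it: one small network with shared weights applied repeatedly, so effective depth comes from iteration rather than parameters. The widths, step count, and update rule below are illustrative assumptions, not TRM's actual design.

```python
# Minimal sketch of the "tiny recursive" idea: one small network applied repeatedly
# to refine its own answer, so depth comes from iteration, not from parameters.
# Widths, step count, and update rule are illustrative assumptions, not TRM's design.
import torch
import torch.nn as nn

d = 128
step = nn.Sequential(nn.Linear(3 * d, d), nn.ReLU(), nn.Linear(d, d))  # shared weights across steps

def solve(x: torch.Tensor, n_steps: int = 16) -> torch.Tensor:
    z = torch.zeros_like(x)   # latent scratchpad
    y = torch.zeros_like(x)   # current answer estimate
    for _ in range(n_steps):
        z = step(torch.cat([x, y, z], dim=-1))        # refine the scratchpad from input + current answer
        y = y + step(torch.cat([x, y, z], dim=-1))    # refine the answer using the updated scratchpad
    return y

x = torch.randn(4, d)  # a batch of encoded puzzles (placeholder)
print(solve(x).shape, sum(p.numel() for p in step.parameters()))  # tiny parameter count
```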
Transfusion combines autoregressive with diffusion to train a single transformer, but what if we combine Flow with Flow? 🤔 🌊OneFlow🌊 the first non-autoregressive model to generate text and images concurrently using a single transformer—unifying Edit Flow (text) with Flow…
Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for…
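For intuition, here is a generic parameter-space exploration loop in the style of evolution strategies (explicitly not the framework announced above): perturb the weights, score each perturbation with the reward, and move the weights toward the better-scoring directions.

```python
# Generic parameter-space exploration (evolution-strategies style), not the authors' method:
# sample Gaussian perturbations of the weights, score each with the reward, and
# update the weights toward the higher-reward directions.
import numpy as np

rng = np.random.default_rng(0)

def reward(theta: np.ndarray) -> float:
    # Placeholder objective standing in for an LLM fine-tuning reward.
    return -np.sum((theta - 3.0) ** 2)

theta = np.zeros(10)
sigma, lr, pop = 0.1, 0.02, 64
for _ in range(200):
    eps = rng.standard_normal((pop, theta.size))           # exploration happens here, in weight space
    scores = np.array([reward(theta + sigma * e) for e in eps])
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)
    theta += lr / (pop * sigma) * eps.T @ scores            # ES-style gradient estimate
print(theta.round(2))
```

The contrast with PPO/GRPO is that randomness enters through the weight perturbations rather than through sampled actions, so no log-probabilities or backprop through the policy are needed.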
🤔Can we train RL on LLMs with extremely stale data? 🚀Our latest study says YES! Stale data can be as informative as on-policy data, unlocking more scalable, efficient asynchronous RL for LLMs. We introduce M2PO, an off-policy RL algorithm that keeps training stable and…
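A tiny, generic illustration of the underlying issue with stale rollouts (not the M2PO algorithm itself): reweight each sample by the ratio between the current policy and the policy that collected it, and bound that ratio so very off-policy samples cannot dominate.

```python
# Generic illustration of reusing stale rollouts via importance weighting
# (standard clipped-IS idea, not the M2PO algorithm from the thread).
import torch

logp_new = torch.randn(8, requires_grad=True)        # log-probs of sampled actions under the current policy
logp_old = logp_new.detach() - 0.3 * torch.rand(8)   # log-probs recorded when the (now stale) data was collected
advantage = torch.randn(8)

ratio = torch.exp(logp_new - logp_old)   # corrects for the policy mismatch
ratio = torch.clamp(ratio, 0.8, 1.2)     # keeps very off-policy samples from dominating
loss = -(ratio * advantage).mean()
loss.backward()
print(loss.item())
```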
Introducing Pretraining with Hierarchical Memories: Separating Knowledge & Reasoning for On-Device LLM Deployment 💡We propose dividing LLM parameters into 1) anchor (always used, capturing commonsense) and 2) memory bank (selected per query, capturing world knowledge). [1/X]🧵
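A minimal sketch of how such a split might look, assuming a small always-resident "anchor" network plus a large bank of parameter rows fetched per query; the sizes and the routing rule are illustrative, not the paper's.

```python
# Minimal sketch of the anchor + memory-bank split described above: a small "anchor"
# model that always runs, plus a large bank of parameter blocks from which only a few
# are fetched per query. Sizes and routing rule are illustrative assumptions.
import torch
import torch.nn as nn

d, n_memories, top_k = 256, 1024, 4

anchor = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))  # always-resident parameters
memory_bank = nn.Embedding(n_memories, d)                            # parameters selected per query
router = nn.Linear(d, n_memories)                                    # scores which memories a query needs

def forward(query: torch.Tensor) -> torch.Tensor:
    h = anchor(query)                             # commonsense / reasoning path
    idx = router(h).topk(top_k, dim=-1).indices   # pick a handful of memories for this query
    mem = memory_bank(idx).mean(dim=-2)           # only these rows would need to be on-device
    return h + mem                                # inject the retrieved knowledge

print(forward(torch.randn(2, d)).shape)  # torch.Size([2, 256])
```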