
rasdani
@rasdani_
Founding Engineer @ellamindAI open-source LLMs @DiscoResearchAI
🤣
I don't know what labs are doing to these poor LLMs during RL but they are mortally terrified of exceptions, even in infinitesimally likely cases. Exceptions are a normal part of life and a healthy dev process. Sign my LLM welfare petition for improved rewards in cases of exceptions.
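For illustration, a hypothetical caricature of the pattern being joked about (not real model output, and no specific model's style): every trivially safe operation gets its own try/except with a silent fallback.

```python
import json


# Caricature of the "exception-terrified" style: nested try/except
# blocks that swallow every error and return an empty default.
def read_config(path: str) -> dict:
    try:
        with open(path) as f:
            try:
                return json.load(f)
            except Exception:
                return {}  # silently swallow parse errors
    except Exception:
        return {}  # silently swallow IO errors


# The version a human would usually write: let exceptions propagate,
# so a missing or malformed config fails loudly during development.
def read_config_plain(path: str) -> dict:
    with open(path) as f:
        return json.load(f)
```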
an ai product???? no no no no product. why would you release a product? if you show a product people will ask about benchmarks and it will never be enough. frontier labs that were the 100X become the 2X saas dog. but if you have no product you can say you're building…

SSI's strategy of not releasing a product is probably a good one. The minute one releases a product, one gets dragged into such fierce competition with OAI, gemini, ... that the original goal will be forgotten. Maybe it would have been wiser for Anthropic to never release Claude…
infinite money glitch
Matt Levine on the AMD x OpenAI deal

Finally had a chance to listen through this pod with Sutton, which was interesting and amusing. As background, Sutton's "The Bitter Lesson" has become a bit of a biblical text in frontier LLM circles. Researchers routinely talk about and ask whether this or that approach or idea…
.@RichardSSutton, father of reinforcement learning, doesn’t think LLMs are bitter-lesson-pilled. My steel man of Richard’s position: we need some new architecture to enable continual (on-the-job) learning. And if we have continual learning, we don't need a special training…
Unitree G1 has mastered more quirky skills 🤩 Unitree G1 has learned the "Anti-Gravity" mode: stability is greatly improved under any action sequence, and even if it falls, it can quickly get back up.
Impressive, highly agile and robust. 20 kg payload per hand.
one of the things that have held true for my entire life is that any technical problem is just a matter of time and effort. if you just don't stop, you eventually crack it
Dynamic control trained at SUSTech’s ACT Lab in Shenzhen.
This is incredible
Holy shit they’re doing on-policy RL by just deploying the model to prod lmao that’s so baller. also 2 hrs for a training step makes our 10 minute steps feel lightning fast @hamishivi … they probably have a bigger batch size though 😅

We've trained a new Tab model that is now the default in Cursor. This model makes 21% fewer suggestions than the previous model while having a 28% higher accept rate for the suggestions it makes. Learn more about how we improved Tab with online RL.
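Neither tweet spells out the training setup, so here is a minimal sketch, assuming a bandit-style formulation where each shown suggestion earns a binary accept/reject reward from the user. `TinyPolicy`, `reinforce_step`, and everything else below are illustrative assumptions, not Cursor's implementation.

```python
import torch
import torch.nn as nn


class TinyPolicy(nn.Module):
    """Toy stand-in for a suggestion model: scores a fixed vocabulary."""

    def __init__(self, vocab_size: int = 100, dim: int = 16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)

    def log_prob(self, input_ids: torch.Tensor, suggestion_ids: torch.Tensor) -> torch.Tensor:
        # log-probability of the suggested tokens given a mean-pooled context
        h = self.embed(input_ids).mean(dim=0)
        logp = torch.log_softmax(self.head(h), dim=-1)
        return logp[suggestion_ids].sum()


def reinforce_step(policy, optimizer, logged_batch, baseline: float = 0.5):
    """One REINFORCE step over (context, suggestion, accepted) tuples
    logged from production while this same policy was deployed, i.e.
    on-policy data, so no importance weighting is needed."""
    optimizer.zero_grad()
    loss = torch.zeros(())
    for input_ids, suggestion_ids, accepted in logged_batch:
        reward = 1.0 if accepted else 0.0
        # subtract a baseline (e.g. historical accept rate) to reduce variance
        loss = loss - (reward - baseline) * policy.log_prob(input_ids, suggestion_ids)
    (loss / len(logged_batch)).backward()
    optimizer.step()


# Usage: one update from a single logged accept event.
policy = TinyPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
batch = [(torch.tensor([1, 2, 3]), torch.tensor([4, 5]), True)]
reinforce_step(policy, opt, batch)
```

As a sanity check on the headline numbers: 0.79 × 1.28 ≈ 1.01, so the new model yields roughly the same absolute number of accepted suggestions while showing noticeably fewer rejected ones.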
All of these behaviors can be explained as subtle artifacts of imperfect rewards during RL training 🔎 Inline imports: likely a scaffold thing (files are read in chunks so edits are done where the model has read the file) but probably also a form of turn-reduction. If you can…
‼️PSA on common modes of bad code that codex / claude code produce that I've come across. Keep an eye out for these patterns to avoid getting shamed in code review.
Have you also come across these? Are there any other recurring failure modes you've seen?
2. Unnecessary fallbacks: likely an artifact of RL training with tests as rewards; models (esp. gpt-5) tend to add safety fallbacks that are often not needed and not properly logged. Sometimes these can be helpful, but they are prone to introducing unwanted behavior.

3. Backwards compatibility: codex especially tends to keep things "backwards compatible", which is a good thing in isolation but often leads to leftover/unused code and a higher maintenance burden.

4. Comments on moved/deleted code: when code is removed or moved, you will often see leftover comments. Useless slop that bloats your codebase and can only confuse people. Imagine you move this code a second time; now the pointer is not only useless but also wrong! (See the condensed sketch after this list.)
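A condensed, hypothetical example (invented for illustration, not real model output) packing these patterns, plus the inline-import habit mentioned above, into one function: a stale comment, an unused backwards-compat flag, an inline import, and an unnecessary silent fallback.

```python
# Fetches the user record from the cache.   <-- stale comment: the code
# it described was moved elsewhere; move this a second time and the
# pointer becomes actively wrong.
def get_user(user_id, legacy_format=False):
    # `legacy_format` kept "for backwards compatibility" with call
    # sites that no longer exist: unused code, extra maintenance.
    import sqlite3  # inline import mid-file instead of at the top

    try:
        conn = sqlite3.connect("users.db")
        row = conn.execute(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return {"id": row[0], "name": row[1]}
    except Exception:
        # unnecessary silent fallback: hides real failures (missing
        # table, bad schema) instead of letting them surface in review
        return {"id": user_id, "name": "unknown"}
```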
