
rasdani

@rasdani_

Founding Engineer @ellamindAI open-source LLMs @DiscoResearchAI

🤣

new vlm benchmark just popped.

[Dorialexander's tweet image]


rasdani reposted

I don't know what labs are doing to these poor LLMs during RL but they are mortally terrified of exceptions, in any infinitesimally likely case. Exceptions are a normal part of life and healthy dev process. Sign my LLM welfare petition for improved rewards in cases of exceptions.


rasdani reposted

an ai product????

no no no no product

why would release a product?

if you show a product people will ask about benchmarks and it will never be enough

frontier labs that were the 100X becomes the 2X saas dog

but if you have no product you can say you building…

[yacinelearning's tweet image]

SSI strategy of not releasing a product is probably a good one. The minute one releases a product, one will be dragged into such fierce competition with OAI, gemini, ... that the original goal will be forgotten. Maybe it would have been wiser for Anthropic to never release Claude…



infinite money glitch

Matt Levine on the AMD x OpenAI deal

[buccocapital's tweet image]


rasdani reposted

Finally had a chance to listen through this pod with Sutton, which was interesting and amusing. As background, Sutton's "The Bitter Lesson" has become a bit of biblical text in frontier LLM circles. Researchers routinely talk about and ask whether this or that approach or idea…

.@RichardSSutton, father of reinforcement learning, doesn’t think LLMs are bitter-lesson-pilled. My steel man of Richard’s position: we need some new architecture to enable continual (on-the-job) learning. And if we have continual learning, we don't need a special training…



rasdani reposted

Unitree G1 has mastered more quirky skills 🤩 Unitree G1 has learned the "Anti-Gravity" mode: stability is greatly improved under any action sequence, and even if it falls, it can quickly get back up.


rasdani reposted

Impressive, highly agile and robust. 20 kg payload per hand.

From WUJI TECH

rasdani reposted

one of the things that have held true for my entire life is any technical problem is just a matter of time and effort. if you just don't stop, you eventually crack it


rasdani reposted

Dynamic control trained at SUSTech’s ACT Lab in Shenzhen.


rasdani reposted

This is incredible


rasdani reposted

Holy shit they’re doing on-policy RL by just deploying the model to prod lmao that’s so baller.

also 2 hrs for a training step makes our 10 minute steps feel lightning fast @hamishivi … they probably have a bigger batch size though 😅

[saurabh_shah2's tweet image]

We've trained a new Tab model that is now the default in Cursor. This model makes 21% fewer suggestions than the previous model while having a 28% higher accept rate for the suggestions it makes. Learn more about how we improved Tab with online RL.



rasdani reposted

All of these behaviors can be explained as subtle artifacts of imperfect rewards during RL training 🔎 Inline imports: likely a scaffold thing (files are read in chunks so edits are done where the model has read the file) but probably also a form of turn-reduction. If you can…

‼️PSA on common modes of bad code that codex / claude code produce that I've come across. Keep an eye out for these patterns to avoid getting shamed in code review.
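As a concrete illustration of the inline-import point above, a minimal hypothetical sketch (the function and file names are made up, not taken from the original thread):

```python
# Hypothetical illustration of the "inline imports" pattern: the agent edits
# one function and drops the import right where it happened to be reading the
# file, instead of hoisting it to the module-level import block.
def parse_report(path):
    import json  # inline import left behind by a chunked edit
    with open(path) as f:
        return json.load(f)

# The conventional fix is a single module-level `import json` at the top of
# the file, shared by every function that needs it.
```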



rasdani reposted

Have you also come across these? Are there any other recurring failure modes you've seen?


rasdani reposted

4. Comments on moved/deleted code: when code is removed or moved, you will often see leftover comments. Useless slop that bloats your codebase and can only stand to confuse people. Imagine you move this code a second time, now the pointer is not only useless but also wrong!

[bjoern_pl's tweet image]
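A hypothetical sketch of the stale-pointer comment described in point 4, assuming a simple Python module whose validation code was moved elsewhere:

```python
# Hypothetical module after a refactor.

# Validation helpers are defined below.  <- stale pointer: the helpers were
# moved to orders/validation.py, but the comment survived the move and now
# describes code that is no longer here.

def format_order_id(order_id: int) -> str:
    """Unrelated helper that now sits under a comment about departed code."""
    return f"ORD-{order_id:08d}"
```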

rasdani reposted

3. Backwards compatibility: especially codex tends to want to keep things "backwards compatible", which standalone is a good thing but often leads to leftover/unused code and a higher maintenance burden.

[bjoern_pl's tweet image]
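A hypothetical sketch of point 3, assuming a function was renamed and the agent insists on keeping the old name "for backwards compatibility":

```python
def fetch_user_profile(user_id: int) -> dict:
    """Renamed implementation (hypothetical)."""
    return {"id": user_id}

# "Backwards compatibility" alias kept around even though nothing in the
# codebase calls the old name anymore; it is dead weight that still has to
# be read, tested, and maintained.
def get_user(user_id: int) -> dict:  # deprecated: use fetch_user_profile
    return fetch_user_profile(user_id)
```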

rasdani reposted

2. Unnecessary Fallbacks: likely as an artifact of RL training with tests as rewards, models (esp. gpt-5) tend to go for safety fallbacks that are often not needed and not properly logged. Sometimes these can be helpful, but they are prone to introducing unwanted behavior.

[bjoern_pl's tweet image]
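A hypothetical sketch of point 2: a silent catch-all fallback that keeps tests green while hiding the real failure:

```python
import json

def load_config(path: str) -> dict:
    # Unnecessary fallback: every failure (missing file, malformed JSON,
    # wrong permissions) is swallowed and replaced by a silent default,
    # with nothing logged, so a broken config never surfaces.
    try:
        with open(path) as f:
            return json.load(f)
    except Exception:
        return {}

# A stricter version lets the exception propagate (or at least logs it),
# so the caller learns the config was never actually read.
```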
