
rasdani

@rasdani_

Founding Engineer @ellamindAI open-source LLMs @DiscoResearchAI

🤣

new vlm benchmark just popped.

[Dorialexander's tweet image]


rasdani reposted

I don't know what labs are doing to these poor LLMs during RL but they are mortally terrified of exceptions, in any infinitesimally likely case. Exceptions are a normal part of life and healthy dev process. Sign my LLM welfare petition for improved rewards in cases of exceptions.


rasdani reposted

an ai product????

no no no no product

why would release a product?

if you show a product people will ask about benchmarks and it will never be enough

frontier labs that were the 100X becomes the 2X saas dog

but if you have no product you can say you building…

[yacinelearning's tweet image]

SSI strategy of not releasing a product is probably a good one. The minute one releases a product, one will be dragged into such fierce competition with OAI, gemini, ... that the original goal will be forgotten. Maybe it would have been wiser for Anthropic to never release Claude…



infinite money glitch

Matt Levine on the AMD x OpenAI deal

[buccocapital's tweet image]


rasdani reposted

Finally had a chance to listen through this pod with Sutton, which was interesting and amusing. As background, Sutton's "The Bitter Lesson" has become a bit of biblical text in frontier LLM circles. Researchers routinely talk about and ask whether this or that approach or idea…

.@RichardSSutton, father of reinforcement learning, doesn’t think LLMs are bitter-lesson-pilled. My steel man of Richard’s position: we need some new architecture to enable continual (on-the-job) learning. And if we have continual learning, we don't need a special training…



rasdani reposted

Unitree G1 has mastered more quirky skills 🤩 Unitree G1 has learned the "Anti-Gravity" mode: stability is greatly improved under any action sequence, and even if it falls, it can quickly get back up.


rasdani reposted

Impressive, highly agile and robust. 20 kg payload per hand.

From WUJI TECH

rasdani reposted

one of the things that have held true for my entire life is any technical problem is just a matter of time and effort. if you just don't stop, you eventually crack it


rasdani reposted

Dynamic control trained at SUSTech’s ACT Lab in Shenzhen.


rasdani reposted

This is incredible


rasdani reposted

Holy shit they’re doing on-policy RL by just deploying the model to prod lmao that’s so baller.

also 2 hrs for a training step makes our 10 minute steps feel lightning fast @hamishivi … they probably have a bigger batch size though 😅

[saurabh_shah2's tweet image]

We've trained a new Tab model that is now the default in Cursor. This model makes 21% fewer suggestions than the previous model while having a 28% higher accept rate for the suggestions it makes. Learn more about how we improved Tab with online RL.



rasdani reposted

All of these behaviors can be explained as subtle artifacts of imperfect rewards during RL training 🔎 Inline imports: likely a scaffold thing (files are read in chunks so edits are done where the model has read the file) but probably also a form of turn-reduction. If you can…

‼️PSA on common modes of bad code that codex / claude code produce that I've come across. Keep an eye out for these patterns to avoid getting shamed in code review.
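As a concrete illustration of the inline-import point above, a minimal hypothetical sketch (the function and file names are made up, not taken from the original thread):

```python
# Hypothetical illustration of the "inline imports" pattern: the agent edits
# one function and drops the import right where it happened to be reading the
# file, instead of hoisting it to the module-level import block.
def parse_report(path):
    import json  # inline import left behind by a chunked edit
    with open(path) as f:
        return json.load(f)

# The conventional fix is a single module-level `import json` at the top of
# the file, shared by every function that needs it.
```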



rasdani reposted

Have you also come across these? Are there any other recurring failure modes you've seen?


rasdani reposted

4. Comments on moved/deleted code: when code is removed or moved, you will often see leftover comments. Useless slop that bloats your codebase and can only stand to confuse people. Imagine you move this code a second time, now the pointer is not only useless but also wrong!

[bjoern_pl's tweet image]
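A hypothetical sketch of the stale-pointer comment described in point 4, assuming a simple Python module whose validation code was moved elsewhere:

```python
# Hypothetical module after a refactor.

# Validation helpers are defined below.  <- stale pointer: the helpers were
# moved to orders/validation.py, but the comment survived the move and now
# describes code that is no longer here.

def format_order_id(order_id: int) -> str:
    """Unrelated helper that now sits under a comment about departed code."""
    return f"ORD-{order_id:08d}"
```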

rasdani reposted

3. Backwards compatibility: especially codex tends to want to keep things "backwards compatible", which standalone is a good thing but often leads to leftover/unused code and a higher maintenance burden.

[bjoern_pl's tweet image]
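A hypothetical sketch of point 3, assuming a function was renamed and the agent insists on keeping the old name "for backwards compatibility":

```python
def fetch_user_profile(user_id: int) -> dict:
    """Renamed implementation (hypothetical)."""
    return {"id": user_id}

# "Backwards compatibility" alias kept around even though nothing in the
# codebase calls the old name anymore; it is dead weight that still has to
# be read, tested, and maintained.
def get_user(user_id: int) -> dict:  # deprecated: use fetch_user_profile
    return fetch_user_profile(user_id)
```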

rasdani reposted

2. Unnecessary Fallbacks: likely as an artifact of RL training with tests as rewards, models (esp. gpt-5) tend to go for safety fallbacks that are often not needed and not properly logged. Sometimes these can be helpful, but they are prone to introducing unwanted behavior.

[bjoern_pl's tweet image]
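A hypothetical sketch of point 2: a silent catch-all fallback that keeps tests green while hiding the real failure:

```python
import json

def load_config(path: str) -> dict:
    # Unnecessary fallback: every failure (missing file, malformed JSON,
    # wrong permissions) is swallowed and replaced by a silent default,
    # with nothing logged, so a broken config never surfaces.
    try:
        with open(path) as f:
            return json.load(f)
    except Exception:
        return {}

# A stricter version lets the exception propagate (or at least logs it),
# so the caller learns the config was never actually read.
```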
