
ASC

@ascii0216

ASC reposted

a controversial opinion i hold deeply is that AI is not superhuman at writing (and isn't close)

there are 10x and 100x human writers. here's a random excerpt from David Foster Wallace, widely agreed to be one of the greatest modern writers

if you sincerely think anything…


ASC reposted

Everyone vote for an o3-mini-type model to be open-sourced please 🥺🥺🥺 We can distill or quantize a phone-sized model, dw, the open-source community will work its magic!!

for our next open source project, would it be more useful to do an o3-mini level model that is pretty small but still needs to run on GPUs, or the best phone-sized model we can do?

o3-mini: 53.9%
phone-sized model: 46.1%

128,108 votes · Final results
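To illustrate the "quantize it" half of that plan, here is a minimal 4-bit quantization sketch assuming the Hugging Face transformers + bitsandbytes stack; the checkpoint name is a hypothetical placeholder, not any actual release.

```python
# Minimal 4-bit quantization sketch (transformers + bitsandbytes).
# "open-org/o3-mini-level-model" is a hypothetical placeholder checkpoint name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bf16
)

tokenizer = AutoTokenizer.from_pretrained("open-org/o3-mini-level-model")
model = AutoModelForCausalLM.from_pretrained(
    "open-org/o3-mini-level-model",
    quantization_config=bnb_config,
    device_map="auto",                      # spread layers across available GPUs
)
```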



ASC reposted

I don’t think I’m better than other people, but I definitely think my *taste* is just objectively superior. I have so much confidence in my aesthetic opinions it’s frankly delusional

I think anyone who has ever made anything great needs to have a smidgen of subclinical narcissism, such that when asked “do you think you’re better than everyone else”, the answer is “yes, I do and I am.”



ASC reposted

xAI Deep Research for summarizing all the bookmarks you never got around to reading. It generates a comprehensive report that you can bookmark for later


ASC reposted

I think over the next decade we’ll see the idea of different body parts having different specialties (e.g. gut, heart, head) experiencing a dewooification. Today they’re chakras and energy centers and stuff, and I think we’ll start talking about them more as computational ASICs


ASC reposted

I just keep on coming back to Claude


ASC reposted

no thank you but we will buy twitter for $9.74 billion if you want


ASC reposted

To an LLM, a novel discovery is indistinguishable from an error.

This question is even more puzzling and salient given the existence of Deep Research



ASC reposted

Transformers can overcome easy-to-hard and length generalization challenges through recursive self-improvement. Paper on arxiv coming on Monday. Link to a talk I gave on this below 👇 Super excited about this work!


ASC reposted

Inspired by @karpathy and the idea of using games to compare LLMs, I've built a version of the game Codenames where different models are paired in teams to play against each other. Fun to see o3-mini team up with R1 against Grok and Gemini! Link and repo below.

I quite like the idea of using games to evaluate LLMs against each other, instead of fixed evals. Playing against another intelligent entity self-balances and adapts difficulty, so each eval (/environment) is leveraged a lot more. There are some early attempts around. Exciting area.
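A minimal sketch of that games-as-evals idea (not the linked repo's implementation): have models play each other head-to-head and keep a rating, so difficulty self-balances as opponents get stronger. Here `play_match` is a hypothetical stand-in for a real game loop, e.g. a Codenames round with one model giving clues and another guessing.

```python
# Sketch: rate LLMs by head-to-head game play instead of a fixed benchmark.
import random
from itertools import combinations

def play_match(model_a: str, model_b: str) -> float:
    """Return model_a's score for one game: 1.0 win, 0.5 draw, 0.0 loss."""
    # Placeholder result; replace with an actual game loop calling the model APIs.
    return random.choice([0.0, 0.5, 1.0])

def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Standard Elo update for a single pairwise result."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    return r_a + k * (score_a - expected_a), r_b - k * (score_a - expected_a)

ratings = {m: 1000.0 for m in ["o3-mini", "R1", "Grok", "Gemini"]}
for _round in range(10):                       # several round-robin rounds
    for a, b in combinations(list(ratings), 2):
        ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], play_match(a, b))

print(sorted(ratings.items(), key=lambda kv: -kv[1]))
```

The Elo update pays out more for beating a stronger opponent, which is what makes pairwise play self-balancing compared to a fixed benchmark.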



ASC reposted

Finished a run of (R1-style) GRPO on Qwen-2.5-0.5B (base model); it yields +10 accuracy points on GSM8K. Literally just works. The base model scores 41.6% as reported in the Qwen paper vs ~51% with GRPO
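For context, a minimal sketch of what such a run could look like, assuming the TRL library's GRPOTrainer; the exact-match reward and hyperparameters below are illustrative, not the settings behind the reported numbers.

```python
# Hypothetical R1-style GRPO run on a small base model with TRL's GRPOTrainer.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.rename_column("question", "prompt")  # GRPOTrainer expects a "prompt" column

def correctness_reward(completions, answer, **kwargs):
    # +1 if the completion contains the gold final answer (the text after "####"), else 0.
    golds = [a.split("####")[-1].strip() for a in answer]
    return [1.0 if gold in completion else 0.0
            for completion, gold in zip(completions, golds)]

training_args = GRPOConfig(
    output_dir="qwen2.5-0.5b-gsm8k-grpo",
    num_generations=8,            # group size for the relative advantage estimate
    max_completion_length=512,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B",    # base (non-instruct) model, as in the tweet
    reward_funcs=correctness_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

With `num_generations=8`, the trainer samples a group of completions per prompt and scores each against the group's mean reward, which is the group-relative advantage trick that GRPO is named for.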

