You might like
Can AI solve math research problems that have eluded human mathematicians? Our new benchmark, FrontierMath: Open Problems, is designed to help find out. AI hasn’t solved any of these yet, but the game is young!
AI data center buildouts already rival the Manhattan Project in scale, but there’s little public info about them. So we spent the last few months reading legal permits, staring at satellite images, and scouring news sources. Here’s what you need to know. 🧵
GPT-5 on Sudoku-Bench 🧩 Since releasing Sudoku-Bench in May 2025, when no LLM could solve a classic 9x9 puzzle, we've been evaluating the latest generation of models. GPT-5 now leads our leaderboard with 33% puzzles solved--approximately 2x the previous leader--and is the first
A new eval, Remote Labor Index, measures AI's ability to automate real-world, economically valuable projects from remote work platforms. Currently entirely unsaturated (max score of 2.5%) A great collaboration between @scale_AI and @cais!
Can AI automate jobs? We created the Remote Labor Index to test AI’s ability to automate hundreds of long, real-world, economically valuable projects from remote work platforms. While AIs are smart, they are not yet that useful: the current automation rate is less than 3%.
This is coming from the guy who made jQuery. If I got a compliment like this I would either faint or immediately retire. Highest honor in JS imo
I've been using @tan_stack Start for a new project and it's super good. The server functions completely replace the need for tRPC/GraphQL/REST, the middleware is composable and fully typed, and having TSRouter's nice typing and stateful search params is icing on the cake. A+!
Andrej Karpathy calls AI Agents slop "Overall, the models they are not there. And I feel like the industry [...] it's making too big of a jump and it's trying to pretend that this is amazing. And it's not—it's slop! And I think they are not coming to terms with it. And maybe
So, reminder: the quality of code output by these systems is *very low* and the AIs themselves don't understand the output. This is obvious to anyone who knows how to program. There are still use cases, for example, to output a large volume of low-quality code that is not
Trends in United States
- 1. Alysa Liu
- 2. Megan Keller
- 3. Canada
- 4. Bears
- 5. #USAHockey
- 6. Kaori
- 7. Hilary Knight
- 8. #Olympics2026
- 9. Gold
- 10. Sony
- 11. Bluepoint
- 12. Punch
- 13. USA USA USA
- 14. Amber
- 15. #bucciovertimechallenge
- 16. Andrew
- 17. Hammond
- 18. Laila Edwards
- 19. Board of Peace
- 20. Toy Story 5