Jiashuo Liu
@liujiashuo77
Research Scientist at ByteDance Seed | Advanced & Interesting LLM/Agent Evaluation. Opinions are my own.
We built FutureX, the world’s first live benchmark for real future prediction — politics, economy, culture, sports, etc. Among 23 AI agents, #Grok4 ranked #1 🏆 Elon didn’t lie. @elonmusk your model sees further 🚀🍀 LeaderBoard: futurex-ai.github.io
Fantastic work done by @zhiyuan_nlper on RL, with insightful analysis and simple but effective solutions!
🚀 Thrilled to share our new work, "RLoop: A Self-Improving Framework for Reinforcement Learning"! arxiv.org/pdf/2511.04285
It's cool to see our FinSearchComp used!
We’re open-sourcing MiniMax M2 — Agent & Code Native, at 8% Claude Sonnet price, ~2x faster ⚡ Global FREE for a limited time via MiniMax Agent & API - Advanced Coding Capability: Engineered for end-to-end developer workflows. Strong capability on a wide-range of applications…
Glad to see this is open-sourced! This agent really outperforms on our FutureX live bench. It's super cool and needs more attention!
I have published my #1 Winning FutureX research agentic framework with full competition pipeline that resulted in a 23% performance boost over GPT-5 Pro and an incredible 47% performance boost over Grok-4 Search in FutureX competition: - Full Agent Python code for rapidly…
Agreed...
Anthropic CEO Dario Amodei on Open-Source AI Models. "I don't think open source works the same way in AI that it has worked in other areas. Primarily because with open source you can see the source code of the model. Here we can't see inside the model, it's often called open…
Grok has been underestimated for a long time, but clearly, it's among the top-tier models now. We've investigated and analyzed Grok 4's search and reasoning patterns, which are really impressive! Looking forward to Grok 5 now :)
Introducing Grok 4 Fast, a multimodal reasoning model with a 2M context window that sets a new standard for cost-efficient intelligence. Available for free on grok.com, grok.x.com, iOS and Android apps, and OpenRouter. x.ai/news/grok-4-fa…
Week 2 Update of FutureX: - In our latest weekly leaderboard, surprisingly, MiroMind's open-source deep research agent plus GPT-5 got 1st place! Congrats @miromind_ai - ChatGPT-Agent slightly outperformed Grok4. Competition is tough! - Claude Opus 4.1 (submitted by independent…
Check this ⬇️ super cool pipeline!
FutureX Results I am now officially an AI researcher. Something interesting in the results: I beat on level 1 and 2 but lost on level 3 and 4. In this first week, I fed everything into a single context which likely reduced the amount of searches per query required to best…