Jiashuo Liu

@liujiashuo77

Research Scientist at ByteDance Seed | Advanced & Interesting LLM/Agent Evaluation. Opinions are my own.

Academic

Beijing

ljsthu.github.io

Joined August 2021

385 Posts 2K Followers 680 Following

Pinned

Jiashuo Liu

@liujiashuo77

Aug 20

We built FutureX, the world’s first live benchmark for real future prediction — politics, economy, culture, sports, etc. Among 23 AI agents, #Grok4 ranked #1 🏆 Elon didn’t lie. @elonmusk your model sees further 🚀🍀 LeaderBoard: futurex-ai.github.io

Jiashuo Liu

@liujiashuo77

Dec 3

Again, @grok stands out on CryptoBench, our recent expert-level live bench on cryptocurrency.

Jiacheng Guo

@JiachengGu50887

Dec 2

We built CryptoBench, the world's first expert-level dynamic benchmark for evaluating LLM Agents in cryptocurrency—from real-time retrieval to future predictions, spanning on-chain intel to DeFi risk forecasting. Among 10 top models, #Grok4thinking ranks #1 🏆 @elonmusk, your…

Jiashuo Liu

@liujiashuo77

Nov 8

Fantastic work done by @zhiyuan_nlper on RL, with insightful analysis and simple but effective solutions!

zeng zhiyuan

@zhiyuan_nlper

Nov 7

🚀 Thrilled to share our new work, "RLoop: A Self-Improving Framework for Reinforcement Learning"! arxiv.org/pdf/2511.04285

Jiashuo Liu

@liujiashuo77

Oct 27

It's cool to see our FinSearchComp used!

MiniMax (official)

@MiniMax__AI

Oct 27

We’re open-sourcing MiniMax M2 — Agent & Code Native, at 8% Claude Sonnet price, ~2x faster ⚡ Global FREE for a limited time via MiniMax Agent & API - Advanced Coding Capability: Engineered for end-to-end developer workflows. Strong capability on a wide-range of applications…

Jiashuo Liu

@liujiashuo77

Oct 17

Glad to see this is open-sourced! This agent really outperforms on our FutureX live bench. It's super cool and needs more attention!

BeyondBacktesting

@BBacktesting

Oct 17

I have published my #1 Winning FutureX research agentic framework with full competition pipeline that resulted in a 23% performance boost over GPT-5 Pro and an incredible 47% performance boost over Grok-4 Search in FutureX competition: - Full Agent Python code for rapidly…

Jiashuo Liu

@liujiashuo77

Sep 24

Agreed...

Rohan Paul

@rohanpaul_ai

Sep 23

Anthropic CEO Dario Amodei on Open-Source AI Models. "I don't think open source works the same way in AI that it has worked in other areas. Primarily because with open source you can see the source code of the model. Here we can't see inside the model, it's often called open…

Jiashuo Liu

@liujiashuo77

Sep 20

Grok has long been underestimated, but clearly, it's among the top-tier models now. We've investigated and analyzed Grok 4's search and reasoning patterns, which are really impressive! Looking forward to Grok 5 now :)

xAI

@xai

Sep 19

Introducing Grok 4 Fast, a multimodal reasoning model with a 2M context window that sets a new standard for cost-efficient intelligence. Available for free on grok.com, grok.x.com, iOS and Android apps, and OpenRouter. x.ai/news/grok-4-fa…

x.ai

Grok 4 Fast | xAI

Pushing the Frontier of Cost-Efficient Intelligence

Source: x.ai

Jiashuo Liu

@liujiashuo77

Sep 11

Week 2 Update of FutureX: - In our latest weekly leaderboard, surprisingly, MiroMind's open-source deep research agent plus GPT-5 got 1st place! Congrats @miromind_ai - ChatGPT-Agent slightly outperformed Grok4. Competition is tough! - Claude Opus 4.1 (submitted by independent…

Jiashuo Liu

@liujiashuo77

Sep 6

Check this ⬇️ super cool pipeline!

BeyondBacktesting

@BBacktesting

Sep 6

FutureX Results I am now officially an AI researcher. Something interesting in the results: I beat on level 1 and 2 but lost on level 3 and 4. In this first week, I fed everything into a single context which likely reduced the amount of searches per query required to best…