
Zidi Xiong

@polaris_7369

PhD student @Harvard | Undergrad @IllinoisCS

Zidi Xiong reposted

Great project from the AI safety class! See all projects and notes on lesswrong.com/w/cs-2881r

What do agents do when the only path to a goal requires harmful action? Do they choose harm or accept failure? We explore these questions with experiments in an agentic coding environment. Code: github.com/ItamarRocha/sc… Blog Post: lesswrong.com/posts/AJANBeJb… 1/n



Zidi Xiong reposted

Thread about the mini-project we did for @boazbaraktcs AI Safety class. The results are interesting, and I had a laugh seeing the unhinged things these frontier models can come up with under such a simple task. 🤣

What do agents do when the only path to a goal requires harmful action? Do they choose harm or accept failure? We explore these questions with experiments in an agentic coding environment. Code: github.com/ItamarRocha/sc… Blog Post: lesswrong.com/posts/AJANBeJb… 1/n



Zidi Xiong reposted

1/ Multilinguality & RL folks: Previously, we found LMs often fail to produce reasoning traces in the user's language; prompting/SFT helps, but hurts accuracy. (To be presented Fri Nov 7, 12:30-13:30, #EMNLP2025) ⚠️ More importantly, we already tested an RL fix! Thread below.


2/ Many previous works, including ours, showed that prompting does not work... We present some “budget alignment” recipes: • SFT: 817 multilingual chains to teach in-language reasoning • GRPO (math500-only RL): recover/boost accuracy while keeping the language policy…
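For intuition, here is a minimal sketch of what a reward along these lines could look like for GRPO-style RL. The weighting scheme, the `grpo_reward` function, and the toy language identifier are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch of a composite GRPO-style reward: score both answer correctness and
# whether the reasoning trace stays in the user's language. All names and
# weights here are illustrative assumptions.
from typing import Callable

def grpo_reward(
    trace: str, answer: str, gold: str, user_lang: str,
    lang_id: Callable[[str], str],  # any language-ID model, e.g. a fastText LID
    lang_weight: float = 0.5,
) -> float:
    correct = float(answer.strip() == gold.strip())
    in_lang = float(lang_id(trace) == user_lang)
    # A weighted sum lets RL trade accuracy against language compliance,
    # instead of hard-forcing the trace language as prompting/SFT does.
    return (1.0 - lang_weight) * correct + lang_weight * in_lang

# Usage with a toy language identifier (hypothetical):
toy_lid = lambda text: "de" if any(c in text for c in "äöüß") else "en"
print(grpo_reward("Zuerst prüfen wir die Gleichung...", "42", "42", "de", toy_lid))  # 1.0
```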



Zidi Xiong reposted

Reasoning models do not think in the user's query language; our work will be presented by @Jirui_Qi at #EMNLP2025! Now we dive a bit more into a potential solution. We set a goal: to make models reason in the user’s language without losing accuracy. huggingface.co/blog/shanchen/…


Check out our new results on multilingual reasoning!

This post is unavailable.

Zidi Xiong reposted

We’re super excited to introduce DIRT: The Distributed Intelligent Replicator Toolkit: github.com/aaronwalsman/d… DIRT is a GPU-accelerated multi-agent simulation platform enabling artificial life research in dynamic, open-ended environments at unprecedented scales. 1/n


Zidi Xiong reposted

🧠 How faithfully does AI think? Join @ceciletamura of @ploutosai & @polaris_7369 @Harvard (author) as they explore Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models. 📅 Oct 15 · 4 PM PDT 🎥 world.ploutos.dev/stream/elated-…

Zidi Xiong reposted

Our paper on multilingual LRMs was featured in the State of AI Report 2025! The report notes: forcing reasoning in the user's language boosts match to ~98% but drops accuracy by 9-13%, the core trade-off we studied. 📄arxiv.org/abs/2505.22888 Thanks @nathanbenaich @stateofaireport

🪩The one and only @stateofaireport 2025 is live! 🪩 It’s been a monumental 12 months for AI. Our 8th annual report is the most comprehensive it's ever been, covering what you *need* to know about research, industry, politics, safety and our new usage data. My highlight reel:



Just accepted to #NeurIPS2025 🎉

(1/n) Large Reasoning Models (LRMs) enhance complex problem-solving by generating multi-path "Thinking Drafts." But how reliable are these drafts? Can we trust the intermediate reasoning steps and final conclusions, and effectively monitor or control them? 🤔


Zidi Xiong reposted

Are there conceptual directions in VLMs that transcend modality? Check out our COLM spotlight🔦 paper! We analyze how linear concepts interact with multimodality in VLM embeddings using SAEs with @Huangyu58589918, @napoolar, @ShamKakade6 and Stephanie Gil arxiv.org/abs/2504.11695

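As a rough illustration of the SAE machinery involved, here is a minimal sparse autoencoder over embedding vectors; the dimensions, L1 penalty, and training details are assumptions for illustration, not the paper's setup.

```python
# Minimal sparse-autoencoder sketch for probing embedding spaces.
# All hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 768, d_dict: int = 8192, l1: float = 1e-3):
        super().__init__()
        self.enc = nn.Linear(d_model, d_dict)
        self.dec = nn.Linear(d_dict, d_model)
        self.l1 = l1

    def forward(self, x: torch.Tensor):
        z = torch.relu(self.enc(x))   # sparse codes: candidate concept directions
        x_hat = self.dec(z)
        loss = ((x_hat - x) ** 2).mean() + self.l1 * z.abs().mean()
        return x_hat, z, loss

# Fit on pooled image and text embeddings from a VLM; features whose codes
# fire for both modalities are candidates for directions that transcend modality.
sae = SparseAutoencoder()
x = torch.randn(32, 768)              # stand-in for CLIP-style embeddings
_, z, loss = sae(x)
loss.backward()
```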

Zidi Xiong reposted

Our paper on multilingual reasoning is accepted to Findings of #EMNLP2025! 🎉 (OA: 3/3/3.5/4) We show SOTA LMs struggle with reasoning in non-English languages; prompt-hack & post-training improve alignment but trade off accuracy. 📄 arxiv.org/abs/2505.22888 See you in Suzhou!

[1/]💡New Paper Large reasoning models (LRMs) are strong in English — but how well do they reason in your language? Our latest work uncovers their limitation and a clear trade-off: Controlling Thinking Trace Language Comes at the Cost of Accuracy 📄Link: arxiv.org/abs/2505.22888



Zidi Xiong reposted

What precision should we use to train large AI models effectively? Our latest research probes the subtle nature of training instabilities under low precision formats like MXFP8 and ways to mitigate them. Thread 🧵👇

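For intuition on the format, here is a hedged sketch of MX-style block quantization, where a block of 32 values shares one power-of-two scale and values are rounded to an FP8-like grid. The rounding is simplified and every name here is an assumption, not the paper's code.

```python
# Toy MX-style FP8 quantizer: blocks of 32 values share a power-of-two scale
# (per the OCP MX spec); e4m3 has 3 mantissa bits and max normal ~448.
# Subnormals and exact e4m3 edge cases are ignored in this sketch.
import numpy as np

def round_e4m3(v: np.ndarray) -> np.ndarray:
    """Approximate rounding to 3 mantissa bits."""
    mag = np.maximum(np.abs(v), 1e-30)
    ulp = 2.0 ** (np.floor(np.log2(mag)) - 3)   # grid spacing at this exponent
    return np.round(v / ulp) * ulp

def mxfp8_quantize(x: np.ndarray, block: int = 32, fp8_max: float = 448.0):
    xb = x.reshape(-1, block)
    amax = np.abs(xb).max(axis=1, keepdims=True)
    # Shared per-block power-of-two scale so the largest value fits the grid.
    scale = 2.0 ** np.ceil(np.log2(np.maximum(amax, 1e-30) / fp8_max))
    q = round_e4m3(np.clip(xb / scale, -fp8_max, fp8_max))
    return q, scale   # dequantize as q * scale

x = np.random.randn(1024).astype(np.float64)
q, s = mxfp8_quantize(x)
print("max abs error:", np.abs(q * s - x.reshape(-1, 32)).max())
# Outliers force large block scales and crush small values toward zero: one
# source of the instabilities that low-precision training has to contend with.
```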

Zidi Xiong reposted

‼️ 1/n Asking your reasoning model to think in a lower-resource language does degrade its performance at the moment. My awesome co-author already communicated the main points in the thread; I will just share some random things we learned in my 🧵

[1/]💡New Paper Large reasoning models (LRMs) are strong in English — but how well do they reason in your language? Our latest work uncovers their limitation and a clear trade-off: Controlling Thinking Trace Language Comes at the Cost of Accuracy 📄Link: arxiv.org/abs/2505.22888



Check out our new work on multilingual reasoning in LRMs!

[1/]💡New Paper Large reasoning models (LRMs) are strong in English — but how well do they reason in your language? Our latest work uncovers their limitation and a clear trade-off: Controlling Thinking Trace Language Comes at the Cost of Accuracy 📄Link: arxiv.org/abs/2505.22888


