
Alan Ritter

@alan_ritter

Computing professor at Georgia Tech - natural language processing, language models, machine learning

What if LLMs could predict their own accuracy on a new task before running a single experiment? We introduce PRECOG, built from real papers, to study description→performance forecasting. On both static and streaming tasks, GPT-5 beats human NLP researchers and simple baselines.

What if LLMs can forecast their own scores on unseen benchmarks from just a task description? We are the first to study text description→performance prediction, giving practitioners an early read on outcomes so they can plan what to build—before paying full price 💸
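For a concrete sense of the setup, here is a minimal sketch of how such forecasts could be scored against observed results; the metric choice and all numbers below are illustrative placeholders, not taken from the PRECOG paper.

```python
# Toy illustration of scoring description -> performance forecasts.
# Predicted and observed accuracies are made-up placeholders, not PRECOG results.

def mean_absolute_error(predicted, observed):
    """Average absolute gap between forecast and measured accuracy."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)

# One forecast per benchmark/task description.
predicted = [0.72, 0.41, 0.88]   # model's self-forecasts from the task descriptions
observed  = [0.68, 0.52, 0.85]   # accuracies measured after actually running the experiments

print(f"MAE of the forecasts: {mean_absolute_error(predicted, observed):.3f}")
```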


Had a great time visiting Sungkyunkwan University this week! Lots of interesting conversations, and very insightful questions from students. Thanks to @NoSyu for hosting!

We are honored to host @alan_ritter at SKKU for his talk: "Towards Cost-Efficient Use of Pre-trained Models". He explored cost-utility tradeoffs in LLM development, including fine-tuning vs. preference optimization, toward more efficient and scalable AI. Thanks a lot!


Alan Ritter reposted

For people attending @naaclmeeting, I created a quick script to generate ics files for all your presentations (or presentations of interest) that you can import into your Google Calendar or other calendar software: gist.github.com/neubig/b16376d…
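The gist link above is truncated, so as a rough sketch of the idea (not Neubig's actual script), writing one conference session out as an importable .ics file with only the Python standard library could look like this; the times, titles, and filename are hypothetical placeholders.

```python
# Minimal sketch: write one conference session as an importable .ics file.
from datetime import datetime, timezone

def make_event(uid, summary, location, start, end):
    """Build a single VEVENT block in iCalendar format."""
    fmt = "%Y%m%dT%H%M%SZ"
    stamp = datetime.now(timezone.utc).strftime(fmt)
    return "\r\n".join([
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{stamp}",
        f"DTSTART:{start.strftime(fmt)}",
        f"DTEND:{end.strftime(fmt)}",
        f"SUMMARY:{summary}",
        f"LOCATION:{location}",
        "END:VEVENT",
    ])

event = make_event(
    uid="talk-001@example.org",
    summary="Keynote: Llama pre-training",
    location="Albuquerque Convention Center",
    start=datetime(2025, 5, 1, 15, 0, tzinfo=timezone.utc),
    end=datetime(2025, 5, 1, 16, 0, tzinfo=timezone.utc),
)

calendar = "\r\n".join([
    "BEGIN:VCALENDAR",
    "VERSION:2.0",
    "PRODID:-//example//naacl-schedule//EN",
    event,
    "END:VCALENDAR",
])

with open("naacl_schedule.ics", "w", newline="") as f:
    f.write(calendar + "\r\n")
```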


Want to learn about Llama's pre-training? Mike Lewis will be giving a Keynote at NAACL 2025 in Albuquerque, NM on May 1. 2025.naacl.org @naaclmeeting


Alan Ritter reposted

🚨o3-mini vastly outperforms DeepSeek-R1 on an unseen probabilistic reasoning task! Introducing k-anonymity estimation: a novel task to assess privacy risks in sensitive texts Unlike conventional math and logical reasoning, this is difficult for both humans and AI models. 1/7
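For readers unfamiliar with the underlying notion: in the classical structured-data setting, a record's k is the size of the group of records that share its quasi-identifiers. Below is a toy sketch of that baseline definition; the paper's task is to estimate this kind of risk from free-form text, which is much harder, and the records here are made-up placeholders.

```python
# Classical k-anonymity over structured quasi-identifiers (toy data).
from collections import Counter

records = [
    {"zip": "30332", "age_band": "20-29", "gender": "F"},
    {"zip": "30332", "age_band": "20-29", "gender": "F"},
    {"zip": "30332", "age_band": "30-39", "gender": "M"},
    {"zip": "30318", "age_band": "20-29", "gender": "F"},
]

quasi_identifiers = ("zip", "age_band", "gender")

def k_for_record(record, dataset):
    """k = number of records indistinguishable from `record` on the quasi-identifiers."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in dataset)
    return counts[tuple(record[q] for q in quasi_identifiers)]

for r in records:
    print(r, "-> k =", k_for_record(r, records))
# The dataset as a whole is k-anonymous for the minimum k over all records (here, 1).
```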


Very excited about this new work by @EthanMendes3 on self-improving state value estimation for more efficient search without labels or rewards.

🚨New Paper: Better search for reasoning (e.g., web tasks) usually requires costly💰demos/rewards What if we only self-improve LLMs on state transitions—capturing a classic RL method in natural language? Spoiler: It works (⬆️39% over base model) & enables efficient search!🚀 🧵
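Reading "a classic RL method in natural language" as value estimation from observed state transitions (my interpretation, not a claim about the paper's exact algorithm), the numeric textbook analogue is a TD(0)-style update over states visited during search; everything in the sketch below is an illustrative placeholder, and the paper's version operates on natural-language state descriptions instead of a lookup table.

```python
# TD(0) value estimation from state transitions (textbook numeric version).
from collections import defaultdict

alpha, gamma = 0.1, 0.99          # learning rate, discount factor (hypothetical)
V = defaultdict(float)            # state -> estimated value, initialized to 0

# (state, reward, next_state) transitions gathered from exploring a web task.
transitions = [
    ("search_page", 0.0, "results_page"),
    ("results_page", 0.0, "product_page"),
    ("product_page", 1.0, "task_done"),
]

for _ in range(50):               # sweep the same transitions a few times
    for s, r, s_next in transitions:
        td_target = r + gamma * V[s_next]
        V[s] += alpha * (td_target - V[s])

for state, value in V.items():
    print(f"V({state}) = {value:.3f}")
```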


Alan Ritter reposted

🚨 Just Out Can LLMs extract experimental data about themselves from scientific literature to improve understanding of their behavior? We propose a semi-automated approach for large-scale, continuously updatable meta-analysis to uncover intriguing behaviors in frontier LLMs. 🧵


Check out @mohit_rag18's recent work analyzing data annotation costs associated with SFT vs. Preference Fine-Tuning.

🚨Just out Targeted data curation for SFT and RLHF is a significant cost factor 💰for improving LLM performance during post-training. How should you allocate your data annotation budgets between SFT and Preference Data? We ran 1000+ experiments to find out! 1/7
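As a back-of-the-envelope illustration of the allocation question, one can enumerate ways to split a fixed annotation budget between the two data types; the per-example costs and budget below are hypothetical, not numbers from the paper.

```python
# Enumerate splits of a fixed annotation budget between SFT and preference data.
# Costs and budget are made-up placeholders for illustration only.
BUDGET = 10_000          # total annotation budget in dollars
COST_SFT = 2.0           # hypothetical cost to write one SFT demonstration
COST_PREF = 1.0          # hypothetical cost to label one preference pair

for sft_fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
    sft_dollars = BUDGET * sft_fraction
    pref_dollars = BUDGET - sft_dollars
    n_sft = int(sft_dollars / COST_SFT)
    n_pref = int(pref_dollars / COST_PREF)
    print(f"{sft_fraction:>4.0%} to SFT -> {n_sft:>5} SFT examples, {n_pref:>5} preference pairs")
```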


Awesome talk yesterday by @jacobandreas! Interesting to hear how models can make better predictions by asking users the right questions, and update themselves based on implied information. @mlatgt


Alan Ritter reposted

📢 Working on something exciting for NAACL? 🗓️ Remember the commitment deadline is December 16th! 2025.naacl.org/calls/papers/ #NLProc


Come see @EthanMendes3 talk on the surprising image geolocation capabilities of VLMs today at 2:45 in Flagler (down the escalator on the first floor). #emnlp2024 @mlatgt


Please take a moment to fill out the form below to volunteer as a reviewer or AC for NAACL!

📢 NAACL needs Reviewers & Area Chairs! 📝 If you haven't received an invite for ARR Oct 2024 & want to contribute, sign up by Oct 22nd! ➡️AC form: forms.office.com/r/8j6jXLfASt ➡️Reviewer form: forms.office.com/r/cjPNtL9gPE Please RT 🔁 and help spread the word! 🗣️ #NLProc @ReviewAcl



Alan Ritter reposted

Great news! The @aclmeeting awarded IARPA HIATUS program performers two "Best Social Impact Paper Awards" and an "Outstanding Paper Award." Congratulations to the awardees! Access the winning papers below: aclanthology.org/2024.acl-long.… aclanthology.org/2024.acl-long.… aclanthology.org/2024.acl-short…

