
Kaj Bostrom

@alephic2

NLP geek with a PhD from @utcompsci, now @ AWS. I like generative modeling but not in an evil way I promise. Also at http://bsky.app/profile/bostromk.net He/him

Kaj Bostrom reposted

rule 2182


Kaj Bostrom reposted

For inputs involving many steps, the operands for each step remain important until an identical depth. This indicates that the model is *not* breaking down the computation, solving subproblems, and composing their results together. 2/6


Definitely updated my mental model of CoT based on these results - give it a read, the paper delivers right off the bat and then keeps following up with more!

To CoT or not to CoT?🤔

300+ experiments with 14 LLMs & systematic meta-analysis of 100+ recent papers

🤯Direct answering is as good as CoT except for math and symbolic reasoning
🤯You don’t need CoT for 95% of MMLU!

CoT mainly helps LLMs track and execute symbolic computation
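For concreteness, a minimal sketch of the direct-vs-CoT comparison the thread describes: the same question is posed with and without a chain-of-thought cue. The `generate` function is a placeholder for any LLM completion call, not an API from the paper.

```python
# Minimal sketch (not from the paper) of direct answering vs. zero-shot CoT.
# `generate` is a placeholder: swap in a real LLM API call.

def generate(prompt: str) -> str:
    """Placeholder for an LLM completion call."""
    raise NotImplementedError

def direct_answer(question: str) -> str:
    # Direct answering: ask for the answer with no intermediate reasoning.
    return generate(f"Q: {question}\nA:")

def cot_answer(question: str) -> str:
    # Zero-shot CoT: elicit step-by-step reasoning before the answer.
    return generate(f"Q: {question}\nA: Let's think step by step.")
```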



Kaj Bostrom reposted

🍓 still has a way to go for solving murder mysteries.

We ran o1 on our dataset MuSR (ICLR ’24). It doesn’t beat Claude-3.5 Sonnet with CoT. MuSR requires a lot of commonsense reasoning and less math/logic (where 🍓 shines)

MuSR is still a challenge! More to come soon 😎


Kaj Bostrom reposted

Super excited to bring ChatGPT Murder Mysteries to #ICLR2024 from our dataset MuSR as a spotlight presentation! A big shout-out goes to my coauthors @xiye_nlp @alephic2 @swarat and @gregd_nlp. See you all there 😀

GPT-4 can write murder mysteries that it can’t solve. 🕵️

We use GPT-4 to build a dataset, MuSR, to test the limits of LLMs’ textual reasoning abilities (commonsense, ToM, & more)

📃 arxiv.org/abs/2310.16049
🌐 zayne-sprague.github.io/MuSR/

w/ @xiye_nlp @alephic2 @swarat @gregd_nlp



Kaj Bostrom reposted

After extensively training various music generation neural networks and dedicating countless hours to prompting them, it's become even more evident to me that relying solely on text prompts as an interface for music creation significantly limits the creative process.

Reducing all interfaces to text prompts is a failure of imagination and inhumane.




Kaj Bostrom reposted

While demand for generative model training soars 📈, I think a new field is coalescing that’s focused on trying to make sense of generative models _once they’re already trained_: characterizing their behaviors, differences, and underlying mechanisms…so we wrote a paper about it!


Kaj Bostrom reposted

LLMs are used for reasoning tasks in NL but lack explicit planning abilities. In arxiv.org/abs/2307.02472, we see if vector spaces can enable planning by choosing statements to combine to reach a conclusion. Joint w/ @alephic2 @swarat & @gregd_nlp NLRSE workshop at #ACL2023NLP
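A minimal sketch of the idea, assuming nothing about the paper's actual models: embed each statement, then pick the premise pair whose combined embedding lands closest to the goal. Vector addition here is a hypothetical stand-in for whatever composition function a real system would learn.

```python
# Hypothetical illustration of vector-space premise selection, not the
# paper's implementation: choose the pair of statements whose combined
# embedding is most similar to the conclusion we want to reach.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pick_next_step(statement_vecs: list[np.ndarray],
                   goal_vec: np.ndarray) -> tuple[int, int]:
    # Score every pair; addition stands in for a learned composition
    # function over statement embeddings.
    best, best_score = (0, 1), -np.inf
    for i in range(len(statement_vecs)):
        for j in range(i + 1, len(statement_vecs)):
            score = cosine(statement_vecs[i] + statement_vecs[j], goal_vec)
            if score > best_score:
                best, best_score = (i, j), score
    return best
```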


Kaj Bostrom reposted

📣Call for papers! The Natural Language Reasoning and Structured Explanations Workshop will be the first of its kind at ACL 2023, and the deadline for paper submissions is April 24. Learn more and submit here: nl-reasoning-workshop.github.io



Kaj Bostrom reposted

Three years in the making - our big review/position piece on the capabilities of large language models (LLMs) from the cognitive science perspective. Thread below! 1/ arxiv.org/abs/2301.06627

