
Kaj Bostrom

@alephic2

NLP geek with a PhD from @utcompsci, now @ AWS. I like generative modeling but not in an evil way I promise. Also at http://bsky.app/profile/bostromk.net He/him

Kaj Bostrom reposted

rule 2182


Kaj Bostrom reposted

For inputs involving many steps, the operands for each step remain important until an identical depth. This indicates that the model is *not* breaking down the computation, solving subproblems, and composing their results together. 2/6


Definitely updated my mental model of CoT based on these results - give it a read, the paper delivers right off the bat and then keeps following up with more!

To CoT or not to CoT?🤔

300+ experiments with 14 LLMs & systematic meta-analysis of 100+ recent papers

🤯Direct answering is as good as CoT except for math and symbolic reasoning
🤯You don’t need CoT for 95% of MMLU!

CoT mainly helps LLMs track and execute symbolic computation
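For concreteness, a minimal sketch of the direct-vs-CoT comparison the thread describes: the same question is posed with and without a chain-of-thought cue. The `generate` function is a placeholder for any LLM completion call, not an API from the paper.

```python
# Minimal sketch (not from the paper) of direct answering vs. zero-shot CoT.
# `generate` is a placeholder: swap in a real LLM API call.

def generate(prompt: str) -> str:
    """Placeholder for an LLM completion call."""
    raise NotImplementedError

def direct_answer(question: str) -> str:
    # Direct answering: ask for the answer with no intermediate reasoning.
    return generate(f"Q: {question}\nA:")

def cot_answer(question: str) -> str:
    # Zero-shot CoT: elicit step-by-step reasoning before the answer.
    return generate(f"Q: {question}\nA: Let's think step by step.")
```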



Kaj Bostrom reposted

🍓 still has a way to go for solving murder mysteries.

We ran o1 on our dataset MuSR (ICLR ’24). It doesn’t beat Claude-3.5 Sonnet with CoT. MuSR requires a lot of commonsense reasoning and less math/logic (where 🍓 shines)

MuSR is still a challenge! More to come soon 😎


Kaj Bostrom reposted

Super excited to bring ChatGPT Murder Mysteries to #ICLR2024 from our dataset MuSR as a spotlight presentation! A big shout-out goes to my coauthors @xiye_nlp @alephic2 @swarat and @gregd_nlp. See you all there 😀

GPT-4 can write murder mysteries that it can’t solve. 🕵️

We use GPT-4 to build a dataset, MuSR, to test the limits of LLMs’ textual reasoning abilities (commonsense, ToM, & more)

📃 arxiv.org/abs/2310.16049
🌐 zayne-sprague.github.io/MuSR/

w/ @xiye_nlp @alephic2 @swarat @gregd_nlp



Kaj Bostrom reposted

After extensively training various music generation neural networks and dedicating countless hours to prompting them, it's become even more evident to me that relying solely on text prompts as an interface for music creation significantly limits the creative process.

Reducing all interfaces to text prompts is a failure of imagination and inhumane.




Kaj Bostrom reposted

While demand for generative model training soars 📈, I think a new field is coalescing that’s focused on trying to make sense of generative models _once they’re already trained_: characterizing their behaviors, differences, and underlying mechanisms…so we wrote a paper about it!


Kaj Bostrom reposted

LLMs are used for reasoning tasks in NL but lack explicit planning abilities. In arxiv.org/abs/2307.02472, we see if vector spaces can enable planning by choosing statements to combine to reach a conclusion. Joint w/ @alephic2 @swarat & @gregd_nlp NLRSE workshop at #ACL2023NLP
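A minimal sketch of the idea, assuming nothing about the paper's actual models: embed each statement, then pick the premise pair whose combined embedding lands closest to the goal. Vector addition here is a hypothetical stand-in for whatever composition function a real system would learn.

```python
# Hypothetical illustration of vector-space premise selection, not the
# paper's implementation: choose the pair of statements whose combined
# embedding is most similar to the conclusion we want to reach.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pick_next_step(statement_vecs: list[np.ndarray],
                   goal_vec: np.ndarray) -> tuple[int, int]:
    # Score every pair; addition stands in for a learned composition
    # function over statement embeddings.
    best, best_score = (0, 1), -np.inf
    for i in range(len(statement_vecs)):
        for j in range(i + 1, len(statement_vecs)):
            score = cosine(statement_vecs[i] + statement_vecs[j], goal_vec)
            if score > best_score:
                best, best_score = (i, j), score
    return best
```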


Kaj Bostrom reposted

📣Call for papers! The Natural Language Reasoning and Structured Explanations Workshop will be the first of its kind at ACL 2023, and the deadline for paper submissions is April 24. Learn more and submit here: nl-reasoning-workshop.github.io



Kaj Bostrom reposted

Three years in the making - our big review/position piece on the capabilities of large language models (LLMs) from the cognitive science perspective. Thread below! 1/ arxiv.org/abs/2301.06627

