
Adam Ibrahim

@ai_phd

Pinned

Our tech report for Zamba-7B-v1 is out. We manage to come close to the performance of Llama 3 8B, Mistral 7B and others with only 1T tokens, while offering faster inference and lower memory usage at a fixed context length. Read on to learn about our not-so-secret sauce!

Zyphra is dropping the tech report for Zamba-7B, along with:
- Model weights (phase 1 and final annealed) at huggingface.co/Zyphra
- Inference/generation code (both pure PyTorch and HuggingFace) at github.com/Zyphra/Zamba-t… and github.com/huggingface/tr…
Tech report:…



Adam Ibrahim reposted

Another #ICML2025 paper! Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? TLDR: Predicting language model performance with scale on multiple choice question-answer (MCQA) benchmarks is made difficult b/c ... 1/3


Adam Ibrahim reposted

Excited to announce our paper ⬇️ was selected as an **Outstanding** paper at @TiFA_ICML2024 🔥🔥🔥 What did the paper show? Let's try to summarize the paper in a single tweet!! 1/3

❤️‍🔥❤️‍🔥Excited to share our new paper ❤️‍🔥❤️‍🔥 **Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?** w/ @haileysch__ @BrandoHablando @gabemukobi @varunrmadan @herbiebradley @ai_phd @BlancheMinerva @sanmikoyejo arxiv.org/abs/2406.04391 1/N



Adam Ibrahim reposted

❤️‍🔥❤️‍🔥Excited to share our new paper ❤️‍🔥❤️‍🔥 **Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?** w/ @haileysch__ @BrandoHablando @gabemukobi @varunrmadan @herbiebradley @ai_phd @BlancheMinerva @sanmikoyejo arxiv.org/abs/2406.04391 1/N


Adam Ibrahim reposted

Take a look at our preprint on continual learning for more scalable LLM pretraining. A great piece of work led by @ai_phd, @benjamintherien and @kshitijkgupta 🔥

Interested in seamlessly updating your #LLM on new datasets to avoid wasting previous efforts & compute, all while maintaining performance on past data? Excited to present Simple and Scalable Strategies to Continually Pre-train Large Language Models! 🧵arxiv.org/abs/2403.08763 1/N



Here is the full paper from the continual pretraining project I have been working on over the past year. I encourage you to check it out if you pretrain LLMs (in particular, I recommend starting with the takeaways in Section 2 and the Table of Contents at the start of the appendix).

Interested in seamlessly updating your #LLM on new datasets to avoid wasting previous efforts & compute, all while maintaining performance on past data? Excited to present Simple and Scalable Strategies to Continually Pre-train Large Language Models! 🧵arxiv.org/abs/2403.08763 1/N



Adam Ibrahim reposted

Simple and Scalable Strategies to Continually Pre-train Large Language Models

Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes available. A much more efficient solution is to continually


Adam Ibrahim reposted

Mila presents Simple and Scalable Strategies to Continually Pre-train Large Language Models

Shows efficient updates to LLMs using simple strategies, achieving re-training results with less compute

arxiv.org/abs/2403.08763


Adam Ibrahim reposted

State-space models (SSMs) like Mamba and mixture-of-experts (MoE) models like Mixtral both seek to reduce the computational cost to train/infer compared to transformers, while maintaining generation quality. Learn more in our paper: zyphra.com/blackmamba

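For readers unfamiliar with the mechanism the tweet alludes to, here is a minimal, hypothetical sketch of top-k mixture-of-experts routing: each token activates only k of the E experts, which is how MoE layers cut per-token compute relative to a dense layer with the same total parameter count. The module, names, and sizes below are illustrative placeholders, not BlackMamba's actual implementation.

```python
# Minimal top-k MoE routing sketch (illustrative only; not BlackMamba's code).
# Each token is routed to k of n_experts experts, so only k/n_experts of the
# expert FLOPs are spent per token, while total parameters stay n_experts times larger.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        logits = self.router(x)                # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # renormalise over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):             # dispatch each token to its k experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(16, 512)
print(TopKMoE()(tokens).shape)                 # torch.Size([16, 512])
```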

Looking forward to seeing you at the #NeurIPS2023 #NeurIPS23 ENLSP workshop (rooms 206-207), where we'll have a poster about this work at 16:15!

1/ Ever wondered how to keep pretraining your LLM as new datasets become available, instead of pretraining from scratch every time and wasting prior effort and compute? A thread 🧵
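For a concrete picture of the kind of recipe this thread and paper (arxiv.org/abs/2403.08763) describe, here is a minimal, hypothetical sketch of continual pretraining with learning-rate re-warming/re-decaying plus a small fraction of replayed old data. The model, datasets, and hyperparameters below are illustrative placeholders, not the paper's actual training code.

```python
# Rough sketch of continual pretraining: re-warm and re-decay the learning rate
# when switching to new data, and replay a small fraction of old data to limit
# forgetting. Illustrative only; `model`, `new_data`, `old_data`, and all
# hyperparameters are placeholders.
import math, random
import torch

def lr_at(step, total_steps, warmup=0.01, max_lr=3e-4, min_lr=3e-5):
    """Linear re-warmup to max_lr, then cosine re-decay to min_lr."""
    warmup_steps = int(warmup * total_steps)
    if step < warmup_steps:
        return max_lr * step / max(1, warmup_steps)
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t))

def sample_batch(new_data, old_data, replay_frac=0.05):
    """With probability replay_frac, draw a batch of old data instead of new data."""
    source = old_data if random.random() < replay_frac else new_data
    return random.choice(source)

def continually_pretrain(model, new_data, old_data, total_steps=10_000):
    opt = torch.optim.AdamW(model.parameters())
    for step in range(total_steps):
        for group in opt.param_groups:
            group["lr"] = lr_at(step, total_steps)   # re-warm, then re-decay
        batch = sample_batch(new_data, old_data)
        loss = model(batch)                          # assume model returns its loss
        opt.zero_grad()
        loss.backward()
        opt.step()
```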



Adam Ibrahim reposted

The Hi-NOLIN Hindi model will be presented by our @NolanoOrg team (@imtejas13 @_AyushKaushal) and collaborators from our CERC-AAI team (@kshitijkgupta @benjamintherien @ai_phd) at #NeurIPS2023 this Friday, at this workshop: sites.google.com/mila.quebec/6t…


Adam Ibrahim reposted

Rarely been so excited about a paper. Our model has a quality level higher than Stable Diffusion 2.1 at a fraction (less than 12%) of the training cost, less than 20% of the carbon footprint, and it is twice as fast at inference too! That's what I call a leap forward.

Würstchen is a high-fidelity text2image model that works at a fraction of the compute needed for Stable Diffusion while achieving similar or better results. The preprint for v2 is now out. Thanks @M_L_Richter, @pabloppp, @dome_271, @chrisjpal for the great collab!


