#fineweb search results

alby13

@alby13

Dec 26

I'm just laughing at these guys... "The Finest Collection of Data that The Web Has To Offer 🍷 #FineWeb

Yes!Online

@yesonline

Jul 11

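» Eric Lam - Thanks to attorney 陳啟桐 of 律果科技 and 中央社 (Central News Agency) director 黃兆徽 for their help, a settlement has been successfully reached with the Central News Agency. Below is my statement. Dear friends from all walks of life: ... | Facebook facebook.com/eric.lam.74467… #fineweb #dataset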

Vlad Ruso PhD

@vlruso

Jun 3, 2024

HuggingFace Releases 🍷 FineWeb: A New Large-Scale (15-Trillion Tokens, 44TB Disk Space) Dataset for LLM Pretraining itinai.com/huggingface-re… #HuggingFace #FineWeb #LLMPretraining #AI #PracticalSolutions #ai #news #llm #ml #research #ainews #innovation #artificialintelligence …

@budynere

Discover how #FineWeb from @huggingface is redefining the creation of AI datasets 🌐. Optimize training, improve accuracy, and explore its impact on personalized education with #FineWebEdu 🎓. More details here: t.ly/ithHN #IA

FoundationModels

@ramanfr

Jul 2, 2024

#FineWeb from @huggingface is a great filtered dataset to learn from and to try pre-training foundation models from scratch

Multiplatform.AI

@MultiplatformAI

Jun 5, 2024

HuggingFace Unveils FineWeb: A Cutting-Edge Large-Scale Dataset for LLM Training #AI #artificialintelligence #FineWeb #HuggingFace #llm #machinelearning multiplatform.ai/huggingface-un…

Fredy 💻🧠

@FredyRiveraai

Aug 12, 2024

to provide the datasets I'm going to use, which are #OpenWebText, #BookCorpus, and Spanish Billion Words. If Zeus can eventually be scaled up and trained at a larger scale, I'm thinking of using #FineWeb. But that may be for the future :b. All of these are available on #Huggingface

FineWeb

@Fine_Web

Nov 19, 2012

Plenty of services will be available to help you build your site! #FineWeb

FineWeb

@Fine_Web

Nov 19, 2012

Dreaming of building a big file-hosting platform? It's coming soon with the #EStock offer from #FineWeb!

LLM360

@llm360

May 26

🌟 BestOfWeb is a highly refined subset of the TxT360 CC dataset! 📊 It is filtered using the ProX document filtering model, which uses quality signals similar to those of the FineWeb-Edu classifier and also adds format signals. #DataQuality #WebData #FineWeb

M4RCA_TV

@M4RCA_TV

Feb 7, 2013

Fineweb.fr - #FineWeb | FineWeb.fr, Your virtual server at low webwiki.fr/fineweb.fr

Woojin Kim

@woojinrad

Jun 4, 2024

🤗Terrific work! @huggingface introduced #FineWeb, a comprehensive dataset designed to enhance the training of #LLMs. It demonstrates improved performance through meticulous data curation and innovative filtering techniques.

Guilherme Penedo

@gui_penedo

Jun 2, 2024

We are (finally) releasing the 🍷 FineWeb technical report! In it, we detail and explain every processing decision we took, and we also introduce our newest dataset: 📚 FineWeb-Edu, a (web only) subset of FW filtered for high educational content. Link: hf.co/spaces/Hugging…

Bony Bean

@bonybean

Jun 3, 2024

🚀 Exciting news in the world of language models! Hugging Face has just released FineWeb, a groundbreaking 15-trillion token dataset designed to enhance large language model pretraining. Dive into the details here: ift.tt/AZkmXpJ #HuggingFace #FineWeb #LLMPretraining

Vinay Kumar Sahu

@vinay_ku_sahu

Jun 2, 2024

As the dataset is always the crucial aspect of any #LLMModel, getting a quality dataset is a challenge. The internet is filled with garbage. So this particular #FineWeb pipeline is built on top of #CommonCrawl (an open web-crawled dataset) huggingface.co/spaces/Hugging…

Bart de Witte

@OpenMedFuture

May 2, 2024

+ data alignment! -> Hugging Face's #FineWeb is a good step in the right direction; however, we need much more data commons.

Saifullahi Habib

@Saifullahi78791

Mar 30, 2024

@isaiahthomas @AbbathiS #fineWeb 3.0 brawlstars? Pre-register now!!!! marketplace.affyn.com/ba-pre-registr… #Web3

Jesse Ben Israel

@jessebenisrael

Jun 2, 2024

Exciting news from FineWeb - they're revolutionizing text data collection on a large scale, making it easier to access high-quality information from the web. #FineWeb #TextData #Innovation ift.tt/j546ZFq