Politehnica Bucharest NLP Group

@upb_nlp

The Natural Language Processing Group at the National University of Science and Technology Politehnica Bucharest @upb1818 #NLProc

Politehnica Bucharest NLP Group reposted

Check out our paper "The Strawberry Problem: Emergence of Character-level Understanding in Tokenized Language Models", accepted at #EMNLP2025 main for more insights into why this happens in LLMs! 🍓 arxiv.org/abs/2505.14172

Sora-2 counting the R's



So dope

Looking for a change in wardrobe🧤? I have a new pair of GloVes for you! With meanings for 𝐜𝐡𝐚𝐭𝐠𝐩𝐭, 𝐫𝐢𝐳𝐳, 𝐜𝐨𝐯𝐢𝐝, 𝐛𝐫𝐚𝐢𝐧𝐫𝐨𝐭, 𝐜𝐫𝐲𝐩𝐭𝐨𝐜𝐮𝐫𝐫𝐞𝐧𝐜𝐲, 𝐱𝐠𝐛𝐨𝐨𝐬𝐭, and many more, I present to you 🥳 𝟐𝟎𝟐𝟓 𝐆𝐥𝐨𝐕𝐞!! 🥳 Trained on Wikipedia,…



Politehnica Bucharest NLP Group reposted

We introduce CURIE, a scientific long-Context Understanding, Reasoning and Information Extraction benchmark to measure the potential of large language models in scientific problem-solving and assisting scientists in realistic workflows. Learn more at goo.gle/4jah5Ds

Uhh 🥹 finally

TL;DR: we are excited to release a powerful new open-weight language model with reasoning in the coming months, and we want to talk to devs about how to make it maximally useful: openai.com/open-model-fee… we are excited to make this a very, very good model! __ we are planning to…



Congrats to the people at Babeș-Bolyai University on being featured in this! 👏👏 Amazing paper

9). Synthetic Data Generation Using LLMs: Highlights benefits like cost and coverage, while addressing issues such as factual errors and bias, and suggests mitigations and future research in prompt automation and evaluation. arxiv.org/abs/2503.14023



Most famous NLP course of all 🎓👏

Our latest CS224N Natural Language Processing with Deep Learning lectures are now available! View the playlist here: youtube.com/playlist?list=…



Very interesting take

Very interesting finding from our group. LLM training (SFT) improves from having helpful relevant info in the context, even though training does not involve predicting those tokens. Paper includes some theory showing even exponential benefit of additional info.



Lots of reasoning datasets being released 👏

GeneralThought-195K: Diverse Reasoning Dataset

- 195K reasoning traces from 7+ models
- Expanded beyond math to sciences, humanities & conversations
- Full reasoning traces with verification scores
- MIT licensed with community contributions


Cool thing to take a look at 👀

New Course on building reasoning models like Deepseek R1! It’s called The Reasoning Course and it's FREE and CERTIFIED. To sign up just follow the org. info in the thread


As our lab mates want to spend more time on social media😶‍🌫️ we also created an account on Bluesky 💙 follow us at bsky.app/profile/upb-nl…


Politehnica Bucharest NLP Group reposted

Introducing "AI deadlines" An easy app for quickly seeing upcoming deadlines of top AI conferences like @NeurIPSConf, @CVPR and @iclr_conf Running at huggingface.co/spaces/hugging… and open-source at github.com/huggingface/ai…


Yeeeyy! Open-source & research ftw!

Hugging Face just entered the top 10 organizations on @github Close to 500,000 GitHub stars across our open-source libraries! Couldn't be more proud of what this 220-person team is accomplishing


Super excited for this!

After 6+ months in the making and burning over a year of GPU compute time, we're super excited to finally release the "Ultra-Scale Playbook" Check it out here: hf.co/spaces/nanotro… A free, open-source, book to learn everything about 5D parallelism, ZeRO, fast CUDA kernels,…


Can't wait for us to start this 🎓 Thx @Jthmas404, @ben_burtenshaw & @ThomasSimonini for this gem 👏

Piece of gold🥳 Now it's easy to support "AI has shaped the Educational landscape" 😂 with concrete citations, easily found in structured form

Found this incredible database on AI x K12 Education research from @NSSAccelerator You can also filter by study design, user, age, and more! scale.stanford.edu/genai/reposito… This will be making my lit reviews a bit easier :-)

