Politehnica Bucharest NLP Group
@upb_nlp
The Natural Language Processing Group at the National University of Science and Technology Politehnica Bucharest @upb1818 #NLProc
Check out our paper "The Strawberry Problem: Emergence of Character-level Understanding in Tokenized Language Models", accepted at the #EMNLP2025 main conference, for more insights into why this happens in LLMs! 🍓 arxiv.org/abs/2505.14172
So dope
Looking for a change in wardrobe🧤? I have a new pair of GloVes for you! With meanings for 𝐜𝐡𝐚𝐭𝐠𝐩𝐭, 𝐫𝐢𝐳𝐳, 𝐜𝐨𝐯𝐢𝐝, 𝐛𝐫𝐚𝐢𝐧𝐫𝐨𝐭, 𝐜𝐫𝐲𝐩𝐭𝐨𝐜𝐮𝐫𝐫𝐞𝐧𝐜𝐲, 𝐱𝐠𝐛𝐨𝐨𝐬𝐭, and many more, I present to you 🥳 𝟐𝟎𝟐𝟓 𝐆𝐥𝐨𝐕𝐞!! 🥳 Trained on Wikipedia,…
We introduce CURIE, a scientific long-Context Understanding, Reasoning and Information Extraction benchmark to measure the potential of large language models in scientific problem-solving and assisting scientists in realistic workflows. Learn more at goo.gle/4jah5Ds
Uhh 🥹 finally
TL;DR: we are excited to release a powerful new open-weight language model with reasoning in the coming months, and we want to talk to devs about how to make it maximally useful: openai.com/open-model-fee… we are excited to make this a very, very good model! we are planning to…
Congrats to the people at Babeș-Bolyai University on being featured in this! 👏👏 Amazing paper
9) Synthetic Data Generation Using LLMs. Highlights benefits like cost and coverage, while addressing issues such as factual errors and bias, and suggests mitigations and future research in prompt automation and evaluation. arxiv.org/abs/2503.14023
Most famous NLP course of all 🎓👏
Our latest CS224N Natural Language Processing with Deep Learning lectures are now available! View the playlist here: youtube.com/playlist?list=…
Very interesting take
Very interesting finding from our group. LLM training (SFT) improves from having helpful relevant info in the context, even though training does not involve predicting those tokens. Paper includes some theory showing even exponential benefit of additional info.
Lots of reasoning datasets being released 👏
GeneralThought-195K: Diverse Reasoning Dataset
- 195K reasoning traces from 7+ models
- Expanded beyond math to sciences, humanities & conversations
- Full reasoning traces with verification scores
- MIT licensed with community contributions
Cool thing to take a look at 👀
New Course on building reasoning models like Deepseek R1! It’s called The Reasoning Course and it's FREE and CERTIFIED. To sign up, just follow the org. Info in the thread.
As our lab mates want to spend more time on social media 😶‍🌫️, we also created an account on Bluesky 💙 Follow us at bsky.app/profile/upb-nl…
Introducing "AI deadlines" An easy app for quickly seeing upcoming deadlines of top AI conferences like @NeurIPSConf, @CVPR and @iclr_conf Running at huggingface.co/spaces/hugging… and open-source at github.com/huggingface/ai…
Yeeeyy! Open-source & research ftw!
Hugging Face just entered the top 10 organizations on @github Close to 500,000 GitHub stars across our open-source libraries! Couldn't be more proud of what this 220-person team is accomplishing
Super excited for this!
After 6+ months in the making and burning over a year of GPU compute time, we're super excited to finally release the "Ultra-Scale Playbook". Check it out here: hf.co/spaces/nanotro… A free, open-source book to learn everything about 5D parallelism, ZeRO, fast CUDA kernels,…
Can't wait for us to start this 🎓 Thx @Jthmas404, @ben_burtenshaw & @ThomasSimonini for this gem 👏
Piece of gold🥳 Now it's easy to support "AI has shaped the Educational landscape" 😂 with concrete citations, easily found in structured form
Found this incredible database on AI x K12 Education research from @NSSAccelerator You can also filter by study design, user, age, and more! scale.stanford.edu/genai/reposito… This will be making my lit reviews a bit easier :-)