Yotam Perlitz 👾
@yotamperlitz
Research Scientist at @ibmresearch, Practicing #NLProc, #RL. Opinions are my own.
✨ Developed a new benchmark or dataset for language models? ✨ Want the community to trust and adopt it? 🤔 So, demonstrate its validity by comparing it to established benchmarks! BenchBench makes it easy. Check it out: 👉 huggingface.co/spaces/ibm/ben…
What a GEM!
GEM is so back! Our workshop for Generation, Evaluation, and Metrics is coming to an ACL near you. Evaluation in the world of GenAI is more important than ever, so please consider submitting your amazing work. CfP can be found at gem-benchmark.com/workshop
A big thank you to the Unitxt team, collaborators, and our community for an incredible 2024! Together, we pushed boundaries in AI evaluation and set new standards for the field. Read our End-of-Year Summary here: unitxt.ai/en/latest/blog… #unitxt #llmevaluation
Another cool benchmarking paper published yesterday. In "JuStRank: Benchmarking LLM Judges for System Ranking", researchers from @IBMResearch introduced JuStRank, the first large-scale benchmark for evaluating LLM judges for ranking target systems: arxiv.org/abs/2412.09569
Prompting LMs is not enough to quantify their linguistic competence! Meet the Holmes🔎 benchmark at #EMNLP2024 #TACL or 👉🧵 💠Meta-study of current literature 💠Coverage of LMs and phenomena 💠Analysis of LM size, architecture, and instruction tuning holmes-benchmark.github.io
RAG Developer Attention! 🔔 Docling is a new library from @IBM that efficiently parses PDF, DOCX, and PPTX and exports them to Markdown and JSON. It supports advanced PDF understanding and seamless integration with @llama_index and @LangChainAI. TL;DR: 🗂️ Parses numerous…
Can lowercase really make or break an AI’s answer? 🤔 And what happens when an LLM ‘sees’ old TV-style white noise? 📺 We put these quirks to the test using Unitxt, showcasing the powerful, flexible testing it enables. Read on: unitxt.ai/en/latest/blog… #AI #MachineLearning #LLM
Scaling laws predict🦣large models by training🦟small ones, cool right? Fortunately, they are not that complicated or costly; at least, they don't have to be. We collected 400+ models, fitted 1000+ scaling laws, and created 1 guide for cheaper, more reliable scaling law fitting:
It's been a great collaboration journey with my wonderful co-authors: @AndreasWaldis, @yotamperlitz @LChoshen, and @IGurevych. Please check it out and let us know if you'd like to see any additional functions or analyses added to the benchmark.
Get your benchmark game on: huggingface.co/spaces/ibm/ben…
1/ Into Image Captioning? Don’t miss this! Struggling to keep up with the influx of new metrics but still see the same 5 (BLEU, METEOR, ROUGE, CIDEr, SPICE) leading? Read our recent Captioning evaluation survey! arxiv.org/abs/2408.04909 w/ @GabiStanovsky @AbendOmri @leafrermann
Me trying to choose the right LLM benchmark without BenchBench: x.com/yotamperlitz/s…
Shoutout to @streamlit, our framework of choice! Shoutout to @huggingface for hosting our space 🤗
@karpathy only trusts @lmsysorg chatbot arena! x.com/karpathy/statu… Hold on! What if you don't have an army of annotators at your disposal? 🤔