Yotam Perlitz 👾
@yotamperlitz
Research Scientist at @ibmresearch, Practicing #NLProc, #RL. Opinions are my own.
✨ Developed a new benchmark or dataset for language models? ✨ Want the community to trust and adopt it? 🤔 So, demonstrate its validity by comparing it to established benchmarks! BenchBench makes it easy. Check it out: 👉 huggingface.co/spaces/ibm/ben…
BenchBench Leaderboad - a Hugging Face Space by ibm
What a GEM!
GEM is so back! Our workshop for Generation, Evaluation, and Metrics is coming to an ACL near you. Evaluation in the world of GenAI is more important than ever, so please consider submitting your amazing work. CfP can be found at gem-benchmark.com/workshop
A big thank you to the Unitxt team, collaborators, and our community for an incredible 2024! Together, we pushed boundaries in AI evaluation and set new standards for the field. Read our End-of-Year Summary here: unitxt.ai/en/latest/blog… #unitxt #llmevaluation
Another cool benchmarking paper published yesterday. In "JuStRank: Benchmarking LLM Judges for System Ranking", researchers from @IBMResearch introduced JuStRank, the first large-scale benchmark for evaluating LLM judges for ranking target systems: arxiv.org/abs/2412.09569
Prompting LMs is not enough to quantify their linguistic competence! Meet the Holmes🔎 benchmark at #EMNLP2024 #TACL or 👉🧵 💠Meta-study of current literature 💠Coverage of LMs and phenomena 💠Analysis of LM size, architecture, and instruction tuning holmes-benchmark.github.io
RAG Developer Attention! 🔔 Docling is a new library from @IBM that efficiently parses PDF, DOCX, and PPTX and exports them to Markdown and JSON. It supports advanced PDF understanding and seamless integration with @llama_index and @LangChainAI. TL;DR: 🗂️ Parses numerous…
Can lowercase really make or break an AI’s answer? 🤔 And what happens when an LLM ‘sees’ old TV-style white noise? 📺 We put these quirks to the test using Unitxt, showcasing the powerful, flexible testing it enables. Read on: unitxt.ai/en/latest/blog… #AI #MachineLearning #LLM
Scaling laws predict 🦣 large models by training 🦟 small ones, cool right? Fortunately, they are not that complicated or costly, or at least they don't have to be. We have collected 400+ models, fitted 1000+ scaling laws, and created 1 guide for cheap & more reliable scaling law fitting:
It's been a great collaboration journey with my wonderful co-authors: @AndreasWaldis, @yotamperlitz @LChoshen, and @IGurevych. Please check it out and let us know if you'd like to see any additional functions or analyses added to the benchmark.
Get your benchmark game on: huggingface.co/spaces/ibm/ben…
1/ Into Image Captioning? Don’t miss this! Struggling to keep up with the influx of new metrics but still see the same 5 (BLEU, METEOR, ROUGE, CIDEr, SPICE) leading? Read our recent Captioning evaluation survey! arxiv.org/abs/2408.04909 w/ @GabiStanovsky @AbendOmri @leafrermann
Me trying to choose the right LLM benchmark without BenchBench: x.com/yotamperlitz/s…
Shoutout to @streamlit, our framework of choice! Shoutout to @huggingface for hosting our space 🤗
@karpathy only trusts @lmsysorg chatbot arena! x.com/karpathy/statu… Hold on! What if you don't have an army of annotators at your disposal? 🤔