Yotam Perlitz 👾

@yotamperlitz

Research Scientist at @ibmresearch, Practicing #NLProc, #RL. Opinions are my own.

perlitz.github.io

Joined January 2015

95 posts · 88 followers · 161 following


Pinned

Yotam Perlitz 👾

@yotamperlitz

16 Sep 2024

✨ Developed a new benchmark or dataset for language models? ✨ Want the community to trust and adopt it? 🤔 So, demonstrate its validity by comparing it to established benchmarks! BenchBench makes it easy. Check it out: 👉 huggingface.co/spaces/ibm/ben…

BenchBench Leaderboard - a Hugging Face Space by ibm

Source: huggingface.co

Yotam Perlitz 👾

@yotamperlitz

13 Feb

What a GEM!

Sebastian Gehrmann

@sebgehr

12 Feb

GEM is so back! Our workshop for Generation, Evaluation, and Metrics is coming to an ACL near you. Evaluation in the world of GenAI is more important than ever, so please consider submitting your amazing work. CfP can be found at gem-benchmark.com/workshop


Yotam Perlitz 👾 reposted

Elron Bandel

@ElronBandel

21 Jan

A big thank you to the Unitxt team, collaborators, and our community for an incredible 2024! Together, we pushed boundaries in AI evaluation and set new standards for the field. Read our End-of-Year Summary here: unitxt.ai/en/latest/blog… #unitxt #llmevaluation

Yotam Perlitz 👾 reposted

LayerLens

@layerlens_ai

13 Dec

Another cool benchmarking paper published yesterday. In "JuStRank: Benchmarking LLM Judges for System Ranking", researchers from @IBMResearch introduced JuStRank, the first large-scale benchmark for evaluating LLM judges for ranking target systems: arxiv.org/abs/2412.09569

Yotam Perlitz 👾 reposted

UKP Lab

@UKPLab

4 Nov 2024

Prompting LMs is not enough to quantify their linguistic competence! Meet the Holmes🔎 benchmark at #EMNLP2024 #TACL or 👉🧵 💠Meta-study of current literature 💠Coverage of LMs and phenomena 💠Analysis of LM size, architecture, and instruction tuning holmes-benchmark.github.io


Yotam Perlitz 👾 reposted

Philipp Schmid

@_philschmid

1 Nov 2024

RAG Developer Attention! 🔔 Docling is a new library from @IBM that efficiently parses PDF, DOCX, and PPTX and exports them to Markdown and JSON. It supports advanced PDF understanding and seamless integration with @llama_index and @LangChainAI. TL;DR: 🗂️ Parses numerous…


Yotam Perlitz 👾 reposted

Elron Bandel

@ElronBandel

1 Nov 2024

Can lowercase really make or break an AI’s answer? 🤔 And what happens when an LLM ‘sees’ old TV-style white noise? 📺 We put these quirks to the test using Unitxt, showcasing the powerful, flexible testing it enables. Read on: unitxt.ai/en/latest/blog… #AI #MachineLearning #LLM

Yotam Perlitz 👾 reposted

Leshem (Legend) Choshen 🤖🤗 @NeurIPS

@LChoshen

23 Oct 2024

Scaling laws predict🦣large models by training🦟small ones, cool right?
Fortunately, they are not that complicated or costly,
at least they don't have to be.

We have collected 400+ models,
fitted 1000+ scaling laws,
and created 1 guide
for cheap & more reliable scaling law fitting:


Yotam Perlitz 👾 reposted

Yufang Hou

@yufanghou

19 Sep 2024

It's been a great collaboration journey with my wonderful co-authors: @AndreasWaldis, @yotamperlitz @LChoshen, and @IGurevych. Please check it out and let us know if you'd like to see any additional functions or analyses added to the benchmark.

Yotam Perlitz 👾

@yotamperlitz

17 Sep 2024

Get your benchmark game on: huggingface.co/spaces/ibm/ben…

Yotam Perlitz 👾

@yotamperlitz

16 Sep 2024

BenchBench Leaderboard - a Hugging Face Space by ibm

Source: huggingface.co

Yotam Perlitz 👾 reposted

Uri Berger

@uriberger88

12 Sep 2024

1/ Into Image Captioning? Don’t miss this! Struggling to keep up with the influx of new metrics but still see the same 5 (BLEU, METEOR, ROUGE, CIDEr, SPICE) leading? Read our recent Captioning evaluation survey! arxiv.org/abs/2408.04909 w/ @GabiStanovsky @AbendOmri @leafrermann >
