Yotam Perlitz 👾

@yotamperlitz

Research Scientist at @ibmresearch, Practicing #NLProc, #RL. Opinions are my own.

perlitz.github.io

Joined January 2015

95 posts, 88 followers, 161 following

You might like

@ArielGera2

@GiliLior

@AsafYehudai

@EstherShizgal

@Shachar_Don

@uriberger88

@eladgranot

@nlphuji

@ArieCattan

@YoelShoshan

@iscol_meeting

@ben_bogin

@AvihuDkl

@lovodkin93

Pinned

Yotam Perlitz 👾

@yotamperlitz

September 16, 2024

✨ Developed a new benchmark or dataset for language models? ✨ Want the community to trust and adopt it? 🤔 So, demonstrate its validity by comparing it to established benchmarks! BenchBench makes it easy. Check it out: 👉 huggingface.co/spaces/ibm/ben…

huggingface.co

BenchBench Leaderboard - a Hugging Face Space by ibm

Source: huggingface.co

Yotam Perlitz 👾

@yotamperlitz

February 13

What a GEM!

Sebastian Gehrmann

@sebgehr

February 12

GEM is so back! Our workshop for Generation, Evaluation, and Metrics is coming to an ACL near you. Evaluation in the world of GenAI is more important than ever, so please consider submitting your amazing work. CfP can be found at gem-benchmark.com/workshop

Yotam Perlitz 👾 reposted

Elron Bandel

@ElronBandel

January 21

A big thank you to the Unitxt team, collaborators, and our community for an incredible 2024! Together, we pushed boundaries in AI evaluation and set new standards for the field. Read our End-of-Year Summary here: unitxt.ai/en/latest/blog… #unitxt #llmevaluation

Yotam Perlitz 👾 reposted

LayerLens

@layerlens_ai

December 13

Another cool benchmarking paper published yesterday. In "JuStRank: Benchmarking LLM Judges for System Ranking", researchers from @IBMResearch introduced JuStRank, the first large-scale benchmark for evaluating LLM judges for ranking target systems: arxiv.org/abs/2412.09569

Yotam Perlitz 👾 reposted

UKP Lab

@UKPLab

November 4, 2024

Prompting LMs is not enough to quantify their linguistic competence! Meet the Holmes🔎 benchmark at #EMNLP2024 #TACL or 👉🧵

💠Meta-study of current literature
💠Coverage of LMs and phenomena
💠Analysis of LM size, architecture, and instruction tuning

holmes-benchmark.github.io

Yotam Perlitz 👾 reposted

Philipp Schmid

@_philschmid

November 1, 2024

RAG Developer Attention! 🔔 Docling is a new library from @IBM that efficiently parses PDF, DOCX, and PPTX and exports them to Markdown and JSON. It supports advanced PDF understanding and seamless integration with @llama_index and @LangChainAI. TL;DR: 🗂️ Parses numerous…

Yotam Perlitz 👾 reposted

Elron Bandel

@ElronBandel

November 1, 2024

Can lowercase really make or break an AI’s answer? 🤔 And what happens when an LLM ‘sees’ old TV-style white noise? 📺 We put these quirks to the test using Unitxt, showcasing the powerful, flexible testing it enables. Read on: unitxt.ai/en/latest/blog… #AI #MachineLearning #LLM

Yotam Perlitz 👾 reposted

Leshem (Legend) Choshen 🤖🤗 @NeurIPS

@LChoshen

October 23, 2024

Scaling laws predict🦣large models by training🦟small ones, cool right?
Fortunately, they are not that complicated or costly
at least they don't have to be

We have collected 400+ models
fitted 1000+ scaling laws
and created 1 guide
for cheap & more reliable scaling law fitting:

Yotam Perlitz 👾 reposted

Yufang Hou

@yufanghou

September 19, 2024

It's been a great collaboration journey with my wonderful co-authors: @AndreasWaldis, @yotamperlitz @LChoshen, and @IGurevych. Please check it out and let us know if you'd like to see any additional functions or analyses added to the benchmark.

Yotam Perlitz 👾

@yotamperlitz

September 17, 2024

Get your benchmark game on: huggingface.co/spaces/ibm/ben…

Yotam Perlitz 👾

@yotamperlitz

September 16, 2024

huggingface.co

BenchBench Leaderboard - a Hugging Face Space by ibm

Source: huggingface.co

Yotam Perlitz 👾 reposted

Uri Berger

@uriberger88

September 12, 2024

1/ Into Image Captioning? Don’t miss this! Struggling to keep up with the influx of new metrics but still see the same 5 (BLEU, METEOR, ROUGE, CIDEr, SPICE) leading? Read our recent Captioning evaluation survey! arxiv.org/abs/2408.04909 w/ @GabiStanovsky @AbendOmri @leafrermann >
