
Kyle Montgomery

@kylepmont

PhD student at UC Santa Cruz

Kyle Montgomery reposted

Attending #EMNLP2025, Dr @hengjinlp is now giving the keynote! pls do dm me if you plan to do a #postdoc on #agenticai and #aisafety and are interested in working with me, we should chat! If you plan to pursue #phd in the space, also don’t hesitate to reach out (limited spots…


Thrilled to have been a part of this release — looking forward to what’s coming next with rLLM!

🚀 Introducing rLLM v0.2 - train arbitrary agentic programs with RL, with minimal code changes. Most RL training systems adopt the agent-environment abstraction. But what about complex workflows? Think solver-critique pairs collaborating, or planner agents orchestrating multiple…



Excited to share our work at #ICLR2025! JudgeBench ⚖️ tests the reliability of LLM-based judges with a focus on objective correctness. JudgeBench converts tough 🧠 datasets in knowledge, reasoning, math & code into labeled response pairs, forcing objective grading over vibes.…


Kyle Montgomery reposted

Introducing JudgeBench – the ultimate benchmark designed to push LLM-based judges to their limits! 🚀 ❓Why do we need a new benchmark for LLM-based judges? As LLMs continue to evolve, their responses become more complex, demanding stronger judges to assess them accurately.…

