Comet

@Cometml

Comet provides an end-to-end model evaluation platform for AI developers, with best-in-class LLM evaluations, experiment tracking, and production monitoring.

For many teams, hallucinations and security concerns are top of mind when building agents. On Nov 20, join Sarah Ostermeier + the @awscloud team to learn practical ways to build reliable agents, then watch them build a customer support agent live. 🔗 luma.com/building-relia…

Opik just surpassed 15,000 stars on GitHub⭐ Guess we're not stopping anytime soon. 💻github.com/comet-ml/opik


Comet reposted

I've been working on agent optimization for real-world prompts (each prompt is ~10k tokens), and our new algorithm is already up 17%! I'm seeing some interesting differences between benchmarks and real-world performance; more to come soon.

Comet reposted

Do people really try to one-shot features with Claude Code? I shipped Dark Mode for Opik in less than a day, but it took no fewer than three iterations to get to something that was ready to be merged. A thread on how I use Claude Code 🧵

Dark mode is here — easy to toggle and perfect for those late-night debugging sessions 🌙 Because great tools adapt to your preferences, not the other way around.


Comet reposted

Had a great convo with Gideon Mendels, CEO of @Cometml.

Episode 3 of Cloudbreak is out! Tune in for a great conversation on all things AI w/ Gideon Mendels, co-founder and CEO of @Cometml and @yuvaln of Trilogy. YouTube: youtu.be/_HQoAYf0WkE?si… Or watch/listen on @Spotify, @ApplePodcasts and @amazonmusic!

YouTube: The Comet Story: The Future of Model Observability and Evals

Our R&D team just wrapped up an incredible week in Rome 🇮🇹 Remote-first doesn't have to mean distant when you are intentional about reconnecting as a team. Here are some highlights of their week together!

Everyone’s building GenAI apps. Few are evaluating them well.

On Sept 24, Claire Longo is running a live workshop on:
 → feedback loops for conversational agents
 → logging traces
 → LLM-as-a-judge metrics

Definitely one you’ll want to check out 🔗 luma.com/uupy7jxr

New Cometeers joined us this summer across the US, UK, and Greece ☀️ Beyond their impressive professional experience, they bring diverse passions: board games, golf, hiking, and professional soccer. Welcome aboard! 👋

Comet reposted

Today, we're building a CodeArena, where you can compare any two code-gen models side-by-side.

Tech stack:
- @LiteLLM for orchestration
- @Cometml's Opik to build the eval pipeline
- @OpenRouterAI to access cutting-edge models
- @LightningAI for hosting CodeArena

Let's go!🚀


The line between pretraining and fine-tuning is increasingly blurry, making “training” harder to define. In this piece, @anmorgan24 unpacks how shifting methods and terms complicate LLM behavior—and why pretraining remains key to scaling models responsibly.

The line between pretraining and fine-tuning is blurrier than ever. Here’s why that matters.🧵 (1/9)



Comet reposted

Before we dive in, here's a quick demo of what we're building!

Tech stack:
- @LiteLLM for orchestration
- @Cometml's Opik to build the eval pipeline (open-source)
- @OpenRouterAI to access the models

You'll also learn about G-Eval & building custom eval metrics. Let's go! 🚀


AI coding tools are changing how we build 💡 @StatInStilettos built an AI app from scratch using Cursor — and shared a full breakdown of what worked, what didn’t, and why she thinks there’s a better alternative to “vibe coding.” Read the full breakdown 👇…


Comet reposted

🧠 Build autonomous AI agents that think, remember, and act. @akshay_pachaar’s new crash course walks through:
✅ Tool integration via MCP servers
✅ Memory with Zep’s Graphiti
✅ Tracing and observability with Comet’s Opik

All orchestrated with CrewAI, and fully…


Comet reposted

A Crash Course on Building AI Agents!

Here's what it covers:
- What is an AI agent
- Connecting agents to tools
- Overview of MCP
- Replacing tools with MCP servers
- Setting up observability and tracing

All with 100% open-source tools!

