LangWatchAI's profile picture. Open Source platform for LLM observability, evaluation and agent https://github.com/langwatch/langwatch

➡️ DSPy Optimizations
➡️ Scenario Agent Simulations

LangWatch

@LangWatchAI

Open Source platform for LLM observability, evaluation and agent https://github.com/langwatch/langwatch ➡️ DSPy Optimizations ➡️ Scenario Agent Simulations

In agent demos, everything’s smooth. In prod? You get messy inputs, long chains, weird edge cases — that’s when things snap. We treat agents like code → write scenario tests first, simulate full workflows, then iterate until green. Think TDD, but for LLMs. More on how we do it…


“Do I really need evals?” The real q: how do you know your AI agents will behave in prod? Prototypes don’t need them. Scaling products do. That’s why we built Agent Simulations; Unit tests for AI. The only way to know if you can ship reliably. OSS: github.com/langwatch/scen…


We’re hosting a Meetup in our office in Amsterdam on Sept 18 all about agentic AI. 👀 👀 👀 Talks from: • @_rchaves_ (CTO, LangWatch) → Beyond Unit Tests: why agent simulations are redefining AI agent testing. • Deepak Grewal (Kong) → Agentic AI -> powering the next wave…

LangWatchAI's tweet image. We’re hosting a Meetup in our office in Amsterdam on Sept 18 all about agentic AI. 👀 👀 👀 

Talks from:
• @_rchaves_   (CTO, LangWatch) → Beyond Unit Tests: why agent simulations are redefining AI agent testing.

• Deepak Grewal (Kong) → Agentic AI -> powering the next wave…

LangWatch reposted

In Amsterdam and want to spend an evening networking and learning all about agentic AI? Come to our @Meetup with @LangWatchAI on September 18th! RSVP to save your spot > bit.ly/3JyBwgY

kong's tweet image. In Amsterdam and want to spend an evening networking and learning all about agentic AI? Come to our @Meetup with @LangWatchAI on September 18th!

RSVP to save your spot > bit.ly/3JyBwgY

The gap between model release hype and production reality is always bigger than it looks. OpenAI’s new GPT-5 headlines focus on the measurable: fewer hallucinations, better reasoning, faster responses. All great gains. But the real story? How it works in your workflows, with…


Start tracing AI SDK 5 with LangWatch today: docs.langwatch.ai/integration/ty…

AI SDK 5 Introducing type-safe chat, agentic loop controls, data parts, speech generation and transcription, Zod 4 support, global provider, and raw request access.

aisdk's tweet image. AI SDK 5

Introducing type-safe chat, agentic loop controls, data parts, speech generation and transcription, Zod 4 support, global provider, and raw request access.


LangWatch reposted

We've won second place in the Power of Europe Hackathon in Amsterdam with this one ;) more on it soon!

First up - built with Kilo Code is an MCP tool that visually tests real applications with computer use and scenarios powered by @LangWatchAI

kilocode's tweet image. First up - built with Kilo Code is an MCP tool that visually tests real applications with computer use and scenarios powered by @LangWatchAI


LangWatch reposted

First up - built with Kilo Code is an MCP tool that visually tests real applications with computer use and scenarios powered by @LangWatchAI

kilocode's tweet image. First up - built with Kilo Code is an MCP tool that visually tests real applications with computer use and scenarios powered by @LangWatchAI

LangWatch reposted

Here is how to test Voice Agents, using Scenario simulations 👇

_rchaves_'s tweet image. Here is how to test Voice Agents, using Scenario simulations 👇

LangWatch reposted

notes on agent testing discussion with the team

_rchaves_'s tweet image. notes on agent testing discussion with the team

LangWatch reposted

always so satisfying to watch a DSPy optimization happening

_rchaves_'s tweet image. always so satisfying to watch a DSPy optimization happening

LangWatch reposted

First impressions of Grok 4 ✅ it passes all the Scenario agent simulation tests on the 13 different agent frameworks in create-agent-app ❌ probably because of the reasoning, but facing quite high latency using it as an agent 🤔 on our vibe coding test, the website it designs…

_rchaves_'s tweet image. First impressions of Grok 4

✅ it passes all the Scenario agent simulation tests on the 13 different agent frameworks in create-agent-app

❌ probably because of the reasoning, but facing quite high latency using it as an agent

🤔 on our vibe coding test, the website it designs…

Now you can ship AI agents faster with developer-first testing. LangWatch Scenario allows you to test your agents like you test your code. That’s because: ❌Manual testing doesn't scale. ❌"Vibe checking" isn't systematic. ❌Hope isn't a strategy. That's why we’re building…


United States Trends

Loading...

Something went wrong.


Something went wrong.