LangWatch
@LangWatchAI
Open Source platform for LLM observability, evaluation and agent https://github.com/langwatch/langwatch ➡️ DSPy Optimizations ➡️ Scenario Agent Simulations
In agent demos, everything’s smooth. In prod? You get messy inputs, long chains, weird edge cases — that’s when things snap. We treat agents like code → write scenario tests first, simulate full workflows, then iterate until green. Think TDD, but for LLMs. More on how we do it…
“Do I really need evals?” The real q: how do you know your AI agents will behave in prod? Prototypes don’t need them. Scaling products do. That’s why we built Agent Simulations: unit tests for AI. The only way to know if you can ship reliably. OSS: github.com/langwatch/scen…
We’re hosting a Meetup in our office in Amsterdam on Sept 18 all about agentic AI. 👀 👀 👀 Talks from: • @_rchaves_ (CTO, LangWatch) → Beyond Unit Tests: why agent simulations are redefining AI agent testing. • Deepak Grewal (Kong) → Agentic AI -> powering the next wave…
In Amsterdam and want to spend an evening networking and learning all about agentic AI? Come to our @Meetup with @LangWatchAI on September 18th! RSVP to save your spot > bit.ly/3JyBwgY
The gap between model release hype and production reality is always bigger than it looks. OpenAI’s new GPT-5 headlines focus on the measurable: fewer hallucinations, better reasoning, faster responses. All great gains. But the real story? How it works in your workflows, with…
We've won second place in the Power of Europe Hackathon in Amsterdam with this one ;) more on it soon!
First up - built with Kilo Code is an MCP tool that visually tests real applications with computer use and scenarios powered by @LangWatchAI
Here is how to test Voice Agents, using Scenario simulations 👇
notes on agent testing discussion with the team
always so satisfying to watch a DSPy optimization happening
First impressions of Grok 4 ✅ it passes all the Scenario agent simulation tests on the 13 different agent frameworks in create-agent-app ❌ probably because of the reasoning, but facing quite high latency using it as an agent 🤔 on our vibe coding test, the website it designs…
Now you can ship AI agents faster with developer-first testing. LangWatch Scenario allows you to test your agents like you test your code. That’s because: ❌ Manual testing doesn't scale. ❌ "Vibe checking" isn't systematic. ❌ Hope isn't a strategy. That's why we’re building…