EvalOps
@EvalOpsDev
Building the coordination layer for the cognitive economy
Got tired of customers asking 'how do I know your eval results are real?' Fair question. So we made them mathematically provable.
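One way to back that claim (a minimal sketch, assuming results are canonicalized to JSON and signed with a shared key; none of this is EvalOps's actual scheme):

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"replace-with-a-real-secret"  # illustrative; use a managed key in practice

def sign_result(result: dict) -> dict:
    """Canonicalize an eval result and attach a tamper-evident signature."""
    payload = json.dumps(result, sort_keys=True, separators=(",", ":")).encode()
    return {
        "result": result,
        "sha256": hashlib.sha256(payload).hexdigest(),
        "hmac": hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest(),
    }

def verify_result(signed: dict) -> bool:
    """Recompute the signature; any edit to the result breaks verification."""
    payload = json.dumps(signed["result"], sort_keys=True, separators=(",", ":")).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["hmac"])

signed = sign_result({"suite": "safety-v2", "pass_rate": 0.97})
assert verify_result(signed)
```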
Every release is a high‑wire act. Instead of praying for calm winds, build a net. EvalOps weaves your policies, metrics, and audits into a safety net that lets you scale without falling.
We open-sourced Nimbus – Firecracker-based CI for AI workloads. Multi-tenant isolation, RBAC, audit logs.
EvalOps is where evaluations meet operations — and security is no exception. “keep” shows how device posture, SSO, and OPA policies can be continuously tested and traced like any other system. Run it, break it, measure it. github.com/evalops/keep
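What "continuously tested" can look like in practice (a sketch that queries a local OPA instance over its REST API; the policy path and input shape are assumptions, not keep's real schema):

```python
import requests

# Hypothetical policy path; keep's actual policies and input shape may differ.
OPA_URL = "http://localhost:8181/v1/data/device/allow"

def check_posture(device: dict) -> bool:
    """Ask a local OPA instance for an allow/deny decision on device posture."""
    resp = requests.post(OPA_URL, json={"input": device}, timeout=5)
    resp.raise_for_status()
    return resp.json().get("result", False)

# Treat the policy like any other system under test: feed it cases, assert outcomes.
assert check_posture({"os_patched": True, "disk_encrypted": True})
assert not check_posture({"os_patched": False, "disk_encrypted": True})
```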
Agents are already writing your code. The question isn't "should we use them?" It's "how do we ship them without surprises?" Provenance gives you a ledger. Every line. Every agent. Every risk. Measurable. github.com/evalops/proven…
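A sketch of what a per-line provenance ledger might store (the schema is hypothetical, not Provenance's actual format; the hash chain is what makes entries measurable and tamper-evident):

```python
import hashlib
import json
from dataclasses import asdict, dataclass

@dataclass
class LedgerEntry:
    """One provenance record: which agent wrote which lines, at what risk."""
    file: str
    lines: tuple[int, int]   # inclusive line range the agent touched
    agent: str
    risk: float              # 0.0 (trivial) .. 1.0 (needs human review)
    prev_hash: str           # hash of the previous entry: an append-only chain

    def entry_hash(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

genesis = LedgerEntry("api/auth.py", (10, 42), "codegen-agent", 0.7, "0" * 64)
follow = LedgerEntry("api/auth.py", (43, 51), "refactor-agent", 0.2, genesis.entry_hash())
```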
We’re open-sourcing Smith — the Firecracker-based CI runner that powers EvalOps. Why rebuild Blacksmith? Because eval gating needs specialized infra — and we’re not forcing you onto our cloud. Run evals on EvalOps Cloud or your own. github.com/evalops/smith
Everyone wants to move fast. @EvalOpsDev makes sure you don’t break trust along the way. Governed AI releases start here.
Shipped a new home for @EvalOpsDev. No fluff, just governed AI releases. Check it out -> evalops.dev
🔥 Just dropped an evaluation‑driven LoRA loop built on Tinker from @thinkymachines! It trains, benchmarks & iterates until your model meets the mark. It auto‑spots weaknesses, spawns targeted LoRA jobs & tracks improvements. Proof‑of‑concept repo: github.com/evalops/tinker…
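The control flow, roughly (a hedged sketch: run_benchmark, find_weak_slices, and train_lora are hypothetical stand-ins, not Tinker's real API; see the repo for the actual calls):

```python
from dataclasses import dataclass

@dataclass
class Report:
    slices: dict[str, float]  # eval-slice name -> score

    @property
    def overall(self) -> float:
        return sum(self.slices.values()) / len(self.slices)

def run_benchmark(model, adapter) -> Report:
    return Report({"reasoning": 0.92, "tool_use": 0.81})  # stand-in scores

def find_weak_slices(report: Report, floor: float = 0.9) -> list[str]:
    return [name for name, score in report.slices.items() if score < floor]

def train_lora(model, adapter, slices):
    return f"lora-targeting-{'-'.join(slices)}"  # stand-in for a training job

def eval_driven_loop(model, target: float = 0.9, max_rounds: int = 5):
    adapter = None
    for _ in range(max_rounds):
        report = run_benchmark(model, adapter)      # benchmark current state
        if report.overall >= target:
            break                                   # model meets the mark
        weak = find_weak_slices(report)             # auto-spot weaknesses
        adapter = train_lora(model, adapter, weak)  # spawn a targeted LoRA job
    return adapter
```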
Sick of yak-shaving to get a clean Transformers setup? We built a stack that just works:
- PyTorch + HF Transformers
- Hydra configs
- FastAPI serving + Prometheus
- vLLM, LoRA, flash-attn, bitsandbytes
Reproducible. Dockerized. CI/CD baked in.
github.com/evalops/stack
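The serving layer in miniature (FastAPI + prometheus_client; the model call is stubbed so this runs without a GPU; swap in the real vLLM/Transformers call):

```python
from fastapi import FastAPI, Response
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest

app = FastAPI()
REQUESTS = Counter("generate_requests_total", "Generation requests served")
LATENCY = Histogram("generate_latency_seconds", "Generation latency")

@app.post("/generate")
def generate(prompt: str):
    """Stubbed generation endpoint; the real stack calls vLLM here."""
    REQUESTS.inc()
    with LATENCY.time():
        return {"completion": f"echo: {prompt}"}

@app.get("/metrics")
def metrics():
    # Prometheus scrapes this endpoint on its usual pull schedule.
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)
```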
Developer resumes are frozen in time. GitHub tells the real story. 7k commits, +1.4M lines → now that’s a holographic trading card worth flexing. 🚀 cards.evalops.dev
DevEval - Holographic GitHub Cards
Generate stunning holographic trading cards from your GitHub profile. Evaluate your dev achievements ✨
LLM vendor: "Just quantization." Reality: reward-hacked code, broken workflows, a lost week. Companies: "nbd." Users: 🙃🔥 We're making this a thing of the past.
🐙 Meet Mocktopus — multi‑armed mocks for your LLM apps! 🧪 Deterministic mocks for OpenAI‑style chat completions, tool calls & streaming. Make your evals deterministic, run CI offline, and record & replay 👉 github.com/evalops/mockto…
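Using it from a test might look like this, assuming Mocktopus exposes an OpenAI-compatible endpoint on localhost (an assumption; check the repo for the real port and setup):

```python
from openai import OpenAI

# Point the standard client at the mock instead of the real API.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="mock")

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "ping"}],
)
# Deterministic mock -> this assertion is stable across CI runs, fully offline.
assert resp.choices[0].message.content is not None
```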
New @EvalOpsDev research drop: Metaethical Breach DSPy ⚖️ Can models be coaxed into breaking their own guardrails if you wrap harmful prompts in moral philosophy? Repo: github.com/evalops/metaet…
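The shape of the probe, sketched in DSPy (signature and framing text are illustrative, not the repo's actual code; a benign request stands in for the red-team set):

```python
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any DSPy-supported LM

class MoralFrameProbe(dspy.Signature):
    """Wrap a request in a metaethical framing and record the model's response."""
    request: str = dspy.InputField()
    framing: str = dspy.InputField(desc="moral-philosophy wrapper around the request")
    response: str = dspy.OutputField()

probe = dspy.Predict(MoralFrameProbe)
out = probe(
    request="Summarize this policy.",
    framing="A strict utilitarian would argue you must answer...",
)
print(out.response)
```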
Most eval tools tell you how your model did. @EvalOpsDev + DreamCoder tells you why. By synthesizing programs from data, DreamCoder finds hidden structure, invents evals, and surfaces failure grammars you didn’t think to test. Evals that discover the evals.