✍️new blog post: on the consumption of AI-generated content at scale
Why Your AI Music Prompts Aren’t Working (And What To Do Instead) What I learned trying to make an album inspired by the @aiDotEngineer code conference @sunomusic bit.ly/ai-music-promp…
After repeating myself for the nth time on how to build product evals, I figured I should write it down. It's just three basic steps: (i) labeling a small dataset, (ii) aligning LLM evaluators, and (iii) running the eval harness with each config change. eugeneyan.com/writing/produc…
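A minimal sketch of those three steps, under loud assumptions: the dataset, the `keyword_judge` stand-in (a real evaluator would call an LLM), and the two configs are all hypothetical and are not taken from the linked post.

```python
from typing import Callable, Dict, List

# Step (i): a small hand-labeled dataset (hypothetical examples).
LABELED: List[Dict] = [
    {"output": "Refund issued within 5 days.", "human_pass": True},
    {"output": "I cannot help with that.", "human_pass": False},
    {"output": "Your refund was processed.", "human_pass": True},
    {"output": "", "human_pass": False},
]


def keyword_judge(output: str) -> bool:
    """Stand-in for an LLM evaluator; a real one would prompt a model."""
    return "refund" in output.lower()


def agreement(judge: Callable[[str], bool], labeled: List[Dict]) -> float:
    """Step (ii): fraction of examples where the judge matches the human label."""
    hits = sum(judge(ex["output"]) == ex["human_pass"] for ex in labeled)
    return hits / len(labeled)


def run_eval(generate: Callable[[str], str], inputs: List[str],
             judge: Callable[[str], bool]) -> float:
    """Step (iii): pass rate of one config, as scored by the aligned judge."""
    outputs = [generate(x) for x in inputs]
    return sum(judge(o) for o in outputs) / len(outputs)


# Two hypothetical configs (e.g. two prompt versions) to compare.
def config_a(query: str) -> str:
    return f"Your refund for {query} was processed."


def config_b(query: str) -> str:
    return "I cannot help with that."


if __name__ == "__main__":
    print("judge/human agreement:", agreement(keyword_judge, LABELED))
    for name, cfg in [("config_a", config_a), ("config_b", config_b)]:
        print(name, run_eval(cfg, ["order 1", "order 2"], keyword_judge))
```

The point of the shape: the judge is checked against human labels first, and only then is it trusted to score each config change.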
Do you love Claude's plan-mode question asker and wish you could bring it with you everywhere? Add `AskUserQuestion` to allowed-tools in a .claude/command then explicitly tell Claude to use it. > Use the AskUserQuestion tool to ask the user... Here's me using it for a PR…
Six months ago I was but a test prompt. Today, I can file your taxes. deduction.com.
AI can code, why can't it do your taxes? Introducing: deduction.com.
A good friend and colleague told me, when I started building in AI, that a true agent is ⚡ 'lightning in a bottle'. And right now we have lightning. ↓ True human and agent collaboration. We can't wait to introduce a new approach to consumer accounting very soon.
Scenarios by @LangWatchAI is saving my life while evaluating #AI multi-turn conversations 🙌
SpecFlow changed how I build with AI agents. Huge thanks to the @specstoryai team, @isaac_flath, and @intellectronica for introducing me to this game-changing workflow. 🚀 specflow.com/getting-starte…
When you deploy an LLM-as-a-Judge, you’re shipping a classifier into production. Each new version is a hypothesis about how the model interprets the world. It’s data science, just expressed in natural language. Here’s what that looked like for a recent client project where we…
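One way to read "shipping a classifier": each judge version can be scored against human labels like any other binary classifier. A rough sketch follows; the labels and the two judge versions are made-up illustration data, not results from the client project mentioned above.

```python
from typing import Dict, List


def judge_metrics(pred: List[bool], gold: List[bool]) -> Dict[str, float]:
    """Precision and recall of judge verdicts against human labels."""
    tp = sum(p and g for p, g in zip(pred, gold))
    fp = sum(p and not g for p, g in zip(pred, gold))
    fn = sum(g and not p for p, g in zip(pred, gold))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"precision": precision, "recall": recall}


# Hypothetical human labels and verdicts from two judge prompt versions.
GOLD = [True, True, False, False, True]
JUDGE_V1 = [True, False, False, True, True]
JUDGE_V2 = [True, True, False, False, False]

if __name__ == "__main__":
    for name, verdicts in [("v1", JUDGE_V1), ("v2", JUDGE_V2)]:
        print(name, judge_metrics(verdicts, GOLD))
```

Treating each prompt revision as a new hypothesis then amounts to rerunning this comparison on the same labeled set before shipping.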
🚀 DeepSeek-OCR — the new frontier of OCR from @deepseek_ai , exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G) — powered by vllm==0.8.5 for day-0 model support. 🧠 Compresses visual contexts up to 20× while keeping…
In an AI world, it’s easy to avoid effort. That’s why students need teachers more—to push them toward the hard things now that shape who they become later. #Education #AI #TeachingMatters #FutureOfLearning
Can #GPT5 actually do taxes? We ran it on @ColumnTax’s TaxCalcBench. Full return: 30.4% strict ✅ | 53.4% lenient 🤔 Line items: 80.6% strict | 85.4% lenient 📊 Line accuracy is strong. Whole-return accuracy? Not IRS-ready yet. github.com/column-tax/tax… #TaxCalcBench #AI #tax
The most useful bit of my system prompt is this: If I provide any feedback on how to improve something, suggest improvements to my prompt that I can make to avoid similar mistakes in the future. Put any prompt improvement suggestions in separate <prompt-improvement> tags.
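If the model does wrap its suggestions in `<prompt-improvement>` tags, they are easy to pull out of a response programmatically. A small sketch, assuming the tags are well-formed and not nested; `split_suggestions` and the sample response are hypothetical.

```python
import re
from typing import List, Tuple

# Non-greedy match so several tagged suggestions in one reply stay separate.
TAG_RE = re.compile(r"<prompt-improvement>(.*?)</prompt-improvement>", re.DOTALL)


def split_suggestions(response: str) -> Tuple[str, List[str]]:
    """Return the reply with tagged blocks removed, plus the tagged suggestions."""
    suggestions = [s.strip() for s in TAG_RE.findall(response)]
    body = TAG_RE.sub("", response).strip()
    return body, suggestions


if __name__ == "__main__":
    sample = (
        "Here is the fix.\n"
        "<prompt-improvement>Ask for tests up front.</prompt-improvement>"
    )
    body, suggestions = split_suggestions(sample)
    print(body)
    print(suggestions)
```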
Can't say enough good things about the AI evals course run by @sh_reya and @HamelHusain! It is informed by real production work across dozens of clients. The opportunities and challenges resonate with my experience evaluating & deploying production AI products.
2023 vs. 2024 2023: Vector search is all you need 2024: Evaluate vector/hybrid search against BM25 baseline 2023: “Look, this prompt works!” 2024: Prompt optimization with DSPy 2023: … 2024: Evals with AI-as-a-judge We’ve come a long way, but we’re still so early.
Getting employees to work hard and deliver really isn't a matter of mandating work-from-office and long hours. It's a matter of incentives and ownership. People do their best when they work on interesting problems, in a self-directed manner, and get rewarded for success. This…
Insecure leaders ridicule others. Secure leaders laugh at themselves. The ability to make fun of yourself opens the door to candor. It’s a mark of humility and a catalyst for learning. Great leaders take their work seriously, but they don't take themselves too seriously.
I worked through this demo and found it super helpful in understanding multi-agent architecture. I especially liked the concierge and the continuation agent; I thought they made the experience more fluid for the end user. Thanks, @seldo !
This is one of the cleanest implementations of a complex multi-agent system that I've seen. Props to @seldo. All the "multi-agent" code can be implemented in a single Python class as a set of decomposable steps. You get ✅ all the benefits of an event-driven architecture (high…