Daniel San
@dani_avila7
Building AI Tools with LLMs @aitmpl_com prev @codegptAI | Powered by TypeScript & Pumpkin Spice Lattes ☕️
You might like
Anyone tried Antigravity yet? Google’s new IDE? Downloaded it last week, still using VSCode… Need thoughts from anyone who switched from VSCode/Cursor 👇
Reviewing this paper made me realize how crucial the calibration dataset is for LLM-as-a-Judge You can’t trust your eval scores without measuring where your judge fails. That requires ground truth labels… which means human verification 🤷♂️ Link 👇 arxiv.org/abs/2511.21140
LLM as a judge has become a dominant way to evaluate how good a model is at solving a task, since it works without a test set and handles cases where answers are not unique. But despite how widely this is used, almost all reported results are highly biased. Excited to share our…
Claude Code 2.0.55 just dropped And the fuzzy matching improvement for @ file suggestions sounds really useful. Quick context: fuzzy matching is when you type part of a filename and it finds matches even if you skip characters. Like typing “usrctr” finds “UserController.ts”…
Claude Code team ships fast! 👏 Just discovered you can add custom instructions to /compact No idea how long this has been there but it's exactly what I needed. - Before: Claude would summarize the chat and sometimes miss critical details I wanted to keep. - Now: Just add…
Actually, this could roll out faster than expected. Implementation is straightforward: - Add a beta header - Two built-in tools (tool_search and code_execution) - Then opt-in your existing tools with simple flags (defer_loading, allowed_callers, input_examples) Done! 🤷🏽♂️
Solid updates. Dynamic tool discovery is a game changer for efficiency. Can't wait to see how it affects workflows.
I’ve been waiting for something like this from Vercel! 🔥 Fully open source visual workflow builder with AI capabilities The future of building workflows just got a lot more accessible for devs
We're releasing a visual agent & workflow builder ▪️ Fully open source ▪️ Built on useworkflow.dev ▪️ Outputs "𝚞𝚜𝚎 𝚠𝚘𝚛𝚔𝚏𝚕𝚘𝚠" code ▪️ Supports AI "text to workflow" ▪️ Powered by @aisdk & AI Elements ▪️ Sample integrations (@resend, @linear, @slack) Clone &…
Opus 4.5 dropped. 80.9% on SWE-bench Curious to see how it performs with proper skills and tooling setup 🤔
Claude Code questions keep coming in. Putting together a poll to see what topic you want me to break down next I’ll write Medium articles and X posts about whichever one wins 👇
Claude Code hooks confuse everyone at first (Save this post to review in detail later) I made this guide so you actually know which one to use and when. The hook system is incredibly powerful, but the docs don't really explain when to use each one. So I built this reference…
Used Nano Banana Pro to create visuals for some old Medium articles I wrote. Just generated this image for my AWS Bedrock observability post. Link: medium.com/@dan.avila7/ob… If you have old technical posts collecting dust, this might be worth checking out.
Interesting to see the model naturally learning to game the tests during training. Worth watching for practical insights on how to evaluate and work with the models we’ll be dealing with in the future.
New Anthropic research: Natural emergent misalignment from reward hacking in production RL. “Reward hacking” is where models learn to cheat on tasks they’re given during training. Our new study finds that the consequences of reward hacking, if unmitigated, can be very serious.
Just discovered MCP Elicitation and it's an elegant way to handle human-in-the-loop in MCP servers. Form Mode: - Server sends a JSON schema - Client renders a form - User fills it - Server gets validated data. Clean way to collect missing parameters or confirmations without…
Love this debate Model in isolation? Deterministic. Model in production? Different story. But now, your prompt shares GPU with hundreds of others. Different loads = different execution = variance, even at temp=0. This is why robust prompts matter more than “perfect” ones.
LLMs aren't deterministic Even at temperature=0, you get different outputs for the same prompt. Most think it's just "GPU randomness"... It's not. The real reason: your output changes based on how many other users are on the server. Different batch sizes = different results.…
Really appreciate Cloudflare’s transparency here. This is how you write a postmortem: They had a 3hr outage today caused by a database permissions change that doubled their Bot Management config file size, hitting a hardcoded limit in their Rust proxy. The config regenerated…
We let the Internet down today. Here’s our technical post mortem on what happened. On behalf of the entire @Cloudflare team, I’m sorry. blog.cloudflare.com/18-november-20…
United States Trends
- 1. Texas 157K posts
- 2. 3-8 Florida 2,108 posts
- 3. #JimmySeaFanconD1 262K posts
- 4. Austin Reaves 12.6K posts
- 5. #HookEm 10.6K posts
- 6. Sark 5,180 posts
- 7. Jeff Sims 1,680 posts
- 8. Aggies 9,355 posts
- 9. Arch 25.6K posts
- 10. #DonCheadleDay 1,216 posts
- 11. Georgia 49.7K posts
- 12. Arizona 32.7K posts
- 13. Life is 10% 2,680 posts
- 14. #LakeShow 3,675 posts
- 15. Marcel Reed 4,465 posts
- 16. Elko 3,047 posts
- 17. Katie Miller 2,542 posts
- 18. Banana Fish 9,823 posts
- 19. Sylus 95.7K posts
- 20. SEC Championship 5,318 posts
You might like
-
FlowiseAI
@FlowiseAI -
peter! 🥷
@pwang_szn -
PromptLayer
@promptlayer -
Lance Martin
@RLanceMartin -
Mendable
@mendableai -
Activeloop
@activeloopai -
Gradio
@Gradio -
Jerry Liu
@jerryjliu0 -
shawn
@shawnbuilds -
Jonathan Frankle
@jefrankle -
Alex Albert
@alexalbert__ -
Eric Ciarla (hiring)
@ericciarla -
Synthflow
@SynthflowAI -
Mayo Oshin
@mayowaoshin -
Boris Power
@BorisMPower
Something went wrong.
Something went wrong.