LogicStar AI
@logic_star_ai
Building agentic Application Maintenance
Great dinner with @lovable_dev, @localstack, and @logic_star_ai (three “Lo*”s) in Zurich - discussing the future of agentic coding 🤖, cloud DevX 💻, and building delightful apps ✨. Can't wait for more collaborations and partnerships among the three “Lo*”s and beyond! 🚀
🚨 AI agents wrote 7% of all GitHub PRs in June. But can we trust their code? We built Agents in the Wild – a live dashboard tracking autonomous AI agents across GitHub to answer that question: insights.logicstar.ai Here’s what we learned from analyzing 10M+ PRs 👇 1/n 🧵
We are excited to see the community use our SWT-Bench and work on the crucial topic of test generation!
🚨 New SWT-Bench Submission! 🤖 Amazon Q Developer Agent leads the SWT-Bench leaderboard 🥇 with an impressive 49% of successfully tested issues and a coverage improvement of 57% on SWT-Bench Verified.
SOTA code agent OpenHands (top-1 for SWE-full) achieves only 22% accuracy in unit test generation on SWT-lite (half its SWE performance), only slightly outperforming SWE-agent. What is going on? We dug through the data to find a simple trick and achieve almost 30%! 👇🧵 1/9
SRI Lab at #NeurIPS2024 - 1/8 SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents Niels Mündler (@nielstron), Mark Niklas Mueller, Jingxuan He (@jingxuan_he), Martin Vechev (@mvechev) ⏰ /📍 Wed 11th, 11AM - 2PM, West Ballroom A-D #5406 📝 We explore software…
SRI Lab is proud to present 8 of our works on Privacy and AI Safety at #NeurIPS2024 this year (7 main conference, 1 workshop). Check out the overview below as well as individual posts for each. Looking forward to seeing you at the conference and come by to chat! Open for more ⬇️…
Exciting to see our work on benchmarking the test-generation capabilities of LLMs being picked up by the community!
Super cool work by @nielstron et al: SWT-Bench is SWE-bench for test generation! They give the model a repo and an issue and it has to write a test for the issue. They show that SWE-agent is able to write good tests for 19% of the issues in the benchmark! 🧵(1/3)