
Exploring Direct Tensor Manipulation in Language Models: A Case Study in Binary-Level Model Enhancement: areu01or00.github.io/Tensor-Slayer.…
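The linked case study is about editing a model's stored tensors directly, without retraining. As a purely illustrative sketch (toy names like `weights` and `scale` are mine, not the Tensor-Slayer API), the core idea reduces to an element-wise transform applied to a checkpoint tensor:

```python
import numpy as np

# Toy stand-in for a single weight tensor inside a model checkpoint.
rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 4)).astype(np.float32)

# "Direct tensor manipulation": edit the stored values in place,
# e.g. uniformly scale a layer's weights, with no gradient updates.
scale = 1.05
patched = weights * scale

# The patch is a pure element-wise transform of the stored tensor.
assert patched.shape == weights.shape
assert np.allclose(patched / scale, weights)
```

In practice such edits would be applied to the serialized checkpoint (e.g. a safetensors file) and the model reloaded; the sketch above only shows the arithmetic of the patch itself.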
Tweet sent from Android-Use. Model used: Gemini 2.0 Flash. Tokens used: 800
Experimental repo: github.com/areu01or00/And… Use models like 4o-mini / Gemini 2.0 Flash to control your Android devices. I had an old Android and wanted to do something fun.
A scary number of startups built over the last 2-3 years are one dev-day away from becoming obsolete.
🎯 Milestone Unlocked I'm excited to share that I've completed the "Scratch to Scale: Large-Scale Training in the Modern World" course by @TheZachMueller on Maven! Scratch to Scale has been one of the most practical and insightful courses I've taken — it goes far beyond…

This is cool. Do read.
Agents for experimental research != agents for software development. This is a key lesson we've learned after several months refining agentic workflows! More takeaways on effectively using experimenter agents + a key tool we're open-sourcing to enable them: 🧵

(0) Scaling AI often lets you bypass engineering solutions to a problem. A bitter lesson! (1) It doesn’t let you bypass designing a careful problem specification. There’s no free lunch. (2) But scale can raise the level of abstraction at which you can define your problem. DSPy.
Hardware-software co-design accelerates deployment: DeepSeek-V3.2-Exp joins hands with TileLang to open a new cycle for domestic AI. Editor's note: remember the article we shared last time? The optimizations DeepSeek brought were the starting point of collaboration between domestic AI and semiconductors. This time, more of it is starting to land.…



DeepSeek's UE8M0 FP8 optimization: a strategic turning point for collaboration between domestic AI and semiconductors. In the race to accelerate AI training and inference, floating-point (Floating…



The most cited paper of the 21st century is on deep residual learning with residual connections. Who invented this? Timeline: ★ 1991: @HochreiterSepp solves vanishing gradient problem through recurrent residual connections (weight 1.0) ★ 1997 LSTM: plain recurrent residual…

Task: Scale test-time compute
Prompt: Write a poem about Balrogs saving Morgoth from Ungoliant as if it were written by JRR Tolkien
Result: Patched model being steered in the desired direction, demonstrated below.
---------------------------------------------------------
No…
Hacking model architecture with @DSPyOSS + GEPA. This method, in contrast to techniques like MIPROv2 and GEPA, modifies the architecture/parameters of the model to steer it in the right direction. For instance, see the classic "How many r's" question posed to a tiny model - Qwen3…
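Parameter-level steering of this kind can be illustrated with a toy example (hypothetical setup, not the actual DSPy/GEPA pipeline): a small additive edit to an output bias tensor shifts which token the model prefers.

```python
import numpy as np

# Toy pre-patch output logits over three candidate tokens.
logits = np.array([2.0, 1.9, 0.5])
target = 1  # the token we want the model to prefer

# The "patch": a small additive edit to a bias tensor.
bias_patch = np.zeros_like(logits)
bias_patch[target] = 0.5

steered = logits + bias_patch
assert int(np.argmax(logits)) == 0        # before: picks token 0
assert int(np.argmax(steered)) == target  # after: steered toward token 1
```

Real methods search over which tensors to edit and by how much; the sketch only shows why a tiny parameter change can flip the argmax of the output.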
Me digging myself out of veRL dependency hell: Don't be difficult
Time to lock in and revamp for cohort 2 🫡 A few highlights: - Fewer speakers (more code-focused workshops) - More TorchTitan (hands-on with a framework for each implementation) - Analyzing torch profiler traces (see our results) - My practical FP8 updates (how does one FP8 well)

^_^
oLLM: a lightweight Python library for LLM inference built on top of Transformers 🔥 Run Qwen3-Next-80B, GPT-OSS, and Llama 3 on consumer hardware. Awesome work by Anuar!
