Ishan Gupta
@code_igx
25 🇮🇳, Hustler @RITtigers NY 🇺🇸 | RnD on Quantum AI, Superintelligence & Systems | Ex- @Broadcom @VMware
You might like
Recommender systems can improve by modeling users. TagCF uses an LLM to extract tag-based logic graphs that reveal user roles and behavioral logic, then integrates them to boost ranking performance. Online and offline results show user role modeling can outperform item topic…
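A minimal sketch of the idea in that tweet, with hypothetical names throughout (this is not the TagCF code): an LLM labels a user's interaction history with behavior tags, and the tag overlap with an item's tags is blended into a base collaborative-filtering score.

```python
def llm_extract_tags(history):
    """Stand-in for an LLM call that maps raw interactions to behavior tags."""
    lexicon = {"bought camera": "photography", "read ml paper": "research"}
    return {lexicon[h] for h in history if h in lexicon}

def rank_score(cf_score, user_tags, item_tags, alpha=0.3):
    # Blend the base CF score with the Jaccard overlap of user-role and item tags.
    overlap = len(user_tags & item_tags) / max(len(user_tags | item_tags), 1)
    return (1 - alpha) * cf_score + alpha * overlap

user_tags = llm_extract_tags(["bought camera", "read ml paper"])
print(rank_score(0.72, user_tags, {"photography", "travel"}))  # ~0.604
```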
Google just released the SIMA 2 paper on arXiv with Demis Hassabis’s name on it. SIMA 2: A Generalist Embodied Agent for Virtual Worlds. Paper: arxiv.org/abs/2512.04797
Multimodal fusion is key to building AI that truly understands the world. But it’s still hard to find the right way to do it, partly because diffusion is dynamic while text is static. @AIatMeta and @AI_KAUST proposed MoS – Mixture of States, which fixes this mismatch by routing…
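A hedged sketch of what a mixture-of-states style router could look like; the class, shapes, and timestep conditioning here are my assumptions, not the paper's API. The point is that a small router, conditioned on the denoising timestep, re-weights frozen text hidden states so the static text side can track the dynamic diffusion side.

```python
import torch
import torch.nn as nn

class StateRouter(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim + 1, 1)  # scores each state, conditioned on t

    def forward(self, text_states, t):
        # text_states: (n_states, dim); t: denoising timestep in [0, 1]
        tcol = torch.full((text_states.size(0), 1), float(t))
        w = torch.softmax(self.score(torch.cat([text_states, tcol], dim=-1)), dim=0)
        return (w * text_states).sum(dim=0)  # timestep-specific fused text state

router = StateRouter(dim=64)
fused = router(torch.randn(8, 64), t=0.5)
print(fused.shape)  # torch.Size([64])
```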
Qwen just won a Best Paper Award at NeurIPS. And it wasn’t for a flashy new architecture. It was for fixing a problem Transformers have had for years. Here’s what you need to know:
NeurIPS 2025 Best Paper Award: Attention lets language models decide which tokens matter at each position, but it has limitations—for example, a tendency to over-focus on early tokens regardless of their relevance. Gating mechanisms, which selectively suppress or amplify…
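A minimal sketch of output gating on attention in the spirit of the description above (a simplification, not the paper's exact formulation): a sigmoid gate computed from the input can suppress or amplify the attention output, rather than forcing probability mass onto early tokens.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.gate = nn.Linear(dim, dim)  # per-channel gate computed from the input
        self.out = nn.Linear(dim, dim)

    def forward(self, x):  # x: (batch, seq, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = F.scaled_dot_product_attention(q, k, v)
        # The gate selectively suppresses or amplifies each output channel.
        return self.out(torch.sigmoid(self.gate(x)) * attn)

y = GatedAttention(64)(torch.randn(2, 10, 64))
print(y.shape)  # torch.Size([2, 10, 64])
```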
This interesting week started with DeepSeek V3.2! I just wrote up a technical tour of the predecessors and components that led up to this: 🔗 magazine.sebastianraschka.com/p/technical-de… - Multi-Head Latent Attention - RLVR - Sparse Attention - Self-Verification - GRPO Updates
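Of those components, GRPO fits in a few lines. A sketch of the core idea (not DeepSeek's implementation): sample a group of responses per prompt, score each with a verifiable reward, and use the group-normalized reward as the advantage, so no value network is needed.

```python
import statistics

def grpo_advantages(rewards):
    # Group-relative advantage: center and scale each response's reward
    # by its group's statistics.
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against zero-variance groups
    return [(r - mu) / sigma for r in rewards]

# Four responses to one prompt, scored 0/1 by a verifier (RLVR style):
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))  # [1.0, -1.0, 1.0, -1.0]
```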
Today at #NeurIPS2025, we present Titans, a new architecture that combines the speed of RNNs with the performance of Transformers. It uses deep neural memory to learn in real-time, effectively scaling to contexts larger than 2 million tokens. More at: goo.gle/3Kd5ojF
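A toy version of the deep-neural-memory idea (my simplification, not the Titans code): a small memory network is updated at inference time by a gradient step on how badly it predicts the current value from its key, i.e. how surprising the input is.

```python
import torch

mem = torch.nn.Linear(64, 64)                    # a (here, one-layer) neural memory
opt = torch.optim.SGD(mem.parameters(), lr=0.1)

def memorize(key, value):
    surprise = (mem(key) - value).pow(2).mean()  # prediction error as "surprise"
    opt.zero_grad(); surprise.backward(); opt.step()
    return surprise.item()

key, value = torch.randn(64), torch.randn(64)
for _ in range(3):
    print(memorize(key, value))  # surprise drops as the memory absorbs the pair
```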
Twitter is cool. But it’s 10x better when you connect with people who like building and scaling GenAI systems. If you’re into LLMs, GenAI, Distributed Systems, or backend work, say hi.
Yup
Humanity has so thoroughly banished hunger that, as of this year, there are more obese kids than there are underweight kids.
Beautiful Tencent paper. Shows a language model that keeps improving itself using only 1% to 5% human-labeled questions while reaching the level of systems trained on about 20 times more data. Earlier self-play systems let a model write and solve its own questions, but over…
I have been fine-tuning LLMs for over 2 years now! Here are the top 5 LLM fine-tuning techniques, explained with visuals: First of all, what's so different about LLM fine-tuning? Traditional fine-tuning is impractical for LLMs (billions of parameters; hundreds of GB of weights). Since this kind of…
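The tweet's list is truncated above, but LoRA, one of the most widely used parameter-efficient techniques, illustrates the workaround: freeze the pretrained weights and train only a low-rank update. A minimal sketch, not any particular library's API:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r=8, alpha=16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(1024, 1024))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 16384 trainable parameters vs ~1M frozen
```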
The paper behind DeepSeek-V3.2. Its high-compute Speciale version reaches gold-medal level on top math and coding contests and competes with leading closed models. Standard attention makes the model compare every token with every other token, so compute explodes as inputs get…
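A generic top-k sparse-attention sketch to make that concrete. This shows the general idea only; DeepSeek's actual mechanism uses a cheap indexer so the dense score matrix below is never materialized.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=4):
    scores = q @ k.T / k.size(-1) ** 0.5      # (n, n): every-token-vs-every-token
    idx = scores.topk(top_k, dim=-1).indices  # keep only the k best keys per query
    mask = torch.full_like(scores, float("-inf")).scatter(-1, idx, 0.0)
    return F.softmax(scores + mask, dim=-1) @ v

n, d = 16, 32
out = topk_sparse_attention(torch.randn(n, d), torch.randn(n, d), torch.randn(n, d))
print(out.shape)  # torch.Size([16, 32])
```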
Here’s to delaying gratification. The future belongs to the patient. @elonmusk
Interview with Nikhil
Can’t believe how human-like Tesla’s Optimus moves.
Running robot
.@elonmusk "One way to frame civilizational progress is the percentage completion on the Kardashev scale. Kardashev I is what percentage of a planet's energy are you successfully turning into useful work. Kardashev II would be, what percentage of the sun's energy are you…
Congrats @SpaceX team and thank you @USSpaceForce!
We’ve received approval to develop Space Launch Complex-37 for Starship operations at Cape Canaveral Space Force Station. Construction has started. With three launch pads in Florida, Starship will be ready to support America’s national security and Artemis goals as the world’s…
Test-time scaling of diffusions with flow maps. This paper is pretty cool, providing a better way to guide image generation with a reward function. The standard approach evaluates the reward function on intermediate steps to get a reward gradient to modify sampling. However the…
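The standard approach the tweet contrasts against, sketched generically (the function names are placeholders): predict a clean sample at an intermediate step, differentiate a reward model through it, and nudge the sample toward higher reward.

```python
import torch

def guided_step(x, denoise, reward, step_size=0.1):
    x = x.detach().requires_grad_(True)
    x0_hat = denoise(x)                     # one-step estimate of the clean sample
    (grad,) = torch.autograd.grad(reward(x0_hat), x)
    return (x + step_size * grad).detach()  # ascend the reward during sampling

x = torch.randn(1, 8)
x = guided_step(x, denoise=lambda z: 0.9 * z, reward=lambda z: -(z ** 2).sum())
print(x.shape)  # torch.Size([1, 8])
```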
This Google paper from last year went almost unnoticed by the public, but it's really an alternative architecture to the Transformer that proves more parameter-efficient and effective on similar tasks. As you might know, Transformers scale quadratically with sequence length…
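To make the quadratic-scaling point concrete:

```python
# Self-attention forms an n x n score matrix, so doubling the sequence
# length quadruples the work per head.
for n in [1_024, 2_048, 4_096]:
    print(f"n={n:>5}: {n * n:>12,} attention scores per head")
```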