Ishan Gupta
@code_igx
25 🇮🇳, Hustler @RITtigers NY 🇺🇸 | RnD on Quantum AI, Superintelligence & Systems | Ex- @Broadcom @VMware
Qwen just won Best Paper Award at NeurIPS. And it wasn’t for a flashy new architecture. It was for fixing a problem Transformers had for years. Here’s what you need to know:
NeurIPS 2025 Best Paper Award: Attention lets language models decide which tokens matter at each position, but it has limitations—for example, a tendency to over-focus on early tokens regardless of their relevance. Gating mechanisms, which selectively suppress or amplify…
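The gating idea above can be sketched in a few lines: apply a sigmoid gate (here computed from the query) to the attention output, so each position can be suppressed or amplified. This is a minimal illustration of output gating, not the exact parameterization from the Qwen paper; the names and shapes are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_attention(q, k, v, w_gate):
    """Scaled dot-product attention with an elementwise output gate.

    The gate lies in (0, 1), so it can selectively suppress the
    attention output per position -- one way to counteract
    over-focusing on early tokens.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)            # (n, n) token-pair scores
    attn = softmax(scores, axis=-1) @ v      # standard attention output
    gate = 1 / (1 + np.exp(-(q @ w_gate)))   # sigmoid gate, shape (n, d)
    return gate * attn                       # elementwise gating

rng = np.random.default_rng(0)
n, d = 4, 8
q, k, v = rng.normal(size=(3, n, d))
w_gate = rng.normal(size=(d, d))
out = gated_attention(q, k, v, w_gate)
print(out.shape)  # (4, 8)
```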
This interesting week started with DeepSeek V3.2! I just wrote up a technical tour of the predecessors and components that led up to this: 🔗 magazine.sebastianraschka.com/p/technical-de… - Multi-Head Latent Attention - RLVR - Sparse Attention - Self-Verification - GRPO Updates
Today at #NeurIPS2025, we present Titans, a new architecture that combines the speed of RNNs with the performance of Transformers. It uses deep neural memory to learn in real-time, effectively scaling to contexts larger than 2 million tokens. More at: goo.gle/3Kd5ojF
Twitter is cool. But it’s 10x better when you connect with people who like building and scaling GenAI systems. If you’re into LLMs, GenAI, Distributed Systems, or backend, say hi.
Yup
Humanity has so thoroughly banished hunger that, as of this year, there are more obese kids than there are underweight kids.
Beautiful Tencent paper. Shows a language model that keeps improving itself using only 1% to 5% human labeled questions while reaching the level of systems trained on about 20 times more data. Earlier self play systems let a model write and solve its own questions, but over…
I have been fine-tuning LLMs for over 2 years now! Here are the top 5 LLM fine-tuning techniques, explained with visuals: First of all, what's so different about LLM finetuning? Traditional fine‑tuning is impractical for LLMs (billions of params; 100s GB). Since this kind of…
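One widely used parameter-efficient technique in this space is LoRA: freeze the pretrained weight and train only a low-rank delta. Whether LoRA is among the thread's "top 5" is an assumption; this is a minimal numpy sketch, with illustrative dimensions, showing why the trainable parameter count is so much smaller.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 512, 512, 8        # illustrative sizes; r is the LoRA rank

W = rng.normal(size=(d_out, d_in))  # frozen pretrained weight (not trained)
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))            # trainable up-projection, init to zero
                                    # so the adapter starts as a no-op

def lora_forward(x, scale=1.0):
    # Base path plus the low-rank adapter path B @ A.
    return W @ x + scale * (B @ (A @ x))

full_params = W.size                # what full fine-tuning would update
lora_params = A.size + B.size       # what LoRA actually trains
print(lora_params / full_params)    # 0.03125 -- ~3% of the full matrix
```

With B initialized to zero, the adapted model starts out identical to the base model, which is one reason this initialization is common.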
The paper behind DeepSeek-V3.2 Its high-compute Speciale version reaches gold medal level on top math and coding contests and competes with leading closed models. Standard attention makes the model compare every token with every other token, so compute explodes as inputs get…
Here’s to delaying gratification. The future belongs to the patient. @elonmusk
Interview with Nikhil
Can’t believe how human-like Tesla’s Optimus moves.
Running robot
.@elonmusk "One way to frame civilizational progress is the percentage completion on the Kardashev scale. Kardashev I is what percentage of a planet's energy are you successfully turning into useful work. Concept II would be, what percentage of the sun's energy are you…
Congrats @SpaceX team and thank you @USSpaceForce!
We’ve received approval to develop Space Launch Complex-37 for Starship operations at Cape Canaveral Space Force Station. Construction has started. With three launch pads in Florida, Starship will be ready to support America’s national security and Artemis goals as the world’s…
Test-time scaling of diffusions with flow maps This paper is pretty cool, providing a better way to guide image generation with a reward function. The standard approach evaluates the reward function on intermediate steps to get a reward gradient to modify sampling. However the…
This Google paper from last year went almost unnoticed by the public, but it's really an alternative architecture to the transformer that proves more parameter-efficient and effective on similar tasks. As you might know, Transformers scale quadratically with sequence length.…
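The quadratic scaling mentioned above is easy to see with back-of-envelope arithmetic: full attention scores every token pair, while a sparse scheme that keeps only k neighbors per token grows linearly. This is generic arithmetic, not the specific method from any one paper.

```python
def full_attention_pairs(n):
    # Every token attends to every token: n * n score computations.
    return n * n

def sparse_attention_pairs(n, k=64):
    # Each token attends to at most k selected tokens: n * k scores.
    return n * min(k, n)

for n in (1_000, 10_000, 100_000):
    print(n, full_attention_pairs(n), sparse_attention_pairs(n))
# Scaling n by 10x multiplies full-attention work by 100x,
# but sparse-attention work by only 10x.
```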
Why has neural B-frame video compression lagged behind P-frame methods despite its potential for better performance with bi-directional references? This study addresses this gap by tackling unbalanced reference frame contributions in hierarchical coding—their BRHVC method uses…
Agentic AI Overview This report provides a comprehensive overview of architectures, applications, and future directions. Great read for AI devs and enthusiasts. It introduces a new dual-paradigm framework that categorizes agentic systems into two distinct lineages: the…
OPPO AI Agent Team introduces O-Mem, an Omni Memory System for LLM Agents It brings personalized, long-horizon, self-evolving capabilities to conversational AI by actively profiling users and supporting hierarchical retrieval for adaptive responses.