
Ishan Gupta

@code_igx

25 🇮🇳, Hustler @RITtigers NY 🇺🇸 | RnD on Quantum AI, Superintelligence & Systems | Ex- @Broadcom @VMware

Ishan Gupta reposted

Recommender systems can improve by modeling users. TagCF uses an LLM to extract tag-based logic graphs that reveal user roles and behavioral logic, then integrates them to boost ranking performance. Online and offline results show user role modeling can outperform item topic…

[jiqizhixin's tweet image]
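
A toy sketch of the idea in code, with heavy caveats: the blending function, tag sets, and weight below are hypothetical illustrations, not TagCF's method, which builds tag-based logic graphs rather than flat tag overlaps.

```python
# Hypothetical helper, not TagCF's API: blend a collaborative-filtering
# score with the overlap between LLM-extracted user-role tags and item tags.
def rank_score(cf_score, user_role_tags, item_tags, weight=0.3):
    overlap = len(set(user_role_tags) & set(item_tags)) / max(len(item_tags), 1)
    return (1 - weight) * cf_score + weight * overlap

# A "foodie" user role nudges food items up the ranking.
print(rank_score(0.72, {"budget traveler", "foodie"}, {"foodie", "cooking"}))
```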

Ishan Gupta reposted

Google just released the SIMA 2 paper on arXiv with Demis Hassabis's name on it. SIMA 2: A Generalist Embodied Agent for Virtual Worlds. Paper: arxiv.org/abs/2512.04797

[jiqizhixin's tweet image]

Ishan Gupta reposted

Multimodal fusion is key to building AI that truly understands the world. But it’s still hard to find the right way to do it, partly because diffusion is dynamic while text is static. @AIatMeta and @AI_KAUST proposed MoS – Mixture of States, which fixes this mismatch by routing…

[TheTuringPost's tweet image]

Ishan Gupta reposted

Qwen just won Best Paper Award at NeurIPS. And it wasn’t for a flashy new architecture. It was for fixing a problem Transformers had for years. Here’s what you need to know:

[LiorOnAI's tweet image]

Ishan Gupta reposted

NeurIPS 2025 Best Paper Award: Attention lets language models decide which tokens matter at each position, but it has limitations—for example, a tendency to over-focus on early tokens regardless of their relevance. Gating mechanisms, which selectively suppress or amplify…

[burkov's tweet image]
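
A minimal sketch of output gating on attention in this spirit; the single-head layout, gate placement, and shapes are simplifications rather than the paper's exact design.

```python
import torch
import torch.nn.functional as F

def gated_attention(x, q, k, v, w_gate):
    # standard scaled dot-product attention
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    out = F.softmax(scores, dim=-1) @ v
    # query-dependent sigmoid gate on the attention output: values near 0
    # let a position suppress its output instead of dumping probability
    # mass on early "sink" tokens
    gate = torch.sigmoid(x @ w_gate)
    return gate * out

x = torch.randn(8, 64)                       # 8 tokens, model dim 64
q, k, v = (torch.randn(8, 64) for _ in range(3))
print(gated_attention(x, q, k, v, torch.randn(64, 64)).shape)  # torch.Size([8, 64])
```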

Ishan Gupta reposted

This interesting week started with DeepSeek V3.2!

I just wrote up a technical tour of the predecessors and components that led up to this:

🔗 magazine.sebastianraschka.com/p/technical-de…

- Multi-Head Latent Attention
- RLVR
- Sparse Attention
- Self-Verification
- GRPO Updates (see the GRPO sketch below)

[rasbt's tweet image]
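
A minimal sketch of the group-relative advantage at the core of GRPO, the last item above: each sampled response is scored against its own group's mean and std, which is what replaces the learned value model of PPO-style training. Tensor shapes are illustrative.

```python
import torch

def grpo_advantages(rewards):
    # rewards: (num_prompts, group_size), one scalar reward per sampled
    # response; each response's advantage is relative to its own group
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-6)

print(grpo_advantages(torch.tensor([[1.0, 0.0, 0.0, 1.0]])))
```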

Ishan Gupta reposted

Today at #NeurIPS2025, we present Titans, a new architecture that combines the speed of RNNs with the performance of Transformers. It uses deep neural memory to learn in real-time, effectively scaling to contexts larger than 2 million tokens. More at: goo.gle/3Kd5ojF

[GoogleResearch's tweet image]
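
A toy sketch of a test-time neural memory in this spirit, assuming only the high-level recipe (a small MLP trained online on key-value reconstruction; SGD momentum and weight decay stand in for the paper's surprise momentum and forgetting). Sizes and hyperparameters are illustrative.

```python
import torch

# The memory is a small MLP trained online to map keys to values; the
# gradient of its reconstruction loss acts as a "surprise" signal, so
# novel inputs change the memory the most.
memory = torch.nn.Sequential(
    torch.nn.Linear(64, 128), torch.nn.SiLU(), torch.nn.Linear(128, 64)
)
opt = torch.optim.SGD(memory.parameters(), lr=0.1, momentum=0.9, weight_decay=0.01)

def memorize(k, v):
    loss = (memory(k) - v).pow(2).mean()  # reconstruction error = surprise
    opt.zero_grad()
    loss.backward()
    opt.step()

def recall(q):
    with torch.no_grad():
        return memory(q)

memorize(torch.randn(1, 64), torch.randn(1, 64))
print(recall(torch.randn(1, 64)).shape)   # torch.Size([1, 64])
```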

Ishan Gupta reposted

Twitter is cool. But it's 10x better when you connect with people who like building and scaling GenAI systems. If you're into LLMs, GenAI, distributed systems, or backend, say hi.


Ishan Gupta reposted

Yup

Humanity has so thoroughly banished hunger that, as of this year, there are more obese kids than there are underweight kids.

[cremieuxrecueil's tweet image]


Ishan Gupta reposted

Beautiful Tencent paper. Shows a language model that keeps improving itself using only 1% to 5% human-labeled questions while reaching the level of systems trained on about 20 times more data. Earlier self-play systems let a model write and solve its own questions, but over…

[rohanpaul_ai's tweet image]

Ishan Gupta reposted

I have been fine-tuning LLMs for over 2 years now! Here are the top 5 LLM fine-tuning techniques, explained with visuals: First of all, what's so different about LLM fine-tuning? Traditional fine-tuning is impractical for LLMs (billions of params; hundreds of GB). Since this kind of…

[_avichawla's tweet image]
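
As one concrete illustration of a parameter-efficient alternative, here is a minimal LoRA-style adapter; whether LoRA is among the thread's five is not visible in the truncated text, so treat this as a generic sketch.

```python
import torch

class LoRALinear(torch.nn.Module):
    """Frozen base weight plus a trainable low-rank update: W x + (alpha/r) B A x."""
    def __init__(self, base: torch.nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False            # the big matrix never trains
        self.A = torch.nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(torch.nn.Linear(512, 512))
# only the two small low-rank factors are trainable
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 8192
```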

Ishan Gupta reposted

The paper behind DeepSeek-V3.2. Its high-compute Speciale version reaches gold-medal level on top math and coding contests and competes with leading closed models. Standard attention makes the model compare every token with every other token, so compute explodes as inputs get…

[rohanpaul_ai's tweet image]
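
A toy sketch of the top-k idea behind sparse attention: each query keeps only its highest-scoring keys. Note this illustrative version still computes the full score matrix; DeepSeek's actual design avoids that with a cheap indexer, which is not reproduced here.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, topk=64):
    # q, k, v: (seq, d). Each query attends only to its top-k keys instead
    # of all seq keys, which is what cuts the quadratic cost in practice.
    scores = (q @ k.T) / q.shape[-1] ** 0.5
    idx = scores.topk(min(topk, k.shape[0]), dim=-1).indices
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, idx, 0.0)                 # keep only the selected keys
    return F.softmax(scores + mask, dim=-1) @ v

q, k, v = (torch.randn(128, 32) for _ in range(3))
print(topk_sparse_attention(q, k, v, topk=16).shape)  # torch.Size([128, 32])
```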

Ishan Gupta reposted

Here’s to delaying gratification. The future belongs to the patient. @elonmusk


Ishan Gupta reposted

Interview with Nikhil

Out now @elonmusk



Ishan Gupta reposted

Can’t believe how human-like Tesla’s Optimus moves.


Ishan Gupta reposted

Running robot

Just set a new PR in the lab



Ishan Gupta reposted

.@elonmusk "One way to frame civilizational progress is the percentage completion on the Kardashev scale. Kardashev I is what percentage of a planet's energy are you successfully turning into useful work. Kardashev II would be, what percentage of the sun's energy are you…


Ishan Gupta reposted

Congrats @SpaceX team and thank you @USSpaceForce!

We’ve received approval to develop Space Launch Complex-37 for Starship operations at Cape Canaveral Space Force Station. Construction has started. With three launch pads in Florida, Starship will be ready to support America’s national security and Artemis goals as the world’s…



Ishan Gupta reposted

Test-time scaling of diffusions with flow maps

This paper is pretty cool, providing a better way to guide image generation with a reward function. The standard approach evaluates the reward function on intermediate steps to get a reward gradient to modify sampling. However the…

[iScienceLuvr's tweet image]
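
A minimal sketch of that standard reward-gradient baseline (not the paper's flow-map method); the denoiser and reward here are toy stand-ins.

```python
import torch

def reward_guided_step(x_t, denoise_step, reward, scale=0.5):
    # Differentiate a reward evaluated at the intermediate sample and nudge
    # the next sample along that gradient.
    x_t = x_t.detach().requires_grad_(True)
    grad = torch.autograd.grad(reward(x_t).sum(), x_t)[0]
    return denoise_step(x_t.detach()) + scale * grad

x = torch.randn(1, 3, 8, 8)
out = reward_guided_step(x, lambda z: 0.9 * z, lambda z: -(z ** 2).mean())
print(out.shape)  # torch.Size([1, 3, 8, 8])
```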

Ishan Gupta reposted

This Google paper from last year went almost unnoticed by the public, but it's really an alternative architecture to the transformer that proves more parameter-efficient and effective on similar tasks. As you might know, Transformers scale quadratically with sequence length…

[burkov's tweet image]
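
A quick back-of-the-envelope on the quadratic claim:

```python
# The attention score matrix alone is n x n per head per layer, so 10x
# more tokens means 100x more pairwise work.
for n in [1_000, 10_000, 100_000]:
    print(f"{n:>7} tokens -> {n * n:.1e} pairwise token comparisons")
```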
