Ishan Gupta

@code_igx

25 🇮🇳, Hustler @RITtigers NY 🇺🇸 | RnD on Quantum AI, Superintelligence & Systems | Ex- @Broadcom @VMware

Repost by Ishan Gupta

Qwen just won Best Paper Award at NeurIPS. And it wasn’t for a flashy new architecture. It was for fixing a problem Transformers had for years. Here’s what you need to know:

Repost by Ishan Gupta

NeurIPS 2025 Best Paper Award: Attention lets language models decide which tokens matter at each position, but it has limitations—for example, a tendency to over-focus on early tokens regardless of their relevance. Gating mechanisms, which selectively suppress or amplify…

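The thread doesn't spell out the gating design; here is a minimal sketch of one common variant, output gating, where a sigmoid gate computed from each query scales the attention output. The gate weight `Wg` and its placement are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def gated_attention(Q, K, V, Wg):
    d = Q.shape[-1]
    out = softmax(Q @ K.T / np.sqrt(d)) @ V   # standard scaled dot-product attention
    gate = 1.0 / (1.0 + np.exp(-(Q @ Wg)))    # sigmoid gate in (0, 1), per token and dim
    return gate * out                         # suppress or pass through each output

rng = np.random.default_rng(0)
seq, d = 4, 8
Q, K, V = (rng.normal(size=(seq, d)) for _ in range(3))
Wg = rng.normal(size=(d, d))
Y = gated_attention(Q, K, V, Wg)
print(Y.shape)  # (4, 8)
```

A gate near 0 lets the model discard the attention output for a token entirely, which is one way to counter the over-focus on early tokens described above.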

Repost by Ishan Gupta

This interesting week started with DeepSeek V3.2! I just wrote up a technical tour of the predecessors and components that led up to this: 🔗 magazine.sebastianraschka.com/p/technical-de… - Multi-Head Latent Attention - RLVR - Sparse Attention - Self-Verification - GRPO Updates

Repost by Ishan Gupta

Today at #NeurIPS2025, we present Titans, a new architecture that combines the speed of RNNs with the performance of Transformers. It uses deep neural memory to learn in real-time, effectively scaling to contexts larger than 2 million tokens. More at: goo.gle/3Kd5ojF

Repost by Ishan Gupta

Twitter is cool. But it’s 10x better when you connect with people who like building and scaling GenAI systems. If you’re into LLMs, GenAI, distributed systems, or backend work, say hi.


Repost by Ishan Gupta

Yup

Humanity has so thoroughly banished hunger that, as of this year, there are more obese kids than there are underweight kids.

Repost by Ishan Gupta

Beautiful Tencent paper. Shows a language model that keeps improving itself using only 1% to 5% human-labeled questions while reaching the level of systems trained on about 20 times more data. Earlier self-play systems let a model write and solve its own questions, but over…

Repost by Ishan Gupta

I have been fine-tuning LLMs for over 2 years now! Here are the top 5 LLM fine-tuning techniques, explained with visuals: First of all, what's so different about LLM fine-tuning? Traditional fine-tuning is impractical for LLMs (billions of params; hundreds of GB). Since this kind of…

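One of the techniques typically covered is LoRA. As an illustrative sketch (not the thread's own code), it freezes the base weight and trains only a low-rank update, shrinking the trainable parameter count dramatically; all shapes and the scaling factor here are assumptions:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    # Frozen base weight W plus trainable low-rank update B @ A, scaled by alpha/r.
    r = A.shape[0]
    return x @ W.T + ((x @ A.T) @ B.T) * (alpha / r)

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4
W = rng.normal(size=(d_out, d_in))      # frozen: 64*64 = 4096 params
A = rng.normal(size=(r, d_in)) * 0.01   # trainable
B = np.zeros((d_out, r))                # trainable, zero-init so training starts from the base model
x = rng.normal(size=(2, d_in))
y = lora_forward(x, W, A, B)
print(r * (d_in + d_out), "trainable vs", d_out * d_in, "full")  # 512 trainable vs 4096 full
```

With `B` initialized to zero, the adapted model is exactly the base model at step 0, and only the small `A`/`B` matrices ever receive gradients.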

Repost by Ishan Gupta

The paper behind DeepSeek-V3.2. Its high-compute Speciale version reaches gold-medal level on top math and coding contests and competes with leading closed models. Standard attention makes the model compare every token with every other token, so compute explodes as inputs get…

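DeepSeek's actual sparse-attention design (a lightweight indexer that selects which keys each query attends to) is more involved; here is a toy top-k sketch of the general idea. Note this toy still forms the full score matrix, which real implementations avoid — that avoidance is the whole point:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def topk_sparse_attention(Q, K, V, k):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                          # (n, n) -- the quadratic part
    thresh = np.sort(scores, axis=-1)[:, -k][:, None]      # k-th largest score per query
    masked = np.where(scores >= thresh, scores, -np.inf)   # keep only the top-k keys
    return softmax(masked) @ V                             # -inf entries get zero weight

rng = np.random.default_rng(1)
n, d, k = 8, 16, 3
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
Y = topk_sparse_attention(Q, K, V, k)
print(Y.shape)  # (8, 16)
```

Each query mixes at most k value vectors instead of all n, so per-query work stops growing with context length once the key selection itself is cheap.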

Repost by Ishan Gupta

Here’s to delaying gratification. The future belongs to the patient. @elonmusk


Repost by Ishan Gupta

Interview with Nikhil


Repost by Ishan Gupta

Can’t believe how human-like Tesla’s Optimus moves.


Repost by Ishan Gupta

Running robot

Just set a new PR in the lab



Repost by Ishan Gupta

.@elonmusk "One way to frame civilizational progress is the percentage completion on the Kardashev scale. Kardashev I is what percentage of a planet's energy are you successfully turning into useful work. Concept II would be, what percentage of the sun's energy are you…


Repost by Ishan Gupta

Congrats @SpaceX team and thank you @USSpaceForce!

We’ve received approval to develop Space Launch Complex-37 for Starship operations at Cape Canaveral Space Force Station. Construction has started. With three launch pads in Florida, Starship will be ready to support America’s national security and Artemis goals as the world’s…



Repost by Ishan Gupta

Test-time scaling of diffusions with flow maps. This paper is pretty cool, providing a better way to guide image generation with a reward function. The standard approach evaluates the reward function on intermediate steps to get a reward gradient to modify sampling. However the…

Repost by Ishan Gupta

This Google paper from last year went almost unnoticed by the public, but it's really an alternative architecture to the transformer that proves more parameter-efficient and effective on similar tasks. As you might know, Transformers scale quadratically with sequence length.…

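That quadratic growth is easy to see by counting multiply-adds in the attention score and value products (projections ignored; the constant is a rough assumption):

```python
def attn_cost(n, d):
    # Multiply-adds for Q @ K^T and weights @ V: two products of shape n x n x d.
    return 2 * n * n * d

d = 128
for n in (1000, 2000, 4000):
    print(n, attn_cost(n, d))
# Doubling the context length n quadruples the cost.
```

So a model that handles 4k tokens comfortably pays 16x that attention cost at 16k tokens, which is why sub-quadratic alternatives keep attracting attention.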

Repost by Ishan Gupta

Why has neural B-frame video compression lagged behind P-frame methods despite its potential for better performance with bi-directional references? This study addresses this gap by tackling unbalanced reference frame contributions in hierarchical coding—their BRHVC method uses…

Repost by Ishan Gupta

Agentic AI Overview This report provides a comprehensive overview of architectures, applications, and future directions. Great read for AI devs and enthusiasts. It introduces a new dual-paradigm framework that categorizes agentic systems into two distinct lineages: the…

Repost by Ishan Gupta

OPPO AI Agent Team introduces O-Mem, an Omni Memory System for LLM Agents It brings personalized, long-horizon, self-evolving capabilities to conversational AI by actively profiling users and supporting hierarchical retrieval for adaptive responses.
