Ishan Gupta
@code_igx
25 🇮🇳, Hustler @RITtigers NY 🇺🇸 | RnD on Quantum AI, Superintelligence & Systems | Ex- @Broadcom @VMware
You might like
Meta is out of its mind.
I was laid off by Meta today. I was a Research Scientist whose work was cited just yesterday by the legendary @johnschulman2 and Nicholas Carlini. I’m actively looking for new opportunities — please reach out if you have any openings!
Excited to share our lab’s first open-source release: LLM-Distillation-JAX supports practical knowledge-distillation configurations (distillation strength, temperature, top-k/top-p). Built on MaxText, it is designed for reproducible JAX/Flax training on both TPUs and GPUs.
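For readers new to those knobs, here is a minimal NumPy sketch of the standard knowledge-distillation objective they map onto (blended hard-label cross-entropy and temperature-softened KL, with an optional teacher top-k mask). This is an illustration only, not the release's actual API; all names are made up.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5, top_k=None):
    """Blend hard-label cross-entropy with a temperature-softened KL to the teacher.

    alpha       -- distillation strength: 0 = pure CE, 1 = pure distillation
    temperature -- softens both distributions before the KL term
    top_k       -- optionally keep only the teacher's top-k logits per position
    """
    if top_k is not None:
        # Mask everything outside the teacher's top-k before softening.
        kth = np.sort(teacher_logits, axis=-1)[..., -top_k][..., None]
        teacher_logits = np.where(teacher_logits >= kth, teacher_logits, -1e9)

    t_probs = softmax(teacher_logits / temperature)
    s_logp = np.log(softmax(student_logits / temperature) + 1e-12)
    kl = (t_probs * (np.log(t_probs + 1e-12) - s_logp)).sum(axis=-1).mean()

    hard_logp = np.log(softmax(student_logits) + 1e-12)
    ce = -np.take_along_axis(hard_logp, labels[..., None], axis=-1).mean()

    # The T^2 factor keeps the soft-target term on the same scale as the CE term.
    return alpha * temperature**2 * kl + (1 - alpha) * ce
```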
Lookahead Routing for LLMs. Proposes Lookahead, a routing framework that enables more informed routing without full inference and achieves an average performance gain of 7.7% over the state of the art. Here is why it works: Lookahead is a new framework for routing in multi-LLM…
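The tweet truncates before the mechanism, so the sketch below only illustrates the general shape of learned multi-LLM routing: a lightweight predictor scores each candidate model from cheap prompt features, and only the chosen model runs full inference. The class, the linear predictor, and all names are hypothetical, not the paper's method.

```python
import numpy as np

class LookaheadStyleRouter:
    """Hypothetical router: a tiny linear head predicts each model's quality from
    cheap prompt features (e.g. an embedding), so only the winner runs inference."""

    def __init__(self, n_features, model_names):
        self.model_names = model_names
        self.W = np.zeros((len(model_names), n_features))  # learned offline

    def fit(self, feats, per_model_scores, lr=0.1, epochs=500):
        # Regress each candidate model's observed score from prompt features.
        # feats: (n_prompts, n_features); per_model_scores: (n_prompts, n_models)
        for _ in range(epochs):
            pred = feats @ self.W.T
            grad = (pred - per_model_scores).T @ feats / len(feats)
            self.W -= lr * grad

    def route(self, feat):
        # Pick the model with the highest predicted score -- no full inference.
        return self.model_names[int(np.argmax(self.W @ feat))]
```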
One sign of this being a really cool idea is that while reading, I had tons of follow-up ideas immediately come to mind, and only a few "hmm, but"s. Please read the thread and paper, but TL;DR: add a layer of input-independent KVs, and fine-tune only the high-tf-idf KVs for continual learning.
🧠 How can we equip LLMs with memory that allows them to continually learn new things? In our new paper with @AIatMeta, we show how sparsely finetuning memory layers enables targeted updates for continual learning, w/ minimal interference with existing knowledge. While full…
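Not the paper's code, but a toy PyTorch sketch of the idea: a learnable key/value memory table where only a small, data-selected subset of slots receives gradients (the selection score here is a stand-in for whatever the paper uses, e.g. the high-tf-idf keys mentioned above), so updates stay targeted and the rest of the memory is untouched.

```python
import torch

class MemoryLayer(torch.nn.Module):
    """Toy memory layer: a table of learnable key/value slots queried by attention."""

    def __init__(self, n_slots=4096, dim=256):
        super().__init__()
        self.keys = torch.nn.Parameter(torch.randn(n_slots, dim) * 0.02)
        self.values = torch.nn.Parameter(torch.randn(n_slots, dim) * 0.02)

    def forward(self, query):                       # query: (batch, dim)
        attn = torch.softmax(query @ self.keys.T / query.shape[-1] ** 0.5, dim=-1)
        return attn @ self.values

def sparse_update_mask(usage_scores, top_fraction=0.01):
    # Placeholder selection rule: keep gradients only for the top-scoring slots.
    k = max(1, int(top_fraction * usage_scores.numel()))
    mask = torch.zeros_like(usage_scores, dtype=torch.bool)
    mask[usage_scores.topk(k).indices] = True
    return mask

def zero_frozen_grads(layer, mask):
    # Call after loss.backward(), before optimizer.step(): frozen slots get no update,
    # which is what limits interference with previously stored knowledge.
    for p in (layer.keys, layer.values):
        if p.grad is not None:
            p.grad[~mask] = 0.0
```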
This paper shows a small open model solving harder math by running long self-evolving reasoning loops. With this setup, the 8B DeepSeek-based model solved 5 AIME problems it could not solve before. The loop treats each round as a tiny step toward the right answer, and if the improvement beats…
Your money can do more than build wealth; it can build hope. Robin John’s book reveals how to invest in ways that honor God and serve your neighbors. Make your work and investments a force for good.
KV caching, clearly explained:
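For the impatient, a toy single-head NumPy sketch of what the cache buys: keys and values for tokens already generated are stored and appended to, so each decode step only projects the new token instead of recomputing the whole prefix. Names and shapes are illustrative.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def decode_with_kv_cache(xs, Wq, Wk, Wv):
    """Toy single-head decode loop. Each step projects only the newest token and
    appends its key/value to the cache, instead of recomputing K and V for the
    entire prefix at every step (which is what makes uncached decoding so slow)."""
    d = Wq.shape[1]
    K_cache = np.empty((0, d))
    V_cache = np.empty((0, d))
    outputs = []
    for x in xs:                                   # x: (d_model,) embedding per step
        q = x @ Wq
        K_cache = np.vstack([K_cache, x @ Wk])     # append, don't recompute
        V_cache = np.vstack([V_cache, x @ Wv])
        attn = softmax(q @ K_cache.T / np.sqrt(d)) # attend over all cached positions
        outputs.append(attn @ V_cache)
    return np.stack(outputs)

# Example: 16 decode steps, model width 32, head width 8.
rng = np.random.default_rng(0)
xs = rng.normal(size=(16, 32))
Wq, Wk, Wv = (rng.normal(size=(32, 8)) for _ in range(3))
print(decode_with_kv_cache(xs, Wq, Wk, Wv).shape)  # (16, 8)
```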
You're in an ML Engineer interview at OpenAI. The interviewer asks: "Our GPT model generates 100 tokens in 42 seconds. How do you make it 5x faster?" You: "I'll optimize the model architecture and use a better GPU." Interview over. Here's what you missed:
🎓 Stanford CME295 Transformers & LLMs. Nice to see the release of this new course on Transformers and LLMs. Great way to catch up on the world of LLMs and AI agents. Covers topics ranging from the basics of attention and mixture-of-experts to agents. Excited to see more on evals…
Scaling RL for a Trillion-Scale Thinking Model. Scaling RL is hard! But this team might have figured something out. They introduce Ring-1T, a 1T-parameter MoE reasoning model with ~50B params active per token. It’s trained with a long-CoT SFT phase, a verifiable-rewards reasoning…
This paper builds an agentic LLM that can run the whole data science workflow by itself. It is an 8B model that plans work, reads structured files, writes and runs code, checks results, and iterates. Standard “workflow agents” break here because fixed scripts do not adapt well…
People are sleeping on Deep Agents. Start using them now. This is a fun paper showcasing how to put together advanced deep agents for enterprise use cases. Uses the best techniques: task decomposition, planning, specialized subagents, MCP for NL2SQL, file analysis, and more.
I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes data collection etc., but anyway it doesn't matter. The more interesting part for me (especially as a computer vision person at heart who is temporarily masquerading as a natural language…
🚀 DeepSeek-OCR — the new frontier of OCR from @deepseek_ai , exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G) — powered by vllm==0.8.5 for day-0 model support. 🧠 Compresses visual contexts up to 20× while keeping…
Great paper on AI's recursive self-improvement. Builds a single loop that lets a search agent teach itself. One part writes new tasks, one part tries to solve them, and one part judges the answers. A 3-role loop can keep improving a search agent without human labels. The…
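A rough sketch of how such a 3-role loop can be wired up. `call_llm` is a hypothetical stand-in for whatever inference backend is available; the prompts and pass/fail protocol are illustrative, not taken from the paper.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for whatever model/inference backend is available.
    raise NotImplementedError("plug in your model here")

def self_improvement_round(seed_topics, train_buffer):
    """One round of the 3-role loop: propose a task, attempt it, judge the attempt.
    Judged-correct trajectories become training data for the next round, so the
    agent can keep improving without human labels."""
    for topic in seed_topics:
        task = call_llm(f"Write one challenging search task about: {topic}")
        answer = call_llm(f"Solve this task, using search if needed:\n{task}")
        verdict = call_llm(
            f"Task:\n{task}\nAnswer:\n{answer}\n"
            "Reply PASS if the answer is correct and well supported, else FAIL."
        )
        if verdict.strip().upper().startswith("PASS"):
            train_buffer.append({"task": task, "answer": answer})
    return train_buffer
```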
Holy shit… Harvard just proved your base model might secretly be a genius. 🤯 Their new paper “Reasoning with Sampling” shows that you don’t need reinforcement learning to make LLMs reason better. They used a 'Markov chain sampling trick' that simply re-samples from the…
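The tweet cuts off before the details, but one generic way to "re-sample from the model's own distribution" is independence Metropolis targeting the sharpened distribution p(x)^alpha with the base model as the proposal, sketched below. This is a textbook construction offered for intuition, not necessarily the paper's algorithm; `sample_with_logprob` is a hypothetical helper.

```python
import math, random

def sample_with_logprob(prompt):
    # Hypothetical helper: returns (completion, total log-probability under the model).
    raise NotImplementedError("plug in your model here")

def power_sample(prompt, alpha=4.0, n_steps=16):
    """Independence-Metropolis sampling from p(x)^alpha, using the base model p as
    the proposal. With that choice, the acceptance ratio reduces to
    (p(proposal) / p(current)) ** (alpha - 1), so higher-likelihood completions are
    kept more often and the distribution is sharpened without any training."""
    current, logp_current = sample_with_logprob(prompt)
    for _ in range(n_steps):
        proposal, logp_proposal = sample_with_logprob(prompt)
        accept_logprob = (alpha - 1.0) * (logp_proposal - logp_current)
        if math.log(random.random() + 1e-300) < accept_logprob:
            current, logp_current = proposal, logp_proposal
    return current
```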
A permanently crewed lunar science base would be far more impressive than a repeat of what was already done incredibly well by Apollo in 1969
Blue Origin could likely land on the Moon before SpaceX, but this is Blue Moon Mk1, a small lander testbed for Blue Moon Mk2 that would provide no value if man-rated. So why bother man-rating it to beat China? China doesn’t plan on flags and footprints; they plan on permanent…
SpaceX will carry ~90% of the world’s payload mass to space this year, so it is pretty much Earth’s space program
This new paper defines AGI as: "AI that can match or exceed the cognitive versatility and proficiency of a well-educated adult." Researchers translate this theory into 10 cognitive domains, such as general knowledge, speed, on-the-spot reasoning, and working memory, each representing…
fantastic simple visualization of the self-attention formula. this was one of the hardest things for me to deeply understand about LLMs. the formula seems easy. you can even memorize it fast. but to really get an intuition of what Q, K, V represent and how they interact, that’s hard.
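For reference, the formula itself, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, fits in a few lines of NumPy; the comments spell out the usual intuition for Q, K, and V. Shapes and names are illustrative.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V for one head.
    Q is "what each token is looking for", K is "what each token offers",
    and each softmax row decides how much of every token's V gets mixed in."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # (seq, d_k) each
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # (seq, seq) pairwise relevance
    return softmax(scores) @ V                  # weighted mixture of values

# Tiny example: 4 tokens, model width 8, head width 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)      # (4, 4)
```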
if you're looking for a comprehensive guide to LLM finetuning, check this! a free 115-page book on arxiv, covering: > fundamentals of LLM > peft (lora, qlora, dora, hft) > alignment methods (ppo, dpo, grpo) > mixture of experts (MoE) > 7-stage fine-tuning pipeline > multimodal…
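Of the topics listed, LoRA is the one that fits in a few lines; here is a minimal PyTorch sketch of the idea (freeze the pretrained weight, train a low-rank B·A update), written for illustration and not taken from the book.

```python
import torch

class LoRALinear(torch.nn.Module):
    """Freeze the pretrained weight W and learn a low-rank update B @ A, so only
    r * (d_in + d_out) parameters train instead of d_in * d_out."""

    def __init__(self, d_in, d_out, r=8, alpha=16):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(d_out, d_in) * 0.02,
                                         requires_grad=False)   # frozen base weight
        self.A = torch.nn.Parameter(torch.randn(r, d_in) * 0.02)
        self.B = torch.nn.Parameter(torch.zeros(d_out, r))      # zero init: update starts at 0
        self.scale = alpha / r

    def forward(self, x):
        return x @ self.weight.T + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(1024, 1024, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 16384 trainable params vs ~1M for full fine-tuning of this layer
```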