
Upesh

@Upesh_AI

Data Scientist @FICO_corp | AI Enthusiast | B.Tech in Mathematics & Computing @IITGuwahati | Dakshanite | Navodayan | #LetsConnect

Pinned

Optimizers in Deep Learning, a thread🧵 First of all, let's talk about why we need optimizers at all if we already have GD (gradient descent), SGD (stochastic gradient descent), and mini-batch SGD.
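For context, here is a minimal NumPy sketch (my illustration, not from the thread) of how the update rules differ: plain SGD steps straight down the gradient, momentum smooths the step with a running average of past gradients, and Adam additionally rescales by a running estimate of the gradient's magnitude. Hyperparameter values are illustrative.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    # Vanilla (stochastic) gradient descent: step against the gradient.
    return w - lr * grad

def momentum_step(w, grad, v, lr=0.01, beta=0.9):
    # Momentum smooths updates with an exponential average of past gradients.
    v = beta * v + grad
    return w - lr * v, v

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Adam keeps running means of the gradient (m) and its square (v),
    # with bias correction for the early steps (t starts at 1).
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```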


Upesh reposted

A simple trick cuts your LLM costs by 50%! Just stop using JSON and use this instead: TOON (Token-Oriented Object Notation) slashes your LLM token usage in half while keeping data perfectly readable. Here's why it works: TOON's sweet spot: uniform arrays with consistent…
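To make the claim concrete, here is an illustrative sketch (mine, not from the tweet). The TOON syntax follows the format's published examples; actual token savings depend on the tokenizer and on how uniform the data is.

```python
import json

records = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]

as_json = json.dumps({"users": records})
# {"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}

# TOON-style tabular block: field names appear once in the header,
# so uniform arrays of objects shrink the most.
as_toon = "users[2]{id,name}:\n  1,Alice\n  2,Bob"

print(len(as_json), len(as_toon))  # fewer characters -> usually fewer tokens
```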


Upesh reposted

Introducing the File Search Tool in the Gemini API, our hosted RAG solution with free storage and free query-time embeddings 💾 We are super excited about this new approach and think it will dramatically simplify the path to context-aware AI systems; more details in 🧵


Upesh reposted

You are in an AI engineer interview at Google. The interviewer asks: "Our data is spread across several sources (Gmail, Drive, etc.) How would you build a unified query engine over it?" You: "I'll embed everything in a vector DB and do RAG." Interview over! Here's what you…
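A stronger answer routes each query to per-source retrievers and merges the results, rather than flattening everything into a single vector index. Below is a hedged sketch of that shape; the connector classes, snippets, and scores are hypothetical placeholders, not real Gmail/Drive clients.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    source: str
    text: str
    score: float

class GmailConnector:
    def search(self, query: str) -> list[Hit]:
        # Real code would call the Gmail API's native search here.
        return [Hit("gmail", "…matching email snippet…", 0.8)]

class DriveConnector:
    def search(self, query: str) -> list[Hit]:
        # Real code would call the Drive API; structured sources may get
        # text-to-SQL or metadata filters instead of pure embedding search.
        return [Hit("drive", "…matching doc snippet…", 0.7)]

def unified_query(query: str, connectors) -> list[Hit]:
    # Fan out to every source, then re-rank across sources before
    # handing the top-k hits to the LLM for synthesis.
    hits = [h for c in connectors for h in c.search(query)]
    return sorted(hits, key=lambda h: h.score, reverse=True)[:5]

top = unified_query("Q3 budget approval", [GmailConnector(), DriveConnector()])
```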


Upesh reposted

Quick question for folks who do model fine-tuning: one of the trickiest parts of fine-tuning LLMs, in my experience, is retaining the original capabilities of the post-trained model. Let's take an example: say you want to fine-tune a VLM for OCR, layout detection, or any other…
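One common mitigation (my illustration, not necessarily what the author has in mind) is to mix replay data from the model's original training distribution into each fine-tuning batch, so the new OCR/layout skill doesn't overwrite general capabilities. A minimal sketch:

```python
import random

def mixed_batch(task_data, replay_data, batch_size=32, replay_frac=0.25):
    # replay_frac controls how much general data accompanies the task
    # examples: too little risks catastrophic forgetting, too much slows
    # adaptation to the new task.
    n_replay = int(batch_size * replay_frac)
    batch = random.sample(task_data, batch_size - n_replay)
    batch += random.sample(replay_data, n_replay)
    random.shuffle(batch)
    return batch
```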


Upesh reposted

Do this:
1. Open AWS and create an account.
2. Go to EC2, spin up an instance, generate a key pair, and SSH into it from your local system. Just play around: install Nginx, deploy a Node app, break things, fix them. (A boto3 sketch of this step appears after the next tweet.)
3. Decide to launch something? Go to Security Groups, open…

Don’t overthink it.
• Learn EC2 → compute
• Learn S3 → storage
• Learn RDS → database
• Learn IAM → security
• Learn VPC → networking
• Learn Lambda → automation
• Learn CloudWatch → monitoring
• Learn Route 53 → DNS & domains
You don’t need every AWS service.…
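For step 2 in the first tweet, here is a hedged boto3 sketch: the AMI ID and key name are placeholders, and region/credentials are assumed to come from your local AWS configuration.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Generate a key pair and save the private key for SSH access.
key = ec2.create_key_pair(KeyName="playground-key")
with open("playground-key.pem", "w") as f:
    f.write(key["KeyMaterial"])

# Spin up a single small instance; replace ImageId with a current AMI.
resp = ec2.run_instances(
    ImageId="ami-xxxxxxxx",   # placeholder AMI
    InstanceType="t2.micro",
    KeyName="playground-key",
    MinCount=1,
    MaxCount=1,
)
print(resp["Instances"][0]["InstanceId"])
# Then from your terminal: ssh -i playground-key.pem ec2-user@<public-ip>
```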



Upesh reposted

This November, history changes. An NVIDIA H100 GPU—100 times more powerful than any GPU ever flown in space—launches to orbit. It will run Google's Gemma—the open-source version of Gemini. In space. For the first time. First AI training in orbit. First model fine-tuning in…


Upesh reposted

Today, we’re announcing a major breakthrough that marks a significant step forward in the world of quantum computing. For the first time in history, our teams at @GoogleQuantumAI demonstrated that a quantum computer can successfully run a verifiable algorithm, 13,000x faster than…


Upesh reposted

The security vulnerability we found in Perplexity’s Comet browser this summer is not an isolated issue. Indirect prompt injections are a systemic problem facing Comet and other AI-powered browsers. Today we’re publishing details on more security vulnerabilities we uncovered.


Upesh reposted

I still can’t believe this thing is taking our jobs.


Upesh reposted

The term “AGI” is currently a vague, moving goalpost. To ground the discussion, we propose a comprehensive, testable definition of AGI. Using it, we can quantify progress: GPT-4 (2023) was 27% of the way to AGI. GPT-5 (2025) is 58%. Here’s how we define and measure it: 🧵


Upesh reposted

Banger paper from Meta and collaborators. This paper is one of the best deep dives yet on how reinforcement learning (RL) actually scales for LLMs. The team ran over 400,000 GPU hours of experiments to find a predictable scaling pattern and a stable recipe (ScaleRL) that…
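The core idea, as I read the summary, is that RL performance follows a predictable saturating curve in compute, so you can fit the curve early and extrapolate the ceiling. A toy sketch of that kind of fit (my illustration with made-up numbers, not the paper's code or exact functional form):

```python
import numpy as np
from scipy.optimize import curve_fit

def saturating(c, A, B, alpha):
    # Performance rises toward asymptote A as compute c grows.
    return A - B * c ** (-alpha)

compute = np.array([1e2, 3e2, 1e3, 3e3, 1e4])      # GPU-hours (toy numbers)
reward  = np.array([0.35, 0.45, 0.55, 0.62, 0.66])  # eval score (toy numbers)

(A, B, alpha), _ = curve_fit(saturating, compute, reward, p0=[0.8, 1.0, 0.3])
print(f"predicted ceiling ≈ {A:.2f}")
```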


Upesh reposted

Google has officially started selling TPUs to external customers, competing directly with Nvidia.

Broadcom's 5th customer ($10B) isn't Apple or XAI It's Anthropic. They won't design a new chip. They will be buying TPUs from Broadcom. Expect Anthropic to announce a funding round from Google soon.



Upesh reposted

nanochat d32, i.e. the depth 32 version that I specced for $1000, up from $100 has finished training after ~33 hours, and looks good. All the metrics go up quite a bit across pretraining, SFT and RL. CORE score of 0.31 is now well above GPT-2 at ~0.26. GSM8K went ~8% -> ~20%,…


Upesh reposted

How does a vector database work? It's a question I get asked constantly. So, let's refresh. At its core, a vector database retrieves data objects using vector search. But let's break down what's actually happening under the hood. The Foundation: Vector…
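The core retrieval step is simple enough to sketch. A real vector database layers an approximate-nearest-neighbor index (HNSW, IVF, etc.), filtering, and storage on top, but brute-force cosine search shows the mechanics. My illustration:

```python
import numpy as np

def cosine_top_k(query, vectors, k=3):
    # Normalize so a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    V = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = V @ q
    top = np.argsort(-scores)[:k]
    return top, scores[top]

docs = np.random.randn(1000, 384)   # stand-in for embedded documents
query = np.random.randn(384)        # stand-in for the embedded query
idx, sims = cosine_top_k(query, docs)
```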


Upesh reposted

Research Scientist interview at Google. Interviewer: "You need to quantize a model from FP16 to INT8. Walk me through how you'd do it without destroying quality." Your answer: "I'll just convert all weights to INT8 format" ❌ Rejected. Here's the critical mistake: Don't say:…
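The key point is that quantization needs a calibrated scale, not a cast. A minimal sketch (my illustration) of symmetric per-channel weight quantization; real pipelines add activation calibration, outlier handling, and per-layer sensitivity checks:

```python
import numpy as np

def quantize_int8(w):
    # One scale per output channel (row): the max |w| maps to 127.
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_int8(w)
print(np.abs(w - dequantize(q, s)).max())  # small reconstruction error
```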


Upesh reposted

An exciting milestone for AI in science: Our C2S-Scale 27B foundation model, built with @Yale and based on Gemma, generated a novel hypothesis about cancer cellular behavior, which scientists experimentally validated in living cells.  With more preclinical and clinical tests,…


Upesh reposted

Google shocked the world. They solved the code security nightmare that's been killing developers for decades. DeepMind's new AI agent "Codemender" just auto-finds and fixes vulnerabilities in your code. Already shipped 72 solid fixes to major open source projects. This is…


Upesh reposted

My brain broke when I read this paper. A tiny 7-million-parameter model just beat DeepSeek-R1, Gemini 2.5 Pro, and o3-mini at reasoning on both ARC-AGI-1 and ARC-AGI-2. It's called the Tiny Recursive Model (TRM), from Samsung. How can a model 10,000x smaller be smarter? Here's how…
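My loose paraphrase of the recursive idea, not Samsung's code: a tiny network reuses the same weights across many refinement steps, alternately updating a latent scratchpad and a candidate answer, so effective depth comes from iteration rather than parameter count. Dimensions and step count below are arbitrary.

```python
import torch
import torch.nn as nn

class TinyRecursive(nn.Module):
    def __init__(self, d=128, steps=8):
        super().__init__()
        self.steps = steps
        self.update_z = nn.Sequential(nn.Linear(3 * d, d), nn.GELU(), nn.Linear(d, d))
        self.update_y = nn.Sequential(nn.Linear(2 * d, d), nn.GELU(), nn.Linear(d, d))

    def forward(self, x):
        z = torch.zeros_like(x)   # latent "scratchpad"
        y = torch.zeros_like(x)   # current answer embedding
        for _ in range(self.steps):
            z = self.update_z(torch.cat([x, y, z], dim=-1))  # think
            y = self.update_y(torch.cat([y, z], dim=-1))     # revise answer
        return y

model = TinyRecursive()
out = model(torch.randn(2, 128))
```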


Upesh reposted

Everything shipped at DevDay [2025] 🧵


Upesh reposted

You're in an ML Engineer interview at Google, and the interviewer asks: "GPUs vs TPUs: which one do you choose?" Here's how you answer:

