
Upesh

@Upesh_AI

Data Scientist @FICO_corp | AI Enthusiast | B.Tech in Mathematics & Computing @IITGuwahati | Dakshanite | Navodayan | #LetsConnect

Pinned

Optimizers in Deep Learning, a thread🧵 First of all, let's talk about why we need optimizers at all if we already have GD (gradient descent), SGD (stochastic gradient descent), and mini-batch SGD.
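For context, here is a minimal NumPy sketch (my illustration, not from the thread) of how the update rules differ: plain SGD steps straight down the gradient, momentum smooths the step with a running average of past gradients, and Adam additionally rescales by a running estimate of the gradient's magnitude. Hyperparameter values are illustrative.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    # Vanilla (stochastic) gradient descent: step against the gradient.
    return w - lr * grad

def momentum_step(w, grad, v, lr=0.01, beta=0.9):
    # Momentum smooths updates with an exponential average of past gradients.
    v = beta * v + grad
    return w - lr * v, v

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Adam keeps running means of the gradient (m) and its square (v),
    # with bias correction for the early steps (t starts at 1).
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```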


Upesh reposted

A simple trick cuts your LLM costs by 50%! Just stop using JSON and use this instead: TOON (Token-Oriented Object Notation) slashes your LLM token usage in half while keeping data perfectly readable. Here's why it works: TOON's sweet spot: uniform arrays with consistent…
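To make the claim concrete, here is an illustrative sketch (mine, not from the tweet). The TOON syntax follows the format's published examples; actual token savings depend on the tokenizer and on how uniform the data is.

```python
import json

records = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]

as_json = json.dumps({"users": records})
# {"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}

# TOON-style tabular block: field names appear once in the header,
# so uniform arrays of objects shrink the most.
as_toon = "users[2]{id,name}:\n  1,Alice\n  2,Bob"

print(len(as_json), len(as_toon))  # fewer characters -> usually fewer tokens
```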


Upesh reposted

Introducing the File Search Tool in the Gemini API, our hosted RAG solution with free storage and free query-time embeddings 💾 We are super excited about this new approach and think it will dramatically simplify the path to context-aware AI systems; more details in 🧵


Upesh reposted

You are in an AI engineer interview at Google. The interviewer asks: "Our data is spread across several sources (Gmail, Drive, etc.) How would you build a unified query engine over it?" You: "I'll embed everything in a vector DB and do RAG." Interview over! Here's what you…
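A stronger answer routes each query to per-source retrievers and merges the results, rather than flattening everything into a single vector index. Below is a hedged sketch of that shape; the connector classes, snippets, and scores are hypothetical placeholders, not real Gmail/Drive clients.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    source: str
    text: str
    score: float

class GmailConnector:
    def search(self, query: str) -> list[Hit]:
        # Real code would call the Gmail API's native search here.
        return [Hit("gmail", "…matching email snippet…", 0.8)]

class DriveConnector:
    def search(self, query: str) -> list[Hit]:
        # Real code would call the Drive API; structured sources may get
        # text-to-SQL or metadata filters instead of pure embedding search.
        return [Hit("drive", "…matching doc snippet…", 0.7)]

def unified_query(query: str, connectors) -> list[Hit]:
    # Fan out to every source, then re-rank across sources before
    # handing the top-k hits to the LLM for synthesis.
    hits = [h for c in connectors for h in c.search(query)]
    return sorted(hits, key=lambda h: h.score, reverse=True)[:5]

top = unified_query("Q3 budget approval", [GmailConnector(), DriveConnector()])
```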


Upesh reposted

Quick question for folks who do model fine-tuning: one of the trickiest parts of fine-tuning LLMs, in my experience, is retaining the original capabilities of the post-trained model. Let's take an example: say you want to fine-tune a VLM for OCR, layout detection, or any other…
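One common mitigation (my illustration, not necessarily what the author has in mind) is to mix replay data from the model's original training distribution into each fine-tuning batch, so the new OCR/layout skill doesn't overwrite general capabilities. A minimal sketch:

```python
import random

def mixed_batch(task_data, replay_data, batch_size=32, replay_frac=0.25):
    # replay_frac controls how much general data accompanies the task
    # examples: too little risks catastrophic forgetting, too much slows
    # adaptation to the new task.
    n_replay = int(batch_size * replay_frac)
    batch = random.sample(task_data, batch_size - n_replay)
    batch += random.sample(replay_data, n_replay)
    random.shuffle(batch)
    return batch
```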


Upesh reposted

Do this:
1. Open AWS and create an account.
2. Go to EC2, spin up an instance, generate a key pair, and SSH into it from your local system. Just play around: install Nginx, deploy a Node app, break things, fix them. (A boto3 sketch of this step appears after the next tweet.)
3. Decide to launch something? Go to Security Groups, open…

Don’t overthink it.
• Learn EC2 → compute
• Learn S3 → storage
• Learn RDS → database
• Learn IAM → security
• Learn VPC → networking
• Learn Lambda → automation
• Learn CloudWatch → monitoring
• Learn Route 53 → DNS & domains
You don’t need every AWS service.…
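For step 2 in the first tweet, here is a hedged boto3 sketch: the AMI ID and key name are placeholders, and region/credentials are assumed to come from your local AWS configuration.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Generate a key pair and save the private key for SSH access.
key = ec2.create_key_pair(KeyName="playground-key")
with open("playground-key.pem", "w") as f:
    f.write(key["KeyMaterial"])

# Spin up a single small instance; replace ImageId with a current AMI.
resp = ec2.run_instances(
    ImageId="ami-xxxxxxxx",   # placeholder AMI
    InstanceType="t2.micro",
    KeyName="playground-key",
    MinCount=1,
    MaxCount=1,
)
print(resp["Instances"][0]["InstanceId"])
# Then from your terminal: ssh -i playground-key.pem ec2-user@<public-ip>
```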



Upesh reposted

This November, history changes. An NVIDIA H100 GPU—100 times more powerful than any GPU ever flown in space—launches to orbit. It will run Google's Gemma—the open-source version of Gemini. In space. For the first time. First AI training in orbit. First model fine-tuning in…


Upesh reposted

Today, we’re announcing a major breakthrough that marks a significant step forward in the world of quantum computing. For the first time in history, our teams at @GoogleQuantumAI demonstrated that a quantum computer can successfully run a verifiable algorithm, 13,000x faster than…


Upesh reposted

The security vulnerability we found in Perplexity’s Comet browser this summer is not an isolated issue. Indirect prompt injections are a systemic problem facing Comet and other AI-powered browsers. Today we’re publishing details on more security vulnerabilities we uncovered.


Upesh reposted

I still can’t believe this thing is taking our jobs.


Upesh reposted

The term “AGI” is currently a vague, moving goalpost. To ground the discussion, we propose a comprehensive, testable definition of AGI. Using it, we can quantify progress: GPT-4 (2023) was 27% of the way to AGI. GPT-5 (2025) is 58%. Here’s how we define and measure it: 🧵


Upesh reposted

Banger paper from Meta and collaborators. This paper is one of the best deep dives yet on how reinforcement learning (RL) actually scales for LLMs. The team ran over 400,000 GPU hours of experiments to find a predictable scaling pattern and a stable recipe (ScaleRL) that…
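The core idea, as I read the summary, is that RL performance follows a predictable saturating curve in compute, so you can fit the curve early and extrapolate the ceiling. A toy sketch of that kind of fit (my illustration with made-up numbers, not the paper's code or exact functional form):

```python
import numpy as np
from scipy.optimize import curve_fit

def saturating(c, A, B, alpha):
    # Performance rises toward asymptote A as compute c grows.
    return A - B * c ** (-alpha)

compute = np.array([1e2, 3e2, 1e3, 3e3, 1e4])      # GPU-hours (toy numbers)
reward  = np.array([0.35, 0.45, 0.55, 0.62, 0.66])  # eval score (toy numbers)

(A, B, alpha), _ = curve_fit(saturating, compute, reward, p0=[0.8, 1.0, 0.3])
print(f"predicted ceiling ≈ {A:.2f}")
```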


Upesh reposted

Google has officially started selling TPUs to external customers, competing directly with Nvidia.

Broadcom's 5th customer ($10B) isn't Apple or XAI It's Anthropic. They won't design a new chip. They will be buying TPUs from Broadcom. Expect Anthropic to announce a funding round from Google soon.



Upesh reposted

nanochat d32, i.e. the depth 32 version that I specced for $1000, up from $100 has finished training after ~33 hours, and looks good. All the metrics go up quite a bit across pretraining, SFT and RL. CORE score of 0.31 is now well above GPT-2 at ~0.26. GSM8K went ~8% -> ~20%,…


Upesh reposted

How does a vector database work? It's a question I get asked constantly. So, let's refresh. At its core, a vector database retrieves data objects using vector search. But let's break down what's actually happening under the hood. The Foundation: Vector…
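The core retrieval step is simple enough to sketch. A real vector database layers an approximate-nearest-neighbor index (HNSW, IVF, etc.), filtering, and storage on top, but brute-force cosine search shows the mechanics. My illustration:

```python
import numpy as np

def cosine_top_k(query, vectors, k=3):
    # Normalize so a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    V = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = V @ q
    top = np.argsort(-scores)[:k]
    return top, scores[top]

docs = np.random.randn(1000, 384)   # stand-in for embedded documents
query = np.random.randn(384)        # stand-in for the embedded query
idx, sims = cosine_top_k(query, docs)
```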


Upesh reposted

Research Scientist interview at Google. Interviewer: "You need to quantize a model from FP16 to INT8. Walk me through how you'd do it without destroying quality." Your answer: "I'll just convert all weights to INT8 format" ❌ Rejected. Here's the critical mistake: Don't say:…
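The key point is that quantization needs a calibrated scale, not a cast. A minimal sketch (my illustration) of symmetric per-channel weight quantization; real pipelines add activation calibration, outlier handling, and per-layer sensitivity checks:

```python
import numpy as np

def quantize_int8(w):
    # One scale per output channel (row): the max |w| maps to 127.
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_int8(w)
print(np.abs(w - dequantize(q, s)).max())  # small reconstruction error
```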


Upesh reposted

An exciting milestone for AI in science: Our C2S-Scale 27B foundation model, built with @Yale and based on Gemma, generated a novel hypothesis about cancer cellular behavior, which scientists experimentally validated in living cells.  With more preclinical and clinical tests,…


Upesh reposted

Google shocked the world. They solved the code security nightmare that's been killing developers for decades. DeepMind's new AI agent "Codemender" just auto-finds and fixes vulnerabilities in your code. Already shipped 72 solid fixes to major open source projects. This is…


Upesh reposted

My brain broke when I read this paper. A tiny 7-million-parameter model just beat DeepSeek-R1, Gemini 2.5 Pro, and o3-mini at reasoning on both ARC-AGI-1 and ARC-AGI-2. It's called the Tiny Recursive Model (TRM), from Samsung. How can a model 10,000x smaller be smarter? Here's how…
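My loose paraphrase of the recursive idea, not Samsung's code: a tiny network reuses the same weights across many refinement steps, alternately updating a latent scratchpad and a candidate answer, so effective depth comes from iteration rather than parameter count. Dimensions and step count below are arbitrary.

```python
import torch
import torch.nn as nn

class TinyRecursive(nn.Module):
    def __init__(self, d=128, steps=8):
        super().__init__()
        self.steps = steps
        self.update_z = nn.Sequential(nn.Linear(3 * d, d), nn.GELU(), nn.Linear(d, d))
        self.update_y = nn.Sequential(nn.Linear(2 * d, d), nn.GELU(), nn.Linear(d, d))

    def forward(self, x):
        z = torch.zeros_like(x)   # latent "scratchpad"
        y = torch.zeros_like(x)   # current answer embedding
        for _ in range(self.steps):
            z = self.update_z(torch.cat([x, y, z], dim=-1))  # think
            y = self.update_y(torch.cat([y, z], dim=-1))     # revise answer
        return y

model = TinyRecursive()
out = model(torch.randn(2, 128))
```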


Upesh reposted

Everything shipped at DevDay [2025] 🧵


Upesh reposted

You're in an ML Engineer interview at Google, and the interviewer asks: "GPUs vs TPUs: which one do you choose?" Here's how you answer:

