
Ajitesh Shukla

@ajitesh_shukla7

Student. Love to solve the hardest math problems. LLMs, Mathematical Research (Geometric Topology, Differential Geometry), Quantum Computing. Lord Krishna is God of Math

Ajitesh Shukla reposted

Can AI invent new math? A new paper from DeepMind and renowned mathematician Terence Tao shows how. Using AlphaEvolve, the team merges LLM-generated ideas with automated evaluation to propose, test, and refine mathematical algorithms. In tests on 67 problems across analysis,…
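For a feel of the loop described above, here is a minimal sketch of a propose-evaluate-refine search, with a random mutation standing in for the LLM proposer; `propose`, `evaluate`, and the toy objective are illustrative assumptions, not AlphaEvolve's actual components.

```python
import random

# Toy propose -> evaluate -> refine loop, in the spirit of the AlphaEvolve
# setup described above. The "proposer" here is a random perturbation; in
# AlphaEvolve it is an LLM editing candidate algorithms.

def propose(parent):
    # Hypothetical stand-in for LLM-generated ideas: perturb the candidate.
    return [g + random.gauss(0, 0.1) for g in parent]

def evaluate(candidate):
    # Automated evaluation: a score the loop can maximize. Toy objective:
    # drive every coordinate toward 1.
    return -sum((g - 1.0) ** 2 for g in candidate)

best = [0.0] * 4
for _ in range(2000):
    child = propose(best)
    if evaluate(child) > evaluate(best):
        best = child  # keep the refinement only if the evaluator prefers it

print(best)  # ends up near [1, 1, 1, 1]
```

The real system proposes programs and scores them with problem-specific evaluators, but the skeleton, generate candidates, score them automatically, keep the improvements, is the same.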


Ajitesh Shukla reposted

Cosandal, Ulukus: Optimal Source Coding of Markov Chains for Real-Time Remote Es... arxiv.org/abs/2511.02803 arxiv.org/pdf/2511.02803 arxiv.org/html/2511.02803


Ajitesh Shukla reposted

Presenting today our work "Unsupervised Word-level Quality Estimation Through the Lens of Annotator (Dis)agreement" at the #EMNLP2025 Machine Translation morning session (Room A301, 11:45 China time). See you there! 🤗


Ajitesh Shukla reposted

I'm recruiting two #ComputerScience #PhD students in #MachineLearning and #AI4Science at @UAlbany @SUNY starting Fall 2026! Ad: chong-l.github.io/hiring.html


Ajitesh Shukla reposted

🥳🎉Sana-video inference code has been integrated into diffusers! Thanks to @lawrence_cjs @RisingSayak and the team for making it happen. huggingface.co/docs/diffusers…


Ajitesh Shukla reposted

Presenting Interactive Training today (led by @wtzhang0820)! Tune models like cooking: adjust the "heat" when the loss smells off 😄 🕟4:30-6pm • Hall C3 • Demo Session 5 Come talk to us! #EMNLP2025

Every time I watch models train, I wish I could tune LR on the fly. It's like cooking: we adjust the dial when the food smells off. We built Interactive Training to do that, turning loss monitoring into interaction. Paper👉huggingface.co/papers/2510.02… Led by @wtzhang0820 w/ Yang Lu
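A minimal sketch of the underlying mechanic, in plain PyTorch rather than the Interactive Training framework itself: the learning rate lives in the optimizer's param groups, so a monitoring loop can change it mid-run without rebuilding anything. The toy model and the loss threshold below are made up for illustration.

```python
import torch

# Toy model and optimizer (hypothetical setup, not the paper's code).
model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

def set_lr(optimizer, lr):
    # Mutate the learning rate of every param group in place, mid-training.
    for group in optimizer.param_groups:
        group["lr"] = lr

for step in range(1000):
    x = torch.randn(32, 10)
    loss = model(x).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    # "Adjust the heat when the loss smells off": an interactive controller
    # could watch the loss curve and dial the LR down on a spike.
    if loss.item() > 10.0:
        set_lr(opt, 1e-3)
```

Interactive Training wraps this kind of control in live loss monitoring and interaction; the sketch only shows why on-the-fly adjustment is mechanically possible.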



Ajitesh Shukla reposted

I’ve been so busy trying to write my thesis and finish journal revisions that I’m glad #EMNLP2025 forced me to take a break and scroll papers. Help me out by posting interesting papers!! Self-promotion welcome!


Ajitesh Shukla reposted

New paper drop! 🎙️ We beat GPT-5 with a 36B model 🤯🤯 Not just better in terms of completing real-world complex tasks: software engineering (locating code) and deep research. But also substantially better in terms of proactively asking for clarifying questions when necessary…

AI agents are supposed to collaborate with us to solve real-world problems, but can they really? Even the most advanced models can still give us frustrating moments when working with them deeply. We argue that real-world deployment requires more than productivity (e.g., task…



Ajitesh Shukla reposted

Comments welcome! With @RobinSFWalters and @yuqirose. “Symmetry in Neural Network Parameter Spaces” arxiv.org/abs/2506.13018


Ajitesh Shukla reposted

Parameter space symmetry describes transformation of parameters that leaves the loss unchanged.
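A concrete instance (a toy NumPy check, not code from the paper): permute the hidden units of a two-layer ReLU network and apply the inverse permutation to the outgoing weights. The parameters change, but the function, and therefore any loss computed from it, does not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-layer MLP f(x) = W2 @ relu(W1 @ x), with made-up sizes.
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(3, 8))
x = rng.normal(size=4)

relu = lambda z: np.maximum(z, 0)
f = lambda A, B: B @ relu(A @ x)

# Permutation symmetry: W1 -> P @ W1 reorders the hidden units,
# W2 -> W2 @ P.T undoes the reordering on the way out.
P = np.eye(8)[rng.permutation(8)]
assert np.allclose(f(W1, W2), f(P @ W1, W2 @ P.T))
# Same outputs on every input => identical loss at two different points
# in parameter space.
```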


Ajitesh Shukla reposted

There’s lots of symmetry in neural networks! 🔍 We survey where they appear, how they shape loss landscapes and learning dynamics, and applications in optimization, weight space learning, and much more. ➡️ Symmetry in Neural Network Parameter Spaces arxiv.org/abs/2506.13018


Ajitesh Shukla reposted

🤗Will present our #EMNLP2025 paper this morning! TLDR: Beyond KV Cache: New Insights on LLM Sparsity. This paper offers not just an efficient inference framework, but a new theoretical lens to understand how information flows inside LLMs. Come & talk to us if you are interested!


Ajitesh Shukla reposted

I think it's pretty wild that there's still no (publicly known) model larger than the Switch Transformer at 1.6T params, which was:
- trained in 2020, i.e. 5y ago
- open-weights
- by Barret, Liam, and Noam, what a line-up!


Apple just leaked the size of Gemini 3 Pro - 1.2T params



Ajitesh Shukla reposted

this is what's called "Noam's touch" btw


We ran 1,680 tournaments (25,200 rounds) to evaluate 8 frontier models. Claude Sonnet 4.5 tops the leaderboard, but no model wins across all arenas! GPT-5 dominates Poker. o3 crushes Halite. Claude owns Core War. Every arena reveals different strengths.



Ajitesh Shukla reposted

🧭 Siren’s Song in the AI Ocean — our survey on LLM hallucination will be presented at #EMNLP2025!

We map the space of:
• Hallucination phenomena
• Detection & explanation
• Mitigation strategies & future directions

📍 Poster Session 7 (Hall C)
🗓️ Fri, Nov 7 · 14:00–15:30…


Ajitesh Shukla reposted

Were you longing for counterintuitive and intriguing results? I have a surprising discovery on core principles of reinforcement learning that directly scales to high-dimensional MDPs! ✨NeurIPS Spotlight✨ Check out: Counteractive Reinforcement Learning #NeurIPS2025


Ajitesh Shukla reposted

Don't sleep on PipelineRL -- this is one of the biggest jumps in compute efficiency of RL setups that we found in the ScaleRL paper (also validated by Magistral & others before)! What's the problem PipelineRL solves? In RL for LLMs, we need to send weight updates from trainer to…


In-flight weight updates have gone from a “weird trick” to a must to train LLMs with RL in the last few weeks. If you want to understand the on-policy and throughput benefits here’s the CoLM talk @DBahdanau and I gave: youtu.be/Z1uEuRKACRs
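A toy concurrency sketch of what "in-flight" means here (hypothetical, not PipelineRL's implementation): the trainer keeps publishing new weight versions while the generator keeps decoding, so a sequence in progress can span several policy versions instead of generation stalling for every update.

```python
import threading
import time

# Shared "weights": here just a version counter behind a lock.
latest = {"version": 0}
lock = threading.Lock()

def trainer():
    for v in range(1, 6):
        time.sleep(0.1)  # pretend to run an optimizer step
        with lock:
            latest["version"] = v  # publish updated weights in-flight

def generator(out):
    for step in range(20):
        with lock:
            v = latest["version"]  # pick up the freshest weights mid-sequence
        out.append((step, v))
        time.sleep(0.03)  # pretend to decode one token

tokens = []
t = threading.Thread(target=trainer)
g = threading.Thread(target=generator, args=(tokens,))
t.start(); g.start(); t.join(); g.join()
print(tokens)  # token steps tagged with the policy version that produced them
```

The throughput win is that the generator never idles waiting for a synchronized weight swap; the on-policy question is exactly the mixing of versions within one sequence that the tags above make visible.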

[Linked talk: "Pipeline RL: RL training speed through the roofline" on YouTube]