Satyabrat Singh
@satyabratsingh
Interested in Software, ML, and Quant Research. MSc in ML from UCL, MSc in Maths from IIT
Some learnings while training large #NeuralNets on quantitative datasets:
- Data is very important: ensure that it is correctly distributed and carries a variety of signals
- Normalise features before feeding them to the NN
- Condense features before feeding them in (see the sketch below)
- Attention is difficult
- more to come :)…
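A minimal sketch of the normalise-then-condense step, assuming standardisation plus PCA as the condensing method (my choice of tools; the tweet doesn't name one):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 200))   # raw features: (samples, features)

X_norm = StandardScaler().fit_transform(X)                 # zero mean, unit variance per feature
X_condensed = PCA(n_components=32).fit_transform(X_norm)   # condense 200 -> 32 features

print(X_condensed.shape)  # (10000, 32) -- this is what the NN sees
```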
So, some insights for me from Ilya's podcast:
- Continual learning would be more effective, as it is the closest analogue to how humans evolved
- Some value functions might be ingrained in human genes
- We might have hit the limits of scaling, so we need new ways of pre-training
Following the Text Gradient at Scale
We wrote a @StanfordAILab blog post about the limitations of RL methods that learn solely from scalar rewards, plus a new method that addresses this.
Blog: ai.stanford.edu/blog/feedback-…
Paper: arxiv.org/abs/2511.07919
So contextual retrieval turns out to be effective: even with very granular chunking, search performance improved and the LLM judge gave higher scores. More details at anthropic.com/engineering/co…
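A rough sketch of the contextual-retrieval idea: before embedding, each chunk gets a short LLM-generated context that situates it within the full document. The `llm` and `embed` callables below are hypothetical placeholders supplied by the caller, not any specific API:

```python
from typing import Callable

CONTEXT_PROMPT = (
    "Document:\n{document}\n\n"
    "Here is a chunk from the document:\n{chunk}\n\n"
    "Write a short context that situates this chunk within the overall document."
)

def contextualize_chunks(
    document: str,
    chunks: list[str],
    llm: Callable[[str], str],            # hypothetical: prompt -> completion
    embed: Callable[[str], list[float]],  # hypothetical: text -> embedding vector
):
    """Prepend an LLM-generated context to each chunk, then embed the result."""
    out = []
    for chunk in chunks:
        context = llm(CONTEXT_PROMPT.format(document=document, chunk=chunk))
        text = f"{context}\n\n{chunk}"
        out.append((text, embed(text)))
    return out
```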
#Gemini3 is indeed good at reasoning tasks. It helped me optimize a neural network: rather than optimizing the network itself, it suggested we could tweak the loss function. Pretty smart!!!
If you want to learn AI from the experts, keep reading. 💡 Together with @UCL, we made a free AI Research Foundations curriculum – available now on Google Skills. With lessons from a Gemini Lead like @OriolVinyalsML, you'll explore how to code better, fine-tune an AI model and…
Glad to introduce our new work "Game-Theoretic Regularized Self-Play Alignment of Large Language Models". arxiv.org/abs/2503.00030 🎉 We introduce RSPO, a general, provably convergent framework to bring different regularization strategies into self-play alignment. 🧵👇
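For intuition, here is one plausible shape of a regularized self-play objective (my sketch, assuming a pairwise preference model P and KL regularization of both players toward a reference policy; the paper's exact formulation may differ):

```latex
\pi^\star \in \arg\max_{\pi}\,\min_{\pi'}\;
\mathbb{E}_{y \sim \pi,\, y' \sim \pi'}\!\big[P(y \succ y')\big]
\;-\; \lambda\,\mathrm{KL}\!\big(\pi \,\|\, \pi_{\mathrm{ref}}\big)
\;+\; \lambda\,\mathrm{KL}\!\big(\pi' \,\|\, \pi_{\mathrm{ref}}\big)
```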
Thrilled to introduce our test-time algorithm for robust multi-objective alignment! Huge kudos to my incredible collaborators for making this happen!
❓No clue about the priorities of the objectives? ❗️Focus on robustness at test time! 🚀 Robust Multi-Objective Decoding (RMOD) is a novel inference-time alignment algorithm that produces responses that remain robust across all of the objectives under consideration.
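One plausible formalization of that robustness (my sketch, not necessarily the paper's exact objective): decode the response that maximizes the worst case over objective weightings, which for rewards $r_1,\dots,r_K$ collapses to a max-min over objectives, since a linear function over the simplex $\Delta_K$ is minimized at a vertex:

```latex
y^\star \in \arg\max_{y}\,\min_{w \in \Delta_K}\,\sum_{k=1}^{K} w_k\, r_k(y)
\;=\; \arg\max_{y}\,\min_{1 \le k \le K} r_k(y)
```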
🚀Sampling = Reinforcement Learning🤖 This means you can train a neural sampler using RL! We introduce the Value Gradient Sampler (VGS)—a novel diffusion sampler that leverages value functions to generate samples from an unnormalized density. 📄 Paper: arxiv.org/abs/2502.13280
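For intuition on the sampling = RL framing (my paraphrase under standard assumptions, not necessarily the paper's exact objective): for a target density $p(x) \propto e^{-E(x)}$, training a sampler $q$ to minimize the KL to the target is the same as entropy-regularized RL with terminal reward $-E(x_T)$:

```latex
\min_{q}\,\mathrm{KL}\big(q(x_T)\,\|\,p\big)
\;\Longleftrightarrow\;
\max_{q}\;\mathbb{E}_{q}\big[-E(x_T)\big] \;+\; \mathcal{H}\big(q(x_T)\big)
```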
(1/9) Flying to #NeurIPS2024? Our paper arxiv.org/abs/2405.20304 and blog shorturl.at/aIShm might be an interesting read on your long flight to Vancouver! Accepted at #NeurIPS2024, and I'm excited to present it as a poster on 13th December (1-4pm)!
On my way to #NeurIPS2024 ✈️ We are presenting several papers this year, including REDUCER, ARDT, GR-DPO/IPO, invariant BO. I’d love to connect and chat about topics like Alignment, RL/RLHF, LLM deception, robustness, and reasoning!
🚀🚀🚀 Introducing the Adversarially Robust Decision Transformer (ARDT) 🚀🚀🚀 The first Decision Transformer for adversarial game-solving and robust decision-making, accepted at #NeurIPS2024. 🚨 The change is slight: replace returns-to-go with the minimax return. 🚨 Improve…
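To make the swap concrete (the returns-to-go definition is the standard one; the minimax form is my hedged reading of the tweet, not the paper's exact expression): a vanilla Decision Transformer conditions on the observed return-to-go, while ARDT conditions on the worst-case return under an adversary:

```latex
\hat{R}_t = \sum_{t'=t}^{T} r_{t'}
\quad\longrightarrow\quad
\tilde{R}_t = \min_{\text{adversary}}\;\max_{\pi}\;\mathbb{E}\!\left[\sum_{t'=t}^{T} r_{t'}\right]
```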
15 years ago today, I got a second chance at life… never realized how close death could be #MumbaiTerrorAttack #GratefulForLife
📣 If you've got an objective that exhibits symmetries, you should be using invariant kernel BO 📣 🚀 More sample efficient than constrained/naive BO! 🚀 More compute efficient than data augmentation! 🧵 1/4 #NeurIPS2024 #BayesianOptimisation #ai
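A minimal sketch of the group-averaging construction behind invariant kernels (a standard construction and my own example, not necessarily the paper's): average a base kernel over the symmetry group, so every function in the induced GP prior is invariant under the group:

```python
import numpy as np

def rbf(x, y, ls=1.0):
    # isotropic RBF base kernel on R^d
    return np.exp(-np.sum((x - y) ** 2) / (2 * ls**2))

def invariant_kernel(x, y, group, base=rbf):
    # k_inv(x, y) = (1/|G|) * sum_g base(x, g(y)).
    # For an isotropic base kernel and a group of orthogonal maps,
    # this single average equals the full double average over G x G,
    # so it stays symmetric and positive semi-definite.
    return float(np.mean([base(x, g(y)) for g in group]))

# Example: objective with sign-flip symmetry f(x) = f(-x)
group = [lambda v: v, lambda v: -v]
x, y = np.array([1.0, 2.0]), np.array([-1.0, -2.0])
print(invariant_kernel(x, y, group))   # equals invariant_kernel(x, -y, group)
```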
This book is an absolute gem for understanding the intricacies of neural nets. Huge thanks to @SimonScardapane #MachineLearning #DeepLearning #AI
DeepSets are useful where we need permutation invariance. Imagine a batch of data with shape (n, m): we split this batch into k sets, each of shape (n/k, m), feed each set through a neural network g, and aggregate the outputs as f(X) = ∑(i=1 to k) g(x_i). Because the sum ignores ordering, the result is invariant to permuting the sets. This method captures the essence…
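A minimal DeepSets sketch in PyTorch, in the usual phi/rho form (which adds a post-pooling network rho on top of the sum described above):

```python
import torch
import torch.nn as nn

class DeepSet(nn.Module):
    def __init__(self, in_dim: int, hidden: int, out_dim: int):
        super().__init__()
        # phi: encodes each set element independently
        self.phi = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        # rho: maps the pooled representation to the output
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, set_size, in_dim)
        h = self.phi(x)          # per-element features
        pooled = h.sum(dim=1)    # sum over the set dim => permutation invariant
        return self.rho(pooled)

model = DeepSet(in_dim=8, hidden=64, out_dim=1)
x = torch.randn(4, 10, 8)
perm = x[:, torch.randperm(10), :]                        # shuffle set elements
print(torch.allclose(model(x), model(perm), atol=1e-5))   # True
```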
The 2nd edition of my 477-page #ReinforcementLearning textbook for my course at ASU has just been published and is freely available on the book's website web.mit.edu/dimitrib/www/R…, which also contains slides, video lectures, and supporting material.
Competition Launch Alert! Real-Time Market Data Forecasting, hosted by @JaneStreetGroup
🎯 Challenge: Develop an ML forecasting model using real-world data derived from production systems
💰 Prize Pool: $120,000
⏰ Entry Deadline: 12/30/2024
Explore the difficult dynamics that shape financial…