#reinforcementlearning search results

AGI.Eth

Sep 28

UserRL: Training Interactive User-Centric Agent via Reinforcement Learning Qian et al.: arxiv.org/abs/2509.19736 #ArtificialIntelligence #DeepLearning #ReinforcementLearning

@Vishal02__

Alright let's do this 🔥 Building Flappy Bird from scratch in Unity, then training an AI to master it. Sharing every win, every bug, every "why isn't this working" moment. Starts now. Let's see where this goes. Follow for the journey → #ReinforcementLearning #gamedev

Chenlu Ye

@ye_chenlu

Sep 5

PROF🌀Right answer, flawed reason?🤔🌀 📄arxiv.org/pdf/2509.03403 Excited to share our work: PROF-PRocess cOnsistency Filter! 🚀 Challenge: ORM is blind to flawed logic, and PRM suffers from reward hacking. Our method harmonizes strengths of PRM & ORM. #LLM #ReinforcementLearning

AGI.Eth

@ceobillionaire

Nov 10

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Yue et al.: arxiv.org/abs/2504.13837 #ArtificialIntelligence #DeepLearning #ReinforcementLearning

Deluthium

@Deluthium

Oct 15

What if markets could think before they move? At #Deluthium, we treat liquidity as signal, not noise. #ReinforcementLearning turns execution into adaptive intelligence. Brought to you by the Onchain Flash Boys. Powered by RL.

DeepPCB

@DeepPCB

Oct 23

Manual PCB design can’t keep up with today’s complexity. ✨ AI can. 👉 Discover how @DeepPCB uses reinforcement learning to deliver DRC-clean layouts in hours in our new White Paper: link in comment! #PCBDesign #AIinEngineering #ReinforcementLearning #InstaDeep

Dr. Ganapathi Pulipaka 🇺🇸

@gp_pulipaka

Nov 9

Intro: #ReinforcementLearning. #BigData #Analytics #DataScience #AI #MachineLearning #IoT #IIoT #Python #RStats #TensorFlow #Java #JavaScript #ReactJS #CloudComputing #Serverless #DataScientist #Linux #Programming #Coding #100DaysofCode geni.us/Intro-RL

AI Ghana

@AIGhana1

Nov 12

Reinforcement learning helps an agent learn through trial, error and experience. What real world use of RL do you find most interesting? #AIGhana #ReinforcementLearning #MachineLearning #AI

kedar

@_kedar_18

Sep 26

Day 12 🦾 of becoming an ML Beast: Explored Reinforcement Learning – where an agent interacts with an environment, takes actions, and learns from rewards to improve decisions over time. #MachineLearning #ReinforcementLearning #AI #Learninginpublic #100daysofcoding

T.Yamazaki

@ZappyZappy7

Nov 5

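Development of a general-purpose robot hand: it adapts to a variety of scenarios, such as picking up soft objects, restocking products on shelves, and carrying bags. youtu.be/8gQ7qVmcKs0 #RobotHand #dexterous #ReinforcementLearning #EmbodiedAI #VLA #GeneralPurpose #haptic #touching #tactile #teleoperation #PsiBot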

IMAGINE AI LIVE

@ImagineAILive

Sep 25

🚨 @CoreWeave x @OpenPipeAI 🚨 Reinforcement learning just got a hyperscaler boost. At Imagine AI Live 25, OpenPipe CEO Kyle Corbitt showed how RL: ⚡ Turns prototypes → production 📧 Built an email assistant that beat frontier models #ReinforcementLearning #AIinnovation

Antonio Lieto

@antoniolieto

Nov 11

#Attention + #ReinforcementLearning is not all you need: arxiv.org/abs/2504.13837

Deluthium

@Deluthium

Oct 21

Fast markets die first. Smart markets survive. #Deluthium uses #ReinforcementLearning to adapt in real time. Brought to you by the Onchain Flash Boys. Powered by RL.

PRX Life

@PRX_Life

Oct 27

A new theory based on #ReinforcementLearning reveals the optimal pairing relationship between signal sensing and modulation and provides a new way to understand collective information processing in populations of cells. 🔗 go.aps.org/46RIIhh

JohnSnowLabs

@JohnSnowLabs

Oct 16

What happens when reinforcement learning meets agentic AI? @Mastercard's Garima Arora shares insights on self-improving autonomous systems: hubs.li/Q03NN_Pv0 #AgenticAI #ReinforcementLearning

Deluthium

@Deluthium

Oct 7

What if liquidity could evolve on its own, adjusting, optimizing, adapting? #Deluthium doesn't just route your trade, we transform it into an intelligent liquidity signal. #ReinforcementLearning meets market-making. Brought to you by the Onchain Flash Boys. Powered by RL.

Deluthium

@Deluthium

Oct 9

Every swap, limit order, cross-chain action, it’s input to the #ReinforcementLearning engine. In #Deluthium, your request becomes part of the learning feedback loop. No black boxes. Full transparency. Brought to you by the Onchain Flash Boys. Powered by RL.

Amir

@amir81ak

Oct 15

SAPO (Swarm sAmpling Policy Optimization) redefines LLM post-training through collective reinforcement learning — models learn together, share insights, and reach 94% higher rewards with less compute. 🧠🤝 🔗 blog.gensyn.ai/sapo-efficient… #AI #LLMs #ReinforcementLearning #SAPO

amir81ak's tweet card. This is an academic paper describing SAPO, a meta-algorithm that wraps around your preferred policy gradient algorithm.

SAPO, Efficient LM Post-Training with Collective RL

Source: blog.gensyn.ai

J Synchrotron Rad

@JSynchrotronRad

3 h

Boltz, Ratner and Webb: Adaptive X-ray imaging with reinforcement learning #SynchrotronRadiation #XRayImaging #ReinforcementLearning @SLAClab... #IUCr journals.iucr.org/paper?S1600577…

Alistaired

@Alistaired_Van

7 h

Policy Iteration Reinforcement Learning: ouariachirafik.github.io/Compsci/Reinfo… #ReinforcementLearning #AI

COLLECTiEF project

@CollectiefP

18 h

📣New publication alert! How can #CollectiveIntelligence & #ReinforcementLearning boost building #energy efficiency & flexibility? Tested in a real living lab at G2Elab🇫🇷, CIRLEM achieved: ⚡-18% energy use 🔥-32% cooling, -5% heating 📉-50% peak power 🔗sciencedirect.com/science/articl…

Frank La Vigne

@FrankDigsData

Nov 12

Reinforcement Learning (RL) isn't just static labels—it's iterative feedback. Like a game, models learn by interacting, acting, and receiving rewards. The better the action, the better the reward. #ReinforcementLearning #AI

Griffintaur

@griffintaur

Nov 12

🚀 Interesting research: Grounding Computer Use Agents on Human Demonstrations Read more: huggingface.co/papers/2511.07… #LLM #ReinforcementLearning #MLResearch

Paper page - Grounding Computer Use Agents on Human Demonstrations

Source: huggingface.co

Miklós Z. Horváth

@mzhorvath

Nov 12

This was a monumental team effort at Google DeepMind. I feel privileged to have worked alongside such a brilliant and dedicated group of colleagues. A huge thank you to the entire team and the project leads for making this work possible. #AlphaProof #AI #ReinforcementLearning…

Vaijayanth

@vaijayanth

Nov 12

AI gets smarter only when its outputs are measured. Feedback from human evaluation, user interaction, and system telemetry forms the backbone of continual optimization. No feedback, no intelligence. #AIQuality #ReinforcementLearning #AIOps

yuki

@Nakocha37254585

Nov 12

The swarm evolves — harder tasks, smarter agents, stronger results 🔁 Join the next wave of decentralized AI learning 🌐 🔗 blog.gensyn.ai/codezero-exten… #AI #ReinforcementLearning #SwarmLearning #CodeZero #RLswarm #GensynAI.

Alistaired

@Alistaired_Van

Nov 12

Introduction to Reinforcement learning: ouariachirafik.github.io/Compsci/Reinfo… #ReinforcementLearning #AI

Cátedra Ciberseguridad INCIBE-UEx-EPCC

@ciber_uex

Nov 12

Our colleague Hubert presented our #ReinforcementLearning routing protocol for #IoUT at #JITEL2025 (Cáceres): higher PDR, lower variance, and stable performance under mobility compared with baselines. Thanks for the feedback! 📊🌊

Chanakya Ekbote

@thecekbote

Nov 12

8/ GRPO trains models to reason. MURPHY trains models to reflect. A step toward robust, self-correcting code generation. #ReinforcementLearning #LLMs #AWS #AmazonScience

Daivik Hirpara

@Hirpara_Daivik

Nov 12

Reward drives behavior. Gradients drive rewards. Welcome to RL. #ReinforcementLearning #AI #MachineLearning #DeepRL

Valōtan Māel-T'lawn Phurkrow

@phurkrow

Nov 11

Here's the hook: Zazu bifurcates execution (HRL Body) & interpretability (LLM Mind) for resilient trading agents, using MOO to self-evolve without forgetting. Full paper soon! @arXiv_Lab @RLHF #csLG #ReinforcementLearning

Valōtan Māel-T'lawn Phurkrow

@phurkrow

Nov 11

@phurkrow submitting my first paper to arXiv cs.LG: "The Zazu Architecture" - a hybrid HRL-LLM framework to mitigate catastrophic forgetfulness in financial trading agents. Seeking endorsement—DM or reply if you can help! #csLG #ReinforcementLearning #AI

Dr. Ganapathi Pulipaka 🇺🇸

@gp_pulipaka

Nov 11

Asynchronous #ReinforcementLearning #Algorithm! #BigData #Analytics #DataScience #AI #MachineLearning #IoT #IIoT #PyTorch #Python #RStats #TensorFlow #ReactJS #GoLang #CloudComputing #Serverless #DataScientist #Linux #Programming #Coding #100DaysofCode geni.us/Asynchronous-R…

Dr. Ganapathi Pulipaka 🇺🇸

@gp_pulipaka

Nov 11

Intro: A Journey Toward #ReinforcementLearning. #BigData #Analytics #DataScience #AI #MachineLearning #IoT #IIoT #Python #RStats #TensorFlow #Java #JavaScript #ReactJS #CloudComputing #Serverless #DataScientist #Linux #Programming #Coding #100DaysofCode geni.us/Journey--RL

Aditya Choudhary

@buildaditya

Nov 11

Most AI agents don’t fail because the model is weak — they fail because the feedback loop is. If you can’t measure progress, your agent can’t learn. #AI #ReinforcementLearning #Agents

Daivik Hirpara

@Hirpara_Daivik

Nov 11

Before GPUs and gradient descent, there was MENACE, the Machine Educable Noughts And Crosses Engine. A matchbox RL system that learned Tic-Tac-Toe with beads. The first real machine that learned. #AI #ReinforcementLearning #HistoryOfAI

Technion - Reinforcement Learning Research Labs

@Technion_RL

ReinforcementLearning

@ReinforcementL3

Yu-Xiang Wang

@yuxiangw_cs

ReinforcementLearning

@ReinforcementL

Ofir Nachum

@ofirnachum

Sasha Alexander Lambert

@SashLambert

Daniel J. Mankowitz

@DJ_Mankowitz

James

@jmac_ai

CogitAI

@Cogitai

Joseph Cox

@JosephJohnCox

Seydina Ndiaye

@seysoosey

Jacqueline Isabelle Forien

@JackieForien

robertjneal

@robertjneal

Ashish Umre

@hormigaloca

AGI.Eth

@ceobillionaire

Jun 1

A Tutorial on Meta-Reinforcement Learning Beck et al.: arxiv.org/abs/2301.08028 #ArtificialIntelligence #MetaLearning #ReinforcementLearning

AGI.Eth

@ceobillionaire

May 28

Reinforcing General Reasoning without Verifiers Zhou et al.: arxiv.org/abs/2505.21493 #ArtificialIntelligence #DeepLearning #ReinforcementLearning

SA News Channel

@SatlokChannel

Aug 11

7/10 Reinforcement Learning trains agents through trial and error to maximize rewards. It’s used in gaming, robotics, and real-time decision systems like traffic control. #ReinforcementLearning #AI #SmartSystems #DeepLearning #GameAI #AutonomousTech

Zichen Chen (🐱,💖)

@my_cat_can_code

Sep 26, 2024

🚀 Exciting News! Our paper has been accepted at @NeurIPSConf! 🎉 We introduce State Chrono Representation (SCR) -- a novel approach in #ReinforcementLearning. SCR integrates long-term temporal dynamics and cumulative rewards into state representations, addressing key challenges…

Glen Berseth

@GlenBerseth

Apr 15

I am hiring another postdoc for my lab. Consider applying if you have #foundationalmodels, #robotics, or #reinforcementlearning skills. You will help create generalist real-world agents (robots) with a team of 20 working on these problems and overly competitive go-karting.

AGI.Eth

@ceobillionaire

May 4

Automated Design of Agentic Systems Shengran Hu, Cong Lu, Jeff Clune: arxiv.org/abs/2408.08435 #ArtificialIntelligence #DeepLearning #ReinforcementLearning

Dr. Ganapathi Pulipaka 🇺🇸

@gp_pulipaka

Oct 19

Deep #ReinforcementLearning for #Keras! #BigData #Analytics #DataScience #AI #MachineLearning #IoT #IIoT #Python #RStats #TensorFlow #Java #JavaScript #ReactJS #CloudComputing #Serverless #DataScientist #Linux #Programming #Coding #100DaysofCode geni.us/DRL-Keras

AGI.Eth

@ceobillionaire

Jul 13

The Bitter Lesson "Search and learning are general purpose methods that continue to scale with increased computation, even as the available computation becomes very great." — Richard Sutton Rich Sutton: incompleteideas.net/IncIdeas/Bitte… #ReinforcementLearning

Mike Erlihson, Math PhD, AI

@MikeE_3_14

Sep 19

🔥: New episode of DataScienceDecoded! 😎 We went back to 1957 with Richard Bellman's legendary paper: A Markovian Decision Process. This is where the principle of optimality, the Bellman equation, and the MDP were born, the foundation of modern RL, from Q-learning to AlphaGo #AI #ReinforcementLearning #Bellman