#reinforcementlearning search results

UserRL: Training Interactive User-Centric Agent via Reinforcement Learning Qian et al.: arxiv.org/abs/2509.19736 #ArtificialIntelligence #DeepLearning #ReinforcementLearning

Alright, let's do this 🔥 Building Flappy Bird from scratch in Unity, then training an AI to master it. Sharing every win, every bug, every "why isn't this working" moment. Starts now. Let's see where this goes. Follow for the journey → #ReinforcementLearning #gamedev

PROF🌀Right answer, flawed reason?🤔🌀 📄arxiv.org/pdf/2509.03403 Excited to share our work: PROF-PRocess cOnsistency Filter! 🚀 Challenge: ORM is blind to flawed logic, and PRM suffers from reward hacking. Our method harmonizes strengths of PRM & ORM. #LLM #ReinforcementLearning

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Yue et al.: arxiv.org/abs/2504.13837 #ArtificialIntelligence #DeepLearning #ReinforcementLearning

What if markets could think before they move? At #Deluthium, we treat liquidity as signal, not noise. #ReinforcementLearning turns execution into adaptive intelligence. Brought to you by the Onchain Flash Boys. Powered by RL.

Manual 𝐏𝐂𝐁 𝐝𝐞𝐬𝐢𝐠𝐧 can’t keep up with today’s complexity. ✨ 𝐀𝐈 𝐜𝐚𝐧. 👉 Discover how @DeepPCB uses reinforcement learning to deliver DRC-clean layouts in hours in our new White Paper: link in comment! #PCBDesign #AIinEngineering #ReinforcementLearning #InstaDeep

Reinforcement learning helps an agent learn through trial, error and experience. What real world use of RL do you find most interesting? #AIGhana #ReinforcementLearning #MachineLearning #AI

Day 12 🦾 of becoming an ML Beast: Explored Reinforcement Learning – where an agent interacts with an environment, takes actions, and learns from rewards to improve decisions over time. #MachineLearning #ReinforcementLearning #AI #Learninginpublic #100daysofcoding
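
The loop described in this post (an agent acts, the environment returns a reward, and the agent updates its estimates) can be sketched in a few lines. This is a minimal illustration, not anything from the post itself: the bandit environment, the reward means, and the name `run_bandit` are all assumptions made for the example.

```python
import random

def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy multi-armed bandit (hypothetical environment)."""
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions   # agent's running estimate of each action's reward
    counts = [0] * n_actions        # how often each action has been tried

    for _ in range(steps):
        # Agent picks an action: explore at random with probability epsilon, else exploit.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])

        # Environment responds with a noisy reward for that action.
        reward = rng.gauss(true_means[action], 1.0)

        # Agent learns: incremental average of the rewards seen for this action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts

# Assumed reward means; the last action is the best one by construction.
estimates, counts = run_bandit([0.2, 0.5, 2.0])
print("tries per action:", counts)
print("estimated rewards:", [round(e, 2) for e in estimates])
```

Over time the agent should concentrate its tries on the action with the highest average reward, which is the "improve decisions over time" part of the description.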

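Development of a general-purpose robot hand. It adapts to a variety of scenarios, such as pinching soft objects, restocking shelves with products, and carrying bags. youtu.be/8gQ7qVmcKs0 #RobotHand #dexterous #ReinforcementLearning #EmbodiedAI #VLA #GeneralPurpose #haptic #touching #tactile #teleoperation #PsiBot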


🚨 @CoreWeave x @OpenPipeAI 🚨 Reinforcement learning just got a hyperscaler boost. At Imagine AI Live 25, OpenPipe CEO Kyle Corbitt showed how RL: ⚡ Turns prototypes → production 📧 Built an email assistant that beat frontier models #ReinforcementLearning #AIinnovation

Fast markets die first. Smart markets survive. #Deluthium uses #ReinforcementLearning to adapt in real time. Brought to you by the Onchain Flash Boys. Powered by RL.

A new theory based on #ReinforcementLearning reveals the optimal pairing relationship between signal sensing and modulation and provides a new way to understand collective information processing in populations of cells. 🔗 go.aps.org/46RIIhh

What happens when reinforcement learning meets agentic AI? @Mastercard's Garima Arora shares insights on self-improving autonomous systems: hubs.li/Q03NN_Pv0 #AgenticAI #ReinforcementLearning

What if liquidity could evolve on its own, adjusting, optimizing, adapting? #Deluthium doesn't just route your trade, we transform it into an intelligent liquidity signal. #ReinforcementLearning meets market-making. Brought to you by the Onchain Flash Boys. Powered by RL.

Every swap, limit order, cross-chain action, it’s input to the #ReinforcementLearning engine. In #Deluthium, your request becomes part of the learning feedback loop. No black boxes. Full transparency. Brought to you by the Onchain Flash Boys. Powered by RL.

SAPO (Swarm sAmpling Policy Optimization) redefines LLM post-training through collective reinforcement learning — models learn together, share insights, and reach 94% higher rewards with less compute. 🧠🤝 🔗 blog.gensyn.ai/sapo-efficient… #AI #LLMs #ReinforcementLearning #SAPO


📣New publication alert! How can #CollectiveIntelligence & #ReinforcementLearning boost building #energy efficiency & flexibility? Tested in a real living lab at G2Elab🇫🇷, CIRLEM achieved: ⚡-18% energy use 🔥-32% cooling, -5% heating 📉-50% peak power 🔗sciencedirect.com/science/articl…

Reinforcement Learning (RL) isn't just static labels—it's iterative feedback. Like a game, models learn by interacting, acting, and receiving rewards. The better the action, the better the reward. #ReinforcementLearning #AI


This was a monumental team effort at Google DeepMind. I feel privileged to have worked alongside such a brilliant and dedicated group of colleagues. A huge thank you to the entire team and the project leads for making this work possible. #AlphaProof #AI #ReinforcementLearning


AI gets smarter only when its outputs are measured. Feedback from human evaluation, user interaction, and system telemetry forms the backbone of continual optimization. No feedback, no intelligence. #AIQuality #ReinforcementLearning #AIOps


The swarm evolves — harder tasks, smarter agents, stronger results 🔁 Join the next wave of decentralized AI learning 🌐 🔗 blog.gensyn.ai/codezero-exten… #AI #ReinforcementLearning #SwarmLearning #CodeZero #RLswarm #GensynAI.


Our colleague Hubert presented our #ReinforcementLearning routing protocol for #IoUT at #JITEL2025 (Cáceres): higher PDR, lower variance, and stable performance under mobility compared with baselines. Thanks for the feedback! 📊🌊

8/ GRPO trains models to reason. MURPHY trains models to reflect. A step toward robust, self-correcting code generation. #ReinforcementLearning #LLMs #AWS #AmazonScience


Reward drives behavior. Gradients drive rewards. Welcome to RL. #ReinforcementLearning #AI #MachineLearning #DeepRL

Here's the hook: Zazu bifurcates execution (HRL Body) & interpretability (LLM Mind) for resilient trading agents, using MOO to self-evolve without forgetting. Full paper soon! @arXiv_Lab @RLHF #csLG #ReinforcementLearning


@phurkrow submitting my first paper to arXiv cs.LG: "The Zazu Architecture", a hybrid HRL-LLM framework to mitigate catastrophic forgetting in financial trading agents. Seeking endorsement: DM or reply if you can help! #csLG #ReinforcementLearning #AI

Most AI agents don’t fail because the model is weak — they fail because the feedback loop is. If you can’t measure progress, your agent can’t learn. #AI #ReinforcementLearning #Agents


Before GPUs and gradient descent, there was MENACE, the Machine Educable Noughts And Crosses Engine. A matchbox RL system that learned Tic-Tac-Toe with beads. The first real machine that learned. #AI #ReinforcementLearning #HistoryOfAI


7/10 Reinforcement Learning trains agents through trial and error to maximize rewards. It’s used in gaming, robotics, and real-time decision systems like traffic control. #ReinforcementLearning #AI #SmartSystems #DeepLearning #GameAI #AutonomousTech

🚀 Exciting News! Our paper has been accepted at @NeurIPSConf! 🎉 We introduce State Chrono Representation (SCR) -- a novel approach in #ReinforcementLearning. SCR integrates long-term temporal dynamics and cumulative rewards into state representations, addressing key challenges…

I am hiring another postdoc for my lab. Consider applying if you have #foundationalmodels, #robotics, or #reinforcementlearning skills. You will help create generalist real-world agents (robots) with a team of 20 working on these problems and overly competitive go-karting.

Automated Design of Agentic Systems Shengran Hu, Cong Lu, Jeff Clune: arxiv.org/abs/2408.08435 #ArtificialIntelligence #DeepLearning #ReinforcementLearning

The Bitter Lesson "Search and learning are general purpose methods that continue to scale with increased computation, even as the available computation becomes very great." — Richard Sutton Rich Sutton: incompleteideas.net/IncIdeas/Bitte… #ReinforcementLearning

🔥 New episode of DataScienceDecoded! 😎 We went back to 1957 with Richard Bellman's legendary paper, A Markovian Decision Process. From it came the principle of optimality, the Bellman equation, and the MDP: the foundation of modern RL, from Q-learning to AlphaGo. #AI #ReinforcementLearning #Bellman
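
For readers who have not seen it, the Bellman optimality equation this post refers to is usually written as below. The notation is the common textbook convention and is an assumption here, not taken from the episode: $V^*$ is the optimal value function, $R(s,a)$ the expected immediate reward, $P(s' \mid s,a)$ the transition probability, and $\gamma \in [0,1)$ the discount factor.

```latex
V^*(s) = \max_{a} \Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \Big]
```

Read as a sentence: the value of a state is the best achievable immediate reward plus the discounted value of wherever that action leads, which is the principle of optimality in equation form.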
