#reinforcementlearning search results

PROF🌀Right answer, flawed reason?🤔🌀 📄arxiv.org/pdf/2509.03403 Excited to share our work: PROF-PRocess cOnsistency Filter! 🚀 Challenge: ORM is blind to flawed logic, and PRM suffers from reward hacking. Our method harmonizes strengths of PRM & ORM. #LLM #ReinforcementLearning

🚗 Sim-to-Real Application of Reinforcement Learning Agents for Autonomous, Real Vehicle Drifting Read: mdpi.com/2624-8921/6/2/… #AutonomousDrifting #ReinforcementLearning

UserRL: Training Interactive User-Centric Agent via Reinforcement Learning Qian et al.: arxiv.org/abs/2509.19736 #ArtificialIntelligence #DeepLearning #ReinforcementLearning

Deep #ReinforcementLearning Hands-On — Practical easy-to-follow guide to RL from Q-learning and DQNs to PPO and RLHF: amzn.to/3MV9o60 [3rd Edition] v/ @PacktDataML —— #AI #MachineLearning #DeepLearning #DataScience #DataScientist —— 𝓚𝓮𝔂 𝓕𝓮𝓪𝓽𝓾𝓻𝓮𝓼: 🟢Learn with…

Scientists have designed a #ReinforcementLearning-based framework that enables multiple robot arms to perform up to 40 tasks simultaneously without colliding in a crowded workspace. @GoogleDeepMind Learn more in Science #Robotics: scim.ag/3JIh7WF


[1/4] 🚀 We’re excited to announce the v1 release of JaxAHT – a new library for Ad Hoc Teamwork (AHT) research, built with JAX for speed & scalability! Check it out 👉 larg.github.io/jax-aht #AI #MARL #ReinforcementLearning #JAX #AdHocTeamwork


Day 12 🦾 of becoming an ML Beast: Explored Reinforcement Learning – where an agent interacts with an environment, takes actions, and learns from rewards to improve decisions over time. #MachineLearning #ReinforcementLearning #AI #Learninginpublic #100daysofcoding

Enabling robots to improve autonomously via RL will be powerful, and dense shaping rewards can greatly facilitate RL. Our #IROS2025 paper presents a method leveraging VLMs to derive dense rewards for efficient autonomous RL. ⚡🦾 #Robotics #ReinforcementLearning 🧵1/5


Scientists have developed a method based on #ReinforcementLearning that enables a robot to use its upper body to lift and flip a water jug. @ToyotaResearch Learn more in Science #Robotics: scim.ag/4oK6qmt


In the Age of AI, start from First Principles. Unlock bottom-up design. Solve classes of problems, not isolated features. Think systems, not silos. Solve fundamentally. Scale exponentially. #AI #AIAgents #ReinforcementLearning #RAG #KnowledgeGraph #Orchestration

7/10 Reinforcement Learning trains agents through trial and error to maximize rewards. It’s used in gaming, robotics, and real-time decision systems like traffic control. #ReinforcementLearning #AI #SmartSystems #DeepLearning #GameAI #AutonomousTech

A quadruped robot fitted with a robotic arm coordinates its arm and legs to manipulate objects while on the move: rai-inst.com/resources/blog… #ReinforcementLearning #framework #WholeBody #flexible #LocoManipulation #ReLIC #RAIInstitute


RL playgrounds 🚀🔨🔨 I am playing with the Unity ML agents (which isn't even very recent). The possibilities are insane. From simple tasks to complex challenges, AI agents are leveling up. #ReinforcementLearning #AI #Unity

Introduction to various #ReinforcementLearning #Algorithms: bit.ly/2UPHbSj ————— #DataScience #AI #MachineLearning #ML #DeepLearning #DataMining #Mathematics #Gamification ————— + See this foundational book (2nd edition): amzn.to/3UtbeAa

Inverse reinforcement learning infers reward functions from observing expert behavior. Instead of programming rewards, AI learns what experts value by watching them. Learning goals from demonstrations. #ReinforcementLearning #InverseRL #LearningObjectives andrewroche.ai/ai-reinforceme…


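A robot bicycle that thinks, plans, and moves like an athlete: rai-inst.com/resources/blog… It combines parkour-style mobility with the intelligence to perceive, plan for, and navigate even the most complex terrain. #ReinforcementLearning #UltraMobileVehicle #UMV #JumpingBicycle #RAI_Institute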


#RA3: Mid-Training with Temporal Action Abstractions for Faster #ReinforcementLearning (RL) Post-Training in Code #LLMs buff.ly/LXLMNyi

Congrats to the team for building Webscale-RL, an automated data pipeline that turns pretraining corpora into verifiable RL data at trillion-token scale. Excited to see this bridge the gap between pretraining and RL! @SFResearch #AI #LLMs #ReinforcementLearning #MachineLearning

📣 Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels 📣 RL for LLMs faces a critical data bottleneck: existing RL datasets are <10B tokens while pretraining uses >1T tokens. Our Webscale-RL pipeline solves this by automatically converting pretraining…

Reinforcement learning is a transformative AI approach where machines learn through trial and error, akin to human learning. Unlike traditional programming, it allows AI systems to... #ReinforcementLearning #ArtificialIntelligence #MachineLearning youaccel.com/blog/reinforce…


Finished chapters 1–8 of Sutton & Barto’s Reinforcement Learning. Reading alongside Stanford’s CS234 lectures, great combo so far. Any recs for what to read next once I finish? #reinforcementlearning


(Open Access) Distributional Reinforcement Learning: freecomputerbooks.com/Distributional… Look for "Read and Download Links" section to download. Follow me if you like this post. #ReinforcementLearning #MachineLearning #DeepLearning #LLMs #GenAI #GenerativeAI #NeuralNetworks

Postdoctoral Researcher in Reinforcement Learning 📍ETH Zurich, Institute of Machine Learning, Switzerland Online, Multi-Agent & Human Feedback RL. Apply now: researchhires.com/position/ea709… #Postdoc #ReinforcementLearning #AIResearch #MachineLearning #AcademicJobs #ResearchHires


(Open Access) Reinforcement Learning: An Introduction - freecomputerbooks.com/Reinforcement-… Look for "Read and Download Links" section to download. Follow me if you like this post. #ReinforcementLearning #MachineLearning #DeepLearning #LLMs #GenAI #GenerativeAI #NeuralNetworks

📉 Errors aren’t failures; they’re learning vectors. Talus Labs trains agents to evolve through error feedback. @Talus_Labs #ReinforcementLearning #AI


“Usually, the first idea doesn't check out. That's just the nature of #AIresearch.” For Derek Li, Senior Researcher at Noah's Ark Lab, that first idea was training all multitask #reinforcementlearning objectives together. What went wrong: Runs underperformed baselines, with…


Every swap, limit order, cross-chain action, it’s input to the #ReinforcementLearning engine. In #Deluthium, your request becomes part of the learning feedback loop. No black boxes. Full transparency. Brought to you by the Onchain Flash Boys. Powered by RL.

Reinforcement learning AI skills are advancing quickly, creating a split in tech capabilities. Leaders ignoring this risk falling behind. Prioritize where AI momentum builds. reyemte.ch/jrh #AI #ReinforcementLearning


⚙️ A model becomes intelligent when it changes itself. Talus Labs programs that reflexivity into machine agents. @Talus_Labs #ReinforcementLearning #AI


💫 Intelligence scales when feedback deepens. Talus Labs builds recursive systems that learn through their own outputs. @Talus_Labs #ReinforcementLearning #AI


Meet PASTA: An RL agent that refines text-to-image results collaboratively, reducing trial-and-error. 🤖🎨 #PASTA #ReinforcementLearning


Today on the blog we propose Action-Based Contrastive Self-Training, a data-efficient #ReinforcementLearning tuning approach for improving multi-turn conversation modeling in mixed-initiative LLM interaction. Read all about it → goo.gle/3Sxas2T

🚀 Exciting News! Our paper has been accepted at @NeurIPSConf! 🎉 We introduce State Chrono Representation (SCR) -- a novel approach in #ReinforcementLearning. SCR integrates long-term temporal dynamics and cumulative rewards into state representations, addressing key challenges…

Automated Design of Agentic Systems Shengran Hu, Cong Lu, Jeff Clune: arxiv.org/abs/2408.08435 #ArtificialIntelligence #DeepLearning #ReinforcementLearning

The Bitter Lesson "Search and learning are general purpose methods that continue to scale with increased computation, even as the available computation becomes very great." — Richard Sutton Rich Sutton: incompleteideas.net/IncIdeas/Bitte… #ReinforcementLearning

Had an incredible time yesterday at #TECHMEET Abeokuta 3.0! 🎉 Spoke about the future of AI and why #ReinforcementLearning deserves more attention. Great discussions on how AI can be leveraged and its positive impact on future careers! 🚀 #AI #TechEvent #Abeokuta

🚀 New Survey: Reinforcement Learning in Vision We review 200+ works spanning MLLMs, visual generation, unified models & VLA — from RLHF to GRPO & RLVR. 🔗 Paper: arxiv.org/abs/2508.08189 🔗 Resources: github.com/weijiawu/Aweso… #AI #ReinforcementLearning #ComputerVision #Survey
