Deep Reinforcement Learning with a Natural Language Action Space
Agent models: Internalizing Chain-of-Action Generation into Reasoning models
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Training VLM Agents with Multi-Turn Reinforcement Learning
Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
Curiosity-driven Exploration by Self-supervised Prediction
Unifying Count-Based Exploration and Intrinsic Motivation