Domain randomization is a technique used in Reinforcement Learning (RL) and Sim-to-Real transfer to improve the generalization of an RL agent. The idea is to train the agent in a simulated environment where various physical properties are randomized during training, so the policy becomes robust to variations.
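As a minimal sketch of the idea, the snippet below draws a fresh set of physical parameters for each training episode. The parameter names (`friction`, `object_mass`, `motor_gain`) and their ranges are assumptions chosen only for illustration; a real setup would pass the sampled values to whatever simulator is in use.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    friction: float     # contact friction coefficient (hypothetical parameter)
    object_mass: float  # object mass in kg (hypothetical parameter)
    motor_gain: float   # scaling applied to commanded joint velocities (hypothetical)


# Illustrative ranges only; these are assumptions, not tuned values.
RANGES = {
    "friction": (0.5, 1.2),
    "object_mass": (0.1, 0.5),
    "motor_gain": (0.9, 1.1),
}


def sample_params(rng: random.Random) -> SimParams:
    """Draw one set of physical parameters uniformly from RANGES."""
    return SimParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})


rng = random.Random(0)  # fixed seed so the run is repeatable
episode_params = [sample_params(rng) for _ in range(3)]
for i, params in enumerate(episode_params):
    # In training, the simulator would be reset with `params` before each episode.
    print(f"episode {i}: {params}")
```

Because the policy never sees the same physics twice, it cannot overfit to one particular simulator configuration.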
Some examples of randomized parameters in domain randomization:
For your dual-arm manipulation task, you could use domain randomization in:
In Reinforcement Learning, we estimate how good a certain state or action is by using value functions.
The value function (V-function) measures the expected cumulative discounted reward an agent will collect when it starts from a given state and follows a policy.
Defined as:
$$ V^{\pi}(s) = \mathbb{E} \left[ \sum_{t=0}^{\infty} \gamma^t r_t \mid s_0 = s, \pi \right] $$
Here, $V^\pi(s)$ is the value of state $s$ under policy $\pi$, $\gamma$ is the discount factor, and $r_t$ is the reward received at step $t$.
Example: In your setup, $V(s)$ tells you how good a given robot + object state is if you keep following the current policy.
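To make the definition concrete, here is a small sketch that estimates $V^\pi(s)$ by averaging discounted returns over a few episodes that all start in the same state (a Monte Carlo estimate). The reward sequences are made-up numbers, and the finite episodes only approximate the infinite sum in the definition.

```python
def discounted_return(rewards, gamma):
    """Compute sum_t gamma^t * r_t for a finite reward sequence."""
    g = 0.0
    # Work backwards so each step is g_t = r_t + gamma * g_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def mc_value_estimate(episodes, gamma):
    """Average the discounted returns of episodes that start in the same state."""
    returns = [discounted_return(rewards, gamma) for rewards in episodes]
    return sum(returns) / len(returns)


# Two hypothetical reward sequences collected from the same start state.
episodes = [[1.0, 0.0, 2.0], [0.0, 0.0, 4.0]]
v_hat = mc_value_estimate(episodes, gamma=0.5)
print(v_hat)  # 1.25: the returns are 1.5 and 1.0
```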
The Q-function (action-value function) measures the expected cumulative discounted reward when the agent starts from a given state-action pair and follows a policy afterward.
Defined as:
$$ Q^{\pi}(s, a) = \mathbb{E} \left[ \sum_{t=0}^{\infty} \gamma^t r_t \mid s_0 = s, a_0 = a, \pi \right] $$
Here, $Q^\pi(s, a)$ is the expected return if you take action $a$ in state $s$ and then follow policy $\pi$.
Example: $Q(s, a)$ in your setup will tell you how good it is to apply a certain joint velocity command given the current robot + object state.
| Concept | Value Function (V-function) | Q-Function (Q-function) |
|---|---|---|
| Definition | Expected cumulative reward from state $s$ | Expected cumulative reward from state-action pair $(s, a)$ |
| Input | State $s$ | State $s$, action $a$ |
| Output | Expected total reward | Expected total reward if taking action $a$ in state $s$ |
| Used in | Actor-critic methods, where it serves as the critic or baseline (e.g., A2C, PPO) | Q-learning, DDPG, SAC (off-policy methods) |
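The two functions in the table are linked: the value of a state is the policy-weighted average of its Q-values, $V^\pi(s) = \sum_a \pi(a \mid s)\, Q^\pi(s, a)$. The sketch below checks this on a toy example with three discrete actions; the action names and numbers are assumptions for illustration, and a continuous action space such as joint velocities would replace the sum with an expectation over sampled actions.

```python
def value_from_q(q_values, policy_probs):
    """V(s) = sum over actions of pi(a|s) * Q(s, a), for a single state s."""
    return sum(policy_probs[a] * q_values[a] for a in q_values)


# Hypothetical Q-values and policy probabilities for one state.
q_values = {"slow": 1.0, "medium": 3.0, "fast": 2.0}
policy_probs = {"slow": 0.25, "medium": 0.5, "fast": 0.25}

v = value_from_q(q_values, policy_probs)
print(v)  # 2.25 = 0.25*1.0 + 0.5*3.0 + 0.25*2.0
```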
Since your action space is joint velocities, your Q-function will evaluate how good different velocity commands are in a given state.
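As a rough illustration, the snippet below scores a handful of candidate velocity commands with a Q-function and picks the best one. `toy_q` is a hypothetical stand-in for a learned critic, not a real model: it assumes the state is a list of per-joint position errors and simply rewards commands that cancel them.

```python
from typing import Callable, Sequence, Tuple

Action = Tuple[float, ...]


def toy_q(state: Sequence[float], action: Action) -> float:
    """Hypothetical critic: higher score when each velocity cancels its joint's error."""
    return -sum((err + vel) ** 2 for err, vel in zip(state, action))


def best_action(
    q: Callable[[Sequence[float], Action], float],
    state: Sequence[float],
    candidates: Sequence[Action],
) -> Action:
    """Return the candidate command with the highest Q-value in this state."""
    return max(candidates, key=lambda a: q(state, a))


state = [0.2, -0.1]  # assumed per-joint position errors
candidates = [(0.0, 0.0), (-0.2, 0.1), (0.2, -0.1)]  # candidate joint velocities
chosen = best_action(toy_q, state, candidates)
print(chosen)  # (-0.2, 0.1)
```

With a continuous action space, methods such as DDPG and SAC avoid this explicit search by training an actor network to output a high-Q action directly.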
No, Domain Randomization and Curriculum Learning are different concepts in reinforcement learning, though they can sometimes be used together.