RL [omni.isaac.gym]
Base Environment Wrapper
- class VecEnvBase(headless: bool, sim_device: int = 0, enable_livestream: bool = False, enable_viewport: bool = False, launch_simulation_app: bool = True, experience: Optional[str] = None)
This class provides a base interface for connecting RL policies with task implementations. APIs provided in this interface follow the interface in gym.Env. This class also provides utilities for initializing simulation apps, creating the World, and registering a task.
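A minimal construction sketch, assuming a working Isaac Sim install (the `omni.isaac.gym.vec_env` import path is assumed from this extension's layout). Constructing the wrapper launches the SimulationApp unless `launch_simulation_app=False` is passed; a task must still be registered via set_task() before reset() or step() can be called, as shown in the sketch under set_task below.

```python
from omni.isaac.gym.vec_env import VecEnvBase

# Constructing the wrapper starts the SimulationApp (Kit) by default.
env = VecEnvBase(headless=True)
```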
- action_space: spaces.Space[ActType]
- close() None
Closes simulation.
- create_viewport_render_product(resolution=(1280, 720))
Creates a render product of the viewport for rendering.
- metadata: dict[str, Any] = {'render_modes': []}
- property np_random: numpy.random._generator.Generator
Returns the environment’s internal _np_random generator, initialising it with a random seed if it is not already set.
- Returns
An instance of np.random.Generator.
- property num_envs
Retrieves number of environments.
- Returns
Number of environments.
- Return type
num_envs(int)
- observation_space: spaces.Space[ObsType]
- render(mode='human') None
Runs rendering without stepping through the physics.
- By convention, if mode is:
human: render to the current display and return nothing. Usually for human consumption.
rgb_array: return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
- Parameters
mode (str, optional) – The mode to render with. Defaults to “human”.
- property render_enabled
Whether rendering is enabled.
- Returns
True if rendering is enabled.
- Return type
render(bool)
- render_mode: str | None = None
- reset(seed=None, options=None)
Resets the task and updates observations.
- Parameters
seed (Optional[int]) – Seed.
options (Optional[dict]) – Options as used in gymnasium.
- Returns
A tuple (observations, info), where observations (Union[numpy.ndarray, torch.Tensor]) is the buffer of observation data and info (dict) is a dictionary of extras data.
- reward_range = (-inf, inf)
- seed(seed=-1)
Sets a seed. Pass in -1 for a random seed.
- Parameters
seed (int) – Seed to set. Defaults to -1.
- Returns
Seed that was set.
- Return type
seed (int)
- set_task(task, backend='numpy', sim_params=None, init_sim=True, rendering_dt=0.016666666666666666) None
- Creates a World object and adds the task to it.
Initializes and registers the task with the environment interface and triggers task start-up.
- Parameters
task (RLTask) – The task to register to the env.
backend (str) – Backend to use for task. Can be “numpy” or “torch”. Defaults to “numpy”.
sim_params (dict) – Simulation parameters for physics settings. Defaults to None.
init_sim (Optional[bool]) – Automatically starts simulation. Defaults to True.
rendering_dt (Optional[float]) – dt for rendering. Defaults to 1/60s.
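A registration sketch continuing the construction example above; `MyTask` is a hypothetical RLTask subclass (constructor arguments vary by task implementation):

```python
# MyTask is a hypothetical RLTask subclass defined elsewhere.
task = MyTask(name="MyTask")

# Creates the World, registers the task, and starts simulation (init_sim=True).
env.set_task(task, backend="torch", init_sim=True)
```

Choosing backend="torch" keeps observation and action buffers as torch tensors, which avoids host/device copies when the policy also runs on GPU.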
- signal_handler(sig, frame)
- property simulation_app
Retrieves the SimulationApp object.
- Returns
SimulationApp.
- Return type
simulation_app(SimulationApp)
- spec: EnvSpec | None = None
- step(actions)
- Basic implementation for stepping simulation.
Can be overridden by derived Env classes to satisfy requirements of specific RL libraries. This method passes actions to the task for processing, steps simulation, and computes observations, rewards, and resets.
- Parameters
actions (Union[numpy.ndarray, torch.Tensor]) – Actions buffer from policy.
- Returns
A tuple (observations, rewards, dones, info), where observations (Union[numpy.ndarray, torch.Tensor]) is the buffer of observation data, rewards (Union[numpy.ndarray, torch.Tensor]) is the buffer of rewards data, dones (Union[numpy.ndarray, torch.Tensor]) is the buffer of reset/done flags, and info (dict) is a dictionary of extras data.
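A minimal rollout sketch combining reset() and step(), assuming the default numpy backend; the random action buffer is a stand-in for a policy, and its exact shape depends on the registered task:

```python
import numpy as np

obs, info = env.reset(seed=42)
for _ in range(1000):
    # One random action per sub-environment stands in for a policy output.
    actions = np.stack([env.action_space.sample() for _ in range(env.num_envs)])
    obs, rewards, dones, info = env.step(actions)
env.close()
```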
- property unwrapped: gymnasium.core.Env[gymnasium.core.ObsType, gymnasium.core.ActType]
Returns the base non-wrapped environment.
- Returns
The base non-wrapped gymnasium.Env instance.
- Return type
Env
- update_task_params()
Multi-Threaded Environment Wrapper
- exception TaskStopException
Exception class for signalling task termination.
- args
- with_traceback()
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- class TrainerMT
A base abstract trainer class for controlling starting and stopping of RL policy.
- abstract run()
Runs the RL loop in a new thread.
- abstract stop()
Stops the RL thread.
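A sketch of a concrete trainer, assuming a plain threading.Thread is an acceptable host for the RL loop (the import path and the threading details are assumptions, not prescribed by this class):

```python
import threading

from omni.isaac.gym.vec_env import TrainerMT

class PolicyTrainer(TrainerMT):
    """Hypothetical trainer that hosts a training function on its own thread."""

    def __init__(self, train_fn):
        self._train_fn = train_fn
        self._thread = None

    def run(self):
        # Start the RL loop on a separate thread so the simulation thread stays free.
        self._thread = threading.Thread(target=self._train_fn, daemon=True)
        self._thread.start()

    def stop(self):
        # How the loop is told to stop is application-specific; here we just join.
        if self._thread is not None:
            self._thread.join()
```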
- class VecEnvMT(headless: bool, sim_device: int = 0, enable_livestream: bool = False, enable_viewport: bool = False, launch_simulation_app: bool = True, experience: Optional[str] = None)
This class provides a base interface for connecting RL policies with task implementations in a multi-threaded fashion. RL policies using this class will run on a different thread than the simulation thread. This can be useful for interacting with the UI before, during, and after running RL policies. Data sharing between threads happens through message passing on multi-threaded queues.
- action_space: spaces.Space[ActType]
- clear_queues()
Clears all queues.
- close() None
Closes simulation.
- create_viewport_render_product(resolution=(1280, 720))
Creates a render product of the viewport for rendering.
- get_actions(block=True)
Retrieves actions from policy by waiting for actions to be sent to the queue from the RL thread.
- Parameters
block (Optional[bool]) – Whether to block thread when waiting for data.
- Returns
Actions buffer retrieved from queue.
- Return type
actions (Union[np.ndarray, torch.Tensor, None])
- get_data(block=True)
Retrieves data from task by waiting for data dictionary to be sent to the queue from the simulation thread.
- Parameters
block (Optional[bool]) – Whether to block thread when waiting for data.
- Returns
Data dictionary retrieved from queue.
- Return type
data (Union[dict, None])
- initialize(action_queue, data_queue, timeout=30)
Initializes queues for sharing data across threads.
- Parameters
action_queue (queue.Queue) – Queue for passing actions from policy to task.
data_queue (queue.Queue) – Queue for passing data from task to policy.
timeout (Optional[int]) – Seconds to wait for data when queue is empty. An exception will be thrown when the timeout limit is reached. Defaults to 30 seconds.
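A wiring sketch, under the assumption that size-1 queues are used to keep the two threads in lock-step (each side blocks until the other consumes its message):

```python
import queue

from omni.isaac.gym.vec_env import VecEnvMT

env = VecEnvMT(headless=False)

# Size-1 queues: the policy blocks until the simulation takes its actions,
# and the simulation blocks until the policy takes its data.
action_queue = queue.Queue(1)
data_queue = queue.Queue(1)
env.initialize(action_queue, data_queue, timeout=30)
```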
- metadata: dict[str, Any] = {'render_modes': []}
- property np_random: numpy.random._generator.Generator
Returns the environment’s internal _np_random generator, initialising it with a random seed if it is not already set.
- Returns
An instance of np.random.Generator.
- property num_envs
Retrieves number of environments.
- Returns
Number of environments.
- Return type
num_envs(int)
- observation_space: spaces.Space[ObsType]
- render(mode='human') None
Runs rendering without stepping through the physics.
- By convention, if mode is:
human: render to the current display and return nothing. Usually for human consumption.
rgb_array: return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
- Parameters
mode (str, optional) – The mode to render with. Defaults to “human”.
- property render_enabled
Whether rendering is enabled.
- Returns
True if rendering is enabled.
- Return type
render(bool)
- render_mode: str | None = None
- reset(seed=None, options=None)
Resets the task and updates observations.
- Parameters
seed (Optional[int]) – Seed.
options (Optional[dict]) – Options as used in gymnasium.
- Returns
A tuple (observations, info), where observations (Union[numpy.ndarray, torch.Tensor]) is the buffer of observation data and info (dict) is a dictionary of extras data.
- reward_range = (-inf, inf)
- async run(trainer)
Main loop for controlling simulation and task stepping. This method is responsible for stepping the task and simulation, collecting buffers from the task, sending data to the policy, and retrieving actions from the policy. It also handles the case where the policy terminates on completion, keeping the simulation thread running so that the UI is not affected.
- Parameters
trainer (TrainerMT) – A Trainer object that implements APIs for starting and stopping RL thread.
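A launch sketch; run() is a coroutine, so it has to be scheduled on an event loop. This assumes Kit's asyncio loop is already running, which is the usual situation inside Isaac Sim, and reuses the hypothetical PolicyTrainer from the TrainerMT sketch above:

```python
import asyncio

# my_train_loop is an assumed user-defined training function.
trainer = PolicyTrainer(train_fn=my_train_loop)

# Schedule the simulation-side loop; the trainer's RL thread runs concurrently.
asyncio.ensure_future(env.run(trainer))
```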
- seed(seed=-1)
Sets a seed. Pass in -1 for a random seed.
- Parameters
seed (int) – Seed to set. Defaults to -1.
- Returns
Seed that was set.
- Return type
seed (int)
- send_actions(actions, block=True)
Sends actions from RL thread to simulation thread by adding actions to queue.
- Parameters
actions (Union[np.ndarray, torch.Tensor]) – Actions buffer to be added to queue.
block (Optional[bool]) – Whether to block thread when writing to queue.
- send_data(data, block=True)
Sends data from task thread to RL thread by adding data to queue.
- Parameters
data (dict) – Dictionary containing task data.
block (Optional[bool]) – Whether to block thread when writing to queue.
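On the RL thread these queue methods pair into a single exchange per step: push an action buffer, then block for the resulting data. A sketch of that pairing (step() on this class is assumed to perform a similar exchange internally; this only makes it explicit):

```python
# One policy step on the RL thread.
env.send_actions(actions, block=True)  # hand actions to the simulation thread
data = env.get_data(block=True)        # wait for observations/rewards/resets
```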
- set_render_mode(render_mode)
- set_task(task, backend='numpy', sim_params=None, init_sim=True, rendering_dt=0.016666666666666666) None
- Creates a World object and adds the task to it.
Initializes and registers the task with the environment interface and triggers task start-up.
- Parameters
task (RLTask) – The task to register to the env.
backend (str) – Backend to use for task. Can be “numpy” or “torch”. Defaults to “numpy”.
sim_params (dict) – Simulation parameters for physics settings. Defaults to None.
init_sim (Optional[bool]) – Automatically starts simulation. Defaults to True.
rendering_dt (Optional[float]) – dt for rendering. Defaults to 1/60s.
- signal_handler(sig, frame)
- property simulation_app
Retrieves the SimulationApp object.
- Returns
SimulationApp.
- Return type
simulation_app(SimulationApp)
- spec: EnvSpec | None = None
- step(actions)
- Basic implementation for stepping simulation.
Can be overridden by derived Env classes to satisfy requirements of specific RL libraries. This method passes actions to the task for processing, steps simulation, and computes observations, rewards, and resets.
- Parameters
actions (Union[numpy.ndarray, torch.Tensor]) – Actions buffer from policy.
- Returns
A tuple (observations, rewards, dones, info), where observations (Union[numpy.ndarray, torch.Tensor]) is the buffer of observation data, rewards (Union[numpy.ndarray, torch.Tensor]) is the buffer of rewards data, dones (Union[numpy.ndarray, torch.Tensor]) is the buffer of reset/done flags, and info (dict) is a dictionary of extras data.
- property unwrapped: gymnasium.core.Env[gymnasium.core.ObsType, gymnasium.core.ActType]
Returns the base non-wrapped environment.
- Returns
The base non-wrapped gymnasium.Env instance.
- Return type
Env
- update_task_params()