See the specific code in :code:`Alg_Base/rl_a3c_pytorch_benchmark/envs/envs_parallel.py`.
Part 1 UAV_VAT (Adapting to Gym Library)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
See the specific code in the :code:`UAV_VAT(gym.Env)` class in :code:`Alg_Base/rl_a3c_pytorch_benchmark/envs/gym_envs.py`.
1. :code:`__init__`: Initializes the gym environment (including initialization of :code:`action_space` and :code:`observation_space`).
**Input Parameters:**
* :code:`arg_worker(int)`: Number of processes.
* :code:`conf_path(list)`: Path to the environment configuration file.
* :code:`env_conf_path(list)`: Path to the simulator configuration file.
* :code:`process_idx(list)`: Current process index.
2. :code:`reset`: Resets the environment, returning the initial state.
**Return Parameters:**
* :code:`state(np.ndarray)`: Returns the state (image).
3. :code:`step`: Environment interaction; takes an action as input and returns state information.
**Input Parameters:**
* :code:`action(torch.tensor)`: Action.
**Return Parameters:**
* :code:`state(np.ndarray)`: Returns the state (image).
* :code:`reward(float)`: Reward.
* :code:`done(int)`: Whether the environment has ended.
* :code:`info(dict)`: Additional information, including :code:`info["TimeLimit.truncated"]`, which indicates whether the episode ended because the step limit was reached (truncation).
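A minimal usage sketch of this interface is given below; the import path and all constructor values are placeholders (not taken from the repository) and should be adapted to the actual configuration files, following the parameter types listed above.

.. code:: python

   import torch
   from gym_envs import UAV_VAT  # assuming gym_envs.py is on the Python path

   # Placeholder constructor values; argument types follow the parameter list above.
   env = UAV_VAT(
       arg_worker=1,                          # number of processes
       conf_path=["./config.json"],           # environment configuration file (placeholder path)
       env_conf_path=["./env_config.json"],   # simulator configuration file (placeholder path)
       process_idx=[0],                       # current process index
   )

   state = env.reset()                               # np.ndarray (image)
   action = torch.tensor(env.action_space.sample())  # replace with the policy's action in practice
   state, reward, done, info = env.step(action)
   if done and info.get("TimeLimit.truncated", False):
       state = env.reset()                           # episode was truncated; reset for the next one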
Part 2 UAV_VAT_Gymnasium (Adapting to Gymnasium Library)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Gymnasium environment is largely compatible with the Gym environment; the following describes only the differences.
See the specific code in the :code:`UAV_VAT_Gymnasium(gymnasium.Env)` class in :code:`Alg_Base/rl_a3c_pytorch_benchmark/envs/gym_envs.py`.
1. :code:`reset`: Resets the environment, returning the initial state.
**Input Parameters:**
* :code:`seed(int)`: Random seed.
**Return Parameters:**
* :code:`reset_info(dict)`: Reset environment information (empty by default).
2. :code:`step`: Environment interaction; takes an action as input and returns state information.
**Return Parameters:**
* :code:`truncated(bool)`: Whether the episode ended because the step limit was reached (equivalent to :code:`info["TimeLimit.truncated"]` in the Gym environment).
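A minimal sketch of the Gymnasium-style calling convention is shown below, assuming the standard Gymnasium five-tuple return from :code:`step`; the constructor arguments are the same as for :code:`UAV_VAT` in Part 1 and are omitted here.

.. code:: python

   from gym_envs import UAV_VAT_Gymnasium  # assuming gym_envs.py is on the Python path

   env = UAV_VAT_Gymnasium(...)  # hypothetical call; fill in the Part 1 constructor arguments

   state, reset_info = env.reset(seed=0)       # reset_info is empty by default
   action = env.action_space.sample()
   state, reward, done, truncated, info = env.step(action)
   if done or truncated:                       # truncated replaces info["TimeLimit.truncated"]
       state, reset_info = env.reset()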
.. _SubprovecEnv:
Part 3 Parallel Environment Encapsulation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Stacked environment for parallel computation, enabling multiple environments to interact with multiple agents.
For specific code, see the :code:`ASYNC_SubprocVecEnv` class in :code:`Alg_Base/rl_a3c_pytorch_benchmark/envs/async_vecenv.py` and the :code:`SubprocVecEnv_TS` class in :code:`Alg_Base/rl_a3c_pytorch_benchmark/envs/async_vecenv_ts.py`.
**Existing Reinforcement Learning Library Parallel Class Features and Modifications in This Project**
* Existing reinforcement learning libraries that are directly compatible with Gym environments, such as :code:`stable-baselines3` and :code:`tianshou`, provide the parallel environment class :code:`SubprocVecEnv`, allowing users to wrap Gym environments into parallel environments directly.
* The above-mentioned :code:`SubprocVecEnv` class checks whether to execute :code:`reset` immediately after executing :code:`step`, which may cause :code:`step` and :code:`reset` to run in the same cycle (i.e., :code:`step` and :code:`reset` are not strictly aligned, as this project's environment requires).
* Modification in this project: Strictly align :code:`step` with :code:`reset` to keep individual environment operations synchronized (a conceptual sketch follows the figure below).
* The :code:`SubprocVecEnv` class from the :code:`stable-baselines3` library is modified into the :code:`Async_SubprocVecEnv` class, and the :code:`SubprocVecEnv` and :code:`Collector` classes from the :code:`tianshou` library are modified into the :code:`SubprocVecEnv_TS` and :code:`TS_Collector` classes, respectively.
* The differences between the original and modified implementations are shown in the figure below.
*Figure: comparison of SubprocVecEnv before and after modification.*
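The following conceptual sketch (not the actual library code) illustrates the behavioral difference described above:

.. code:: python

   # Conceptual sketch only; not the actual library implementation.
   # Original SubprocVecEnv worker: an environment that reports done may be
   # reset within the same step cycle.
   def worker_step_original(env, action):
       obs, reward, done, info = env.step(action)
       if done:
           obs = env.reset()          # reset happens inside the step cycle
       return obs, reward, done, info

   # Modified behavior (Async_SubprocVecEnv / SubprocVecEnv_TS): step only steps;
   # reset is issued as a separate command so that all environments stay aligned.
   def worker_step_modified(env, action):
       return env.step(action)        # the caller resets all environments together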
Part 4 Interface Invocation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For specific code, see the :code:`make_env()` function in :code:`Alg_Base/rl_a3c_pytorch_benchmark/envs/envs_parallel.py`.
:code:`make_env`: Initializes multiple gym environments.
**Input Parameters:**
* :code:`n_envs(int)`: Number of environments.
* :code:`rank(int)`: Current environment index.
* :code:`gym(bool)`: If :code:`True`, generates environments from the Gym library; if :code:`False`, generates environments from the Gymnasium library.
* :code:`monitor(bool)`: Whether to wrap in the :code:`stable_baselines3.common.monitor.Monitor` class.
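A hedged example call is shown below; the argument names follow the parameter list above, while the import path and the exact return value (a list of environments vs. environment factories) are assumptions to check against the code.

.. code:: python

   from envs_parallel import make_env  # assuming envs_parallel.py is on the Python path

   # Hedged example call; verify the return type against envs_parallel.py.
   envs = make_env(n_envs=16, rank=0, gym=False, monitor=True)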
**Test Code:**
For specific code, see the :code:`gym_test()` and :code:`stableparallel_test()` functions in :code:`Alg_Base/rl_a3c_pytorch_benchmark/envs/envs_parallel.py`.
If you wish to quickly test whether the above environment classes are implemented correctly, you can use the following code:
1. Single process (using only the Gym environment)
.. code:: python

   from gym.envs.registration import register
   from envs_parallel import gym_test  # assuming envs_parallel.py is on the Python path

   register(
       id='UAVVAT-v0',
       entry_point='gym_envs:UAV_VAT',
   )
   gym_test()
2. Use stable-baselines3 for algorithm testing
.. code:: python

   from envs_parallel import stableparallel_test  # assuming envs_parallel.py is on the Python path

   stableparallel_test(n_envs=16, gym=False)
Note: When using the :code:`stable-baselines3` library for parallel training, invoke the :code:`Async_SubprocVecEnv` class described above.
3. Use tianshou for algorithm testing
.. code:: python

   test_tianshou_env(args)
Note: When using the :code:`tianshou` library for parallel training, you need to invoke the :code:`SubprocVecEnv_TS` class described above.
.. _Parallel:
3 Parallel Environment Classes
-------------------------------------------------
The base environment class (:code:`BenchEnv_Multi`) is designed for multi-process asynchronous computation (e.g., the A3C algorithm), where each process calls the environment class to obtain data.
To better utilize the GPU for parallel acceleration, the :code:`Envs` class synchronizes the data of these per-process environments and executes the same commands in parallel, so that the algorithm can run in a single process on a synchronized parallel data stream.
This environment is primarily intended for user-defined algorithm design, i.e., scenarios where stable-baselines3 or other Gym-based frameworks are not used to implement the algorithm.
For specific code, see the :code:`Envs` class in :code:`Alg_Base/rl_a3c_pytorch_benchmark/envs/envs_parallel.py`.
Part 1 Envs()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For specific code, see the :code:`Envs` class in :code:`Alg_Base/rl_a3c_pytorch_benchmark/envs/envs_parallel.py`.
1. :code:`__init__`: Initializes the :code:`Envs` parallel environment (including initialization of :code:`action_space` and :code:`observation_space`).
**Input Parameters:**
* :code:`num_envs(int)`: Represents the concurrency level n.
* :code:`env_list(list(env))`: A list of pre-constructed environment instances, e.g., the list returned by the :code:`GE(n)` function.
* :code:`logs_show(bool)`: Whether to collect reward data from one of the environments to plot a training curve.
2. :code:`reset`: Resets the environments and returns the initial states.
**Return Parameters:**
* :code:`states`: A collection of states (images), one for each environment.
3. :code:`step`: Environment interaction; takes actions as input and returns state information.
**Input Parameters:**
* :code:`actions`: The action for each agent.
**Return Parameters:**
* :code:`states`: A collection of states (images), one for each environment.
* :code:`rewards`: The stacked rewards, one for each agent.
* :code:`done`: Whether each environment has reached a terminal state.
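A hedged end-to-end sketch of this interaction loop is given below; the import path and the :code:`GE(n)` helper are taken from the parameter descriptions above and are not verified against the code.

.. code:: python

   import numpy as np
   from envs_parallel import Envs, GE  # assuming both live in envs/envs_parallel.py

   n = 4
   envs = Envs(num_envs=n, env_list=GE(n), logs_show=False)

   states = envs.reset()                       # stacked initial states, one per environment
   for _ in range(100):
       actions = np.zeros(n, dtype=np.int64)   # placeholder: one action per agent
       states, rewards, done = envs.step(actions)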
Part 2 Data Format Changes Before and After Encapsulation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
**Comparison of data formats before and after encapsulation:**
* Before encapsulation (:code:`BenchEnv_Multi`):
  Data sampling: :code:`state:ndarray, reward:float, done:float = env.step(action:int)`
  Environment reset: :code:`state:ndarray = env.reset()`
* After encapsulation (:code:`Envs`), with parallelism :code:`n`:
  Data sampling: :code:`states:ndarray(n,), rewards:ndarray(n), done:ndarray(n) = envs.step(actions:ndarray(n))`
  Environment reset: :code:`states:ndarray(n,) = envs.reset()`
4 Performance Evaluation Metrics
------------------------------------------------------------------------------------------
1. Cumulative Reward (CR)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The cumulative reward uses continuous rewards, accumulated over an episode to evaluate the agent's performance in keeping the target at the center of the image, as shown in the following formula:
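A plausible form, reconstructed only from the symbol definitions below (the exact normalization is an assumption, not taken from the source), is:

.. math::

   CR = \frac{1}{E_{ml}} \sum_{t=1}^{E_t} r_{ct}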
In the above formula, :math:`r_{ct}` represents the continuous reward value at step :math:`t`, :math:`E_{ml}` represents the maximum trajectory length, and :math:`E_t` represents the length of this trajectory.
For a graphical representation of continuous reward calculation, see :ref:`Continuous Reward Diagram for Tracking Object`.
2. Tracking Success Rate (TSR)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This metric uses discrete rewards to evaluate the proportion of successful tracking, calculated using the following formula:
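A plausible form of this proportion, reconstructed only from the symbol definitions below (the exact normalization is an assumption, not taken from the source), is:

.. math::

   TSR = \frac{1}{E_{ml}} \sum_{t=1}^{E_l} r_{dt}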
In the above formula, :math:`r_{dt}` represents the discrete reward value at step :math:`t`, :math:`E_{ml}` is the maximum step length across batches, and :math:`E_l` represents the length of the current batch. For detailed parameters, see :ref:`Built-in reward parameter configuration`.
For a graphical representation of discrete reward calculation, see :ref:`Discrete Reward Diagram for Tracking Object`.