Environment Wrapping
==============================




This project provides three forms of environment (path: :code:`Alg_Base/rl_a3c_pytorch_benchmark/envs`): the base environment class :code:`BenchEnv_Multi()`, the environment classes wrapped with the OpenAI Gym / Gymnasium interfaces (:code:`gym` / :code:`gymnasium`), and our own implementation of the parallel environment class. The relationships among these three environment types are shown in the figure below:

.. figure:: ../_static/image/Environment3Types.png
   :alt: Relationship diagram of the three environment classes
   :width: 600px
   :align: center

   Diagram of the three environment classes

.. _BenchEnv_Multi:

1 Base Environment Class
-------------------------------------

Part 1 Interface Details
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The base environment class handles direct communication between the algorithm side and the simulator side, and is mainly responsible for multi-process communication and data transmission with the simulator. If you only need to call the class for testing, you can jump directly to the configuration file (:ref:`config`) and the constructor (:ref:`makeenv`).

See the specific code in Alg_Base/rl_a3c_pytorch_benchmark/envs/environment.py in BenchEnv_Multi()

1. __init__: Used to initialize the communication interface, number of processes, and other essential parameters for the environment.

* :code:`Action_dim(int)`: Agent action dimension
* :code:`action_list(list)`: Used to convert discrete action commands into continuous movement commands (see the mapping sketch after this list)

  * The storage order of :code:`action_list` is :code:`[Forward, Backward, Left, Right, CCW, CW]`, representing the movement commands "forward, backward, left, right, counterclockwise, clockwise"

* :code:`process_state(function)`: Function for post-processing the images
* :code:`arg_worker(int)`: Number of parallel processes in the system
* :code:`process_idx(int)`: Parallel index of the current environment
* :code:`Other_State(bool)`: Whether to enable the additional state option on the simulator side (:code:`Other_State`); for details see :ref:`Parameter configuration of the simulation environment`
* :code:`CloudPoint(bool)`: Whether to enable the point cloud option on the simulator side (:code:`CloudPoint`); for details see :ref:`Parameter configuration of the simulation environment`
* :code:`RewardParams(bool)`: Whether to enable the additional reward parameters option on the simulator side (:code:`RewardParams`); for details see :ref:`Parameter configuration of the simulation environment`
* :code:`port_process(int)`: Port specified for data transmission between processes
* :code:`end_reward(bool)`: Whether to enable additional end rewards (an additional penalty if the end reward is 0; an additional reward if the end reward is not 0)
* :code:`end_reward_list(list)`: List of additional end rewards (valid only when :code:`end_reward` is enabled)
* :code:`scene(str)`: Determines the current experimental scene (valid only when :code:`auto_start` is enabled)

  * Valid scenes include: :code:`["citystreet", "downtown", "lake", "village", "desert", "farmland", None]`

* :code:`weather(str)`: Determines the weather of the current experimental scene (valid only when :code:`auto_start` is enabled)

  * Valid weathers include: :code:`["day", "night", "foggy", "snow", None]`

* :code:`auto_start(bool)`: Whether to automatically start the simulator
* :code:`delay(int)`: How long the system waits for the map to load; only effective when :code:`auto_start` is enabled
* :code:`Control_Frequence(int)`: Specifies the system control frequency
* :code:`reward_type(str)`: Specifies the type of reward (rewards from different baseline algorithms can be chosen)
* :code:`verbose(bool)`: Whether the environment outputs log files
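As an illustration of how :code:`action_list` is typically populated, the sketch below builds the six movement commands from the speed values listed in :code:`config.json` (Part 2). The :code:`(vx, vy, omega)` tuple layout is an assumption made for illustration only; the actual command format expected by the simulator may differ.

.. code:: python

   # Hypothetical construction of action_list from the config values
   # (Forward_Speed, Backward_Speed, ... come from the "Benchmark" section
   # of config.json shown in Part 2). The (vx, vy, omega) layout is an
   # assumption for illustration only.
   FORWARD_SPEED, BACKWARD_SPEED = 40, -40
   LEFT_SPEED, RIGHT_SPEED = 40, -40
   CW_OMEGA, CCW_OMEGA = 2, -2

   action_list = [
       (FORWARD_SPEED, 0, 0),   # Forward
       (BACKWARD_SPEED, 0, 0),  # Backward
       (0, LEFT_SPEED, 0),      # Left
       (0, RIGHT_SPEED, 0),     # Right
       (0, 0, CCW_OMEGA),       # CCW (counterclockwise)
       (0, 0, CW_OMEGA),        # CW (clockwise)
   ]

   # A discrete action index from the agent selects one continuous command:
   action = 0                       # e.g. "forward"
   vx, vy, omega = action_list[action]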
2. step: Advances the environment simulation by one step

**Input Parameters:**

* :code:`action(numpy.int64)`: Action taken by the agent
**Return Parameters:**

* :code:`image(numpy.ndarray)`: The image returned for the next time step (already post-processed using :code:`process_state`)
* :code:`reward(float)`: Reward for the current time step
* :code:`done(bool)`: Whether the current environment has ended
* :code:`self.SuppleInfo(list)`: Supplementary information (if :code:`Other_State`, :code:`CloudPoint`, or :code:`RewardParams` is enabled, it is returned here)
3. reset: Used to restart the environment

**Input Parameters:** None

**Return Parameters:**

* :code:`image(numpy.ndarray)`: Same explanation as above
* :code:`self.SuppleInfo(list)`: Same explanation as above
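To make the interface above concrete, the following minimal sketch runs one episode with a random placeholder policy. It assumes the environment is obtained through the :code:`general_env` constructor described in Part 3, that :code:`envs.environment` is importable from the working directory, and that :code:`config.json` is the configuration file shown in Part 2; the return values follow the lists above.

.. code:: python

   import numpy as np

   from envs.environment import general_env   # Alg_Base/rl_a3c_pytorch_benchmark/envs/environment.py

   # Build a single-process environment from config.json (see Part 3 below).
   env, _ = general_env(env_id="Benchmark", env_conf="config.json",
                        arg_worker=1, process_idx=0)

   image, supple_info = env.reset()            # initial post-processed image
   done, episode_reward = False, 0.0
   while not done:
       action = np.int64(np.random.randint(0, 7))   # placeholder random policy (Action_dim = 7)
       image, reward, done, supple_info = env.step(action)
       episode_reward += reward
   print("episode reward:", episode_reward)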



.. _config:

Part 2 Configuration File
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Since the above environment interface has many parameters, this project provides a configuration file to simplify the configuration process. The content of the configuration file is as follows:

See the specific code in Alg_Base/rl_a3c_pytorch_benchmark/config.json

.. code:: json

   {
       "Benchmark": {
           "Need_render": 0,
           "Action_dim": 7,
           "State_size": 84,
           "State_channel": 3,
           "Norm_Type": 0,
           "Forward_Speed": 40,
           "Backward_Speed": -40,
           "Left_Speed": 40,
           "Right_Speed": -40,
           "CW_omega": 2,
           "CCW_omega": -2,
           "Control_Frequence": 125,
           "port_process": 39945,
           "end_reward": false,
           "end_reward_list": [-20, 20],
           "scene": "citystreet",
           "weather": "day",
           "auto_start": true,
           "Other_State": false,
           "CloudPoint": false,
           "RewardParams": true,
           "RewardType": "default",
           "verbose": false,
           "delay": 20
       },
       "LunarLander": {
           "Need_render": 1,
           "Action_dim": 4,
           "State_size": 84,
           "State_channel": 3,
           "Norm_Type": 0
       }
   }

* :code:`"Benchmark"`: Configuration parameters for the simulation environment proposed in this project.
* :code:`"LunarLander"`: Parameter configuration for the Gym environment :code:`LunarLander`, used only for compatibility testing.
* :code:`"Need_render"`: Select rendering mode (Gym environments require a rendering mode; **the default for this project is 0**).
* :code:`"Action_dim"`: Dimension of the agent's action space.
* :code:`"State_size"`: Size of the images output by the environment.
* :code:`"State_channel"`: Number of channels in the images output by the environment (3 for RGB images, 1 for grayscale images).
* :code:`"Norm_Type"`: Normalization method for the images (default normalization to :code:`[0,1]`).
* :code:`"Forward_Speed"`: Speed corresponding to the agent's forward action (:math:`m/s`).
* :code:`"Backward_Speed"`: Speed corresponding to the agent's backward action (:math:`m/s`).
* :code:`"Left_Speed"`: Speed corresponding to the agent's left movement action (:math:`m/s`).
* :code:`"Right_Speed"`: Speed corresponding to the agent's right movement action (:math:`m/s`).
* :code:`"CW_omega"`: Angular speed corresponding to the agent's clockwise action (:math:`rad/s`).
* :code:`"CCW_omega"`: Angular speed corresponding to the agent's counterclockwise action (:math:`rad/s`).
* :code:`"Control_Frequence"`: Frequency at which the agent operates (the simulator runs at 500 Hz, while the algorithm side generally runs at 125 Hz).
* :code:`"port_process"`: Port specified for data transmission between processes.
* :code:`"end_reward"`: Whether to enable additional end rewards.
* :code:`"end_reward_list"`: List of additional end rewards (valid only when :code:`end_reward` is enabled).
* :code:`"scene"`: Determines the current experimental scene (valid only when :code:`auto_start` is enabled).
* :code:`"weather"`: Determines the weather of the current experimental scene (valid only when :code:`auto_start` is enabled).
* :code:`"auto_start"`: Whether to automatically start the simulator.
* :code:`"Other_State"`: Whether to enable the additional state option on the simulator side.
* :code:`"CloudPoint"`: Whether to enable the point cloud option on the simulator side.
* :code:`"RewardParams"`: Whether to enable the additional reward parameters option on the simulator side.
* :code:`"RewardType"`: Specifies the type of reward (rewards from different baseline algorithms can be chosen).

  * Valid values are :code:`"default"`, :code:`"E2E"`, and :code:`"D-VAT"`, corresponding to the :ref:`Built-in Reward`, [1]_, and [2]_ reward settings, respectively.

* :code:`"verbose"(bool)`: Whether the environment outputs log files.
* :code:`"delay"(int)`: How long the system waits for the map to load.
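For quick inspection, the configuration can be read with Python's standard :code:`json` module. The sketch below only loads and prints a few values from the :code:`"Benchmark"` section; the relative path assumes the working directory is :code:`Alg_Base/rl_a3c_pytorch_benchmark/`.

.. code:: python

   import json

   # Load the "Benchmark" section of the configuration file.
   with open("config.json", "r") as f:
       conf = json.load(f)

   bench_conf = conf["Benchmark"]
   print("action dim:", bench_conf["Action_dim"])
   print("control frequency:", bench_conf["Control_Frequence"], "Hz")
   print("scene/weather:", bench_conf["scene"], "/", bench_conf["weather"])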



.. _makeenv:

Part 3 Constructor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Based on the above configuration file, this project directly provides a constructor to obtain an initialized environment. The usage of this function is as follows:

See the specific code in Alg_Base/rl_a3c_pytorch_benchmark/envs/environment.py in general_env()

.. code:: python

   # Generate a single-process environment (env_conf is the path to config.json)
   env, _ = general_env(env_id="Benchmark", env_conf=env_conf, arg_worker=1, process_idx=0)
general_env: Generates an initialized environment based on the config.json file.

**Input Parameters:**

* :code:`env_id(str)`: The name of the current environment (:code:`"Benchmark"` or :code:`"LunarLander"`).
* :code:`env_conf(str)`: The path to the configuration file :code:`config.json`.
* :code:`arg_worker(int)`: Total number of parallel processes for the environment.
* :code:`process_idx(int)`: The process index of the current environment.

**Return Parameters:**

* :code:`env`: Environment instance.
* :code:`process_func`: Environment post-processing function (for :code:`"Benchmark"` there are no post-processing operations).
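The :code:`arg_worker` / :code:`process_idx` pair ties each worker process to its own environment instance. The sketch below shows one hedged way to build one environment per worker process; it assumes :code:`general_env` is importable from :code:`envs.environment` and that :code:`config.json` is in the working directory. The training scripts in this project may organize their workers differently.

.. code:: python

   import multiprocessing as mp

   import numpy as np

   from envs.environment import general_env   # per envs/environment.py

   ARG_WORKER = 4            # total number of parallel workers
   ENV_CONF = "config.json"  # path to the configuration file shown in Part 2

   def worker(process_idx: int) -> None:
       """Each worker builds its own environment with its own index."""
       env, _ = general_env(env_id="Benchmark", env_conf=ENV_CONF,
                            arg_worker=ARG_WORKER, process_idx=process_idx)
       image, _ = env.reset()
       for _ in range(10):                              # short random rollout
           action = np.int64(np.random.randint(0, 7))
           image, reward, done, _ = env.step(action)
           if done:
               image, _ = env.reset()

   if __name__ == "__main__":
       procs = [mp.Process(target=worker, args=(idx,)) for idx in range(ARG_WORKER)]
       for p in procs:
           p.start()
       for p in procs:
           p.join()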
.. _Gymnasium_Gym:

2 Gym Environment Class
-------------------------------------------------

Considering that users often build reinforcement learning algorithms on the Gym and Gymnasium environment interfaces, or use libraries such as stable-baselines3 for algorithm validation, this project wraps the aforementioned base environment class to provide fully compatible Gym and Gymnasium environment classes.
See the specific code in Alg_Base/rl_a3c_pytorch_benchmark/envs/envs_parallel.py

Part 1 UAV_VAT (Adapting to the Gym Library)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
See the specific code in Alg_Base/rl_a3c_pytorch_benchmark/envs/gym_envs.py in the UAV_VAT(gym.Env) class

1. __init__: Initializes the gym environment (including initialization of action_space and observation_space).

**Input Parameters:**

* :code:`arg_worker(int)`: Number of processes.
* :code:`conf_path(list)`: Path to the environment configuration file.
* :code:`env_conf_path(list)`: Path to the simulator configuration file.
* :code:`process_idx(list)`: Current process index.
2. reset: Resets the environment, returning the initial state.

**Return Parameters:**

* :code:`state(np.ndarray)`: Returns the state (image).
3. step: Environment interaction, inputs action, retrieves state information.

**Input Parameters:**

* :code:`action(torch.tensor)`: Action.

**Return Parameters:**

* :code:`state(np.ndarray)`: Returns the state (image).
* :code:`reward(float)`: Reward.
* :code:`done(int)`: Whether the environment has ended.
* :code:`info(dict)`: Information, including :code:`info["TimeLimit.truncated"]`, which indicates whether the episode ended because the step limit was reached.
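As a hedged usage sketch, the wrapper can be driven through the standard Gym API. The constructor keyword arguments follow the parameter names listed above; the configuration-file paths are placeholders and must point at the actual files in your checkout.

.. code:: python

   from gym_envs import UAV_VAT   # Alg_Base/rl_a3c_pytorch_benchmark/envs/gym_envs.py

   # Hypothetical construction; the paths below are placeholders.
   env = UAV_VAT(arg_worker=1,
                 conf_path="config.json",
                 env_conf_path="env_config.json",
                 process_idx=0)

   state = env.reset()                                   # Gym-style reset: state only
   done = False
   while not done:
       action = env.action_space.sample()                # random placeholder policy
       state, reward, done, info = env.step(action)
       if done and info.get("TimeLimit.truncated", False):
           print("episode ended by truncation (step limit)")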



Part 2 UAV_VAT_Gymnasium (Adapting to the Gymnasium Library)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Gymnasium environment is largely compatible with the Gym environment; the following describes only the differences from the Gym environment.
See the specific code in Alg_Base/rl_a3c_pytorch_benchmark/envs/gym_envs.py in the UAV_VAT_Gymnasium(gymnasium.Env) class.

1. reset: Resets the environment, returning the initial state.

**Input Parameters:**

* :code:`seed(int)`: Random seed.

**Return Parameters:**

* :code:`reset_info(dict)`: Reset environment information (empty by default).
2. step: Environment interaction, inputs action, retrieves state information.

**Return Parameters:**

* :code:`truncated(bool)`: Whether the episode ended because the step limit was reached (equivalent to :code:`info["TimeLimit.truncated"]` in the Gym environment).
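In practice the difference is only in the call signatures: Gymnasium's :code:`reset` accepts a seed and returns an info dictionary, and :code:`step` reports :code:`terminated` and :code:`truncated` separately. A hedged sketch, assuming the class is constructed with the same placeholder arguments as the Gym example above and follows the standard Gymnasium 5-tuple step signature implied by the :code:`truncated` return value:

.. code:: python

   from gym_envs import UAV_VAT_Gymnasium   # same module as UAV_VAT

   env = UAV_VAT_Gymnasium(arg_worker=1,
                           conf_path="config.json",        # placeholder paths
                           env_conf_path="env_config.json",
                           process_idx=0)

   # Gymnasium-style reset: (state, reset_info) instead of just the state
   state, reset_info = env.reset(seed=0)

   done = False
   while not done:
       action = env.action_space.sample()
       # Gymnasium-style step returns terminated and truncated separately
       state, reward, terminated, truncated, info = env.step(action)
       done = terminated or truncated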



.. _SubprovecEnv:

Part 3 Parallel Environment Encapsulation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A stacked environment for parallel computation, enabling multiple environments to interact with multiple agents.

For specific code, see the ASYNC_SubprocVecEnv class in Alg_Base/rl_a3c_pytorch_benchmark/envs/async_vecenv.py. Also, see the SubprocVecEnv_TS class in Alg_Base/rl_a3c_pytorch_benchmark/envs/async_vecenv_ts.py.

**Parallel classes in existing reinforcement learning libraries and the modifications made in this project:**

* Existing reinforcement learning libraries that are directly compatible with Gym environments, such as :code:`stable-baselines3` and :code:`tianshou`, provide the parallel environment class :code:`SubprocVecEnv`, allowing users to wrap Gym environments into parallel environments directly.
* The above :code:`SubprocVecEnv` class checks whether to execute :code:`reset` immediately after executing :code:`step`, which may cause :code:`step` and :code:`reset` to be executed in the same cycle (not the strict alignment of :code:`step` and :code:`reset` required by this project's environment).
* Modification in this project: strictly align :code:`step` with :code:`reset` so that the operations of the individual environments stay synchronized.
* The :code:`SubprocVecEnv` class from the :code:`stable-baselines3` library is modified into the :code:`Async_SubprocVecEnv` class, and the :code:`SubprocVecEnv` and :code:`Collector` classes from the :code:`tianshou` library are modified into the :code:`SubprocVecEnv_TS` and :code:`TS_Collector` classes, respectively.
* The differences between the original and modified implementations are shown in the figure below.

(Figure: comparison of :code:`SubprocVecEnv` before and after modification.)




Part 4 Interface Invocation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For specific code, see the make_env() function in Alg_Base/rl_a3c_pytorch_benchmark/envs/envs_parallel.py.

make_env: Initializes multiple gym environments.

**Input Parameters:**

* :code:`n_envs(int)`: Number of environments.
* :code:`rank(int)`: Current environment index.
* :code:`gym(bool)`: If :code:`True`, generates environments from the Gym library; if :code:`False`, generates environments from the Gymnasium library.
* :code:`monitor(bool)`: Whether to wrap environments in the :code:`stable_baselines3.common.monitor.Monitor` class.

**Test Code:**
For specific code, see the gym_test() function and stableparallel_test() function in Alg_Base/rl_a3c_pytorch_benchmark/envs/envs_parallel.py.

If you wish to quickly test whether the implementation of the above environment classes is correct, you can use the following code:

1. Single process (using only the Gym environment)

   .. code:: python

      from gym.envs.registration import register

      register(
          id='UAVVAT-v0',
          entry_point='gym_envs:UAV_VAT',
      )
      gym_test()

2. Use stable-baselines3 for algorithm testing

   .. code:: python

      stableparallel_test(n_envs=16, gym=False)
Note: When using the stable-baselines3 library for parallel training, invoke the above Async_SubprocVecEnv class.
3. Use tianshou for algorithm testing

   .. code:: python

      test_tianshou_env(args)
Note: When using the tianshou library for parallel training, you need to invoke the above SubprocVecEnv_TS class.

.. _Parallel:

3 Parallel Environment Classes
-------------------------------------------------

The design goal of the base environment class (:code:`BenchEnv_Multi`) is multi-process asynchronous computation (e.g., the A3C algorithm), with each process calling the environment class to obtain its own data.

To better utilize the GPU for parallel acceleration, this class (:code:`Envs`) synchronizes the data of the above environments and executes the same commands in parallel, allowing the algorithm to operate in a single process on a synchronized, parallel data stream.

This environment is primarily tailored to user-defined algorithm design needs (in scenarios where stable-baselines3 and other Gym-based environments are not used for algorithm implementation).




Part 1 Envs()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For specific code, see the Envs class in Alg_Base/rl_a3c_pytorch_benchmark/envs/envs_parallel.py.

1. __init__: Initializes the Envs parallel environment (including the initialization of action_space and observation_space).

**Input Parameters:**

* :code:`num_envs(int)`: The parallelism level :math:`n`.
* :code:`env_list(list(env))`: List of pre-constructed environment instances, e.g. the list returned by the :code:`GE(n)` function.
* :code:`logs_show(bool)`: Whether to collect reward data from one of the environments to draw a training curve.
2. reset: Resets the environments and returns the initial states.

**Return Parameters:**

* :code:`states`: Returns the collection of states (images) of all environments.
3. step: Environment interaction; inputs the actions and retrieves the state information.

**Input Parameters:**

* :code:`actions`: The action of each agent.

**Return Parameters:**

* :code:`states`: Returns the collection of states (images) of all environments.
* :code:`rewards`: The stacked rewards of all agents.
* :code:`done`: Whether each environment is in a terminal state.



Part 2 Data Format Changes Before and After Encapsulation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Comparison of data formats before and after encapsulation:**

* Before encapsulation (:code:`BenchEnv_Multi`):

  * Data sampling: :code:`state:ndarray, reward:float, done:float = env.step(action:int)`
  * Environment reset: :code:`state:ndarray = env.reset()`

* After encapsulation (:code:`Envs`), with parallelism :math:`n`:

  * Data sampling: :code:`states:ndarray(n,), rewards:ndarray(n), done:ndarray(n) = envs.step(actions:ndarray(n))`
  * Environment reset: :code:`states:ndarray(n,) = envs.reset()`
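A hedged sketch of the batched data flow, assuming :code:`Envs` and a :code:`GE(n)`-style environment-list constructor are both importable from :code:`envs/envs_parallel.py` (the location of :code:`GE` is an assumption); the shapes follow the comparison above.

.. code:: python

   import numpy as np

   from envs.envs_parallel import Envs, GE   # GE(n) builds a list of n pre-constructed envs

   N_ENVS = 4
   envs = Envs(num_envs=N_ENVS, env_list=GE(N_ENVS), logs_show=False)

   states = envs.reset()                                  # stacked states, one per environment
   for _ in range(100):
       actions = np.random.randint(0, 7, size=N_ENVS)     # one discrete action per agent
       states, rewards, done = envs.step(actions)         # batched: (n, ...), (n,), (n,)
       # handle `done` according to your algorithm's reset policy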



4 Performance Evaluation Metrics
------------------------------------------------------------------------------------------

1. Cumulative Reward CR
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The cumulative reward accumulates the continuous rewards over a trajectory to evaluate how well the agent keeps the target at the center of the image, as shown in the following formula:

.. math::

   CR = \sum_{t=1}^{E_l} r_{ct}

In the above formula, :math:`r_{ct}` represents the continuous reward value at step :math:`t`, :math:`E_{ml}` represents the maximum trajectory length, and :math:`E_l` represents the length of this trajectory. For a graphical representation of continuous reward calculation, see :ref:`Continuous Reward Diagram for Tracking Object`.

2. Tracking Success Rate TSR
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This metric uses discrete rewards to evaluate the proportion of successful tracking steps, calculated using the following formula:

.. math::

   TSR = \frac{1}{E_{ml}} \sum_{t=1}^{E_l} r_{dt} \times 100\%

In the above formula, :math:`r_{dt}` represents the discrete reward value at step :math:`t`, :math:`E_{ml}` is the maximum step length across batches, and :math:`E_l` represents the length of the current batch. For detailed parameters see :ref:`Built-in reward parameter configuration`. For a graphical representation of discrete reward calculation, see :ref:`Discrete Reward Diagram for Tracking Object`.
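A minimal sketch of how the two metrics can be computed from per-step reward logs, directly following the formulas above; the reward arrays and the maximum trajectory length are placeholders.

.. code:: python

   import numpy as np

   E_ml = 1000                                  # maximum trajectory length (assumed)

   # Placeholder per-step reward logs for one trajectory of length E_l <= E_ml
   continuous_rewards = np.random.uniform(0.0, 1.0, size=800)    # r_ct
   discrete_rewards = np.random.randint(0, 2, size=800)          # r_dt in {0, 1}

   CR = continuous_rewards.sum()                          # cumulative reward
   TSR = discrete_rewards.sum() / E_ml * 100.0            # tracking success rate (%)

   print(f"CR = {CR:.2f}, TSR = {TSR:.1f}%")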



Reference:

.. [1] Luo, Wenhan, et al. "End-to-end active object tracking and its real-world deployment via reinforcement learning." IEEE Transactions on Pattern Analysis and Machine Intelligence 42.6 (2019): 1317-1332.
.. [2] Dionigi, Alberto, et al. "D-VAT: End-to-End Visual Active Tracking for Micro Aerial Vehicles." IEEE Robotics and Automation Letters (2024).