Improving Reinforcement Learning Agents' Performance and Memory Efficiency in 3D Environment via Semantic Segmentation
An Informatics MSc Project at the University of Edinburgh
Some gameplay videos are made available on Google Drive in 256x144 resolution (256x288 for SS+RGB stacked together):
- PPO Agent with SS(4)/RGB/RGB+SS input (uses DeepLabV3 with ResNet-101 backbone for SS, frame stack of 4 for SS(4))
- [Old recordings] PPO Agent with RGB+SS input (uses DeepLabV3 with ResNet-101 backbone for SS, no frame stack)
Here's my favourite episode:
rtss_map1_ep5.mp4
- train_models.py - runs the training session
- eval_models.py - collects data from evaluation episodes (game frames, position, etc.)
- tasks_eval.py - defines which tasks are to be executed by tasks_eval.py
- playit.py - play a scenario yourself
- create_run.py - a sketchy solution to create temporary scripts for train/eval tasks
- clean_up.py - cleans temporary scripts
DeepLabV3 with ResNet-101 backbone:
test_resnet101.mp4
DeepLabV3 with MobileNet-V3 backbone:
test_mobilenetv3.mp4
DQN, DRQN and Prioritized Experience Replay are used to train an agent to play Doom. Scenarios tested:
- Deathmatch (modified): a modified version of the deathmatch scenario with a different map layout and textures, where pickups are removed and killing enemies restores health/armor/ammo
- Deadly Corridor (modified): a more Doom-ish version of the deadly corridor scenario where the player starts with a shotgun and no longer takes double damage
- Deadly Corridor (original): the classic deadly corridor scenario included in ViZDoom
This project was developed using Python 3.10.9 on a laptop with a CUDA-enabled graphics card. Only ~1.6GB of video memory is needed according to my testing, so it should run even on entry-level Nvidia graphics cards like the MX150/GT 1030. Remove all .cuda() function calls if training on a CPU instead.
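As an alternative to deleting every .cuda() call by hand, the device can be picked once and passed around. The snippet below is only a minimal sketch of that pattern: the tiny network and the input shape are hypothetical stand-ins, not the models used in this repository.

```python
import torch
from torch import nn

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in network (not the agent from this project):
# 4 stacked 8x8 frames in, 3 action values out.
model = nn.Sequential(nn.Flatten(), nn.Linear(4 * 8 * 8, 3)).to(device)

# Create tensors directly on the chosen device instead of calling .cuda().
frames = torch.zeros(2, 4, 8, 8, device=device)

with torch.no_grad():
    q_values = model(frames)

print(q_values.shape, q_values.device)
```

With this pattern the same script runs unchanged on machines with or without a CUDA card.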
ViZDoom 1.2 now supports Python 3.11, and PyTorch 2.0 shouldn't break anything, so upgrading both might be desirable. The requirements.txt provided still reflects my current environment, as I haven't had the time to test newer versions. Automatic mixed precision can be enabled with torch.amp, which should speed up training at the cost of a slight reduction in precision, but I haven't had the time to test that either.
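For reference, a mixed-precision training step usually looks roughly like the sketch below. It is untested against this project's training loop: the linear model, optimizer, loss and batch shapes are hypothetical placeholders, and mixed precision is only switched on when a CUDA device is present (an assumption made here, not a setting from the repository).

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_amp = device.type == "cuda"  # assumption: mixed precision on GPU only

# Hypothetical stand-ins for the agent's network and optimizer.
model = nn.Linear(16, 4).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# With enabled=False the scaler is a pass-through, so the same code runs on CPU.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)


def train_step(inputs, targets):
    optimizer.zero_grad(set_to_none=True)
    # float16 is the usual autocast dtype on CUDA, bfloat16 on CPU.
    amp_dtype = torch.float16 if use_amp else torch.bfloat16
    with torch.autocast(device_type=device.type, dtype=amp_dtype, enabled=use_amp):
        loss = nn.functional.mse_loss(model(inputs), targets)
    # Scale the loss to avoid float16 gradient underflow, then step and update.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()


loss_value = train_step(
    torch.randn(8, 16, device=device),
    torch.randn(8, 4, device=device),
)
print(loss_value)
```

Newer PyTorch releases may print a deprecation warning for torch.cuda.amp.GradScaler; the step itself should behave the same.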
This agent was trained using ViZDoom: https://github.com/Farama-Foundation/ViZDoom
P.S. If you want to change the episode timeout setting to > 1050 for any scenario at Nightmare difficulty (skill level 5), note that enemies respawn after 30 seconds (1050 ticks) unless the Thing_Remove function is called in your ACS script to remove them, or they die in one of the special ways specifically defined in Doom's source code.
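The 1050 figure follows from Doom's fixed rate of 35 game ticks per second. The helper names below are hypothetical and only illustrate the arithmetic for checking a timeout value before editing a scenario config:

```python
TICRATE = 35  # Doom advances game logic 35 times per second


def seconds_to_ticks(seconds):
    """Convert wall-clock game seconds to Doom ticks."""
    return round(seconds * TICRATE)


# Respawn delay quoted above for Nightmare difficulty (skill level 5).
NIGHTMARE_RESPAWN_TICKS = seconds_to_ticks(30)


def respawn_possible(episode_timeout_ticks):
    """True if the episode runs long enough for the respawn delay to elapse."""
    return episode_timeout_ticks > NIGHTMARE_RESPAWN_TICKS


print(NIGHTMARE_RESPAWN_TICKS)   # 1050
print(respawn_possible(1050))    # False
print(respawn_possible(2100))    # True
```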