- Install required packages using Conda:
conda env create -f requirements.yml conda activate contracting
- Create results directories:
mkdir gifs results experiment_paths
- Test the installation:
python runner.py --name "test-v1" --config_path "experiment_configs/test.json"
This codebase is built on RLlib and has a multi-layer structure to enable large benchmarking runs.
- Low Level - Config Files: Users need to create configuration files which specify the parameters for the jobs to be run. A single config file can be used to specify the parameters for multiple jobs to be run, as detailed in the section below.
- High Level - Scheduler: If multiple jobs are to be run, users have control over how they are scheduled. They can be run in parallel using python"s
multiprocessing
library or run sequentially. More details can be found in the section below.
A config file is a json
file that specifies the parameters for the jobs to be run. Configuration files are defined as a list of dictionaries:
-
The first dictionary in the list defines values for all jobs in a file.
-
The second dictionary contains hyperparameters. Keys are hyperparameter names, and values are lists of values. All combinations of all parameters will be run.
-
The remaining elements of the list can be used to specify any job-specific parameters. These job-specific parameters can also be used to override the global dictionary for a specific job.
The following image shows an example config file built using the rules described above.
Example: There are 6 possible permutations from the permutation dictionary. Each of these 6 permutations are then run on both Harvest and Cleanup environments, leading to a total of 12 jobs. Each job uses 16 workers, 15M timesteps and a batch size of 64000 as defined in the global dictionary. However, all jobs in Harvest use a batch size of 32000 due to the override. More configuration files can be found inside
experiment_configs/
[
{
"num_workers": 16,
"num_timesteps": 15000000,
"batch_size": 64000,
},
{
"num_agents":[8,4,2],
"separate": [true, false]
},
{
"environment": "cleanup_new",
},
{
"environment": "harvest_new",
"batch_size" : 32000
},
]
Config files allow for the specification of multiple jobs where each job requires a specified number of workers
- Sequentially: Jobs are run sequentially by default:
python runner.py --name "cleanup-complete" --config_path "experiment_configs/cleanup-contracting.json"
To run with gpu enabled, add --gpu
python runner.py --name "cleanup-complete" --config_path "experiment_configs/cleanup-contracting.json" --gpu
- In Parallel: Jobs in the config can be parallelized by adding the
--mp
flag and specifying the total number of workers available as well as the number of workers required per job. The following code runs 4 jobs parallely.
python runner.py --name "cleanup-complete" --config_path "experiment_configs/cleanup-contracting.json" --w_per_job 16 --workers 64 --mp
The following environments are available to train. The first column can be used in a configuration.
String Representation | Class | Description | Contract to Use (if contracting) |
---|---|---|---|
cleanup_new |
CleanupEnv |
image-based cleanup, adapted from here | CleanupContract |
cleanup |
CleanupFeatures |
feature-based cleanup, manually-designed features | CleanupContract |
harvest_new |
HarvestEnv |
image-based harvest, adapted from here | HarvestFeaturemodLocalContract |
harvest |
HarvestFeatures |
feature-based harvest, manually-designed features | HarvestFeaturemodLocalContract |
selfdrive |
SelfAcceleratingCarEnv |
self-driving merge domain | SelfdriveContractDistprop |
The following is a partial list of paramters, which are parsed in utils/ray_config_utils.py
. Additional parameters are defined in RLLib.
Argument | Type | Description | Default Value |
---|---|---|---|
num_timesteps |
int |
Training timesteps | required, 10M recommended |
num_workers |
int |
Number of parallel ray workers |
required, |
num_agents |
int |
Number of agents in the environment ( |
required |
batch_size |
int |
Number of timesteps to collect before each training update | required |
contract |
str |
Contract space to use | Required for contracting runs, refer contract/contract_list for available contracts |
wandb |
bool |
Logging on Weights and Biases | False |
separate |
bool |
No-contracting, separate training | False |
joint |
bool |
Joint training, single controller | False |
solver |
bool |
NegotiationSolver is used to find the proposed contract |
False |
shared_policy |
bool |
All agents share a policy | False |
num_renders |
int |
Generate renders at end of training for HarvestEnv /CleanupEnv
|
required |
horizon |
int |
Episode Length | Required, recommended is 1000 for HarvestEnv /CleanupEnv /SelfAcceleratingCarEnv
|
env_args |
dict |
Values passed to base environment | See environment definitions and sample configuration files |
model_params |
dict |
Values passed to the RL model | See run_training.py
|
Sample configuration files are in continuous_domain/experiment_configs
. The use of 8 workers per job in the configuration files is arbitrary and should be adjusted.
Config | Description | #Jobs |
---|---|---|
cleanup-baseline-2agents.json |
Runs CleanupEnv with 2 agents and no contracting |
1 |
cleanup-contracting-2agents.json |
Runs CleanupEnv with 2 agents and contracting |
1 |
cleanup-joint-2agents.json |
Runs the CleanupEnv with 2 agents and a joint controller |
1 |
cleanup-contracting.json |
Runs CleanupEnv with 2, 4, and 8 agents, and contracting |
3 |
harvest-contracting.json |
Runs HarvestEnv 2,4, and 8 agents, and contracting |
3 |
driving-contracting.json |
Runs SelfAcceleratingCarEnv with 2, 4, and 8 agents, and contracting |
3 |
contracting-full.json |
Runs CleanupEnv , HarvestEnv , and SelfAcceleratingCarEnv with 2, 4, and 8 agents |
9 |
baseline-full.json |
Runs CleanupEnv , HarvestEnv , and SelfAcceleratingCarEnv env with 2, 4, and 8 agents and no contracting |
9 |
cleanup-old-contracting-2agents.json |
Runs contracting in cleanup env with 2 agents | 1 |
harvest-old-contracting-2agents.json |
Runs contracting in old feature-based harvest env with 2 agents | 1 |
Example: The following command runs contracting on all environments in parallel using a GPU and 32 workers, with eight workers per job.
python runner.py --name "contracting-full-v1" --config_path "experiment_configs/contracting-full.json" --w_per_job 8 --workers 32 --mp --gpu
The following files are important for the codebase:
runner.py
: Initializes jobs.experiments_handles/contracts_experiment_handle.py
: Parses the config, and callsrun_training.py
.run_training.py
: Implements the training pipeline.utils/ray_config_utils.py
: RL model configuration parameters are processed in this file.contract/contract_list.py
: Implements contract spaces.environments/two_stage_train.py
: Defines the contracting augmentation as wrappers.
Weights and biases is integrated with the codebase and is recommended to visualize training results as well as other performance metrics.
In the contracting algorithm presented in the original Formal Contracting Mitigates Social Dilemmas in Multi-Agent RL, a reinforcement-learning-based algorithm was implemented. This repository implements a second solver.
SeparateContractNegotiationStage
follows the method described in the paper.NegotationSolver
The solver uses the agent"s frozen value functions to find the contract that maximizes welfare subject to the constraint that all agents accept the contract. Addingsolver=true
in the config file enables this for contracting.
If you want to cite this repository accademic work, please use the following citation:
@inproceedings{contracts,
author = {Christophersen, Philip and Haupt, Andreas and Hadfield-Menell, Dylan},
title = {Getting it in Writing: Formal Contracting Mitigates Social Dilemmas in Multi-Agent Reinforcement Learning},
year = {2022},
booktitle = {Proceedings of the 22nd International Conference on Autonomous Agents and Multiagent Systems},
}