An extension of HtFLlib for on-device HtFL deployment using real devices, based on Flower and CoLExT.
`dataset`: Realistically and naturally distributed datasets generated by the code from PFLlib. You can find more raw data here:
- HAR (Human Activity Recognition) (30 clients, 6 labels)
- PAMAP2 (9 clients, 12 labels)
- iWildCam (194 camera traps, 158 labels)
`system`: Flower servers and clients. The supported HtFL frameworks are:
- Partial-heterogeneity-based HtFL
  - LG-FedAvg — Think Locally, Act Globally: Federated Learning with Local and Global Representations (2020)
  - FedGen — Data-Free Knowledge Distillation for Heterogeneous Federated Learning (ICML 2021)
  - FedGH — FedGH: Heterogeneous Federated Learning with Generalized Global Header (ACM MM 2023)
- Full-heterogeneity-based HtFL
  - FD — Communication-Efficient On-Device Machine Learning: Federated Distillation and Augmentation under Non-IID Private Data (2018)
  - FML — Federated Mutual Learning (2020)
  - FedKD — Communication-efficient federated learning via knowledge distillation (Nature Communications 2022)
  - FedProto — FedProto: Federated Prototype Learning across Heterogeneous Clients (AAAI 2022)
  - FedTGP — FedTGP: Trainable Global Prototypes with Adaptive-Margin-Enhanced Contrastive Learning for Data and Model Heterogeneity in Federated Learning (AAAI 2024)
- Distribute and store realistic datasets (`.npz` files) on all devices using a designated strategy (to be determined). Please read the `config.json` file for the details of each dataset. Each `.npz` file can be used directly by PyTorch's `DataLoader`.
- Run `./remap_labels.py` to re-map the labels into consecutive integers and to get the total number of labels for setting `--num_classes`.
- For each HtFL framework, deploy `./system/servers/serverNAME.py` and `./system/clients/clientNAME.py` (together with `./system/utils`) to the workstation and devices, respectively. The server and client models will be saved to their respective `checkpoints` folders.
- Execute the server file with the appropriate configuration (`argparse`), which varies by HtFL framework.
- Execute the client file with the appropriate configuration (`argparse`), which varies by HtFL framework. Real data loading is not implemented yet.
- Checkpoints are stored locally on clients in `args.save_folder_path`, with timestamps used by default.
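The label re-mapping step can be sketched as follows. This is a minimal illustration, not the actual `remap_labels.py`; the idea of wrapping remapped arrays back into `.npz` partitions is an assumption about the data layout.

```python
import numpy as np

def remap_labels(y):
    """Map arbitrary label values to consecutive integers 0..K-1.

    Returns the remapped labels and the total number of classes
    (the value to pass as --num_classes).
    """
    classes = np.unique(y)                      # sorted unique label values
    mapping = {c: i for i, c in enumerate(classes)}
    y_new = np.array([mapping[v] for v in y], dtype=np.int64)
    return y_new, len(classes)

if __name__ == "__main__":
    # Example with non-consecutive labels, as can occur in iWildCam.
    y = np.array([7, 42, 7, 105, 42])
    y_new, num_classes = remap_labels(y)
    print(y_new.tolist(), num_classes)  # [0, 1, 0, 2, 1] 3
    # The remapped arrays can then be saved back into the .npz partitions
    # and wrapped in a torch.utils.data.TensorDataset for the DataLoader.
```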
Deploying experiments on CoLExT requires specifying a CoLExT config file. An example is shown below. Additional examples will be placed inside `colext_experiments/`.
Important notes:
- Before running the client Python code, client devices self-assign their own data using the `./config_device_data.sh` script. The script assigns data partitions based on the client ID provided by CoLExT. For example, the "identity" strategy assigns partition x to client x.
- Additional arguments can be specified for a particular client group, allowing different client groups to receive different arguments.
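As an illustration, the identity strategy could look roughly like this. This is a hypothetical sketch: the actual script contents, the partition file layout, and the `COLEXT_CLIENT_ID` variable name are assumptions.

```shell
# Hypothetical core of config_device_data.sh: self-assign a data partition.
# Assumes partitions are stored as <dataset_dir>/<client_id>.npz and that
# CoLExT exposes the client ID in COLEXT_CLIENT_ID (an assumption).
assign_partition() {
    dataset_dir="$1"
    strategy="$2"
    case "$strategy" in
        identity)
            # "identity" strategy: client i takes partition i.
            cp "${dataset_dir}/${COLEXT_CLIENT_ID}.npz" ./train.npz
            ;;
        *)
            echo "unknown strategy: ${strategy}" >&2
            return 1
            ;;
    esac
}
```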
Example config:
```yaml
# colext_experiments/example_config.yaml
project: htfl-ondevice
code:
  # Path to the code root dir, relative to the config file
  # Defaults to the config file dir if omitted
  # The working directory is set to this path
  path: "../"
  client:
    command: >-
      ./config_device_data.sh ${COLEXT_DATASETS}/iWildCam identity &&
      python3 -m system.clients.clientLG
      --server_address=${COLEXT_SERVER_ADDRESS}
      --num_classes=158
  server:
    command: >-
      python3 -m system.servers.serverLG
      --min_fit_clients=${COLEXT_N_CLIENTS}
      --min_available_clients=${COLEXT_N_CLIENTS}
      --num_rounds=10
clients:
  - dev_type: JetsonAGXOrin
    count: 2
    # Add additional arguments for this client group
    add_args: "--model=ResNet101"
  - dev_type: OrangePi5B
    count: 2
    add_args: "--model=ResNet18"
  - dev_type: OrangePi5B
    # If `count` is not specified, it's assumed to be 1
    add_args: "--model=ResNet34"
```
For an up-to-date reference on the CoLExT config file, refer to CoLExT's README.
Interacting with CoLExT:
```sh
# Launch an experiment
colext_launch_job -c colext_experiments/example_config.yaml

# Collect metrics
colext_get_metrics -j <job_id>
```
To debug an experiment, the CoLExT config can be run locally on the CoLExT server using CoLExT's local Python deployer.
```sh
colext_launch_job -c colext_experiments/example_config.yaml --deployer=local_py
# Experiment logs are collected in the current working directory
```
Easy-to-run benchmarks are available in `colext_experiments/benchmarks`:
- `resnet_models/`: benchmark several ResNet models across all device types
- `devs_w_same_energy_efficiency/`: benchmark all HtFL frameworks using one model per device, chosen to ensure a similar energy budget for each device. The energy threshold was selected as the median energy spent in a round by the fastest device in the `resnet_models` benchmark.
```sh
$ cd colext_experiments/benchmarks
$ ./run_benchmark.sh <path_to_benchmark_folder>
# Calls <benchmark_folder>/gen_configs.py to generate CoLExT configs
# Outputs configs to `<benchmark_folder>/output/colext_configs`
# Launches a job for each config and records the job id
# Writes job ids to `<benchmark_folder>/output/output_job_id_maps.txt`

# After the benchmark is finished, plot the results
$ python3 plot_benchmark.py <path_to_benchmark_folder>
# Creates plots based on the job ids from `<benchmark_folder>/output/output_job_id_maps.txt`
# Plots are written to `<benchmark_folder>/output/plots`
```
Benchmarks consist of running multiple `colext_config.yaml` files. To avoid generating these files manually, each benchmark folder contains a `gen_configs.py` script, which holds all the logic required to create the benchmark's configuration files and outputs them to `<benchmark_folder>/output/colext_configs`. Refer to existing benchmark folders for examples.
The `gen_configs.py` file is all that's needed to create a benchmark. With the file in place, the benchmark can be run as described in How to run a benchmark.
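As an illustration, a minimal `gen_configs.py` could be structured as below. This is a hypothetical sketch, not one of the actual benchmark scripts: the template fields, variant list, and output file naming are assumptions.

```python
import os

# Hypothetical template for a generated CoLExT config; the client/server
# commands and field values are illustrative assumptions.
CONFIG_TEMPLATE = """\
project: htfl-ondevice
code:
  path: "../../"
  client:
    command: >-
      python3 -m system.clients.clientLG --num_classes=158
  server:
    command: >-
      python3 -m system.servers.serverLG --num_rounds=10
clients:
  - dev_type: {dev_type}
    count: 2
    add_args: "--model={model}"
"""

def gen_configs(out_dir, variants):
    """Write one CoLExT config per (dev_type, model) variant."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for dev_type, model in variants:
        path = os.path.join(out_dir, f"{dev_type}_{model}.yaml")
        with open(path, "w") as f:
            f.write(CONFIG_TEMPLATE.format(dev_type=dev_type, model=model))
        paths.append(path)
    return paths

if __name__ == "__main__":
    # Generate one config per device/model pair, as in the resnet_models idea.
    paths = gen_configs(
        "output/colext_configs",
        [("JetsonAGXOrin", "ResNet101"), ("OrangePi5B", "ResNet18")],
    )
    print(paths)
```

`run_benchmark.sh` can then launch one job per generated file.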