1 change: 1 addition & 0 deletions .gitignore
@@ -11,6 +11,7 @@ rslp/crop_type_mapping/csv/
rslp/crop_type_mapping/geoparquets/
rslp/mangrove/csv/
log*.txt
*.egg-info

# for local finetuning runs
/config.yaml
133 changes: 133 additions & 0 deletions one_off_projects/2025_07_joint_finetune/README.md
@@ -0,0 +1,133 @@
# Multitask learning with Helios

This project contains the configs and scripts necessary to run multitask training on top of pretrained Helios models.

The general approach to multitask learning is to create "maker configs" (which specify finetuning options like
datasets, architecture, hyperparameters, etc.) that get processed into "run configs" (standard `rslearn` configs)
that define finetuning jobs. From there, finetuning runs as normal via `rslearn` infrastructure, and the resulting models
can be loaded/tested with the scripts here.

**Note**: many scripts are hardcoded to use the `ryanp` home directory; they will need to be modified if other people use them.

## Creating multidataset configurations

Single-dataset finetuning configs are found in `configs/v2_*`. They are used as the building blocks for multitask maker
configurations. To use them to create multitask run configs, you must provide a maker config (the basic structure of which is outlined below):

| Key | Description |
|-----|-------------|
| `base_path` | Path to the base config. |
| `output_path` | Path to the output. |
| `dataset_cfgs` | List of paths to the dataset configs. These are found in `configs/v2_*`. If you want to specify multiple configs within a single dataset, use a nested list. |
| `global_overrides` | Override global settings, like batch sizes across all datasets, or the number of workers, or actual trainer settings. Takes precedence over options from `base_path`. |
| `local_overrides` | Local overrides for the dataset configs, like the batch size for a single dataset. These are collated and applied to the final config, but have lower precedence than `global_overrides`. For example, if you specify a local override for batch size, it will be overridden if there is a global batch size override. |
| `substitutions` | String substitutions for the base config, usually tied to a specific `helios` model, e.g. patch size, encoder embedding size, etc. Specify these here instead of when calling the finetuning script.|
| `merge_options` | Options for label merging. See below.|

Note that the base config is a way to share architecture/training parameters across different multitask runs (i.e., the `trainer`, `model`, etc. keys). Anything that goes in the base config can also just be moved to the `global_overrides` key.

Note that you must set `model.init_args.model.init_args.lazy_decode=true`, and use `rslearn.models.multitask.MultiTaskMergedModel` or `rslearn.models.multitask.MultiTaskModel` and `rslearn.train.data_module.MultiDatasetDataModule` instead of `MultiTask` and `RslearnDataModule`.
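For reference, below is a minimal sketch of how these settings might sit in the base/run config. The exact nesting, and the placement of the `data:` key in particular, is an assumption based on standard LightningCLI-style configs; check an existing run config for the real layout.

```yaml
# Sketch only; verify the nesting against an existing run config.
model:
  class_path: rslearn.train.lightning_module.RslearnLightningModule
  init_args:
    model:
      class_path: rslearn.models.multitask.MultiTaskMergedModel
      init_args:
        lazy_decode: true
        # ... encoder, decoders, trunk, etc.
data:
  class_path: rslearn.train.data_module.MultiDatasetDataModule
  init_args:
    # ... per-dataset data module settings
```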

Once you've constructed a maker config, you can generate a `rslearn`-readable run config via `scripts/make_multidataset_config.py`.

Note that the `dataset_cfgs` list specifies which single-dataset configs to bring into the multitask run config. **Since `dataset_cfgs` expects a list, the names of the resulting classes/decoders correspond exactly to the names of the final decoders from the single-dataset configs.** This means two things: 1) only one subtask per single-dataset config is supported, 2) whenever "dataset names" are mentioned here, these refer to these decoder names derived from the single-dataset configs.

For example, suppose we specify `dataset_cfgs: [taskA.yaml]`. If `taskA.yaml` specifies a decoder structure with layers `[trunk, taskADecoder]`, then you should refer to the corresponding dataset as `taskADecoder` everywhere else in the maker config (e.g. if you want to merge task labels or something).
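Putting this together, a maker config sketch might look like the following. All paths, dataset names, and override values here are placeholders, and the exact shape of the override sections (in particular, keying `local_overrides` by decoder name) is an assumption; see the example config referenced below for the authoritative structure.

```yaml
# Hypothetical maker config sketch; paths and values are placeholders.
base_path: /path/to/base.yaml
output_path: /path/to/generated_run_config.yaml
dataset_cfgs:
  - /path/to/configs/v2_taskA/finetune.yaml  # produces decoder name "taskADecoder"
global_overrides:
  trainer:
    accumulate_grad_batches: 5
local_overrides:
  taskADecoder:            # assumed to be keyed by decoder name
    batch_size: 4
substitutions:
  patch_size: 4
  encoder_embedding_size: 768
merge_options:
  merge_heads: true
  merge_task_labels: true
```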

It's probably easiest to start from an example maker config and edit it from there: see `configs/2025_09_02_final/detect.yaml` for one.

### Label merging

The `merge_options` key allows you to merge labels, i.e. if we have two classification tasks with `N` and `M` classes each, use a single output softmax layer with `N + M` classes. To enable this, use the following settings:

```yaml
merge_options:
  merge_heads: true
  merge_task_labels: true
  same_label_groups:
```

* `merge_heads` will merge head weights (e.g., FasterRCNN ROI heads, segmentation UNet heads, etc.).
* `merge_task_labels` will merge softmax labels, bounding box prediction labels, etc.
* `same_label_groups` allows you to specify a list of dataset names that have the same label classes, so they don't get stacked and duplicated unnecessarily.

If you set `merge_task_labels` to `true`, use `rslearn.models.multitask.MultiTaskMergedModel` as the model class. I have not tested `merge_task_labels: false` in a while, so it may not work. Generally, `merge_task_labels: true` seems to work well, so I would recommend it as the default.
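For illustration, here is a hedged sketch of how `same_label_groups` might be filled in. The dataset names are hypothetical, and the interpretation of each entry as a group of decoder names sharing one label set is an assumption based on the key name.

```yaml
merge_options:
  merge_heads: true
  merge_task_labels: true
  same_label_groups:
    # Hypothetical: these two decoders share the same label classes,
    # so their labels form one group instead of being duplicated.
    - [cropTypeDecoderA, cropTypeDecoderB]
```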

### Task conditioning

One feature of multitask learning in `rslearn` and `helios` is the ability to condition on task embeddings, generated by feeding natural language descriptions of tasks through a text embedding model. You can find the script to do this at `scripts/make_task_embeds.py`. The recommended usage is currently as follows:

> `python make_task_embeds.py --anchor --instruct --truncate 256 --from_yaml /weka/dfive-default/ryanp/rslearn_projects/one_off_projects/2025_07_joint_finetune/data/tasks.yaml`

If the above yaml file got deleted, use `/weka/dfive-default/rslearn-eai/data/task_descriptions.yaml` (it's a copy). In general, if the `ryanp` home directory is gone, some scripts will fail, but they can be easily fixed.

Once these are generated, you must use `rslp.helios.model.TaskConditionedHelios` as the encoder in the model config. This class accepts `model_overrides` and `task_embed_opts` arguments, which should be used as follows:

```yaml
model_overrides:
  encoder_config:
    task_lora_kwargs:
      use_task_lora: true
      task_lora_indices: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
      gen_hidden: 64
task_embed_opts:
  type: "precomputed"
  path: /weka/dfive-default/ryanp/rslearn_projects/one_off_projects/2025_07_joint_finetune/data/task_embeds___Qwen3-Embedding-8B__256d__anchor__instruct__from_yaml.pt
```

### Decoder trunk

This is not explicitly tied to multitask learning but is often useful. Decoder trunks allow for shared, randomly-initialized layers (e.g. MoE transformer layers) that are conditioned on learned task embeddings, rather than the fixed precomputed embeddings described above. Below is an example of how to use it:

```yaml
model:
  class_path: rslearn.train.lightning_module.RslearnLightningModule
  init_args:
    model:
      init_args:
        trunk:
          class_path: rslearn.models.trunk.DecoderTrunk
          init_args:
            task_embedding:
              class_path: rslearn.models.task_embedding.TaskChannelEmbedding
              init_args:
                encoder_embedding_size: 768
                add_spatial_embed: true
            layers:
              - class_path: rslp.helios.moe.MoETransformer
                init_args:
                  dim: 768
                  n_layers: 1
                  n_heads: 12
                  num_experts: 4
                  num_slots: 4
```

## Running jobs

Once you create a `rslearn` run config, run it normally with `launch_finetune` or something similar.

## Evaluating and using multitask models

Since multitask models are trained on multiple datasets, they cannot be used for a single task out of the box; they must first be modified as described below.

### Trimming label-merged multitask models

One simple way to do this is to chop off any weights associated with irrelevant tasks (e.g. if a multitask model is trained on tasks `A, B, C` and you want to trim it for task `C` only, then you can chop off the output weights that create predictions for `A` and `B` classes). To do this, use `scripts/unmerge_singletask_model.py`.

Once you do this, you can use the script `scripts/do_eval_unmerged.py` to run evals on the trimmed model. However, this script is not very configurable and may be out of date. Also, especially for detection tasks, where detections depend on an absolute probability threshold, a trimmed model will make predictions that are not exactly equivalent to the untrimmed model's. It's recommended to evaluate the merged model without trimming, as described in the next section.

### Working with unmerged multitask models

Evaluate these models with `scripts/do_eval_merged.py`, and compare them to single-dataset runs with `scripts/do_eval_sft.py`. If you want to run finetuning on a new dataset, use `scripts/submit_isolate_finetune.py`.

Note that if you finetune a multitask model with fixed NLP task embeddings on a new dataset, there will be an error unless the new dataset already has a task embedding registered in the task embedding file. If you are using learnable task embeddings (via `TaskChannelEmbedding` in the decoder trunk), by default the first embedding in the lookup table from multitask learning is used to initialize the new task embedding. You can control which one via the `default_idx` argument in the `TaskChannelEmbedding` initialization.
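As a sketch, the relevant piece of the trunk config might look like this. The values mirror the earlier decoder trunk example; `default_idx: 0` is just an illustrative choice.

```yaml
task_embedding:
  class_path: rslearn.models.task_embedding.TaskChannelEmbedding
  init_args:
    encoder_embedding_size: 768
    add_spatial_embed: true
    # Index of the learned multitask embedding used to initialize the new task's embedding.
    default_idx: 0
```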

### Pretraining evals

Use `scripts/ckpt_to_distributed.py` to convert a `rslearn` finetuned checkpoint to a distributed `helios`-style checkpoint. This can be used for non-multitask models as well. Then, use the eval harness in `helios` (or write a new one); the generated checkpoint folder should work plug-and-play.

### Measuring throughput

Use `scripts/measure_throughput.py`. I will admit that this script was written mostly by ChatGPT, so it might have some issues, but it looks okay on a quick glance.
@@ -1029,7 +1029,7 @@ model:
class_path: rslearn.models.trunk.DecoderTrunk
init_args:
layers:
- class_path: rslearn.models.trunk.MoETransformer
- class_path: rslp.helios.moe.MoETransformer
init_args:
dim: 768
expert_mult: 2
@@ -1529,14 +1529,13 @@ model:
class_path: rslearn.models.trunk.DecoderTrunk
init_args:
layers:
- class_path: rslearn.models.trunk.MoETransformer
- class_path: rslp.helios.moe.MoETransformer
init_args:
dim: 768
expert_mult: 2
n_heads: 12
n_layers: 2
num_experts: 8
num_slots: 1
n_layers: 1
num_experts: 4
num_slots: 4
task_embedding:
class_path: rslearn.models.task_embedding.TaskChannelEmbedding
init_args:
@@ -1765,5 +1764,4 @@ trainer:
- 0
unfreeze_at_epoch: 20
unfreeze_lr_factor: 10
limit_val_batches: 1024
max_epochs: 200
@@ -1217,7 +1217,7 @@ model:
class_path: rslearn.models.trunk.DecoderTrunk
init_args:
layers:
- class_path: rslearn.models.trunk.MoETransformer
- class_path: rslp.helios.moe.MoETransformer
init_args:
dim: 768
expert_mult: 2
@@ -25,7 +25,7 @@ global_overrides:
encoder_embedding_size: 768
add_spatial_embed: true
layers:
- class_path: rslearn.models.trunk.MoETransformer
- class_path: rslp.helios.moe.MoETransformer
init_args:
dim: 768
n_layers: 2
@@ -31,16 +31,14 @@ global_overrides:
encoder_embedding_size: 768
add_spatial_embed: true
layers:
- class_path: rslearn.models.trunk.MoETransformer
- class_path: rslp.helios.moe.MoETransformer
init_args:
dim: 768
n_layers: 2
n_layers: 1
n_heads: 12
num_experts: 8
num_slots: 1
expert_mult: 2
num_experts: 4
num_slots: 4
trainer:
limit_val_batches: 1024
accumulate_grad_batches: 5

merge_options:
@@ -24,7 +24,7 @@ global_overrides:
encoder_embedding_size: 768
add_spatial_embed: true
layers:
- class_path: rslearn.models.trunk.MoETransformer
- class_path: rslp.helios.moe.MoETransformer
init_args:
dim: 768
n_layers: 2
@@ -24,7 +24,7 @@ global_overrides:
encoder_embedding_size: 768
add_spatial_embed: true
layers:
- class_path: rslearn.models.trunk.MoETransformer
- class_path: rslp.helios.moe.MoETransformer
init_args:
dim: 768
n_layers: 1
@@ -30,7 +30,7 @@ global_overrides:
encoder_embedding_size: 768
add_spatial_embed: true
layers:
- class_path: rslearn.models.trunk.MoETransformer
- class_path: rslp.helios.moe.MoETransformer
init_args:
dim: 768
n_layers: 1
@@ -25,7 +25,7 @@ global_overrides:
encoder_embedding_size: 768
add_spatial_embed: true
layers:
- class_path: rslearn.models.trunk.MoETransformer
- class_path: rslp.helios.moe.MoETransformer
init_args:
dim: 768
n_layers: 1