1 change: 1 addition & 0 deletions .gitignore
@@ -11,6 +11,7 @@ rslp/crop_type_mapping/csv/
rslp/crop_type_mapping/geoparquets/
rslp/mangrove/csv/
log*.txt
*.egg-info

# for local finetuning runs
/config.yaml
133 changes: 133 additions & 0 deletions one_off_projects/2025_07_joint_finetune/README.md
@@ -0,0 +1,133 @@
# Multitask learning with Helios

This project contains the configs and scripts necessary to run multitask training on top of pretrained Helios models.

The general approach to multitask learning is to create "maker configs" (which specify finetuning options like
datasets, architecture, hyperparameters, etc.) that get processed into "run configs" (standard `rslearn` configs)
that define finetuning jobs. From there, finetuning runs as normal via `rslearn` infrastructure, and the resulting models
can be loaded/tested with the scripts here.

**Note**: many scripts are hardcoded to use the `ryanp` home directory; they will need to be modified if other people use them.

## Creating multidataset configurations

Single-dataset finetuning configs are found in `configs/v2_*`. They are used as the building blocks for multitask maker
configurations. To use them to create multitask run configs, you must provide a maker config (the basic structure of which is outlined below):

| Key | Description |
|-----|-------------|
| `base_path` | Path to the base config. |
| `output_path` | Path to the output. |
| `dataset_cfgs` | List of paths to the dataset configs. These are found in `configs/v2_*`. If you want to specify multiple configs within a single dataset, use a nested list. |
| `global_overrides` | Override global settings, like batch sizes across all datasets, or the number of workers, or actual trainer settings. Takes precedence over options from `base_path`. |
| `local_overrides` | Local overrides for the dataset configs, like the batch size for a single dataset. These are collated and applied to the final config, but have lower precedence than `global_overrides`. For example, if you specify a local override for batch size, it will be overridden if there is a global batch size override. |
| `substitutions` | String substitutions for the base config, usually tied to a specific `helios` model, e.g. patch size, encoder embedding size, etc. Specify these here instead of when calling the finetuning script.|
| `merge_options` | Options for label merging. See below.|

Note that the base config is a way to share architecture/training parameters across different multitask runs (i.e., the `trainer`, `model`, etc. keys). Anything that goes in the base config can also just be moved to the `global_overrides` key.

Note that you must set `model.init_args.model.init_args.lazy_decode=true`, and use `rslearn.models.multitask.MultiTaskMergedModel` or `rslearn.models.multitask.MultiTaskModel` and `rslearn.train.data_module.MultiDatasetDataModule` instead of `MultiTask` and `RslearnDataModule`.
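For reference, below is a minimal sketch of how these settings might sit in the base/run config. The exact nesting, and the placement of the `data:` key in particular, is an assumption based on standard LightningCLI-style configs; check an existing run config for the real layout.

```yaml
# Sketch only; verify the nesting against an existing run config.
model:
  class_path: rslearn.train.lightning_module.RslearnLightningModule
  init_args:
    model:
      class_path: rslearn.models.multitask.MultiTaskMergedModel
      init_args:
        lazy_decode: true
        # ... encoder, decoders, trunk, etc.
data:
  class_path: rslearn.train.data_module.MultiDatasetDataModule
  init_args:
    # ... per-dataset data module settings
```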

Once you've constructed a maker config, you can generate a `rslearn`-readable run config via `scripts/make_multidataset_config.py`.

Note that the `dataset_cfgs` list specifies which single-dataset configs to bring into the multitask run config. **Since `dataset_cfgs` expects a list, the names of the resulting classes/decoders correspond exactly to the names of the final decoders from the single-dataset configs.** This means two things: 1) only one subtask per single-dataset config is supported, 2) whenever "dataset names" are mentioned here, these refer to these decoder names derived from the single-dataset configs.

For example, suppose we specify `dataset_cfgs: [taskA.yaml]`. If `taskA.yaml` specifies a decoder structure with layers `[trunk, taskADecoder]`, then you should refer to the corresponding dataset as `taskADecoder` everywhere else in the maker config (e.g. if you want to merge task labels or something).
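Putting this together, a maker config sketch might look like the following. All paths, dataset names, and override values here are placeholders, and the exact shape of the override sections (in particular, keying `local_overrides` by decoder name) is an assumption; see the example config referenced below for the authoritative structure.

```yaml
# Hypothetical maker config sketch; paths and values are placeholders.
base_path: /path/to/base.yaml
output_path: /path/to/generated_run_config.yaml
dataset_cfgs:
  - /path/to/configs/v2_taskA/finetune.yaml  # produces decoder name "taskADecoder"
global_overrides:
  trainer:
    accumulate_grad_batches: 5
local_overrides:
  taskADecoder:            # assumed to be keyed by decoder name
    batch_size: 4
substitutions:
  patch_size: 4
  encoder_embedding_size: 768
merge_options:
  merge_heads: true
  merge_task_labels: true
```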

It's probably easiest to start from an example maker config and edit it from there: see `configs/2025_09_02_final/detect.yaml` for one.

### Label merging

The `merge_options` key allows you to merge labels, i.e. if we have two classification tasks with `N` and `M` classes each, use a single output softmax layer with `N + M` classes. To enable this, use the following settings:

```yaml
merge_options:
  merge_heads: true
  merge_task_labels: true
  same_label_groups:
```

* `merge_heads` will merge head weights (e.g., FasterRCNN ROI heads, segmentation UNet heads, etc.).
* `merge_task_labels` will merge softmax labels, bounding box prediction labels, etc.
* `same_label_groups` allows you to specify a list of dataset names that have the same label classes, so they don't get stacked and duplicated unnecessarily.

If you set `merge_task_labels` to `true`, use `rslearn.models.multitask.MultiTaskMergedModel` as the model class. I have not tested `merge_task_labels: false` in a while, so it may not work. Generally, `merge_task_labels: true` seems to work well, so I would recommend it as the default.
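For illustration, here is a hedged sketch of how `same_label_groups` might be filled in. The dataset names are hypothetical, and the interpretation of each entry as a group of decoder names sharing one label set is an assumption based on the key name.

```yaml
merge_options:
  merge_heads: true
  merge_task_labels: true
  same_label_groups:
    # Hypothetical: these two decoders share the same label classes,
    # so their labels form one group instead of being duplicated.
    - [cropTypeDecoderA, cropTypeDecoderB]
```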

### Task conditioning

One feature of multitask learning in `rslearn` and `helios` is the ability to condition on task embeddings, generated by feeding natural language descriptions of tasks through a text embedding model. You can find the script to do this at `scripts/make_task_embeds.py`. The recommended usage is currently as follows:

> `python make_task_embeds.py --anchor --instruct --truncate 256 --from_yaml /weka/dfive-default/ryanp/rslearn_projects/one_off_projects/2025_07_joint_finetune/data/tasks.yaml`

If the above yaml file got deleted, use `/weka/dfive-default/rslearn-eai/data/task_descriptions.yaml` (it's a copy). In general, if the `ryanp` home directory is gone, some scripts will fail, but they can be easily fixed.

Once these are generated, you must use `rslp.helios.model.TaskConditionedHelios` as the encoder in the model config. This class accepts `model_overrides` and `task_embed_opts` arguments, which should be used as follows:

```yaml
model_overrides:
  encoder_config:
    task_lora_kwargs:
      use_task_lora: true
      task_lora_indices: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
      gen_hidden: 64
task_embed_opts:
  type: "precomputed"
  path: /weka/dfive-default/ryanp/rslearn_projects/one_off_projects/2025_07_joint_finetune/data/task_embeds___Qwen3-Embedding-8B__256d__anchor__instruct__from_yaml.pt
```

### Decoder trunk

This is not explicitly tied to multitask learning but is often useful. Decoder trunks allow for shared, randomly-initialized layers (e.g. MoE transformer layers) that are conditioned on learned task embeddings, rather than the fixed precomputed embeddings described above. Below is an example of how to use it:

```yaml
model:
  class_path: rslearn.train.lightning_module.RslearnLightningModule
  init_args:
    model:
      init_args:
        trunk:
          class_path: rslearn.models.trunk.DecoderTrunk
          init_args:
            task_embedding:
              class_path: rslearn.models.task_embedding.TaskChannelEmbedding
              init_args:
                encoder_embedding_size: 768
                add_spatial_embed: true
            layers:
              - class_path: rslp.helios.moe.MoETransformer
                init_args:
                  dim: 768
                  n_layers: 1
                  n_heads: 12
                  num_experts: 4
                  num_slots: 4
```

## Running jobs

Once you create a `rslearn` run config, run it normally with `launch_finetune` or something similar.

## Evaluating and using multitask models

Since multitask models are trained on multiple datasets, they cannot be used for a single task out of the box; they must first be modified as described below.

### Trimming label-merged multitask models

One simple way to do this is to chop off any weights associated with irrelevant tasks (e.g. if a multitask model is trained on tasks `A, B, C` and you want to trim it for task `C` only, then you can chop off the output weights that create predictions for `A` and `B` classes). To do this, use `scripts/unmerge_singletask_model.py`.

Once you do this, you can use the script `scripts/do_eval_unmerged.py` to run evals on the trimmed model. However, this script is not very configurable and may be out of date. Also, especially for detection tasks, where detections depend on an absolute probability threshold, a trimmed model will make predictions that are not exactly equivalent to the untrimmed model's. It's recommended to evaluate the merged model without trimming, as described in the next section.

### Working with unmerged multitask models

Evaluate these models with `scripts/do_eval_merged.py`, and compare them to single-dataset runs with `scripts/do_eval_sft.py`. If you want to run finetuning on a new dataset, use `scripts/submit_isolate_finetune.py`.

Note that if you finetune a multitask model with fixed NLP task embeddings on a new dataset, there will be an error unless the new dataset already has a task embedding registered in the task embedding file. If you are using learnable task embeddings (via `TaskChannelEmbedding` in the decoder trunk), by default the first embedding in the lookup table from multitask learning is used to initialize the new task embedding. You can control which one via the `default_idx` argument in the `TaskChannelEmbedding` initialization.
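As a sketch, the relevant piece of the trunk config might look like this. The values mirror the earlier decoder trunk example; `default_idx: 0` is just an illustrative choice.

```yaml
task_embedding:
  class_path: rslearn.models.task_embedding.TaskChannelEmbedding
  init_args:
    encoder_embedding_size: 768
    add_spatial_embed: true
    # Index of the learned multitask embedding used to initialize the new task's embedding.
    default_idx: 0
```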

### Pretraining evals

Use `scripts/ckpt_to_distributed.py` to convert a `rslearn` finetuned checkpoint to a distributed `helios`-style checkpoint. This can be used for non-multitask models as well. Then, use the eval harness in `helios` (or write a new one); the generated checkpoint folder should work plug-and-play.

### Measuring throughput

Use `scripts/measure_throughput.py`. I will admit that this script was written mostly by ChatGPT, so it might have some issues, but it looks okay on a quick glance.
@@ -1029,7 +1029,7 @@ model:
class_path: rslearn.models.trunk.DecoderTrunk
init_args:
layers:
- class_path: rslearn.models.trunk.MoETransformer
- class_path: rslp.helios.moe.MoETransformer
init_args:
dim: 768
expert_mult: 2
@@ -1529,14 +1529,13 @@ model:
class_path: rslearn.models.trunk.DecoderTrunk
init_args:
layers:
- class_path: rslearn.models.trunk.MoETransformer
- class_path: rslp.helios.moe.MoETransformer
init_args:
dim: 768
expert_mult: 2
n_heads: 12
n_layers: 2
num_experts: 8
num_slots: 1
n_layers: 1
num_experts: 4
num_slots: 4
task_embedding:
class_path: rslearn.models.task_embedding.TaskChannelEmbedding
init_args:
@@ -1765,5 +1764,4 @@ trainer:
- 0
unfreeze_at_epoch: 20
unfreeze_lr_factor: 10
limit_val_batches: 1024
max_epochs: 200
@@ -1217,7 +1217,7 @@ model:
class_path: rslearn.models.trunk.DecoderTrunk
init_args:
layers:
- class_path: rslearn.models.trunk.MoETransformer
- class_path: rslp.helios.moe.MoETransformer
init_args:
dim: 768
expert_mult: 2
@@ -25,7 +25,7 @@ global_overrides:
encoder_embedding_size: 768
add_spatial_embed: true
layers:
- class_path: rslearn.models.trunk.MoETransformer
- class_path: rslp.helios.moe.MoETransformer
init_args:
dim: 768
n_layers: 2
@@ -31,16 +31,14 @@ global_overrides:
encoder_embedding_size: 768
add_spatial_embed: true
layers:
- class_path: rslearn.models.trunk.MoETransformer
- class_path: rslp.helios.moe.MoETransformer
init_args:
dim: 768
n_layers: 2
n_layers: 1
n_heads: 12
num_experts: 8
num_slots: 1
expert_mult: 2
num_experts: 4
num_slots: 4
trainer:
limit_val_batches: 1024
accumulate_grad_batches: 5

merge_options:
@@ -24,7 +24,7 @@ global_overrides:
encoder_embedding_size: 768
add_spatial_embed: true
layers:
- class_path: rslearn.models.trunk.MoETransformer
- class_path: rslp.helios.moe.MoETransformer
init_args:
dim: 768
n_layers: 2
@@ -24,7 +24,7 @@ global_overrides:
encoder_embedding_size: 768
add_spatial_embed: true
layers:
- class_path: rslearn.models.trunk.MoETransformer
- class_path: rslp.helios.moe.MoETransformer
init_args:
dim: 768
n_layers: 1
@@ -30,7 +30,7 @@ global_overrides:
encoder_embedding_size: 768
add_spatial_embed: true
layers:
- class_path: rslearn.models.trunk.MoETransformer
- class_path: rslp.helios.moe.MoETransformer
init_args:
dim: 768
n_layers: 1
@@ -25,7 +25,7 @@ global_overrides:
encoder_embedding_size: 768
add_spatial_embed: true
layers:
- class_path: rslearn.models.trunk.MoETransformer
- class_path: rslp.helios.moe.MoETransformer
init_args:
dim: 768
n_layers: 1