
@favyen2 favyen2 commented Oct 7, 2025

This is code for OlmoEarth fine-tuning evaluations. Because we have 11 baselines and 12 tasks to compare, we want to avoid maintaining one config per task per baseline (11 × 12 = 132 configs); instead there should just be one config per task and one per baseline (23 total).

This adds a new rslp.olmoearth_evals module which provides extra infrastructure to make all of the models accept a consistent input. For each model, there is code to construct the model architecture for a given task and output shape, and to apply whatever model-specific normalization or band re-ordering is needed.
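As a rough sketch, the per-model pieces could be grouped into an adapter object like the one below. The names here (`ModelAdapter`, `band_order`, `build_model`, the normalization values) are purely illustrative, not the actual rslp.olmoearth_evals API:

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical adapter collecting the model-specific pieces described above:
# the band order the model expects, per-band normalization, and a factory
# that builds the architecture for a given task and output shape.
@dataclass
class ModelAdapter:
    band_order: Sequence[str]
    # Per-band (mean, std) normalization constants.
    normalization: dict
    build_model: Callable

    def reorder_and_normalize(self, bands: dict) -> list:
        """Re-order input bands into this model's order and normalize each."""
        out = []
        for name in self.band_order:
            mean, std = self.normalization[name]
            out.append((bands[name] - mean) / std)
        return out

adapter = ModelAdapter(
    band_order=["B04", "B03", "B02"],
    normalization={"B04": (1000.0, 500.0), "B03": (900.0, 450.0), "B02": (800.0, 400.0)},
    build_model=lambda task, out_shape: None,  # placeholder factory
)
print(adapter.reorder_and_normalize({"B02": 800.0, "B03": 1350.0, "B04": 1500.0}))
# -> [1.0, 1.0, 0.0]
```

With one such adapter per baseline, the task configs can stay model-agnostic.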

Then the data/helios_v3/tasks/ configs all load data for each task in the same consistent way. An exception is needed for PASTIS since it only has a subset of bands, and some models can accept that subset directly instead of requiring imputation; otherwise it mostly works since most of the tasks are materialized using rslearn.

The launcher is in data/helios_v3/run.py, and data/helios_v3/README.md provides some documentation about it. There are also model-specific configs, but they basically just configure freezing/unfreezing. The launcher passes the model to rslp.olmoearth_evals.eval_adapter via an environment variable.
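The environment-variable handoff could look something like the sketch below. The variable name (`EVAL_MODEL`) and command layout are assumptions for illustration, not the actual interface in data/helios_v3/run.py:

```python
import os

# Hypothetical launcher helper: select the model via an environment variable
# and build the subprocess command for the evaluation entrypoint.
def build_launch(model_name: str, task_config: str):
    env = dict(os.environ)
    env["EVAL_MODEL"] = model_name  # illustrative variable name
    cmd = [
        "python", "-m", "rslp.olmoearth_evals.eval_adapter",
        "--config", task_config,
    ]
    # A real launcher would now run: subprocess.run(cmd, env=env, check=True)
    return cmd, env

cmd, env = build_launch("satlas", "data/helios_v3/tasks/pastis.yaml")
print(env["EVAL_MODEL"])
```

Keeping the model choice out of the task config is what lets one task config serve every baseline.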

Some new code is also added to assign train/val/test splits specifically for this evaluation.
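One common way to make such split assignment deterministic is to hash a stable example identifier (e.g. an rslearn window name); the sketch below shows that pattern, though the actual split code in this PR may differ:

```python
import hashlib

# Assign an example to a split by hashing its identifier, so the assignment
# is stable across runs and machines (no RNG state involved).
def assign_split(example_id: str, val_frac: float = 0.1, test_frac: float = 0.2) -> str:
    digest = hashlib.sha256(example_id.encode()).digest()
    # Map the first 4 bytes of the hash to a value in [0, 1).
    u = int.from_bytes(digest[:4], "big") / 2**32
    if u < test_frac:
        return "test"
    if u < test_frac + val_frac:
        return "val"
    return "train"

# Same ID always yields the same split.
print(assign_split("window_0001"))
```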

This depends on allenai/rslearn#319, allenai/rslearn#320, and allenai/rslearn#324.

@favyen2 favyen2 requested a review from yawenzzzz October 7, 2025 21:53
@yawenzzzz yawenzzzz left a comment


LGTM! This is really nice; just one small thing: you can remove the forest_loss_driver*.yaml configs from models.

favyen2 commented Oct 9, 2025

> LGTM! This is really nice, just one small thing - you can remove the forest_loss_driver*.yaml from models

I think I need it because with forest loss driver, the model architecture is a bit different (it adds another level of SimpleTimeSeries), so the specification of what gets frozen has to be a bit different.

That said, the model configs could be consolidated; there seem to be just two categories (except Satlas, which also uses its config to restore the model weights).

@yawenzzzz

By the way, I think it's better to include the Nandi and AWF tasks here; their Sentinel-2 time-series configs are ready:
https://github.com/allenai/rslearn_projects/blob/master/data/helios/v2_nandi_crop_type/finetune_s2_20251002.yaml
https://github.com/allenai/rslearn_projects/blob/master/data/helios/v2_awf_lulc/finetune_s2_20251005.yaml

This setup is much better for testing time-series and multi-modal performance. In Helios, I'm mostly sweeping learning rates (with cosine decay and patience), so those experiments don't really show the time-series/multi-modal advantages, which are important for crop type and land cover mapping.

To keep things consistent, we can remove Nandi and AWF from the KNN experiments. That way, the fine-tuning covers both (1) research benchmarks (the ones we run KNN/LP on) and (2) real-world tasks.

@favyen2 favyen2 merged commit 9d7de17 into master Oct 10, 2025
4 checks passed
@favyen2 favyen2 deleted the favyen/20251007-olmoearth-eval branch October 10, 2025 14:58