
@favyen2 favyen2 commented Oct 7, 2025

This is code for OlmoEarth fine-tuning evaluations. Because we have 11 baselines and 12 tasks to compare, we want to avoid maintaining one config per task per baseline (11 × 12 = 132 configs); instead there should just be one config per task and one per baseline (23 total).

This adds a new rslp.olmoearth_evals module which provides extra infrastructure to make all of the models accept a consistent input. For each model, there is code to construct the model architecture for a given task and output shape, and to apply whatever model-specific normalization or band re-ordering is needed.
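As a rough sketch, the per-model pieces could be grouped into an adapter object like the one below. The names here (`ModelAdapter`, `band_order`, `build_model`, the normalization values) are purely illustrative, not the actual rslp.olmoearth_evals API:

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical adapter collecting the model-specific pieces described above:
# the band order the model expects, per-band normalization, and a factory
# that builds the architecture for a given task and output shape.
@dataclass
class ModelAdapter:
    band_order: Sequence[str]
    # Per-band (mean, std) normalization constants.
    normalization: dict
    build_model: Callable

    def reorder_and_normalize(self, bands: dict) -> list:
        """Re-order input bands into this model's order and normalize each."""
        out = []
        for name in self.band_order:
            mean, std = self.normalization[name]
            out.append((bands[name] - mean) / std)
        return out

adapter = ModelAdapter(
    band_order=["B04", "B03", "B02"],
    normalization={"B04": (1000.0, 500.0), "B03": (900.0, 450.0), "B02": (800.0, 400.0)},
    build_model=lambda task, out_shape: None,  # placeholder factory
)
print(adapter.reorder_and_normalize({"B02": 800.0, "B03": 1350.0, "B04": 1500.0}))
# -> [1.0, 1.0, 0.0]
```

With one such adapter per baseline, the task configs can stay model-agnostic.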

Then the data/helios_v3/tasks/ configs all load data for each task in the same consistent way. An exception is needed for PASTIS since it only has a subset of bands, and some models can accept that subset directly instead of requiring imputation; otherwise it mostly works since most of the tasks are materialized using rslearn.

The launcher is in data/helios_v3/run.py, and data/helios_v3/README.md provides some documentation about it. There are also model-specific configs, but they basically just configure freezing/unfreezing. The launcher passes the model to rslp.olmoearth_evals.eval_adapter via an environment variable.
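The environment-variable handoff could look something like the sketch below. The variable name (`EVAL_MODEL`) and command layout are assumptions for illustration, not the actual interface in data/helios_v3/run.py:

```python
import os

# Hypothetical launcher helper: select the model via an environment variable
# and build the subprocess command for the evaluation entrypoint.
def build_launch(model_name: str, task_config: str):
    env = dict(os.environ)
    env["EVAL_MODEL"] = model_name  # illustrative variable name
    cmd = [
        "python", "-m", "rslp.olmoearth_evals.eval_adapter",
        "--config", task_config,
    ]
    # A real launcher would now run: subprocess.run(cmd, env=env, check=True)
    return cmd, env

cmd, env = build_launch("satlas", "data/helios_v3/tasks/pastis.yaml")
print(env["EVAL_MODEL"])
```

Keeping the model choice out of the task config is what lets one task config serve every baseline.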

Some new code is also added to assign train/val/test splits specifically for this evaluation.
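One common way to make such split assignment deterministic is to hash a stable example identifier (e.g. an rslearn window name); the sketch below shows that pattern, though the actual split code in this PR may differ:

```python
import hashlib

# Assign an example to a split by hashing its identifier, so the assignment
# is stable across runs and machines (no RNG state involved).
def assign_split(example_id: str, val_frac: float = 0.1, test_frac: float = 0.2) -> str:
    digest = hashlib.sha256(example_id.encode()).digest()
    # Map the first 4 bytes of the hash to a value in [0, 1).
    u = int.from_bytes(digest[:4], "big") / 2**32
    if u < test_frac:
        return "test"
    if u < test_frac + val_frac:
        return "val"
    return "train"

# Same ID always yields the same split.
print(assign_split("window_0001"))
```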

This depends on allenai/rslearn#319, allenai/rslearn#320, and allenai/rslearn#324.

@favyen2 favyen2 requested a review from yawenzzzz October 7, 2025 21:53
@yawenzzzz yawenzzzz left a comment


LGTM! This is really nice; just one small thing: you can remove the forest_loss_driver*.yaml configs from models.

favyen2 commented Oct 9, 2025

> LGTM! This is really nice, just one small thing - you can remove the forest_loss_driver*.yaml from models

I think I need it because with forest loss driver, the model architecture is a bit different (it adds another level of SimpleTimeSeries), so the specification of what gets frozen has to be a bit different.

That said, the model configs could be consolidated; there seem to be just two categories (except Satlas, which also uses its config to restore the model weights).

@yawenzzzz

By the way, I think it's better to include the Nandi and AWF tasks here; their Sentinel-2 time-series configs are ready:
https://github.com/allenai/rslearn_projects/blob/master/data/helios/v2_nandi_crop_type/finetune_s2_20251002.yaml
https://github.com/allenai/rslearn_projects/blob/master/data/helios/v2_awf_lulc/finetune_s2_20251005.yaml

This setup is much better for testing time-series and multi-modal performance. In Helios, I'm mostly sweeping learning rates (with cosine decay and patience), so those experiments don't really show the time-series/multi-modal advantages, which are important for crop type and land cover mapping.

To keep things consistent, we can remove Nandi and AWF from the KNN experiments. That way, the fine-tuning covers both (1) research benchmarks (the ones we run KNN/LP on) and (2) real-world tasks.

@favyen2 favyen2 merged commit 9d7de17 into master Oct 10, 2025
4 checks passed
@favyen2 favyen2 deleted the favyen/20251007-olmoearth-eval branch October 10, 2025 14:58