Contributor

@sander-willems-bruker sander-willems-bruker commented Apr 17, 2025

Brief summary:

  • This PR is mostly a refactor of feature alignment and (lfq) quantification. Previously, this was all triggered by the predict_rt and lfq flags. Here we disentangled these and added dedicated align_rt and predict_im flags.
  • Furthermore, we allow the lfq and rt_alignment modules to work with FeatureTraits rather than Features, such that they can easily be called separately from outside of sage.
  • Finally, we added the option to disable mbr if required.

New/changed input parameters

  • changed predict_rt: This previously controlled alignment, im prediction and rt prediction. Furthermore, it was a prerequisite for lfq. It has been refactored into a standalone option that purely triggers rt prediction, which internally only ends up being used for FDR calculations (due to delta_rt).
  • new predict_im: Same as the new predict_rt, but for im. It also only ends up being used in FDR calculations (due to delta_im).
  • new align_rt: Purely responsible for triggering rt alignment. Note that if disabled, an rt_aligned is still calculated for every feature, but this is done on a per run basis and as such only rescales rt values to aligned_rt values within the range [0,1]. This is essentially just a per run normalization of the rt.
  • new mbr (within the /quant/lfq_settings group): Toggles match between runs. When disabled, quantification is done on a per run basis only, i.e. a feature is only quantified if it is identified within that run.
  • NOTES:
    • When predict_rt is enabled, align_rt will always/automatically be enabled too!
    • When mbr and lfq are enabled, align_rt will always/automatically be enabled too!
    • The above always/automatically enabling is not strictly required, but not doing so will likely lead to poor(er) results.
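
The implications in the notes above can be written down as a small resolution step. The following is only a sketch: the `Flags` struct and the `resolve` function are hypothetical names for illustration and are not taken from Sage's actual configuration code.

```rust
/// Hypothetical flag container. The field names mirror the parameters
/// described above, but the struct itself is an assumption.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Flags {
    pub predict_rt: bool,
    pub align_rt: bool,
    pub lfq: bool,
    pub mbr: bool,
}

/// Apply the two implications from the notes:
///   predict_rt   => align_rt
///   mbr && lfq   => align_rt
/// All other flags are passed through unchanged.
pub fn resolve(flags: Flags) -> Flags {
    let align_rt = flags.align_rt || flags.predict_rt || (flags.mbr && flags.lfq);
    Flags { align_rt, ..flags }
}
```

With this shape, lfq without mbr leaves align_rt at whatever the user requested, which matches the per run quantification described for the mbr option.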

New traits:

  • AlignableFeature: A minimal set of fields required to be able to perform rt alignments.
  • QuantifiableFeature: A minimal set of fields required to be able to perform lfq quantification.
  • CompositionDatabase: A convenience trait to be able to calculate a Composition for a QuantifiableFeature.
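
To illustrate how such a trait lets external feature types plug in, here is a minimal sketch. Only the `Send + Sync` bound on `AlignableFeature` comes from the diff in this PR; the three methods, the `CustomFeature` struct and the `rescale_per_run` function are assumptions made up for this example. The rescaling uses a plain min-max normalization per run, which is one plausible reading of the [0,1] per run normalization described for align_rt.

```rust
/// Stand-in for the trait added in this PR. The methods are hypothetical;
/// the real trait may expose a different set of accessors.
pub trait AlignableFeature: Send + Sync {
    fn file_id(&self) -> usize;
    fn rt(&self) -> f32;
    fn set_aligned_rt(&mut self, aligned_rt: f32);
}

/// A caller-side feature with fields unrelated to Sage's own Feature.
#[derive(Debug, Clone)]
pub struct CustomFeature {
    pub run: usize,
    pub rt_minutes: f32,
    pub aligned_rt: f32,
    pub engine_score: f64,
}

impl AlignableFeature for CustomFeature {
    fn file_id(&self) -> usize {
        self.run
    }
    fn rt(&self) -> f32 {
        self.rt_minutes
    }
    fn set_aligned_rt(&mut self, aligned_rt: f32) {
        self.aligned_rt = aligned_rt;
    }
}

/// Min-max rescale of rt into [0, 1], computed separately for each run.
/// Panics if a feature reports a file_id >= n_files.
pub fn rescale_per_run<F: AlignableFeature>(features: &mut [F], n_files: usize) {
    // First pass: collect (min, max) rt per run.
    let mut bounds = vec![(f32::INFINITY, f32::NEG_INFINITY); n_files];
    for f in features.iter() {
        let b = &mut bounds[f.file_id()];
        b.0 = b.0.min(f.rt());
        b.1 = b.1.max(f.rt());
    }
    // Second pass: write the rescaled value; runs with a single rt map to 0.
    for f in features.iter_mut() {
        let (min, max) = bounds[f.file_id()];
        let span = max - min;
        let scaled = if span > 0.0 { (f.rt() - min) / span } else { 0.0 };
        f.set_aligned_rt(scaled);
    }
}
```

Because `rescale_per_run` is generic over the trait, the caller never has to build a Sage `Feature` or fill in fields it has no data for.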

New functionality:

  • Option to disable match between runs. This is implemented somewhat inefficiently to make sure that, if MBR is enabled, it is 100% backwards compatible with previous sage.
  • The new traits make it easier to control alignment and quant from outside of sage.

An example of what we want to achieve with this PR is to do e.g. the following outside of sage with a custom set of identified features:

    let mut features = experiment.get_features();
    let database = database::Database::from_features(&features);
    let alignments =
        sage_core::ml::retention_alignment::global_alignment(&mut features, experiment.len());
    let mut areas = sage_core::lfq::quantify(
        &lfq_settings,
        precursor_charge,
        &database,
        &features,
        &ms1_spectra,
        alignments,
    );

Owner

@lazear lazear left a comment

Looks good overall - only complaints are the additional traits.

}
}

pub trait QuantifiableFeature: Clone {
Owner

Are there plans to implement this trait for other data types? I understand the desire to add some polymorphism, but if there is only a single implementer it seems a bit unnecessary to me.

Contributor Author

@sander-willems-bruker sander-willems-bruker May 7, 2025

Not in Sage itself, but in our case we call Sage as a library and implement the trait on our own features. Our internal features have several fields that differ from those in Sage (as they can use other identification algorithms). Rather than copying our internal features into sage features and having to mock multiple columns as we do not have data for them (e.g. poisson scores), the Quant and Align traits allow us to use Sage as natively as possible on our own data. See also the last section in the PR description above. The let mut features = experiment.get_features(); is where we of course have implemented the trait on our own custom features.

Owner

I think it's a pretty exotic use-case, considering that Sage's LFQ implementation is not particularly advanced. I would prefer for you to mock multiple columns and just impl Into<Feature> etc on your own structs, rather than introducing some extra complexity and abstraction into the codebase.
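
For comparison, the conversion approach suggested here could look roughly like the sketch below. Both structs are hypothetical and heavily trimmed: the `Feature` shown is a stand-in with assumed field names, not Sage's real struct, and `poisson` is simply the example of a column that has to be mocked.

```rust
/// Trimmed, hypothetical stand-in for a Sage-style feature.
#[derive(Debug, Clone, PartialEq)]
pub struct Feature {
    pub file_id: usize,
    pub rt: f32,
    pub charge: u8,
    /// Score that the external pipeline cannot provide.
    pub poisson: f64,
}

/// Caller-side feature produced by some other identification algorithm.
#[derive(Debug, Clone)]
pub struct CustomFeature {
    pub run: usize,
    pub rt_minutes: f32,
    pub charge: u8,
}

/// Implementing From on a reference also provides
/// `Into<Feature> for &CustomFeature` through the standard blanket impl.
impl From<&CustomFeature> for Feature {
    fn from(c: &CustomFeature) -> Self {
        Feature {
            file_id: c.run,
            rt: c.rt_minutes,
            charge: c.charge,
            // Mocked column: no real value is available for this field.
            poisson: 0.0,
        }
    }
}

/// Copy a batch of custom features into the stand-in Feature type.
pub fn to_features(custom: &[CustomFeature]) -> Vec<Feature> {
    custom.iter().map(Feature::from).collect()
}
```

The trade-off is visible in the sketch: no extra abstraction is needed inside the library, but every feature is copied once and any field without real data carries a placeholder value.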

}

alignments
pub trait AlignableFeature: Send + Sync {
Owner

See above comment about QuantifiableFeature

log::info!("performing LFQ without MBR");
let mut final_areas = FnvHashMap::default();
// MS1Spectra, features and aligments are assumed to come from the same files and all be not empty
for (file_id, (local_ms1_spectra, local_features)) in ms1_spectra
Owner

This is an elegant way of doing it while still maintaining back-compatibility (I am also OK with breaking it).
I think we can do this without cloning features and collecting into intermediate Vecs.
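
One possible shape for that, as a sketch only: if the features are already grouped by file_id (an assumption; they could be sorted in place beforehand), each run can be handed out as a borrowed sub-slice, so nothing is cloned or collected. The `Feature` struct and `for_each_file` function here are hypothetical names for illustration and are not part of this PR.

```rust
/// Hypothetical minimal feature; only the fields needed for grouping.
#[derive(Debug)]
pub struct Feature {
    pub file_id: usize,
    pub rt: f32,
}

/// Visit each contiguous block of features that share a file_id.
/// Assumes `features` is already grouped (e.g. sorted) by file_id.
/// Each block is a borrowed sub-slice: no cloning, no intermediate Vec.
pub fn for_each_file<'a>(
    features: &'a [Feature],
    mut visit: impl FnMut(usize, &'a [Feature]),
) {
    let mut start = 0;
    while start < features.len() {
        let file_id = features[start].file_id;
        // Length of the run of features belonging to this file.
        let len = features[start..]
            .iter()
            .take_while(|f| f.file_id == file_id)
            .count();
        visit(file_id, &features[start..start + len]);
        start += len;
    }
}
```

A per-file quantification step could then be passed in as the `visit` closure, receiving the file id and the matching slice of features.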

Contributor Author

@sander-willems-bruker sander-willems-bruker May 7, 2025

I am not convinced about elegant; it is a clunky implementation, but it is indeed backwards compatible (I purposefully set it up like that) and should be fully functional. I agree we can probably do this far more efficiently. Especially for Bruker data I would/could/should change up the way this is called significantly, but I can imagine that you want to retain some neutrality in sage with regards to vendor. That said, if you want me to look into improving this for at least the Bruker case, I am more than happy to provide you with a reference implementation that can serve as inspiration for mzml as well.
