Conversation

@satvshr (Collaborator) commented Aug 25, 2025

closes #141

This PR:

  • Fixes a small bug in AptaNetPipeline
  • Makes AptaNetPipeline inherit from BaseObject to prevent errors during benchmarking
  • Adds a CSV loader
  • Removes an unnecessary test (test_pfoa), since the loader is already covered by test_loaders
  • Adds the benchmarking framework

@satvshr requested a review from fkiraly, September 21, 2025 13:28
@fkiraly (Contributor) commented Sep 25, 2025

Can you kindly summarize how the requests were addressed and what the changes since the last review are?

@satvshr (Collaborator, Author) commented Sep 26, 2025

The changes made were the ones requested:

  1. The class now takes only three arguments: X, y, and cv.
  2. The CSV loader now returns DataFrames instead of NumPy arrays and Bunches.
  3. Added an example to show what output the run method yields (see the sketch after this list).
  4. Added tests.
  5. Removed task_check and metaclass.
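
For orientation, here is a minimal sketch of the interface this summary describes. The class name BenchmarkSketch, the estimators argument of run, and the result columns are illustrative assumptions, not the PR's actual code; only the X/y/cv constructor and the run method are taken from the summary above.

    import numpy as np
    import pandas as pd
    from sklearn.base import clone
    from sklearn.dummy import DummyClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import StratifiedKFold


    class BenchmarkSketch:
        """Illustrative stand-in for the PR's benchmark class."""

        def __init__(self, X, y, cv):
            # the three arguments mentioned in point 1 above
            self.X, self.y, self.cv = X, y, cv

        def run(self, estimators):
            # cross-validate each estimator and collect per-fold scores
            rows = []
            for name, est in estimators.items():
                for fold, (train, test) in enumerate(self.cv.split(self.X, self.y)):
                    model = clone(est).fit(self.X[train], self.y[train])
                    score = accuracy_score(self.y[test], model.predict(self.X[test]))
                    rows.append({"estimator": name, "fold": fold, "accuracy": score})
            return pd.DataFrame(rows)


    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(60, 4)), rng.integers(0, 2, size=60)
    bench = BenchmarkSketch(X, y, cv=StratifiedKFold(n_splits=3))
    print(bench.run({"dummy": DummyClassifier(strategy="most_frequent")}))

Returning per-fold scores as a plain DataFrame keeps aggregation and plotting straightforward, which matters for the notebook requested below.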

@fkiraly (Contributor) left a comment

Great!

May I request that you add a short notebook with a small benchmarking experiment and a reasonably chosen dataset?
I think that is important both for understanding your design and for review.

@fkiraly (Contributor) left a comment

Do you want to do this in a different PR or in this one? There are pros and cons to either option. Doing it in this one might allow us to spot bugs which would otherwise need to be fixed in an additional PR.

@satvshr (Collaborator, Author) commented Sep 29, 2025

Doing it in this one might allow us to spot bugs which would otherwise need to be fixed in an additional PR.

I would rather do it in a separate issue (#164); I can include the bug fixes as part of the notebook PR as well.

@satvshr requested a review from fkiraly, September 29, 2025 20:59
@satvshr changed the title from "[ENH] Benchmarking framework" to "[ENH] Benchmarking framework and csv loader", Sep 30, 2025
Review thread on the deleted test file:

from pyaptamer.datasets._loaders import load_pfoa_structure


def test_pfoa_loader():
Contributor: why are we deleting this file?

Collaborator (Author): Added in the PR description.

@fkiraly (Contributor) commented Sep 30, 2025

could you kindly make sure you write a good PR description in the first post?

@satvshr requested a review from fkiraly, September 30, 2025 10:13
@satvshr (Collaborator, Author) commented Sep 30, 2025

could you kindly make sure you write a good PR description in the first post?

Done.

@fkiraly (Contributor) left a comment

We are making changes to AptaNetPipeline - is this required for benchmarking? Would this not interact with other PRs, e.g., #153?

I would recommend making this a separate PR.

@NennoMP (Collaborator) commented Oct 2, 2025

While investigating #144, I noticed that the benchmark class performs cross-validation on the given dataset (the training data) to generate validation splits on which the model is then evaluated. However, the purpose of cross-validation is to maximize the use of training data during model selection, i.e., to choose the best hyperparameters. Here there is no model selection, though.

Shouldn't the purpose of the benchmarking class be to train the model on training data using the best hyperparameters previously identified, and then perform a final evaluation on held-out test data (i.e., the test_li2014.csv file from AptaTrans)?

@satvshr (Collaborator, Author) commented Oct 6, 2025

is this required for benchmarking?

If you read the PR description, the BaseObject part is required for benchmarking; the bug, on the other hand, is small and should be patched, so I thought I'd put it here too.

@satvshr (Collaborator, Author) commented Oct 6, 2025

Shouldn't the purpose of the benchmarking class be to train the model on training data using the best hyperparameters previously identified, and then perform a final evaluation on held-out test data (i.e., the test_li2014.csv file from AptaTrans)?

We can use StratifiedSplit if we want a train-test split for evaluation, and "generic" CV to ensure that a particular split is not the reason one model performs better than another; evaluating over n iterations also gives a more accurate result.
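
For illustration, both modes can be expressed with standard scikit-learn splitters. This is a sketch, not code from this PR, and it assumes StratifiedShuffleSplit is the stratified splitter meant above.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import (
        RepeatedStratifiedKFold,
        StratifiedShuffleSplit,
        cross_val_score,
    )

    X, y = make_classification(n_samples=200, random_state=0)
    clf = LogisticRegression(max_iter=1000)

    # held-out evaluation: a single stratified train/test split
    holdout = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)

    # "generic" CV: repeated stratified k-fold, so no single lucky split
    # decides which model looks better; scores are averaged over repeats
    repeated = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)

    for name, cv in [("holdout", holdout), ("repeated cv", repeated)]:
        scores = cross_val_score(clf, X, y, cv=cv)
        print(name, round(scores.mean(), 3))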

@satvshr requested a review from fkiraly, October 8, 2025 14:01
@fkiraly (Contributor) commented Oct 8, 2025

is this required for benchmarking?

If you read the PR description, the BaseObject part is required for benchmarking; the bug, on the other hand, is small and should be patched, so I thought I'd put it here too.

What exactly is the bug?

@fkiraly (Contributor) commented Oct 8, 2025

While investigating #144, I noticed that the benchmark class performs cross-validation on the given dataset (the training data) to generate validation splits on which the model is then evaluated. However, the purpose of cross-validation is to maximize the use of training data during model selection, i.e., to choose the best hyperparameters. Here there is no model selection, though.

Shouldn't the purpose of the benchmarking class be to train the model on training data using the best hyperparameters previously identified, and then perform a final evaluation on held-out test data (i.e., the test_li2014.csv file from AptaTrans)?

@NennoMP, re-sampling or CV can be used both in benchmarking and in tuning - it is even possible to combine the two and benchmark a model including its tuning algorithm. For benchmarking, one has to be careful about error bars etc.; CV-based error bars are not reliable (due to sample correlation across the CV splits).
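
A standard illustration of the combined case (again a sketch, not code from this PR): nested cross-validation, with tuning in the inner loop and benchmarking of "model including tuning" in the outer loop.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, random_state=0)

    # inner CV: tuning, i.e., model selection over C
    tuned = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=KFold(n_splits=3))

    # outer CV: benchmarking the SVC *including* its tuning algorithm
    scores = cross_val_score(tuned, X, y, cv=KFold(n_splits=5))

    # caveat from the comment above: the outer fold scores are correlated,
    # so their naive standard deviation is not a reliable error bar
    print(scores.mean(), scores.std())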

@satvshr (Collaborator, Author) commented Oct 9, 2025

What exactly is the bug?

FunctionTransformer takes a dictionary as input to kw_args; the test was only passing because I was not giving any input to kw_args. Once I did, the tests started failing.
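
For reference, the scikit-learn behaviour being described; the scale function here is a made-up stand-in for the pipeline's actual func.

    import numpy as np
    from sklearn.preprocessing import FunctionTransformer

    def scale(X, factor=1.0):
        return X * factor

    # kw_args must be a dict; it is forwarded to func on every transform
    # call, i.e., this computes scale(X, factor=2.0). With kw_args left at
    # its None default, mistakes in how func consumes the extra arguments
    # go unnoticed - exactly how the bug stayed hidden.
    ft = FunctionTransformer(scale, kw_args={"factor": 2.0})
    print(ft.transform(np.array([[1.0, 2.0]])))  # [[2. 4.]]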

Labels: enhancement (New feature or request)
Merging this pull request may close: [ENH] Benchmarking framework
3 participants