Enable datasets with custom formats #615

Open: AnuradhaKaruppiah wants to merge 36 commits into develop

Conversation

AnuradhaKaruppiah
Contributor

@AnuradhaKaruppiah AnuradhaKaruppiah commented Aug 10, 2025

Description

Closes: #352
This PR enables custom dataset formats by adding a new EvalDatasetCustomConfig class that allows users to specify custom Python functions for dataset transformation. The implementation provides a standardized interface for custom parsers while maintaining compatibility with existing functionality.

  • Adds EvalDatasetCustomConfig class for custom dataset handling
  • Implements custom function loading and execution with error handling
  • Adds preprocessing utilities to apply standard filters and transformations
  • Includes example implementation and documentation
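
To make the idea of a custom dataset transformation function concrete, here is a minimal sketch of what such a parser could look like. The function name, its parameters, the nested input layout, and the flat output fields (`id`, `question`, `answer`) are all assumptions made for illustration; the description above does not state the signature that EvalDatasetCustomConfig expects, so the example in this PR may differ.

```python
import json
from pathlib import Path
from typing import Optional


def extract_nested_questions(file_path: Path, difficulty: Optional[str] = None) -> list:
    """Flatten a hypothetical nested JSON dataset into flat evaluation records.

    Assumed input layout (not taken from the PR):
        {"questions": [{"id": 1, "question": "...", "answer": "...", "difficulty": "easy"}]}
    """
    with open(file_path, encoding="utf-8") as f:
        raw = json.load(f)

    records = []
    for item in raw.get("questions", []):
        # Optional keyword argument: keep only entries of the requested difficulty.
        if difficulty is not None and item.get("difficulty") != difficulty:
            continue
        records.append({
            "id": str(item["id"]),
            "question": item["question"],
            "answer": item["answer"],
        })
    return records
```

A parser of this shape keeps the format-specific logic in user code, so the evaluation pipeline only has to deal with one flat record layout.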

By submitting this PR, I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Signed-off-by: Anuradha Karuppiah <[email protected]>
@AnuradhaKaruppiah AnuradhaKaruppiah added the improvement (Improvement to existing functionality) and non-breaking (Non-breaking change) labels Aug 10, 2025
@AnuradhaKaruppiah AnuradhaKaruppiah changed the title from "Enable custom datasets" to "Enable datasets with custom formats" Aug 11, 2025

@Copilot Copilot AI left a comment

Pull Request Overview

This PR enables custom dataset formats by adding a new EvalDatasetCustomConfig class that allows users to specify custom Python functions for dataset transformation. The implementation provides a standardized interface for custom parsers while maintaining compatibility with existing functionality.

  • Adds EvalDatasetCustomConfig class for custom dataset handling
  • Implements custom function loading and execution with error handling
  • Adds preprocessing utilities to apply standard filters and transformations
  • Includes example implementation and documentation
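
The summary does not say which filters count as "standard". As one assumed interpretation, the sketch below applies allowlist and denylist filters to already-parsed records, so the same step can run after any parser. The function name `apply_filters` and the mapping-of-field-to-values shape are hypothetical and not taken from the PR.

```python
from typing import Any, Dict, Iterable, List, Optional


def apply_filters(
    records: List[Dict[str, Any]],
    allowlist: Optional[Dict[str, Iterable[Any]]] = None,
    denylist: Optional[Dict[str, Iterable[Any]]] = None,
) -> List[Dict[str, Any]]:
    """Keep records that match every allowlist field and no denylist field."""
    allow = {field: set(values) for field, values in (allowlist or {}).items()}
    deny = {field: set(values) for field, values in (denylist or {}).items()}

    kept = []
    for record in records:
        # Drop the record if any allowlisted field has a value outside the allowed set.
        if any(record.get(field) not in allowed for field, allowed in allow.items()):
            continue
        # Drop the record if any denylisted field has a denied value.
        if any(record.get(field) in denied for field, denied in deny.items()):
            continue
        kept.append(record)
    return kept
```

Because the function works on plain records, it does not need to know which dataset format produced them.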

Reviewed Changes

Copilot reviewed 11 out of 14 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/aiq/data_models/dataset_handler.py | Adds EvalDatasetCustomConfig class with dynamic function loading and includes it in the union type |
| src/aiq/eval/dataset_handler/dataset_handler.py | Implements custom dataset handling logic and refactors preprocessing into reusable methods |
| examples/evaluation_and_profiling/simple_calculator_eval/src/aiq_simple_calculator_eval/scripts/custom_dataset_parser.py | Provides example custom dataset parser function |
| examples/evaluation_and_profiling/simple_calculator_eval/src/aiq_simple_calculator_eval/configs/config-custom-dataset-format.yml | Configuration example for custom dataset usage |
| docs/source/reference/evaluate.md | Documents the custom dataset format feature |
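
The "dynamic function loading" mentioned for EvalDatasetCustomConfig can be pictured with the standard-library sketch below, which resolves a dotted `module.function` string and turns import problems into a clear error. The helper name `load_function` and the choice of `ValueError` are assumptions for illustration; the loader in this PR may be structured differently.

```python
import importlib
from typing import Callable


def load_function(dotted_path: str) -> Callable:
    """Resolve a 'package.module.function' string to a callable."""
    module_path, sep, func_name = dotted_path.rpartition(".")
    if not sep or not module_path or not func_name:
        raise ValueError(f"Expected 'module.function', got '{dotted_path}'")

    try:
        module = importlib.import_module(module_path)
    except ImportError as exc:
        raise ValueError(f"Cannot import module '{module_path}'") from exc

    func = getattr(module, func_name, None)
    if not callable(func):
        raise ValueError(f"'{func_name}' is not a callable in '{module_path}'")
    return func


# Usage with a standard-library function standing in for a user parser.
parse = load_function("json.loads")
print(parse('{"question": "2+2?"}'))
```

Validating the path when the configuration is loaded, rather than when the dataset is first read, surfaces a misspelled function name before an evaluation run starts.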

AnuradhaKaruppiah and others added 5 commits August 11, 2025 14:17
…iq_simple_calculator_eval/scripts/custom_dataset_parser.py

Co-authored-by: Copilot <[email protected]>
Labels
improvement (Improvement to existing functionality), non-breaking (Non-breaking change)
Development

Successfully merging this pull request may close these issues.

[FEA]: Eval dataset changes to work with more diverse workflows