Commit affd576

example fixups
Signed-off-by: Anuradha Karuppiah <[email protected]>
Parent: 4e0879b

6 files changed: +69 −29 lines changed

docs/source/guides/evaluate-api.md

Whitespace-only changes.

docs/source/reference/evaluate.md

Lines changed: 36 additions & 0 deletions

````diff
@@ -138,6 +138,42 @@ eval:
   - sympy__sympy-21055
 ```
 
+### Custom Dataset Format
+You can use a dataset in a custom format by providing a custom dataset parser function.
+
+**Example:**
+`examples/evaluation_and_profiling/simple_calculator_eval/configs/config-custom-dataset-format.yml`:
+```yaml
+eval:
+  general:
+    dataset:
+      _type: custom
+      file_path: examples/evaluation_and_profiling/simple_calculator_eval/data/simple_calculator_nested.json
+      function: aiq_simple_calculator_eval.scripts.custom_dataset_parser.extract_nested_questions
+      kwargs:
+        difficulty: "medium"
+        max_rows: 5
+```
+This example uses a custom dataset parser function to extract the nested questions from the dataset, filter them by difficulty, and return at most the first 5 matching questions.
+
+The custom dataset parser function is a Python function that takes a dataset file path and returns an `EvalInput` object.
+
+{py:class}`~aiq.eval.evaluator.evaluator_model.EvalInput` is a Pydantic model that contains the list of `EvalInputItem` objects.
+{py:class}`~aiq.eval.evaluator.evaluator_model.EvalInputItem` is a Pydantic model that contains the fields for a single item in the dataset.
+The custom dataset parser function should fill the following fields in each `EvalInputItem` object:
+- `id`: The id of the item. Every item in the dataset must have a unique id of type `str` or `int`.
+- `input_obj`: The question.
+- `expected_output_obj`: The ground-truth answer.
+- `output_obj`: The generated answer. This can be an empty string if it is to be filled by running the workflow.
+- `expected_trajectory`: The expected trajectory. This can be left as an empty list if no ground-truth trajectory is available.
+- `trajectory`: The observed trajectory. This can be an empty list if it is to be filled by running the workflow.
+- `full_dataset_entry`: The entire dataset entry. This is passed to the evaluator.
+
+To run the evaluation, run the following command:
+```bash
+aiq eval --config_file=examples/evaluation_and_profiling/simple_calculator_eval/configs/config-custom-dataset-format.yml
+```
+
 ## NeMo Agent Toolkit Built-in Evaluators
 NeMo Agent toolkit provides the following built-in evaluator:
 - `ragas` - An evaluator to run and evaluate RAG-like workflows using the public RAGAS API.
````
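The snippet below sketches a minimal parser that satisfies this contract. It is an illustrative sketch, not part of the commit: the flat JSON layout (a top-level list of records with `id`, `question`, and `answer` keys) and the name `parse_flat_dataset` are hypothetical, while the model classes and fields come from the documentation above.

```python
# Minimal sketch of a custom dataset parser (hypothetical flat-list format).
import json
from pathlib import Path

from aiq.eval.evaluator.evaluator_model import EvalInput
from aiq.eval.evaluator.evaluator_model import EvalInputItem


def parse_flat_dataset(input_path: Path, max_rows: int = None) -> EvalInput:
    """Parse a JSON file containing a flat list of {id, question, answer} records."""
    with open(input_path, encoding="utf-8") as f:
        records = json.load(f)

    eval_items = [
        EvalInputItem(
            id=record["id"],
            input_obj=record["question"],
            expected_output_obj=record["answer"],
            output_obj="",  # filled in by running the workflow
            expected_trajectory=[],
            trajectory=[],  # filled in by running the workflow
            full_dataset_entry=record)  # passed as-is to the evaluator
        for record in records[:max_rows]
    ]
    return EvalInput(eval_input_items=eval_items)
```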

examples/evaluation_and_profiling/simple_calculator_eval/pyproject.toml

Lines changed: 0 additions & 3 deletions

```diff
@@ -2,9 +2,6 @@
 build-backend = "setuptools.build_meta"
 requires = ["setuptools >= 64", "setuptools-scm>=8"]
 
-[tool.setuptools]
-packages = []
-
 [tool.setuptools_scm]
 root = "../../.."
 
```

examples/evaluation_and_profiling/simple_calculator_eval/src/aiq_simple_calculator_eval/configs/config-custom-dataset-format.yml

Lines changed: 7 additions & 2 deletions

```diff
@@ -49,6 +49,11 @@ llms:
     model_name: meta/llama-3.3-70b-instruct
     temperature: 0.2
     max_tokens: 2048
+  eval_llm:
+    _type: nim
+    model_name: mistralai/mixtral-8x22b-instruct-v0.1
+    temperature: 0.0
+    max_tokens: 1024
 
 workflow:
   _type: react_agent
@@ -70,9 +75,9 @@ eval:
     dataset:
       _type: custom
       file_path: examples/evaluation_and_profiling/simple_calculator_eval/data/simple_calculator_nested.json
-      function: aiq_simple_calculator_eval.custom_dataset_parser.extract_nested_questions
+      function: aiq_simple_calculator_eval.scripts.custom_dataset_parser.extract_nested_questions
      kwargs:
-        filter_by_tag: "important"
+        difficulty: "medium"
         max_rows: 5
 
   evaluators:
```
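The `function` value is a dotted Python path; the eval harness presumably resolves it by importing the module and calling the attribute with the dataset path and the configured `kwargs`. A minimal sketch of that resolution step is shown below; `load_dataset_parser` is illustrative, not a toolkit API.

```python
# Hypothetical illustration of how a dotted `function:` path from the config
# could be resolved and invoked with the configured `kwargs`.
import importlib
from pathlib import Path


def load_dataset_parser(dotted_path: str):
    # Split "package.module.function" into module path and attribute name.
    module_name, func_name = dotted_path.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, func_name)


parser = load_dataset_parser(
    "aiq_simple_calculator_eval.scripts.custom_dataset_parser.extract_nested_questions")
eval_input = parser(
    Path("examples/evaluation_and_profiling/simple_calculator_eval/data/simple_calculator_nested.json"),
    difficulty="medium",
    max_rows=5)
```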
Lines changed: 14 additions & 0 deletions

```diff
@@ -0,0 +1,14 @@
+# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
```
Lines changed: 12 additions & 24 deletions

```diff
@@ -20,9 +20,14 @@
 from aiq.eval.evaluator.evaluator_model import EvalInputItem
 
 
-def extract_nested_questions(input_path: Path, filter_by_tag: str = None, max_rows: int = None, **kwargs) -> EvalInput:
+def extract_nested_questions(input_path: Path, difficulty: str = None, max_rows: int = None) -> EvalInput:
     """
-    Extract questions from a nested JSON structure with optional filtering.
+    This is a sample custom dataset parser that:
+    1. Loads a nested JSON file
+    2. Extracts the questions array from the nested structure
+    3. Applies optional filtering by difficulty (hard, medium, easy)
+    4. Applies an optional maximum number of questions to return
+    5. Creates an EvalInput object with the extracted questions and returns it
 
     Expects JSON format:
     {
@@ -36,9 +41,8 @@ def extract_nested_questions(input_path: Path, filter_by_tag: str = None, max_ro
 
     Args:
         input_path: Path to the nested JSON file
-        filter_by_tag: Optional tag to filter questions by (matches against category or difficulty)
+        difficulty: Optional difficulty to filter questions by
         max_rows: Optional maximum number of questions to return
-        **kwargs: Additional parameters (unused in this example)
 
     Returns:
         EvalInput object containing the extracted questions
@@ -50,17 +54,13 @@
 
     # Extract questions array from the nested structure
     questions = data.get('questions', [])
-    metadata = data.get('metadata', {})
-    configuration = data.get('configuration', {})
 
     # Apply filtering if specified
-    if filter_by_tag:
+    if difficulty:
         filtered_questions = []
         for question in questions:
-            # Check if filter_by_tag matches category, difficulty, or any other field
-            if (question.get('category', '').lower() == filter_by_tag.lower()
-                    or question.get('difficulty', '').lower() == filter_by_tag.lower()
-                    or filter_by_tag.lower() in str(question).lower()):
+            # Check if the question's difficulty matches (hard, medium, easy)
+            if question.get('difficulty', '').lower() == difficulty.lower():
                 filtered_questions.append(question)
         questions = filtered_questions
 
@@ -71,26 +71,14 @@
     eval_items = []
 
     for item in questions:
-        # Create EvalInputItem with additional metadata in full_dataset_entry
-        full_entry = {
-            **item,  # Include original question data
-            'dataset_metadata': metadata,
-            'dataset_configuration': configuration,
-            'processing_info': {
-                'filtered_by_tag': filter_by_tag,
-                'max_rows_applied': max_rows,
-                'total_questions_in_dataset': len(data.get('questions', []))
-            }
-        }
-
         eval_item = EvalInputItem(
             id=item['id'],
             input_obj=item['question'],
             expected_output_obj=item['answer'],
             output_obj="",  # Will be filled by workflow
             expected_trajectory=[],
             trajectory=[],  # Will be filled by workflow
-            full_dataset_entry=full_entry)
+            full_dataset_entry=item)
         eval_items.append(eval_item)
 
     return EvalInput(eval_input_items=eval_items)
```
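The parser can also be exercised directly, without `aiq eval`. The smoke test below is illustrative: the dataset content is invented to match the shape the parser reads (a `questions` array whose entries carry `id`, `question`, `answer`, and `difficulty` fields) and may differ from the real `simple_calculator_nested.json`.

```python
# Hypothetical smoke test for extract_nested_questions.
import json
from pathlib import Path
from tempfile import NamedTemporaryFile

from aiq_simple_calculator_eval.scripts.custom_dataset_parser import extract_nested_questions

sample = {
    "questions": [
        {"id": 1, "question": "What is 2 + 3?", "answer": "5", "difficulty": "easy"},
        {"id": 2, "question": "What is 12 * 12?", "answer": "144", "difficulty": "medium"},
    ]
}

# Write the sample dataset to a temporary JSON file.
with NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(sample, f)

# Only the "medium" question should survive the difficulty filter.
eval_input = extract_nested_questions(Path(f.name), difficulty="medium", max_rows=5)
assert [item.id for item in eval_input.eval_input_items] == [2]
```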
