
Commit 2e660ef

Deep clean of run examples and automate result updating via GH Actions (#853)
* Major clean up of test results and remove non determinism
  Signed-off-by: Jing Chen <[email protected]>
* Workflow test
  Signed-off-by: Jing Chen <[email protected]>
* Test1
  Signed-off-by: Jing Chen <[email protected]>
* Test2
  Signed-off-by: Jing Chen <[email protected]>
* Test 4
  Signed-off-by: Jing Chen <[email protected]>
* Test 4
  Signed-off-by: Jing Chen <[email protected]>
* Test 5
  Signed-off-by: Jing Chen <[email protected]>
* GH Action updated results file when running examples Sat Mar 29 19:11:53 UTC 2025
  Signed-off-by: Jing Chen <[email protected]>
* Test 6
  Signed-off-by: Jing Chen <[email protected]>
* Update files
  Signed-off-by: Jing Chen <[email protected]>
* Update run examples
  Signed-off-by: Jing Chen <[email protected]>
* github-actions[bot]: Updated results file when running examples on Sat Mar 29 21:17:59 UTC 2025
  Signed-off-by: Jing Chen <[email protected]>
* Final test run
  Signed-off-by: Jing Chen <[email protected]>
* Sign off GHA commit
  Signed-off-by: Jing Chen <[email protected]>

---------

Signed-off-by: Jing Chen <[email protected]>
1 parent 132ecc2 commit 2e660ef


238 files changed, +946 / -2010 lines changed


.github/workflows/run-examples.yml

Lines changed: 14 additions & 1 deletion
```diff
@@ -34,13 +34,14 @@ jobs:
 shell: bash
 run: |
 ollama pull granite3.2:2b
+ollama pull granite3.2:8b
 ollama pull mxbai-embed-large
 ollama list

 - name: Check that all required models are available
 shell: bash
 run: |
-models=("mxbai-embed-large" "granite3.2:2b")
+models=("mxbai-embed-large" "granite3.2:2b" "granite3.2:8b")
 missing=0
 for model in "${models[@]}"; do
 if ! ollama list | awk 'NR>1 {print $1}' | grep -q "$model"; then
```
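The model check keeps the first column of `ollama list` (`awk 'NR>1'` skips the header row) and tests each required name with `grep -q "$model"`. That is a regular-expression substring match, so a required name is also satisfied by any longer listed name that contains it; this is presumably what lets the untagged `mxbai-embed-large` match a listing such as `mxbai-embed-large:latest`. A minimal Python sketch of the same check with exact matching follows. It assumes an untagged name means the `:latest` tag; `canonical` and `missing_models` are illustrative names, and the listing is made up.

```python
def canonical(name: str) -> str:
    # Assumption: an untagged Ollama model name refers to the ":latest" tag.
    return name if ":" in name else f"{name}:latest"


def missing_models(ollama_list_output: str, required: list[str]) -> list[str]:
    """Return the required models that do not appear in `ollama list` output."""
    rows = ollama_list_output.splitlines()[1:]  # skip the header row (awk NR>1)
    available = {canonical(row.split()[0]) for row in rows if row.strip()}
    return [m for m in required if canonical(m) not in available]


# Made-up listing for illustration; only the first column is read.
listing = """NAME                        ID
granite3.2:2b               aaaa
mxbai-embed-large:latest    bbbb
"""
print(missing_models(listing, ["mxbai-embed-large", "granite3.2:2b", "granite3.2:8b"]))
# ['granite3.2:8b']
```

As in the workflow, a non-empty result would be the signal to fail the job before any example runs.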
```diff
@@ -63,6 +64,8 @@ jobs:

 # Run tests
 - uses: actions/checkout@v4
+with:
+ref: ${{ github.head_ref }}
 - name: Set up Python ${{ matrix.python-version }}
 uses: actions/setup-python@v5
 with:
```
```diff
@@ -91,4 +94,14 @@ jobs:
 WATSONX_APIKEY: ${{ secrets.WATSONX_APIKEY }}
 WATSONX_URL: ${{ secrets.WATSONX_URL }}
 REPLICATE_API_TOKEN: ${{ secrets.REPLICATE_API_TOKEN }}
+OLLAMA_GHACTIONS_RESULTS: true
 run: py.test -v --capture=tee-sys -rfE -s tests/test_examples_run.py
+- name: Update example result files (if any) generated from Ollama running on GH Actions
+if: matrix.python-version == '3.11'
+run: |
+git config --local user.name github-actions[bot]
+git config --local user.email "${{ github.actor_id }}+${{ github.actor }}@users.noreply.github.com"
+git status
+git add tests/results/
+git diff --cached --quiet || git commit -S -s -m "github-actions[bot]: Updated results file when running examples on $(date)"
+git push
```
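The workflow sets `OLLAMA_GHACTIONS_RESULTS: true` for the test run, then stages `tests/results/` and commits only when something is staged (`git diff --cached --quiet` exits non-zero if there are staged changes). The Python-version condition presumably keeps the matrix legs from pushing competing commits. How `tests/test_examples_run.py` reads the variable is not part of this diff, so the following is a hypothetical sketch of such a check; `check_or_update_result` and the result-file layout are illustrative, not the project's API. One constraint it respects is that the commit step's `if:` contains no status function, so GitHub Actions applies an implicit `success()` and the step runs only if the tests passed. A harness that failed on changed output would therefore never get the new results committed.

```python
import os
from pathlib import Path


def check_or_update_result(result_file: Path, actual: str) -> bool:
    """Hypothetical result check for one example run.

    Returns True when `actual` matches the stored result. When the
    OLLAMA_GHACTIONS_RESULTS environment variable is "true", a missing or
    stale result file is overwritten with `actual` and the run still counts
    as passing, so the workflow's later `git add tests/results/` step can
    pick up the new file.
    """
    expected = result_file.read_text() if result_file.exists() else None
    if expected == actual:
        return True
    if os.environ.get("OLLAMA_GHACTIONS_RESULTS", "").lower() == "true":
        result_file.parent.mkdir(parents=True, exist_ok=True)
        result_file.write_text(actual)
        return True
    return False
```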

examples/chatbot/chatbot.pdl

Lines changed: 1 addition & 1 deletion
```diff
@@ -5,7 +5,7 @@ text:
 message: "What is your query?\n"
 - repeat:
 text:
-# Send context to Granite model hosted at replicate.com
+# Send context to Granite model hosted at ollama
 - model: ollama_chat/granite3.2:2b
 # Allow the user to type 'yes', 'no', or anything else, storing
 # the input into a variable named `eval`. The input is also implicitly
```

examples/cldk/cldk-assistant.pdl

Lines changed: 9 additions & 9 deletions
```diff
@@ -1,5 +1,5 @@
 description: CodeLLM-Devkit Assistant
-text:
+text:
 - read:
 def: project
 message: "Please enter the path to your Java project:\n"
```
```diff
@@ -34,9 +34,9 @@ text:
 contribute: []
 - "\n***Generating PDL code for your query:\n"
 - if: ${ query != 'quit'}
-then:
+then:
 text:
-- model: replicate/ibm-granite/granite-3.1-8b-instruct
+- model: ollama_chat/granite3.2:8b
 def: PDL
 input: |
 Question: What are all the classes?
```
````diff
@@ -86,7 +86,7 @@ text:
 text:
 - lang: python
 code: |
-graph = PDL_SESSION.cldk_state.get_class_call_graph("org.ibm.App", method_name=None)
+graph = PDL_SESSION.cldk_state.get_class_call_graph("org.ibm.App", method_name=None)
 result = graph
 ```

````
````diff
@@ -109,7 +109,7 @@ text:
 method = PDL_SESSION.cldk_state.get_method("org.ibm.App", "Foo(string)")
 result = method
 - "\n\nGenerate a summary of method Foo\n\n"
-- model: replicate/ibm-granite/granite-3.1-8b-instruct
+- model: ollama_chat/granite3.2:8b
 ```

 Question: Generate a different comment for method Foo(string) in class org.ibm.App?
````
````diff
@@ -121,11 +121,11 @@ text:
 method = PDL_SESSION.cldk_state.get_method("org.ibm.App", "Foo(string)")
 result = method
 - "\n\nGenerate a different comment for method Foo(string)\n\n"
-- model: replicate/ibm-granite/granite-3.1-8b-instruct
+- model: ollama_chat/granite3.2:8b
 ```

 If the query contains something about a field be sure to call a model.
-
+
 Question: ${ query }


````
```diff
@@ -135,10 +135,10 @@ text:
 - "\n\n***Executing the above PDL code:\n\n"
 - lang: python
 contribute: [result]
-code: |
+code: |
 from pdl.pdl import exec_str
 s = """${ PDL }"""
 pdl = s.split("```")[1]
 result = exec_str(pdl)
-
+
 until: ${ query == 'quit' }
```
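cldk-assistant.pdl extracts the generated PDL program with ``s.split("```")[1]``, i.e. the text between the first two triple-backtick fences. That works when the reply opens with a bare fence, but if the model puts an info string such as `yaml` right after the opening fence, the extracted text starts with `yaml`, and a reply with no fence at all raises `IndexError`. The sketch below is a slightly more defensive variant, not part of the commit: following CommonMark, it treats the rest of the opening fence line as the info string and drops it. `first_fenced_block` is an illustrative name.

```python
FENCE = "`" * 3  # three backticks


def first_fenced_block(text: str) -> str:
    """Return the body of the first fenced code block in `text`.

    Unlike `text.split(FENCE)[1]`, this drops the info string on the opening
    fence line (e.g. "yaml") and raises ValueError when no complete block exists.
    """
    parts = text.split(FENCE)
    if len(parts) < 3:
        raise ValueError("no complete fenced code block found")
    body = parts[1]
    # Everything up to the first newline belongs to the opening fence line.
    _info, newline, rest = body.partition("\n")
    return rest if newline else body


reply = "Here is the program:\n" + FENCE + "yaml\ntext:\n- Hello\n" + FENCE + "\nDone."
print(repr(reply.split(FENCE)[1]))       # 'yaml\ntext:\n- Hello\n'
print(repr(first_fenced_block(reply)))   # 'text:\n- Hello\n'
```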

examples/demo/10-sdg.pdl

Lines changed: 7 additions & 7 deletions
```diff
@@ -1,6 +1,6 @@
 defs:
 teacher_sys_prompt: You are a very knowledgeable AI Assistant that will faithfully assist the user with their task.
-teacher_model: replicate/ibm-granite/granite-3.1-8b-instruct
+teacher_model: ollama_chat/granite3.2:8b
 teacher_template:
 function:
 sys_prompt: str
```
```diff
@@ -29,13 +29,13 @@ defs:
 * The questions should not be template-based or generic, it should be very diverse.
 * Simply return the questions, do not return any answers or explanations.
 * Strictly adhere to the prompt and generate responses in the same style and format as the example.
-Use this format to generate the questions:
-### Question 1:
+Use this format to generate the questions:
+### Question 1:
 examples: |
 To better assist you with this task, here is an example:
 ### Question 1: ${icl_question}
 generation: |
-Now generate ${num_samples} such questions, remember to follow the principles mentioned above and use the same format as the examples. Remember to use the same style and format as the example above.
+Now generate ${num_samples} such questions, remember to follow the principles mentioned above and use the same format as the examples. Remember to use the same style and format as the example above.
 max_new_tokens: 10000

 gen_questions_freeform_inner:
```
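The question-generation template asks the teacher model to return only questions, each introduced by a `### Question N:` header. The step in 10-sdg.pdl that parses this output falls outside the hunks shown here, so the following is only an assumed, minimal Python sketch of splitting such output back into individual questions. `parse_questions` is an illustrative name, and the sample output is made up.

```python
import re

# One header per question, e.g. "### Question 1: ...", as requested in the prompt.
QUESTION_HEADER = re.compile(r"^### Question \d+:[ \t]*", re.MULTILINE)


def parse_questions(generated: str) -> list[str]:
    """Split text written in the "### Question N:" format into a list of questions."""
    pieces = QUESTION_HEADER.split(generated)
    # pieces[0] is whatever preceded the first header (often empty); discard it.
    return [p.strip() for p in pieces[1:] if p.strip()]


sample = "### Question 1: What is PDL?\n### Question 2:\nWhy run Granite locally?\n"
print(parse_questions(sample))  # ['What is PDL?', 'Why run Granite locally?']
```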
```diff
@@ -203,7 +203,7 @@ defs:
 spec: {introduction: str, principles: str, examples: str, generation: str, max_new_tokens: int, additional_stop_tokens: [str]}
 return:
 data:
-introduction: Your task is to faithfully follow the user's prompt and generate a response.
+introduction: Your task is to faithfully follow the user's prompt and generate a response.
 principles: |
 Please follow these guiding principles when generating responses:
 * Use proper grammar and punctuation.
```
```diff
@@ -299,7 +299,7 @@ defs:
 introduction: |
 Please act as an impartial judge and evaluate the quality of the answer provided by an AI assistant to the questions displayed below. Evaluate whether or not the answer is a good example of how AI Assistant should respond to the user's instruction. Please assign a score using the following 3-point scale.
 principles: |
-1: It means the answer is incorrect, irrelevant, unsafe or provides incomplete and garbage information. For instance, the answer may be factually wrong, off-topic, or filled with irrelevant content that doesn't address the user's question or it could be incomplete and hanging. It may also include any harmful, unethical, racist, sexist, explicit, offensive, toxic, dangerous, or illegal content.
+1: It means the answer is incorrect, irrelevant, unsafe or provides incomplete and garbage information. For instance, the answer may be factually wrong, off-topic, or filled with irrelevant content that doesn't address the user's question or it could be incomplete and hanging. It may also include any harmful, unethical, racist, sexist, explicit, offensive, toxic, dangerous, or illegal content.

 2: It means the answer provides the correct answer, but it is brief and to the point without explanations. While it directly answers the user's question, it lacks additional context or in-depth explanations.

```
```diff
@@ -401,7 +401,7 @@ text:
 - def: qa_pairs
 call: ${gen_answers}
 args:
-questions: ${filtered_questions}
+questions: ${filtered_questions}
 - "\n\n----- Filtering QA pairs -----\n\n"
 - call: ${filter_question_answer_pair}
 args:
```

examples/demo/8-tools.pdl

Lines changed: 1 addition & 1 deletion
```diff
@@ -17,7 +17,7 @@ text:
 contribute: [context]
 - "Out of 1400 participants, 400 passed the test. What percentage is that?\n"
 - def: actions
-model: replicate/ibm-granite/granite-3.1-8b-instruct
+model: ollama_chat/granite3.2:8b
 parser: json
 spec: [{ name: str, arguments: { expr: str }}]
 parameters:
```
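8-tools.pdl asks the model for a JSON tool call (`parser: json`) whose shape is fixed by `spec: [{ name: str, arguments: { expr: str }}]`. The sketch below stands in for that parse-and-check step and adds a calculator that evaluates `expr` with the standard `ast` module instead of `eval`. It is not PDL's implementation. The model reply is hypothetical, and its tool name `Calc` is borrowed from react_fun.pdl.

```python
import ast
import json
import operator

# Arithmetic operators the calculator accepts; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def calc(expr: str) -> float:
    """Evaluate a plain arithmetic expression without calling eval()."""
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))


def parse_actions(raw: str) -> list[dict]:
    """Parse model output and check it against [{name: str, arguments: {expr: str}}]."""
    actions = json.loads(raw)
    if not isinstance(actions, list):
        raise ValueError("expected a JSON list of actions")
    for action in actions:
        ok = (
            isinstance(action, dict)
            and isinstance(action.get("name"), str)
            and isinstance(action.get("arguments"), dict)
            and isinstance(action["arguments"].get("expr"), str)
        )
        if not ok:
            raise ValueError(f"action does not match spec: {action!r}")
    return actions


# Hypothetical model reply for the 1400-participant question.
reply = '[{"name": "Calc", "arguments": {"expr": "400 / 1400 * 100"}}]'
action = parse_actions(reply)[0]
print(action["name"], round(calc(action["arguments"]["expr"]), 2))  # Calc 28.57
```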

examples/demo/9-react.pdl

Lines changed: 2 additions & 2 deletions
```diff
@@ -63,12 +63,12 @@ text:
 - repeat:
 text:
 - def: thought
-model: replicate/ibm-granite/granite-3.1-8b-instruct
+model: ollama_chat/granite3.2:8b
 parameters:
 stop_sequences: "Action:"
 - "Action:\n"
 - def: action
-model: replicate/ibm-granite/granite-3.1-8b-instruct
+model: ollama_chat/granite3.2:8b
 parameters:
 stop_sequences: "\n"
 parser: json
```
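In this ReAct loop the `thought` call stops at `"Action:"` and the `action` call stops at the first newline, so each call yields one thought or one single-line JSON action. A minimal sketch of stop-sequence truncation follows. It assumes the stop text itself is excluded from the returned output; the literal `"Action:\n"` block that follows `thought` is consistent with that, though backends can differ. `apply_stop` is an illustrative name, and it takes a list where the PDL files pass a single string.

```python
def apply_stop(text: str, stop_sequences: list[str]) -> str:
    """Cut `text` at the earliest stop sequence; the stop text itself is dropped."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Made-up raw continuations, one per model call in the loop.
thought_raw = "I need to search the Hudson River.\nAction:\n<tool_call>[...]"
action_raw = '[{"name": "Search", "arguments": {"topic": "Hudson River"}}]\nObservation: ...'

print(repr(apply_stop(thought_raw, ["Action:"])))  # 'I need to search the Hudson River.\n'
print(apply_stop(action_raw, ["\n"]))  # [{"name": "Search", "arguments": {"topic": "Hudson River"}}]
```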

examples/react/demo.pdl

Lines changed: 2 additions & 2 deletions
```diff
@@ -63,12 +63,12 @@ text:
 - repeat:
 text:
 - def: thought
-model: replicate/ibm-granite/granite-3.1-8b-instruct
+model: ollama_chat/granite3.2:8b
 parameters:
 stop_sequences: "Action:"
 - "Action:\n"
 - def: action
-model: replicate/ibm-granite/granite-3.1-8b-instruct
+model: ollama_chat/granite3.2:8b
 parameters:
 stop_sequences: "\n"
 parser: json
```

examples/react/react_call.pdl

Lines changed: 1 addition & 1 deletion
```diff
@@ -5,6 +5,6 @@ text:
 - call: ${ lib.react }
 args:
 question: How many years ago was the discoverer of the Hudson River born? Keep in mind we are in 2025.
-model: replicate/ibm-granite/granite-3.1-8b-instruct
+model: ollama_chat/granite3.2:8b


```
examples/react/react_fun.pdl

Lines changed: 13 additions & 11 deletions
```diff
@@ -13,13 +13,13 @@ defs:
 - name: Calc
 description: Calculator function
 arguments:
-expr:
+expr:
 type: string
 description: Arithmetic expression to calculate
 - name: Search
 description: Wikipedia search
 arguments:
-topic:
+topic:
 type: string
 description: Topic to search
 - for:
```
```diff
@@ -46,15 +46,17 @@ defs:
 - def: thought
 model: ${ model }
 parameters:
+temperature: 0
 stop_sequences: "Action:"
 - "Action:\n"
 - def: action
 model: ${ model }
 parameters:
+temperature: 0
 stop_sequences: "\n"
 parser: json
 - if: ${ action != prev_action}
-then:
+then:
 def: observation
 if: ${ action[0].name == "Search" }
 then:
```
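react_fun.pdl now passes `temperature: 0` to both model calls, presumably as part of the commit's effort to remove nondeterminism so that recorded example results stay comparable between runs. The sketch shows the idea on a toy next-token choice: temperature 0 is treated as greedy (argmax) decoding, and anything higher samples from a softmax over the scaled scores. The scores are invented, `next_token` is an illustrative name, and how a serving backend such as Ollama handles temperature 0 is up to that backend; hardware effects can still cause small run-to-run differences.

```python
import math
import random


def next_token(scores: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Choose the next token from unnormalized scores (logits).

    temperature == 0 is treated as greedy decoding: always the highest score,
    so repeated runs agree. temperature > 0 samples from softmax(scores / T).
    """
    if temperature == 0:
        return max(scores, key=scores.get)
    scaled = {tok: s / temperature for tok, s in scores.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in scaled.values()]
    return rng.choices(list(scaled), weights=weights, k=1)[0]


# Invented scores for the first token of an action.
scores = {"Search": 2.0, "Calc": 1.5, "Finish": 0.1}
rng = random.Random()
print({next_token(scores, 0, rng) for _ in range(50)})    # {'Search'}
print({next_token(scores, 1.0, rng) for _ in range(50)})  # varies from run to run
```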
```diff
@@ -85,39 +87,39 @@ defs:
 contribute: []
 data: ${ action }
 until: ${ action[0].name == "Finish" or exit }
-
+
 react:
 function:
 question: str
 model: str
 return:
-defs:
+defs:
 examples:
 array:
-- text:
+- text:
 |
 What profession does Nicholas Ray and Elia Kazan have in common?
 Thought: I need to search Nicholas Ray and Elia Kazan, find their professions, then find the profession they have in common.
-Action:
+Action:
 <tool_call>[{"name": "Search", "arguments": {"topic": "Nicholas Ray"}}]
 Observation: Nicholas Ray (born Raymond Nicholas Kienzle Jr., August 7, 1911 - June 16, 1979) was an American film director, screenwriter, and actor best known for the 1955 film Rebel Without a Cause.
 Thought: Professions of Nicholas Ray are director, screenwriter, and actor. I need to search Elia Kazan next and find his professions.
-Action:
+Action:
 <tool_call>[{"name": "Search", "arguments": {"topic": "Elia Kazan"}}]
 Observation: Elia Kazan was an American film and theatre director, producer, screenwriter and actor.
 Thought: Professions of Elia Kazan are director, producer, screenwriter, and actor. So profession Nicholas Ray and Elia Kazan have in common is director, screenwriter, and actor.
-Action:
+Action:
 <tool_call>[{"name": "Finish", "arguments": {"topic": "director, screenwriter, actor"}}]


 What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?
 Thought: I need to search Colorado orogeny, find the area that the eastern sector of the Colorado ...
-Action:
+Action:
 <tool_call>[{"name": "Search", "arguments": {"topic": "Colorado orogeny"}}]
 Observation: The Colorado orogeny was an episode of mountain building (an orogeny) ...
 Thought: It does not mention the eastern sector. So I need to look up eastern sector.
 Thought: High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer is 1,800 to 7,000 ft.
-Action:
+Action:
 <tool_call>[{"name": "Finish", "arguments": {"topic": "1,800 to 7,000 ft"}}]

 call: ${ react_inner }
```
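The few-shot examples in react_fun.pdl write every action as `<tool_call>` followed by a one-line JSON list and end with a `Finish` call whose `topic` carries the answer. Assuming that one-line layout, a small sketch can recover the actions from such a trace, for instance when inspecting a recorded run. `extract_tool_calls` is an illustrative name; the trace is abridged from the examples above.

```python
import json
import re

# Each action in the trace is "<tool_call>" followed by a JSON list on one line.
TOOL_CALL = re.compile(r"<tool_call>(\[.*\])")


def extract_tool_calls(trace: str) -> list[dict]:
    """Return every tool call in a ReAct trace, in order of appearance."""
    calls = []
    for match in TOOL_CALL.finditer(trace):
        calls.extend(json.loads(match.group(1)))
    return calls


trace = """Thought: I need to search Nicholas Ray and Elia Kazan.
Action:
<tool_call>[{"name": "Search", "arguments": {"topic": "Nicholas Ray"}}]
Observation: ...
Action:
<tool_call>[{"name": "Finish", "arguments": {"topic": "director, screenwriter, actor"}}]
"""
calls = extract_tool_calls(trace)
print([c["name"] for c in calls])       # ['Search', 'Finish']
print(calls[-1]["arguments"]["topic"])  # director, screenwriter, actor
```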

examples/sdk/hello_dict.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -4,7 +4,7 @@
 "text": [
 "Hello\n",
 {
-"model": "replicate/ibm-granite/granite-3.1-8b-instruct",
+"model": "ollama_chat/granite3.2:8b",
 "parameters": {"stop_sequences": "!"},
 },
 ]
```
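Most of the model changes in this commit are the same one-line swap, from `replicate/ibm-granite/granite-3.1-8b-instruct` to `ollama_chat/granite3.2:8b`, which is also why the workflow now pulls `granite3.2:8b`. The commit does not say how the edits were made. As one way to apply such a swap mechanically, here is a minimal sketch; `MODEL_SWAPS`, `swap_models`, and `rewrite_tree` are illustrative names, and a plain string replace will also touch matching text in comments.

```python
from pathlib import Path

# The swap applied throughout this commit: Replicate-hosted Granite 3.1 8B
# replaced by Granite 3.2 8B served locally through Ollama.
MODEL_SWAPS = {
    "replicate/ibm-granite/granite-3.1-8b-instruct": "ollama_chat/granite3.2:8b",
}


def swap_models(text: str, swaps: dict[str, str] = MODEL_SWAPS) -> str:
    """Replace every old model ID in `text` with its new ID."""
    for old, new in swaps.items():
        text = text.replace(old, new)
    return text


def rewrite_tree(root: Path, patterns: tuple[str, ...] = ("*.pdl", "*.py")) -> list[Path]:
    """Rewrite matching files under `root` in place; return the files that changed."""
    changed = []
    for pattern in patterns:
        for path in sorted(root.rglob(pattern)):
            original = path.read_text()
            updated = swap_models(original)
            if updated != original:
                path.write_text(updated)
                changed.append(path)
    return changed
```

Calling `rewrite_tree(Path("examples"))` from the repository root would rewrite the example files in place and list the ones it changed; running it a second time changes nothing.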
