Allow passing a selection of books and chapters to the PT quote detector #236

pmachapman · 2025-10-02T03:08:11Z

Fixes #229.

codecov-commenter · 2025-10-02T03:12:54Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.93%. Comparing base (edcf6c4) to head (6fa5689).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #236      +/-   ##
==========================================
+ Coverage   90.91%   90.93%   +0.02%     
==========================================
  Files         337      337              
  Lines       21642    21724      +82     
==========================================
+ Hits        19675    19754      +79     
- Misses       1967     1970       +3

ddaspit

@ddaspit reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @benjaminking and @Enkidu93)

tests/punctuation_analysis/test_paratext_project_quote_convention_detector.py line 7 at r1 (raw file):

from machine.corpora import ParatextProjectSettings, UsfmStylesheet
from machine.punctuation_analysis import ParatextProjectQuoteConventionDetector, QuoteConventionAnalysis
from machine.punctuation_analysis.quote_convention import QuoteConvention

You should import these from the machine.punctuation_analysis package.

tests/punctuation_analysis/test_paratext_project_quote_convention_detector.py line 10 at r1 (raw file):

from machine.punctuation_analysis.standard_quote_conventions import STANDARD_QUOTE_CONVENTIONS
from machine.scripture import ORIGINAL_VERSIFICATION, Versification
from machine.scripture.parse import parse_selection

You should import this from the machine.scripture package.

benjaminking

It would probably be good to update the unit tests for UsfmStructureExtractor to test the include_chapters parameter.

@benjaminking reviewed all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93 and @pmachapman)

Enkidu93

@Enkidu93 reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @pmachapman)

tests/punctuation_analysis/test_paratext_project_quote_convention_detector.py line 64 at r1 (raw file):

    assert analysis is not None
    assert analysis.best_quote_convention_score > 0.66
    assert analysis.best_quote_convention.name == "standard_french"

Maybe extend this test to confirm that the quote convention is indeterminate with all chapters or a different mix?

pmachapman

Previously, benjaminking (Ben King) wrote…

It would probably be good to update the unit tests for UsfmStructureExtractor to test the include_chapters parameter.

@benjaminking Good idea. I've added two tests to cover what I think is the scope of the changes. Please let me know if you think there should be more.

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @ddaspit and @Enkidu93)

tests/punctuation_analysis/test_paratext_project_quote_convention_detector.py line 7 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

You should import these from the machine.punctuation_analysis package.

Done. Thanks!

tests/punctuation_analysis/test_paratext_project_quote_convention_detector.py line 10 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

You should import this from the machine.scripture package.

Done. I had to modify scripture/__init__.py to allow this.

tests/punctuation_analysis/test_paratext_project_quote_convention_detector.py line 64 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Maybe extend this test to confirm that the quote convention is indeterminate with all chapters or a different mix?

Done. Good idea.

benjaminking

@benjaminking reviewed 3 of 3 files at r2, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @ddaspit and @Enkidu93)

tests/punctuation_analysis/test_usfm_structure_extractor.py line 17 at r2 (raw file):

    usfm_structure_extractor.text(verse_text_parser_state, "test")

    actual_chapters = usfm_structure_extractor.get_chapters({41: [1]})

Super minor thing, but it took me a long time to realize why this test would result in no chapters, specifically that book numbers were zero-indexed. A different book number might make it more clear why the test should be failing. But don't feel like you have to change the test.
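To illustrate the point about book numbers, here is a minimal sketch rather than the library's implementation. `filter_chapters` and the `(book_number, chapter_number)` tuples are hypothetical; the only things assumed are the `Dict[int, List[int]]` shape used in this PR and that Genesis and Exodus are books 1 and 2 in Paratext's canon numbering. A selection keyed by a book number that does not occur in the data yields nothing:

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for extracted chapters: (book_number, chapter_number).
Chapter = Tuple[int, int]


def filter_chapters(
    chapters: List[Chapter], include_chapters: Optional[Dict[int, List[int]]] = None
) -> List[Chapter]:
    """Keep only chapters named in include_chapters (None keeps everything)."""
    if include_chapters is None:
        return list(chapters)
    return [
        (book, chapter)
        for book, chapter in chapters
        if book in include_chapters and chapter in include_chapters[book]
    ]


# Genesis is book 1 and Exodus is book 2.
genesis_only = [(1, 1), (1, 2)]

print(filter_chapters(genesis_only, {1: [1]}))  # selection names Genesis 1
print(filter_chapters(genesis_only, {2: [1]}))  # selection names Exodus 1 only
```

With low, adjacent book numbers the mismatch between the data and the selection is visible at a glance.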

Enkidu93

@Enkidu93 reviewed 3 of 3 files at r2, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ddaspit)

ddaspit

@ddaspit reviewed 3 of 3 files at r2, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @pmachapman)

Enkidu93

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @pmachapman)

machine/punctuation_analysis/text_segment.py line 81 at r2 (raw file):

        def set_chapter(self, number: str) -> "TextSegment.Builder":
            self._text_segment.chapter = int(number)

I'm OK with leaving it, but I think it might be a bit cleaner to have this take an int and have it parse the string as an int in the USFM parser.
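A minimal sketch of that suggestion, assuming simplified stand-ins for the real classes: `Segment`, `SegmentBuilder`, and `on_chapter_marker` are hypothetical names, not the library's API. The builder accepts an `int`, and the string-to-int conversion happens once, where the parser hands over the marker text:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """Simplified stand-in for TextSegment; only the chapter field is modelled."""

    chapter: Optional[int] = None


class SegmentBuilder:
    def __init__(self) -> None:
        self._segment = Segment()

    def set_chapter(self, number: int) -> "SegmentBuilder":
        # The builder stores the value as given; no string parsing here.
        self._segment.chapter = number
        return self

    def build(self) -> Segment:
        return self._segment


def on_chapter_marker(builder: SegmentBuilder, raw_number: str) -> None:
    """Hypothetical parser callback: converts the marker text at the boundary."""
    builder.set_chapter(int(raw_number))


builder = SegmentBuilder()
on_chapter_marker(builder, "3")
print(builder.build().chapter)
```

Keeping the conversion in the parser means a malformed chapter number fails where the input is read, and the builder's signature documents the type it actually stores.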

Enkidu93

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @pmachapman)

machine/punctuation_analysis/paratext_project_quote_convention_detector.py line 20 at r2 (raw file):

    def get_quote_convention_analysis(
        self, handler: Optional[QuoteConventionDetector] = None, include_chapters: Optional[Dict[int, List[int]]] = None

Throughout - why a list of ints and not a set?
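For reference, a caller that wants set semantics could normalise the mapping without changing the `Dict[int, List[int]]` signature. This is only a sketch; `to_chapter_sets` is a hypothetical helper and not part of the library:

```python
from typing import Dict, FrozenSet, List


def to_chapter_sets(include_chapters: Dict[int, List[int]]) -> Dict[int, FrozenSet[int]]:
    """Convert each book's chapter list to a frozenset (drops duplicates)."""
    return {book: frozenset(chapters) for book, chapters in include_chapters.items()}


selection = {1: [1, 2, 2], 2: [5]}
chapter_sets = to_chapter_sets(selection)

print(sorted(chapter_sets[1]))
print(3 in chapter_sets[1])
```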

Enkidu93

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @pmachapman)

machine/scripture/__init__.py line 56 at r2 (raw file):

    "NULL_VERSIFICATION",
    "ORIGINAL_VERSIFICATION",
    "parse_selection",

I think this was intentionally not exported because it's a 'private' python method, as it were. get_chapters() is the public method, I think. Could you use it instead?

Enkidu93

Sorry - while porting this, I've noticed a few things.

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @pmachapman)

pmachapman

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @Enkidu93)

machine/punctuation_analysis/paratext_project_quote_convention_detector.py line 20 at r2 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Throughout - why a list of ints and not a set?

I used the same return type as get_chapters.

machine/punctuation_analysis/text_segment.py line 81 at r2 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

I'm OK with leaving it, but I think it might be a bit cleaner to have this take an int and have it parse the string as an int in the USFM parser.

Good idea - I have changed it.

machine/scripture/__init__.py line 56 at r2 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

I think this was intentionally not exported because it's a 'private' python method, as it were. get_chapters() is the public method, I think. Could you use it instead?

Done. I am not sure why I didn't see that.

tests/punctuation_analysis/test_usfm_structure_extractor.py line 17 at r2 (raw file):

Previously, benjaminking (Ben King) wrote…

Super minor thing, but it took me a long time to realize why this test would result in no chapters, specifically that book numbers were zero-indexed. A different book number might make it more clear why the test should be failing. But don't feel like you have to change the test.

I've changed it to Genesis and Exodus, as the numbers are clearer for these books.

Enkidu93

@Enkidu93 reviewed 5 of 5 files at r3, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @pmachapman)

machine/punctuation_analysis/paratext_project_quote_convention_detector.py line 20 at r2 (raw file):

Previously, pmachapman (Peter Chapman) wrote…

Throughout - why a list of ints and not a set?

I used the same return type as get_chapters.

OK, sounds good.

machine/punctuation_analysis/text_segment.py line 81 at r2 (raw file):

Previously, pmachapman (Peter Chapman) wrote…

Good idea - I have changed it.

Great! Thanks.

machine/scripture/__init__.py line 56 at r2 (raw file):

Previously, pmachapman (Peter Chapman) wrote…

Done. I am not sure why I didn't see that.

Great - no worries

machine/punctuation_analysis/paratext_project_quote_convention_detector.py line 23 at r3 (raw file):

    ) -> Optional[QuoteConventionAnalysis]:
        handler = QuoteConventionDetector() if handler is None else handler
        for book_id in get_scripture_books():

Sorry - this is my mistake in the initial implementation. This function ought to return all scripture books, but it's actually only returning OT and NT books (not the DCs).

pmachapman

@pmachapman dismissed @Enkidu93 from a discussion.
Reviewable status: 7 of 8 files reviewed, 1 unresolved discussion (waiting on @benjaminking, @ddaspit, and @Enkidu93)

machine/punctuation_analysis/paratext_project_quote_convention_detector.py line 23 at r3 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Sorry - this is my mistake in the initial implementation. This function ought to return all scripture books, but it's actually only returning OT and NT books (not the DCs).

Done. Thank you for spotting!

machine/scripture/__init__.py line 56 at r2 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Great - no worries

Done.

Enkidu93

@Enkidu93 reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @pmachapman)

machine/punctuation_analysis/paratext_project_quote_convention_detector.py line 23 at r3 (raw file):

Previously, pmachapman (Peter Chapman) wrote…

Done. Thank you for spotting!

Great - thank you!

pmachapman

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93)

machine/punctuation_analysis/paratext_project_quote_convention_detector.py line 23 at r3 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Great - thank you!

Done.

pmachapman

@pmachapman dismissed @Enkidu93 from a discussion.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @pmachapman)

pmachapman

@pmachapman reviewed all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @pmachapman)

pmachapman requested review from Enkidu93 and ddaspit October 2, 2025 03:08

pmachapman force-pushed the quote_detector_filter_by_book branch from 114792f to 8fdb273 October 2, 2025 03:10

pmachapman force-pushed the quote_detector_filter_by_book branch from 8fdb273 to f82cdb4 October 2, 2025 03:18

Enkidu93 requested a review from benjaminking October 2, 2025 13:11

ddaspit requested changes Oct 2, 2025

benjaminking requested changes Oct 2, 2025

Enkidu93 requested changes Oct 2, 2025

pmachapman force-pushed the quote_detector_filter_by_book branch from f82cdb4 to 6a848c8 October 5, 2025 21:24

pmachapman commented Oct 5, 2025

benjaminking approved these changes Oct 6, 2025

Enkidu93 approved these changes Oct 6, 2025

ddaspit approved these changes Oct 6, 2025

Enkidu93 mentioned this pull request Oct 6, 2025

Port book/chapter-level quote convention detection sillsdev/machine#348

Closed

Enkidu93 approved these changes Oct 6, 2025

Enkidu93 requested changes Oct 6, 2025

Enkidu93 reviewed Oct 6, 2025

pmachapman force-pushed the quote_detector_filter_by_book branch from 6a848c8 to b3f9010 October 6, 2025 18:38

pmachapman commented Oct 6, 2025

Enkidu93 requested changes Oct 6, 2025

pmachapman commented Oct 6, 2025

Enkidu93 approved these changes Oct 6, 2025

pmachapman commented Oct 6, 2025

Enkidu93 added a commit to sillsdev/machine that referenced this pull request Oct 6, 2025

Port sillsdev/machine.py#236

9d729bb

pmachapman added 2 commits October 7, 2025 09:41

Allow passing a selection of books and chapters to the PT quote detector

5ce4d69

Fix get_scripture_books to return DCs

6fa5689

pmachapman force-pushed the quote_detector_filter_by_book branch from 2279f60 to 6fa5689 October 6, 2025 20:41

pmachapman commented Oct 6, 2025

pmachapman merged commit 11d3611 into main Oct 6, 2025
14 checks passed

pmachapman deleted the quote_detector_filter_by_book branch October 6, 2025 21:25

Enkidu93 added a commit to sillsdev/machine that referenced this pull request Oct 8, 2025

Port sillsdev/machine.py#236

8022b1b

Enkidu93 added a commit to sillsdev/machine that referenced this pull request Oct 9, 2025

Filter quote convention analysis by books/chapter (#349)

333171a

* Port sillsdev/machine.py#236
* Add an overload to take a chapter-by-id dict for convenience in Serval
