Skip to content

Conversation

@pmachapman
Copy link
Collaborator

@pmachapman pmachapman commented Oct 2, 2025

Fixes #229.


This change is Reviewable

@pmachapman pmachapman requested review from Enkidu93 and ddaspit October 2, 2025 03:08
@pmachapman pmachapman force-pushed the quote_detector_filter_by_book branch from 114792f to 8fdb273 Compare October 2, 2025 03:10
@codecov-commenter
Copy link

codecov-commenter commented Oct 2, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.93%. Comparing base (edcf6c4) to head (6fa5689).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #236      +/-   ##
==========================================
+ Coverage   90.91%   90.93%   +0.02%     
==========================================
  Files         337      337              
  Lines       21642    21724      +82     
==========================================
+ Hits        19675    19754      +79     
- Misses       1967     1970       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@pmachapman pmachapman force-pushed the quote_detector_filter_by_book branch from 8fdb273 to f82cdb4 Compare October 2, 2025 03:18
@Enkidu93 Enkidu93 requested a review from benjaminking October 2, 2025 13:11
Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ddaspit reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @benjaminking and @Enkidu93)


tests/punctuation_analysis/test_paratext_project_quote_convention_detector.py line 7 at r1 (raw file):

from machine.corpora import ParatextProjectSettings, UsfmStylesheet
from machine.punctuation_analysis import ParatextProjectQuoteConventionDetector, QuoteConventionAnalysis
from machine.punctuation_analysis.quote_convention import QuoteConvention

You should import these from the machine.punctuation_analysis package.


tests/punctuation_analysis/test_paratext_project_quote_convention_detector.py line 10 at r1 (raw file):

from machine.punctuation_analysis.standard_quote_conventions import STANDARD_QUOTE_CONVENTIONS
from machine.scripture import ORIGINAL_VERSIFICATION, Versification
from machine.scripture.parse import parse_selection

You should import this from the machine.scripture package.

Copy link
Collaborator

@benjaminking benjaminking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably be good to update the unit tests for UsfmStructureExtractor to test the include_chapters parameter.

@benjaminking reviewed all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93 and @pmachapman)

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Enkidu93 reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @pmachapman)


tests/punctuation_analysis/test_paratext_project_quote_convention_detector.py line 64 at r1 (raw file):

    assert analysis is not None
    assert analysis.best_quote_convention_score > 0.66
    assert analysis.best_quote_convention.name == "standard_french"

Maybe extend this test to confirm that the quote convention is indeterminate with all chapters or a different mix?

@pmachapman pmachapman force-pushed the quote_detector_filter_by_book branch from f82cdb4 to 6a848c8 Compare October 5, 2025 21:24
Copy link
Collaborator Author

@pmachapman pmachapman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably be good to update the unit tests for UsfmStructureExtractor to test the include_chapters parameter.

@benjaminking Good idea. I've added two tests to cover what I think is the scope of the changes. Please let me know if you think there should be more.

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @ddaspit and @Enkidu93)


tests/punctuation_analysis/test_paratext_project_quote_convention_detector.py line 7 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

You should import these from the machine.punctuation_analysis package.

Done. Thanks!


tests/punctuation_analysis/test_paratext_project_quote_convention_detector.py line 10 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

You should import this from the machine.scripture package.

Done. I had to modify scripture/__init__.py to allow this.


tests/punctuation_analysis/test_paratext_project_quote_convention_detector.py line 64 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Maybe extend this test to confirm that the quote convention is indeterminate with all chapters or a different mix?

Done. Good idea.

Copy link
Collaborator

@benjaminking benjaminking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@benjaminking reviewed 3 of 3 files at r2, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @ddaspit and @Enkidu93)


tests/punctuation_analysis/test_usfm_structure_extractor.py line 17 at r2 (raw file):

    usfm_structure_extractor.text(verse_text_parser_state, "test")

    actual_chapters = usfm_structure_extractor.get_chapters({41: [1]})

Super minor thing, but it took me a long time to realize why this test would result in no chapters, specifically that book numbers were zero-indexed. A different book number might make it more clear why the test should be failing. But don't feel like you have to change the test.

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@Enkidu93 reviewed 3 of 3 files at r2, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ddaspit)

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@ddaspit reviewed 3 of 3 files at r2, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @pmachapman)

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @pmachapman)


machine/punctuation_analysis/text_segment.py line 81 at r2 (raw file):

        def set_chapter(self, number: str) -> "TextSegment.Builder":
            self._text_segment.chapter = int(number)

I'm OK with leaving it, but I think it might be a bit cleaner to have this take an int and have it parse the string as an int in the USFM parser.

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @pmachapman)


machine/punctuation_analysis/paratext_project_quote_convention_detector.py line 20 at r2 (raw file):

    def get_quote_convention_analysis(
        self, handler: Optional[QuoteConventionDetector] = None, include_chapters: Optional[Dict[int, List[int]]] = None

Throughout - why a list of ints and not a set?

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @pmachapman)


machine/scripture/__init__.py line 56 at r2 (raw file):

    "NULL_VERSIFICATION",
    "ORIGINAL_VERSIFICATION",
    "parse_selection",

I think this was intentionally not exported because it's a 'private' python method, as it were. get_chapters() is the public method, I think. Could you use it instead?

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - while porting this, I've noticed a few things.

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @pmachapman)

@pmachapman pmachapman force-pushed the quote_detector_filter_by_book branch from 6a848c8 to b3f9010 Compare October 6, 2025 18:38
Copy link
Collaborator Author

@pmachapman pmachapman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @Enkidu93)


machine/punctuation_analysis/paratext_project_quote_convention_detector.py line 20 at r2 (raw file):

Throughout - why a list of ints and not a set?

I used the same return type as get_chapters.


machine/punctuation_analysis/text_segment.py line 81 at r2 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

I'm OK with leaving it, but I think it might be a bit cleaner to have this take an int and have it parse the string as an int in the USFM parser.

Good idea - I have changed it.


machine/scripture/__init__.py line 56 at r2 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

I think this was intentionally not exported because it's a 'private' python method, as it were. get_chapters() is the public method, I think. Could you use it instead?

Done. I am not sure why I didn't see that.


tests/punctuation_analysis/test_usfm_structure_extractor.py line 17 at r2 (raw file):

Previously, benjaminking (Ben King) wrote…

Super minor thing, but it took me a long time to realize why this test would result in no chapters, specifically that book numbers were zero-indexed. A different book number might make it more clear why the test should be failing. But don't feel like you have to change the test.

I've changed it to Genesis and Exodus, as I the numbers are clearer for these books.

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Enkidu93 reviewed 5 of 5 files at r3, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @pmachapman)


machine/punctuation_analysis/paratext_project_quote_convention_detector.py line 20 at r2 (raw file):

Previously, pmachapman (Peter Chapman) wrote…

Throughout - why a list of ints and not a set?

I used the same return type as get_chapters.

OK, sounds good.


machine/punctuation_analysis/text_segment.py line 81 at r2 (raw file):

Previously, pmachapman (Peter Chapman) wrote…

Good idea - I have changed it.

Great! Thanks.


machine/scripture/__init__.py line 56 at r2 (raw file):

Previously, pmachapman (Peter Chapman) wrote…

Done. I am not sure why I didn't see that.

Great - no worries


machine/punctuation_analysis/paratext_project_quote_convention_detector.py line 23 at r3 (raw file):

    ) -> Optional[QuoteConventionAnalysis]:
        handler = QuoteConventionDetector() if handler is None else handler
        for book_id in get_scripture_books():

Sorry - this is my mistake in the initial implementation. This function ought to return all scripture books, but it's actually only returning OT and NT books (not the DCs).

Copy link
Collaborator Author

@pmachapman pmachapman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmachapman dismissed @Enkidu93 from a discussion.
Reviewable status: 7 of 8 files reviewed, 1 unresolved discussion (waiting on @benjaminking, @ddaspit, and @Enkidu93)


machine/punctuation_analysis/paratext_project_quote_convention_detector.py line 23 at r3 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Sorry - this is my mistake in the initial implementation. This function ought to return all scripture books, but it's actually only returning OT and NT books (not the DCs).

Done. Thank you for spotting!


machine/scripture/__init__.py line 56 at r2 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Great - no worries

Done.

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@Enkidu93 reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @pmachapman)


machine/punctuation_analysis/paratext_project_quote_convention_detector.py line 23 at r3 (raw file):

Previously, pmachapman (Peter Chapman) wrote…

Done. Thank you for spotting!

Great - thank you!

Copy link
Collaborator Author

@pmachapman pmachapman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93)


machine/punctuation_analysis/paratext_project_quote_convention_detector.py line 23 at r3 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Great - thank you!

Done.

Copy link
Collaborator Author

@pmachapman pmachapman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmachapman dismissed @Enkidu93 from a discussion.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @pmachapman)

Enkidu93 added a commit to sillsdev/machine that referenced this pull request Oct 6, 2025
@pmachapman pmachapman force-pushed the quote_detector_filter_by_book branch from 2279f60 to 6fa5689 Compare October 6, 2025 20:41
Copy link
Collaborator Author

@pmachapman pmachapman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmachapman reviewed all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @pmachapman)

@pmachapman pmachapman merged commit 11d3611 into main Oct 6, 2025
14 checks passed
@pmachapman pmachapman deleted the quote_detector_filter_by_book branch October 6, 2025 21:25
Enkidu93 added a commit to sillsdev/machine that referenced this pull request Oct 8, 2025
Enkidu93 added a commit to sillsdev/machine that referenced this pull request Oct 9, 2025
* Port sillsdev/machine.py#236

* Add an overload to take a chapter-by-id dict for convenience in Serval
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Pass a selection of books/chapters to ParatextProjectQuoteConventionDetector

6 participants