-
-
Notifications
You must be signed in to change notification settings - Fork 3
Allow passing a selection of books and chapters to the PT quote detector #236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
114792f to
8fdb273
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #236 +/- ##
==========================================
+ Coverage 90.91% 90.93% +0.02%
==========================================
Files 337 337
Lines 21642 21724 +82
==========================================
+ Hits 19675 19754 +79
- Misses 1967 1970 +3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
8fdb273 to
f82cdb4
Compare
ddaspit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ddaspit reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @benjaminking and @Enkidu93)
tests/punctuation_analysis/test_paratext_project_quote_convention_detector.py line 7 at r1 (raw file):
from machine.corpora import ParatextProjectSettings, UsfmStylesheet from machine.punctuation_analysis import ParatextProjectQuoteConventionDetector, QuoteConventionAnalysis from machine.punctuation_analysis.quote_convention import QuoteConvention
You should import these from the machine.punctuation_analysis package.
tests/punctuation_analysis/test_paratext_project_quote_convention_detector.py line 10 at r1 (raw file):
from machine.punctuation_analysis.standard_quote_conventions import STANDARD_QUOTE_CONVENTIONS from machine.scripture import ORIGINAL_VERSIFICATION, Versification from machine.scripture.parse import parse_selection
You should import this from the machine.scripture package.
benjaminking
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably be good to update the unit tests for UsfmStructureExtractor to test the include_chapters parameter.
@benjaminking reviewed all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93 and @pmachapman)
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Enkidu93 reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @pmachapman)
tests/punctuation_analysis/test_paratext_project_quote_convention_detector.py line 64 at r1 (raw file):
assert analysis is not None assert analysis.best_quote_convention_score > 0.66 assert analysis.best_quote_convention.name == "standard_french"
Maybe extend this test to confirm that the quote convention is indeterminate with all chapters or a different mix?
f82cdb4 to
6a848c8
Compare
pmachapman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably be good to update the unit tests for UsfmStructureExtractor to test the
include_chaptersparameter.
@benjaminking Good idea. I've added two tests to cover what I think is the scope of the changes. Please let me know if you think there should be more.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @ddaspit and @Enkidu93)
tests/punctuation_analysis/test_paratext_project_quote_convention_detector.py line 7 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
You should import these from the
machine.punctuation_analysispackage.
Done. Thanks!
tests/punctuation_analysis/test_paratext_project_quote_convention_detector.py line 10 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
You should import this from the
machine.scripturepackage.
Done. I had to modify scripture/__init__.py to allow this.
tests/punctuation_analysis/test_paratext_project_quote_convention_detector.py line 64 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Maybe extend this test to confirm that the quote convention is indeterminate with all chapters or a different mix?
Done. Good idea.
benjaminking
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benjaminking reviewed 3 of 3 files at r2, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @ddaspit and @Enkidu93)
tests/punctuation_analysis/test_usfm_structure_extractor.py line 17 at r2 (raw file):
usfm_structure_extractor.text(verse_text_parser_state, "test") actual_chapters = usfm_structure_extractor.get_chapters({41: [1]})
Super minor thing, but it took me a long time to realize why this test would result in no chapters, specifically that book numbers were zero-indexed. A different book number might make it more clear why the test should be failing. But don't feel like you have to change the test.
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Enkidu93 reviewed 3 of 3 files at r2, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ddaspit)
ddaspit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ddaspit reviewed 3 of 3 files at r2, all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @pmachapman)
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @pmachapman)
machine/punctuation_analysis/text_segment.py line 81 at r2 (raw file):
def set_chapter(self, number: str) -> "TextSegment.Builder": self._text_segment.chapter = int(number)
I'm OK with leaving it, but I think it might be a bit cleaner to have this take an int and have it parse the string as an int in the USFM parser.
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @pmachapman)
machine/punctuation_analysis/paratext_project_quote_convention_detector.py line 20 at r2 (raw file):
def get_quote_convention_analysis( self, handler: Optional[QuoteConventionDetector] = None, include_chapters: Optional[Dict[int, List[int]]] = None
Throughout - why a list of ints and not a set?
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @pmachapman)
machine/scripture/__init__.py line 56 at r2 (raw file):
"NULL_VERSIFICATION", "ORIGINAL_VERSIFICATION", "parse_selection",
I think this was intentionally not exported because it's a 'private' python method, as it were. get_chapters() is the public method, I think. Could you use it instead?
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry - while porting this, I've noticed a few things.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @pmachapman)
6a848c8 to
b3f9010
Compare
pmachapman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @Enkidu93)
machine/punctuation_analysis/paratext_project_quote_convention_detector.py line 20 at r2 (raw file):
Throughout - why a list of ints and not a set?
I used the same return type as get_chapters.
machine/punctuation_analysis/text_segment.py line 81 at r2 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
I'm OK with leaving it, but I think it might be a bit cleaner to have this take an int and have it parse the string as an int in the USFM parser.
Good idea - I have changed it.
machine/scripture/__init__.py line 56 at r2 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
I think this was intentionally not exported because it's a 'private' python method, as it were.
get_chapters()is the public method, I think. Could you use it instead?
Done. I am not sure why I didn't see that.
tests/punctuation_analysis/test_usfm_structure_extractor.py line 17 at r2 (raw file):
Previously, benjaminking (Ben King) wrote…
Super minor thing, but it took me a long time to realize why this test would result in no chapters, specifically that book numbers were zero-indexed. A different book number might make it more clear why the test should be failing. But don't feel like you have to change the test.
I've changed it to Genesis and Exodus, as I the numbers are clearer for these books.
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Enkidu93 reviewed 5 of 5 files at r3, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @pmachapman)
machine/punctuation_analysis/paratext_project_quote_convention_detector.py line 20 at r2 (raw file):
Previously, pmachapman (Peter Chapman) wrote…
Throughout - why a list of ints and not a set?
I used the same return type as
get_chapters.
OK, sounds good.
machine/punctuation_analysis/text_segment.py line 81 at r2 (raw file):
Previously, pmachapman (Peter Chapman) wrote…
Good idea - I have changed it.
Great! Thanks.
machine/scripture/__init__.py line 56 at r2 (raw file):
Previously, pmachapman (Peter Chapman) wrote…
Done. I am not sure why I didn't see that.
Great - no worries
machine/punctuation_analysis/paratext_project_quote_convention_detector.py line 23 at r3 (raw file):
) -> Optional[QuoteConventionAnalysis]: handler = QuoteConventionDetector() if handler is None else handler for book_id in get_scripture_books():
Sorry - this is my mistake in the initial implementation. This function ought to return all scripture books, but it's actually only returning OT and NT books (not the DCs).
pmachapman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pmachapman dismissed @Enkidu93 from a discussion.
Reviewable status: 7 of 8 files reviewed, 1 unresolved discussion (waiting on @benjaminking, @ddaspit, and @Enkidu93)
machine/punctuation_analysis/paratext_project_quote_convention_detector.py line 23 at r3 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Sorry - this is my mistake in the initial implementation. This function ought to return all scripture books, but it's actually only returning OT and NT books (not the DCs).
Done. Thank you for spotting!
machine/scripture/__init__.py line 56 at r2 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Great - no worries
Done.
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Enkidu93 reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @pmachapman)
machine/punctuation_analysis/paratext_project_quote_convention_detector.py line 23 at r3 (raw file):
Previously, pmachapman (Peter Chapman) wrote…
Done. Thank you for spotting!
Great - thank you!
pmachapman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93)
machine/punctuation_analysis/paratext_project_quote_convention_detector.py line 23 at r3 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Great - thank you!
Done.
pmachapman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pmachapman dismissed @Enkidu93 from a discussion.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @pmachapman)
2279f60 to
6fa5689
Compare
pmachapman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pmachapman reviewed all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @pmachapman)
* Port sillsdev/machine.py#236 * Add an overload to take a chapter-by-id dict for convenience in Serval
Fixes #229.
This change is