Skip to content

Fix unescaped "<" in MathML in PDF #18520

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

NSoiffer
Copy link
Contributor

@NSoiffer NSoiffer commented Jul 21, 2025

Link to issue number:

Fixes #18511

Summary of the issue:

If the structure tree is used for MathML in PDF, calls to getValue return the interpreted &gt; so that it becomes < and we have an HTML syntax error.

Description of user facing changes:

The bug is fixed.

Description of developer facing changes:

None.

Description of development approach:

Called html.escape() for the result of the getValue (when it was not "none")

Testing strategy:

I used the pdf doc that was part of the issue and tested to make sure I saw the error before the fix and that there was no error after the fix.b

Known issues with pull request:

Code Review Checklist:

  • Documentation:
    • Change log entry
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English
  • x] API is compatible with existing add-ons.
  • Security precautions taken.

@coderabbitai summary

NSoiffer added 4 commits July 19, 2025 14:32
…ce`. It isn't used in speech (although maybe it should trigger a pause if wide), but it is used in some braille notations as a signal that this is a "fill in the blank" space.

I also added the elementary math attributes used in MathPlayer. Neither MathCAT nor Access8Math currently support the elementary math notations, but it is on the list of things to implement for MathCAT.

Note: potentially this could go into the beta, but I can't get the beta to build on my machine so I can't test the fix there.
@NSoiffer NSoiffer requested a review from a team as a code owner July 21, 2025 04:57
@NSoiffer NSoiffer requested a review from seanbudd July 21, 2025 04:57
@seanbudd
Copy link
Member

This seems to have code changes from #18508

Could you clarify if that PR should be closed in favour of this one? There will be merge conflicts when we squash merge these.

@seanbudd seanbudd changed the title Bug 18511 -- unescaped "<" in MathML in PDF Fix unescaped "<" in MathML in PDF Jul 22, 2025
@seanbudd seanbudd added the conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review. label Jul 22, 2025
@NSoiffer
Copy link
Contributor Author

My apologies. I think I forgot to checkout master before creating the new branch. They are separate bugs, but both deal with PDF math and are in proximity to each other. Hence, they are slightly related. Please do whatever is simplest.

Comment on lines +16 to 18
* Fixed '<' not being escaped in MathML in PDF documents. Most likely this will happen in `mo` elements.
* Fixed missing MathML attributes for `mspace` in PDF documents. Also passed along attributes for elementary math elements.
* Fixed support for paragraph mouse text unit in Java applications. (#18231, @hwf1324)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add issue/PR numbers and attribution if you wish. Also put new sentences on new lines

@seanbudd
Copy link
Member

@NSoiffer - it would probably be better to remove the duplication to make it easier to track and revert the separate changes

@seanbudd seanbudd marked this pull request as draft July 31, 2025 05:55
@seanbudd seanbudd added this to the 2025.3 milestone Jul 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Text content is not correctly linearised as XML for MathML reading
2 participants