-
-
Notifications
You must be signed in to change notification settings - Fork 486
Use markdown + html parsers to detect codeblocks #107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Both of these tests should be fixed before we can consider thlorenz#37, thlorenz#38, and thlorenz#88 as resolved. Credit goes to @yomed for `test/fixtures/readme-with-code.md`.
|
/cc @Anthropic |
8321b9c to
8612165
Compare
Fixes thlorenz#37, thlorenz#38, thlorenz#84, thlorenz#88, thlorenz#99. This commit builds upon the markdown-to-ast integration we added in thlorenz#91, using markdown-to-ast in `lib/get-html-headers.js`. We first filter our lines to only look for HTML sections (using the markdown parser), then in these HTML-only sections, we search for headers, as long as they're not inside pre tags. This is accomplished by keeping a stack of tags we've descended through. We can peek at the top of the stack to see if we're directly under a `pre` tag, meaning we should ignore everything until we see the closing `pre`. Otherwise, we can collect text from inside heading tags as normal. Thus, the core changes in this commit are: - `grabbing` is a stack now, instead of a string - `markdown-to-ast` lets us filter for only HTML blocks - `htmlparser2` lets us check if we're inside a heading *or* pre tag
8612165 to
5ae9a34
Compare
|
Awesome work - fixes so many issues :) ! However correctness is preferred either way (just interested how big the hit is), so please merge to master and I'll publish a new version. Thanks. |
Good call. Not sure what the best way to test for regressions is, but it seems to still do fine on the test suite and on one ~600 line README from one of my other projects: # origin/master
npm test &> /dev/null 1.55s user 0.20s system 104% cpu 1.661 total
doctoc.js test/fixtures/cmugpi.github.io-README.md 0.30s user 0.04s system 96% cpu 0.352 total
# jez/fiz/headers-in-codeblocks
npm test &> /dev/null 1.61s user 0.20s system 104% cpu 1.726 total
doctoc.js test/fixtures/cmugpi.github.io-README.md 0.42s user 0.05s system 102% cpu 0.453 total
The test suite includes a lot of small tests, as well as a couple a couple longer fixture files. The extra README I found is ~2.5x longer than any of the fixtures. We're not going up by more than 0.1s in either case. If we really cared about performance, we could probably markdown parse the file once (we're doing it twice right now). |
|
Thanks published as minor upgrade (since some behavior may have changed). |
Relevant Issues
Fixes #37, #38, #84, #88, #99.
Summary of Changes
This commit builds upon the markdown-to-ast integration we added in #91,
using markdown-to-ast in
lib/get-html-headers.js.We first filter our lines to only look for HTML sections (using the
markdown parser), then in these HTML-only sections, we search for headers,
as long as they're not inside pre tags.
This is accomplished by keeping a stack of tags we've descended through. We
can peek at the top of the stack to see if we're directly under a
pretag, meaning we should ignore everything until we see the closingpre. Otherwise, we can collect text from inside heading tags as normal.Thus, the core changes in this commit are:
grabbingis a stack now, instead of a stringmarkdown-to-astlets us filter for only HTML blockshtmlparser2lets us check if we're inside a heading or pre tagTest Plan
This pull request adds two passing tests:
excluding headings that should have been included
Basically, the first checks for false positives, and the second checks
for false negatives.