Skip to content

Conversation

@jez
Copy link
Collaborator

@jez jez commented May 30, 2016

Relevant Issues

Fixes #37, #38, #84, #88, #99.

Summary of Changes

This commit builds upon the markdown-to-ast integration we added in #91,
using markdown-to-ast in lib/get-html-headers.js.

We first filter our lines to only look for HTML sections (using the
markdown parser), then in these HTML-only sections, we search for headers,
as long as they're not inside pre tags.

This is accomplished by keeping a stack of tags we've descended through. We
can peek at the top of the stack to see if we're directly under a
pre tag, meaning we should ignore everything until we see the closing
pre. Otherwise, we can collect text from inside heading tags as normal.

Thus, the core changes in this commit are:

  • grabbing is a stack now, instead of a string
  • markdown-to-ast lets us filter for only HTML blocks
  • htmlparser2 lets us check if we're inside a heading or pre tag

Test Plan

This pull request adds two passing tests:

  • one that makes sure we eliminate bad headings nested in pre tags, and
  • one that would fail a non-robust, regex-based search for code blocks,
    excluding headings that should have been included

Basically, the first checks for false positives, and the second checks
for false negatives.

Both of these tests should be fixed before we can consider thlorenz#37, thlorenz#38,
and thlorenz#88 as resolved.

Credit goes to @yomed for `test/fixtures/readme-with-code.md`.
@jez
Copy link
Collaborator Author

jez commented May 30, 2016

/cc @Anthropic

@jez jez force-pushed the fix/headers-in-codeblocks branch from 8321b9c to 8612165 Compare May 30, 2016 23:04
Fixes thlorenz#37, thlorenz#38, thlorenz#84, thlorenz#88, thlorenz#99.

This commit builds upon the markdown-to-ast integration we added in thlorenz#91,
using markdown-to-ast in `lib/get-html-headers.js`.

We first filter our lines to only look for HTML sections (using the
markdown parser), then in these HTML-only sections, we search for
headers, as long as they're not inside pre tags.

This is accomplished by keeping a stack of tags we've descended through.
We can peek at the top of the stack to see if we're directly under a
`pre` tag, meaning we should ignore everything until we see the closing
`pre`. Otherwise, we can collect text from inside heading tags as
normal.

Thus, the core changes in this commit are:

- `grabbing` is a stack now, instead of a string
- `markdown-to-ast` lets us filter for only HTML blocks
- `htmlparser2` lets us check if we're inside a heading *or* pre tag
@jez jez force-pushed the fix/headers-in-codeblocks branch from 8612165 to 5ae9a34 Compare May 30, 2016 23:17
@thlorenz
Copy link
Owner

thlorenz commented Jun 1, 2016

Awesome work - fixes so many issues :) !
LGTM just hoping that the markdown to ast part doesn't affect performance especially for large Readmes too much. Have you done a quick time ... test of this branch against master?

However correctness is preferred either way (just interested how big the hit is), so please merge to master and I'll publish a new version.

Thanks.

@jez
Copy link
Collaborator Author

jez commented Jun 1, 2016

LGTM just hoping that the markdown to ast part doesn't affect performance especially for large Readmes too much. Have you done a quick time ... test of this branch against master?

Good call. Not sure what the best way to test for regressions is, but it seems to still do fine on the test suite and on one ~600 line README from one of my other projects:

# origin/master
npm test &> /dev/null  1.55s user 0.20s system 104% cpu 1.661 total
doctoc.js test/fixtures/cmugpi.github.io-README.md  0.30s user 0.04s system 96% cpu 0.352 total

# jez/fiz/headers-in-codeblocks
npm test &> /dev/null  1.61s user 0.20s system 104% cpu 1.726 total
doctoc.js test/fixtures/cmugpi.github.io-README.md  0.42s user 0.05s system 102% cpu 0.453 total
npm test cmugpi
origin/master 1.661s 0.352s
jez/fix/headers-in-codeblocks 1.726 0.453s

The test suite includes a lot of small tests, as well as a couple a couple longer fixture files. The extra README I found is ~2.5x longer than any of the fixtures. We're not going up by more than 0.1s in either case. If we really cared about performance, we could probably markdown parse the file once (we're doing it twice right now).

@thlorenz
Copy link
Owner

thlorenz commented Jun 1, 2016

Thanks published as minor upgrade (since some behavior may have changed).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants