@basile-henry
Contributor

This PR adds a new iterator, UnicodeWordIndices (and the function unicode_word_indices). It is similar to UnicodeWords but also provides the byte offset of each word.

The motivation for this PR came from working on nushell/reedline#5, in which I used split_word_bound_indices and then filtered the result using logic that is internal to unicode_words. I believe that PR would have been trivial with unicode_word_indices. Hopefully it will be useful to others as well.
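For readers unfamiliar with the workaround described above, here is a minimal, std-only sketch of the idea: split a string into segments paired with byte offsets, then keep only the segments that contain at least one alphanumeric character. The helpers `crude_bound_indices` and `crude_word_indices` are hypothetical names invented for this illustration, and the splitting rule is a crude stand-in, not the UAX #29 word-boundary algorithm the crate implements. It is meant only to show the shape of the (offset, word) pairs involved, not the crate's actual code.

```rust
// Hypothetical, simplified stand-in for split_word_bound_indices: splits `s`
// into maximal runs of alphanumeric or non-alphanumeric characters, each
// paired with its starting byte offset.
// NOTE: this is NOT UAX #29 segmentation; it is an assumption for illustration.
fn crude_bound_indices(s: &str) -> Vec<(usize, &str)> {
    let mut out = Vec::new();
    let mut start = 0;
    let mut prev: Option<bool> = None;
    for (i, c) in s.char_indices() {
        let kind = c.is_alphanumeric();
        if let Some(p) = prev {
            if p != kind {
                // Character class changed: close the current segment.
                out.push((start, &s[start..i]));
                start = i;
            }
        }
        prev = Some(kind);
    }
    if start < s.len() {
        // Push the trailing segment.
        out.push((start, &s[start..]));
    }
    out
}

// Hypothetical helper mirroring the described workaround: drop segments with
// no alphanumeric character while keeping the byte offset of each survivor.
fn crude_word_indices(s: &str) -> Vec<(usize, &str)> {
    crude_bound_indices(s)
        .into_iter()
        .filter(|(_, seg)| seg.chars().any(char::is_alphanumeric))
        .collect()
}

fn main() {
    let text = "Hello, wörld!";
    for (offset, word) in crude_word_indices(text) {
        // Offsets are byte positions, so the two-byte 'ö' shifts what follows it.
        println!("{}: {}", offset, word);
    }
}
```

Because the offsets are byte positions into the original string, they can be used directly for slicing, which is what makes them convenient for cursor movement in a line editor.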

Should I add more tests for unicode_word_indices? Or are the existing tests for unicode_words and the doc test for unicode_word_indices sufficient?

The iterator UnicodeWordIndices is similar to UnicodeWords but also provides byte offsets for each word
@Manishearth Manishearth closed this Mar 7, 2021
@Manishearth Manishearth reopened this Mar 7, 2021
@Manishearth
Member

Retriggering GHA

@Manishearth Manishearth merged commit cea3ce6 into unicode-rs:master Mar 9, 2021
@basile-henry basile-henry deleted the basile/unicode-word-indices branch March 9, 2021 06:54
2 participants