Add unicode_word_indices #91

basile-henry · 2021-03-07T18:31:34Z

This PR adds a new iterator, UnicodeWordIndices, and the function unicode_word_indices that returns it. It is similar to UnicodeWords but also yields the byte offset of each word.

The motivation for this PR came from working on nushell/reedline#5, in which I used split_word_bound_indices and then filtered the result using logic that is internal to unicode_words. I believe that PR would have been trivial with unicode_word_indices. Hopefully it is useful to others too.
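To illustrate the workaround described above, here is a minimal std-only sketch of the "segment with byte offsets, then filter" idea. It does not use the unicode-segmentation crate or implement UAX #29 word boundaries: `segment_indices` and `word_indices` are hypothetical helpers, the boundary rule (runs of alphanumeric versus non-alphanumeric characters) is a deliberate simplification, and the "contains an alphanumeric character" filter is an assumption about what the internal logic of unicode_words looks like.

```rust
/// Hypothetical stand-in for `split_word_bound_indices`: splits `s` into
/// maximal runs of alphanumeric or non-alphanumeric characters and pairs
/// each run with its starting byte offset. This is a simplification, not
/// the UAX #29 algorithm the crate implements.
fn segment_indices(s: &str) -> Vec<(usize, &str)> {
    let mut out = Vec::new();
    let mut start = 0;
    let mut prev: Option<bool> = None;
    for (i, c) in s.char_indices() {
        let kind = c.is_alphanumeric();
        if let Some(p) = prev {
            if p != kind {
                // The character class changed, so close the current run.
                out.push((start, &s[start..i]));
                start = i;
            }
        }
        prev = Some(kind);
    }
    if prev.is_some() {
        // Emit the final run (skipped only for an empty input).
        out.push((start, &s[start..]));
    }
    out
}

/// Hypothetical stand-in for `unicode_word_indices`: keeps only the
/// segments that contain at least one alphanumeric character (assumed
/// filter), preserving their byte offsets.
fn word_indices(s: &str) -> Vec<(usize, &str)> {
    segment_indices(s)
        .into_iter()
        .filter(|(_, seg)| seg.chars().any(char::is_alphanumeric))
        .collect()
}
```

Calling `word_indices("Héllo, wörld!")` on this sketch yields `(0, "Héllo")` and `(8, "wörld")`; the second offset is 8 rather than 7 because `é` occupies two bytes in UTF-8, which is why byte offsets, not character counts, are what a line editor needs for slicing.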

Should I add more tests for unicode_word_indices? Or are the existing tests for unicode_words and the doc test for unicode_word_indices sufficient?

Manishearth · 2021-03-07T23:02:31Z

Retriggering GHA

Add unicode_word_indices

8bd6e3a

The iterator UnicodeWordIndices is similar to UnicodeWords but also provides byte offsets for each word

Manishearth closed this Mar 7, 2021

Manishearth reopened this Mar 7, 2021

Manishearth approved these changes Mar 7, 2021

Manishearth merged commit cea3ce6 into unicode-rs:master Mar 9, 2021

basile-henry deleted the basile/unicode-word-indices branch March 9, 2021 06:54
