🚨🚨 🚨🚨 [Tokenizer] attemp to fix add_token issues 🚨🚨 🚨🚨
#23909
Merged
Commits
268 commits
4aea714
fix test for bart. Order is correct now let's skip BPEs
ArthurZucker 0df212f
ouf
ArthurZucker c04bd58
styling
ArthurZucker 4a54920
fix bert....
ArthurZucker 189b259
slow refactoring
ArthurZucker f1c362a
current updates
ArthurZucker f5b178a
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker 0047eb9
massive refactoring
ArthurZucker 0e34205
update
ArthurZucker 81e73d7
NICE!
ArthurZucker 1f3ff93
update to see where I am at
ArthurZucker 3701e26
updates
ArthurZucker 0369a51
update
ArthurZucker b621c96
update
ArthurZucker 83feef4
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker 2e7b733
revert
ArthurZucker 582bd26
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker 4c78792
updates
ArthurZucker 25b9836
updates
ArthurZucker 7759304
start supporting legacy_save
ArthurZucker 53b29e5
styling
ArthurZucker a7ab994
big update
ArthurZucker 6fb6b31
revert some changes
ArthurZucker 9b2e138
nits
ArthurZucker dffd6d8
nniiiiiice
ArthurZucker b86bb3a
small fixes
ArthurZucker b9bb598
kinda fix t5 with new behaviour
ArthurZucker 55ea2f5
major update
ArthurZucker ff13cb1
fixup
ArthurZucker 5b46dc9
fix copies
ArthurZucker 70252e5
today's updates
ArthurZucker d76b414
fix byt5
ArthurZucker 496679b
upfate
ArthurZucker 521907d
update
ArthurZucker 1d4e947
update
ArthurZucker 6ae5e51
updates
ArthurZucker c42800e
update vocab size test
ArthurZucker 754b219
Barthez does not use not need the fairseq offset ids
ArthurZucker 80ff47a
super calll must be after
ArthurZucker 14726cc
calll super
ArthurZucker 1f24347
move all super init
ArthurZucker d2ed9c6
move other super init
ArthurZucker 74b22e1
fixup
ArthurZucker 162e61d
nits
ArthurZucker 1643432
more fixes
ArthurZucker 5db9604
nits
ArthurZucker 957cb54
more fixes
ArthurZucker 29e32f6
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker f0de438
nits
ArthurZucker d31acd2
more fix
ArthurZucker a6f0094
remove useless files
ArthurZucker 38b8942
ouch all of them are affected
ArthurZucker 8096378
and more!
ArthurZucker 2d55c90
small imporvements
ArthurZucker 31ac52d
no more sanitize token
ArthurZucker da42a86
more changes around unique no split tokens
ArthurZucker e34c6ea
partially fix more things
ArthurZucker becb4a5
keep legacy save but add warning
ArthurZucker 2ddcc28
so... more fixes
ArthurZucker 7bd0b15
updates
ArthurZucker 365f815
guess deberta tokenizer could be nuked
ArthurZucker 5ff1e04
fixup
ArthurZucker 9284bbf
fixup did some bad things
ArthurZucker dbdafeb
nuke it if it breaks
ArthurZucker 83310c9
remove prints and pretrain fast from slow with new format.
ArthurZucker 9a461f5
fixups
ArthurZucker 015e796
Apply suggestions from code review
ArthurZucker 6e17f4e
fiou
ArthurZucker 05e1c56
Merge branch 'fix-add-tokens' of https://github.com/ArthurZucker/tran…
ArthurZucker 181c112
nit
ArthurZucker e9887e8
by default specials should not be normalized?
ArthurZucker df035b7
update
ArthurZucker d7a2458
remove brakpoint
ArthurZucker f5c8f2c
updates
ArthurZucker 64237fc
a lot of updates
ArthurZucker fc01d58
fixup
ArthurZucker f101206
fixes revert some changes to match fast
ArthurZucker 4834e64
small nits
ArthurZucker 1acbe48
that makes it cleaner
ArthurZucker 4c9a61e
fix camembert accordingly
ArthurZucker 62b98b0
update
ArthurZucker 62cedf3
some lest breaking changes
ArthurZucker 395259a
update
ArthurZucker f4b3c85
fixup
ArthurZucker e6c7b28
fix byt5 and whisper mostly
ArthurZucker 8626851
some more fixes, canine's byte vocab
ArthurZucker 0d4e360
fix gpt2
ArthurZucker d639ea6
fix most of the perceiver tests (4 left)
ArthurZucker 54a7ac8
fix layout lmv3
ArthurZucker e2b04c5
fixup
ArthurZucker 70ca39b
fix copies for gpt2 style
ArthurZucker cd248db
make sure to only warn once
ArthurZucker 02185cf
fix perciever and gpt2 tests
ArthurZucker 7105eea
some more backward compatibility: also read special tokens map becaus…
ArthurZucker 8f5d96b
fixup
ArthurZucker 9f4f1e4
add else when reading
ArthurZucker f0c41a1
nits
ArthurZucker 2812c70
fresh updates
ArthurZucker 984a307
fix copies
ArthurZucker 2fc2349
will this make everything faster?
ArthurZucker 3861d1d
fixes
ArthurZucker 0e20614
more fixes
ArthurZucker 6dbb814
update
ArthurZucker 40270d8
more fixes
ArthurZucker 7b92445
fixup
ArthurZucker 7b9d373
is the source of truth right?
ArthurZucker 6631d42
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker 58abfe7
sorry camembert for the troubles
ArthurZucker fd4d7f8
current updates
ArthurZucker 8e1225f
fixup
ArthurZucker 98c869f
update led
ArthurZucker da0414c
update
ArthurZucker b1b2779
fix regression
ArthurZucker 161e09b
fix single word
ArthurZucker 0ca9c8a
more model specific fixes
ArthurZucker 87bc36d
fix t5 tests
ArthurZucker 63dce31
fixup
ArthurZucker 6c6f70a
more comments
ArthurZucker f5252a1
update
ArthurZucker c7c8ebe
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker 38a013f
fix nllb
ArthurZucker 3026e49
rstrip removed
ArthurZucker d3c9024
small fixes
ArthurZucker 5242be7
better handle additional_special_tokens and vocab sizes
ArthurZucker 769e215
fixing
ArthurZucker bf2cb8a
styling
ArthurZucker d840605
fix 4 / 21
ArthurZucker 3bf7591
fixup
ArthurZucker 40d55f7
fix nlbb's tests
ArthurZucker 0b65a21
some fixes
ArthurZucker 7a706a6
fix t5
ArthurZucker 53d4b13
fixes
ArthurZucker 448cfd4
style
ArthurZucker f521b60
fix canine tests
ArthurZucker 384ea05
damn this is nice
ArthurZucker 5c05f4a
nits
ArthurZucker 1b1aad0
m2m100 nit
ArthurZucker 6961910
fixups
ArthurZucker 969d46e
fixes!
ArthurZucker d957860
fixup
ArthurZucker c9732fb
stash
ArthurZucker dd99191
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker 463cbb4
fix merge
ArthurZucker 965bca1
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker 3a79633
revert bad change
ArthurZucker bf7ba15
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker 717b4b9
fixup
ArthurZucker 3d10fb0
correct order for code Llama
ArthurZucker 78d4215
fix speecht5 post merge
ArthurZucker 3605c75
styling
ArthurZucker abfe019
revert source of 11 fails
ArthurZucker 677fc8e
small nits
ArthurZucker 548e062
all changes in one go
ArthurZucker 07e4d1e
fnet hack
ArthurZucker b871e7e
fix 2 more tests
ArthurZucker 1520d14
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker 136c877
update based on main branch of tokenizers
ArthurZucker b6a6ec7
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker a304a3c
fixup
ArthurZucker a9f47e2
fix VITS issues
ArthurZucker 47387ec
more fixes
ArthurZucker c8acd2c
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker 0c584d7
fix mgp test
ArthurZucker de0df2f
fix camembert issues
ArthurZucker fa2c424
oups camembert still has 2 failing tests
ArthurZucker 4498ad8
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker d4de85c
mluke fixes
ArthurZucker 6df2fe3
decode fixes
ArthurZucker 4167a74
small nits
ArthurZucker d3e1bf1
nits
ArthurZucker 0d65b83
fix llama and vits
ArthurZucker a52bb94
fix camembert
ArthurZucker bf58e4e
smal nits
ArthurZucker 25adcb8
more fixes when initialising a fast from a slow and etc
ArthurZucker fc799f1
fix one of the last test
ArthurZucker 205a6ed
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker ecbee31
fix CPM tokenizer test
ArthurZucker 6170200
fixups
ArthurZucker e384d13
fix pop2piano
ArthurZucker b69895c
fixup
ArthurZucker c0c77a5
⚠️ Change tokenizers required version ⚠️
ArthurZucker 957329f
⚠️ Change tokenizers required version ⚠️
ArthurZucker ae48fb2
"tokenizers>=0.14,<0.15", don't forget smaller than
ArthurZucker 8e22a05
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker 2b69acb
fix musicgen tests and pretraiendtokenizerfast
ArthurZucker abefa05
fix owlvit and all
ArthurZucker 52c2f56
update t5
ArthurZucker ddb4a2b
fix 800 red
ArthurZucker 6029ee3
fix tests
ArthurZucker 3bebdc1
fix the fix of the fix of t5
ArthurZucker b4aa11f
styling
ArthurZucker 0f7b941
documentation nits
ArthurZucker ca33d35
Skip failing whisper test by Merge branch 'main' of https://github.co…
ArthurZucker 9ae3380
cache _added_tokens_encoder
ArthurZucker f7d9a6f
fixups
ArthurZucker 5136647
Nit
ArthurZucker ca4835c
fix red tests
ArthurZucker c03e549
one last nit!
ArthurZucker 8569508
make eveything a lot simpler
ArthurZucker a5dcdab
Now it's over :wink:
ArthurZucker f8638b3
few small nits
ArthurZucker 68b5f54
Apply suggestions from code review
ArthurZucker de40bf2
updates that work for now
ArthurZucker 9c6e48f
Merge branch 'fix-add-tokens' of https://github.com/arthurzucker/tran…
ArthurZucker 0c661ec
tests that should no be skipped / changed and fixed next
ArthurZucker 3bbe7f0
fixup
ArthurZucker 9b22a80
i am ashamed
ArthurZucker f75d9bd
pushe the fix
ArthurZucker 23d08be
update
ArthurZucker b68d937
fixups
ArthurZucker 39ec344
nits
ArthurZucker bdb36a1
fix added_tokens_encoder
ArthurZucker 60f7b70
fix canine test
ArthurZucker 8ef282f
fix pegasus vocab
ArthurZucker 3b82eac
fix transfoXL
ArthurZucker 719ce61
fixup
ArthurZucker 3555fc6
whisper needs to be fixed for train new
ArthurZucker f206fd1
pegasus nits
ArthurZucker cf42578
more pegasus fixes
ArthurZucker 03de2ec
minor update
ArthurZucker a21d611
better error message in failed test
ArthurZucker 604a677
fix whisper failing test
ArthurZucker f062446
fix whisper failing test
ArthurZucker d2367f1
fix pegasus
ArthurZucker 04de249
fixup
ArthurZucker d1ccd62
fix **** pegasus
ArthurZucker ea83357
reset things
ArthurZucker cfd2e82
remove another file
ArthurZucker 650d403
attempts to fix the strange custome encoder and offset
ArthurZucker 6896a75
nits here and there
ArthurZucker d73fac1
update
ArthurZucker 55fa70c
fixup
ArthurZucker a8cd1f0
nit
ArthurZucker a95460c
fix the whisper test
ArthurZucker 5292ea6
nits nits
ArthurZucker 6e6a0d6
Apply suggestions from code review
ArthurZucker d519be5
updates based on review
ArthurZucker 589be23
some small update to potentially remove
ArthurZucker d20e8d7
Merge branch 'fix-add-tokens' of github.com:ArthurZucker/transformers…
ArthurZucker 1f007bf
nits
ArthurZucker 459b082
Merge branch 'main' of github.com:huggingface/transformers into fix-a…
ArthurZucker 1155d3c
import rlu cache
ArthurZucker 8516d1c
Update src/transformers/tokenization_utils_base.py
ArthurZucker db19f2b
move warning to `from_pretrained`
ArthurZucker 70f303b
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker 51eda03
update tests results now that the special tokens are always added
ArthurZucker 24ac840
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker 04bef33
Merge branch 'main' of github.com:huggingface/transformers into fix-a…
ArthurZucker 7b86b04
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker 9424fdf
Merge branch 'fix-add-tokens' of https://github.com/ArthurZucker/tran…
ArthurZucker
Original file line number | Diff line number | Diff line change |
---|---|---|
| | | `@@ -166,4 +166,4 @@ tags` |
| | | `.DS_Store` |
| | | |
| | | `# ruff` |
| | | `-.ruff_cache` |
| | | `+.ruff_cache` |