AutoImageProcessor #20111
sgugger
left a comment
Thanks for working on this. To the list of minimal things, I'd add the doc page for image processors and the test file for AutoImageProcessor.
docs/source/en/_toctree.yml
Outdated
You'll need to write this doc ;-)
Should probably remove the one for feature extractor while we're at it.
I think we'll still need the feature extractor one for the audio models.
2018283 to
b0d1f98
Compare
Renaming to match the naming pattern of FeatureExtractionMixin
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
This was removed because it repeats the code in lines 72-78.
sgugger
left a comment
Great work! Just have two comments.
This block should go before trying to get the config (so at line 287).
Moved 👍
Would be cool to also check that it works when there is code for a dynamic feature extractor (using "hf-internal-testing/test_dynamic_feature_extractor").
I don't think we can use hf-internal-testing/test_dynamic_feature_extractor directly, as it's for wav2vec2. I could add an equivalent vision feature extractor under e.g. hf-internal-testing/test_dynamic_feature_extractor_vision ?
Ah yes, good point. Adding a new repo sounds fine!
This is a bit tricky because of the renaming that happens in AutoImageProcessor.from_pretrained from FeatureExtractor -> ImageProcessor in this line.
It tries to import NewImageProcessor which doesn't exist in https://huggingface.co/hf-internal-testing/test_dynamic_feature_extractor_vision/blob/main/feature_extractor.py
So either:
- https://huggingface.co/hf-internal-testing/test_dynamic_feature_extractor_vision/blob/main/feature_extractor.py has class NewImageProcessor(CLIPFeatureExtractor)
- There's some logic which doesn't rename if the image processor isn't in IMAGE_PROCESSOR_MAPPING_NAMES, although this leads to loading a feature extractor, which seems like unwanted behaviour.
What do you think the intended behaviour should be?
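A minimal sketch of the second option, assuming a plain set as a stand-in for IMAGE_PROCESSOR_MAPPING_NAMES. The names `KNOWN_IMAGE_PROCESSORS` and `resolve_class_name` are hypothetical and this is not the library's actual code. It also shows the drawback mentioned above: a custom class outside the mapping keeps its feature extractor name.

```python
# Hypothetical stand-in for IMAGE_PROCESSOR_MAPPING_NAMES; the entries are
# class names that appear elsewhere in this PR.
KNOWN_IMAGE_PROCESSORS = {
    "CLIPImageProcessor",
    "ViTImageProcessor",
    "ConvNextImageProcessor",
    "VideoMAEImageProcessor",
}


def resolve_class_name(feature_extractor_type, known=KNOWN_IMAGE_PROCESSORS):
    """Return the class name to load for a config's feature_extractor_type."""
    candidate = feature_extractor_type.replace("FeatureExtractor", "ImageProcessor")
    # Only rename when the result is a known image processor.
    if candidate in known:
        return candidate
    # Otherwise keep the original name, which means a feature extractor is loaded.
    return feature_extractor_type


renamed = resolve_class_name("CLIPFeatureExtractor")
kept = resolve_class_name("NewFeatureExtractor")
```

Here `renamed` is the image processor name, while `kept` stays a feature extractor name, which is the behaviour the comment flags as possibly unwanted.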
Or we can just ignore custom code for feature extractors (I don't think there are any out for now). It shouldn't slow down the PR as it is an edge case we can deal with later!
Still reviewing, but in around this, there is no
1ebc7bb to
015c77e
Compare
        config = AutoImageProcessor.from_pretrained(tmpdirname)
        self.assertIsInstance(config, CLIPImageProcessor)

    def test_image_processor_from_local_directory_from_feature_extractor_key(self):
Great!
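For illustration, a standard-library sketch of the situation a test like this exercises: a saved config that only carries the legacy feature_extractor_type key. The helper `read_processor_type`, the lookup order, and the rename are assumptions made for the example, not the actual from_pretrained implementation.

```python
import json
import tempfile
from pathlib import Path


def read_processor_type(directory):
    """Return the image processor class name recorded in a saved config."""
    config_path = Path(directory) / "preprocessor_config.json"
    config = json.loads(config_path.read_text())
    # Assumed order: prefer the new key, then fall back to the legacy key
    # and rename it from FeatureExtractor to ImageProcessor.
    name = config.get("image_processor_type")
    if name is None and "feature_extractor_type" in config:
        name = config["feature_extractor_type"].replace(
            "FeatureExtractor", "ImageProcessor"
        )
    return name


with tempfile.TemporaryDirectory() as tmpdirname:
    # A config written before image processors existed has only the legacy key.
    legacy = {"feature_extractor_type": "CLIPFeatureExtractor"}
    (Path(tmpdirname) / "preprocessor_config.json").write_text(json.dumps(legacy))
    resolved = read_processor_type(tmpdirname)
```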
ydshieh
left a comment
Thank you @amyeroberts for this great work!
LGTM overall, but I have a small doubt
#20111 (comment)
@ydshieh Yes, you're right. I've added a check now here. Can you confirm if this matches with what you think should have been added?
Yes!
One comment (no need to be done in this PR): I think it would be great if we can remove the

    from transformers import CLIPModel, AutoProcessor, CLIPProcessor, CLIPImageProcessor, CLIPFeatureExtractor, AutoImageProcessor
    p = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")
    print(p.feature_extractor_type)
    p.save_pretrained("temp-clip")

gives CLIPFeatureExtractor on the terminal, and in the output file

    "feature_extractor_type": "CLIPFeatureExtractor",
    "image_processor_type": "CLIPImageProcessor",
    ("swin", "ViTImageProcessor"),
    ("swinv2", "ViTImageProcessor"),
    ("van", "ConvNextImageProcessor"),
    ("videomae", "VideoMAEImageProcessor"),
As far as I know, videomae and xclip are video models, and video feature extractor classes are different from image feature extractor classes. Would it be better to separate them as VideoProcessors, or should they stay as they are? @NielsRogge
* AutoImageProcessor skeleton
* Update references
* Add mapping in init
* Add model image processors to __init__ for importing
* Add AutoImageProcessor tests
* Fix up
* Image Processor documentation
* Remove pdb
* Update docs/source/en/model_doc/mobilevit.mdx
* Update docs
* Don't add whitespace on json files
* Remove fixtures
* Move checking model config down
* Fix up
* Add check for image processor
* Remove FeatureExtractorMixin in docstrings
* Rename model_tmpfile to config_tmpfile
* Don't make None if not in image processor map
What does this PR do?
Adds the AutoImageProcessor class and makes model image processors available to import.
Fixes # (issue)