Skip to content

Conversation

@amyeroberts
Copy link
Contributor

@amyeroberts amyeroberts commented Nov 7, 2022

What does this PR do?

Adds the AutoImageProcessor class and makes model image processors available to import.

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. On the list of minimal things, I'd add the doc page for iamge processors and test file of AutoImageProcessor.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need to write this doc ;-)
Should probably remove the one for feature extractor while we're at it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll still need the feature extractor one for the audio models.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renaming to match the naming pattern of FeatureExtractionMixin

@amyeroberts amyeroberts marked this pull request as ready for review November 8, 2022 12:47
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was removed as it's repeat code from lines 72-78

@amyeroberts amyeroberts requested a review from ydshieh November 8, 2022 15:42
Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Just have two comments.

Comment on lines +296 to +289
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block should go before trying to get the config (so at line 287).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved 👍

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be cool to also check it works if there is code for a dynamic feature extractor (using "hf-internal-testing/test_dynamic_feature_extractor"),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can use hf-internal-testing/test_dynamic_feature_extractor directly, as it's for wav2vec2. I could add an equivalent vision feature extractor under e.g. hf-internal-testing/test_dynamic_feature_extractor_vision ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, good point. Adding a new repo sounds fine!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit tricky because of the renaming that happens in AutoImageProcessor.from_pretrained from FeatureExtractor -> ImageProcessor in this line.

It tries to import NewImageProcessor which doesn't exist in https://huggingface.co/hf-internal-testing/test_dynamic_feature_extractor_vision/blob/main/feature_extractor.py

So either:

What do you think the intended behaviour should be?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we can just ignore custom code for feature extractors (I don't think there are any out for now). It shouldn't slow down the PR as it is a edge case we can deal with later!

@ydshieh
Copy link
Collaborator

ydshieh commented Nov 8, 2022

Still reviewing, but in processing_auto.py/from_pretrained, the beginning part

https://github.com/huggingface/transformers/blob/1ebc7bb995c5e43961a7c8079ca3bf29f06f2411/src/transformers/models/auto/processing_auto.py#L197

around this, there is no ImageProcessingMixin. I feel this is a miss and should appear here?

config = AutoImageProcessor.from_pretrained(tmpdirname)
self.assertIsInstance(config, CLIPImageProcessor)

def test_image_processor_from_local_directory_from_feature_extractor_key(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

Copy link
Collaborator

@ydshieh ydshieh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @amyeroberts for this great work!

LGTM overall, but I have a small doubt
#20111 (comment)

@amyeroberts
Copy link
Contributor Author

Still reviewing, but in processing_auto.py/from_pretrained, the beginning part

https://github.com/huggingface/transformers/blob/1ebc7bb995c5e43961a7c8079ca3bf29f06f2411/src/transformers/models/auto/processing_auto.py#L197

around this, there is no ImageProcessingMixin. I feel this is a miss and should appear here?

@ydshieh Yes, you're right. I've added a check now here. Can you confirm if this matches with what you think should have been added?

@ydshieh
Copy link
Collaborator

ydshieh commented Nov 8, 2022

Can you confirm if this matches with what you think should have been added?

Yes!

@ydshieh
Copy link
Collaborator

ydshieh commented Nov 8, 2022

One comment (no need to be done in this PR): I think it would be great if we can remove the feature_extractor_type key after loading the image processor.

from transformers import CLIPModel, AutoProcessor, CLIPProcessor, CLIPImageProcessor, CLIPFeatureExtractor, AutoImageProcessor

p = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")
print(p.feature_extractor_type)
p.save_pretrained("temp-clip")

gives

CLIPFeatureExtractor

on the terminal, and in the output file preprocessor_config.json, we have

  "feature_extractor_type": "CLIPFeatureExtractor",
  "image_processor_type": "CLIPImageProcessor",

@amyeroberts amyeroberts merged commit 4eb918e into huggingface:main Nov 8, 2022
@amyeroberts amyeroberts deleted the autoimageprocessor branch November 8, 2022 19:54
("swin", "ViTImageProcessor"),
("swinv2", "ViTImageProcessor"),
("van", "ConvNextImageProcessor"),
("videomae", "VideoMAEImageProcessor"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, videomae and xclip are video models and video feature extractor classes are different than image feature extractor classes. Would that be better to separate them as VideoProcessors or they should it stay as it is? @NielsRogge

mpierrau pushed a commit to mpierrau/transformers that referenced this pull request Dec 15, 2022
* AutoImageProcessor skeleton

* Update references

* Add mapping in init

* Add model image processors to __init__ for importing

* Add AutoImageProcessor tests

* Fix up

* Image Processor documentation

* Remove pdb

* Update docs/source/en/model_doc/mobilevit.mdx

* Update docs

* Don't add whitespace on json files

* Remove fixtures

* Move checking model config down

* Fix up

* Add check for image processor

* Remove FeatureExtractorMixin in docstrings

* Rename model_tmpfile to config_tmpfile

* Don't make None if not in image processor map
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants