
@pkarw (Contributor) commented Jan 17, 2025

Added:

  • easyOCR support
  • language flag: required by EasyOCR to load the proper OCR language weights; it accepts one or several comma-separated languages, e.g. en or en,de,it
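A minimal sketch of how a comma-separated language flag like en,de,it could be turned into the list of language codes that easyocr.Reader takes as its first argument. The helper name parse_language_flag and its de-duplication behaviour are hypothetical illustrations, not code from this PR.

```python
# Hypothetical helper (not from this PR): converts the comma-separated
# language flag into the list form that easyocr.Reader expects.
def parse_language_flag(value: str) -> list[str]:
    """Split e.g. "en,de,it" into ["en", "de", "it"].

    Surrounding whitespace and empty entries are dropped, and duplicates
    are removed while keeping the original order.
    """
    langs: list[str] = []
    for part in value.split(","):
        code = part.strip()
        if code and code not in langs:
            langs.append(code)
    if not langs:
        raise ValueError("at least one language code is required")
    return langs


if __name__ == "__main__":
    print(parse_language_flag("en"))        # ['en']
    print(parse_language_flag("en,de,it"))  # ['en', 'de', 'it']
    # With EasyOCR installed, the resulting list could be passed on, e.g.:
    # import easyocr
    # reader = easyocr.Reader(parse_language_flag("en,de,it"))
```

Keeping the flag as a single string makes it easy to pass through CLI arguments or request parameters, while the list form matches what EasyOCR's Reader accepts.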

Removed:

  • tesseract: its output quality was very low and it was rarely used, so it was removed to save maintenance resources
  • marker: it implies the GPL3 license; an example adding marker back as a third-party strategy would be welcome!

Changed:

  • LICENSE: license changed to MIT

@pkarw pkarw changed the title feat: easyOCR feat: easyOCR added, tesseract - removed, marker - removed, license changed to MIT Jan 17, 2025
@pkarw pkarw requested a review from choinek January 17, 2025 13:34
@choinek (Collaborator) commented Jan 17, 2025

Good idea, I'll commit similar settings for PyCharm :)

@choinek (Collaborator) commented Jan 18, 2025

Works for me, thx

 ✔ Read text from images using ocr request method with data set "read example-invoice.pdf with extract strategy: easyocr and model: llama3.2-vision saved in default"
 ✔ Read text from images using ocr request method with data set "read example-invoice.pdf with extract strategy: easyocr and model: llama3.2-vision saved in s3"
 ✔ Read text from images using ocr request method with data set "read example-invoice.pdf with extract strategy: easyocr and model: llama3.1 saved in default"
 ✔ Read text from images using ocr request method with data set "read example-invoice.pdf with extract strategy: easyocr and model: llama3.1 saved in s3"
 ✔ Read text from images using ocr request method with data set "read example-mri.pdf with extract strategy: easyocr and model: llama3.2-vision saved in default"
 ✔ Read text from images using ocr request method with data set "read example-mri.pdf with extract strategy: easyocr and model: llama3.2-vision saved in s3"
 ✔ Read text from images using ocr request method with data set "read example-mri.pdf with extract strategy: easyocr and model: llama3.1 saved in default"
 ✔ Read text from images using ocr request method with data set "read example-mri.pdf with extract strategy: easyocr and model: llama3.1 saved in s3"

@choinek choinek merged commit 316884b into main Jan 18, 2025


Development

Successfully merging this pull request may close these issues.

  • [cleanup] Remove Tesseract support after adding easyOCR
  • [feat] Change license to MIT by removing dependency to marker
  • [feat] Add EasyOCR strategy

3 participants