@sudoskys sudoskys commented Mar 29, 2025

  • No more manual line-break management!
-detect("hello world", low_memory=False, use_strict_mode=True)
+detect("hello world", low_memory=False, config=LangDetectConfig(allow_fallback=False))

Normalize text input to improve detection accuracy, particularly
for all-uppercase text. Converting uppercase text to lowercase
prevents it from being misdetected as Japanese and makes
language predictions more reliable.
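The normalization this commit describes can be sketched roughly as follows. This is only an illustration: `normalize_text` is a hypothetical name, and lowercasing only when the whole input is uppercase is an assumption, not necessarily what the merged code does.

```python
def normalize_text(text: str) -> str:
    """Lowercase the input only when all of its cased characters are uppercase.

    Hypothetical helper for illustration; not the library's actual function.
    """
    # str.isupper() is True only if the string has at least one cased
    # character and every cased character is uppercase.
    if text.isupper():
        return text.lower()
    # Mixed-case or caseless input (digits, CJK, etc.) is left untouched.
    return text


print(normalize_text("HELLO WORLD"))
print(normalize_text("Hello World"))
```

Leaving mixed-case input alone keeps whatever signal capitalization carries for the model, while shouting-style input is folded to lowercase before detection.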
Reordered imports for better readability and alignment with PEP 8.
This change enhances maintainability by ensuring consistent import
order, making the codebase easier to navigate and understand. 🛠️
Enhanced text normalization by removing newline characters and
lowercasing uppercase text to improve prediction accuracy. Added
warnings for deprecated parameters and improved configuration
management using LangDetectConfig.

These changes enhance text preprocessing and ensure better
configuration management.
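A deprecation shim of the kind this commit mentions might look like the sketch below. It assumes, based on the diff in the PR description, that `use_strict_mode=True` corresponds to `allow_fallback=False`. `SketchConfig` and `resolve_config` are hypothetical names standing in for the real `LangDetectConfig` handling, whose other fields are not modeled here.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in for LangDetectConfig; only one field is modeled."""
    allow_fallback: bool = True


def resolve_config(
    config: Optional[SketchConfig] = None,
    use_strict_mode: Optional[bool] = None,
) -> SketchConfig:
    """Map the deprecated keyword onto a config object, warning the caller."""
    if use_strict_mode is not None:
        warnings.warn(
            "use_strict_mode is deprecated; pass config=... instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Assumption: an explicit config wins over the deprecated flag.
        if config is None:
            config = SketchConfig(allow_fallback=not use_strict_mode)
    return config if config is not None else SketchConfig()
```

Callers that still pass the old keyword keep working but see a `DeprecationWarning`, which gives them time to migrate to the config object.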
Removed the `test_newline` function from `tests/test_real_detection.py` as it was deemed unnecessary. This streamlines our test suite by eliminating redundant checks, ensuring more focused and efficient test execution.
@sudoskys sudoskys changed the title from "✨ feat(app): add input normalization to language detection" to "✨ feat(app): [Compatibility changes] add input normalization to language detection" Mar 29, 2025
Converted `_normalize_text` to a static method and refined the logging
messages for clarity. Text processing now handles newline characters
and long inputs explicitly, in line with issue #14. 🛠️

Refactoring ensures better code maintainability and readability.
Introduced a `_preprocess_text` method to clean and validate text
before detection. It removes newline characters and warns if the
text length exceeds 100 characters, enhancing prediction
accuracy and preventing errors.
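The preprocessing step described here could be sketched as below. The 100-character threshold comes from the commit message. The function name, the logger name, and the choice to replace newlines with a space (rather than delete them outright) are assumptions made for illustration.

```python
import logging

logger = logging.getLogger("langdetect_sketch")  # hypothetical logger name

MAX_LENGTH = 100  # threshold named in the commit message


def preprocess_text(text: str) -> str:
    """Warn on long input and strip newline characters before detection."""
    if len(text) > MAX_LENGTH:
        # Only a warning: the text is still passed on unchanged in length.
        logger.warning(
            "Input is %d characters long (more than %d); "
            "prediction accuracy may suffer.",
            len(text),
            MAX_LENGTH,
        )
    if "\n" in text:
        # Assumption: newlines become spaces so adjacent words stay separated.
        text = text.replace("\n", " ")
    return text
```

Handling newlines inside the library is what allows callers to drop their own line-break cleanup before calling `detect`.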
@sudoskys sudoskys linked an issue Mar 29, 2025 that may be closed by this pull request
@sudoskys sudoskys merged commit f4fc032 into main Mar 29, 2025
10 checks passed
Development

Successfully merging this pull request may close these issues.

Some upper-case English-language texts detected as Chinese