
fix: add batching support for BanCompetitors to handle long input text #272


Merged
4 commits merged into protectai:main on Jul 31, 2025

Conversation

abdokaseb
Contributor

Change Description

Previously, the BanCompetitors class failed on long text because it attempted to process the entire input in a single request. When the input exceeded the model's token limit, the text was truncated and processing was incomplete.

This PR introduces batching logic that splits the input text into manageable chunks before sending them to the model. This ensures that BanCompetitors works correctly even with long inputs and with models that have lower token limits. A test case with long text was also added; it fails with the old code and passes with this change.

Assumptions:
For fast processing, I used the approximate token-count equation already noted in the code: 1 word ≈ 4 characters ≈ 2 tokens.
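For illustration, a minimal sketch of the chunking idea under that heuristic; the names here (`split_into_chunks`, `max_tokens`, the word-based splitting) are illustrative assumptions, not the actual code in this PR:

```python
def split_into_chunks(text: str, max_tokens: int = 512) -> list[str]:
    """Split text into word-boundary chunks that stay under max_tokens,
    using the approximation 1 word ~ 2 tokens."""
    words = text.split()
    max_words = max(1, max_tokens // 2)  # ~2 tokens per word
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]

if __name__ == "__main__":
    long_text = "word " * 2000  # ~2000 words, so ~4000 tokens by the heuristic
    chunks = split_into_chunks(long_text, max_tokens=512)
    print(len(chunks))  # 8 chunks of at most 256 words each
```

Each chunk can then be scanned independently and the per-chunk results merged, so no part of the input is silently truncated by the model.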

Issue reference

N/A – discovered during usage with large text inputs. Please let me know if you'd like me to open a tracking issue.

Checklist

  • I have reviewed the contribution guidelines
  • My code includes unit tests
  • All unit tests and lint checks pass locally
  • My PR contains documentation updates / additions if required

@abdokaseb abdokaseb requested a review from asofter as a code owner July 30, 2025 15:00
@asofter asofter merged commit 53af270 into protectai:main Jul 31, 2025
6 checks passed