Add binary file filtering, prevent AI hallucination, and implement modern web UI #2
Overview
This PR addresses three critical improvements to DocSifter:
Problem Statement
The original implementation had several issues:
Solution
1. Binary File Filtering
Added an `is_text_based_file()` method that distinguishes between text-based and binary files:
Text-based files (AI-enabled):
Binary files (heuristic-only):
Result: Binary files like `photo.jpg` now only get basic tags `["image", "photo"]` instead of hallucinated content descriptions.
2. Conservative AI Prompting
Updated the AI system to be more conservative and evidence-based:
Changes:
Prompt examples:
This ensures the AI only tags what it can clearly infer from metadata, not content it cannot see.
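The routing described in sections 1 and 2 can be sketched roughly as follows. This is an illustration under assumptions, not the code from `docsifter.py`: the extension sets, the prompt wording, and the helper names `heuristic_tags`, `build_prompt`, and `plan_tagging` are hypothetical, and `is_text_based_file()` is written here as a standalone function rather than a method. Only the `photo.jpg` to `["image", "photo"]` mapping comes from this PR description.

```python
from pathlib import Path

# Assumed extension sets; the lists used by the PR are not shown in this description.
TEXT_EXTENSIONS = {".txt", ".md", ".py", ".json", ".csv", ".html"}
HEURISTIC_TAGS = {
    ".jpg": ["image", "photo"],   # mapping stated in the PR description
    ".png": ["image"],            # assumption
    ".mp3": ["audio"],            # assumption
}


def is_text_based_file(path: str) -> bool:
    """Return True when the extension suggests readable text content."""
    return Path(path).suffix.lower() in TEXT_EXTENSIONS


def heuristic_tags(path: str) -> list[str]:
    """Basic tags derived from the extension only, with no AI involved."""
    return list(HEURISTIC_TAGS.get(Path(path).suffix.lower(), []))


def build_prompt(path: str, size_bytes: int) -> str:
    """Hypothetical conservative prompt that exposes metadata only."""
    p = Path(path)
    return (
        "Suggest tags for this file using ONLY the metadata below. "
        "Do not describe content you cannot see.\n"
        f"name: {p.name}\nextension: {p.suffix.lower()}\nsize_bytes: {size_bytes}"
    )


def plan_tagging(path: str, size_bytes: int) -> dict:
    """Decide whether a file goes to the AI tagger or gets heuristic tags."""
    if is_text_based_file(path):
        return {"mode": "ai", "prompt": build_prompt(path, size_bytes)}
    return {"mode": "heuristic", "tags": heuristic_tags(path)}
```

Under these assumptions, `plan_tagging("photo.jpg", 204800)` never reaches the AI path, so there is no prompt in which content could be hallucinated.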
3. Modern Web UI
Created a beautiful, responsive web interface using Flask + HTML/CSS/JS:
Backend (`web_ui.py`):
Frontend (`templates/index.html`):
API Endpoints:
- `GET /api/status` - System status and configuration
- `POST /api/scan` - Start scanning a folder
- `GET /api/progress` - Get real-time scan progress
- `GET /api/search?q=query` - Search tagged files
- `GET /api/files` - Get all tagged files
Screenshots
Initial State
Clean, modern interface with status badges and folder selection.
After Scanning Files
Notice the intelligent tagging:
No hallucinated content for files the AI cannot read!
Search Functionality
Real-time search filters 8 files down to the 2 matching the "image" tag, with instant visual feedback.
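As a rough illustration of the tag filtering behind this search, here is a minimal sketch of the kind of function that could back `GET /api/search?q=query`. The function name `search_files`, the record shape (`name` and `tags` keys), and the sample data are assumptions for illustration; the actual logic in `web_ui.py` may differ.

```python
def search_files(tagged_files: list[dict], query: str) -> list[dict]:
    """Return files whose name or any tag contains the query (case-insensitive)."""
    q = query.strip().lower()
    if not q:
        # An empty query returns everything, like an unfiltered file list.
        return list(tagged_files)
    return [
        f for f in tagged_files
        if q in f["name"].lower() or any(q in tag.lower() for tag in f["tags"])
    ]


# Hypothetical sample data, not the files from the screenshots.
sample = [
    {"name": "photo.jpg", "tags": ["image", "photo"]},
    {"name": "notes.md", "tags": ["markdown", "notes"]},
    {"name": "logo.png", "tags": ["image"]},
]

matches = search_files(sample, "image")
print([f["name"] for f in matches])
```

Matching on substrings keeps the search forgiving while typing; an exact-tag match would be a small change if stricter filtering is preferred.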
Usage
Start the web UI (recommended):

    python web_ui.py
    # Open http://localhost:5000 in your browser

Or use the desktop UI (legacy):
Technical Details
New Files:
- `web_ui.py` - Flask backend with `DocSifterService` class
- `templates/index.html` - Modern single-page web application
- `test_web_ui.py` - Test suite for web UI and binary file detection
- `USAGE.md` - Comprehensive usage guide
Modified Files:
- `docsifter.py` - Added `is_text_based_file()` method and improved `generate_tags_ai()` prompt
- `requirements.txt` - Added `flask>=2.3.0` dependency
- `README.md` - Updated with web UI instructions
Backward Compatibility:
The original tkinter UI is preserved and fully functional for users who prefer desktop applications.
Testing
All tests pass successfully:
- `test_docsifter.py` - 100% pass
- `test_web_ui.py` - 100% pass
Example test results:
Benefits