[DISCARDED] Pull Request: Fix ArXiv Fetch Script - Replace urllib with requests and improve error handling #192
Fixes
Description
This PR modernizes the `arxiv_fetch.py` script by replacing its `urllib`-based implementation with the more robust `requests` library, and adds several improvements for reliability and maintainability.

Issues Fixed
- Replaced `urllib.request` with `requests` library for better HTTP handling

Technical Changes

Core Library Migration
- `urllib.request` to `requests` library
- `HTTPAdapter` with retry strategy (5 retries, exponential backoff)

Data Structure Improvements
- Removed `PLAN_INDEX` column from all CSV headers and output
- `HEADER_COUNT`, `HEADER_CATEGORY`, `HEADER_YEAR`, `HEADER_AUTHOR`
- `save_count_data()` function

License Detection Enhancement
- `extract_license_info()` function to properly handle case conversion
- `.lower()` to `.upper()` for consistent license matching

Error Handling & Rate Limiting
- `requests.Session()` and `HTTPAdapter`
- `requests.RequestException` handling

Files Modified
- `scripts/1-fetch/arxiv_fetch.py` - Complete refactor with modernized HTTP handling

Checklist
- [...] `Update index.md`).
- [...] `main` or `master`).
- [...] visible errors.
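
For reviewers, the retry and error-handling changes listed under Core Library Migration and Error Handling & Rate Limiting can be sketched roughly as follows. This is a minimal illustration, not the code from `arxiv_fetch.py`: the helper names `build_session` and `fetch_text`, the backoff factor, the status codes that trigger a retry, and the timeout are all assumptions. Only the use of `requests.Session()`, `HTTPAdapter`, 5 retries with exponential backoff, and `requests.RequestException` comes from the description above.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_session(total_retries=5, backoff_factor=1.0):
    """Return a requests.Session that retries transient failures.

    Hypothetical helper: the name, the backoff factor, and the status
    codes below are assumptions, not values taken from the PR.
    """
    retry = Retry(
        total=total_retries,            # 5 retries, as stated in the PR
        backoff_factor=backoff_factor,  # delays grow exponentially per retry
        status_forcelist=(429, 500, 502, 503, 504),  # assumed retryable codes
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Use the retrying adapter for both URL schemes.
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


def fetch_text(session, url, params=None, timeout=30):
    """Return the response body as text, or None if the request fails.

    Hypothetical helper: catches requests.RequestException, the base
    class for connection, timeout, HTTP, and invalid-URL errors.
    """
    try:
        response = session.get(url, params=params, timeout=timeout)
        response.raise_for_status()
    except requests.RequestException as error:
        print(f"Request failed: {error}")
        return None
    return response.text
```

Mounting a single adapter on a shared session keeps the retry policy in one place, so every request made through that session gets the same backoff behavior without per-call retry loops.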
Developer Certificate of Origin
For the purposes of this DCO, "license" is equivalent to "license or public domain dedication," and "open source license" is equivalent to "open content license or public domain dedication."