Backfill missing Pypi dependencies #3045

Merged
papiro merged 8 commits into main from pp/sc-29698/dependencies-not-showing-up-in-release-api on Feb 3, 2023

Conversation

papiro
Contributor

@papiro papiro commented Feb 2, 2023

Rewrote the rake task for the backfill based on learnings from the past couple days.

This query doesn't find broken projects. It finds projects which are potentially broken and effectively short-lists them to be resynced (by PackageManagerDownloadWorker).

It finds affected projects in batches of 120, splits each batch into pairs, and runs one pair per second for one minute, then repeats with the next batch for the next minute, and so on. That fixes 2 projects per second until all projects are fixed.
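The pacing described above can be sketched in plain Ruby. This is only an illustration of the arithmetic, not the rake task from this PR: the method name `schedule_backfill`, the constants, and the block used as an "enqueue" hook are all hypothetical.

```ruby
# Hypothetical sketch: the numbers mirror the description above;
# the method name and the block-based hook are assumptions.
BATCH_SIZE = 120 # projects picked up per minute
PER_SECOND = 2   # projects run per second

# Yields [project_id, delay_in_seconds] for every project, so a caller
# could hand each one to a delayed job or a sleep-based loop.
def schedule_backfill(project_ids)
  project_ids.each_slice(BATCH_SIZE).with_index do |batch, minute|
    batch.each_slice(PER_SECOND).with_index do |pair, second|
      delay = minute * 60 + second
      pair.each { |id| yield id, delay }
    end
  end
end

# Usage: collect the schedule for 250 made-up project ids.
schedule = []
schedule_backfill((1..250).to_a) { |id, delay| schedule << [id, delay] }
```

Each full batch of 120 occupies exactly 60 one-second slots, so the next batch starts on the next minute boundary.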

There are currently 103999 affected projects, so it is expected to take ~14.5 hours.
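As a quick check of that estimate, the arithmetic can be worked through in Ruby. The constant names are made up for illustration; the figures come from the description.

```ruby
AFFECTED_PROJECTS = 103_999
PROJECTS_PER_SECOND = 2
BATCH_SIZE = 120 # one batch is processed per minute

# Continuous rate: 2 projects every second.
seconds = AFFECTED_PROJECTS.fdiv(PROJECTS_PER_SECOND)
hours_at_rate = seconds / 3600.0

# Batch view: the last, partial batch still takes up its own minute.
minutes = (AFFECTED_PROJECTS / BATCH_SIZE.to_f).ceil
hours_by_batch = minutes / 60.0

puts format("%.2f hours at a steady rate, %.2f hours counted in batches",
            hours_at_rate, hours_by_batch)
```

Both views land between 14.4 and 14.5 hours, which is consistent with the figure quoted above.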

We are not running a script that calls Pypi#save_dependencies (as in a previous commit) because it doesn't set an indicator that would let already-fixed projects be filtered out of subsequent queries if the backfill unexpectedly stops, or needs to be stopped and restarted for any reason.

We query for projects rather than versions (which would mean running PackageManagerDownloadWorker with the version arg) to avoid the case where a project has multiple affected versions, the worker completes one of them, and another fails or the task has to be stopped and restarted. In that case the query would not pick the project back up, because its last_resync_at timestamp was already updated, and there would be no way to query for such an anomaly.
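The restartability argument can be illustrated with a small in-memory sketch. Only last_resync_at is named in the description; the `Project` struct, the helper names, and the integer timestamps are hypothetical stand-ins, and the real task queries the database rather than an array.

```ruby
# Hypothetical stand-in for a project record.
Project = Struct.new(:name, :last_resync_at)

# Short-lists projects that have not been resynced since the cutoff.
def needs_resync(projects, cutoff)
  projects.select { |p| p.last_resync_at.nil? || p.last_resync_at < cutoff }
end

# Stand-in for the worker: a successful resync bumps the timestamp,
# which is the indicator that removes the project from later queries.
def resync!(project, now)
  project.last_resync_at = now
end

cutoff = Time.at(1_000) # illustrative, not the PR's actual cutoff
projects = [
  Project.new("alpha", Time.at(100)),
  Project.new("beta", Time.at(500)),
  Project.new("gamma", Time.at(2_000)), # already synced after the cutoff
]

first_pass = needs_resync(projects, cutoff).map(&:name)
resync!(projects[0], Time.at(3_000)) # suppose the task stops right here
second_pass = needs_resync(projects, cutoff).map(&:name)
```

After the interruption, the second pass returns only the project that was never resynced, so restarting the task does not redo finished work. With one timestamp per project, the same filter could not tell a half-finished project apart from a finished one, which is why the work is short-listed per project.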

7/6/2022 is the date the PyPI API changed.
2/1/2023 is the day after the fix to Libraries (#3040) went live.

@@ -135,7 +135,7 @@ def self.parse_pep_508_dep_spec(dep)

   def self.dependencies(name, version, _mapped_project = nil)
     api_response = get("https://pypi.org/pypi/#{name}/#{version}/json")
-    deps = api_response.dig("info", "requires_dist")
+    deps = api_response.dig("info", "requires_dist") || []
Contributor Author

This just prevents us from logging an error when requires_dist is null. It was raising on the deps.map below, but null seems to be PyPI's way (or one way) of indicating that there are no dependencies.
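A minimal sketch of the guard, assuming a response whose "info" object carries a null requires_dist. The payloads and the helper name `requirement_strings` are made up for illustration; only the dig call and the `|| []` fallback come from the diff.

```ruby
require "json"

# Illustrative payloads shaped like the "info" object the diff reads from.
with_deps = JSON.parse('{"info": {"requires_dist": ["requests (>=2.0)", "six"]}}')
without_deps = JSON.parse('{"info": {"requires_dist": null}}')

# Hypothetical helper: same dig-with-fallback as the changed line.
def requirement_strings(api_response)
  deps = api_response.dig("info", "requires_dist") || []
  deps.map(&:strip)
end

puts requirement_strings(with_deps).inspect
puts requirement_strings(without_deps).inspect

# Without the fallback, dig returns nil and calling map on it raises.
begin
  without_deps.dig("info", "requires_dist").map(&:strip)
rescue NoMethodError => e
  puts "unguarded call raised #{e.class}"
end
```

The fallback treats a null requires_dist the same as an empty list, so the rest of the method can map over it unconditionally.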

Member

@tiegz tiegz left a comment

nice! Getting projects instead of versions seems like a good strategy

@papiro papiro merged commit a3320ce into main Feb 3, 2023
@papiro papiro deleted the pp/sc-29698/dependencies-not-showing-up-in-release-api branch February 3, 2023 17:13
2 participants