
Import all torrents statistics from the tracker (even with high load and many torrents) #469

@josecelano

Description


Discussed in #428

Originally posted by josecelano January 4, 2024
The Index has a cronjob called "statistics importer" that imports torrent statistics from the associated tracker. With the default configuration, the process runs every hour:

[tracker_statistics_importer]
torrent_info_update_interval = 3600
port = 3002

This task:

  1. Gets all torrents from the database, in a compact form with only the internal ID and the infohash (SELECT torrent_id, info_hash FROM torrust_torrents).
  2. Iterates over all torrents, requesting the statistics for each one from the Tracker REST API.
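The two steps above can be sketched as follows. This is a minimal, std-only Rust sketch, not the Index's actual code: `TorrentStats`, the `TrackerApi` trait and the in-memory `FakeTracker` are hypothetical stand-ins for the real database query and Tracker REST API client. It makes the first problem below visible: the number of requests always equals the number of torrents in the Index.

```rust
use std::collections::HashMap;

/// One row of `SELECT torrent_id, info_hash FROM torrust_torrents`.
#[derive(Debug, Clone)]
struct TorrentRow {
    torrent_id: i64,
    info_hash: String,
}

/// Hypothetical subset of the statistics the tracker returns per torrent.
#[derive(Debug, Clone, Copy, PartialEq)]
struct TorrentStats {
    seeders: u32,
    leechers: u32,
}

/// Hypothetical stand-in for the Tracker REST API client.
trait TrackerApi {
    fn get_torrent_stats(&mut self, info_hash: &str) -> Option<TorrentStats>;
}

/// In-memory fake tracker that counts the requests it receives.
struct FakeTracker {
    torrents: HashMap<String, TorrentStats>,
    requests: usize,
}

impl TrackerApi for FakeTracker {
    fn get_torrent_stats(&mut self, info_hash: &str) -> Option<TorrentStats> {
        self.requests += 1;
        self.torrents.get(info_hash).copied()
    }
}

/// Step 2: one tracker request per torrent in the Index, whether or not
/// the tracker knows about it.
fn import_all(rows: &[TorrentRow], tracker: &mut impl TrackerApi) -> HashMap<i64, TorrentStats> {
    let mut imported = HashMap::new();
    for row in rows {
        if let Some(stats) = tracker.get_torrent_stats(&row.info_hash) {
            imported.insert(row.torrent_id, stats);
        }
    }
    imported
}

fn main() {
    // Step 1 (simulated): the Index has 1000 torrents...
    let rows: Vec<TorrentRow> = (0..1000i64)
        .map(|id| TorrentRow { torrent_id: id, info_hash: format!("{:040x}", id) })
        .collect();
    // ...but the tracker only knows the first one.
    let mut tracker = FakeTracker {
        torrents: HashMap::from([(format!("{:040x}", 0), TorrentStats { seeders: 1, leechers: 0 })]),
        requests: 0,
    };
    let imported = import_all(&rows, &mut tracker);
    println!("requests: {}, imported: {}", tracker.requests, imported.len());
}
```

Running `main` prints `requests: 1000, imported: 1`: 999 of the 1000 requests are wasted.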

That solution has some problems:

  • We do not have the same torrents in the Index and the Tracker. The tracker could have 1 torrent and the Index 1000, or the opposite. If the tracker has only one, we make 999 unnecessary requests. The opposite case is not a problem because we only make one request.
  • If the Index has many torrents, making all the requests could take longer than the interval set in the configuration (1 hour by default), so a single import run could take more than 1 hour.
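A back-of-the-envelope check for the second problem, assuming requests are sent one after another and each takes a fixed, hypothetical 100 ms round trip (the real latency depends on the deployment):

```rust
use std::time::Duration;

/// How many torrents a sequential importer can query within one interval,
/// assuming every request takes `per_request`. Panics if `per_request` is zero.
fn max_torrents_per_interval(interval: Duration, per_request: Duration) -> u128 {
    interval.as_nanos() / per_request.as_nanos()
}

/// True when importing `torrents` sequentially takes longer than `interval`.
fn overruns_interval(torrents: u128, interval: Duration, per_request: Duration) -> bool {
    per_request.as_nanos() * torrents > interval.as_nanos()
}

fn main() {
    let interval = Duration::from_secs(3600); // torrent_info_update_interval
    let per_request = Duration::from_millis(100); // assumed round-trip latency
    println!("max torrents per run: {}", max_torrents_per_interval(interval, per_request));
    println!("50000 torrents overrun: {}", overruns_interval(50_000, interval, per_request));
}
```

Under that assumption one run can cover at most 3,600 s / 0.1 s = 36,000 torrents; with 50,000 torrents a run takes about 5,000 s and overlaps the next one.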

It seems that the process does not scale well and is not even efficient. There are different solutions depending on which aspect you want to improve. For example, to make it scale, you can add an importation date and fetch statistics continuously, importing only the torrents that have not been imported in the last hour. This solution does not solve the problem of making requests to the tracker for torrents it does not have.
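A sketch of the importation-date idea, assuming a hypothetical `stats_imported_at` column (Unix seconds) that the Index does not currently have. A continuously running importer would call the hypothetical `next_batch` repeatedly and only query the tracker for the stalest torrents. As noted, it still asks the tracker about torrents the tracker may not have.

```rust
/// Hypothetical Index row with an extra "last imported at" column (Unix seconds).
#[derive(Debug, Clone)]
struct TorrentRow {
    torrent_id: i64,
    info_hash: String,
    stats_imported_at: Option<u64>,
}

/// Picks the next `batch_size` torrents whose statistics were never imported
/// or are at least `max_age_secs` old, stalest first.
fn next_batch(rows: &[TorrentRow], now: u64, max_age_secs: u64, batch_size: usize) -> Vec<&TorrentRow> {
    let mut stale: Vec<&TorrentRow> = rows
        .iter()
        .filter(|r| match r.stats_imported_at {
            None => true,
            Some(t) => now.saturating_sub(t) >= max_age_secs,
        })
        .collect();
    // `None` sorts before any `Some`, so never-imported torrents go first.
    stale.sort_by_key(|r| r.stats_imported_at);
    stale.truncate(batch_size);
    stale
}

fn main() {
    let now = 10_000;
    let rows = vec![
        TorrentRow { torrent_id: 1, info_hash: "a".into(), stats_imported_at: None },
        TorrentRow { torrent_id: 2, info_hash: "b".into(), stats_imported_at: Some(now - 10) },
        TorrentRow { torrent_id: 3, info_hash: "c".into(), stats_imported_at: Some(now - 4_000) },
        TorrentRow { torrent_id: 4, info_hash: "d".into(), stats_imported_at: Some(now - 7_200) },
    ];
    // With a one-hour max age, torrent 2 is fresh and is skipped.
    for row in next_batch(&rows, now, 3600, 2) {
        println!("importing torrent {} ({})", row.torrent_id, row.info_hash);
    }
}
```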

Proposed solution 1

A naive implementation could be:

  • Ask the tracker for the statistics of the torrents that have changed in the last hour.
  • Use pagination for that response, since the tracker could have millions of torrents.
  • The Index then only processes the torrents returned by the tracker and updates their statistics.
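Solution 1 could look like this from the Index side. The tracker API has no such endpoint today: `changed_since`, its `since`/`offset`/`limit` parameters and the entry fields are all hypothetical, and the tracker is faked in memory. The Index keeps requesting pages until it receives a short page.

```rust
/// Hypothetical entry returned by a "changed since" tracker endpoint.
#[derive(Debug, Clone, PartialEq)]
struct TorrentEntry {
    info_hash: String,
    seeders: u32,
    leechers: u32,
    updated_at: u64, // Unix seconds of the last statistics change
}

/// In-memory fake of the proposed endpoint; `torrents` is kept in a stable
/// order so that offset-based pages do not overlap.
struct FakeTracker {
    torrents: Vec<TorrentEntry>,
    requests: usize,
}

impl FakeTracker {
    /// Torrents changed at or after `since`, paginated with `offset`/`limit`.
    fn changed_since(&mut self, since: u64, offset: usize, limit: usize) -> Vec<TorrentEntry> {
        self.requests += 1;
        self.torrents
            .iter()
            .filter(|t| t.updated_at >= since)
            .skip(offset)
            .take(limit)
            .cloned()
            .collect()
    }
}

/// Index side: fetch every page of changed torrents.
fn import_changed(tracker: &mut FakeTracker, since: u64, limit: usize) -> Vec<TorrentEntry> {
    assert!(limit > 0, "page size must be positive");
    let mut changed = Vec::new();
    let mut offset = 0;
    loop {
        let page = tracker.changed_since(since, offset, limit);
        let len = page.len();
        changed.extend(page);
        if len < limit {
            break; // a short (or empty) page is the last one
        }
        offset += len;
    }
    changed
}

/// Builds a fake tracker whose i-th torrent last changed at time i.
fn fake_tracker(n: u64) -> FakeTracker {
    let torrents = (0..n)
        .map(|i| TorrentEntry { info_hash: format!("{:040x}", i), seeders: 1, leechers: 0, updated_at: i })
        .collect();
    FakeTracker { torrents, requests: 0 }
}

fn main() {
    let mut tracker = fake_tracker(5);
    let changed = import_changed(&mut tracker, 2, 2);
    // The Index would now update only the torrents it finds by info_hash.
    println!("changed: {}, requests: {}", changed.len(), tracker.requests);
}
```

Offset pagination can skip or repeat entries if the data changes between page requests; a cursor (for example, "entries after this info_hash") would avoid that.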

Proposed solution 2

Alternatively, we could stop importing statistics for all torrents and only import the statistics of a single torrent when a user loads its details page. That would require an extra request to the tracker every time the page is loaded. It could work if we do not need statistics anywhere else.
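A sketch of solution 2: the details page handler asks the tracker for the statistics of one torrent on each load. The `TrackerApi` trait, `torrent_details` and the two fake trackers are hypothetical. One design choice shown here is that a tracker error only hides the statistics instead of failing the page; that behavior is an assumption, not something decided in this issue.

```rust
/// Hypothetical subset of the statistics the tracker returns per torrent.
#[derive(Debug, Clone, Copy, PartialEq)]
struct TorrentStats {
    seeders: u32,
    leechers: u32,
}

/// Data rendered on the torrent details page.
#[derive(Debug, PartialEq)]
struct TorrentDetails {
    title: String,
    info_hash: String,
    stats: Option<TorrentStats>,
}

/// Hypothetical tracker client: `Err` when the tracker is unreachable,
/// `Ok(None)` when it does not know the torrent.
trait TrackerApi {
    fn get_torrent_stats(&self, info_hash: &str) -> Result<Option<TorrentStats>, String>;
}

/// Builds the details page, fetching fresh statistics on every load.
fn torrent_details(title: &str, info_hash: &str, tracker: &impl TrackerApi) -> TorrentDetails {
    // One extra tracker request per page view. Any failure just hides the
    // statistics instead of failing the whole page.
    let stats = tracker.get_torrent_stats(info_hash).ok().flatten();
    TorrentDetails { title: title.to_string(), info_hash: info_hash.to_string(), stats }
}

struct UpTracker;
impl TrackerApi for UpTracker {
    fn get_torrent_stats(&self, _info_hash: &str) -> Result<Option<TorrentStats>, String> {
        Ok(Some(TorrentStats { seeders: 5, leechers: 2 }))
    }
}

struct DownTracker;
impl TrackerApi for DownTracker {
    fn get_torrent_stats(&self, _info_hash: &str) -> Result<Option<TorrentStats>, String> {
        Err("connection refused".to_string())
    }
}

fn main() {
    println!("{:?}", torrent_details("Ubuntu ISO", "abc", &UpTracker));
    println!("{:?}", torrent_details("Ubuntu ISO", "abc", &DownTracker));
}
```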

Other solutions

  • Use other types of APIs where you can subscribe to changes in statistics.
  • Use message queues to subscribe to statistics changes events.
  • Etcetera
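To illustrate the subscription idea, here an in-process `std::sync::mpsc` channel stands in for a real message queue, and the `StatsChanged` event type and `apply_events` function are hypothetical. The Index only processes torrents whose statistics actually changed, and no polling is involved.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Hypothetical event a tracker could publish when a torrent's statistics change.
#[derive(Debug, Clone)]
struct StatsChanged {
    info_hash: String,
    seeders: u32,
    leechers: u32,
}

/// Index side: apply events until every sender is dropped; the last event
/// for each info_hash wins.
fn apply_events(rx: Receiver<StatsChanged>) -> HashMap<String, (u32, u32)> {
    let mut stats = HashMap::new();
    for event in rx {
        stats.insert(event.info_hash, (event.seeders, event.leechers));
    }
    stats
}

fn main() {
    let (tx, rx) = mpsc::channel();
    // Tracker side (simulated): publish a few changes, then drop the sender.
    let tracker = thread::spawn(move || {
        for (info_hash, seeders, leechers) in [("aa", 1, 0), ("bb", 2, 3), ("aa", 4, 1)] {
            tx.send(StatsChanged { info_hash: info_hash.to_string(), seeders, leechers }).unwrap();
        }
    });
    let stats = apply_events(rx);
    tracker.join().unwrap();
    println!("{:?}", stats.get("aa"));
}
```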

I will go with solution 2, which would also make the tracker API even simpler. This is related to using a DashMap for the tracker data: we would no longer need a BTreeMap to keep torrents ordered, because right now we only need that ordering for the API. In addition, background tasks are hard to maintain:

  • It's hard to detect problems.
  • There can be deadlocks, overlapping runs, ...
  • Performance can be a problem as I described.
  • ...
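On the BTreeMap point: an ordered map makes offset/limit listing cheap because keys are already sorted, while a hash map iterates in unspecified order and has to sort on every listing request. DashMap is a concurrent hash map; std `HashMap` is used here as a stand-in so the sketch has no dependencies. If, with solution 2, the API only needs lookups by info_hash, that ordering is no longer required.

```rust
use std::collections::{BTreeMap, HashMap};

/// Ordered map: a page is a cheap in-order walk over the keys.
fn page_btree(torrents: &BTreeMap<String, u32>, offset: usize, limit: usize) -> Vec<String> {
    torrents.keys().skip(offset).take(limit).cloned().collect()
}

/// Hash map: iteration order is unspecified, so every page request has to
/// collect and sort all keys first (O(n log n) per request).
fn page_hashmap(torrents: &HashMap<String, u32>, offset: usize, limit: usize) -> Vec<String> {
    let mut keys: Vec<&String> = torrents.keys().collect();
    keys.sort();
    keys.into_iter().skip(offset).take(limit).cloned().collect()
}

fn main() {
    let entries = [("cc", 3), ("aa", 1), ("bb", 2)];
    let btree: BTreeMap<String, u32> = entries.iter().map(|(k, v)| (k.to_string(), *v)).collect();
    let hash: HashMap<String, u32> = entries.iter().map(|(k, v)| (k.to_string(), *v)).collect();
    println!("{:?}", page_btree(&btree, 0, 2));
    println!("{:?}", page_hashmap(&hash, 0, 2));
    // Solution 2 only needs lookups by info_hash, which both maps support.
    println!("{:?}", hash.get("bb"));
}
```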

cc @da2ce7 @WarmBeer @mario-nt @grmbyrn

Metadata

Type: No type
Project status: Done
Relationships: None yet
Development: No branches or pull requests