The Storage Provider Url Finder is a microservice designed to test the retrievability of files claimed by Storage Providers (SPs) on the Filecoin network. Its primary goal is to verify whether files associated with deals are accessible via HTTP endpoints advertised by SPs. This service focuses exclusively on HTTP retrievability.
- `DATABASE_URL` - The URL of the DMOB database
- `NoCidContactData` - There is no data in `cid.contact` for the given `peer_id`
- `MissingAddrFromCidContact` - There are no addresses in `ExtendedProviders` in the `cid.contact` response
- `MissingHttpAddrFromCidContact` - There are no HTTP addresses in `ExtendedProviders` in the `cid.contact` response
- `NoDealsFound` - No deals found for the given SP address
- `FailedToGetWorkingUrl` - No working URLs found
- `Success` - The URL was found successfully and is returned with the response
- `FailedToGetPeerId` - Failed to get the peer ID from the Lotus RPC
- `FailedToRetrieveCidContactData` - Failed to retrieve the miner info from the database
- Async Job: The user requests a job, which is processed in the background. The user receives a `job_id` that is used to check the status and results later. Jobs are processed one by one, so multiple jobs can be created without blocking the service
- Direct Call: The user requests a measurement in synchronous mode, which is processed immediately. The result is not stored or cached; it is returned as the response to the request. The response might take up to several minutes, and the request might time out if it takes too long
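The difference between the two modes can be illustrated with a minimal in-memory sketch. Everything in it is hypothetical: the `JobQueue` class, its method names, the status strings, and the injected `measure` coroutine are assumptions made for illustration, not the service's actual API or storage model.

```python
import asyncio
import uuid


class JobQueue:
    """Hypothetical in-memory queue: jobs run one at a time, results are polled by job_id."""

    def __init__(self, measure):
        self._measure = measure        # assumed async callable that performs one measurement
        self._queue = asyncio.Queue()  # create inside a running event loop
        self.jobs = {}                 # job_id -> {"status": ..., "result": ...}

    def submit(self, provider):
        """Register a job and return its job_id immediately, without waiting for the result."""
        job_id = str(uuid.uuid4())
        self.jobs[job_id] = {"status": "pending", "result": None}
        self._queue.put_nowait((job_id, provider))
        return job_id

    async def worker(self):
        """Single consumer, so queued jobs are processed strictly one by one."""
        while True:
            job_id, provider = await self._queue.get()
            self.jobs[job_id]["status"] = "running"
            self.jobs[job_id]["result"] = await self._measure(provider)
            self.jobs[job_id]["status"] = "done"
            self._queue.task_done()

    async def join(self):
        """Wait until every submitted job has been processed."""
        await self._queue.join()
```

In this sketch, a Direct Call would correspond to awaiting `measure(provider)` directly and returning its result, with nothing written to `jobs`. Error handling inside the worker is omitted for brevity.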
- Input: The service accepts requests for a Storage Provider address, a Client address, or both.
- SP Endpoint Discovery: For each SP, the service:
  - Retrieves the peer ID using Lotus RPC (`Filecoin.StateMinerInfo`)
  - Calls `cid.contact` for HTTP endpoints associated with the peer ID (from the `ExtendedProviders` section)
  - Parses multiaddresses to extract usable HTTP endpoints
- Deal and PieceCID Selection:
  - Calls the database for deals matching the SP (and optionally the client)
  - Selects up to `100` random pieceCIDs per SP/client pair to ensure a representative sample
- URL Construction:
  - Constructs URLs by combining each HTTP endpoint with each pieceCID (`http://{HTTP_ENDPOINT}/piece/{pieceCID}`)
- Retrievability Testing:
  - Concurrently tests up to `20` URLs at a time using HTTP HEAD requests
  - Records which URLs are reachable (return a successful HTTP response)
  - Saves one working URL and calculates the retrievability percentage (working URLs / total tested)
- Result:
  - Returns/Saves the working URL and retrievability percentage
  - Provides detailed result codes and error codes when applicable
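
The steps above can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not the service's implementation. The function names, the simplified multiaddress parsing (only `ip4`/`ip6`/`dns*` plus `tcp` plus `http`/`https`), the choice to keep the scheme found in the multiaddress, the 10-second timeout, and treating only 2xx responses as successful are all assumptions. The Lotus RPC, `cid.contact`, and database lookups are left out; endpoints and pieceCIDs are passed in by the caller.

```python
import random
import urllib.request
from concurrent.futures import ThreadPoolExecutor

SAMPLE_SIZE = 100      # "up to 100 random pieceCIDs" from the description
MAX_CONCURRENCY = 20   # "up to 20 URLs at a time" from the description


def multiaddr_to_http_endpoint(multiaddr):
    """Turn e.g. '/dns/sp.example.com/tcp/443/https' into 'https://sp.example.com:443'.

    Assumed, simplified parsing; returns None for addresses without an HTTP(S) component.
    """
    parts = multiaddr.strip("/").split("/")
    host = port = scheme = None
    i = 0
    while i < len(parts):
        proto = parts[i]
        if proto in ("ip4", "ip6", "dns", "dns4", "dns6") and i + 1 < len(parts):
            host = f"[{parts[i + 1]}]" if proto == "ip6" else parts[i + 1]
            i += 2
        elif proto == "tcp" and i + 1 < len(parts):
            port = parts[i + 1]
            i += 2
        elif proto in ("http", "https"):
            scheme = proto
            i += 1
        else:
            i += 1  # skip components this sketch does not understand
    if host and port and scheme:
        return f"{scheme}://{host}:{port}"
    return None


def build_urls(endpoints, piece_cids, rng=random):
    """Sample up to SAMPLE_SIZE pieceCIDs and pair each one with every endpoint."""
    piece_cids = list(piece_cids)
    sample = rng.sample(piece_cids, min(SAMPLE_SIZE, len(piece_cids)))
    return [f"{ep}/piece/{cid}" for ep in endpoints for cid in sample]


def head_ok(url, timeout=10):
    """Send an HTTP HEAD request; any error or non-2xx status counts as unreachable."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except Exception:
        return False


def measure(urls, is_reachable=head_ok):
    """Return (one working URL or None, retrievability percentage)."""
    if not urls:
        return None, 0.0
    # At most MAX_CONCURRENCY checks run at the same time.
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENCY) as pool:
        results = list(pool.map(is_reachable, urls))
    working = [u for u, ok in zip(urls, results) if ok]
    return (working[0] if working else None), 100.0 * len(working) / len(urls)
```

Passing `is_reachable` as a parameter keeps the sketch testable without network access: a stub can replace `head_ok` when checking the sampling and percentage logic.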
- Random Sampling: Testing a random subset of pieceCIDs ensures that retrievability metrics are not biased by deal ordering or selection
- Async Job: Using async jobs lets the process run in the background, which mitigates timeouts and allows multiple jobs to be created at once and then processed one by one
- External Data Sources: Lotus RPC and cid.contact are authoritative sources for miner info and endpoints
- HTTP Endpoint Reliability: Not all SPs maintain reliable or up-to-date HTTP endpoints
- HEAD Requests Only: HEAD requests check for basic reachability, but do not guarantee that the file can be fully downloaded or is valid
- Scheduled Checks: Implement a scheduler to periodically re-check URLs for retrievability, ensuring that the data remains current
- Database Integration: Store results in a database to allow for faster response times
- File Header Analysis: Extend the service to analyze file headers for additional metadata, for example the actual file size and its difference from the deal size
- Historical Metrics: Track retrievability metrics over time for trend analysis