Intermittent timeouts on API endpoints

Incident Report for The Rust Programming Language

Resolved

crates.io has returned to normal service now that the earlier download count backlog has been processed and the long-running background jobs have completed. The crates.io team will continue investigating the cause of the elevated database load during normal working hours.
Posted Feb 15, 2024 - 02:43 UTC

Update

We are continuing to monitor for any further issues.
Posted Feb 15, 2024 - 00:12 UTC

Monitoring

The long-running background job has completed, and response times for the summary endpoint have returned to normal.

The next invocation of the relevant background job will be at 00:30 UTC (so in just under 20 minutes); we will be monitoring that closely to see if any problems resurface at that point.
Posted Feb 15, 2024 - 00:12 UTC

Identified

We believe this issue is caused by excess database load from a download counting bug fix deployed earlier today. Since that fix, the normal background processing of per-crate download count totals takes significantly longer and requires more resources than usual.

We will shortly be temporarily disabling the summary endpoint to alleviate some of the load on the database.
Posted Feb 14, 2024 - 22:13 UTC

Update

(If this looks suspiciously similar to https://status.crates.io/incidents/t49v2pfpv0vl, the same issue reappeared on the summary endpoint literally within seconds of resolving that incident. C'est la vie.)
Posted Feb 14, 2024 - 19:30 UTC

Investigating

Some crates.io endpoints are timing out at present, including the summary route that drives the crates.io home page. We are investigating.
Posted Feb 14, 2024 - 19:30 UTC

This incident affected: crates.io.