PWS Experiencing API Issues
Incident Report for Pivotal Web Services


From approximately 2018/05/12 02:15 to 05:46 UTC (3h 31m), users were unable to push new apps to PWS. There was an initial outage, a brief recovery, and a second (expected) outage while the fix was put in place. The y-axis of the chart below is a measure of success; "1" means pushes are succeeding, and "0" means pushes are failing.

[Chart: PWS push success (1) versus failure (0) over time. Screenshot taken 2018-05-14, 11:23:32 AM.]

Root Cause

CloudOps, which maintains PWS, mistakenly deployed a known-bad version of CAPI (the Cloud Controller API).

Version 1.56.0 of CAPI introduced a bug in which repeated requests to v3 endpoints opened too many statsd connections, which in turn exhausted the available open file descriptors (limit: 1024). New sockets could therefore not be established, which resulted in failed requests and/or crashed Cloud Controllers (CCs).
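The failure mode can be illustrated with a minimal Python sketch. It is not the CAPI code (CAPI is not written in Python), and the actual mechanism of the leak is not described in this report. The class names `LeakyStatsClient` and `SharedStatsClient` and the metric name are hypothetical. The sketch assumes a client that opens a fresh UDP socket for every metric and never closes it, and contrasts it with one that reuses a single socket. The network send is omitted so that the example runs without a statsd server.

```python
import resource  # Unix only
import socket


def fd_soft_limit():
    """Return this process's soft limit on open file descriptors."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


class LeakyStatsClient:
    """Hypothetical client: one new UDP socket per metric, never closed."""

    def __init__(self):
        self._sockets = []

    def increment(self, metric):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self._sockets.append(sock)  # descriptor stays open: this is the leak
        return f"{metric}:1|c".encode()  # statsd counter payload (send omitted)

    def open_descriptors(self):
        return len(self._sockets)

    def close(self):
        for sock in self._sockets:
            sock.close()
        self._sockets.clear()


class SharedStatsClient:
    """Hypothetical client: a single UDP socket reused for every metric."""

    def __init__(self):
        self._socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self._closed = False

    def increment(self, metric):
        return f"{metric}:1|c".encode()  # same payload, no new descriptor

    def open_descriptors(self):
        return 0 if self._closed else 1

    def close(self):
        self._socket.close()
        self._closed = True


if __name__ == "__main__":
    leaky, shared = LeakyStatsClient(), SharedStatsClient()
    for _ in range(100):  # stand-in for 100 repeated API requests
        leaky.increment("cc.requests.v3")   # hypothetical metric name
        shared.increment("cc.requests.v3")
    print("soft fd limit:", fd_soft_limit())
    print("leaky client descriptors:", leaky.open_descriptors())
    print("shared client descriptors:", shared.open_descriptors())
    leaky.close()
    shared.close()
```

With a limit of 1024, a per-request leak of even one descriptor would leave no room for new sockets after roughly a thousand requests, which is consistent with the failed requests described above.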

The broken CAPI release was part of a larger cf-deployment release (1.32.0), which contained a UAA security fix. CloudOps intended to add an operations manifest file to exclude the CAPI update, but didn't.
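One way to guard against this kind of omission is an automated check that runs before a deploy. The following is a sketch under stated assumptions, not a description of CloudOps tooling. The function `blocked_releases`, the `KNOWN_BAD` set, and the UAA version shown are hypothetical. The only facts taken from this report are the CAPI versions. The sketch assumes the deployment manifest has already been parsed into a list of releases, each with a `name` and a `version`.

```python
# Releases that must not be deployed; the CAPI entry comes from this report.
KNOWN_BAD = {("capi", "1.56.0")}


def blocked_releases(releases, known_bad=KNOWN_BAD):
    """Return the releases in a parsed manifest that appear on the denylist."""
    return [r for r in releases if (r["name"], r["version"]) in known_bad]


if __name__ == "__main__":
    # Hypothetical parsed manifest; the uaa version is a placeholder.
    manifest_releases = [
        {"name": "capi", "version": "1.56.0"},
        {"name": "uaa", "version": "0.0.0"},
    ]
    bad = blocked_releases(manifest_releases)
    if bad:
        names = ", ".join(f"{r['name']} {r['version']}" for r in bad)
        raise SystemExit(f"refusing to deploy known-bad releases: {names}")
    print("no known-bad releases found")
```

A check like this fails loudly when the intended exclusion is missing, rather than relying on someone remembering to add it.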


Impact

Users were unable to push apps to PWS during the outage or otherwise interact with the Cloud Controller. Existing apps were unaffected.

Because file descriptor exhaustion caused the Cloud Controller API to become a "bad actor", the error messages were legion:

  • Failed to perform blobstore operation after three retries.
  • Stats server temporarily unavailable.
  • The UAA service is currently unavailable
  • An unknown error occurred.


Remediation

CloudOps redeployed the previous, good version (1.55.0) of CAPI. This required a manual database rollback because down migrations aren't supported.

Posted 9 months ago. May 18, 2018 - 13:55 PDT

This incident has been resolved.
Posted 10 months ago. May 12, 2018 - 12:27 PDT
Our fix appears to have worked, and we'll continue to monitor. If you continue to see any issues, please contact
Posted 10 months ago. May 11, 2018 - 23:01 PDT
We are rolling out a fix that will incur a slight amount of API downtime. You may experience HTTP 500 for certain requests during the update.
Posted 10 months ago. May 11, 2018 - 22:35 PDT
We've made a temporary mitigation while we work on repairing the API
Posted 10 months ago. May 11, 2018 - 20:43 PDT
We are continuing to investigate this issue.
Posted 10 months ago. May 11, 2018 - 20:10 PDT
We are currently investigating an issue with the PWS API. We are observing push failures.
Posted 10 months ago. May 11, 2018 - 20:08 PDT
This incident affected: Pivotal Web Services API.