From roughly 18:13 UTC until 20:40 UTC on 2018/04/11, transaction time on the Cloud Controller (CC) was elevated. This caused timeouts on some requests to the CC API. At 20:40, the CC issue was resolved, but for roughly the next two hours (until 22:15), Apps Manager was stopped several times in order to make changes that would avoid placing undue pressure on the Cloud Controller.
A change to Apps Manager introduced long-running queries against the Cloud Controller database (CCDB). These queries increased load on the CCDB and caused transaction time on the Cloud Controller to rise substantially, which in turn caused requests to the Cloud Controller API (CAPI) to time out.
Apps Manager disabled the updated feature. CloudOps and CAPI killed the remaining long-running queries on the CCDB. Load on the CCDB began dropping at 20:39 UTC and had returned to normal by 20:44 UTC; Cloud Controller transaction time had returned to normal by 20:40 UTC.
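The remediation step above, killing queries that had been running too long, reduces to scanning the database's process list for queries past a runtime threshold. Below is a minimal sketch of that selection logic only; the data shape, field names, and 60-second cutoff are assumptions for illustration, since the report does not describe the actual CCDB tooling or values used.

```python
# Illustrative sketch: select queries whose runtime exceeds a threshold.
# The record shape and the 60-second cutoff are assumed, not taken from
# the actual CloudOps/CAPI remediation.

LONG_RUNNING_SECONDS = 60  # assumed threshold

def find_long_running(processlist, threshold=LONG_RUNNING_SECONDS):
    """Return the IDs of queries whose runtime exceeds the threshold."""
    return [proc["id"] for proc in processlist if proc["runtime_s"] > threshold]

# Example: a mock snapshot of a database process list.
snapshot = [
    {"id": 101, "runtime_s": 2,    "query": "SELECT ..."},
    {"id": 102, "runtime_s": 5400, "query": "SELECT ..."},  # long-running
    {"id": 103, "runtime_s": 950,  "query": "SELECT ..."},  # long-running
]

print(find_long_running(snapshot))  # → [102, 103]
```

In practice the snapshot would come from the database itself (for example `SHOW PROCESSLIST` on MySQL or `pg_stat_activity` on PostgreSQL, depending on the CCDB flavor), followed by issuing a kill per offending query ID.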
For the next 90 minutes, Apps Manager experimented with ways to resolve the issue, ultimately disabling events proxying and turning off polling.
At 20:02 UTC, a customer opened a ticket because they were unable to log into Apps Manager.