At 2018-11-06 06:48 UTC we were alerted to cf push failures. We also observed that the number of routes on the platform had dropped more than 25% below normal during the previous 10 minutes, starting at 06:38 UTC. The platform mostly recovered without intervention by approximately 07:01 UTC; however, there were several intermittent cf push failures until approximately 07:56 UTC.
All PWS customers were vulnerable to the cf push failure scenario during this period, and many apps would likely have experienced sporadic 404 errors.
We determined that a backend API node became unreachable from the backend application hosts. We have a bug open to determine how the backend application runners can fail over to a healthy API node more quickly.