On July 22, 2024, customers on Kustomer’s Prod1 environment experienced elevated latency and error rates across multiple features of the Kustomer product.
One of Kustomer’s primary databases experienced a hardware failure, resulting in a switchover to a secondary database. Requests made during the 90-second period between the failure and the successful switchover failed. The service responsible for rendering the Kustomer timeline did not switch over to the new primary node for an additional 8 minutes, which extended the time until customer timelines were usable again.
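For illustration only, the sketch below shows one general way a client can recover from a stale primary after a switchover: when a request is rejected, it discards its cached target, asks the cluster which node is primary now, and retries. This is not Kustomer's implementation. `FakeCluster`, `FailoverAwareClient`, `NotPrimaryError`, and the node names are hypothetical, and the in-memory cluster stands in for a real database driver.

```python
import time


class NotPrimaryError(Exception):
    """Raised when a request reaches a node that is not the current primary."""


class FakeCluster:
    """Hypothetical in-memory stand-in for a replicated database."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.primary = self.nodes[0]

    def fail_over(self):
        # Promote the next node, simulating a switchover after a failure.
        idx = self.nodes.index(self.primary)
        self.primary = self.nodes[(idx + 1) % len(self.nodes)]

    def current_primary(self):
        return self.primary

    def execute(self, node, query):
        # Only the current primary accepts requests in this simplified model.
        if node != self.primary:
            raise NotPrimaryError(node)
        return f"{node}:{query}"


class FailoverAwareClient:
    """Hypothetical client that re-resolves the primary when a request fails."""

    def __init__(self, cluster, max_attempts=3, backoff_seconds=0.0):
        self.cluster = cluster
        self.max_attempts = max_attempts
        self.backoff_seconds = backoff_seconds
        self.cached_primary = cluster.current_primary()

    def query(self, query):
        for attempt in range(self.max_attempts):
            try:
                return self.cluster.execute(self.cached_primary, query)
            except NotPrimaryError:
                # Drop the stale target and ask the cluster who is primary now.
                self.cached_primary = self.cluster.current_primary()
                # Linear backoff before the next attempt.
                time.sleep(self.backoff_seconds * (attempt + 1))
        raise RuntimeError("no primary reachable after retries")
```

A client that keeps using its cached node without re-resolving would keep failing until it is restarted or its connection times out, which is the kind of delay that failover testing in a non-production environment is meant to surface.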
Jul 22, 2024
Improved internal monitoring for database failures - Our team was alerted to the failures and began investigating immediately, but did not have immediate visibility into their cause. We’ve improved our database monitoring to allow a quicker response in the event of a similar hardware failure.
Perform additional failover testing - We intend to test more failover scenarios in non-production environments to find further opportunities to optimize this process and reduce disruption to customers.