Searches and Reporting are currently not updating for users on Prod1 of the platform.

Incident Report for Kustomer

Postmortem

Summary

On January 17th, 2022 at approximately 12:37 PM EST, the Kustomer team identified that customers on the PROD1 pod were not receiving up-to-date search results. On investigating, engineers discovered that a shard containing data for some orgs on PROD1 had grown dangerously large, which put the cluster into an unhealthy state. Engineers therefore needed to make adjustments so that our system could accommodate more data on the affected shard.

Root Cause

A shard containing data for some orgs on PROD1 had grown dangerously large, which put the cluster into an unhealthy state. Once the problem was identified, the engineering team provisioned a new cluster and restored the data for the affected orgs to it. A script was then run to recover data that had not been indexed during the incident.

Timeline

01/17 12:37 PM EST - Multiple alarms signaled issues with the PROD1 Elasticsearch cluster, and reports came in from customers indicating that some customers on PROD1 were not receiving up-to-date search results.

01/17 1:33 PM EST - Engineers identified the problematic shard on the cluster and began to work on a solution to restore the cluster’s health.

01/17 4:27 PM EST - A solution was deployed to production after testing in lower environments. Search results began catching up to the present.

01/18 1:06 AM EST - All affected customers were receiving new data in search results.

01/19 1:23 AM EST - All affected orgs had search results caught up to reflect system changes that occurred during the incident.

Lessons/Improvements

The overall health of the search cluster is monitored carefully, but this incident exposed a weakness in a specific monitor, which led to a prolonged recovery time. Accordingly, the engineering team is responding by:

Adding advanced monitoring for shard sizes.
Enabling more detailed auditing on search clusters to identify the root cause of an issue faster.
Speeding up the process of restoring data to a new cluster to significantly reduce recovery time.
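
As an illustration of the first item above, the following is a minimal sketch of a shard-size check. It is not Kustomer's actual monitor. It assumes the shard list has already been fetched from Elasticsearch's `_cat/shards` API with `format=json` and `bytes=b`, so that each row's `store` field is a byte count given as a string. The 50 GiB threshold, the function name, and the sample index names are all hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical alert threshold: the report does not state the real limit.
MAX_SHARD_BYTES = 50 * 1024**3


def find_oversized_shards(
    rows: List[Dict[str, Optional[Any]]],
    max_bytes: int = MAX_SHARD_BYTES,
) -> List[Tuple[str, int, str, int]]:
    """Return (index, shard, prirep, size_bytes) for shards above max_bytes.

    `rows` is assumed to be the parsed JSON from
    GET _cat/shards?format=json&bytes=b
    """
    oversized = []
    for row in rows:
        store = row.get("store")
        if store is None:
            # Unassigned shards report no store size; skip them.
            continue
        size = int(store)
        if size > max_bytes:
            oversized.append((row["index"], int(row["shard"]), row["prirep"], size))
    # Largest shards first, so the worst offender leads the alert.
    return sorted(oversized, key=lambda item: item[3], reverse=True)


# Made-up sample rows in the shape the _cat/shards API returns.
sample_rows = [
    {"index": "orgs-a", "shard": "0", "prirep": "p", "store": str(60 * 1024**3)},
    {"index": "orgs-a", "shard": "1", "prirep": "p", "store": str(12 * 1024**3)},
    {"index": "orgs-b", "shard": "0", "prirep": "p", "store": str(75 * 1024**3)},
    {"index": "orgs-c", "shard": "0", "prirep": "r", "store": None},
]

for index, shard, prirep, size in find_oversized_shards(sample_rows):
    print(f"ALERT {index}[{shard}] ({prirep}) is {size / 1024**3:.1f} GiB")
```

A check like this could run on a schedule and page when the returned list is non-empty. Alerting at a lower warning threshold as well would give time to react before a shard becomes dangerously large.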

Posted Jan 25, 2022 - 10:31 EST

Resolved

This issue is now resolved and searches are up to date.

Please reach out to support at support@kustomer.com if you experience anything out of the ordinary with Searches and Reporting.

Posted Jan 18, 2022 - 01:09 EST

Update

Our team is still monitoring the recovery. We will continue to provide updates until the issue is fully resolved.

Please expect further updates and reach out to our Support team with any additional questions by going to https://help.kustomer.com and clicking "Contact Support" at the top of the page.

Posted Jan 17, 2022 - 22:49 EST

Update

Our team is currently releasing a resolution to this issue on PROD1. You should start to see searches and reporting pull in new data. Please expect further updates within the next 30 minutes and reach out to our Support team with any additional questions by going to https://help.kustomer.com and clicking "Contact Support" at the top of the page.

Posted Jan 17, 2022 - 21:18 EST

Update

Our team is currently testing a resolution to this issue. Please expect further updates within the next 30 minutes and reach out to our Support team with any additional questions by going to https://help.kustomer.com and clicking "Contact Support" at the top of the page.

If you are using queues and routing, conversations will continue to be routed to your agents and they will be able to respond to conversations and update statuses as normal.

If you are not using Queues and Routing, you can enable routing, add the "Default Queue" to your teams, and have members log into an "Available" state; they will then receive conversations automatically. Here is our documentation on setting up queues and routing: https://help.kustomer.com/en_us/queue-rules-HyLuJ1KHX and https://help.kustomer.com/en_us/configuring-queues-BkCVqbJ9L. Please reach out to the support team if you need help setting this up!

During this time, existing searches may show cached results, including items that no longer match the search criteria. The agent inbox is powered by a search, and while we see no impact on queues & routing at this time, agents may report that they are not able to view new conversations assigned to them. Reporting is also affected while search is not returning the most up-to-date results. All new messages are still coming into the platform and being sent out via all channels.

Posted Jan 17, 2022 - 20:14 EST

Update

Posted Jan 17, 2022 - 19:29 EST

Update

Kustomer has identified an event affecting Searches that is preventing updated results from being returned in all searches on the platform for PROD1 users. Our team is currently testing a resolution. Please expect further updates within the next 30 minutes and reach out to our Support team with any additional questions by going to https://help.kustomer.com/ and clicking "Contact Support" at the top of the page.

During this time, existing searches may show cached results, including items that no longer match the search criteria. The agent inbox is powered by a search, and while we see no impact on queues & routing at this time, agents may report that they are not able to view new conversations assigned to them. Reporting is also affected while search is not returning the most up-to-date results. All new messages are still coming into the platform and being sent out via all channels.

Posted Jan 17, 2022 - 18:53 EST

Update

Posted Jan 17, 2022 - 18:13 EST

Update

Posted Jan 17, 2022 - 17:40 EST

Update

Posted Jan 17, 2022 - 17:06 EST

Update

Posted Jan 17, 2022 - 16:33 EST

Update

Posted Jan 17, 2022 - 15:58 EST

Update

Kustomer has identified an event affecting Searches that is preventing updated results from being returned in all searches on the platform for PROD1 users. Our team is currently working to implement a resolution. Please expect further updates within 30 minutes. Please reach out to our Support team with any additional questions. You can reach us by going to https://help.kustomer.com/ and clicking "Contact Support" at the top of the page.

During this time, existing searches may show cached results, including items that no longer match the search criteria. The agent inbox is powered by a search, and while we see no impact on queues & routing at this time, agents may report that they are not able to view new conversations assigned to them. Reporting is also affected while search is not returning the most up-to-date results. All new messages are still coming into the platform and being sent out via all channels.

Posted Jan 17, 2022 - 15:27 EST

Identified

Kustomer has identified an event affecting Searches that is preventing updated results from being returned in searches on the platform for PROD1 users. Our team is currently working to implement a resolution. Please expect further updates within 30 minutes. Please reach out to our Support team with any additional questions. You can reach us by going to https://help.kustomer.com/ and clicking "Contact Support" at the top of the page.

Posted Jan 17, 2022 - 14:50 EST

Update

The Kustomer team is still investigating the root cause of the issue preventing Searches from returning the latest information on PROD1. During this time, existing searches may show cached results, including items that do not match the search criteria. The agent inbox is powered by a search, and while we see no impact on queues & routing at this time, agents may report that they are not able to view new conversations assigned to them.

Please reach out to our Support team with any additional questions. You can reach us by going to https://help.kustomer.com/ and clicking "Contact Support" at the top of the page.

Posted Jan 17, 2022 - 14:29 EST

Update

Posted Jan 17, 2022 - 13:54 EST

Investigating

Correction: Searches and Reporting were available, but searches were running against stale data in the search index. During this time, clients on PROD1 would have seen searches and reports that did not include the latest data.

Searches and Reporting are currently unavailable for most users of the platform. We are working to resolve the issue as quickly as possible. During this time you may experience searches and reports not populating as expected.

Please reach out to our Support team with any additional questions. You can reach us by going to https://help.kustomer.com/ and clicking "Contact Support" at the top of the page.

Posted Jan 17, 2022 - 13:13 EST

This incident affected: Prod1 (US) (Search).