Graph, alert and dashboard irregularities for some US customers
Incident Report for DataSet
Resolved
Most of the time series have been rebuilt. Dashboard performance has been restored for most customers, and alert evaluation success rates are back at pre-incident levels. We are marking the issue resolved and expect a full return to pre-incident levels within the next 24 hours.
Posted Aug 30, 2022 - 14:16 UTC
Monitoring
The summary service has been restarted and we are beginning to rebuild time series data for all alerts. Once the time series for an alert has been recreated, we will begin evaluating that alert again. Alerts with longer lookback periods will be the last to be evaluated successfully. Loading a dashboard will trigger the time series on that dashboard to be recreated, so that dashboard will load more slowly at first. We are continuing to monitor this process.
Posted Aug 29, 2022 - 23:24 UTC
Update
At 15:00 PDT (22:00 UTC) we will restart our summary service, which powers our alerts and speeds up dashboard rendering. The summary service will be unavailable for approximately 10 minutes, after which it will begin rebuilding time series data for all alerts. An alert will not be evaluated until its time series is rebuilt, so alerts with longer lookback periods will be the last to be evaluated successfully. Loading a dashboard will trigger the time series on that dashboard to be recreated, so that dashboard will load more slowly at first.
Posted Aug 29, 2022 - 21:38 UTC
Identified
We have identified the issue in our time series database and are working on remediation.
Posted Aug 29, 2022 - 17:40 UTC
Investigating
Some customers in the US cluster are reporting unexpected behavior including false alarms, inconsistent graphs, and missing search results. We are currently investigating the issue.
Posted Aug 29, 2022 - 13:47 UTC
This incident affected: Main Site.