InfluxData apologizes for deleting cloud regions without performing 'scream test'

Despite attempts to inform customers, some say they never got the memo

InfluxData has lost the data of customers using its services in Australia while users in Belgium are struggling to figure out if they can restore the last 100 days.

The vendor behind the InfluxDB time series DBMS has now apologized to customers caught out when it discontinued its InfluxDB Cloud service in two regions: AWS Sydney and GCP Belgium.

According to a blog post by CTO Paul Dix, the company notified customers of its decision to discontinue these services for economic reasons on February 23, April 6, and May 15. It also contacted customers for whom it had other details and updated the homepage of the UI for InfluxDB Cloud 2 in those regions with a notice that the service was going to be shut down on June 30.

"In hindsight, our assumption that the emails, sales outreach, and web notifications would be sufficient to ensure all users were aware of and acted on the notifications was overly optimistic," he said.

"Our engineering team is looking into whether they can restore the last 100 days of data for GCP Belgium. It appears at this time that for AWS Sydney users, the data is no longer available."

Users took to online forums to vent their frustration at the move.

On the vendor's community forum, one said: "We were never informed about this. We have a running use case and are not in the habit of checking the documentation every week just in case our service gets cancelled without prior warning. InfluxData should also have seen that instances in these regions still had read and write access and informed all affected customers. This is highly unprofessional."

On Hacker News, a user opined: "Your number one expectation as a cloud database provider is to keep data safe and recoverable." Another said: "This is pretty much corporate suicide. I really don't understand what they are trying to achieve with this and their attitude in this thread is baffling."

CTO Dix, who responded to comments on the thread, initially appeared to be defensive. "I realize that it's not ideal that we've shut down this system, but we made our best efforts to notify affected users and give them options to move over to other regions," he said.

He later seemed more contrite. "It's a terrible situation and we failed on many levels on this one. We will improve our process from here and conduct a more full postmortem."

A number of comments on both threads pointed to InfluxData failing to carry out a "scream test," where a service provider turns off access to a service, but does not kill the service itself. When those who are locked out "scream" via email or the phone, the company can tell them the services will be turned off for good at a later date, giving customers time to back up their data and migrate applications.

In the blog, Dix promised to do things differently in the future. He said the company would create a separate category of "Service Notification" emails that customers could not opt out of. He promised to improve email processes and clarity. The company would redouble its efforts to contact users who have not reduced their reads or writes within the 30 or 45 days before the end-of-life date for the region.

He also said the company would use a scream test, implement a 30-day data retention grace period, and publish a banner at the top of the status.influxdata.com page as soon as the initial notifications went out.
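For readers who want to picture how a scream test and a retention grace period might fit together, here is a minimal sketch of such a shutdown timeline. It is not InfluxData's process: the function name `decommission_phase`, the phase names, and the 14-day scream window are hypothetical choices made for illustration, and only the 30-day grace period comes from Dix's post.

```python
from datetime import date, timedelta
from enum import Enum


class Phase(Enum):
    ACTIVE = "active"            # service fully available
    SCREAM_TEST = "scream_test"  # access blocked, data still intact
    GRACE = "grace"              # service ended, data still retained
    DELETED = "deleted"          # data permanently removed


def decommission_phase(today, end_of_life, scream_days=14, grace_days=30):
    """Return the shutdown phase that applies on a given day.

    scream_days is an assumed value for illustration; grace_days mirrors
    the 30-day retention period described in the blog post.
    """
    scream_start = end_of_life - timedelta(days=scream_days)
    delete_on = end_of_life + timedelta(days=grace_days)
    if today < scream_start:
        return Phase.ACTIVE
    if today < end_of_life:
        return Phase.SCREAM_TEST
    if today < delete_on:
        return Phase.GRACE
    return Phase.DELETED


# Illustrative end-of-life date; the year is chosen for the example only.
eol = date(2023, 6, 30)
for day in (date(2023, 6, 1), date(2023, 6, 20), date(2023, 7, 10), date(2023, 7, 30)):
    print(day.isoformat(), decommission_phase(day, eol).value)
```

The point of the ordering is that data deletion is the last step, not the first: customers who missed every email still find out when access is blocked, and they have a window to recover their data afterwards.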

While it may be reassuring that InfluxData hopes to improve its customer communication next time it turns off a service, it will be little consolation to those who have lost data.

The Register has asked InfluxData to comment. ®
