Dump Your Cloud Before Your Cloud Dumps You

Hey, You, Get off of My Cloud!

A CTO’s Guide to the Unanticipated Risk

Privacy Co-op Media Staff
10 min read · May 22, 2021


Alan Nekhom

This paper, from Data Privacy Journal, underwritten by the Privacy Co-op Media Staff, is re-issued here for free viewing with the author’s permission. The site of record can be found here: https://doi.org/10.52785/A7S3D2G4. While linking to this article is permitted, please do not distribute or copy without the express written permission of the author. Formal citations can be copied from https://www.academia.edu/48765398, © 2021.

Disaster Recovery Planning:

The Unanticipated Risk

As CTOs we are tasked with keeping our company’s business safe, protecting confidential data, and maintaining service availability for our coworkers, business partners, and clients. We plan for power outages, civil unrest, bad weather, and even for that theoretical asteroid strike on our data center.

But have we ever imagined that the cloud provider we CONTRACTED with for services might just pull our plug, with no meaningful notice, simply because one of our customers voices controversial beliefs? Perhaps your competitor could even invest in your cloud provider and pressure them to de-platform your business.

What would happen if a major cloud provider pulled the plug on a major telecom firm because their clients were talking about something “objectionable” on, or even posting something “questionable” from, their phones?

What is your firm’s liability if that should happen to YOU? Some legal experts have thrown around terms such as tortious interference and the like. I’ll be honest: I am not a lawyer, and I don’t play one on TV, so I will simply share our solution, along with pros, cons, and alternatives based on business size.

I am the CTO of the Privacy Co-op, an international cooperative non-profit association focused on creating and nurturing a global community of consent. Our mission is to free our B2B partners from the risk of onerous privacy fines, while enabling our B2C members to manage their privacy and provide their Affirmative Express Consent to your company for the licensed, responsible use of their data from ALL devices, in ANY jurisdiction.

When Amazon AWS shut down parler.com and Discord shut down Wall Street Bets in January 2021, it was like lightning striking in our own home! Rather than continuing our long-term plan to deepen our partnership with AWS advanced serverless platforms, this new reality brought our efforts to a screeching halt and caused a swift about-face. Pausing all further development, we immediately pulled our architectural team together to assess the risk and our current contingency plans, and to develop new approaches to risk mitigation. As AWS was grooming us as a select-tier partner for their marketplace, redirecting resources was a challenging decision for our executive team.

Here are the services we use today in the cloud:

1) Virtual and Dedicated Servers

present the lowest risk and cost for conversion. The primary exposure in such an architecture turns out to be the use of cloud-vendor-provided OS images. By eliminating these images and reconfiguring with commercially available images, you can make your servers entirely cloud-vendor agnostic. The pitfalls of this approach center on any special services you leverage from cloud providers, which will be addressed later.

The true benefit of being in the cloud comes from “automagic” scaling and load balancing. This allows your infrastructure to scale up and down with your usage demand, which saves the substantial investment of standing up hardware in your own data center sized for potential maximum peak demand, a costly exercise at best. This also implies that abandoning the cloud and returning to the days of on-premises data centers is just not practical. So, we ask ourselves, “Does it make sense to have anything on-prem?” The answer is yes! Because internal financial data, management, and reporting software are critical to all operations and have a very predictable growth pattern and demand profile (think “end of month” operations), we can safeguard our core business data by keeping it under our own control. For businesses ranging from small mom-and-pop shops to large enterprises, we recommend building or retaining enough on-premises or colocated servers to handle your financial and core business tracking needs. Please remember to include additional capacity for auto-failover and appropriate backups. Finally, consider whether you require geographic redundancy to ensure adequate disaster recovery. (Honest, boss! We never dreamed a meteor would hit our data center!)

Having safeguarded our core operational data, how do we ensure suitable, scalable and redundant resources to meet fluctuating system and service demand?

Simple. Use the Cloud! Of course, I mean multi-cloud. Spread your workload across multiple vendors. Several vendors, including IBM, Cisco, AWS, Azure, and of course, Google provide management solutions for multi-cloud. A brief overview of these technologies uncovers more than 15 vendors providing software to help make managing multi-cloud (pardon the pun) a breeze.

Finally, consider that in a multi-cloud scenario you can seize the opportunity to analyze usage against each vendor’s service costs and place workloads with the lowest-cost provider.
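As an illustration, here is a minimal Python sketch of that kind of placement analysis. The provider names, hourly rates, and workloads below are hypothetical placeholders rather than real price quotes; the point is simply that once usage is tracked per workload, the comparison can be automated.

```python
# Hypothetical, illustrative prices only -- real pricing varies by region,
# instance family, commitment term, and egress charges.
RATES = {  # $ per vCPU-hour, by provider and workload class (made-up values)
    "provider_a": {"general": 0.045, "ml": 0.092},
    "provider_b": {"general": 0.041, "ml": 0.110},
    "provider_c": {"general": 0.048, "ml": 0.087},
}

workloads = [  # (name, workload class, estimated vCPU-hours per month)
    ("web_frontend", "general", 12_000),
    ("batch_reporting", "general", 3_500),
    ("ml_training", "ml", 20_000),
]

# Naive placement: each workload goes to its cheapest provider. A real analysis
# would also weigh egress fees, latency, and already-negotiated discounts.
for name, wclass, hours in workloads:
    best = min(RATES, key=lambda p: RATES[p][wclass])
    print(f"{name}: {best} at ~${hours * RATES[best][wclass]:,.2f}/month")
```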

2) Database Services

are possibly the easiest to port to alternate vendors, a colo, or on-prem. Replication and database dumps provide low-risk porting for both relational and non-relational databases like MongoDB. Database systems can be set up cross-cloud, as well as replicated to your colo or on-prem systems. Bear in mind, if you disperse your databases you will need to partition the data in such a way that your compute is located with the same vendor as the data it uses. Otherwise, latency will drastically impact performance.
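For the simplest case, a one-time copy looks something like the sketch below. It assumes two MongoDB clusters reachable at hypothetical connection strings and a hypothetical database and collection name; production replication would typically be incremental (change streams or native replica sets) rather than a full scan.

```python
# A minimal sketch of copying a MongoDB collection between two clusters,
# e.g. from one cloud vendor to another or down to a colo/on-prem replica.
# Connection strings, database, and collection names are hypothetical.
from pymongo import MongoClient

source = MongoClient("mongodb://source-cluster.example.com:27017")
target = MongoClient("mongodb://target-cluster.example.com:27017")

src_coll = source["coop"]["consents"]
dst_coll = target["coop"]["consents"]

batch = []
for doc in src_coll.find():          # full copy; production use would be
    batch.append(doc)                # incremental (change streams / oplog)
    if len(batch) >= 1000:
        dst_coll.insert_many(batch)  # write in batches to limit round trips
        batch = []
if batch:
    dst_coll.insert_many(batch)
```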

Of course, easy comes with a price. Be careful, when you architect your new solution, to think through the cost of provider-to-provider bandwidth. You will have an initial, one-time cost for porting, but there are also ongoing costs to keep the various cross-provider databases synchronized.

3) Serverless Platforms

represent the highest risk factor should you have an issue with your current cloud provider. Serverless architectures are highly attractive for microservices. In fact, we use a variety of serverless functionality to provide extremely high-speed consent responses for our affiliate companies. We also make use of serverless functions to provide highly parallel processing for Natural Language Processing and Machine Learning tasks.

So, as I seem to sing the praises of serverless functions, what are the downsides?

· Serverless functionality is the most likely to leverage vendor-specific architectures and programming libraries, making these capabilities the most difficult to migrate to alternate vendors quickly and without warning.

· Serverless functions are usually time-limited to short bursts of processing, so plan on a higher up-front investment for your development community to thoughtfully design for stateless behavior.

· Cost per CPU hour is higher when compared to reserved instances, so if you can saturate your systems, use reserved or “burst” capacity to save money over serverless approaches (a back-of-the-envelope comparison follows this list).
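To put rough numbers on that last point, here is a minimal sketch. All prices are hypothetical placeholders; substitute your own vendor’s published rates before drawing any conclusions.

```python
# Hypothetical prices; plug in your vendor's actual rates before deciding.
serverless_rate = 0.0000167   # $ per GB-second of function execution
memory_gb = 2.0               # memory allocated to the function
reserved_monthly = 60.0       # $ per month for a comparable reserved instance

seconds_per_month = 30 * 24 * 3600
serverless_full_month = serverless_rate * memory_gb * seconds_per_month

# Utilization above which the reserved instance becomes the cheaper option.
break_even = reserved_monthly / serverless_full_month
print(f"Reserved wins above ~{break_even:.0%} sustained utilization")
```

With these made-up rates the crossover lands around 69% sustained utilization; the exact number matters less than the habit of rechecking it whenever your usage pattern changes.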

At the Privacy Co-op, we feel our use model does require some serverless functionality. Our solution is to compartmentalize vendor-specific functions and create a “wrapper” around them. This allows any vendor-specific function set to be wrapped in a common interface, making our code portable across multiple vendor platforms.
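To make the idea concrete, here is a minimal sketch of such a wrapper in Python. It is not our production code: the interface, the “consent-lookup” function name, and the GCP stub are illustrative assumptions, and only the AWS Lambda backend is filled in.

```python
# A minimal sketch of the "wrapper" idea: vendor-specific invocation details
# live behind one interface, so application code never imports a vendor SDK.
import json
from abc import ABC, abstractmethod

class FunctionRunner(ABC):
    @abstractmethod
    def invoke(self, name: str, payload: dict) -> dict: ...

class AwsLambdaRunner(FunctionRunner):
    def __init__(self):
        import boto3                      # vendor SDK stays inside the wrapper
        self._client = boto3.client("lambda")

    def invoke(self, name: str, payload: dict) -> dict:
        resp = self._client.invoke(FunctionName=name,
                                   Payload=json.dumps(payload).encode())
        return json.loads(resp["Payload"].read())

class GcpFunctionRunner(FunctionRunner):
    def invoke(self, name: str, payload: dict) -> dict:
        # Placeholder: an HTTP POST to the function's trigger URL would go here.
        raise NotImplementedError

# Application code depends only on FunctionRunner, never on a vendor SDK.
def check_consent(runner: FunctionRunner, member_id: str) -> dict:
    return runner.invoke("consent-lookup", {"member_id": member_id})
```

Because check_consent() depends only on the abstract FunctionRunner, moving vendors means adding a new runner class rather than touching business logic.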

4) Data Lakes

frequently exceed multiple petabytes. The sheer enormity of the data implies that a return to colocation or on-premises hosting is just not practical. While every company will need to weigh its own risk of data loss against the cost to port or even duplicate the data, there are several key questions which may aid your evaluation:

1. Can the data, or portions of it, be rebuilt?

If so, are those portions low risk, with only a one-time compute cost for recovery?

2. Do you perform full or changes-only backups?

This impacts the overall size, and hence the cost, of storing and moving your data.

3. Which Big Data systems do you use?

The underlying tools and data structures may significantly impact your ability to use multi-cloud.

4. Can you tolerate the inherent latency if you distribute data across vendor networks?

5. What are your data retention policies?

We know you’ve been wanting to do “housekeeping” on your data but have been putting it off. If you plan to port or replicate data across vendors, the cleanup can pay for part of itself by reducing the volume of data you have to move. As senior management, I have often felt we are in the sanitation business, so you may wish to consider: “Data Janitor, clean up on aisle 3!” Normalizing, organizing, and cleaning your data can provide significant compaction and increase the effectiveness of your analytics.

While all corporations have data retention policies, the very nature of machine learning demands an exceptional amount of both current and historical data to be effective. Since Machine Learning (ML) relies heavily on mathematical and statistical abstraction, more data is better! (For the ML, not necessarily for your budget!)

When considering a hybrid or multi-cloud model for your data lakes, be sure to evaluate your systems and how they will respond to the increased latency caused by partitioning across networks. Our current solution is based on Hadoop, which provides the benefit of HDFS, a redundant, distributed file system. Hadoop’s base technology schedules MapReduce work based on the physical location of the data, allowing us to be far more resilient to node outages, as well as dramatically reducing the impact of network delays. Seek the advice of your network architects, as well as your data engineers, to tailor a solution for your business needs.
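As a small illustration of how the work travels to the data rather than the other way around, here is what a Hadoop Streaming job can look like in Python. The input format (a hypothetical tab-delimited opt-in log whose first field is an affiliate id) is an assumption for the example; Hadoop runs the mapper on the nodes that already hold each HDFS block and ships only the much smaller intermediate counts across the network.

```python
#!/usr/bin/env python3
# mapper.py -- a minimal Hadoop Streaming mapper. Reads a hypothetical
# tab-delimited opt-in log whose first field is an affiliate id and emits
# one "<affiliate_id>\t1" record per line.
import sys

for line in sys.stdin:
    affiliate_id = line.strip().split("\t")[0]
    if affiliate_id:
        print(f"{affiliate_id}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- sums the counts emitted by the mapper. Hadoop Streaming
# delivers mapper output sorted by key, so a simple running total works.
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t")
    if key != current_key:
        if current_key is not None:
            print(f"{current_key}\t{count}")
        current_key, count = key, 0
    count += int(value)
if current_key is not None:
    print(f"{current_key}\t{count}")
```

Both scripts are launched through the hadoop-streaming jar that ships with your distribution, with the -input and -output options pointing at HDFS paths and -mapper and -reducer naming the two scripts.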

5) Queue and Messaging

services are extremely useful for inter-process communications, as well as communication between your vendor’s infrastructure and your company’s applications. Whether it is Azure Queue Service, Amazon Simple Queue Service (SQS), or IBM’s MQ messaging, each is unique to its vendor.

We use SQS to simplify handling of invoicing, usage, and authentication data for marketplace clients, and we offer publish/subscribe (PubSub) services allowing:

· Affiliate companies to subscribe to opt-in data updates.

· Members to subscribe to their data usage notifications, including capped published data usage.

· Members may also publish additional contact information and preferences, allowing affiliates to use their data for secondary purposes. Just think of the benefits when potential clients tell you how they want to hear from you: preferred contact frequency, alternative methods of contact, and even published interests you can use to reach out about your products and services!

Because queuing and messaging services vary significantly between suppliers, they must be wrapped by generic functions to ensure your applications remain vendor-agnostic.
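Here is a minimal sketch of what that wrapping can look like, again in Python. Only an SQS backend is filled in; the queue URL and message contents are hypothetical placeholders, and an Azure or IBM MQ backend would simply implement the same two methods.

```python
# A minimal sketch of wrapping queue services behind a generic interface.
from abc import ABC, abstractmethod

class MessageQueue(ABC):
    @abstractmethod
    def send(self, body: str) -> None: ...
    @abstractmethod
    def receive(self, max_messages: int = 1) -> list[str]: ...

class SqsQueue(MessageQueue):
    def __init__(self, queue_url: str):
        import boto3                      # vendor SDK stays inside the wrapper
        self._sqs = boto3.client("sqs")
        self._url = queue_url

    def send(self, body: str) -> None:
        self._sqs.send_message(QueueUrl=self._url, MessageBody=body)

    def receive(self, max_messages: int = 1) -> list[str]:
        resp = self._sqs.receive_message(QueueUrl=self._url,
                                         MaxNumberOfMessages=max_messages)
        return [m["Body"] for m in resp.get("Messages", [])]

# Application code sees only MessageQueue, never boto3 (URL is a placeholder):
# queue: MessageQueue = SqsQueue("https://sqs.us-east-1.amazonaws.com/123456789012/opt-ins")
# queue.send('{"member_id": "m-42", "event": "opt_in"}')
```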

6) Storage Services

come in a variety of flavors. Note that flavor selection (that is, the performance tier) can significantly impact your cost! AWS Elastic Block Store, S3, and Elastic File System, along with Azure Blob, Table, and File storage, all bear a striking similarity to Network Attached Storage in your own data centers. Some configuration is required, but this will map easily between cloud vendors. Your cloud architects can easily weigh the performance and cost trade-offs to optimize your flavor selections.

Backups and Tigers and Bears — Oh My!

While this may seem to be the easiest element to manage in a multi-cloud implementation, bear in mind that backups need special consideration. If you are vendor-locked and someone trips over your plug, will the IRS call for an audit on the day you get de-platformed?

Give due consideration to hybridization and distribution of backups, especially to duplicating mission-critical backups on-premises.
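As one example of that duplication, here is a minimal sketch that pulls a copy of cloud-hosted backup objects down to local disk, so a vendor dispute cannot take your backups with it. The bucket name, prefix, and destination path are hypothetical placeholders, and a production job would add checksum comparison, retries, and scheduling.

```python
# A minimal sketch of mirroring cloud-hosted backups to on-premises disk.
import os
import boto3

s3 = boto3.client("s3")
BUCKET, PREFIX, LOCAL_ROOT = "coop-backups", "financials/", "/srv/backups"

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        dest = os.path.join(LOCAL_ROOT, key)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        s3.download_file(BUCKET, key, dest)   # compare checksums in real use
```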

7) General Infrastructure

includes all the individual network niceties we have come to love about the cloud: Private Clouds, Load Balancing, Geo-Redundancy, Autoscaling, Proxies, Encryption, Access Control, Identity Management, and on, and on, and on. Suffice it to say, scalability, redundancy, and geographic distribution are the features which keep us in the cloud. Leveraging someone else’s investment in infrastructure to handle un-forecasted loads, without massive and unplanned investment of your own, is the stuff of dreams.

When we seriously consider alternate architectures, it rapidly becomes clear why returning to on-premises or colocation is no longer practical. When using a hybrid or multi-cloud model, allow adequate budget to support multiple initial configuration efforts, as well as increased maintenance costs, but be equally prepared to take advantage of discounts brokered from competitive offers for individual components.

How do we best protect our company and clients when considering the following factors?

· Risk mitigation cost vs. estimated cost of broad area failures

· Historical usage (by region, client type, patterns over time, etc.)

· Steady-state usage vs. peak burst usage

· Future plans for growth or decline in usage, changing cloud provider, changing instance families, moving regions, or shifting to another compute model (like containers, serverless architectures, etc.)

· Balance between savings over time and cash payments up front

· Level of flexibility required

If you are not using a hybrid cloud model or multiple cloud providers, you should consider this as your next evolution. Most enterprise-sized companies (76%) are either using or moving to some combination of traditional and modern (microservices/containerized) architectures. The combination model is quite well suited to multi-cloud because you can place each required architecture with the lowest-cost provider.

So, to wrap up with some suggestions . . . Yes! Finally!

Avoid Vendor lock-in

Use containers, wrap your vendor-specific functionality, and keep your data portable and distributed. Consider allowing potential lock-in for specialty workloads when it can substantially reduce cost.

Build Resilience

All cloud providers suffer outages, running the risk of a mission-critical application becoming unavailable. A multi-cloud strategy will bring deployment and maintenance pain, but managed with care, it improves security, failover, and disaster recovery.

Optimize Performance by Making Tradeoffs

Your organization can improve performance metrics such as jitter, latency, and packet loss by choosing cloud providers with data centers geographically near your clients, as performance generally degrades with each additional network hop between servers.

Ensure Compliance

Data governance regulations, like the EU’s GDPR, California’s CCPA, or Brazil’s Lei Geral de Proteção de Dados Pessoais (LGPD), often require customer data to be held within geo-political boundaries. This can be resolved with a multi-cloud approach. Not-for-profit registered agents, like the Privacy Co-op, can help your organization secure Affirmative Express Consent with cross-jurisdictional opt-ins for your data use, eliminating the risk of governmental fines and providing compliance with worldwide privacy laws across all uses of data.



Privacy Co-op Media Staff

https://Privacy.coop You own the rights to your information, and businesses desire your direction. Learn about your choices and direct them in less than 3 minutes.