cloudification: emergent problems

Photo by Stephan Seeber on Pexels.com

The great philosopher of science Sir Karl Popper said that all problems are either like clocks or like clouds. A clock problem can be taken apart, and its subproblems can be addressed independently. A cloud problem is more complicated and dynamic; its parts are interrelated and interlocked. We need to solve a cloud problem as a whole, not solely part by part. In computer science, many problems are clock-like: we are trained to break complex problems apart, model them, and find good algorithms or heuristics to solve them. Cloudification, however, may be an emergent (cloud-like) problem, because a cloud solution usually contains many interrelated subcomponents, and a change in one of them can affect the others.

Hosting solutions in the cloud (not the clouds that Popper was referring to) may be an emergent problem, and cloud vendors provide tools to alleviate the situation. In this article, we share some of the challenges, solutions, and talking points.

It is evident that HashiCorp's Terraform is driving many successful cloud deployments. Many software vendors are embracing Terraform by publishing and maintaining Terraform modules. These modules can be tested independently and reused, so automated infrastructure deployments can be scripted and maintained. Human error can be reduced, and scripts can be reviewed and versioned (usually with git). We often hear about Infrastructure as Code (IaC); for many enterprises it is a prerequisite for cloud infrastructure deployments.

It is nice to automate setting up and tearing down cloud infrastructure. This naturally leads to the question: “Who is authorized to perform these tasks?” Authorization is needed; otherwise, someone who does not have the right permissions can

  • bring up a huge deployment that the organization has to pay for
  • tear down a deployment abruptly and cause chaos

A service principal (or technical user) or another identity can be given the permissions to act accordingly. This requires careful thinking and proper policy enforcement. The authorization model should differ across environments such as Production, Canary, Development, and QA. This can obviously become complicated if proper procedures are not in place. Fortunately, many large enterprises design and implement their authorization models comprehensively. Security experts grant the appropriate permissions, which are provided by the cloud vendor, to groups of users or identities. This goes beyond granting permissions for infrastructure deployments; it includes allowing identities (which can be users, applications, etc.) to perform other tasks such as approving git pull requests and creating new pipelines. The next things to consider are authentication and identity management. On Microsoft Azure, Azure Active Directory (since renamed Microsoft Entra ID) is usually the identity provider (IdP), and there may be a protocol layer such as OAuth 2.0 or SAML 2.0 on top of it.
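To make the idea of a per-environment authorization model concrete, here is a minimal sketch in Python. It is not any cloud vendor's API: the environment names come from the paragraph above, while the identity names, the `POLICY` table, and the `is_allowed` function are hypothetical and only illustrate a deny-by-default lookup.

```python
# Hypothetical policy table: which identities may run which action
# in each environment. Identity names are made up for illustration.
POLICY = {
    "Production": {
        "apply": {"sp-release-pipeline"},
        "destroy": set(),  # nobody may tear down Production
    },
    "Canary": {
        "apply": {"sp-release-pipeline"},
        "destroy": {"sp-release-pipeline"},
    },
    "Development": {
        "apply": {"sp-dev-pipeline", "alice"},
        "destroy": {"sp-dev-pipeline", "alice"},
    },
    "QA": {
        "apply": {"sp-qa-pipeline"},
        "destroy": {"sp-qa-pipeline"},
    },
}


def is_allowed(identity: str, action: str, environment: str) -> bool:
    """Return True only if the policy explicitly grants the action.

    Unknown environments or actions fall through to an empty set,
    so the default answer is "deny".
    """
    return identity in POLICY.get(environment, {}).get(action, set())
```

In this sketch a developer such as `alice` can bring up and tear down a Development deployment, but `is_allowed("alice", "destroy", "Production")` is false. In practice the cloud vendor's role and policy system holds this table, not application code.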

From these few points (we have not touched on many other components of a cloud solution), it is clear that different components are interrelated. Infrastructure is related to authorization, which depends on authentication and identity management. Changes in authentication or authorization may affect infrastructure; conversely, changes in infrastructure may require changes in authorization.
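The relationships described above can be written down as a small dependency graph and queried for the components that a change may reach. The sketch below is only an illustration: the component names follow the paragraph, but the exact edges in `DEPENDENTS` and the `affected_by` helper are assumptions, and a real inventory would contain many more components.

```python
from collections import deque

# Edges point from a component to the components that may be affected
# when it changes. This is an assumed reading of the paragraph above.
DEPENDENTS = {
    "identity_management": ["authorization"],
    "authentication": ["authorization"],
    "authorization": ["infrastructure"],
    "infrastructure": ["authorization"],  # infrastructure changes may need new permissions
}


def affected_by(changed: str) -> list[str]:
    """Return every component reachable from `changed`, excluding itself.

    A breadth-first search with a `seen` set, so the cycle between
    authorization and infrastructure does not loop forever.
    """
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in DEPENDENTS.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    seen.discard(changed)
    return sorted(seen)
```

With these assumed edges, a change in authentication reaches both authorization and infrastructure, while a change in infrastructure reaches authorization only. Keeping such a map up to date is one way for component owners to see who needs to be consulted before a change.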

It is crucial for component owners to understand these interdependencies and requirements, because we need to adapt constantly to a dynamic environment. A new cloud deployment therefore needs time to stabilize while administrators continuously monitor cloud activity. The good news is that all major cloud vendors offer monitoring tools and dashboards for tracking cloud-related activity, and we can create triggers for abnormalities.
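Vendor monitoring tools have their own alert rules, so the following is only a sketch of the idea behind a trigger for abnormalities. It assumes we already have a history of some activity metric (for example, hourly deployment counts) and flags a new value that lies far from the historical mean. The function name `is_abnormal` and the default threshold of three standard deviations are assumptions made for illustration.

```python
from statistics import mean, stdev


def is_abnormal(history: list[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` when it is more than `threshold` sample standard
    deviations away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate a spread
    spread = stdev(history)
    if spread == 0:
        # All historical values are identical: any different value is unusual.
        return latest != history[0]
    return abs(latest - mean(history)) > threshold * spread
```

For a history such as `[10, 12, 11, 9, 10]`, a new value of 50 is flagged while a value of 11 is not. A new deployment has little history, which is one reason it needs time to stabilize before such triggers become reliable.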

We like to think that we can bring different components into our cloud infrastructure and they will work together as a whole. In fact, we need to add more layers for orchestration, authorization (for component-to-component communication), notifications, triggers, etc. Moreover, there is often more than one way to do each of these, and we need to weigh the advantages and disadvantages of each solution.

Transitioning to the cloud can be a mountainous task if we do not know the procedures and available tools. I suggest that you reach out to your cloud vendors for assistance; they can provide more insights and best practices. In my opinion, deploying a cloud solution is not the most difficult task. People struggle more with debugging and with operational matters (such as renewing X.509 certificates). Many components are interrelated, so a failure in one component may have a domino effect. It is important to carry out what-if analysis, failure simulations, and testing.
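Certificate renewal is one operational matter that is easy to check ahead of time. The sketch below uses only Python's standard library and assumes the expiry date is available as a string in the format of the `notAfter` field returned by `ssl.SSLSocket.getpeercert()`. The helper names `days_until_expiry` and `needs_renewal`, and the 30-day renewal window, are assumptions for illustration.

```python
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Return the number of days until the certificate expires.

    `not_after` is expected in the format of the 'notAfter' field from
    ssl.SSLSocket.getpeercert(), e.g. 'Jan  5 09:34:43 2018 GMT'.
    `now` is seconds since the epoch; it defaults to the current time
    and can be overridden for what-if checks.
    """
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / SECONDS_PER_DAY


def needs_renewal(not_after: str, renew_within_days: float = 30,
                  now: Optional[float] = None) -> bool:
    """Return True when the certificate expires within the renewal window."""
    return days_until_expiry(not_after, now) <= renew_within_days
```

Passing an explicit `now` lets us ask what-if questions such as "would this certificate still be valid 60 days from today?" without waiting for the failure to happen.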

