GitOps: Improving Your Developer Experience
To improve the developer experience, you'll need GitOps. It's the next big thing.
Effective automation lies at the core of success in platform engineering. In this post, we'll delve into the steps for implementing automation successfully using GitOps, and share insights from our experience in significantly improving the developer experience at a large financial services client.
GitOps empowers teams to deliver value by enhancing visibility into their infrastructure, optimizing key DevOps Research and Assessment (DORA) metrics, and establishing a repeatable pattern for automating tasks. Before we explore its benefits, let's first understand the essence of GitOps.
Why Go for GitOps?
GitOps sets out to store the state of your infrastructure in Git. All changes to the infrastructure occur through the Git repository, ensuring clarity and creating an audit trail for each and every change made. Automation in a GitOps environment involves making infrastructure changes through automated Git commits.
Typically, GitOps is paired with a deployment tool that translates the Git repository content into automated deployments.
By maintaining infrastructure configuration in Git and executing changes through Git, even newcomers or team members from different departments can quickly grasp the infrastructure by examining the repository. This practice also facilitates staff engineers in gaining a clearer understanding of the high-level system architecture.
DORA Metrics Based on GitOps
GitOps proves instrumental in elevating key DORA metrics outlined in the DevOps Research and Assessment report:
1. Deployment Frequency: How frequently an organization successfully releases to production.
2. Lead Time for Changes: The time it takes for a commit to reach production.
3. Change Failure Rate: The percentage of deployments causing failures in production.
4. Time to Restore Service: How long it takes to recover from a failure in production.
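To make the four definitions concrete, here is a minimal sketch that computes them from a list of deployment records. The `Deployment` fields and the `dora_metrics` function are hypothetical names chosen for illustration, and time to restore is approximated as the gap between the failing deployment and the restore time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape for this sketch)."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a production failure?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def dora_metrics(deployments: list[Deployment], period_days: int) -> dict:
    """Compute the four DORA metrics over a reporting period."""
    if not deployments:
        raise ValueError("need at least one deployment")
    total = len(deployments)
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Assumption: the failure starts at deployment time.
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployment_frequency_per_day": total / period_days,
        "mean_lead_time": sum(lead_times, timedelta()) / total,
        "change_failure_rate": len(failures) / total,
        "mean_time_to_restore": (
            sum(restores, timedelta()) / len(restores) if restores else None
        ),
    }
```

Because every change in a GitOps setup is a commit, the commit and deployment timestamps needed for these records can usually be read from Git history and the deployment tool.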
These metrics are pivotal for a positive developer experience. GitOps contributes to their improvement significantly. For instance, when managing a Kubernetes cluster through GitOps, in the event of a cluster failure, restoring service involves pointing a rebuilt or new cluster to the Git repository. This streamlined process enhances the Time to Restore Service metric.
In contrast, if cluster administration relies on traditional CI/CD pipelines that directly create a namespace on the cluster or install a Helm chart, restoring service becomes a more complex task.
Without GitOps, multiple pipelines may need to be triggered in the correct order, resulting in substantial delays, especially in enterprise-scale systems with numerous namespaces, role bindings, and policies.
In discussions about GitOps, the focus often centers on specific tools; however, it's crucial to note that the only essential tool is Git itself. At Contino, we prioritize selecting the most suitable tool for a given task rather than adhering to a particular vendor. As a result, we emphasize discussing the fundamental principles that lead to success in GitOps rather than advocating for a specific tool.
Let's delve into the key principles that form the foundation of GitOps success:
1. Developer Focus: Concentrate on the goal of automation and adopt a pragmatic approach to streamline tasks. Map out and optimize workflows to enhance the developer experience.
2. Use Git for Everything: Embrace Git as the central hub for all changes. Whether automated or manual, utilizing Git ensures auditability, version control, and a clear understanding of the system's configuration.
3. Design Shared Responsibility: Clearly define the responsibilities of both development and platform teams. Establishing this shared responsibility model fosters clarity and promotes collaboration.
4. Focus on DevOps Fundamentals: Implement a well-defined branching strategy, such as trunk-based development. Take ownership and collaborate closely with teams consuming your infrastructure.
Enhancing The Experience for Developers
Platform engineering strives to enhance the developer’s experience by bridging the gaps between Development and Operations. This involves implementing an automated workflow that simplifies how developers interact with infrastructure, ultimately reducing cognitive load. Overcoming challenges in large enterprises is pivotal for improving software systems.
At the core of DevOps is the collaboration between developers and operations. When building a platform, it's crucial to avoid reintroducing silos from the pre-DevOps era. GitOps plays a vital role in breaking down these silos by increasing infrastructure visibility and automating processes.
While visibility alone doesn't guarantee collaboration, it sets the stage by creating a system visible to both developers and operators.
GitOps in Practice: A Case Study
In a financial services client project, visibility emerged as a significant challenge. Multiple clusters lacked insight into their current or desired states. To address this, we implemented ArgoCD for Continuous Deployment and developed GitOps automation for cluster administration.
Focusing on well-defined tasks, such as creating namespaces and network policies, allowed us to deliver value promptly. This approach enabled us to automate environments for developers, providing flexibility for changes as requirements evolved.
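As an illustration of such a well-defined task, the sketch below builds a namespace and a default-deny ingress NetworkPolicy for a team. It is not the automation we built for the client: the `team_workspace` and `render` helpers, the naming scheme, and the labels are assumptions made for this example. The output is JSON wrapped in a `List` object, which Kubernetes accepts alongside YAML, so the example needs only the standard library.

```python
import json


def team_workspace(team: str, environment: str) -> list[dict]:
    """Build a Namespace plus a default-deny ingress NetworkPolicy for one team."""
    name = f"{team}-{environment}"  # hypothetical naming convention
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": {"team": team, "environment": environment},
        },
    }
    default_deny = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": name},
        # An empty podSelector selects every pod in the namespace.
        "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
    }
    return [namespace, default_deny]


def render(manifests: list[dict]) -> str:
    """Serialize the manifests as a single v1 List document."""
    doc = {"apiVersion": "v1", "kind": "List", "items": manifests}
    return json.dumps(doc, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render(team_workspace("payments", "dev")))
```

In a GitOps workflow the automation would write this output to a file in the repository and commit it, leaving the deployment tool to apply it.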
The initial development phase garnered support and feedback, empowering the team to upskill and contribute to the system. Our approach at Contino is to start with a small task while upskilling the team through a dual delivery and upskilling model, an integral part of our momentum framework.
As the project evolved, integration with the broader system became necessary. Integrating a GitOps automated system with a slower-moving system raises several considerations. The pipeline serves as the interface and should ideally remain consistent. Applying semantic versioning to triggers and variables keeps that interface stable. The Git repository, which holds records of the subscriptions created by automation, stays aligned with the mainline/trunk branch.
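One way to picture semantic versioning at the pipeline interface is a compatibility check between the version a consumer expects and the version the pipeline offers. The sketch below is an assumption about how such a check could look, not a description of the client's pipeline. It handles only plain MAJOR.MINOR.PATCH versions and treats "same major, not older" as compatible.

```python
import re
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


# Core MAJOR.MINOR.PATCH only; pre-release and build metadata are out of scope here.
_SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse(text: str) -> Version:
    """Parse '1.4.2' or 'v1.4.2' into a Version tuple."""
    match = _SEMVER.match(text)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return Version(*(int(part) for part in match.groups()))


def is_compatible(consumer_expects: str, pipeline_offers: str) -> bool:
    """True when the offered version has the same major and is not older."""
    want, have = parse(consumer_expects), parse(pipeline_offers)
    return have.major == want.major and have >= want
```

With this rule, a consumer pinned to `1.2.0` keeps working when the pipeline moves to `1.4.1`, while a move to `2.0.0` signals a breaking change that the slower-moving system must opt into.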
This comprehensive approach ensures the seamless integration of GitOps into the development and operational workflows, showcasing the practicalities and considerations for success.
Tools for Continuous Deployment
While GitOps primarily centers on principles rather than specific tools, acknowledging projects that complement GitOps is valuable. GitOps originated in the Kubernetes ecosystem, offering an approach for automated changes tracked through version control.
Continuous Deployment systems like ArgoCD and Flux play a pivotal role in GitOps by automatically deploying Git repository contents. Their objective is to align the system's state with the desired state stored in the Git repository. Although GitOps is widely associated with Kubernetes, it extends beyond.
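The core idea of aligning live state with desired state can be sketched as a diff between two maps. This is a simplified illustration and not how ArgoCD or Flux are implemented; the `plan` function and the resource keys are hypothetical.

```python
def plan(desired: dict[str, dict], live: dict[str, dict]) -> list[tuple[str, str]]:
    """Compare desired state (from Git) with live state and list the needed actions.

    Keys identify resources, for example "Namespace/team-a"; values are their specs.
    """
    actions = []
    for key in sorted(desired):
        if key not in live:
            actions.append(("create", key))   # in Git, missing from the system
        elif live[key] != desired[key]:
            actions.append(("update", key))   # present, but drifted from Git
    for key in sorted(set(live) - set(desired)):
        actions.append(("orphan", key))       # in the system, no longer in Git
    return actions


if __name__ == "__main__":
    desired = {"Namespace/team-a": {"labels": {"team": "a"}}, "Namespace/team-b": {}}
    live = {"Namespace/team-a": {"labels": {}}, "Namespace/old": {}}
    for action, key in plan(desired, live):
        print(action, key)
```

A real tool runs this kind of comparison continuously, so manual changes made directly on the system show up as drift and get corrected back to what Git describes.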
Crossplane is a Kubernetes-based control plane that provisions infrastructure on Azure, AWS, and GCP. Leveraging Crossplane allows the application of similar techniques for cloud-native applications and infrastructure.
GitOps can also be implemented using Terraform, notably with the Flux Terraform Controller. Detailed information on these tools and their utilization on internal developer platforms is available in our comprehensive documentation.
Complexity and Problems
Configuring Continuous Deployment tools can prove unexpectedly intricate. This complexity arises from transforming a point-in-time manifest, such as a YAML file, and ensuring the continuous deployment tool aligns the system with that state.
Our experiences revealed challenges in scenarios like creating Kubernetes clusters from scratch. The Git repository encompassed everything necessary for setting up the cluster, including installations for Istio, Cert-Manager, and other required projects.
Security demands routing all traffic through Istio, which includes traffic for ArgoCD (the CD tool). Installing ArgoCD before Istio posed challenges, requiring a subsequent Istio installation and an ArgoCD restart to activate the Istio sidecar.
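Ordering problems like this one are easier to reason about when the dependencies are written down explicitly. The sketch below uses Python's standard `graphlib` module to derive a bootstrap order. The step names and the dependency graph are illustrative assumptions based on the scenario above, not our actual bootstrap configuration.

```python
from graphlib import TopologicalSorter

# Each step maps to the steps that must finish before it can run.
# Assumption for this sketch: ArgoCD is installed first and then installs the rest.
BOOTSTRAP_STEPS = {
    "install-argocd": set(),
    "install-istio": {"install-argocd"},
    "install-cert-manager": {"install-argocd"},
    # ArgoCD only picks up the Istio sidecar after a restart once Istio exists.
    "restart-argocd": {"install-istio"},
}


def install_order(steps: dict[str, set[str]]) -> list[str]:
    """Return the steps in an order that respects every dependency.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(steps).static_order())


if __name__ == "__main__":
    for step in install_order(BOOTSTRAP_STEPS):
        print(step)
```

Keeping the graph in the repository makes a circular dependency fail loudly at planning time instead of surfacing halfway through a cluster build.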
Testing in a Nutshell
Sustaining a high deployment frequency requires robust automated testing and validation. Confidence in system changes comes from testing early, in line with the shift-left principle. Testing should begin even before code is committed, as we did with Kustomize.
We implemented a Kustomize build as a pre-commit hook for any automation script changes. This approach prevented committing changes that broke the Kustomize build, providing developers with immediate feedback during the commit stage rather than discovering issues later when creating a pull request.
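A hook of this kind could look roughly like the sketch below, which runs `kustomize build` for every directory containing a `kustomization.yaml`. It is a minimal reconstruction under stated assumptions rather than our original hook: the function names are hypothetical, it assumes the `kustomize` binary is on the PATH, and it builds every root instead of only the changed ones.

```python
import subprocess
from pathlib import Path
from typing import Callable


def kustomize_roots(repo: Path) -> list[Path]:
    """Return every directory under the repo that contains a kustomization.yaml."""
    return sorted(p.parent for p in repo.rglob("kustomization.yaml"))


def run_kustomize(directory: Path) -> bool:
    """Return True when `kustomize build <directory>` exits successfully."""
    result = subprocess.run(
        ["kustomize", "build", str(directory)],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        print(result.stderr)
    return result.returncode == 0


def check(repo: Path, build: Callable[[Path], bool] = run_kustomize) -> int:
    """Build every root; return 0 if all succeed, 1 otherwise (a hook exit code)."""
    failed = [d for d in kustomize_roots(repo) if not build(d)]
    for directory in failed:
        print(f"kustomize build failed: {directory}")
    return 1 if failed else 0
```

A pre-commit entry point would call `sys.exit(check(Path(".")))`, so a non-zero result aborts the commit. The `build` parameter exists so the logic can be tested without the real binary.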
Two key tests are crucial:
1. A test to validate output files against expected output, using defined input parameters.
2. A test to deploy changes to a cluster, ensuring correct deployment. This is particularly important when altering YAML templates to guarantee not only passing linting but also practical functionality.
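The first of these tests is often called a golden-file test. Below is a minimal sketch of the idea, assuming a hypothetical namespace template and JSON output so that only the standard library is needed. Comparing parsed structures rather than raw text keeps whitespace-only differences from failing the test.

```python
import json
from string import Template

# Hypothetical template; real automation would load this from the repository.
NAMESPACE_TEMPLATE = Template(
    '{"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": "$team-$env"}}'
)


def render_namespace(team: str, env: str) -> dict:
    """Render the template with the given inputs and parse the result."""
    return json.loads(NAMESPACE_TEMPLATE.substitute(team=team, env=env))


def matches_golden(rendered: dict, golden_text: str) -> bool:
    """Compare rendered output with the expected (golden) file content."""
    return rendered == json.loads(golden_text)


if __name__ == "__main__":
    golden = '{"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": "payments-dev"}}'
    assert matches_golden(render_namespace("payments", "dev"), golden)
    print("golden file matches")
```

The second test, deploying to a real cluster, cannot be reduced to a snippet like this, which is exactly why both tests are needed.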
Shifting left also applies to the validation of Kubernetes manifests. A KubeCon 2023 talk highlighted how The New York Times achieved this using Kubeconform, enhancing developer productivity by detecting errors earlier in the development cycle.
In Between Security and Reliability
GitOps involves tradeoffs between reliability and security. It implicitly assumes that the pipeline or compute writing to the repository holds elevated privileges, including the ability to write to the mainline branch. To mitigate that risk, put safeguards such as branch protection or GitHub deployments in place.
Properly generating and storing secrets ensures restricted access, while reducing the blast radius involves separating automation into different repositories or utilizing different pipelines with path-scoped roles, especially in mono repos.
The primary tradeoff in security and reliability is between access control and flexibility. Path-scoped roles might be necessary even without a mono repo, for example to separate roles for production from those for development and integration environments. While not universally applicable, a CODEOWNERS file combined with branch protection can enforce this for specific paths.
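The idea behind path-scoped roles can be sketched as a check that every file touched by a commit falls under a path the committing role is allowed to change. The role names and directory layout below are hypothetical, and a real setup would enforce this in the Git host or pipeline rather than in a standalone script.

```python
from pathlib import PurePosixPath

# Hypothetical mapping of automation roles to the repository paths they may change.
ROLE_PATHS = {
    "dev-automation": ["environments/dev", "environments/integration"],
    "prod-automation": ["environments/prod"],
}


def allowed(role: str, changed_files: list[str], role_paths: dict = ROLE_PATHS) -> bool:
    """True when every changed file sits under a path granted to the role."""
    prefixes = [PurePosixPath(p) for p in role_paths.get(role, [])]
    for name in changed_files:
        path = PurePosixPath(name)
        # Reject absolute paths and ".." segments so prefixes cannot be escaped.
        if path.is_absolute() or ".." in path.parts:
            return False
        if not any(path.is_relative_to(prefix) for prefix in prefixes):
            return False
    return True
```

Here `allowed("dev-automation", ["environments/prod/ns.yaml"])` is false, so a compromised development pipeline cannot change production configuration, which is the blast-radius reduction described above.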
Utilizing CD Tooling
GitOps depends on Continuous Deployment (CD) tools, so it's essential to acknowledge the design tradeoffs of placing infrastructure state in code and employing those tools. CD tools like ArgoCD and Flux might introduce initial complexity due to learning curves and mindset differences. Teams must consider version control, branching strategy, and tooling configuration.
The initial investment pays off with simplified operations, acknowledging that complexity must exist somewhere and handling it upfront is preferable to burdening operations teams in the future. This tradeoff between initial and sustained velocity underscores the importance of the initial investment, enabling rapid movement and high deployment frequency with adequate automated testing and validation.
A commonly overlooked scenario is managing the deletion of resources. With FinOps gaining prominence for cost optimization, attention is shifting to reducing cloud costs by automatically deleting resources. Deletion is a litmus test for a system's adaptability to change.
CD tools offer configuration options for automatic or manual deletion. Manual deletion requires users to review the items to be deleted and press a sync button for removal. Deciding on deletion mechanisms aligns with risk appetite.
While automatic deletion can be secure with appropriate guardrails, it demands time for development and should be a product-driven choice.
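To show what such guardrails might look like, the sketch below decides whether resources that have disappeared from Git may be deleted automatically. The function, the protected list, and the threshold are all assumptions for illustration; CD tools expose their own settings for this, which the sketch does not model.

```python
def prune_decision(
    desired: set[str],
    live: set[str],
    protected: frozenset[str] = frozenset(),
    max_auto: int = 5,
    approved: bool = False,
) -> tuple[str, list[str]]:
    """Decide what to do with resources that exist live but are gone from Git.

    Returns (decision, resources), where decision is one of
    "nothing", "delete", "needs-review", or "blocked".
    """
    to_delete = sorted(live - desired)
    blocked = [r for r in to_delete if r in protected]
    if blocked:
        return ("blocked", blocked)        # never auto-delete protected resources
    if not to_delete:
        return ("nothing", [])
    if len(to_delete) <= max_auto or approved:
        return ("delete", to_delete)       # small change, or a human signed off
    return ("needs-review", to_delete)     # large deletion: ask for manual sync


if __name__ == "__main__":
    live = {"ns/a", "ns/b", "ns/prod-db"}
    print(prune_decision({"ns/a", "ns/prod-db"}, live))
    print(prune_decision(set(), live, max_auto=2))
```

The threshold is where risk appetite becomes a number: a low value pushes more deletions to manual review, a high value favors cost savings and speed.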
GitOps Teams' Shared Responsibility Model
The concept of DevOps originated from the idea of collaboration between Developers and Operations personnel. The initial discussions by John Allspaw and Paul Hammond underscored the importance of breaking down silos across disciplines to collaboratively build products.
In defining an ideal model and delineating responsibilities for teams adopting GitOps, we advocate for a shared approach across multiple teams, each with GitOps expertise distributed among them.
Our schematic representation features three core teams: a Cloud Platform Team, a Kubernetes Platform Team, and a Development Team. While ArgoCD management falls under the purview of the Platform Team, the responsibility for GitOps is collectively shared among all three teams.
Here's a summary of how each team leverages GitOps:
1. Cloud Platform Team (Azure, in this case): Manages Subscription Vending (commit adds Terraform Config).
2. Kubernetes Platform Team: Oversees Development Team Workspaces (commit adds Namespace, ArgoCD Application, Role Bindings).
3. Development Team: Handles Application deployment (on release, updates deployment reference, and adds expected container signature).
In this shared responsibility model, GitOps is a collaborative effort, and although specific aspects may be owned by individual teams (e.g., subscription vending), multiple teams interact with tools like ArgoCD.
The Development Cycle and Policy
Git serves as the source of truth for both platform and development teams. The Platform Team is tasked with creating policy as part of their feature work. For instance, when crafting a template for team egress from a cluster using Istio, part of this work involves establishing policies to prevent the admission of incorrectly configured objects.
Simply put, security practices and policies should be integrated into the development cycle, addressing them as part of individual pieces of work rather than leaving other teams to interpret their structure. It is crucial to ensure that policies do not override configurations in Git.
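As a rough illustration of shipping policy alongside a feature, the sketch below validates the hosts of an Istio `ServiceEntry` against an allowlist. It is a stand-in written in Python for readability: in a cluster this kind of rule would normally be enforced by an admission controller, and the allowlist and function name here are hypothetical.

```python
def egress_violations(obj: dict, allowed_hosts: set[str]) -> list[str]:
    """Return the policy violations for one manifest; an empty list means it passes.

    Only Istio ServiceEntry objects are checked; other kinds are ignored.
    """
    if obj.get("kind") != "ServiceEntry":
        return []
    problems = []
    hosts = obj.get("spec", {}).get("hosts", [])
    if not hosts:
        problems.append("spec.hosts must not be empty")
    for host in hosts:
        if "*" in host:
            problems.append(f"wildcard host not allowed: {host}")
        elif host not in allowed_hosts:
            problems.append(f"host not on the egress allowlist: {host}")
    return problems


if __name__ == "__main__":
    entry = {"kind": "ServiceEntry", "spec": {"hosts": ["*.example.com"]}}
    print(egress_violations(entry, {"api.example.com"}))
```

Running the same check in the pipeline and at admission means a developer sees the violation on the pull request, before the object ever reaches the cluster.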
Using GitOps for Modern Infrastructure Development
Embracing GitOps in modern infrastructure development empowers teams to construct higher-quality, more manageable systems. This methodology facilitates improvements in all DORA metrics, leading to a higher-performing platform team.
Drawing from our experience with a large, heavily regulated financial services client, we initially implemented GitOps through a phased approach, starting with small automation components and progressively expanding while simultaneously upskilling team members.
GitOps significantly enhanced our systems and overall quality but demanded a comprehensive understanding of DevOps principles and Git. The adoption of GitOps involves making certain tradeoffs that might reduce initial velocity but ultimately contribute to increased velocity in the long run.