Ante Miličević
January 5, 2024

Sharding the Clusters Across Argo CD Application Controller Replicas

In this article, we will delve into a particular issue that might appear with your Argo CD setup when managing multiple clusters.

Argo CD is an open-source GitOps continuous delivery tool that simplifies the automation of application deployment to Kubernetes clusters. With the rising adoption of GitOps and Kubernetes, Argo CD has carved out its position as one of the most favored options in the GitOps ecosystem. This article is part of a series where we explore various Argo CD-related issues that we have encountered while delivering Argo CD enterprise support to our diverse clientele.

Problem Statement

There was an occasion when a customer encountered an issue with the inconsistent sharding process of the Argo CD Application Controller. The customer reported sluggish synchronization despite having multiple Argo CD Application Controller replicas operating for their clusters. At first glance, it appeared that the Application Controller was initially managing too many clusters, resulting in excessive resource consumption. Hence, the customer decided to scale up the Argo CD Application Controller StatefulSet.
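For reference, scaling usually involves two steps: raising the replica count of the StatefulSet and telling the controller how many replicas participate in sharding. A minimal sketch for a manifest-based installation might look like the following; the namespace and replica count are placeholders, and on older Argo CD versions the replica count is communicated via the ARGOCD_CONTROLLER_REPLICAS environment variable on the StatefulSet rather than the ConfigMap key shown here:

<pre class="codeWrap"><code># scale the Application Controller to 3 replicas
kubectl scale statefulset argocd-application-controller -n <argocd-namespace> --replicas 3

# let the controller know how many replicas take part in sharding
kubectl patch configmap argocd-cmd-params-cm -n <argocd-namespace> --type merge -p '{"data":{"controller.replicas":"3"}}'

# restart the controller so the new settings take effect
kubectl rollout restart -n <argocd-namespace> statefulset argocd-application-controller
</code></pre>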

The expectation here was that each replica of the Application Controller would concentrate on a subset of clusters, thereby dispersing the workload and memory usage - this process is known as sharding. The official Argo CD documentation even suggests utilizing sharding. Nevertheless, the sharding mechanism of the Argo CD Application Controller did not deliver much assistance. When our Argo CD support engineers delved deeply into the problem, they discovered that some Argo CD Application Controller replicas were overseeing more clusters than others.

Some replicas were not supervising any clusters at all, which shows that increasing the number of replicas does not ensure that your clusters will be uniformly sharded across the available replicas. To determine how the clusters are sharded, you can use the argocd command-line utility. If it is not readily available, it can be installed by following the Argo CD CLI installation steps.
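Linking the CLI to your Argo CD API server is typically done with argocd login; the server address and credentials below are placeholders for your own instance:

<pre class="codeWrap"><code>argocd login <argocd-server-address> --username admin --password <admin-password></code></pre>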

Once installed and linked to the Argo CD server, you can execute the following command:

<pre class="codeWrap"><code>argocd admin cluster stats</code></pre>

This command displays the shard assigned to each of the clusters managed by the connected Argo CD instance.

Below is an excerpt from the output of the aforementioned command:

<pre class="codeWrap"><code> SERVER                          SHARD  CONNECTION  NAMESPACES COUNT  APPS COUNT  RESOURCES COUNT
https://kubernetes.default.svc  0                  4                 65          217
<redacted>                      4                  4                 65          217
<redacted>                      4                  5                 73          228
<redacted>                      3                  4                 65          217
<redacted>                      0                  4                 65          217
<redacted>                      1                  5                 73          228
<redacted>                      3                  4                 65          217
<redacted>                      4                  4                 65          217
<redacted>                      4                  5                 73          228
</code></pre>

The snippet above is not a complete excerpt and serves only as a guide for interpreting the output of the argocd admin cluster stats command. In it, the first column presents the server address of a specific Kubernetes cluster, while the second column contains the index of the Argo CD Application Controller replica responsible for maintaining the live state of the corresponding cluster.

For example, the first cluster with a server address of https://kubernetes.default.svc is managed by the Argo CD Application Controller's replica with index 0, or in other words, it is argocd-application-controller-0. Take note that all replicas of the Argo CD Application Controller carry an index number as a suffix. As such, shard 0 represents argocd-application-controller-0, shard 1 indicates argocd-application-controller-1, and so forth.

Upon examining the excerpt, you'll observe that four of the clusters are managed by the argocd-application-controller-4 pod. The argocd-application-controller-0 and argocd-application-controller-3 pods each handle two clusters, while argocd-application-controller-1 handles only a single cluster, and argocd-application-controller-2 does not appear in this excerpt at all.

Troubleshooting

As an initial step in troubleshooting, our seasoned support engineers chose to examine the logs of the Argo CD Application Controller. While inspecting the logs, they came across the following message numerous times in all of the Application Controller replicas:

Note: The timestamps can vary across different logs. Since the log message was the same, we did not include additional logs here.

<pre class="codeWrap"><code>time="2023-07-21T11:27:12Z" level=info msg="Ignoring cluster <cluster-server-address> </code></pre>

Upon diving deeper into the issue to understand the sharding logic, our team discovered that the sharding function is engineered in such a manner that it designates a specific Argo CD Application Controller replica to supervise a cluster. This is based on the UUID of the secret that stores the cluster (assuming no manual intervention in the sharding process).

Logic behind Sharding in Argo CD Application Controller

The flow diagram below provides a visual representation of how the sharding logic operates internally within the Argo CD codebase. Note: The diagram showcases the sharding logic for Argo CD versions < 2.8.0. With the introduction of Argo CD 2.8.0, this sharding logic is referred to as the legacy sharding algorithm.

Image of legacy sharding algorithm
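For readers who prefer code to diagrams, the sketch below captures the essence of the legacy behaviour: unless a shard is set explicitly on the cluster secret, the controller hashes the cluster's ID (the UID of its secret) and takes the result modulo the number of replicas. This is a simplified illustration written in Go (Argo CD's implementation language), not the actual Argo CD source; the function name legacyShard and the example UUIDs are ours.

<pre class="codeWrap"><code>package main

import (
	"fmt"
	"hash/fnv"
)

// legacyShard approximates the legacy sharding behaviour: hash the cluster ID
// (the UID of the cluster secret) and take it modulo the replica count.
// Clusters whose secrets carry an explicit "shard" value bypass this function.
func legacyShard(clusterID string, replicas int) int {
	if replicas <= 0 {
		return -1 // no replicas available, the cluster is ignored
	}
	h := fnv.New32a()
	_, _ = h.Write([]byte(clusterID))
	return int(h.Sum32() % uint32(replicas))
}

func main() {
	// With UUIDs as input, nothing guarantees the hashes spread evenly
	// across replicas, which is how one replica can end up with several
	// clusters while another ends up with none.
	ids := []string{
		"4c1f3a0e-9a7c-4b6c-8a2e-1f0d2b3c4d5e",
		"b2a9d8c7-6e5f-4a3b-9c8d-7e6f5a4b3c2d",
		"0e1d2c3b-4a59-4687-95a4-b3c2d1e0f9a8",
	}
	for _, id := range ids {
		fmt.Printf("cluster %s -> shard %d\n", id, legacyShard(id, 3))
	}
}
</code></pre>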

Solution for uniform cluster sharding across Argo CD Application Controller replicas

In the current landscape, there are two methods of addressing such a situation:

  • A. Utilizing a round-robin algorithm
  • B. Manually defining the shard

In our case, our team proceeded with Solution B, as it was the sole option available when the issue arose.

However, with the launch of Argo CD 2.8.0 (released on August 7, 2023), matters have improved significantly. Presently, there are two approaches to handling the sharding problem with the Argo CD Application Controller:

Solution A: Use the Round-Robin sharding algorithm (available only for Argo CD 2.8.0 and later releases)

An issue was highlighted on GitHub regarding the sharding algorithm of the Argo CD Application Controller, and it was resolved in Argo CD 2.8.0 with pull request 13018. This implies that users can update to version 2.8.0 or later and adjust the sharding algorithm to eliminate this problem. If an upgrade to 2.8.0 is not feasible or desirable, you might consider opting for Solution B.

Nonetheless, it's worth mentioning that, as of the time of writing this post, the new round-robin sharding algorithm is not the default sharding algorithm for the Argo CD Application Controller. The system continues to use the legacy sharding algorithm as the default one.

How to configure the Argo CD Application Controller to use a round-robin sharding algorithm?

To configure the sharding algorithm in Argo CD 2.8.0 or later, it's necessary to set controller.sharding.algorithm to round-robin in the argocd-cmd-params-cm configmap.
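For reference, after the change the relevant part of the ConfigMap would look roughly like this (only the data key shown below matters; the rest of argocd-cmd-params-cm stays untouched):

<pre class="codeWrap"><code>apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: <argocd-namespace>
data:
  # use the round-robin sharding algorithm instead of the legacy one
  controller.sharding.algorithm: round-robin
</code></pre>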

If your installation of Argo CD was performed via manifest files, connect to the cluster where Argo CD operates, modify the namespace in the following command, and execute it:

<pre class="codeWrap"><code>kubectl patch configmap argocd-cmd-params-cm -n <argocd-namespace> --type merge -p '{"data":{"controller.sharding.algorithm":"round-robin"}}'</code></pre>

Upon successful update of the configmap, execute a restart of the Argo CD Application Controller statefulset with the following command:

<pre class="codeWrap"><code>kubectl rollout restart -n <argocd-namespace> statefulset argocd-application-controller</code></pre>

To validate that the Argo CD Application Controller is using a round-robin sharding algorithm, execute the next command:

<pre class="codeWrap"><code>kubectl exec -it argocd-application-controller-0 -- env | grep ARGOCD_CONTROLLER_SHARDING_ALGORITHM</code></pre>

The expected output is:

<pre class="codeWrap"><code>ARGOCD_CONTROLLER_SHARDING_ALGORITHM=round-robin</code></pre>

If you're maintaining Argo CD via Helm, simply add the controller.sharding.algorithm: "round-robin" key-value pair under .config.params in the values file and install/upgrade the setup to achieve similar results.

In situations where you're maintaining Argo CD with the Argo CD Operator, add the ARGOCD_CONTROLLER_SHARDING_ALGORITHM environment variable under controller in the ArgoCD resource specification and assign it the value 'round-robin'.

Ensure that sharding is enabled for the controller using the sharding.enabled flag under controller. Apply the configuration once the modifications are complete.
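As a rough sketch, an ArgoCD resource configured this way might look like the following. The exact field paths can vary between operator versions, and the instance name, namespace, and replica count shown here are only illustrative, so treat this as a guide rather than a copy-paste manifest:

<pre class="codeWrap"><code>apiVersion: argoproj.io/v1alpha1
kind: ArgoCD
metadata:
  name: <argocd-instance-name>
  namespace: <argocd-namespace>
spec:
  controller:
    sharding:
      # enable sharding and spread clusters across 3 controller replicas
      enabled: true
      replicas: 3
    env:
      # switch the controller to the round-robin sharding algorithm
      - name: ARGOCD_CONTROLLER_SHARDING_ALGORITHM
        value: round-robin
</code></pre>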

Solution B: Manually define the shard

This serves as a temporary solution if the user is reluctant to update the existing Argo CD instance or intends to manage the sharding manually.

Define the shard for a new cluster

When introducing a new cluster, specify the index of the Application Controller replica that should manage the cluster under the shard key when defining the cluster secret.

For example:

<pre class="codeWrap"><code>apiVersion: v1
kind: Secret
metadata:
 name: <secret-name>
 labels:
   argocd.argoproj.io/secret-type: cluster
 namespace: <secret-namespace>
type: Opaque
stringData:
 name: <cluster-name>
 server: <server-url>
 config: <configuration>
 shard: "<desired-application-controller-replica-index-here>"
</code></pre>

When creating the secret, you provide the shard value under .stringData.shard. When you inspect the secret afterwards, the shard key's base64-encoded value appears under .data.shard. It's important to note that the shard value must be a string, not an int, so wrapping it in quotes is advisable.

If you'd rather add the cluster imperatively, pass the index of the Application Controller replica that should manage the cluster via the --shard argument. For example:

<pre class="codeWrap"><code> argocd cluster add < context-here > \
 --shard <desired-application-controller-replica-index-here>
</code></pre>

Remember that you need to provide an int value if you're adding the cluster imperatively.

Update the shard for an existing cluster

If you possess a pre-existing cluster for which you'd like to manually define the shard, you need to alter the specific cluster secret and add the following block:

<pre class="codeWrap"><code>stringData:
 shard: "<desired-application-controller-replica-index-here>"
</code></pre>

As before, the value is provided under .stringData.shard and shows up base64-encoded under .data.shard once the secret is saved; the shard value must again be a string, so keep the quotes.
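If you prefer not to edit the secret by hand, a merge patch achieves the same result. The secret name, namespace, and shard index below are placeholders for your own values:

<pre class="codeWrap"><code>kubectl patch secret <cluster-secret-name> -n <argocd-namespace> --type merge -p '{"stringData":{"shard":"2"}}'</code></pre>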

Once the sharding was reconfigured, the balanced and efficient distribution of clusters across the Argo CD Application Controller replicas could be visualized in the following chart:

Image of Argo CD cluster distribution

Conclusion

So, we explored various methods for addressing the improper sharding mechanism of the Argo CD Application Controller. While employing the built-in solution (round-robin sharding algorithm) is typically more sensible, some situations call for manual sharding. For instance, if you have three clusters—two with 400 applications each and a third with 800 applications—it is reasonable to allocate one shard between the first two clusters and dedicate another shard to the third cluster.
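To make that example concrete, a manual allocation along those lines could be expressed with the --shard argument when adding the clusters (the context names and shard indices below are purely illustrative):

<pre class="codeWrap"><code># clusters one and two (400 apps each) share replica 0
argocd cluster add <cluster-one-context> --shard 0
argocd cluster add <cluster-two-context> --shard 0

# cluster three (800 apps) gets replica 1 to itself
argocd cluster add <cluster-three-context> --shard 1
</code></pre>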

At the time of writing, discussions suggest that the round-robin sharding algorithm in Argo CD 2.8.0 is still experiencing logging issues (generating excessive logs). Nevertheless, this change appears to be moving in the right direction and the issue is currently being addressed. It should be resolved soon.

Note: It is crucial to scale the Application Controller prudently and in accordance with your environment's real requirements. Monitoring the performance and resource utilization of the Application Controller can help you make informed decisions about when and how to scale.

