Benjamin Kušen
December 27, 2023

Streamlining Kubernetes Jobs with Argo CD: Troubleshooting Using Argo CD Hooks

In this article, we'll delve into the intricacies of managing Kubernetes jobs with Argo CD, addressing the issues, and exploring effective strategies for troubleshooting and optimization.

In the dynamic world of DevOps, GitOps has emerged as a powerful approach to streamline the development process, leveraging Git as the single source of truth for managing infrastructure and applications. Argo CD, a versatile GitOps tool, plays a crucial role in automating the deployment and synchronization of applications with the desired state defined in Git. While Argo CD seamlessly handles various Kubernetes resources, managing Kubernetes jobs can present unique challenges due to their immutable nature.

The Point of Failure: Updating Kubernetes Jobs

Kubernetes jobs serve a vital purpose in orchestrating batch processes and ensuring their successful execution within the cluster. However, customers ran into a problem with them during a database migration. They had synchronized their Git repository, which included, among other things, the YAML configurations for their Kubernetes jobs, with the Kubernetes cluster using Argo CD as their preferred tool.

They tried to make the required changes in the Git repository but discovered that the procedure was more complicated than expected. When the job was first deployed, it was created and ran exactly as expected. Subsequent changes to the job's YAML definition in the repository, however, produced unforeseen outcomes: Argo CD failed to apply them, which became an unexpected obstacle in the database migration.

Troubleshooting

Let's delve into a comprehensive troubleshooting approach to solve this problem.

First, let’s check the application controller logs.

<pre class="codeWrap"><code>kubectl logs argocd-application-controller-0 -n argocd</code></pre>

<pre class="codeWrap"><code>time="2023T04:23:57Z" level=info msg="Updating operation state. phase: Running -> Failed,
message: 'one or more tasks are running' -> 'one or more objects failed to apply, reason: error
when replacing \"/dev/shm/3711963105\": Job.batch \"test-job\" is invalid: [spec.selector:
Required value, spec.template.metadata.labels: Invalid value: map[string]string(nil): `selector`
does not match template `labels`, spec.selector: Invalid value: \"null\": field is immutable,
spec.template: Invalid value: core.PodTemplateSpec{ObjectMeta:v1.ObjectMeta{Name:\"\", GenerateName:\"\", Namespace:\"\", SelfLink:\"\", UID:\"\",
ResourceVersion:\"\", TopologySpreadConstraints:[]core.TopologySpreadConstraint(nil), OS:(*core.PodOS)(nil)}}:</code></pre>

Figure: Controller logs showing the out-of-sync error.

Next, let’s get the app details with the command argocd app get test-job. The output shows the error that occurred while syncing:

<pre class="codeWrap"><code>Job.batch "test-job" is invalid:
[ spec.selector: Required value, spec.template.metadata.labels: Invalid value: map[string]string(nil): `selector` does not match template `labels`, spec.selector: Invalid value: "null": field is immutable ]</code></pre>

Initially, we attempted to resolve the issue by setting the Replace=true flag in the sync options of the Argo CD application and re-syncing it. However, this approach resulted in the same error, because a replace is still an update to the existing object and is rejected when it touches immutable fields. The underlying cause was the immutability of certain fields in the job specification, which makes it difficult to modify a job in response to changes in the source repository.

Immutability refers to the inability to modify certain fields of a resource once it has been created. This restriction is particularly relevant for Kubernetes jobs, where fields like spec.selector and spec.template cannot be altered after their initial instantiation.
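
To make the error concrete, here is a minimal sketch of a job manifest of the kind described above. The container name, image, and command are hypothetical placeholders and are not taken from the customer's repository. The manifest has a fixed metadata.name, so every sync targets the same Job object, and it leaves spec.selector unset, which is consistent with the spec.selector errors in the log above.

```yaml
# Sketch only: container name, image, and command are hypothetical placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: test-job            # fixed name, so each sync updates the same Job object
spec:
  backoffLimit: 2           # retry a failed pod at most twice
  # spec.selector is left unset; Kubernetes fills it in at creation time,
  # and it cannot be changed afterwards.
  template:                 # the pod template is immutable once the Job exists
    spec:
      restartPolicy: Never
      containers:
      - name: db-migrate                                # hypothetical
        image: registry.example.com/db-migrate:1.0.0    # hypothetical; editing this tag later is rejected
        command: ["/bin/sh", "-c", "echo running migration"]
```

Changing the image tag in Git and syncing again asks the API server to update spec.template on the existing Job, which is the kind of change it refuses.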

Possible Solutions

To address this challenge, we'll explore two effective strategies.

Leveraging generateName Parameter

The generateName field lets you specify a prefix for the resource name. Kubernetes appends a random suffix to the prefix, so a new resource is created each time the application is synced. This ensures that the latest job definition is always applied. For this approach to work, Argo CD must be installed in your cluster. Here's what you need to do.

First create an Argo CD application.

<pre class="codeWrap"><code>apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: standalone-job
spec:
  destination:
    name: ''
    namespace: default                          # namespace the job is deployed into
    server: 'https://kubernetes.default.svc'    # the cluster Argo CD itself runs in
  source:
    path: standalone-job                        # directory in the repository that holds the job manifest
    repoURL: 'https://github.com/infracloudio/kubernetes-job-with-argocd'
    targetRevision: HEAD
  sources: []
  project: default
  syncPolicy:
    syncOptions:
    - Replace=true                              # replace/create resources instead of applying them
</code></pre>

Once this is done, sync the application using the CLI.

<pre class="codeWrap"><code>argocd app sync argocd/standalone-job</code></pre>

This will create a new job with a new name on every sync.
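
As an illustration, the job manifest stored under the application's source path could use generateName along the following lines. This is a sketch under assumptions: the name prefix, image, command, and TTL value are hypothetical and are not copied from the repository referenced above.

```yaml
# Sketch only: prefix, image, command, and TTL are hypothetical placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  generateName: standalone-job-   # prefix; Kubernetes appends a random suffix on creation
spec:
  ttlSecondsAfterFinished: 600    # optional: remove the finished Job after 10 minutes
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: db-migrate                                # hypothetical
        image: registry.example.com/db-migrate:1.0.0    # hypothetical
        command: ["/bin/sh", "-c", "echo running migration"]
```

Because metadata.name is absent, each sync creates a fresh Job instead of updating an existing one. Finished Jobs therefore accumulate unless they are cleaned up, which is why the sketch sets ttlSecondsAfterFinished.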

Employing Resource Hooks with Hook Deletion Policy

Argo CD hooks are a powerful feature that allows you to customize the deployment process by adding custom logic. Hooks can be executed at specific points in the deployment lifecycle, such as before, during, or after the application of manifests. The different types of Argo CD hooks are: PreSync Hooks, Sync Hooks, SyncFail Hooks, and PostSync Hooks. The Argo CD documentation on resource hooks describes them in detail.

Another feature of Argo CD is the hook deletion policy, which controls when a hook resource is deleted. With the annotation argocd.argoproj.io/hook-delete-policy, Argo CD removes hook resources automatically. Alternatively, you can delete hooks manually after they have run, whether they succeeded or failed.

The hook deletion policy specifies when hooks should be deleted:

  • HookSucceeded: The hook is deleted after a successful execution.
  • HookFailed: The hook is deleted after an unsuccessful execution.
  • BeforeHookCreation: The hook is deleted before a new hook is created, which occurs when a new sync is initiated.

To apply a specific deletion policy, use the annotation argocd.argoproj.io/hook-delete-policy. For instance, to delete a PostSync hook after a successful execution, set the annotations as follows:

<pre class="codeWrap"><code>
metadata:
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
</code></pre>

Resource hooks provide a mechanism for executing custom scripts or commands within the Argo CD application lifecycle. By utilizing the Sync annotation and setting a BeforeHookCreation deletion policy, we can ensure that the existing job is deleted before a new one is created during each sync.
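
The combination described above can be sketched as a job manifest like the one below. The two annotations are the ones named in this article. The job name reuses test-job from the logs, while the container name, image, and command are hypothetical placeholders.

```yaml
# Sketch only: container name, image, and command are hypothetical placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: test-job
  annotations:
    argocd.argoproj.io/hook: Sync                               # run the Job as a hook during the sync phase
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation   # delete the previous Job before creating a new one
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: db-migrate                                # hypothetical
        image: registry.example.com/db-migrate:1.1.0    # hypothetical; can change between syncs
        command: ["/bin/sh", "-c", "echo running migration"]
```

Since the old Job is removed before the new one is created, the immutable fields are never updated in place. The name can stay fixed, which keeps the Job easy to find with kubectl.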

Let’s look at the use cases where these solutions can effectively help in overcoming immutable fields’ challenges.

Use-case 1: Managing Standalone Jobs

Standalone jobs are not part of the Argo CD application definition and are typically managed manually. To handle synchronization for standalone jobs, we can utilize the generateName parameter and set a Replace=true sync option in the Argo CD application. This will ensure that a new job is created with a unique name whenever the application is synced.

Use-case 2: Managing Jobs with Pod Resources

For jobs integrated with pod resources, we can leverage the Sync annotation and BeforeHookCreation deletion policy to achieve automatic job updates. Whenever the pod image associated with the job is changed, the existing job will be deleted, and a new one will be created based on the updated manifest.

Conclusion

By understanding the challenges posed by immutable fields in Kubernetes jobs and employing effective strategies like generateName and resource hooks with hook deletion policy, we can effectively manage and troubleshoot job synchronization issues. Argo CD's versatility and flexibility enable us to adapt to various job scenarios and ensure that our applications are always in the desired state. We hope this article has provided valuable insights into managing Kubernetes jobs with Argo CD.
