Benjamin Kušen
January 4, 2024

Migrating Etcd Between Cloud Kubernetes Clusters Without Downtime

Need to migrate etcd between Kubernetes clusters, but can't afford any downtime? Let us show you how to do it.

Have you ever needed to transfer etcd storage from a Kubernetes cluster in one cloud to another? During such a process, you typically cannot afford to switch etcd off, as doing so can partially or even severely disrupt the services depending on it. In this article, we delve into an unusual and relatively rarely traveled route to migrate etcd* from a Kubernetes cluster in one cloud to another.

The presented method can assist in avoiding downtime and the associated repercussions. Given both clusters are cloud-based, we foresee encountering certain restrictions and challenges, which we will explore in-depth.

*We are not referring to the migration of etcd where Kubernetes stores the entire cluster state. Instead, our focus is on a standalone etcd installation utilized by third-party applications housed within a K8s cluster.

Migrating etcd can be approached in two ways:

  • The most apparent method is taking an etcd snapshot and restoring it at a new location. However, this path involves downtime which we prefer to avoid.
  • The secondary technique includes propagating etcd over two Kubernetes clusters. This requires constructing independent StatefulSets in each of the K8s clusters, followed by merging them into a singular etcd cluster. This method carries inherent risks—any anomaly may negatively impact the already running etcd cluster. Nonetheless, it enables the migration of etcd between clusters without inducing downtime. We will concentrate on this method from henceforth.

Note: While we do use AWS in this article, the process remains virtually identical for any other cloud service provider. The K8s clusters utilized in the given examples are managed via the Deckhouse Kubernetes platform. Consequently, certain functionalities may be specific to this platform. For these situations, we will offer alternative procedures.

We expect that readers have a foundational understanding of etcd and have dealt with this database before. We recommend readers also check the official etcd documentation.

Step 1: Reducing the size of the etcd database

Note: This section may be skipped if your etcd cluster already limits the number of revisions kept for each key (for example, via the auto-compaction settings shown in the StatefulSet below).

The first thing to consider before initiating the migration pertains to the etcd database size. An extensive database may prolong the bootstrap time for new nodes and possibly trigger complications. Hence, let's explore procedures to reduce database size.

Start by finding the current revision. First, get the list of keys so you can pick one to inspect:

<pre class="codeWrap"><code> etcdctl get / --prefix --keys-only
/main_production/main/config
/main_production/main/failover
/main_production/main/history…
</code></pre>

Now, let's examine a random key in JSON format:

<pre class="codeWrap"><code>etcdctl get /main_production/main/history -w=json
{"header":{"cluster_id":13812367153619139789,"member_id":7168735187350299418,"revision":5828757,..
</code></pre>

You'll notice the present cluster revision (in our scenario, this is 5828757).

Subtract the number of revisions you wish to retain from this figure. In our experience, keeping a thousand revisions is generally adequate.

Execute the etcdctl compaction with the resultant value:

<pre class="codeWrap"><code>etcdctl compaction 5827757</code></pre>

This command applies globally across the entire etcd cluster - just execute it once on any of the nodes. For an in-depth understanding of how compaction (and other etcdctl commands) operates, we recommend referring to the official documentation.
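If jq happens to be available where you run etcdctl (an assumption; the stock etcd image may not ship it), the two steps above can be combined into a short snippet, using the key path from the example above:

<pre class="codeWrap"><code># Read the current cluster revision from any key, then compact, keeping the last 1000 revisions
REV=$(etcdctl get /main_production/main/history -w=json | jq -r '.header.revision')
etcdctl compaction $((REV - 1000))
</code></pre>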

Next, execute defrag to free up space:

<pre class="codeWrap"><code>etcdctl defrag --command-timeout=90s</code></pre>

You need to run this command sequentially on each node. We suggest carrying out this step on all nodes except the leader, then switching the leader to a defragmented node using etcdctl move-leader. Finally, you can revert to the last node to minimize the database size there.
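A rough sketch of that sequence might look like this (the endpoint addresses and the target member ID are illustrative, not taken from our setup):

<pre class="codeWrap"><code># Defragment the followers first (run against each node individually)
etcdctl --endpoints=etcd-1.etcd:2379 defrag --command-timeout=90s
etcdctl --endpoints=etcd-2.etcd:2379 defrag --command-timeout=90s

# Find the current leader and the member IDs
etcdctl --endpoints=etcd-0.etcd:2379,etcd-1.etcd:2379,etcd-2.etcd:2379 endpoint status -w table

# Transfer leadership to an already defragmented node (run against the current leader's endpoint)
etcdctl --endpoints=etcd-0.etcd:2379 move-leader <member-id-of-etcd-1>

# Finally, defragment the former leader
etcdctl --endpoints=etcd-0.etcd:2379 defrag --command-timeout=90s
</code></pre>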

In our experience, this process shrunk the database size from 800 MB to about 700 KB, substantially reducing the time necessary for subsequent steps.

The etcd chart to use

etcd runs as a StatefulSet. Below is an example of a StatefulSet deployed in a cluster:

<pre class="codeWrap"><code>apiVersion: apps/v1
kind: StatefulSet
metadata:
 name: etcd
 labels:
   app: etcd
spec:
 serviceName: etcd
 selector:
   matchLabels:
     app: etcd
 replicas: 3
 template:
   metadata:
     labels:
       app: etcd
   spec:
     affinity:
       podAntiAffinity:
         requiredDuringSchedulingIgnoredDuringExecution:
         - labelSelector:
             matchExpressions:
             - key: app
               operator: In
                values:
               - etcd
           topologyKey: kubernetes.io/hostname
     imagePullSecrets:
     - name: registrysecret
     containers:
     - name: etcd
       image: quay.io/coreos/etcd:v3.4.18
       command:
       - sh
       args:
       - -c
       - |
         stop_handler() {
             >&2 echo "Caught SIGTERM signal!"
             kill -TERM "$child"
         }

     trap stop_handler SIGTERM SIGINT

     etcd \
     --name=$HOSTNAME \
     --initial-advertise-peer-urls=http://$HOSTNAME.etcd:2380 \
     --initial-cluster-token=etcd-cortex-prod \
     --initial-cluster etcd-0=http://etcd-0.etcd:2380,etcd-1=http://etcd-1.etcd:2380,etcd-2=http://etcd-2.etcd:2380 \
     --advertise-client-urls=http://$HOSTNAME.etcd:2379 \
     --listen-client-urls=http://0.0.0.0:2379 \
     --listen-peer-urls=http://0.0.0.0:2380 \
     --auto-compaction-mode=revision \
     --auto-compaction-retention=1000 &
     child=$!
     wait "$child"
   env:
   - name: ETCD_DATA_DIR
     value: /var/lib/etcd
   - name: ETCD_HEARTBEAT_INTERVAL
     value: 200
   - name: ETCD_ELECTION_TIMEOUT
     value: 2000
   resources:
     requests:
       cpu: 50m
       memory: 1Gi
     limits:
       memory: 1gi
   volumeMounts:
   - name: data
     mountPath: /var/lib/etcd
   ports:
   - name: etcd-server
     containerPort: 2380
   - name: etcd-client
     containerPort: 2379
   readinessProbe:
     exec:
       command:
       - /bin/bash
       - -c
       - /usr/local/bin/etcdctl endpoint health
     initialDelaySeconds: 10
     periodSeconds: 10
     timeoutSeconds: 10
volumeClaimTemplates:  -
metadata:
     name: data
   spec:
     accessModes: [ "ReadWriteOnce" ]
     resources:
       requests:
         storage: 2Gi
---
apiVersion: v1
kind: Service
metadata:
 name: etcd
spec:
 clusterIP: None
  ports:
  - name: etcd-server
    port: 2380
  - name: etcd-client
    port: 2379
  selector:
    app: etcd
</code></pre>

We will refer back to other parts of this chart as we proceed through the article.

Step 2: Making etcd nodes accessible from the outside

In case etcd is being utilized by clients external to the Kubernetes cluster, it's likely there are specific instances routing traffic to the pods. But, for bootstrapping new nodes, it's crucial for each cluster node to be accessible externally via a predefined IP address. This is the key challenge when operating with a cloud cluster.

In a static Kubernetes cluster, each etcd node is easy to expose: all you need is a NodePort-like service combined with a fixed nodeSelector for the pods. However, in the cloud, where a pod can move to a new node at any time without the IP address being known beforehand, this approach isn't feasible.

The solution lies in creating three separate LoadBalancer services: we need three because our etcd cluster has three nodes. The cloud provider then provisions the load balancers automatically. Here's an example manifest:

<pre class="codeWrap"><code> ---
apiVersion: v1
kind: Service
metadata:
  name: etcd-0
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
    service.beta.kubernetes.io/aws-load-balancer-subnets: id
spec:
  externalTrafficPolicy: Local
  loadBalancerSourceRanges:
  - 0.0.0.0/0
  ports:
  - name: etcd-server
    port: 2380
  - name: etcd-client
    port: 2379
  selector:
    statefulset.kubernetes.io/pod-name: etcd-0
  type: LoadBalancer
---
apiVersion: v1
kind: Service
metadata:
  name: etcd-1
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
    service.beta.kubernetes.io/aws-load-balancer-subnets: id
spec:
  externalTrafficPolicy: Local
  loadBalancerSourceRanges:
  - 0.0.0.0/0
  ports:
  - name: etcd-server
    port: 2380
  - name: etcd-client
    port: 2379
  selector:
    statefulset.kubernetes.io/pod-name: etcd-1
  type: LoadBalancer
---
apiVersion: v1
kind: Service
metadata:
  name: etcd-2
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
    service.beta.kubernetes.io/aws-load-balancer-subnets: id
spec:
  externalTrafficPolicy: Local
  loadBalancerSourceRanges:
  - 0.0.0.0/0
  ports:
  - name: etcd-server
    port: 2380
  - name: etcd-client
    port: 2379
  selector:
    statefulset.kubernetes.io/pod-name: etcd-2
  type: LoadBalancer
</code></pre>

The service.beta.kubernetes.io/aws-load-balancer-internal: "true" annotation specifies the type of load balancer (private IP) to be provisioned. The service.beta.kubernetes.io/aws-load-balancer-subnets: id annotation specifies the network the LB should use. Most cloud providers support this feature; only the annotations vary.

Let's review the resources we now have in the cluster:

Image of cluster resources
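If you prefer to check from the command line rather than from the screenshot, something along these lines will do (service names as defined above):

<pre class="codeWrap"><code>kubectl get svc etcd-0 etcd-1 etcd-2
</code></pre>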

Is the etcd client available?

<pre class="codeWrap"><code>telnet 10.100.0.47 2379
Trying 10.100.0.47...
Connected to 10.100.0.47.
Escape character is '^]'.
</code></pre>

Fantastic: our etcd nodes are now accessible from the outside!

Now, let's create identical services in the new cluster. Take note that at this juncture, we are merely establishing services in the new K8s cluster, not StatefulSets. (A StatefulSet with a unique name needs to be created in the new cluster, distinguished from the one in the existing cluster. The hostname in the pods has to differ as we utilize it as the etcd node name.)

Later, we will label our StatefulSet in the new cluster as etcd-main (feel free to choose another name), hence, let's alter the selectors and service names to sync with this new name:

<pre class="codeWrap"><code>…
name: etcd-main-0

 selector:
   statefulset.kubernetes.io/pod-name: etcd-main-0

</code></pre>

Additionally, you need to update the values in the service.beta.kubernetes.io/aws-load-balancer-subnets: id annotation to match the network ID in the new Kubernetes cluster. No modifications are required to any other service resources.
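Putting those changes together, the first of the three services in the new cluster might look like this (a sketch based on the manifests above; the subnet ID placeholder still has to be replaced with the real one):

<pre class="codeWrap"><code>apiVersion: v1
kind: Service
metadata:
  name: etcd-main-0
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
    service.beta.kubernetes.io/aws-load-balancer-subnets: id
spec:
  externalTrafficPolicy: Local
  loadBalancerSourceRanges:
  - 0.0.0.0/0
  ports:
  - name: etcd-server
    port: 2380
  - name: etcd-client
    port: 2379
  selector:
    statefulset.kubernetes.io/pod-name: etcd-main-0
  type: LoadBalancer
</code></pre>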

Let's reassess our setup:

Image of cluster resources two

There's little point in checking availability at this stage, since there are no pods yet for these services to route to.

Step 3: DNS magic

So, we have taken care of reachability at the IP address level. Now it's time to look at how etcd nodes identify and reach each other. Presented below are the relevant startup parameters:

<pre class="codeWrap"><code>--name=$HOSTNAME \
--initial-advertise-peer-urls=http://$HOSTNAME.etcd:2380 \
--advertise-client-urls=http://$HOSTNAME.etcd:2379 \
</code></pre>

We won't delve into the functionality of each parameter; you can gain more insight from the official documentation. The key takeaway here is that the node's name is the pod's hostname. Nodes connect to one another using a <hostname>.<service name> FQDN. For a new node to function, that FQDN has to be resolvable and reachable from the pods.

There are several methods to achieve this:

  • The most straightforward approach is to add static records to the pods' /etc/hosts by modifying the StatefulSet (a hostAliases sketch is shown at the end of this step). However, this necessitates pod restarts.
  • An alternative is to resolve names at the kube-dns level. We will employ this technique in our case. In the example below, static records are integrated using the Deckhouse kube-dns module:

<pre class="codeWrap"><code>spec:
 settings:
   hosts:
   - domain: etcd-main-0
      ip: 10.106.0.34
   - domain: etcd-main-1
     ip: 10.106.0.42
   - domain: etcd-main-2
     ip: 10.106.0.47
   - domain: etcd-main-0.etcd-main
     ip: 10.106.0.34
   - domain: etcd-main-1.etcd-main
     ip: 10.106.0.42
   - domain: etcd-main-2.etcd-main
     ip: 10.106.0.47
</code></pre>

Let's check name resolution from a pod:

<pre class="codeWrap"><code>host etcd-main-0
etcd-main-0 has address 10.106.0.34
host etcd-main-0.etcd-main
etcd-main-0.etcd-main has address 10.106.0.34
</code></pre>

Everything is in order now! Let's replicate the same procedure in the new cluster and introduce static records for the etcd nodes from the old cluster:

<pre class="codeWrap"><code>spec:
 settings:
   hosts:
   - domain: etcd-0
     ip: 10.100.0.47
   - domain: etcd-1
     ip: 10.100.0.46
   - domain: etcd-2
     ip: 10.100.0.37
   - domain: etcd-0.etcd
     ip: 10.100.0.47
   - domain: etcd-1.etcd
     ip: 10.100.0.46
   - domain: etcd-2.etcd
     ip: 10.100.0.37
</code></pre>
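For completeness, the /etc/hosts approach mentioned in the first bullet above could be implemented with hostAliases in the pod template. A hedged sketch (IP addresses as in our example; remember that changing the StatefulSet restarts the pods):

<pre class="codeWrap"><code>spec:
  template:
    spec:
      hostAliases:
      - ip: "10.106.0.34"
        hostnames:
        - "etcd-main-0"
        - "etcd-main-0.etcd-main"
      - ip: "10.106.0.42"
        hostnames:
        - "etcd-main-1"
        - "etcd-main-1.etcd-main"
      - ip: "10.106.0.47"
        hostnames:
        - "etcd-main-2"
        - "etcd-main-2.etcd-main"
</code></pre>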

Remarkably, this magic was quite simple, right?

Step 4: Adding new nodes to the etcd cluster

Finally, we have arrived at the point where we can add new nodes to the etcd cluster, extending it across our two Kubernetes clusters. To accomplish this, execute the following command within any currently active etcd pod:

<pre class="codeWrap"><code>etcdctl member add etcdt-main-0 --peer-urls=http://etcd-main-0.etcd-main
:2380
</code></pre>

Since we are already aware of the StatefulSet name (etcd-main), we also have knowledge of the names of the new pods.
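You can verify that the member was registered (it will remain unstarted until the corresponding pod comes up); a hedged example:

<pre class="codeWrap"><code>etcdctl member list -w table
</code></pre>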

Important note: You might question, “Why not insert all the new nodes at once?”. Once all three new members are registered, the cluster has six members and a quorum of four, while only the three old nodes are actually running. Adding them all at once therefore risks losing quorum and taking down the existing nodes, which is why the new members are added one by one.

Now, let's modify the deployment chart for the new cluster:

<pre class="codeWrap"><code>apiVersion: apps/v1
kind: StatefulSet
metadata:
 name: etcd-main
 labels:
   app: etcd-main
spec:
 serviceName: etcd
 selector:
   matchLabels:
     app: etcd
 replicas: 1
 template:
   metadata:
     labels:
       app: etcd
   spec:
     affinity:
       podAntiAffinity:          requiredDuringSchedulingIgnoredDuringExecution:
         - labelSelector:
             matchExpressions:
             - key: app
               operator: In
                values:
               - etcd
           topologyKey: kubernetes.io/hostname
     imagePullSecrets:
     - name: registrysecret
     containers:
     - name: etcd
       image: quay.io/coreos/etcd:v3.4.18
       command:
       - sh
       args:
       - -c
       - |
         stop_handler() {
             >&2 echo "Caught SIGTERM signal!"
             kill -TERM "$child"
         }

     trap stop_handler SIGTERM SIGINT

     etcd \
     --name=$HOSTNAME \
     --initial-advertise-peer-urls=http://$HOSTNAME.etcd-main:2380 \
     --initial-cluster-state existing \
     --initial-cluster-token=etcd-cortex-prod \
     --initial-cluster etcd-main-0=http://etcd-main-0.etcd-main:2380,etcd-0=http://etcd-0.etcd:2380,etcd-1=http://etcd-1.etcd:2380,etcd-2=http://etcd-2.etcd:2380 \
     --advertise-client-urls=http://$HOSTNAME.etcd:2379 \
     --listen-client-urls=http://0.0.0.0:2379 \
     --listen-peer-urls=http://0.0.0.0:2380 \
     --auto-compaction-mode=revision \
     --auto-compaction-retention=1000 &
     child=$!
     wait "$child"
   env:
   - name: ETCD_DATA_DIR
     value: /var/lib/etcd
   - name: ETCD_HEARTBEAT_INTERVAL
     value: 200
   - name: ETCD_ELECTION_TIMEOUT
     value: 2000
   resources:
     requests:
       cpu: 50m
       memory: 1Gi
     limits:
       memory: 1gi
   volumeMounts:
   - name: data
     mountPath: /var/lib/etcd
   ports:
   - name: etcd-server
     containerPort: 2380
   - name: etcd-client
     containerPort: 2379
   readinessProbe:
     exec:
       command:
       - /bin/bash
       - -c
       - /usr/local/bin/etcdctl endpoint health
     initialDelaySeconds: 10
     periodSeconds: 10
     timeoutSeconds: 10
  volumeClaimTemplates:
- metadata:
 name: data
spec:
 accessModes: [ "ReadWriteOnce" ]
 resources:requests:
 storage: 2Gi
</code></pre>

In this chart, both the name and the start command have been altered. Let's look at the latter more closely:

  • The command now includes the --initial-cluster-state existing flag. This tells the new member to join the existing cluster rather than bootstrap a brand-new one (consult the documentation for further details).
  • The --initial-advertise-peer-urls parameter has changed due to the alteration in the StatefulSet's name.
  • Most importantly, the --initial-cluster flag has been modified. It includes all the existing cluster members, in addition to the newly incorporated etcd-main-0 node.

Since the nodes are added sequentially, the replicas key must be set to 1 for the initial deployment. Verify that the newly inserted node has successfully integrated into the cluster (etcdctl endpoint status):

Image of new node in cluster verification
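In command form, the check looks roughly like this (endpoint names as used throughout this article):

<pre class="codeWrap"><code>etcdctl --endpoints=etcd-0:2379,etcd-1:2379,etcd-2:2379,etcd-main-0:2379 endpoint status -w table
</code></pre>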

Let's add two more nodes (the steps mirror the ones described previously; an example for the second node is shown after the list):

  • Insert a new node into the cluster using the etcdctl member add command.
  • Modify the new StatefulSet: add another replica and adjust the --initial-cluster key to include a new node.
  • Wait for the node to successfully join the etcd cluster.
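For instance, adding the second node (etcd-main-1) would involve something like this (a sketch; run the etcdctl command inside an existing etcd pod, then update the StatefulSet):

<pre class="codeWrap"><code>etcdctl member add etcd-main-1 --peer-urls=http://etcd-main-1.etcd-main:2380

# ...then in the new StatefulSet:
#   replicas: 2
#   --initial-cluster etcd-main-0=http://etcd-main-0.etcd-main:2380,etcd-main-1=http://etcd-main-1.etcd-main:2380,etcd-0=http://etcd-0.etcd:2380,etcd-1=http://etcd-1.etcd:2380,etcd-2=http://etcd-2.etcd:2380
</code></pre>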

It's worth noting that the kubectl scale statefulset command cannot be used here, since both the --initial-cluster parameter in the start command and the replica count of the new StatefulSet have to be changed.

Now, let's check the status of the cluster:

Image of checking the cluster status

Everything appears to be in order. You can now switch the etcd leader to one of the new nodes using etcdctl move-leader. A rough sketch (the leader endpoint and the target member ID are placeholders):
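<pre class="codeWrap"><code>etcdctl --endpoints=<current-leader>:2379 move-leader <member-id-of-etcd-main-0>
</code></pre>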

Step 5: Rerouting etcd clients

Now, your next task is to redirect the etcd clients to the updated endpoints. In our specific scenario, the client was a PostgreSQL cluster utilizing Patroni. A comprehensive discussion of the necessary changes to its configuration would be outside the bounds of this article.
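While the Patroni specifics are out of scope, the change essentially boils down to pointing its DCS configuration at the new endpoints. A hedged sketch of the relevant fragment, assuming Patroni is configured to use the etcd v3 API:

<pre class="codeWrap"><code>etcd3:
  hosts:
  - etcd-main-0:2379
  - etcd-main-1:2379
  - etcd-main-2:2379
</code></pre>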

Step 6: Deleting the old nodes from the etcd cluster

Now, it's time to remove the old nodes. Keep in mind, it's essential to delete them one at a time to prevent the loss of the cluster quorum.

Let's go through the procedure step by step:

  • First, delete one of the old pods by scaling down the StatefulSet in the old K8s cluster:

<pre class="codeWrap"><code>kubectl scale sts etcd –-replicas=2</code></pre>

  • Remove a member from the etcd cluster:

<pre class="codeWrap"><code>etcdctl member remove e93f626220dffb --endpoints=etcd-0:2379,etcd-1:2379,etcd-main-0:2379,etcd-main-1:2379,etcd-main-2:2379</code></pre>

  • Verify the status of the etcd cluster:

<pre class="codeWrap"><code>etcdctl endpoint health</code></pre>

Repeat the process for the remaining nodes.

We suggest retaining the Persistent Volumes of old pods, if possible. They may prove useful if a rollback to the original state is required. After discarding all the old nodes, modify the etcd command in the StatefulSet within the new Kubernetes cluster (exclude the old nodes from it):

<pre class="codeWrap"><code>…
--initial-cluster etcd-main-0=http://etcd-main-0.etcd-main:2380,etcd-main-1=http://etcd-main-1.etcd-main:2380,etcd-main-2=http://etcd-main-2.etcd-main:2380

</code></pre>

Step 7: Deleting the remaining etcd resources in the old Kubernetes cluster

Once the new etcd cluster "settles down" and you are confident that it functions as expected, erase the resources remaining from the old etcd cluster (Persistent Volumes, Services, etc.). With this step, your migration is successfully completed—congratulations!

Conclusion

The approach to migrating etcd between cloud Kubernetes clusters detailed above may not be the most straightforward one. However, it allows you to move etcd from one cluster to another relatively quickly and without any downtime.
