Benjamin Kušen
December 17, 2023

Prometheus and Centralized Storage: How It Works, When You Need It, and What Is Mimir?

Today, we'll imagine a fictional startup and explain everything you need to know about Prometheus and Mimir in practice.

Let's delve into the fundamentals of Prometheus architecture and continue our exploration of the challenges associated with storage and the solutions available. Our focus is on a fictional startup, its software development endeavors, and the hurdles faced in storing and processing metrics.

Follow along as we track the company's growth and observe the evolution of its monitoring system to meet expanding business needs.

Stage 1: The Question of Monitoring

In this initial phase, our small startup lacks any clients but harbors a promising idea. The team, consisting of two developers and the CTO, Sam, is actively seeking investors and initial customers.

Sam, primarily responsible for investor outreach, encounters a setback during a presentation when the application experiences lag and crashes. Post-presentation, Sam inquires with the developers about the issue. Peter, one of the developers, confesses that they lack a monitoring system, giving rise to the company's first goal: metric collection.

The choice of Prometheus for this purpose appears natural, and its implementation proves relatively straightforward. Prometheus handles data collection, while Grafana facilitates visualization.

Prometheus and Grafana setup in a Kubernetes cluster
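For reference, a minimal Prometheus configuration for this stage might look like the sketch below. This is only an illustration: the job names, target addresses, and scrape interval are placeholders, not part of the startup's actual setup.

```yaml
# prometheus.yml — a minimal sketch with illustrative targets.
global:
  scrape_interval: 30s

scrape_configs:
  # Prometheus scrapes its own metrics endpoint.
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]

  # Hypothetical application exposing /metrics on port 8080.
  - job_name: app
    static_configs:
      - targets: ["app.default.svc:8080"]
```

Grafana then simply points at this instance as a Prometheus data source.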

Sam raises a crucial question: "What if Prometheus crashes?" The answer is straightforward: no Prometheus means no data collection. With the pull model in use, the absence of Prometheus results in the inability to gather metrics.

This architecture exhibits drawbacks such as:

1. Absence of fault tolerance.

2. Limited scalability due to Prometheus being heavily dependent on available machine resources.

These concerns are currently not major issues for our promising startup. Let's continue following the narrative as the story unfolds.

Stage 2: Ensuring Constant Availability of Monitoring

As time swiftly passes, our company reaches its second stage, having celebrated its first anniversary. It now stands as a thriving entity with a customer base, investors, and a considerable community. The team size and infrastructure have both expanded significantly.

In a scenario that resonates with many, a late Friday night brings an unwelcome surprise. Sam receives a customer complaint, prompting him to reach out to Peter for investigation. Given the high priority due to customer impact, Peter explains that their sole Prometheus instance is inaccessible since its virtual machine is down.

Consequently, the monitoring system is currently unavailable, leaving Peter in the dark about the application's status. This sets the stage for the company's next goal: introducing fault tolerance to its monitoring system. The solution is apparent — deploy a second Prometheus instance!

A fault-tolerant setup featuring two Prometheus instances, a load balancer, and Grafana

Running Dual Prometheus Instances

Given the pull model in use, duplicating the configuration files between the two instances is a straightforward process. Consequently, both instances can commence data collection from identical sources. Let's delve into the mechanics of this setup.
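A sketch of what that duplication could look like: both files list the same scrape targets, and the only difference is an external label identifying the replica. The label name, values, and target addresses below are arbitrary examples, not a prescribed convention.

```yaml
# prometheus.yml on both instances: identical scrape targets, with only the
# replica label differing so the two copies can be told apart later.
global:
  scrape_interval: 30s
  external_labels:
    replica: prometheus-01   # set to prometheus-02 on the second instance

scrape_configs:
  - job_name: app            # hypothetical application target
    static_configs:
      - targets: ["app.default.svc:8080"]
```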

Here, two Prometheus instances operate alongside a Load Balancer responsible for directing traffic between them.

When a request is directed to the first Prometheus instance, it serves the data, and vice versa. The Load Balancer efficiently manages the distribution of requests, ensuring a balanced and reliable data collection process.

Data flow in a configuration with two Prometheus instances

Potential Issues After Prometheus Instance Crash

Consider the scenario where one of the Prometheus instances encounters a crash. Initially, things seem to be in order:

1. The load balancer promptly removes the crashed instance from load balancing, halting the reception of any further requests.

2. The remaining instance continues its operations by collecting metrics and generating graphs.

All appears well, but here's the catch: trouble arises when the crashed instance is restarted! The primary challenge lies in the pull model. While the instance was down, it was not scraping any targets, so that time window is simply missing from its local storage and cannot be backfilled.

When Grafana requests data from a Prometheus instance that was temporarily down, a significant issue becomes visible: a gap in the graph.

Gap in the graph

To tackle this concern, there's a straightforward solution:

Utilize dual data sources in Grafana: an effective approach involves setting up two data sources in Grafana. When gaps appear in the graphs, simply switch between these sources. Grafana will then fetch data from the operational Prometheus instance, ensuring seamless and accurate graph rendering.

Two data sources in Grafana with a fault-tolerant setup involving two Prometheus instances
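One possible way to set this up is Grafana's file-based datasource provisioning, sketched below. The file path and instance URLs are placeholders for your own environment.

```yaml
# grafana/provisioning/datasources/prometheus.yaml — a sketch with placeholder URLs.
apiVersion: 1
datasources:
  - name: Prometheus-01
    type: prometheus
    access: proxy
    url: http://prometheus-01:9090   # placeholder address
    isDefault: true
  - name: Prometheus-02
    type: prometheus
    access: proxy
    url: http://prometheus-02:9090   # placeholder address
```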

Regrettably, the issue goes beyond the previously mentioned solution. Here's the situation: because the two instances scrape their targets independently and may go down at different times, one Prometheus instance may possess certain data while the other holds a slightly different set. No matter which instance Grafana opts for, gaps in the graphs can persist!

Admittedly, the architecture with dual data sources suffices for many scenarios. However, if this proves insufficient, there exists an alternative: establishing a proxy in front of Prometheus.

Extended fault-tolerant Prometheus and Grafana configuration with a proxy

Setting up a Proxy for Prometheus

The proxy needed should be designed specifically for Prometheus, not a generic one such as NGINX.

Once the proxy receives a request, it will:

1. Fetch data from all Prometheus instances.

2. Combine the data.

3. Perform the PromQL query on the merged data.

4. Provide the resulting output.

Data flow in a Prometheus, Grafana, and proxy setup

Drawbacks of the Solution

Prometheus developers do not provide tools for implementing such proxies, leading to a reliance on third-party developments. This introduces an additional dependency, since the proxy's PromQL engine must stay aligned with the Prometheus version in use, which may not always be practical.

Despite testing multiple tools, we haven't incorporated any into our production environment, making it challenging to recommend a specific one. Another aspect to consider is that Prometheus wasn't initially designed for long-term data storage.

The documentation explicitly mentions:

"Prometheus’s local storage is not intended to be durable long-term storage; external solutions offer extended retention and data durability."

While it can retain data for a few weeks, storing it for, let's say, five years is beyond its capability. This limitation could impact our startup, albeit not immediately.
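Retention is controlled by a command-line flag on the Prometheus server. The fragment below is a hypothetical Kubernetes container-spec excerpt showing where that limit lives; the paths are placeholders, and the default retention is 15 days.

```yaml
# Hypothetical container args for a Prometheus deployment. Retention of a few
# weeks is realistic; multi-year retention is not what the local TSDB is built for.
args:
  - --config.file=/etc/prometheus/prometheus.yml
  - --storage.tsdb.path=/prometheus
  - --storage.tsdb.retention.time=30d   # default is 15d
```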

Stage 3: Establishing a Centralized Metrics Storage

The company has evolved from a startup into a sizable corporation, witnessing an increase in both the workforce and the complexity of its infrastructure. With the addition of a full-fledged customer care team, the organization now comprises an operations team and multiple development teams.

Initially, the application operated on a single Kubernetes cluster, efficiently managing the workload. However, the current scenario involves multiple clusters distributed across various data centers, yet the existing monitoring system remains functional.

One day, Sam, the diligent CTO, receives a notification about service disruptions. In response, he initiates an investigation.

Upon consulting the operations team, it becomes evident that with 10 data centers, navigating different Grafanas has become a challenging task. The issue arises from the inadequacy of the current implementation, which employs local Prometheus instances, especially when numerous clusters are spread across diverse locations.

While one potential solution involves deploying a shared Grafana to aggregate data from all data centers, it still falls short of providing a comprehensive overview. The most viable resolution entails consolidating metrics from various sources into a singular location.

Options like Thanos, Cortex, or Mimir present themselves as suitable solutions. Though distinct, they share a fundamental concept: a single store that receives data from all Prometheus instances and to which Grafana can connect for unified monitoring.

Prometheus setup with centralized storage

The logical question that arises is whether it's feasible to eliminate local Prometheus instances in a cluster to cut costs.

The feasibility of this depends on various factors. It's possible to substitute Prometheus with lighter applications such as Prometheus Agent, VM Agent, Grafana Agent, or OpenTelemetry Collector. These alternatives can efficiently gather metrics and transmit them to an external repository.

Centralized storage with a lightweight agent
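As a sketch, one of these options, Prometheus agent mode (enabled with `--enable-feature=agent`), only scrapes metrics and forwards them via remote write. The remote endpoint below is a placeholder for whichever centralized storage you choose, and the scrape target is illustrative.

```yaml
# prometheus.yml for an agent-mode instance: scrape locally, push everything
# to the central store via remote_write, keep no long-term local data.
global:
  scrape_interval: 30s

scrape_configs:
  - job_name: app
    static_configs:
      - targets: ["app.default.svc:8080"]

remote_write:
  - url: http://metrics-storage.example.com/api/v1/push   # placeholder endpoint
```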

There are a few details of this setup worth considering:

1. Impact on Autoscaling:

Because the lightweight agents only push metrics and cannot be queried, autoscaling features that rely on Prometheus metrics (HPA and VPA in Kubernetes) won't function within the cluster.

2. Potential Metric Accuracy Issues:

With no local instance left to evaluate recording rules, pre-calculated metrics may become inaccurate or unavailable.

3. Alerting System Concerns:

There's a risk to the alerting system. If the connection between the agent and the centralized storage breaks, all you get is a generic connection failure notification, lacking incident details. As an alternative, consider keeping a local Prometheus with a 1-day retention period.

This adjustment brings several benefits:

1. It significantly reduces Prometheus resource consumption.

2. It addresses the issues related to autoscaling, pre-calculated metrics, and potential disruptions in the alerting system.

This results in the following refined architecture:

Final architecture

Clusters contribute metrics to a central metrics repository, and Grafana enables the visualization of these metrics. It's time to transition from our startup mindset and concentrate on addressing the long-term storage challenges associated with monitoring metrics.

What is Centralized Metrics Storage?

Centralized metrics storage refers to a system where metrics from various sources are gathered and stored in a single location for easy management and analysis.

Note: While there are various solutions for establishing centralized metrics storage, we will focus on Mimir as an example. Our choice is based on extensive practical experience, and the insights provided should be beneficial even for those exploring similar alternatives.

Addressing Prometheus Limitations

The primary drawback of Prometheus lies in its monolithic structure. To overcome this limitation, Mimir's developers devised a straightforward solution: they transformed Prometheus into a set of microservices. In essence, they broke it down into smaller components, resulting in Mimir, a microservices-based iteration of Prometheus.

Additionally, Mimir introduced data segregation among different tenants. This means that data from distinct clusters or data centers can be stored independently of one another, while still allowing for multi-tenant queries.
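In practice, the tenant is communicated per request: Mimir reads it from the `X-Scope-OrgID` HTTP header, so each cluster's Prometheus or agent can push under its own tenant ID. The sketch below uses a placeholder URL and tenant name.

```yaml
# remote_write fragment for a hypothetical cluster in data center eu-west-1.
remote_write:
  - url: http://mimir-distributor.example.com/api/v1/push
    headers:
      X-Scope-OrgID: cluster-eu-west-1   # tenant ID; each cluster uses its own
```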

How Does Mimir Operate?

Mimir's centralized storage architecture is structured as follows:

Mimir structure

It appears complex at first, so let's examine each aspect one by one.

Writing Metrics

All metrics initially enter the Distributor. Its primary role is to validate the metrics' format and choose the Ingester responsible for their onward transmission. Subsequently, the data is transmitted to the Ingester, which constructs data blocks in its memory—essentially, these are the same blocks discussed earlier, resembling Prometheus blocks.

Once a block is complete, the Ingester closes it, writes it to disk, and uploads it to S3 for long-term storage. After the metrics land in S3, the Compactor takes action to improve block storage efficiency: it merges smaller blocks into larger ones and continues this optimization process.

Mimir storage architecture
Mimir storage architecture

Reading Metrics

This is the point where things become a bit more intricate. Every inquiry leads to the Query frontend.

Here's how it works:

1. It examines the cache for a pre-existing response to the received query and delivers it if the relevant data is present.

2. If no cached data is found, the query is sent on to the Querier.

The Querier obtains the query from the Query frontend and readies the necessary data for executing the query. To accomplish this, it initially seeks the data from the Ingester and subsequently from the Store gateway, serving as the S3 data gateway.

Mimir storage metrics read

Why not just retrieve metrics directly from S3, and what purpose does the Store gateway serve? Let's take a closer look.

Store Gateway

The data blocks find their home in S3. On startup, the Store gateway downloads the index portions of those blocks (the mapping between label sets and block IDs), which typically constitute 5–10% of the overall data volume.

Upon a data request:

1. The Store gateway consults its local mapping of labels and block IDs to determine which data needs to be retrieved from which S3 blocks.

2. It fetches the required metrics and forwards them to the Querier.

3. The Querier runs the PromQL query over the assembled data, and the result is returned through the Query frontend to the user.
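On the reading side, Grafana talks to the Query frontend through Mimir's Prometheus-compatible API (served under the `/prometheus` prefix by default) and passes the tenant in the same header. Below is a provisioning sketch with a placeholder URL and tenant.

```yaml
# grafana/provisioning/datasources/mimir.yaml — a sketch with placeholder values.
apiVersion: 1
datasources:
  - name: Mimir
    type: prometheus
    access: proxy
    url: http://mimir-query-frontend.example.com/prometheus   # placeholder
    jsonData:
      httpHeaderName1: X-Scope-OrgID
    secureJsonData:
      httpHeaderValue1: cluster-eu-west-1   # tenant to query
```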

Mimir Architecture Benefits

Why does this architecture exhibit such complexity? There are several reasons:

1. Resiliency: Each Mimir component scales independently, ensuring fault tolerance and adaptability to varying load profiles.

2. Replication Support: All components involved in data writing or reading support replication. For instance, the Distributor transmits data to multiple Ingesters rather than a singular one, facilitating smooth updates and fortifying the system against failures within individual virtual machines.

3. Scaling: All components endorse data sharding, with each component managing a portion, rather than the entirety, of the dataset.

4. Storage Space Optimization: Thanks to the Compactor.

Mimir’s Pros and Cons

Mimir pros:

Fault tolerance by default.

Horizontal scaling (with data sharding capability).

Extensive (decades-long!) storage of monitoring data.

Segmentation of monitoring data.

The drawbacks of Mimir:

Complexity.

Requirement for additional expertise.

You should also consider these pending issues when contemplating Mimir as your centralized metrics storage:

Absence of authorization mechanisms.

Elevated resource consumption.

Speed dependency on network performance.

Comparing the Storage Options for Prometheus

Our tale comes to an end with the table below. It illustrates the benefits and drawbacks of several Prometheus configurations for gathering and storing monitoring metrics.

<div style="overflow-x:auto;text-align:center">
<table style="width:100%">
 <tr>
   <th></th>
   <th>Collecting metrics</th>
   <th>High availability</th>
   <th>Long-term storage</th>
   <th>Centralized storage</th>
    <th>Learning curve</th>
 </tr>
 <tr>
    <td>Prometheus</td>
   <td>Yes</td>
   <td>No</td>
   <td>No</td>
   <td>No</td>
    <td>Easy</td>
 </tr>
 <tr>
   <td>Prometheus HA</td>
   <td>Yes</td>
   <td>Yes</td>
   <td>No</td>
   <td>No</td>
    <td>Medium</td>
 </tr>
 <tr>
   <td>Prometheus long-term</td>
   <td>Yes</td>
   <td>No</td>
   <td>Yes</td>
   <td>No</td>
   <td>Medium</td>
   </tr>
 <tr>
   <td>Prometheus long-term + HA</td>
   <td>Yes</td>
   <td>Yes</td>
   <td>Yes</td>
    <td>No</td>
   <td>Medium</td>
    </tr>
 <tr>
    <td>Mimir</td>
   <td>No</td>
   <td>Yes</td>
   <td>Yes</td>
    <td>Yes</td>
   <td>Hard</td>
  </tr>
</table>
</div>

This detailed table of comparisons, coupled with our narrative featuring the journey of a hypothetical startup and its step-by-step expansion, can assist you in understanding the significance of incorporating storage solutions like Mimir into your observability stack.

It's important to note that, although we use Mimir as a battle-tested example in this article, you are encouraged to explore alternative options for establishing centralized, long-term storage for your metrics.

Have you encountered any experiences or faced challenges with similar tools? Feel free to share them in the comments below; we would love to hear about them and engage in discussions!
