โœ•

Join us at two must-attend conferences this summer and claim your exclusive SWAG! ๐ŸŽ

IOT TECH EXPO, June 5-6 in Santa Clara! Booth 270 | DATA + AI SUMMIT, June 11-13 in San Francisco! Booth e12

โœ•
Skip to content
On this page

Subscribe to our newsletter

Get the latest dates and news about GreptimeDB.
Engineering
โ€ข
February 14, 2023

PromQL for Beginner

PromQL is widely recognized as the most commonly used query language in cloud-native Observability. This article provides a brief analysis of the history and features of PromQL, illustrating its advantages over SQL through simple examples.

History of PromQL โ€‹

PromQL (Prometheus Query Language) is the query language used in the Prometheus monitoring system for querying, analyzing, and aggregating time series data. With its powerful and flexible expression capabilities, PromQL enables developers to quickly extract and manipulate time series data, making it the most widely used query language in Cloud-native Observability. Unlike SQL, which is a domain-specific language for managing data held in general database, PromQL is a domain-specific language designed for monitoring of time series data.

As early as 2012, Prometheus was jointly developed by Julius Volz and Matt Proud while they were at SoundCloud, later, PromQL was born as its query language. However, It is interesting to note that PromQL has its roots in Borgmon, the query language used to monitor the Borg system, which was the predecessor of Kubernetes. Thus, PromQL has a cloud-native observability background from its very inception.

Why PromQL โ€‹

According to the well-known equation of building a popular product, User Value = New Experience - Old Experience - Substitution Cost, switching to a new language requires a significant investment of effort. Since SQL is already powerful and well-established, why do we need to spend time learning PromQL?

PromQL is a domain-specific language tailored for monitoring scenarios of time-series data. In monitoring, a large amount of system metrics are collected in the format of time-series, then aggregated, analyzed, and calculated to find patterns in the system and promptly detect anomalies and locate issues. Monitoring metrics have two key attributes: label and time. A series of metrics is integrated through label, such as the CPU usage of a certain server, they may also aggregated through time, such as the temperature changes in the past 7 days. PromQL provides user-friendly syntax and functions for common operations such as selecting, calculating and analyzing this type of data.

As the saying goes, "talk is cheap, show me the code", let's take some simple examples to better illustrate the advantage of PromQL when dealing with time series scenarios.

Examples โ€‹

Background โ€‹

Assuming that we need to detect the environmental information of a city, data indicators include temperature, humidity, air quality, noise, water pollution, etc. These indicators have been collected through the corresponding sensors and stored in a suitable time-series database. For simplicity, we take the temperature and noise indicators as examples. The name of temperature indicator is city_env_temp_total, and noise indicator is city_env_noise_total. For a traditional database, we assume that the table name is the indicator name, that is, there are two tables city_env_temp_total and city_env_noise_total.

Queries โ€‹

Let's start with a simple query for an imaginary use-case: view the temperature index of all cities at all times. The SQL and PromQL statements are as follows:

SQL:

sql
SELECT *
FROM city_env_temp_total;

PromQL

city_env_temp_total

Simpler than SQL, PromQL can directly retrieve all indicator data through the indicator name. Then, how can we view the temperature indicator data of a certain city at all times?

SQL:

sql
SELECT *
FROM city_env_temp_total
WHERE city = "New York";

PromQL

city_env_temp_total{city="New York"}

PromQL allows for filtering and matching through {}, which is similar to the where clause in SQL. It also supports regular expressions, the following label-matching operators exist:

  • =: Select labels that are exactly equal to the provided string.
  • !=: Select labels that are not equal to the provided string.
  • =~: Select labels that regex-match the provided string.
  • !~: Select labels that do not regex-match the provided string.

For example, if we want to query the temperature of multiple cities, we can write the following expressions: SQL:

sql
SELECT *
FROM city_env_temp_total
WHERE city = "New York" OR city = "London" OR city = "Paris";

PromQL:

sql
city_env_temp_total{city=~"New York|London|Paris}"

In practical situations, we wouldn't analyze all the environmental indicators from all the time, but rather look at the most recent period, such as the last month or year.

SQL:

sql
SELECT *
FROM city_env_temp_total
WHERE city = "New York" and data >= (CURDATE() - INTERVAL 1 MONTH )

PromQL:

city_env_temp_total{city="New York"}[30d]

A time duration is appended in square brackets [] at the end of a vector selector to specify how far back in time values should be fetched for each resulting range vector. Time durations are specified as a number, followed immediately by one of the following units:

  • ms - milliseconds
  • s - seconds
  • m - minutes
  • h - hours
  • d - days - assuming a day has always 24h
  • w - weeks - assuming a week has always 7d
  • y - years - assuming a year has always 365d

PromQL also provides other operators to simplify queries, such as offset among others, these will not be discussed in detail here. If interested, you can refer to the official manual range-vector-selectors.

Analytics โ€‹

Additionally, PromQL provides a vast number of functions for computation, such as

  • sum()
  • min()
  • max()
  • avg()
  • rate()

See more details here

For example, if we want to query cities where the average noise is greater than 100 dB for the past month.

SQL:

sql
SELECT city, AVG(noise) as avg_noise
FROM city_env_noise_total
GROUP BY city, DATE(timestamp)
HAVING AVG(noise) > 100;

PromQL:

avg_over_time(city_env_noise_total[30d]) > 100

Let's try a more complex example: examining the average noise levels in each city over the past month, filtering out cities where the average noise level was over 100 decibels for at least 15 days.

SQL(Generated by ChatGPT):

sql
SELECT city,
       (SUM(CASE WHEN avg_noise > 100 THEN 1 ELSE 0 END) / 30) > 0.5 as above_threshold
FROM (
  SELECT city, AVG(noise) as avg_noise
  FROM city_env_noise_total
  WHERE timestamp >= NOW() - INTERVAL 1 MONTH
  GROUP BY city
) data
GROUP BY city

PromQL:

sum(
  avg_over_time(city_env_noise_total[30d]) > bool 100
) by (city) / 30 > 0.5

The examples above demonstrate that PromQL has powerful capabilities for querying and analyzing time series data. However, compared to SQL, it has some limitations, such as the inability to perform join queries. After all, PromQL is a domain-specific language, thereby, choosing the most appropriate language according to different use-case is crucial to achieve best practices.

Conclusion โ€‹

There's a joke, "Even after Prometheus (the company) is long gone, PromQL will continue to be passed down in its own way." As a developer, I sincerely appreciate Prometheus's community for introducing such a fantastic query language into the time series field. The PromQL community has remained active over the years, optimizing aggregate functions and query performance, with developers extending numerous new syntaxes based on PromQL. PromQL has become the standard language for working with time series data in a cloud-native era. GreptimeDB will fully support PromQL in the release of version 0.1, hoping to provide real value to users.

Join our community

Get the latest updates and discuss with other users.

Subscribe to our newsletter

Get the latest dates and news about GreptimeDB.