To understand why we created GreptimeDB, please read our blog post "This Time, for Real".
GreptimeDB is a cloud-native time-series database that is distributed, scalable, and efficient. It is optimized to manage vast amounts of time-series data, making it well suited to hybrid workloads that combine time-series ingestion, querying, and analysis. Whether your system runs at the edge or in the cloud, GreptimeDB is designed to scale with your workload. With our focus on scalability and analytics, we are confident that we can provide solutions and services that fit your needs.
We followed these principles when designing and developing GreptimeDB:
- Cloud-native: GreptimeDB was designed for the cloud from day one. As a cloud-native database, it uses the infrastructure and capabilities of the cloud to run and serve requests, with excellent scalability and fault tolerance.
- User-friendly: Developer experience is crucial, so a key principle of GreptimeDB's design is to be developer-friendly. This covers development, deployment, and operations, as well as seamless compatibility with data ecosystems and continuously improving documentation and community guidelines.
- High-Performance: Time-series workloads usually involve large-scale data ingestion, querying, and analysis. Performance optimization in high-concurrency scenarios is therefore one of our core principles when designing GreptimeDB.
- Flexible architecture: Through well-abstracted layering and encapsulation, GreptimeDB can be deployed in a range of environments, from embedded and standalone setups to traditional clusters and cloud-native deployments.
The architecture of GreptimeDB
Next, we will briefly introduce these principles.
Cloud-native
GreptimeDB is designed to run in the cloud and take full advantage of cloud capabilities such as elasticity, scalability, and high availability.
Storage/Compute Disaggregation, Compute/Compute separation
Storage/Compute Disaggregation
Separating the storage and compute resources in the cloud has several benefits:
- Easy and independent scaling based on demand
- Data can be written to cost-effective cloud storage services such as AWS S3 or Azure Blob Storage
- Serverless containers can be used for automatic and elastic scaling of compute resources
Compute/Compute Separation
Separating compute resources for different workloads:
- Isolating compute resources so that tasks such as data ingestion, regular queries, ad-hoc queries, and data compaction or rollup do not contend with each other
- Sharing data among multiple applications
- Scaling concurrency on demand
User-friendly
Time-Series Table, Schemaless design
Combining the metric (Measurement/Tag/Field/Timestamp) model and the relational data model (Table), GreptimeDB provides a new data model called a time-series table, which represents data in the form of tables consisting of rows and columns, with tags and fields from the metric mapped to columns, and an enforced time index constraint that represents the timestamp.
Time-Series Table
Nevertheless, defining a schema is not mandatory; GreptimeDB leans towards the schemaless approach of databases like MongoDB. A table is created dynamically and automatically when data is ingested, and new columns (tags and fields) are added as they appear.
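To make the data model more concrete, here is a minimal in-memory sketch of a time-series table that is created on first write and gains columns as new tags and fields appear. The names `TimeSeriesTable` and `write` are hypothetical and only illustrate the idea; this is not GreptimeDB's API or implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TimeSeriesTable:
    """Toy model of a time-series table (hypothetical, for illustration only)."""
    name: str
    time_index: str = "ts"  # assumed name of the mandatory time index column
    tag_columns: List[str] = field(default_factory=list)
    field_columns: List[str] = field(default_factory=list)
    rows: List[Dict[str, Any]] = field(default_factory=list)

    def columns(self) -> List[str]:
        # Tags and fields both become ordinary columns; the time index is always present.
        return self.tag_columns + self.field_columns + [self.time_index]

    def insert(self, tags: Dict[str, str], fields: Dict[str, float], ts: int) -> None:
        if ts is None:
            raise ValueError("the time index is mandatory")
        # Schemaless behaviour: unknown tags and fields are added as new columns.
        for name in tags:
            if name not in self.tag_columns:
                self.tag_columns.append(name)
        for name in fields:
            if name not in self.field_columns:
                self.field_columns.append(name)
        self.rows.append({**tags, **fields, self.time_index: ts})

    def select(self) -> List[Dict[str, Any]]:
        # Rows written before a column existed read back as None for that column.
        return [{c: row.get(c) for c in self.columns()} for row in self.rows]


def write(tables: Dict[str, TimeSeriesTable], table_name: str,
          tags: Dict[str, str], fields: Dict[str, float], ts: int) -> None:
    """Create the table on first write, then insert the row."""
    table = tables.setdefault(table_name, TimeSeriesTable(table_name))
    table.insert(tags, fields, ts)
```

Writing a second row with an extra `region` tag or `temp` field simply widens the table; earlier rows report `None` for the new columns.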
PromQL, SQL and Python
GreptimeDB supports PromQL and SQL, both of which rely on the same query engine. The engine employs vectorized execution and is both parallelized and distributed.
PromQL is a popular query language that allows users to select and aggregate real-time time series data provided by Prometheus. It is much simpler to use than SQL for visualization with Grafana and creating alert rules. GreptimeDB supports PromQL natively and effectively by transforming it into a query plan, which is then optimized and executed by the query engine.
SQL is more powerful in analyzing data that covers a long time span or multiple tables, such as table joins. SQL is also convenient for database management.
Python is very popular among data scientists and AI experts, and GreptimeDB enables Python scripts to run directly in the database. By embedding a Python interpreter, it lets users write their own user-defined functions (UDFs) and use the DataFrame API to accelerate data processing.
Easy to deploy and maintain
To simplify deployment and maintenance, GreptimeDB provides a Kubernetes operator, a command-line tool, an embedded dashboard, and other useful tools for users to configure and manage their databases easily. If you are looking for a fully managed cloud service, check GreptimeCloud on our official website.
Easy to integrate
Several protocols are supported for database connectivity, including MySQL, PostgreSQL, InfluxDB, OpenTSDB, Prometheus RemoteStorage, and high-performance gRPC. Additionally, SDKs are provided for various programming languages, such as Java, Go, Erlang, and others. We are consistently integrating and connecting with open-source software in the ecosystem to enhance the developer experience.
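As an illustration of what one of these protocols looks like on the wire, the sketch below formats a single point as an InfluxDB line protocol line (`measurement,tags fields timestamp`). The helper names `to_line` and `format_field` are hypothetical, and the sketch only produces the text; how and where the line is sent depends on your deployment and is not shown here.

```python
from typing import Dict, Union

FieldValue = Union[int, float, bool, str]

MEASUREMENT_SPECIALS = ", "   # commas and spaces are escaped in measurement names
KEY_SPECIALS = ",= "          # commas, equals signs and spaces in tag/field keys and tag values


def _escape(value: str, specials: str) -> str:
    """Backslash-escape every character listed in `specials`."""
    for ch in specials:
        value = value.replace(ch, "\\" + ch)
    return value


def format_field(value: FieldValue) -> str:
    """Render a field value using line protocol type markers."""
    if isinstance(value, bool):      # check bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"           # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'            # strings are double-quoted


def to_line(measurement: str, tags: Dict[str, str],
            fields: Dict[str, FieldValue], timestamp_ns: int) -> str:
    """Build one line protocol line with a nanosecond timestamp."""
    if not fields:
        raise ValueError("at least one field is required")
    head = _escape(measurement, MEASUREMENT_SPECIALS)
    for key in sorted(tags):         # sorted tag keys give a stable output
        head += "," + _escape(key, KEY_SPECIALS) + "=" + _escape(tags[key], KEY_SPECIALS)
    body = ",".join(
        _escape(key, KEY_SPECIALS) + "=" + format_field(value)
        for key, value in fields.items()
    )
    return f"{head} {body} {timestamp_ns}"
```

In terms of the time-series table model, the measurement plays the role of the table, tags and fields become columns, and the timestamp fills the time index.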
High-Performance
For performance optimization, GreptimeDB uses techniques such as an LSM tree, data sharding, and a quorum-based WAL design to handle large time-series ingestion workloads.
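For readers unfamiliar with LSM trees, the following toy sketch shows the general idea: writes go to an in-memory memtable, full memtables are flushed as immutable sorted runs, reads check the newest data first, and compaction merges runs. The class `TinyLSM` is hypothetical and deliberately simplified; it omits the WAL, sharding, and on-disk formats, and it does not reflect GreptimeDB's actual storage engine.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class TinyLSM:
    """Toy LSM tree: an in-memory memtable plus a list of sorted runs."""

    def __init__(self, memtable_limit: int = 4) -> None:
        self.memtable_limit = memtable_limit
        self.memtable: Dict[str, float] = {}
        self.runs: List[List[Tuple[str, float]]] = []  # oldest first, newest last

    def put(self, key: str, value: float) -> None:
        # Writes only touch the memtable, which keeps ingestion cheap.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # A full memtable becomes an immutable run sorted by key.
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key: str) -> Optional[float]:
        # Newest data wins: check the memtable, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))  # binary search on the sorted run
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self) -> None:
        # Merge all runs into one; later (newer) runs overwrite older values.
        merged: Dict[str, float] = {}
        for run in self.runs:
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []
```

The design trades some read cost (several runs may need to be searched) for fast sequential writes, which is why it suits ingestion-heavy time-series workloads.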
GreptimeDB also reduces storage costs and addresses high cardinality issues of time-series data through columnar layout storage, adaptive compression algorithms, and smart indexing.
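The sketch below illustrates why a columnar layout tends to compress well: once values of the same column sit together, regularly spaced timestamps shrink to small deltas and repeated tag values collapse into runs. Delta encoding and run-length encoding are used here only as familiar examples; the source does not say which algorithms GreptimeDB applies, and all function names are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one list per column.

    Assumes every row has the same keys.
    """
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store differences between neighbours."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    """Invert delta_encode with a running sum."""
    out: List[int] = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def run_length_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    """Collapse consecutive repeats into (value, count) pairs."""
    runs: List[Tuple[Any, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs
```

An engine that adapts its compression would pick an encoding per column based on the data it sees; this sketch fixes the choice by hand to keep the example short.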
The query engine is fast and powerful, built on vectorized execution and distributed parallel processing, combined with indexing capabilities.
Flexible architecture
Different modules and components can be independently switched on, combined, or separated through modularization and layered design. For example, we can merge the frontend, datanode, and meta server into a standalone binary, and we can also independently enable or disable the WAL for every table.
Such a flexible architecture design allows GreptimeDB to meet deployment and usage requirements in scenarios from the edge to the cloud, while still using the same set of APIs and control panels.