Summary
Together with our global community of contributors, GreptimeDB continues to evolve and flourish as a growing open-source project. We are grateful to each and every one of you.
Below are the highlights among recent commits:
Log Ingestion Support: process and transform log data using an Elastic Ingest Pipelines-style configuration, then ingest the typed data into tables schemalessly.
Implement SHOW CREATE FLOW: GreptimeFlow continues to mature, now supporting SHOW CREATE FLOW.
Simplify Parquet Writer: removing the redundant writer between Arrow and OpenDAL/S3 cuts the time spent writing Parquet files by 6%.
New Projects
Released Grafana GreptimeDB Plugin
We have released the Grafana GreptimeDB plugin, which is based on the Grafana Prometheus plugin. This plugin provides better interaction and functionality support for GreptimeDB, including support for GreptimeDB's multi-value model. It is currently available for local installation. For more details: https://github.com/GreptimeTeam/greptimedb-grafana-datasource/
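For a rough idea of what a local installation typically involves with Grafana plugins in general (the archive name and plugin path below are placeholders, not taken from the plugin's README), you would unpack the released plugin into Grafana's plugin directory and restart Grafana:

# Hypothetical steps; see the repository README for the authoritative instructions.
unzip greptimedb-grafana-datasource.zip -d /var/lib/grafana/plugins/
sudo systemctl restart grafana-server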
Contributors
Over the past two weeks, our community has been highly active, with a total of 55 PRs merged. 6 PRs from 5 individual contributors were merged successfully, with many more pending.
Congratulations to our most active contributors over the past two weeks!
👏 Welcome @cjwcommuny and @WL2O2O, who join the community as new individual contributors. Congratulations on successfully merging your first PRs, with more on the way.
A big THANK YOU to all our members and contributors! It is people like you who are making GreptimeDB a great product. Let's build an even greater community together.
Highlights of Recent PRs
db#4014 Log Ingestion Support
This PR introduces support for log ingestion. We use Elastic Ingest Pipelines syntax to define processing and transformation behavior, which we call Pipelines. After uploading a Pipeline model to the database, we can use it to process logs into structured data and insert them into tables.
For example, we can create a Pipeline like the following:
curl -X "POST" "http://localhost:4000/v1/events/pipelines/test" \
-H 'Content-Type: application/x-yaml' \
-d 'processors:
- date:
field: time
formats:
- "%Y-%m-%d %H:%M:%S%.3f"
ignore_missing: true
transform:
- fields:
- id1
- id2
type: int32
- fields:
- type
- log
- logger
type: string
- field: time
type: time
index: timestamp
'
It also supports putting the pipeline content into a file and uploading the whole file.
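For instance, assuming the YAML above is saved in a file named pipeline.yaml (a hypothetical file name), the upload could look roughly like this, using curl's --data-binary to send the file contents as the request body:

curl -X "POST" "http://localhost:4000/v1/events/pipelines/test" \
     -H 'Content-Type: application/x-yaml' \
     --data-binary @pipeline.yaml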
A Pipeline named test is now created in the greptime_private.pipelines table. We can then try to ingest some logs into the database:
curl -X "POST" "http://localhost:4000/v1/events/logs?db=public&table=logs1&pipeline_name=test" \
-H 'Content-Type: application/json' \
-d '[
{
"id1": "2436",
"id2": "2528",
"logger": "INTERACT.MANAGER",
"type": "I",
"time": "2024-05-25 20:16:37.217",
"log": "ClusterAdapter:enter sendTextDataToCluster\\n"
}
]'
The log data is JSON formatted. The new /v1/events/logs API looks up the Pipeline named by the pipeline_name parameter and uses it to process the payload. Note how the fields correspond to the Pipeline definition. A table named logs1 is created (if it does not already exist) and the typed data is inserted into it.
mysql> show create table logs1;
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logs1 | CREATE TABLE IF NOT EXISTS `logs1` (
`id1` INT NULL,
`id2` INT NULL,
`type` STRING NULL,
`log` STRING NULL,
`logger` STRING NULL,
`time` TIMESTAMP(9) NOT NULL,
TIME INDEX (`time`)
)
ENGINE=mito
WITH(
append_mode = 'true'
) |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
mysql> select * from logs1;
+------+------+------+---------------------------------------------+------------------+----------------------------+
| id1 | id2 | type | log | logger | time |
+------+------+------+---------------------------------------------+------------------+----------------------------+
| 2436 | 2528 | I | ClusterAdapter:enter sendTextDataToCluster
| INTERACT.MANAGER | 2024-05-25 20:16:37.217000 |
+------+------+------+---------------------------------------------+------------------+----------------------------+
1 row in set (0.03 sec)
db#4112 Simplify Parquet Writer
The BufferedWriter was introduced to bridge the gap between Arrow's Parquet writer, which requires std::io::Write, and OpenDAL, which only provides an async S3 writer implementing tokio::io::AsyncWrite. Now that Arrow provides AsyncArrowWriter, those structs can be removed.
By removing these redundant structures and extra code paths, we achieve a 6% reduction in the time spent writing Parquet files.
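For illustration, here is a rough, self-contained sketch (not GreptimeDB's actual code) of writing a RecordBatch through AsyncArrowWriter directly into an async sink; exact constructor signatures and required crate features vary between parquet versions:

// Rough Cargo deps: arrow, parquet (features "arrow" + "async"), tokio (features "full").
use std::sync::Arc;

use arrow::array::{ArrayRef, Int32Array};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use parquet::arrow::AsyncArrowWriter;
use tokio::fs::File;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let schema = Arc::new(Schema::new(vec![Field::new("id", DataType::Int32, false)]));
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![Arc::new(Int32Array::from(vec![1, 2, 3])) as ArrayRef],
    )?;

    // The writer accepts the async sink directly (a local file here; an
    // OpenDAL/S3 writer in GreptimeDB's case), so no blocking
    // std::io::Write adapter sits in between.
    let file = File::create("example.parquet").await?;
    let mut writer = AsyncArrowWriter::try_new(file, schema, None)?;
    writer.write(&batch).await?;
    writer.close().await?;
    Ok(())
}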
db#4040 Implement SHOW CREATE FLOW
Now we can SHOW CREATE FLOW after a Flow is created. For example, if we create a Flow like the following:
mysql> CREATE FLOW IF NOT EXISTS my_flow
-> SINK TO my_sink_table
-> EXPIRE AFTER INTERVAL '1 hour'
-> AS
-> SELECT count(1) from monitor;
Query OK, 0 rows affected (0.04 sec)
We can use SHOW CREATE FLOW my_flow to check the CREATE statement later on:
mysql> show create flow my_flow;
+---------+-----------------------------------------------------------------------------------------------------------------------+
| Flow | Create Flow |
+---------+-----------------------------------------------------------------------------------------------------------------------+
| my_flow | CREATE OR REPLACE FLOW IF NOT EXISTS my_flow
SINK TO my_sink_table
EXPIRE AFTER 3600
AS SELECT count(1) FROM monitor |
+---------+-----------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
db#4151 Guide For Benchmarking GreptimeDB
We use a modified version of tsbs to benchmark GreptimeDB. Now we provide a guide so that users can run the benchmark themselves. It can also be used to benchmark any other database that tsbs supports, so that a comparison can be generated. Please note that GreptimeDB should be built in release mode before running the benchmark.
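As a rough sketch of that release-build step (the standalone start command is the one from GreptimeDB's getting-started docs; the actual tsbs invocation and data-generation flags are covered in the guide itself):

# Build an optimized binary and start it before pointing tsbs at it.
cargo build --release
./target/release/greptime standalone start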
Please refer to the benchmark guide here.
Good First Issue
db#4157 Fix information_schema.region_peers returning the same region_id
We store region_id in information_schema.region_peers. However, it seems only one region_id is returned even when there are multiple region peers. Find out whether there is a bug in assembling the return value of information_schema.region_peers and fix it (a hypothetical query to observe this is sketched at the end of this section).
Keywords: Information Schema
Difficulty: Easy
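For reference, a hypothetical query to observe the behavior (the column names are assumed from the information_schema documentation, not taken from the issue):

-- With a table whose data spans several regions, each row here is expected
-- to carry a distinct region_id rather than repeating the same one.
SELECT region_id, peer_id, is_leader FROM information_schema.region_peers;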