
Biweekly
June 19, 2024

Support for Log Data Ingestion, Enhanced Interaction with Grafana Plugin | Greptime Biweekly Report

A recap of the progress and changes that happened in GreptimeDB over the past two weeks.

Summary

Together with our global community of contributors, GreptimeDB continues to evolve and flourish as a growing open-source project. We are grateful to each and every one of you.

Below are the highlights among recent commits:

  • Log Ingestion Support: process and transform log data using an Elasticsearch Ingest Pipeline-style configuration, then ingest the typed data into tables schemalessly.

  • Implement SHOW CREATE FLOW: GreptimeFlow continues to mature, now with support for SHOW CREATE FLOW.

  • Simplify Parquet Writer: removing the redundant writer layer between Arrow and OpenDAL/S3 cuts the time spent writing Parquet files by about 6%.

New Projects

Released Grafana GreptimeDB Plugin

We have released the Grafana GreptimeDB plugin, which is based on the Grafana Prometheus plugin. It provides better interaction and functionality support for GreptimeDB, including support for GreptimeDB's multi-value model. It is currently available for local installation. For more details, see https://github.com/GreptimeTeam/greptimedb-grafana-datasource/
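
For a local installation, the sketch below assumes the plugin is distributed as a release archive in that repository and that Grafana loads plugins from /var/lib/grafana/plugins; the archive name and paths are illustrative, so please follow the repository README for the authoritative steps.

shell
# Sketch only: download a plugin release archive, unpack it into Grafana's
# plugin directory, and restart Grafana. The archive name and paths below
# are assumptions; see the repository README for exact instructions.
wget https://github.com/GreptimeTeam/greptimedb-grafana-datasource/releases/latest/download/greptimedb-datasource.zip
unzip greptimedb-datasource.zip -d /var/lib/grafana/plugins/
sudo systemctl restart grafana-server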

Contributors

Over the past two weeks, our community has been super active, with a total of 55 PRs merged. 6 PRs from 5 individual contributors were merged successfully, with many more pending.

@cjwcommuny (db#4117)

@irenjj (db#4040)

@realtaobo (db#4088)

@WL2O2O (dashboard#433)

@yuanbohan (db#4121 db#4123)

Congrats to all of them on being our most active contributors over the past two weeks!

👏 Welcome @cjwcommuny and @WL2O2O, who joined the community as new individual contributors, and congratulations on successfully merging their first PRs. More PRs are waiting to be merged.

New Contributor of GreptimeDB

A big THANK YOU to all our members and contributors! It is people like you who are making GreptimeDB a great product. Let's build an even greater community together.

Highlights of Recent PRs

db#4014 Log Ingestion Support

This PR introduces support for log ingestion. We use the Elasticsearch Ingest Pipeline syntax to define processing and transformation behavior, which we call a Pipeline. After uploading a Pipeline definition to the database, we can use it to process logs into structured data and insert them into tables.

For example, we can create a Pipeline like the following:

shell
curl -X "POST" "http://localhost:4000/v1/events/pipelines/test" \
     -H 'Content-Type: application/x-yaml' \
     -d 'processors:
  - date:
      field: time
      formats:
        - "%Y-%m-%d %H:%M:%S%.3f"
      ignore_missing: true

transform:
  - fields:
      - id1
      - id2
    type: int32
  - fields:
      - type
      - log
      - logger
    type: string
  - field: time
    type: time
    index: timestamp
'

It also supports putting the Pipeline content into a file and uploading the whole file, as sketched below. Either way, a Pipeline named test is now created in the greptime_private.pipelines table.
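
For instance, if the YAML definition above is saved to a local file (the name pipeline.yaml here is only illustrative), the same endpoint can be fed that file with curl's --data-binary option:

shell
# Sketch only: upload the Pipeline definition from a file instead of an
# inline string; pipeline.yaml is an assumed file name.
curl -X "POST" "http://localhost:4000/v1/events/pipelines/test" \
     -H 'Content-Type: application/x-yaml' \
     --data-binary @pipeline.yaml

Then we can try to put some log data into the database: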

shell
curl -X "POST" "http://localhost:4000/v1/events/logs?db=public&table=logs1&pipeline_name=test" \
     -H 'Content-Type: application/json' \
     -d '[
    {
      "id1": "2436",
      "id2": "2528",
      "logger": "INTERACT.MANAGER",
      "type": "I",
      "time": "2024-05-25 20:16:37.217",
      "log": "ClusterAdapter:enter sendTextDataToCluster\\n"
    }
  ]'

The log data is JSON formatted. The new /v1/events/logs API looks up the Pipeline specified by the pipeline_name parameter and uses it to process the payload. Note how the fields correspond to the Pipeline definition. A table named logs1 is created (if it does not already exist) and the typed data is inserted into it.

shell
mysql> show create table logs1;
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                                                                    |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logs1 | CREATE TABLE IF NOT EXISTS `logs1` (
  `id1` INT NULL,
  `id2` INT NULL,
  `type` STRING NULL,
  `log` STRING NULL,
  `logger` STRING NULL,
  `time` TIMESTAMP(9) NOT NULL,
  TIME INDEX (`time`)
)

ENGINE=mito
WITH(
  append_mode = 'true'
) |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

mysql> select * from logs1;
+------+------+------+---------------------------------------------+------------------+----------------------------+
| id1  | id2  | type | log                                         | logger           | time                       |
+------+------+------+---------------------------------------------+------------------+----------------------------+
| 2436 | 2528 | I    | ClusterAdapter:enter sendTextDataToCluster
 | INTERACT.MANAGER | 2024-05-25 20:16:37.217000 |
+------+------+------+---------------------------------------------+------------------+----------------------------+
1 row in set (0.03 sec)

db#4112 Simplify Parquet Writer

The BufferedWriter was originally introduced to bridge the gap between Arrow's Parquet writer, which requires std::io::Write, and OpenDAL, which only provides an async S3 writer implementing tokio::io::AsyncWrite. Now that Arrow provides AsyncArrowWriter, we can remove those structs.

By removing these redundant structures and extra code paths, we cut the time spent writing Parquet files by about 6%.

db#4040 Implement SHOW CREATE FLOW

Now we can SHOW CREATE FLOW after a Flow is created. For example, if we create a Flow like the following:

shell
mysql> CREATE FLOW IF NOT EXISTS my_flow
    -> SINK TO my_sink_table
    -> EXPIRE AFTER INTERVAL '1 hour'
    -> AS
    -> SELECT count(1) from monitor;
Query OK, 0 rows affected (0.04 sec)

We can use SHOW CREATE FLOW my_flow to check the CREATE statement later on.

sql
mysql> show create flow my_flow;
+---------+-----------------------------------------------------------------------------------------------------------------------+
| Flow    | Create Flow                                                                                                           |
+---------+-----------------------------------------------------------------------------------------------------------------------+
| my_flow | CREATE OR REPLACE FLOW IF NOT EXISTS my_flow
SINK TO my_sink_table
EXPIRE AFTER 3600
AS SELECT count(1) FROM monitor |
+---------+-----------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

db#4151 Guide For Benchmarking GreptimeDB

We use a modified version of tsbs to benchmark GreptimeDB, and we now provide a guide so that users can run the benchmark themselves. It can also be used to benchmark any other database that tsbs supports, so that comparisons can be made. Please note to use a release build before running the benchmark, as sketched below.
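
As a minimal sketch of that preparation step (assuming GreptimeDB is being built from source in the repository root; the tsbs invocations themselves are covered in the guide):

shell
# Sketch only: build GreptimeDB in release mode and start a standalone
# instance to benchmark against; the tsbs commands are described in the guide.
cargo build --release
./target/release/greptime standalone start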

Please refer to the benchmark guide here.

Good First Issue

db#4157 Fix information_schema.region_peers returns same region_id

We store region_id in information_schema.region_peers. However, it seems that only one region_id is returned even when there are multiple region peers. Find out whether there is a bug in assembling the return value of information_schema.region_peers and fix it.
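
The sketch below is one way to observe the current behavior, assuming a locally running GreptimeDB that exposes the MySQL protocol on its default port 4002:

shell
# Sketch only: inspect information_schema.region_peers over the MySQL protocol
# and check whether distinct region_id values are returned for multiple peers.
mysql -h 127.0.0.1 -P 4002 -e "SELECT * FROM information_schema.region_peers;"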

Keywords: Information Schema

Difficulty: Easy

