Start a Local Query Cluster

Tip: Expected deployment time: 5 minutes ⏱

Once you've installed Databend, it's very simple to run a databend-query cluster locally.

A new databend-query node only needs to register itself with databend-meta using the same cluster_id as the existing nodes. The nodes then discover each other automatically and form a cluster.

Before you begin

Make sure you have already installed Databend.
Databend cluster mode only works with shared storage (AWS S3 or S3-compatible storage such as MinIO).
Note that running multiple nodes on a single host is useful for testing Databend, but it's not suitable for production.

Step 1. Start standalone

Install Databend in standalone mode. For details, see Install Databend with MinIO.

Step 2. Scale new databend-query Node to the Cluster

2.1 Create databend-query-node2.toml

databend-query-node2.toml
[log]
level = "INFO"
dir = "benddata/_logs"

[query]
# For admin REST API.
admin_api_address = "127.0.0.1:8002"

# Metrics.
metric_api_address = "127.0.0.1:7072"

# Cluster flight RPC.
flight_api_address = "127.0.0.1:9092"

# Query MySQL Handler.
mysql_handler_host = "127.0.0.1"
mysql_handler_port = 3308

# Query ClickHouse Handler.
clickhouse_handler_host = "127.0.0.1"
clickhouse_handler_port = 9002

# Query ClickHouse HTTP Handler.
clickhouse_http_handler_host = "127.0.0.1"
clickhouse_http_handler_port = 8125

# Query HTTP Handler.
http_handler_host = "127.0.0.1"
http_handler_port = 8082

tenant_id = "tenant1"
cluster_id = "cluster1"

[meta]
meta_address = "127.0.0.1:9101"
meta_username = "root"
meta_password = "root"


[storage]
# s3
type = "s3"

# Every node in the cluster must use the same storage.
[storage.s3]
bucket = "databend"
endpoint_url = "http://127.0.0.1:9900"
access_key_id = "minioadmin"
secret_access_key = "minioadmin"

2.2 Start new databend-query Node

./databend-query -c ./databend-query-node2.toml > query.node2.log 2>&1 &

2.3 Check new databend-query Node Status

curl -I http://127.0.0.1:8002/v1/health

Check that the response is HTTP/1.1 200 OK.
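
The node may take a moment to start listening, so a single curl call can fail even when nothing is wrong. As a rough sketch, the check can be wrapped in a retry loop. `is_ok_status` and `wait_healthy` are hypothetical helper names, and the default of 30 attempts with a one-second pause is an arbitrary assumption.

```shell
# Succeed if an HTTP status line (e.g. "HTTP/1.1 200 OK") reports 200.
is_ok_status() {
    case "$1" in
        HTTP/*" 200"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Poll a health URL until it reports 200 or the attempts run out.
wait_healthy() {
    url=$1
    attempts=${2:-30}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s: silent, -I: headers only; keep the status line and strip the CR
        status=$(curl -sI "$url" 2>/dev/null | head -n 1 | tr -d '\r')
        if is_ok_status "$status"; then
            echo "healthy: $status"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "not healthy after $attempts attempts" >&2
    return 1
}
```

With the config above, the call would be `wait_healthy http://127.0.0.1:8002/v1/health`.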

2.4 Check the Cluster Information

mysql -h127.0.0.1 -uroot -P3308

SELECT * FROM system.clusters;

+------------------------+-----------+------+
| name                   | host      | port |
+------------------------+-----------+------+
| QXyxUbieMYMV6OGrjoDKL6 | 127.0.0.1 | 9092 |
| Y1lJiseTjCLwpVRYItQ2f3 | 127.0.0.1 | 9091 |
+------------------------+-----------+------+
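
The same check can be scripted by counting the rows that system.clusters returns. This is a sketch under assumptions: `count_rows` and `expect_nodes` are hypothetical helper names, and the input is expected to be the tab-separated, header-less output that the mysql client produces with `-N -B`.

```shell
# Count non-empty lines read from stdin (one line per cluster node).
count_rows() {
    awk 'NF { n++ } END { print n + 0 }'
}

# Read query output from stdin and compare the row count with $1.
expect_nodes() {
    expected=$1
    actual=$(count_rows)
    if [ "$actual" -eq "$expected" ]; then
        echo "cluster has $actual node(s)"
    else
        echo "expected $expected node(s), found $actual" >&2
        return 1
    fi
}
```

For the two-node setup in this guide: `mysql -h127.0.0.1 -uroot -P3308 -N -B -e "SELECT name, host, port FROM system.clusters" | expect_nodes 2`.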

Step 3. Distributed query

EXPLAIN SELECT max(number), sum(number) FROM numbers_mt(10000000000) GROUP BY number % 3, number % 4, number % 5 LIMIT 10;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| explain                                                                                                                                                                                                           |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Limit: 10                                                                                                                                                                                                         |
|   RedistributeStage[expr: 0]                                                                                                                                                                                      |
|     Projection: max(number):UInt64, sum(number):UInt64                                                                                                                                                            |
|       AggregatorFinal: groupBy=[[(number % 3), (number % 4), (number % 5)]], aggr=[[max(number), sum(number)]]                                                                                                    |
|         RedistributeStage[expr: sipHash(_group_by_key)]                                                                                                                                                           |
|           AggregatorPartial: groupBy=[[(number % 3), (number % 4), (number % 5)]], aggr=[[max(number), sum(number)]]                                                                                              |
|             Expression: (number % 3):UInt8, (number % 4):UInt8, (number % 5):UInt8, number:UInt64 (Before GroupBy)                                                                                                |
|               ReadDataSource: scan schema: [number:UInt64], statistics: [read_rows: 10000000000, read_bytes: 80000000000, partitions_scanned: 1000001, partitions_total: 1000001], push_downs: [projections: [0]] |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

The distributed query works. The nodes exchange data efficiently through the address set in flight_api_address.
