StripeLog
This engine belongs to the family of log engines. See the common properties of log engines and their differences in the Log Engine Family article.
Use this engine in scenarios when you need to write many tables with a small amount of data (less than 1 million rows). For example, this table can be used to store incoming data batches for transformation where atomic processing of them is required. 100k instances of this table type are viable for a ClickHouse server. This table engine should be preferred over Log when a high number of tables are required. This is at the expense of read efficiency.
Creating a Table
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
column1_name [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
column2_name [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
...
) ENGINE = StripeLog
See the detailed description of the CREATE TABLE query.
Writing the Data
The StripeLog
engine stores all the columns in one file. For each INSERT
query, ClickHouse appends the data block to the end of a table file, writing columns one by one.
For each table ClickHouse writes the files:
data.bin
— Data file.index.mrk
— File with marks. Marks contain offsets for each column of each data block inserted.
The StripeLog
engine does not support the ALTER UPDATE
and ALTER DELETE
operations.
Reading the Data
The file with marks allows ClickHouse to parallelize the reading of data. This means that a SELECT
query returns rows in an unpredictable order. Use the ORDER BY
clause to sort rows.
Example of Use
Creating a table:
CREATE TABLE stripe_log_table
(
timestamp DateTime,
message_type String,
message String
)
ENGINE = StripeLog
Inserting data:
INSERT INTO stripe_log_table VALUES (now(),'REGULAR','The first regular message')
INSERT INTO stripe_log_table VALUES (now(),'REGULAR','The second regular message'),(now(),'WARNING','The first warning message')
We used two INSERT
queries to create two data blocks inside the data.bin
file.
ClickHouse uses multiple threads when selecting data. Each thread reads a separate data block and returns resulting rows independently as it finishes. As a result, the order of blocks of rows in the output does not match the order of the same blocks in the input in most cases. For example:
SELECT * FROM stripe_log_table
┌───────────timestamp─┬─message_type─┬─message────────────────────┐
│ 2019-01-18 14:27:32 │ REGULAR │ The second regular message │
│ 2019-01-18 14:34:53 │ WARNING │ The first warning message │
└─────────────────────┴──────────────┴────────────────────────────┘
┌───────────timestamp─┬─message_type─┬─message───────────────────┐
│ 2019-01-18 14:23:43 │ REGULAR │ The first regular message │
└─────────────────────┴──────────────┴───────────────────────────┘
Sorting the results (ascending order by default):
SELECT * FROM stripe_log_table ORDER BY timestamp
┌───────────timestamp─┬─message_type─┬─message────────────────────┐
│ 2019-01-18 14:23:43 │ REGULAR │ The first regular message │
│ 2019-01-18 14:27:32 │ REGULAR │ The second regular message │
│ 2019-01-18 14:34:53 │ WARNING │ The first warning message │
└─────────────────────┴──────────────┴────────────────────────────┘