Relational Data - Data Types - Data Types Binary Encoding
Chris.Hansen (Import ClickHouse Docs: Wed Aug 28 2024 15:05:32 GMT-0400 (Eastern Daylight Time))
This specification describes a binary format for encoding and decoding ClickHouse data types. The format is used in the [[Relational_Data_-_Data_Types_-_Dynamic#binary-output-format|binary serialization]] of <code>Dynamic</code> columns. It can also be used by the input/output formats [https://clickhouse.com/docs/en/interfaces/formats#rowbinarywithnamesandtypes RowBinaryWithNamesAndTypes] and [https://clickhouse.com/docs/en/interfaces/formats#native Native] when the corresponding settings are enabled.
The table below describes how each data type is represented in binary format. Each data type encoding consists of 1 byte that identifies the type, optionally followed by additional information such as parameters or nested type encodings. In the encodings below, <code>var_uint</code> denotes an unsigned integer (usually a size or an element count) stored in variable-length quantity (VLQ) encoding, so small values take fewer bytes.
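To make <code>var_uint</code> concrete, here is a minimal Python sketch. It assumes the LEB128-style layout commonly used for such varints: the value is split into 7-bit groups, and the least significant group is written first. The high bit of each byte is set when more bytes follow. The names <code>encode_var_uint</code> and <code>decode_var_uint</code> are hypothetical helpers, not part of any ClickHouse API.

```python
def encode_var_uint(value: int) -> bytes:
    """Encode a non-negative integer as 7-bit groups, least significant first."""
    if value < 0:
        raise ValueError("var_uint cannot encode negative numbers")
    out = bytearray()
    while value > 0x7F:
        out.append(0x80 | (value & 0x7F))  # low 7 bits plus the continuation bit
        value >>= 7
    out.append(value)  # final group: continuation bit clear
    return bytes(out)


def decode_var_uint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a var_uint starting at data[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
        if shift >= 70:  # a 64-bit value never needs more than 10 bytes
            raise ValueError("var_uint is too long for a 64-bit value")
```

For example, <code>encode_var_uint(300)</code> returns <code>b'\xac\x02'</code>. A 300-byte name would therefore be preceded by two size bytes, while any size below 128 takes a single byte.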
{| class="wikitable"
|-
! ClickHouse data type
! Binary encoding
|-
| <code>Nothing</code>
| <code>0x00</code>
|-
| <code>UInt8</code>
| <code>0x01</code>
|-
| <code>UInt16</code>
| <code>0x02</code>
|-
| <code>UInt32</code>
| <code>0x03</code>
|-
| <code>UInt64</code>
| <code>0x04</code>
|-
| <code>UInt128</code>
| <code>0x05</code>
|-
| <code>UInt256</code>
| <code>0x06</code>
|-
| <code>Int8</code>
| <code>0x07</code>
|-
| <code>Int16</code>
| <code>0x08</code>
|-
| <code>Int32</code>
| <code>0x09</code>
|-
| <code>Int64</code>
| <code>0x0A</code>
|-
| <code>Int128</code>
| <code>0x0B</code>
|-
| <code>Int256</code>
| <code>0x0C</code>
|-
| <code>Float32</code>
| <code>0x0D</code>
|-
| <code>Float64</code>
| <code>0x0E</code>
|-
| <code>Date</code>
| <code>0x0F</code>
|-
| <code>Date32</code>
| <code>0x10</code>
|-
| <code>DateTime</code>
| <code>0x11</code>
|-
| <code>DateTime(time_zone)</code>
| <code>0x12<var_uint_time_zone_name_size><time_zone_name_data></code>
|-
| <code>DateTime64(P)</code>
| <code>0x13<uint8_precision></code>
|-
| <code>DateTime64(P, time_zone)</code>
| <code>0x14<uint8_precision><var_uint_time_zone_name_size><time_zone_name_data></code>
|-
| <code>String</code>
| <code>0x15</code>
|-
| <code>FixedString(N)</code>
| <code>0x16<var_uint_size></code>
|-
| <code>Enum8</code>
| <code>0x17<var_uint_number_of_elements><var_uint_name_size_1><name_data_1><int8_value_1>...<var_uint_name_size_N><name_data_N><int8_value_N></code>
|-
| <code>Enum16</code>
| <code>0x18<var_uint_number_of_elements><var_uint_name_size_1><name_data_1><int16_little_endian_value_1>...<var_uint_name_size_N><name_data_N><int16_little_endian_value_N></code>
|-
| <code>Decimal32(P, S)</code>
| <code>0x19<uint8_precision><uint8_scale></code>
|-
| <code>Decimal64(P, S)</code>
| <code>0x1A<uint8_precision><uint8_scale></code>
|-
| <code>Decimal128(P, S)</code>
| <code>0x1B<uint8_precision><uint8_scale></code>
|-
| <code>Decimal256(P, S)</code>
| <code>0x1C<uint8_precision><uint8_scale></code>
|-
| <code>UUID</code>
| <code>0x1D</code>
|-
| <code>Array(T)</code>
| <code>0x1E<nested_type_encoding></code>
|-
| <code>Tuple(T1, ..., TN)</code>
| <code>0x1F<var_uint_number_of_elements><nested_type_encoding_1>...<nested_type_encoding_N></code>
|-
| <code>Tuple(name1 T1, ..., nameN TN)</code>
| <code>0x20<var_uint_number_of_elements><var_uint_name_size_1><name_data_1><nested_type_encoding_1>...<var_uint_name_size_N><name_data_N><nested_type_encoding_N></code>
|-
| <code>Set</code>
| <code>0x21</code>
|-
| <code>Interval</code>
| <code>0x22<interval_kind></code> (see [[#interval-kind-binary-encoding|interval kind binary encoding]])
|-
| <code>Nullable(T)</code>
| <code>0x23<nested_type_encoding></code>
|-
| <code>Function</code>
| <code>0x24<var_uint_number_of_arguments><argument_type_encoding_1>...<argument_type_encoding_N><return_type_encoding></code>
|-
| <code>AggregateFunction(function_name(param_1, ..., param_N), arg_T1, ..., arg_TN)</code>
| <code>0x25<var_uint_version><var_uint_function_name_size><function_name_data><var_uint_number_of_parameters><param_1>...<param_N><var_uint_number_of_arguments><argument_type_encoding_1>...<argument_type_encoding_N></code> (see [[#aggregate-function-parameter-binary-encoding|aggregate function parameter binary encoding]])
|-
| <code>LowCardinality(T)</code>
| <code>0x26<nested_type_encoding></code>
|-
| <code>Map(K, V)</code>
| <code>0x27<key_type_encoding><value_type_encoding></code>
|-
| <code>IPv4</code>
| <code>0x28</code>
|-
| <code>IPv6</code>
| <code>0x29</code>
|-
| <code>Variant(T1, ..., TN)</code>
| <code>0x2A<var_uint_number_of_variants><variant_type_encoding_1>...<variant_type_encoding_N></code>
|-
| <code>Dynamic(max_types=N)</code>
| <code>0x2B<uint8_max_types></code>
|-
| <code>Custom type</code> (<code>Ring</code>, <code>Polygon</code>, etc.)
| <code>0x2C<var_uint_type_name_size><type_name_data></code>
|-
| <code>Bool</code>
| <code>0x2D</code>
|-
| <code>SimpleAggregateFunction(function_name(param_1, ..., param_N), arg_T1, ..., arg_TN)</code>
| <code>0x2E<var_uint_function_name_size><function_name_data><var_uint_number_of_parameters><param_1>...<param_N><var_uint_number_of_arguments><argument_type_encoding_1>...<argument_type_encoding_N></code> (see [[#aggregate-function-parameter-binary-encoding|aggregate function parameter binary encoding]])
|-
| <code>Nested(name1 T1, ..., nameN TN)</code>
| <code>0x2F<var_uint_number_of_elements><var_uint_name_size_1><name_data_1><nested_type_encoding_1>...<var_uint_name_size_N><name_data_N><nested_type_encoding_N></code>
|-
| <code>JSON(max_dynamic_paths=N, max_dynamic_types=M, path Type, SKIP skip_path, SKIP REGEXP skip_path_regexp)</code>
| <code>0x30<uint8_serialization_version><var_int_max_dynamic_paths><uint8_max_dynamic_types><var_uint_number_of_typed_paths><var_uint_path_name_size_1><path_name_data_1><encoded_type_1>...<var_uint_number_of_skip_paths><var_uint_skip_path_size_1><skip_path_data_1>...<var_uint_number_of_skip_path_regexps><var_uint_skip_path_regexp_size_1><skip_path_data_regexp_1>...</code>
|}
For the <code>JSON</code> type, the byte <code>uint8_serialization_version</code> indicates the version of the serialization. Currently the version is always 0, but it may change in the future if new arguments are introduced for the <code>JSON</code> type.
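The data type table is recursive: composite types embed the full encoding of their nested types. The Python sketch below decodes a subset of the table and illustrates this structure. It assumes the LEB128-style <code>var_uint</code> layout (7-bit groups, least significant first). <code>Reader</code>, <code>decode_type</code> and <code>decode</code> are hypothetical helper names. Enums, <code>Function</code>, the aggregate function types, <code>Nested</code> and <code>JSON</code> are deliberately left out. The returned names follow the table's notation (for example <code>Decimal64(P, S)</code>), which may differ from how ClickHouse itself prints type names.

```python
SIMPLE_TYPES = {  # single-byte type codes from the table (subset)
    0x00: "Nothing", 0x01: "UInt8", 0x02: "UInt16", 0x03: "UInt32",
    0x04: "UInt64", 0x05: "UInt128", 0x06: "UInt256", 0x07: "Int8",
    0x08: "Int16", 0x09: "Int32", 0x0A: "Int64", 0x0B: "Int128",
    0x0C: "Int256", 0x0D: "Float32", 0x0E: "Float64", 0x0F: "Date",
    0x10: "Date32", 0x11: "DateTime", 0x15: "String", 0x1D: "UUID",
    0x21: "Set", 0x28: "IPv4", 0x29: "IPv6", 0x2D: "Bool",
}
DECIMALS = {0x19: "Decimal32", 0x1A: "Decimal64", 0x1B: "Decimal128", 0x1C: "Decimal256"}
WRAPPERS = {0x1E: "Array", 0x23: "Nullable", 0x26: "LowCardinality"}  # one nested type


class Reader:
    """Sequential reader over an encoded type."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def byte(self) -> int:
        b = self.data[self.pos]
        self.pos += 1
        return b

    def var_uint(self) -> int:
        result, shift = 0, 0
        while True:
            b = self.byte()
            result |= (b & 0x7F) << shift
            if not b & 0x80:
                return result
            shift += 7

    def string(self) -> str:  # <var_uint_size><data>
        size = self.var_uint()
        raw = self.data[self.pos:self.pos + size]
        self.pos += size
        return raw.decode("utf-8")


def decode_type(r: Reader) -> str:
    code = r.byte()
    if code in SIMPLE_TYPES:
        return SIMPLE_TYPES[code]
    if code in WRAPPERS:
        return f"{WRAPPERS[code]}({decode_type(r)})"
    if code in DECIMALS:  # <uint8_precision><uint8_scale>
        precision, scale = r.byte(), r.byte()
        return f"{DECIMALS[code]}({precision}, {scale})"
    if code == 0x12:  # DateTime(time_zone)
        return f"DateTime('{r.string()}')"
    if code == 0x13:  # DateTime64(P)
        return f"DateTime64({r.byte()})"
    if code == 0x14:  # DateTime64(P, time_zone)
        precision = r.byte()
        return f"DateTime64({precision}, '{r.string()}')"
    if code == 0x16:  # FixedString(N)
        return f"FixedString({r.var_uint()})"
    if code == 0x1F:  # Tuple(T1, ..., TN)
        n = r.var_uint()
        return "Tuple(" + ", ".join(decode_type(r) for _ in range(n)) + ")"
    if code == 0x20:  # Tuple(name1 T1, ..., nameN TN): name, then type
        n = r.var_uint()
        return "Tuple(" + ", ".join(f"{r.string()} {decode_type(r)}" for _ in range(n)) + ")"
    if code == 0x27:  # Map(K, V)
        key = decode_type(r)
        return f"Map({key}, {decode_type(r)})"
    if code == 0x2A:  # Variant(T1, ..., TN)
        n = r.var_uint()
        return "Variant(" + ", ".join(decode_type(r) for _ in range(n)) + ")"
    if code == 0x2B:  # Dynamic(max_types=N)
        return f"Dynamic(max_types={r.byte()})"
    if code == 0x2C:  # custom type, stored by name
        return r.string()
    raise NotImplementedError(f"type code 0x{code:02X} is not handled in this sketch")


def decode(data: bytes) -> str:
    return decode_type(Reader(data))
```

For instance, <code>decode(b'\x1e\x23\x15')</code> returns <code>'Array(Nullable(String))'</code>. Each wrapper byte is simply followed by the encoding of the type it wraps.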
<span id="interval-kind-binary-encoding"></span>
=== Interval kind binary encoding ===
The table below describes how the different interval kinds of the <code>Interval</code> data type are encoded.
{| class="wikitable"
|-
! Interval kind
! Binary encoding
|-
| <code>Nanosecond</code>
| <code>0x00</code>
|-
| <code>Microsecond</code>
| <code>0x01</code>
|-
| <code>Millisecond</code>
| <code>0x02</code>
|-
| <code>Second</code>
| <code>0x03</code>
|-
| <code>Minute</code>
| <code>0x04</code>
|-
| <code>Hour</code>
| <code>0x05</code>
|-
| <code>Day</code>
| <code>0x06</code>
|-
| <code>Week</code>
| <code>0x07</code>
|-
| <code>Month</code>
| <code>0x08</code>
|-
| <code>Quarter</code>
| <code>0x09</code>
|-
| <code>Year</code>
| <code>0x1A</code>
|}
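The kind byte directly follows the <code>Interval</code> type code <code>0x22</code>. The table lists <code>Year</code> as <code>0x1A</code> rather than continuing the sequence, so this minimal Python sketch uses an explicit lookup table instead of arithmetic. <code>encode_interval</code> and <code>decode_interval</code> are hypothetical helper names.

```python
INTERVAL_TYPE = 0x22  # Interval type code from the data type table

# Kind bytes copied from the table above; Year is listed as 0x1A,
# so the values do not form a simple 0..10 sequence.
INTERVAL_KINDS = {
    "Nanosecond": 0x00, "Microsecond": 0x01, "Millisecond": 0x02,
    "Second": 0x03, "Minute": 0x04, "Hour": 0x05, "Day": 0x06,
    "Week": 0x07, "Month": 0x08, "Quarter": 0x09, "Year": 0x1A,
}
KIND_BY_CODE = {code: kind for kind, code in INTERVAL_KINDS.items()}


def encode_interval(kind: str) -> bytes:
    """Return the full type encoding 0x22<interval_kind>."""
    return bytes([INTERVAL_TYPE, INTERVAL_KINDS[kind]])


def decode_interval(data: bytes) -> str:
    """Inverse of encode_interval; rejects anything that is not an Interval type."""
    if len(data) != 2 or data[0] != INTERVAL_TYPE:
        raise ValueError("not an Interval type encoding")
    return KIND_BY_CODE[data[1]]
```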
<span id="aggregate-function-parameter-binary-encoding"></span>
=== Aggregate function parameter binary encoding ===
The table below describes how the parameters of <code>AggregateFunction</code> and <code>SimpleAggregateFunction</code> are encoded. The encoding of a parameter consists of 1 byte indicating the type of the parameter, followed by the value itself.
{| class="wikitable"
|-
! Parameter type
! Binary encoding
|-
| <code>Null</code>
| <code>0x00</code>
|-
| <code>UInt64</code>
| <code>0x01<var_uint_value></code>
|-
| <code>Int64</code>
| <code>0x02<var_int_value></code>
|-
| <code>UInt128</code>
| <code>0x03<uint128_little_endian_value></code>
|-
| <code>Int128</code>
| <code>0x04<int128_little_endian_value></code>
|-
| <code>UInt256</code>
| <code>0x05<uint256_little_endian_value></code>
|-
| <code>Int256</code>
| <code>0x06<int256_little_endian_value></code>
|-
| <code>Float64</code>
| <code>0x07<float64_little_endian_value></code>
|-
| <code>Decimal32</code>
| <code>0x08<var_uint_scale><int32_little_endian_value></code>
|-
| <code>Decimal64</code>
| <code>0x09<var_uint_scale><int64_little_endian_value></code>
|-
| <code>Decimal128</code>
| <code>0x0A<var_uint_scale><int128_little_endian_value></code>
|-
| <code>Decimal256</code>
| <code>0x0B<var_uint_scale><int256_little_endian_value></code>
|-
| <code>String</code>
| <code>0x0C<var_uint_size><data></code>
|-
| <code>Array</code>
| <code>0x0D<var_uint_size><value_encoding_1>...<value_encoding_N></code>
|-
| <code>Tuple</code>
| <code>0x0E<var_uint_size><value_encoding_1>...<value_encoding_N></code>
|-
| <code>Map</code>
| <code>0x0F<var_uint_size><key_encoding_1><value_encoding_1>...<key_encoding_N><value_encoding_N></code>
|-
| <code>IPv4</code>
| <code>0x10<uint32_little_endian_value></code>
|-
| <code>IPv6</code>
| <code>0x11<uint128_little_endian_value></code>
|-
| <code>UUID</code>
| <code>0x12<uuid_value></code>
|-
| <code>Bool</code>
| <code>0x13<bool_value></code>
|-
| <code>Object</code>
| <code>0x14<var_uint_size><var_uint_key_size_1><key_data_1><value_encoding_1>...<var_uint_key_size_N><key_data_N><value_encoding_N></code>
|-
| <code>AggregateFunctionState</code>
| <code>0x15<var_uint_name_size><name_data><var_uint_data_size><data></code>
|-
| <code>Negative infinity</code>
| <code>0xFE</code>
|-
| <code>Positive infinity</code>
| <code>0xFF</code>
|}
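Combining the data type table with the parameter table, the Python sketch below builds the full encoding of a type such as <code>AggregateFunction(quantile(0.5), Float64)</code>. It rests on several assumptions. <code>var_uint</code> uses the LEB128-style layout. <code>var_int</code> is assumed to be a ZigZag-mapped 64-bit integer written as a <code>var_uint</code>. Non-negative integer parameters are encoded as <code>UInt64</code> and negative ones as <code>Int64</code>. The state <code>version</code> is supplied by the caller, and the <code>0</code> used in the example is illustrative only. All function names are hypothetical, and only a few parameter types are covered.

```python
import struct

AGGREGATE_FUNCTION = 0x25  # AggregateFunction type code from the data type table
FLOAT64_TYPE = b"\x0e"     # Float64 type code, used here as an argument type


def var_uint(value: int) -> bytes:
    # 7-bit groups, least significant first; high bit means "more bytes follow".
    out = bytearray()
    while value > 0x7F:
        out.append(0x80 | (value & 0x7F))
        value >>= 7
    out.append(value)
    return bytes(out)


def var_int(value: int) -> bytes:
    # Assumption: ZigZag mapping (0, -1, 1, -2, ... -> 0, 1, 2, 3, ...) then var_uint.
    return var_uint(((value << 1) ^ (value >> 63)) & 0xFFFF_FFFF_FFFF_FFFF)


def sized(text: str) -> bytes:
    # <var_uint_size><data> for a UTF-8 string.
    data = text.encode("utf-8")
    return var_uint(len(data)) + data


def encode_param(value) -> bytes:
    """Encode one parameter using a few rows of the parameter table."""
    if value is None:
        return b"\x00"                                  # Null
    if isinstance(value, bool):
        raise TypeError("Bool parameters are not handled in this sketch")
    if isinstance(value, int):
        if value >= 0:
            return b"\x01" + var_uint(value)            # UInt64
        return b"\x02" + var_int(value)                 # Int64
    if isinstance(value, float):
        return b"\x07" + struct.pack("<d", value)       # Float64, little-endian
    if isinstance(value, str):
        return b"\x0c" + sized(value)                   # String
    if isinstance(value, list):
        items = b"".join(encode_param(v) for v in value)
        return b"\x0d" + var_uint(len(value)) + items   # Array
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


def encode_aggregate_function(name: str, params: list, arg_types: list, version: int) -> bytes:
    """Build 0x25<version><name><parameters...><argument type encodings...>."""
    out = bytes([AGGREGATE_FUNCTION]) + var_uint(version) + sized(name)
    out += var_uint(len(params)) + b"".join(encode_param(p) for p in params)
    out += var_uint(len(arg_types)) + b"".join(arg_types)
    return out
```

With these assumptions, <code>encode_aggregate_function("quantile", [0.5], [FLOAT64_TYPE], version=0)</code> produces <code>0x25</code> and the version byte, followed by the length-prefixed name <code>quantile</code>. After that come one <code>Float64</code> parameter (<code>0x07</code> plus 8 little-endian bytes) and one argument type (<code>0x0E</code>).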
[[Category:Relational_Data]]
Latest revision as of 19:05, 28 August 2024