Relational Data - Data Types - Data Types Binary Encoding

From FojiSoft Docs
Revision as of 19:05, 28 August 2024 by Chris.Hansen (talk | contribs) (Import ClickHouse Docs: Wed Aug 28 2024 15:05:32 GMT-0400 (Eastern Daylight Time))

This specification describes the binary format used to encode and decode ClickHouse data types. The format is used in the binary serialization of Dynamic columns and can be used in the RowBinaryWithNamesAndTypes and Native input/output formats when the corresponding settings are enabled.

The table below describes how each data type is represented in binary format. Each data type encoding consists of 1 byte that indicates the type, followed by optional additional information. A var_uint field in the binary encoding means that the size or count is encoded using Variable-Length Quantity compression.
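As an illustration of the var_uint fields, here is a minimal Python sketch. It assumes the LEB128-style layout (7 payload bits per byte, least-significant group first, high bit set on every byte except the last), which is how ClickHouse's variable-length unsigned integers are commonly described. The function names `encode_var_uint` and `decode_var_uint` are hypothetical and not part of any ClickHouse API.

```python
from typing import Tuple


def encode_var_uint(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low-order group first."""
    if value < 0:
        raise ValueError("var_uint cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)  # last byte: continuation bit clear
            return bytes(out)


def decode_var_uint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a var_uint starting at pos; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```

Under this assumption, values below 128 occupy a single byte, so short names and small element counts cost one byte of length prefix.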

ClickHouse data type Binary encoding
Nothing 0x00
UInt8 0x01
UInt16 0x02
UInt32 0x03
UInt64 0x04
UInt128 0x05
UInt256 0x06
Int8 0x07
Int16 0x08
Int32 0x09
Int64 0x0A
Int128 0x0B
Int256 0x0C
Float32 0x0D
Float64 0x0E
Date 0x0F
Date32 0x10
DateTime 0x11
DateTime(time_zone) 0x12<var_uint_time_zone_name_size><time_zone_name_data>
DateTime64(P) 0x13<uint8_precision>
DateTime64(P, time_zone) 0x14<uint8_precision><var_uint_time_zone_name_size><time_zone_name_data>
String 0x15
FixedString(N) 0x16<var_uint_size>
Enum8 0x17<var_uint_number_of_elements><var_uint_name_size_1><name_data_1><int8_value_1>...<var_uint_name_size_N><name_data_N><int8_value_N>
Enum16 0x18<var_uint_number_of_elements><var_uint_name_size_1><name_data_1><int16_little_endian_value_1>...<var_uint_name_size_N><name_data_N><int16_little_endian_value_N>
Decimal32(P, S) 0x19<uint8_precision><uint8_scale>
Decimal64(P, S) 0x1A<uint8_precision><uint8_scale>
Decimal128(P, S) 0x1B<uint8_precision><uint8_scale>
Decimal256(P, S) 0x1C<uint8_precision><uint8_scale>
UUID 0x1D
Array(T) 0x1E<nested_type_encoding>
Tuple(T1, ..., TN) 0x1F<var_uint_number_of_elements><nested_type_encoding_1>...<nested_type_encoding_N>
Tuple(name1 T1, ..., nameN TN) 0x20<var_uint_number_of_elements><var_uint_name_size_1><name_data_1><nested_type_encoding_1>...<var_uint_name_size_N><name_data_N><nested_type_encoding_N>
Set 0x21
Interval 0x22<interval_kind> (see interval kind binary encoding)
Nullable(T) 0x23<nested_type_encoding>
Function 0x24<var_uint_number_of_arguments><argument_type_encoding_1>...<argument_type_encoding_N><return_type_encoding>
AggregateFunction(function_name(param_1, ..., param_N), arg_T1, ..., arg_TN) 0x25<var_uint_version><var_uint_function_name_size><function_name_data><var_uint_number_of_parameters><param_1>...<param_N><var_uint_number_of_arguments><argument_type_encoding_1>...<argument_type_encoding_N> (see aggregate function parameter binary encoding)
LowCardinality(T) 0x26<nested_type_encoding>
Map(K, V) 0x27<key_type_encoding><value_type_encoding>
IPv4 0x28
IPv6 0x29
Variant(T1, ..., TN) 0x2A<var_uint_number_of_variants><variant_type_encoding_1>...<variant_type_encoding_N>
Dynamic(max_types=N) 0x2B<uint8_max_types>
Custom type (Ring, Polygon, etc) 0x2C<var_uint_type_name_size><type_name_data>
Bool 0x2D
SimpleAggregateFunction(function_name(param_1, ..., param_N), arg_T1, ..., arg_TN) 0x2E<var_uint_function_name_size><function_name_data><var_uint_number_of_parameters><param_1>...<param_N><var_uint_number_of_arguments><argument_type_encoding_1>...<argument_type_encoding_N> (see aggregate function parameter binary encoding)
Nested(name1 T1, ..., nameN TN) 0x2F<var_uint_number_of_elements><var_uint_name_size_1><name_data_1><nested_type_encoding_1>...<var_uint_name_size_N><name_data_N><nested_type_encoding_N>
JSON(max_dynamic_paths=N, max_dynamic_types=M, path Type, SKIP skip_path, SKIP REGEXP skip_path_regexp) 0x30<uint8_serialization_version><var_int_max_dynamic_paths><uint8_max_dynamic_types><var_uint_number_of_typed_paths><var_uint_path_name_size_1><path_name_data_1><encoded_type_1>...<var_uint_number_of_skip_paths><var_uint_skip_path_size_1><skip_path_data_1>...<var_uint_number_of_skip_path_regexps><var_uint_skip_path_regexp_size_1><skip_path_data_regexp_1>...

For the JSON type, the uint8_serialization_version byte indicates the version of the serialization. Currently the version is always 0, but it may change in the future if new arguments are introduced for the JSON type.
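To show how the rows of the data type table compose, the following Python sketch builds the encoding for a few types by concatenating the type byte with its additional fields. It covers only a handful of rows. The helper names (`simple`, `nullable`, `array`, `fixed_string`, `datetime64`, `tuple_of`) are hypothetical, the var_uint layout is assumed to be LEB128-style, and time zone names are assumed to be written as UTF-8 bytes.

```python
from typing import Optional


def var_uint(n: int) -> bytes:
    """Assumed LEB128-style var_uint: 7 bits per byte, low-order group first."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


# A few type bytes copied from the table above (not the full list).
SIMPLE_TYPES = {"UInt8": 0x01, "Int32": 0x09, "Float64": 0x0E,
                "String": 0x15, "UUID": 0x1D, "Bool": 0x2D}


def simple(name: str) -> bytes:
    return bytes([SIMPLE_TYPES[name]])


def nullable(nested: bytes) -> bytes:
    return b"\x23" + nested  # 0x23<nested_type_encoding>


def array(nested: bytes) -> bytes:
    return b"\x1E" + nested  # 0x1E<nested_type_encoding>


def fixed_string(size: int) -> bytes:
    return b"\x16" + var_uint(size)  # 0x16<var_uint_size>


def datetime64(precision: int, time_zone: Optional[str] = None) -> bytes:
    if time_zone is None:
        return bytes([0x13, precision])  # 0x13<uint8_precision>
    name = time_zone.encode("utf-8")  # assumption: name stored as UTF-8
    return bytes([0x14, precision]) + var_uint(len(name)) + name


def tuple_of(*elements: bytes) -> bytes:
    # 0x1F<var_uint_number_of_elements><nested_type_encoding_1>...
    return b"\x1F" + var_uint(len(elements)) + b"".join(elements)


print(array(nullable(simple("String"))).hex())  # Array(Nullable(String))
print(datetime64(3, "UTC").hex())               # DateTime64(3, 'UTC')
```

Because nested encodings are simply concatenated, a decoder can parse a type by reading one byte and then recursing according to the matching row of the table.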

Interval kind binary encoding

The table below describes how different interval kinds of Interval data type are encoded.

Interval kind Binary encoding
Nanosecond 0x00
Microsecond 0x01
Millisecond 0x02
Second 0x03
Minute 0x04
Hour 0x05
Day 0x06
Week 0x07
Month 0x08
Quarter 0x09
Year 0x0A
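As a small illustration, the sketch below encodes and decodes an Interval type as the 0x22 type byte followed by one interval kind byte. It assumes the kind occupies exactly one byte, as the single-byte values in the table suggest, and it includes only a few of the kinds listed. The names `INTERVAL_KINDS`, `encode_interval`, and `decode_interval` are hypothetical.

```python
INTERVAL_TYPE_BYTE = 0x22

# A few rows copied from the interval kind table (not the full list).
INTERVAL_KINDS = {
    "Nanosecond": 0x00,
    "Second": 0x03,
    "Minute": 0x04,
    "Hour": 0x05,
    "Day": 0x06,
    "Month": 0x08,
}


def encode_interval(kind: str) -> bytes:
    """Return 0x22 followed by the kind byte."""
    return bytes([INTERVAL_TYPE_BYTE, INTERVAL_KINDS[kind]])


def decode_interval(data: bytes) -> str:
    """Map a two-byte Interval encoding back to the kind name."""
    if len(data) != 2 or data[0] != INTERVAL_TYPE_BYTE:
        raise ValueError("not an Interval type encoding")
    names = {code: name for name, code in INTERVAL_KINDS.items()}
    return names[data[1]]


print(encode_interval("Hour").hex())
```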

Aggregate function parameter binary encoding

The table below describes how parameters of AggregateFunction and SimpleAggregateFunction are encoded. The encoding of a parameter consists of 1 byte indicating the type of the parameter, followed by the value itself.

Parameter type Binary encoding
Null 0x00
UInt64 0x01<var_uint_value>
Int64 0x02<var_int_value>
UInt128 0x03<uint128_little_endian_value>
Int128 0x04<int128_little_endian_value>
UInt256 0x05<uint256_little_endian_value>
Int256 0x06<int256_little_endian_value>
Float64 0x07<float64_little_endian_value>
Decimal32 0x08<var_uint_scale><int32_little_endian_value>
Decimal64 0x09<var_uint_scale><int64_little_endian_value>
Decimal128 0x0A<var_uint_scale><int128_little_endian_value>
Decimal256 0x0B<var_uint_scale><int256_little_endian_value>
String 0x0C<var_uint_size><data>
Array 0x0D<var_uint_size><value_encoding_1>...<value_encoding_N>
Tuple 0x0E<var_uint_size><value_encoding_1>...<value_encoding_N>
Map 0x0F<var_uint_size><key_encoding_1><value_encoding_1>...<key_encoding_N><value_encoding_N>
IPv4 0x10<uint32_little_endian_value>
IPv6 0x11<uint128_little_endian_value>
UUID 0x12<uuid_value>
Bool 0x13<bool_value>
Object 0x14<var_uint_size><var_uint_key_size_1><key_data_1><value_encoding_1>...<var_uint_key_size_N><key_data_N><value_encoding_N>
AggregateFunctionState 0x15<var_uint_name_size><name_data><var_uint_data_size><data>
Negative infinity 0xFE
Positive infinity 0xFF
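The parameter table can be illustrated with a short Python sketch that encodes a few parameter kinds: Null, UInt64, Float64, String, and Array. This is only a sketch under stated assumptions: var_uint is taken to be LEB128-style, strings are assumed to be written as UTF-8 bytes, and Python values are mapped to parameter types by a hypothetical rule (non-negative `int` to UInt64, `float` to Float64, `list` to Array). The remaining rows, such as Int64 and Bool, are deliberately left out because their value layouts are not spelled out here.

```python
import struct


def var_uint(n: int) -> bytes:
    """Assumed LEB128-style var_uint: 7 bits per byte, low-order group first."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_param(value) -> bytes:
    """Encode one aggregate function parameter (subset of the table)."""
    if value is None:
        return b"\x00"  # Null
    if isinstance(value, bool):
        raise TypeError("Bool is not covered by this sketch")
    if isinstance(value, int):
        if value < 0:
            raise TypeError("negative integers (Int64) are not covered here")
        return b"\x01" + var_uint(value)  # UInt64: 0x01<var_uint_value>
    if isinstance(value, float):
        # Float64: 0x07<float64_little_endian_value>
        return b"\x07" + struct.pack("<d", value)
    if isinstance(value, str):
        raw = value.encode("utf-8")  # assumption: UTF-8 string data
        return b"\x0C" + var_uint(len(raw)) + raw  # String: 0x0C<size><data>
    if isinstance(value, list):
        # Array: 0x0D<var_uint_size><value_encoding_1>...<value_encoding_N>
        body = b"".join(encode_param(item) for item in value)
        return b"\x0D" + var_uint(len(value)) + body
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


# Parameters such as the 0.5 and 0.9 in quantiles(0.5, 0.9):
for param in (0.5, 0.9):
    print(encode_param(param).hex())
```

Each element of an Array carries its own type byte, so arrays can mix parameter types without extra framing.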