Data Types Binary Encoding

Latest revision as of 19:05, 28 August 2024

This specification describes the binary format used to encode and decode ClickHouse data types. The format is used in the binary serialization of Dynamic columns, and it can be used in the RowBinaryWithNamesAndTypes and Native input/output formats when the corresponding settings are enabled.

The table below describes how each data type is represented in the binary format. Each data type encoding consists of 1 byte that identifies the type, optionally followed by additional information. A var_uint field in the binary encoding is an unsigned integer (such as a size or an element count) encoded using Variable-Length Quantity compression.
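To make the var_uint fields concrete, here is a minimal Python sketch of a variable-length unsigned integer codec. It assumes the little-endian base-128 layout (7 payload bits per byte, low-order group first, high bit set while more bytes follow); the text above only names the scheme, so treat the byte order as an assumption. The function names `encode_var_uint` and `decode_var_uint` are illustrative and not part of ClickHouse.

```python
from typing import Tuple


def encode_var_uint(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low-order group first.

    Assumption: little-endian base-128 layout with a continuation bit.
    """
    if value < 0:
        raise ValueError("var_uint cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)         # last byte: continuation bit clear
            return bytes(out)


def decode_var_uint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, next_position) for the var_uint starting at data[pos]."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7
```

Under this assumption, values below 128 occupy a single byte, so short names and small element counts add only one byte of overhead.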

ClickHouse data type	Binary encoding
`Nothing`	`0x00`
`UInt8`	`0x01`
`UInt16`	`0x02`
`UInt32`	`0x03`
`UInt64`	`0x04`
`UInt128`	`0x05`
`UInt256`	`0x06`
`Int8`	`0x07`
`Int16`	`0x08`
`Int32`	`0x09`
`Int64`	`0x0A`
`Int128`	`0x0B`
`Int256`	`0x0C`
`Float32`	`0x0D`
`Float64`	`0x0E`
`Date`	`0x0F`
`Date32`	`0x10`
`DateTime`	`0x11`
`DateTime(time_zone)`	`0x12<var_uint_time_zone_name_size><time_zone_name_data>`
`DateTime64(P)`	`0x13<uint8_precision>`
`DateTime64(P, time_zone)`	`0x14<uint8_precision><var_uint_time_zone_name_size><time_zone_name_data>`
`String`	`0x15`
`FixedString(N)`	`0x16<var_uint_size>`
`Enum8`	`0x17<var_uint_number_of_elements><var_uint_name_size_1><name_data_1><int8_value_1>...<var_uint_name_size_N><name_data_N><int8_value_N>`
`Enum16`	`0x18<var_uint_number_of_elements><var_uint_name_size_1><name_data_1><int16_little_endian_value_1>...<var_uint_name_size_N><name_data_N><int16_little_endian_value_N>`
`Decimal32(P, S)`	`0x19<uint8_precision><uint8_scale>`
`Decimal64(P, S)`	`0x1A<uint8_precision><uint8_scale>`
`Decimal128(P, S)`	`0x1B<uint8_precision><uint8_scale>`
`Decimal256(P, S)`	`0x1C<uint8_precision><uint8_scale>`
`UUID`	`0x1D`
`Array(T)`	`0x1E<nested_type_encoding>`
`Tuple(T1, ..., TN)`	`0x1F<var_uint_number_of_elements><nested_type_encoding_1>...<nested_type_encoding_N>`
`Tuple(name1 T1, ..., nameN TN)`	`0x20<var_uint_number_of_elements><var_uint_name_size_1><name_data_1><nested_type_encoding_1>...<var_uint_name_size_N><name_data_N><nested_type_encoding_N>`
`Set`	`0x21`
`Interval`	`0x22<interval_kind>` (see interval kind binary encoding)
`Nullable(T)`	`0x23<nested_type_encoding>`
`Function`	`0x24<var_uint_number_of_arguments><argument_type_encoding_1>...<argument_type_encoding_N><return_type_encoding>`
`AggregateFunction(function_name(param_1, ..., param_N), arg_T1, ..., arg_TN)`	`0x25<var_uint_version><var_uint_function_name_size><function_name_data><var_uint_number_of_parameters><param_1>...<param_N><var_uint_number_of_arguments><argument_type_encoding_1>...<argument_type_encoding_N>` (see aggregate function parameter binary encoding)
`LowCardinality(T)`	`0x26<nested_type_encoding>`
`Map(K, V)`	`0x27<key_type_encoding><value_type_encoding>`
`IPv4`	`0x28`
`IPv6`	`0x29`
`Variant(T1, ..., TN)`	`0x2A<var_uint_number_of_variants><variant_type_encoding_1>...<variant_type_encoding_N>`
`Dynamic(max_types=N)`	`0x2B<uint8_max_types>`
`Custom type` (`Ring`, `Polygon`, etc)	`0x2C<var_uint_type_name_size><type_name_data>`
`Bool`	`0x2D`
`SimpleAggregateFunction(function_name(param_1, ..., param_N), arg_T1, ..., arg_TN)`	`0x2E<var_uint_function_name_size><function_name_data><var_uint_number_of_parameters><param_1>...<param_N><var_uint_number_of_arguments><argument_type_encoding_1>...<argument_type_encoding_N>` (see aggregate function parameter binary encoding)
`Nested(name1 T1, ..., nameN TN)`	`0x2F<var_uint_number_of_elements><var_uint_name_size_1><name_data_1><nested_type_encoding_1>...<var_uint_name_size_N><name_data_N><nested_type_encoding_N>`
`JSON(max_dynamic_paths=N, max_dynamic_types=M, path Type, SKIP skip_path, SKIP REGEXP skip_path_regexp)`	`0x30<uint8_serialization_version><var_int_max_dynamic_paths><uint8_max_dynamic_types><var_uint_number_of_typed_paths><var_uint_path_name_size_1><path_name_data_1><encoded_type_1>...<var_uint_number_of_skip_paths><var_uint_skip_path_size_1><skip_path_data_1>...<var_uint_number_of_skip_path_regexps><var_uint_skip_path_regexp_size_1><skip_path_data_regexp_1>...`

For the JSON type, the uint8_serialization_version byte indicates the version of the serialization. Currently the version is always 0, but it may change in the future if new arguments are introduced for the JSON type.
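As an illustration of how the rows of the type table compose, the following Python sketch builds the encoding for a few nested types. It covers only a handful of rows, assumes names are written as UTF-8 bytes, and assumes var_uint uses a little-endian base-128 layout; all helper names (`var_uint`, `sized`, `simple`, `array`, and so on) are hypothetical.

```python
from typing import List, Optional, Tuple


def var_uint(n: int) -> bytes:
    """Assumed var_uint layout: 7 bits per byte, low-order group first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def sized(text: str) -> bytes:
    """<var_uint_size><data> pair; UTF-8 is an assumption."""
    raw = text.encode("utf-8")
    return var_uint(len(raw)) + raw


# Type bytes copied from the table above (subset only).
SIMPLE = {"UInt8": 0x01, "UInt64": 0x04, "String": 0x15}


def simple(name: str) -> bytes:
    return bytes([SIMPLE[name]])


def array(nested: bytes) -> bytes:
    return b"\x1e" + nested


def nullable(nested: bytes) -> bytes:
    return b"\x23" + nested


def map_type(key: bytes, value: bytes) -> bytes:
    return b"\x27" + key + value


def datetime64(precision: int, time_zone: Optional[str] = None) -> bytes:
    if time_zone is None:
        return bytes([0x13, precision])
    return bytes([0x14, precision]) + sized(time_zone)


def named_tuple(elements: List[Tuple[str, bytes]]) -> bytes:
    out = b"\x20" + var_uint(len(elements))
    for name, encoding in elements:
        out += sized(name) + encoding
    return out


# Array(Nullable(UInt64)): three type bytes, outermost first.
print(array(nullable(simple("UInt64"))).hex())
# Tuple(a UInt8, b String): count, then (name, type) pairs.
print(named_tuple([("a", simple("UInt8")), ("b", simple("String"))]).hex())
```

Because every nested type is self-delimiting, a decoder can read the same structure back recursively without any extra length prefix around the nested encodings.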

Interval kind binary encoding

The table below describes how different interval kinds of Interval data type are encoded.

Interval kind	Binary encoding
`Nanosecond`	`0x00`
`Microsecond`	`0x01`
`Millisecond`	`0x02`
`Second`	`0x03`
`Minute`	`0x04`
`Hour`	`0x05`
`Day`	`0x06`
`Week`	`0x07`
`Month`	`0x08`
`Quarter`	`0x09`
`Year`	`0x0A`
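A short Python sketch of how an Interval type could be written and read back: the `0x22` type byte from the main table followed by one interval kind byte. Only a subset of the kinds is included for brevity, and the names `encode_interval` and `decode_interval` are hypothetical.

```python
INTERVAL_TYPE_BYTE = 0x22

# Subset of the interval kind table above.
INTERVAL_KINDS = {
    "Second": 0x03,
    "Minute": 0x04,
    "Hour": 0x05,
    "Day": 0x06,
}


def encode_interval(kind: str) -> bytes:
    """Type byte followed by the interval kind byte."""
    return bytes([INTERVAL_TYPE_BYTE, INTERVAL_KINDS[kind]])


def decode_interval(data: bytes) -> str:
    """Inverse of encode_interval for the kinds listed in INTERVAL_KINDS."""
    if len(data) != 2 or data[0] != INTERVAL_TYPE_BYTE:
        raise ValueError("not an Interval type encoding")
    for name, code in INTERVAL_KINDS.items():
        if code == data[1]:
            return name
    raise ValueError(f"interval kind 0x{data[1]:02X} is not in this sketch")
```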

Aggregate function parameter binary encoding

The table below describes how parameters of AggregateFunction and SimpleAggregateFunction are encoded. The encoding of a parameter consists of 1 byte indicating the type of the parameter, followed by the value itself.

Parameter type	Binary encoding
`Null`	`0x00`
`UInt64`	`0x01<var_uint_value>`
`Int64`	`0x02<var_int_value>`
`UInt128`	`0x03<uint128_little_endian_value>`
`Int128`	`0x04<int128_little_endian_value>`
`UInt256`	`0x05<uint256_little_endian_value>`
`Int256`	`0x06<int256_little_endian_value>`
`Float64`	`0x07<float64_little_endian_value>`
`Decimal32`	`0x08<var_uint_scale><int32_little_endian_value>`
`Decimal64`	`0x09<var_uint_scale><int64_little_endian_value>`
`Decimal128`	`0x0A<var_uint_scale><int128_little_endian_value>`
`Decimal256`	`0x0B<var_uint_scale><int256_little_endian_value>`
`String`	`0x0C<var_uint_size><data>`
`Array`	`0x0D<var_uint_size><value_encoding_1>...<value_encoding_N>`
`Tuple`	`0x0E<var_uint_size><value_encoding_1>...<value_encoding_N>`
`Map`	`0x0F<var_uint_size><key_encoding_1><value_encoding_1>...<key_encoding_N><value_encoding_N>`
`IPv4`	`0x10<uint32_little_endian_value>`
`IPv6`	`0x11<uint128_little_endian_value>`
`UUID`	`0x12<uuid_value>`
`Bool`	`0x13<bool_value>`
`Object`	`0x14<var_uint_size><var_uint_key_size_1><key_data_1><value_encoding_1>...<var_uint_key_size_N><key_data_N><value_encoding_N>`
`AggregateFunctionState`	`0x15<var_uint_name_size><name_data><var_uint_data_size><data>`
`Negative infinity`	`0xFE`
`Positive infinity`	`0xFF`
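The parameter table can be sketched as a small recursive encoder. The Python example below handles only Null, UInt64, Float64, String, and Array parameters; it assumes strings are UTF-8 and that var_uint uses a little-endian base-128 layout. Mapping Python values to parameter types this way is a convenience of the sketch, and `encode_param` is a hypothetical name.

```python
import struct


def var_uint(n: int) -> bytes:
    """Assumed var_uint layout: 7 bits per byte, low-order group first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_param(value) -> bytes:
    """Encode one parameter: a type byte, then the value (subset of the table)."""
    if value is None:
        return b"\x00"                                  # Null
    if isinstance(value, bool):
        raise TypeError("Bool is not covered by this sketch")
    if isinstance(value, int):
        if value < 0:
            raise ValueError("negative integers (Int64) are not covered here")
        return b"\x01" + var_uint(value)                # UInt64
    if isinstance(value, float):
        return b"\x07" + struct.pack("<d", value)       # Float64, little-endian
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return b"\x0c" + var_uint(len(raw)) + raw       # String
    if isinstance(value, list):
        items = b"".join(encode_param(v) for v in value)
        return b"\x0d" + var_uint(len(value)) + items   # Array
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


# A single Float64 parameter such as 0.5:
print(encode_param(0.5).hex())
```

Each array element carries its own type byte, so arrays with mixed element types encode without any extra bookkeeping.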