Relational Data - Data Types - LowCardinality
Latest revision as of 19:05, 28 August 2024
Changes the internal representation of other data types to be dictionary-encoded.
Syntax
LowCardinality(data_type)
Parameters
data_type: String, FixedString, Date, DateTime, and numbers excepting Decimal. LowCardinality is not efficient for some data types; see the description of the allow_suspicious_low_cardinality_types setting.
Description
LowCardinality is a superstructure that changes the data storage method and the rules of data processing. ClickHouse applies dictionary coding to LowCardinality columns. Operating on dictionary-encoded data significantly increases the performance of SELECT queries for many applications.
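Conceptually, dictionary encoding stores each distinct value once and replaces every row with a small integer code that points into the dictionary. The Python sketch below is a minimal model of that idea, not ClickHouse's implementation, and the function names dict_encode, dict_decode and rows_equal_to are hypothetical. It also shows one reason filters can get cheaper. The searched string is looked up in the dictionary once. After that, each row is checked with an integer comparison.

```python
from typing import Dict, List, Tuple


def dict_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Split a column into a dictionary of distinct values and one code per row."""
    dictionary: List[str] = []
    position: Dict[str, int] = {}
    codes: List[int] = []
    for v in values:
        if v not in position:
            # First occurrence: append to the dictionary and remember its index
            position[v] = len(dictionary)
            dictionary.append(v)
        codes.append(position[v])
    return dictionary, codes


def dict_decode(dictionary: List[str], codes: List[int]) -> List[str]:
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[c] for c in codes]


def rows_equal_to(dictionary: List[str], codes: List[int], needle: str) -> List[int]:
    """Equality filter: resolve the needle to a code once, then compare integers per row."""
    try:
        target = dictionary.index(needle)
    except ValueError:
        return []  # value is absent from the dictionary, so no row can match
    return [row for row, c in enumerate(codes) if c == target]


column = ["GET", "POST", "GET", "GET", "DELETE", "POST"]
dictionary, codes = dict_encode(column)
print(dictionary)                               # ['GET', 'POST', 'DELETE']
print(codes)                                    # [0, 1, 0, 0, 2, 1]
print(rows_equal_to(dictionary, codes, "GET"))  # [0, 2, 3]
```

The dictionary keeps values in order of first appearance, so decoding is a plain index lookup for each row.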
The efficiency of the LowCardinality data type depends on data diversity. If a dictionary contains fewer than 10,000 distinct values, ClickHouse usually reads and stores the data more efficiently. If a dictionary contains more than 100,000 distinct values, ClickHouse can perform worse than with ordinary data types.
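The toy storage model below shows why the benefit shrinks as the number of distinct values grows. As the value count rises, the dictionary approaches the size of the raw data, but the per-row codes still have to be stored. It is a hypothetical sketch under stated assumptions: each string costs its UTF-8 length plus a 1-byte length prefix, and codes use the smallest of 1, 2, 4 or 8 bytes that can index the dictionary. It does not model CPU cost or ClickHouse's real on-disk format. The 10,000 and 100,000 figures above come from the ClickHouse documentation, not from this model.

```python
from typing import List


def code_width_bytes(distinct: int) -> int:
    """Smallest unsigned width (1, 2, 4 or 8 bytes) that can index `distinct` entries."""
    for width in (1, 2, 4, 8):
        if distinct <= 256 ** width:
            return width
    raise ValueError("too many distinct values")


def string_bytes(value: str) -> int:
    # Assumption: a string costs its UTF-8 length plus a 1-byte length prefix.
    return len(value.encode("utf-8")) + 1


def plain_bytes(values: List[str]) -> int:
    """Size of the column stored as ordinary strings."""
    return sum(string_bytes(v) for v in values)


def dictionary_bytes(values: List[str]) -> int:
    """Size of the column stored as a dictionary of distinct values plus one code per row."""
    distinct = set(values)
    dictionary_size = sum(string_bytes(v) for v in distinct)
    return dictionary_size + len(values) * code_width_bytes(len(distinct))


low = [f"country_{i % 100}" for i in range(100_000)]  # 100 distinct values
high = [f"user_{i}" for i in range(100_000)]          # every value distinct

print(plain_bytes(low), dictionary_bytes(low))    # 1090000 101090
print(plain_bytes(high), dictionary_bytes(high))  # 1088890 1488890
```

With 100 distinct values, the encoded column in this model is roughly a tenth of the plain size. When every value is distinct, the dictionary duplicates the raw strings and the codes are pure overhead.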
Consider using LowCardinality instead of Enum when working with strings. LowCardinality offers more flexibility and often provides the same or higher efficiency.
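The flexibility difference can be modeled in a few lines. In ClickHouse, an Enum column accepts only the values listed in its type definition, while a LowCardinality(String) column accepts any string. The Python classes EnumColumn and LowCardinalityColumn below are hypothetical toy models of that behavior, not ClickHouse internals.

```python
from typing import Dict, List


class EnumColumn:
    """Toy Enum-like column: the set of allowed values is fixed when it is declared."""

    def __init__(self, mapping: Dict[str, int]):
        self.mapping = dict(mapping)
        self.codes: List[int] = []

    def insert(self, value: str) -> None:
        if value not in self.mapping:
            raise ValueError(f"unknown enum element {value!r}")
        self.codes.append(self.mapping[value])


class LowCardinalityColumn:
    """Toy dictionary-encoded column: unseen values simply extend the dictionary."""

    def __init__(self):
        self.dictionary: List[str] = []
        self.position: Dict[str, int] = {}
        self.codes: List[int] = []

    def insert(self, value: str) -> None:
        if value not in self.position:
            self.position[value] = len(self.dictionary)
            self.dictionary.append(value)
        self.codes.append(self.position[value])


status_enum = EnumColumn({"active": 1, "disabled": 2})
status_lc = LowCardinalityColumn()
for s in ["active", "disabled", "archived"]:
    status_lc.insert(s)  # 'archived' is added to the dictionary on first use

status_enum.insert("active")
try:
    status_enum.insert("archived")  # not declared, so the toy Enum rejects it
    rejected = False
except ValueError as err:
    rejected = True
    print(err)  # unknown enum element 'archived'
```

In this model, accepting a new Enum value means redefining the mapping, which mirrors changing the column type in a real schema. The dictionary-encoded column needs no schema change.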
Example
Create a table with a LowCardinality column:
CREATE TABLE lc_t
(
`id` UInt16,
`strings` LowCardinality(String)
)
ENGINE = MergeTree()
ORDER BY id
Related Settings and Functions
Settings:
- low_cardinality_max_dictionary_size
- low_cardinality_use_single_dictionary_for_part
- low_cardinality_allow_in_native_format
- allow_suspicious_low_cardinality_types
- output_format_arrow_low_cardinality_as_dictionary
Functions:
- toLowCardinality
Related content
- Blog: Optimizing ClickHouse with Schemas and Codecs
- Blog: Working with time series data in ClickHouse
- String Optimization (video presentation in Russian); slides in English