Relational Data - Statements - Select - Distinct
If SELECT DISTINCT
is specified, only unique rows will remain in a query result. Thus, only a single row will remain out of all the sets of fully matching rows in the result.
You can specify the list of columns that must have unique values: SELECT DISTINCT ON (column1, column2,...)
. If the columns are not specified, all of them are taken into consideration.
Consider the table:
┌─a─┬─b─┬─c─┐
│ 1 │ 1 │ 1 │
│ 1 │ 1 │ 1 │
│ 2 │ 2 │ 2 │
│ 2 │ 2 │ 2 │
│ 1 │ 1 │ 2 │
│ 1 │ 2 │ 2 │
└───┴───┴───┘
Using DISTINCT
without specifying columns:
SELECT DISTINCT * FROM t1;
┌─a─┬─b─┬─c─┐
│ 1 │ 1 │ 1 │
│ 2 │ 2 │ 2 │
│ 1 │ 1 │ 2 │
│ 1 │ 2 │ 2 │
└───┴───┴───┘
Using DISTINCT
with specified columns:
SELECT DISTINCT ON (a,b) * FROM t1;
┌─a─┬─b─┬─c─┐
│ 1 │ 1 │ 1 │
│ 2 │ 2 │ 2 │
│ 1 │ 2 │ 2 │
└───┴───┴───┘
DISTINCT and ORDER BY
ClickHouse supports using the DISTINCT
and ORDER BY
clauses for different columns in one query. The DISTINCT
clause is executed before the ORDER BY
clause.
Consider the table:
┌─a─┬─b─┐
│ 2 │ 1 │
│ 1 │ 2 │
│ 3 │ 3 │
│ 2 │ 4 │
└───┴───┘
Selecting data:
SELECT DISTINCT a FROM t1 ORDER BY b ASC;
┌─a─┐
│ 2 │
│ 1 │
│ 3 │
└───┘
Selecting data with the different sorting direction:
SELECT DISTINCT a FROM t1 ORDER BY b DESC;
┌─a─┐
│ 3 │
│ 1 │
│ 2 │
└───┘
Row 2, 4
was cut before sorting.
Take this implementation specificity into account when programming queries.
Null Processing
DISTINCT
works with NULL as if NULL
were a specific value, and NULL==NULL
. In other words, in the DISTINCT
results, different combinations with NULL
occur only once. It differs from NULL
processing in most other contexts.
Alternatives
It is possible to obtain the same result by applying GROUP BY across the same set of values as specified as SELECT
clause, without using any aggregate functions. But there are few differences from GROUP BY
approach: