Relational Data - Aggregate Functions - Reference - Cramersv

From FojiSoft Docs
Revision as of 18:43, 28 August 2024 by Chris.Hansen (talk | contribs) (Import ClickHouse Docs: Wed Aug 28 2024 14:43:41 GMT-0400 (Eastern Daylight Time))

Cramer’s V (sometimes referred to as Cramer’s phi) is a measure of association between two columns in a table. The result of the cramersV function ranges from 0 (no association between the variables) to 1, and it can reach 1 only when each value is completely determined by the other. It may be viewed as the association between two variables expressed as a percentage of their maximum possible variation.
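
For reference, the textbook definition is V = sqrt(chi2 / (n * min(r - 1, c - 1))), where chi2 is Pearson's chi-squared statistic of the contingency table, n is the number of rows, and r and c are the numbers of distinct values in each column. The Python sketch below follows that definition only. It is not ClickHouse's implementation, the helper name cramers_v is hypothetical, and the value cramersV returns may differ from it on some inputs.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Textbook Cramer's V for two equal-length sequences of categories.

    Hypothetical helper for illustration; not ClickHouse's implementation.
    """
    pairs = list(zip(xs, ys))
    n = len(pairs)
    joint = Counter(pairs)                 # observed count per (x, y) cell
    row = Counter(x for x, _ in pairs)     # marginal counts of the first column
    col = Counter(y for _, y in pairs)     # marginal counts of the second column
    k = min(len(row), len(col)) - 1
    if n == 0 or k == 0:
        return float("nan")                # undefined: no rows, or a constant column

    # Pearson chi-squared over every cell of the table, including empty ones.
    chi2 = 0.0
    for x, nx in row.items():
        for y, ny in col.items():
            expected = nx * ny / n
            chi2 += (joint[(x, y)] - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


# Same data as a query over numbers(150) using number % 3 and number % 5.
a = [i % 3 for i in range(150)]
b = [i % 5 for i in range(150)]
print(cramers_v(a, b))  # 0.0: every (a, b) pair occurs equally often
```

Each of the 15 possible (a, b) pairs occurs exactly 10 times in this data, so every observed count equals its expected count and the statistic is 0.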

For a bias-corrected version of Cramer’s V, see cramersVBiasCorrected.


Syntax

cramersV(column1, column2)

Parameters

  • column1: first column to be compared.
  • column2: second column to be compared.

Returned value

  • a value between 0 (no association between the columns’ values) and 1 (complete association).

Type: always Float64.

Example

The two columns compared below have no association with each other, so the result of cramersV is 0:

Query:

SELECT
    cramersV(a, b)
FROM
    (
        SELECT
            number % 3 AS a,
            number % 5 AS b
        FROM
            numbers(150)
    );

Result:

┌─cramersV(a, b)─┐
│              0 │
└────────────────┘

The two columns below have a fairly close association, so the result of cramersV is a high value:

Query:

SELECT
    cramersV(a, b)
FROM
    (
        SELECT
            number % 10 AS a,
            number % 5 AS b
        FROM
            numbers(150)
    );

Result:

┌─────cramersV(a, b)─┐
│ 0.8944271909999159 │
└────────────────────┘