Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

GitHub

This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Aggregate Function Tests

Relevant source files

Purpose and Scope

This document describes the aggregate function test suite in the SQL Logic Test corpus. The test file validates the behavior of SQLite's aggregate functions including count(), avg(), sum(), total(), min(), max(), and group_concat(). Tests cover standard aggregate operations, the DISTINCT keyword, NULL handling, type coercion, and return type behavior.

For query operator tests including IN/NOT IN operators, see IN and NOT IN Operator Tests. For other SQL language evidence tests, see SQL Language Evidence Tests.

Sources: test/evidence/slt_lang_aggfunc.test:1-500

Test File Overview

The aggregate function tests are located in a single file that validates compliance with SQLite documentation requirements. Each test includes "EVIDENCE-OF" comments referencing specific SQLite documentation clauses.

PropertyValue
File Locationtest/evidence/slt_lang_aggfunc.test
Hash Threshold8
Platform SupportSQLite only (halts on other databases)
Test Count~100+ individual queries
Primary Tablet1(x INTEGER, y VARCHAR(8))

Platform Execution:

Sources: test/evidence/slt_lang_aggfunc.test:1-19

Test Data Structure

The tests use a simple table structure with carefully designed data to validate edge cases:

Initial Test Data:

erDiagram
    t1 {INTEGER x "Primary test column"\nVARCHAR y "Labels for filtering"}
    
    t1_data {INTEGER x "1, 0, NULL, 2, 2, 4.0, 1<<63, 1<<63, -1"\nVARCHAR y "true, false, NULL, true, true, true, true, true, true"}
    
    t1_index {INDEX t1i1 "ON t1(x) - created then dropped"}
    
    t1 ||--|| t1_data : contains
    t1 ||--o{ t1_index : has
xyPurpose
1'true'Non-zero integer
0'false'Zero value
NULL'NULL'NULL handling
2'true'Duplicate value (added later)
2'true'Duplicate value (added later)
4.0'true'Floating point value
1<<63'true'Large integer (overflow testing)
1<<63'true'Large integer duplicate
-1'true'Negative value

Sources: test/evidence/slt_lang_aggfunc.test:3-16 test/evidence/slt_lang_aggfunc.test:71-77 test/evidence/slt_lang_aggfunc.test:452-478

DISTINCT Keyword Support

All single-argument aggregate functions support the DISTINCT keyword, which filters duplicate values before aggregation.

DISTINCT Behavior Examples:

stateDiagram-v2
    [*] --> ReadRow : Aggregate function execution
    
    ReadRow --> CheckDistinct : Process row
    
    CheckDistinct --> NormalAgg : No DISTINCT
    CheckDistinct --> DistinctFilter : DISTINCT keyword present
    
    DistinctFilter --> SeenBefore{"Value already\nseen?"}
    SeenBefore -->|Yes| ReadRow : Skip row
    SeenBefore -->|No| AddToSet : Add to seen set
    
    AddToSet --> NormalAgg : Process value
    NormalAgg --> Accumulate : Update aggregate
    
    Accumulate --> ReadRow : Next row
    ReadRow --> [*] : End of data
FunctionWithout DISTINCTWith DISTINCTData
count(x)43[1, 0, 2, 2]
avg(x)1.251.0[1, 0, 2, 2]
sum(x)53[1, 0, 2, 2]
group_concat(x)"1,0,2,2""1,0,2"[1, 0, 2, 2]

Evidence Reference: R-00466-56349 and R-00171-59428

Sources: test/evidence/slt_lang_aggfunc.test:24-82

NULL Handling Behavior

Each aggregate function handles NULL values differently. NULL values are filtered out before aggregation, but the return value when all inputs are NULL varies by function.

NULL Processing Matrix:

FunctionNULL Input HandlingAll-NULL ResultMixed NULL/Non-NULL Result
count(x)Excluded from count0Count of non-NULL values
count(*)N/A (counts rows)0 (if no rows)Count of all rows
avg(x)Excluded from averageNULLAverage of non-NULL values
sum(x)Excluded from sumNULLSum of non-NULL values
total(x)Excluded from sum0.0Sum of non-NULL values
min(x)Excluded from minNULLMinimum non-NULL value
max(x)Excluded from maxNULLMaximum non-NULL value
group_concat(x)Excluded from concatNULLConcatenation of non-NULL

Test Examples:

The test validates NULL handling by querying rows where y='null', which contains only a NULL value for x:

  • SELECT avg(x) FROM t1 WHERE y='null' → NULL
  • SELECT sum(x) FROM t1 WHERE y='null' → NULL
  • SELECT total(x) FROM t1 WHERE y='null' → 0
  • SELECT count(x) FROM t1 WHERE y='null' → 0

Sources: test/evidence/slt_lang_aggfunc.test:184-196 test/evidence/slt_lang_aggfunc.test:396-418

Type Coercion and String Handling

Aggregate functions interpret non-numeric strings as 0 when performing arithmetic operations.

String Coercion Test Results:

flowchart LR
    Input["Input Value"]
CheckType{"Value Type"}
IsNumber["Use numeric value"]
IsString["Check if looks\nlike number"]
IsNull["Exclude from\naggregation"]
LooksNumeric{"Parseable\nas number?"}
UseZero["Interpret as 0"]
ParseValue["Parse to number"]
Input --> CheckType
 
   CheckType -->|INTEGER/REAL| IsNumber
 
   CheckType -->|TEXT/BLOB| IsString
 
   CheckType -->|NULL| IsNull
    
 
   IsString --> LooksNumeric
 
   LooksNumeric -->|No| UseZero
 
   LooksNumeric -->|Yes| ParseValue

Testing with y VARCHAR(8) column containing values: 'true', 'false', 'NULL'

FunctionResultExplanation
count(y)5Counts non-NULL strings
avg(y)0Strings interpreted as 0
sum(y)0All strings → 0, sum is 0
total(y)0All strings → 0, sum is 0.0
min(y)0Minimum of converted values
max(y)0Maximum of converted values
group_concat(y)'true,false,NULL,true,true'Concatenates as strings

Evidence Reference: R-29052-00975

Sources: test/evidence/slt_lang_aggfunc.test:92-166

graph TD
    AvgFunc["avg()"]
SumFunc["sum()"]
TotalFunc["total()"]
CountFunc["count()"]
OtherFunc["min() / max() /\ngroup_concat()"]
AvgAlways["Always REAL\n(floating point)"]
SumCheck{"All inputs\nare INTEGER?"}
SumInt["Return INTEGER"]
SumFloat["Return REAL"]
TotalAlways["Always REAL"]
CountAlways["Always INTEGER"]
OtherMatch["Return type matches\ninput column type"]
AvgFunc --> AvgAlways
 
   SumFunc --> SumCheck
 
   SumCheck -->|Yes| SumInt
 
   SumCheck -->|No| SumFloat
 
   TotalFunc --> TotalAlways
 
   CountFunc --> CountAlways
 
   OtherFunc --> OtherMatch
    
    style AvgAlways fill:#f9f9f9
    style SumInt fill:#f9f9f9
    style SumFloat fill:#f9f9f9
    style TotalAlways fill:#f9f9f9
    style CountAlways fill:#f9f9f9
    style OtherMatch fill:#f9f9f9

Return Type Behavior

Aggregate functions return different types depending on the input data and function semantics.

Return Type Rules:

Type Coercion Examples:

QueryInput DataExpected ResultResult Type
SELECT avg(x) FROM t1[1, 0, 2, 2] (all INT)1.250REAL (R)
SELECT sum(x) FROM t1[1, 0, 2, 2] (all INT)5INTEGER (I)
SELECT sum(x) FROM t1[1, 0, 4.0, 2, 2] (mixed)9.000REAL (R)
SELECT total(x) FROM t1[1, 0, 2, 2] (all INT)5.000REAL (R)

Evidence References:

  • R-17177-10067 (avg always returns REAL)
  • R-19660-56479 (sum returns INTEGER for all-integer input)
  • R-33611-59266 (sum returns REAL if any non-integer input)
  • R-07734-01023 (total always returns REAL)

Sources: test/evidence/slt_lang_aggfunc.test:168-182 test/evidence/slt_lang_aggfunc.test:420-464

Aggregate Functions Reference

count()

Syntax: count(x) or count(DISTINCT x) or count(*)

Behavior:

  • count(x): Returns the number of non-NULL values in column x
  • count(*): Returns the total number of rows in the group
  • count(DISTINCT x): Returns the number of distinct non-NULL values

Return Type: INTEGER

NULL Handling: NULLs are excluded; returns 0 if no non-NULL values

Special Cases:

  • count(DISTINCT *) is a syntax error
  • count(*) counts all rows, including those with all NULL columns

Test Coverage: test/evidence/slt_lang_aggfunc.test:28-31 test/evidence/slt_lang_aggfunc.test:198-234

Evidence References: R-34280-42283, R-13776-21310


avg()

Syntax: avg(x) or avg(DISTINCT x)

Behavior:

  • Returns the average of all non-NULL values
  • Always returns a floating-point value, even for integer inputs

Return Type: REAL (always)

NULL Handling: Returns NULL if all values are NULL

Type Coercion: Non-numeric strings interpreted as 0

Test Coverage: test/evidence/slt_lang_aggfunc.test:33-36 test/evidence/slt_lang_aggfunc.test:83-90 test/evidence/slt_lang_aggfunc.test:168-182

Evidence References: R-20409-33051, R-17177-10067, R-40597-22164


sum()

Syntax: sum(x) or sum(DISTINCT x)

Behavior:

  • Returns the sum of all non-NULL values
  • Return type depends on input: INTEGER if all inputs are integers, REAL otherwise
  • Throws "integer overflow" exception on overflow with all-integer input

Return Type: INTEGER (all-integer input) or REAL (mixed input)

NULL Handling: Returns NULL if all values are NULL

Overflow Behavior: Integer overflow causes exception; test demonstrates with 1<<63 values

Test Coverage: test/evidence/slt_lang_aggfunc.test:38-41 test/evidence/slt_lang_aggfunc.test:375-394 test/evidence/slt_lang_aggfunc.test:434-464

Evidence References: R-24943-34514, R-19660-56479, R-33611-59266


total()

Syntax: total(x) or total(DISTINCT x)

Behavior:

  • Similar to sum() but always returns REAL
  • Returns 0.0 instead of NULL when all values are NULL
  • Never throws integer overflow exception

Return Type: REAL (always)

NULL Handling: Returns 0.0 if all values are NULL

Comparison with sum():

Scenariosum()total()
All NULLNULL0.0
All integersINTEGERREAL
Integer overflowExceptionNo exception

Test Coverage: test/evidence/slt_lang_aggfunc.test:43-46 test/evidence/slt_lang_aggfunc.test:420-432 test/evidence/slt_lang_aggfunc.test:489-499

Evidence References: R-24943-34514, R-44223-43966, R-07734-01023


min()

Syntax: min(x) or min(DISTINCT x)

Behavior:

  • Returns the minimum non-NULL value
  • "Minimum" is the first value that would appear in ORDER BY x
  • Uses column's collation for TEXT comparisons

Return Type: Matches input column type

NULL Handling: Returns NULL if all values are NULL

DISTINCT Effect: No practical difference (min is min regardless of duplicates)

Test Coverage: test/evidence/slt_lang_aggfunc.test:48-51 test/evidence/slt_lang_aggfunc.test:325-370

Evidence References: R-16028-39081, R-30311-39793, R-10396-30188


max()

Syntax: max(x) or max(DISTINCT x)

Behavior:

  • Returns the maximum non-NULL value
  • "Maximum" is the last value that would appear in ORDER BY x
  • Uses column's collation for TEXT comparisons

Return Type: Matches input column type

NULL Handling: Returns NULL if all values are NULL

DISTINCT Effect: No practical difference (max is max regardless of duplicates)

Test Coverage: test/evidence/slt_lang_aggfunc.test:53-56 test/evidence/slt_lang_aggfunc.test:277-323

Evidence References: R-52585-35928, R-13053-11096, R-50775-16353


group_concat()

Syntax: group_concat(x) or group_concat(x, sep) or group_concat(DISTINCT x)

Behavior:

  • Concatenates all non-NULL values into a single string
  • Default separator is comma (",")
  • Optional second parameter specifies custom separator
  • DISTINCT can only be used with single-parameter form

Return Type: TEXT

NULL Handling:

  • NULL values are excluded from concatenation
  • Returns NULL if all values are NULL

Separator Rules:

  • Default: ","
  • Custom: second parameter (e.g., group_concat(x, ':'))
  • DISTINCT with custom separator is syntax error

Examples:

QueryInput [1,0,2,2]Result
group_concat(x)[1, 0, 2, 2]"1,0,2,2"
group_concat(DISTINCT x)[1, 0, 2, 2]"1,0,2"
group_concat(x, ':')[1, 0, 2, 2]"1:0:2:2"
group_concat(DISTINCT x, ':')[1, 0, 2, 2]Syntax error

Test Coverage: test/evidence/slt_lang_aggfunc.test:58-62 test/evidence/slt_lang_aggfunc.test:236-275

Evidence References: R-56088-25150, R-08600-21007, R-39910-14723

Sources: test/evidence/slt_lang_aggfunc.test:1-500

flowchart TD
    LargeInt["Large Integer Input\n(1<<63)"]
SumFunc["sum()
function"]
TotalFunc["total()
function"]
AllInt{"All inputs\nare INTEGER?"}
Overflow["Integer Overflow\nException Thrown"]
FloatResult["Convert to REAL\nReturn approximation"]
NoException["No Exception\nReturn REAL result"]
LargeInt --> SumFunc
 
   LargeInt --> TotalFunc
    
 
   SumFunc --> AllInt
 
   AllInt -->|Yes| Overflow
 
   AllInt -->|No| FloatResult
    
 
   TotalFunc --> NoException

Integer Overflow Handling

The test suite includes specific validation for integer overflow behavior when using large values.

Overflow Test Data:

  • Inserts 1<<63 (9223372036854775808) twice
  • Inserts -1 once
  • Expected overflow when summing

Test Results:

FunctionInput TypeResult Behavior
sum(x)All INTEGERException (empty result)
sum(DISTINCT x)All INTEGERConverts to REAL: -9.223...e18
total(x)All INTEGERNo exception: -1.844...e19
total(DISTINCT x)All INTEGERNo exception: -9.223...e18

Evidence References:

  • R-08904-24719 (sum() throws integer overflow)
  • R-19553-64528 (total() never throws overflow)

Sources: test/evidence/slt_lang_aggfunc.test:466-499

Test Execution Directives

The aggregate function test file uses specific directives to control test execution and result comparison.

Key Directives Used:

DirectiveExamplePurpose
statement okCREATE TABLE t1(...)Execute statement, expect success
query I nosortInteger result, no sortingQuery returning integer values
query R nosortReal result, no sortingQuery returning floating-point values
query T nosortText result, no sortingQuery returning text values
label-NULLSpecial NULL indicatorLabel for expected NULL result
label-sumDescriptive labelLabel for sum result queries
skipif sqlitePlatform controlSkip on SQLite
haltStop executionHalt test execution
statement errorExpected errorExpect statement to fail
NOT INDEXEDQuery hintForce table scan, avoid index

Platform Restriction:

The file begins with a platform check that halts execution on non-SQLite databases:

skipif sqlite
halt

This inverse logic (skip if SQLite, then halt) means: "If this is NOT SQLite, halt execution."

Sources: test/evidence/slt_lang_aggfunc.test:1-500

Evidence-Based Testing

The test file validates specific SQLite documentation requirements. Each test section includes comments with "EVIDENCE-OF" markers.

Evidence Reference Format:

Each comment references a specific requirement from SQLite's documentation:

  • Format: # EVIDENCE-OF: R-XXXXX-XXXXX <description>
  • The R-number is a unique identifier for that requirement

Evidence Coverage Map:

Evidence IDRequirementTest Lines
R-00466-56349DISTINCT keyword support in aggregates24-31
R-00171-59428DISTINCT filters duplicates before aggregation64-82
R-31453-41389count(DISTINCT X) returns distinct count64-82
R-20409-33051avg() returns average of non-NULL values83-90
R-29052-00975Non-numeric strings interpreted as 092-166
R-17177-10067avg() always returns floating point168-182
R-40597-22164avg() returns NULL if all NULL184-196
R-34280-42283count(X) counts non-NULL values198-220
R-13776-21310count(*) counts all rows222-234
R-56088-25150group_concat() concatenates non-NULL236-248
R-08600-21007group_concat() custom separator250-261
R-39910-14723group_concat() default comma separator263-275
R-52585-35928max() returns maximum value277-289
R-13053-11096max() uses ORDER BY ordering291-309
R-50775-16353max() returns NULL if all NULL311-323
R-16028-39081min() returns minimum non-NULL325-337
R-30311-39793min() uses ORDER BY ordering339-356
R-10396-30188min() returns NULL if all NULL358-370
R-24943-34514sum()/total() return sum of non-NULL372-394
R-44223-43966sum() NULL vs total() 0 for all-NULL396-418
R-07734-01023total() always returns floating point420-432
R-19660-56479sum() returns integer for all-integer434-446
R-33611-59266sum() returns float for mixed input448-464
R-08904-24719sum() throws overflow exception466-488
R-19553-64528total() never throws overflow489-499

Sources: test/evidence/slt_lang_aggfunc.test:1-500