This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Aggregate Function Tests
Relevant source files
Purpose and Scope
This document describes the aggregate function test suite in the SQL Logic Test corpus. The test file validates the behavior of SQLite's aggregate functions including count(), avg(), sum(), total(), min(), max(), and group_concat(). Tests cover standard aggregate operations, the DISTINCT keyword, NULL handling, type coercion, and return type behavior.
For query operator tests including IN/NOT IN operators, see IN and NOT IN Operator Tests. For other SQL language evidence tests, see SQL Language Evidence Tests.
Sources: test/evidence/slt_lang_aggfunc.test:1-500
Test File Overview
The aggregate function tests are located in a single file that validates compliance with SQLite documentation requirements. Each test includes "EVIDENCE-OF" comments referencing specific SQLite documentation clauses.
| Property | Value |
|---|---|
| File Location | test/evidence/slt_lang_aggfunc.test |
| Hash Threshold | 8 |
| Platform Support | SQLite only (halts on other databases) |
| Test Count | ~100+ individual queries |
| Primary Table | t1(x INTEGER, y VARCHAR(8)) |
Platform Execution:
Sources: test/evidence/slt_lang_aggfunc.test:1-19
Test Data Structure
The tests use a simple table structure with carefully designed data to validate edge cases:
Initial Test Data:
erDiagram
t1 {INTEGER x "Primary test column"\nVARCHAR y "Labels for filtering"}
t1_data {INTEGER x "1, 0, NULL, 2, 2, 4.0, 1<<63, 1<<63, -1"\nVARCHAR y "true, false, NULL, true, true, true, true, true, true"}
t1_index {INDEX t1i1 "ON t1(x) - created then dropped"}
t1 ||--|| t1_data : contains
t1 ||--o{ t1_index : has
| x | y | Purpose |
|---|---|---|
| 1 | 'true' | Non-zero integer |
| 0 | 'false' | Zero value |
| NULL | 'NULL' | NULL handling |
| 2 | 'true' | Duplicate value (added later) |
| 2 | 'true' | Duplicate value (added later) |
| 4.0 | 'true' | Floating point value |
| 1<<63 | 'true' | Large integer (overflow testing) |
| 1<<63 | 'true' | Large integer duplicate |
| -1 | 'true' | Negative value |
Sources: test/evidence/slt_lang_aggfunc.test:3-16 test/evidence/slt_lang_aggfunc.test:71-77 test/evidence/slt_lang_aggfunc.test:452-478
DISTINCT Keyword Support
All single-argument aggregate functions support the DISTINCT keyword, which filters duplicate values before aggregation.
DISTINCT Behavior Examples:
stateDiagram-v2
[*] --> ReadRow : Aggregate function execution
ReadRow --> CheckDistinct : Process row
CheckDistinct --> NormalAgg : No DISTINCT
CheckDistinct --> DistinctFilter : DISTINCT keyword present
DistinctFilter --> SeenBefore{"Value already\nseen?"}
SeenBefore -->|Yes| ReadRow : Skip row
SeenBefore -->|No| AddToSet : Add to seen set
AddToSet --> NormalAgg : Process value
NormalAgg --> Accumulate : Update aggregate
Accumulate --> ReadRow : Next row
ReadRow --> [*] : End of data
| Function | Without DISTINCT | With DISTINCT | Data |
|---|---|---|---|
count(x) | 4 | 3 | [1, 0, 2, 2] |
avg(x) | 1.25 | 1.0 | [1, 0, 2, 2] |
sum(x) | 5 | 3 | [1, 0, 2, 2] |
group_concat(x) | "1,0,2,2" | "1,0,2" | [1, 0, 2, 2] |
Evidence Reference: R-00466-56349 and R-00171-59428
Sources: test/evidence/slt_lang_aggfunc.test:24-82
NULL Handling Behavior
Each aggregate function handles NULL values differently. NULL values are filtered out before aggregation, but the return value when all inputs are NULL varies by function.
NULL Processing Matrix:
| Function | NULL Input Handling | All-NULL Result | Mixed NULL/Non-NULL Result |
|---|---|---|---|
count(x) | Excluded from count | 0 | Count of non-NULL values |
count(*) | N/A (counts rows) | 0 (if no rows) | Count of all rows |
avg(x) | Excluded from average | NULL | Average of non-NULL values |
sum(x) | Excluded from sum | NULL | Sum of non-NULL values |
total(x) | Excluded from sum | 0.0 | Sum of non-NULL values |
min(x) | Excluded from min | NULL | Minimum non-NULL value |
max(x) | Excluded from max | NULL | Maximum non-NULL value |
group_concat(x) | Excluded from concat | NULL | Concatenation of non-NULL |
Test Examples:
The test validates NULL handling by querying rows where y='null', which contains only a NULL value for x:
SELECT avg(x) FROM t1 WHERE y='null'→ NULLSELECT sum(x) FROM t1 WHERE y='null'→ NULLSELECT total(x) FROM t1 WHERE y='null'→ 0SELECT count(x) FROM t1 WHERE y='null'→ 0
Sources: test/evidence/slt_lang_aggfunc.test:184-196 test/evidence/slt_lang_aggfunc.test:396-418
Type Coercion and String Handling
Aggregate functions interpret non-numeric strings as 0 when performing arithmetic operations.
String Coercion Test Results:
flowchart LR
Input["Input Value"]
CheckType{"Value Type"}
IsNumber["Use numeric value"]
IsString["Check if looks\nlike number"]
IsNull["Exclude from\naggregation"]
LooksNumeric{"Parseable\nas number?"}
UseZero["Interpret as 0"]
ParseValue["Parse to number"]
Input --> CheckType
CheckType -->|INTEGER/REAL| IsNumber
CheckType -->|TEXT/BLOB| IsString
CheckType -->|NULL| IsNull
IsString --> LooksNumeric
LooksNumeric -->|No| UseZero
LooksNumeric -->|Yes| ParseValue
Testing with y VARCHAR(8) column containing values: 'true', 'false', 'NULL'
| Function | Result | Explanation |
|---|---|---|
count(y) | 5 | Counts non-NULL strings |
avg(y) | 0 | Strings interpreted as 0 |
sum(y) | 0 | All strings → 0, sum is 0 |
total(y) | 0 | All strings → 0, sum is 0.0 |
min(y) | 0 | Minimum of converted values |
max(y) | 0 | Maximum of converted values |
group_concat(y) | 'true,false,NULL,true,true' | Concatenates as strings |
Evidence Reference: R-29052-00975
Sources: test/evidence/slt_lang_aggfunc.test:92-166
graph TD
AvgFunc["avg()"]
SumFunc["sum()"]
TotalFunc["total()"]
CountFunc["count()"]
OtherFunc["min() / max() /\ngroup_concat()"]
AvgAlways["Always REAL\n(floating point)"]
SumCheck{"All inputs\nare INTEGER?"}
SumInt["Return INTEGER"]
SumFloat["Return REAL"]
TotalAlways["Always REAL"]
CountAlways["Always INTEGER"]
OtherMatch["Return type matches\ninput column type"]
AvgFunc --> AvgAlways
SumFunc --> SumCheck
SumCheck -->|Yes| SumInt
SumCheck -->|No| SumFloat
TotalFunc --> TotalAlways
CountFunc --> CountAlways
OtherFunc --> OtherMatch
style AvgAlways fill:#f9f9f9
style SumInt fill:#f9f9f9
style SumFloat fill:#f9f9f9
style TotalAlways fill:#f9f9f9
style CountAlways fill:#f9f9f9
style OtherMatch fill:#f9f9f9
Return Type Behavior
Aggregate functions return different types depending on the input data and function semantics.
Return Type Rules:
Type Coercion Examples:
| Query | Input Data | Expected Result | Result Type |
|---|---|---|---|
SELECT avg(x) FROM t1 | [1, 0, 2, 2] (all INT) | 1.250 | REAL (R) |
SELECT sum(x) FROM t1 | [1, 0, 2, 2] (all INT) | 5 | INTEGER (I) |
SELECT sum(x) FROM t1 | [1, 0, 4.0, 2, 2] (mixed) | 9.000 | REAL (R) |
SELECT total(x) FROM t1 | [1, 0, 2, 2] (all INT) | 5.000 | REAL (R) |
Evidence References:
- R-17177-10067 (avg always returns REAL)
- R-19660-56479 (sum returns INTEGER for all-integer input)
- R-33611-59266 (sum returns REAL if any non-integer input)
- R-07734-01023 (total always returns REAL)
Sources: test/evidence/slt_lang_aggfunc.test:168-182 test/evidence/slt_lang_aggfunc.test:420-464
Aggregate Functions Reference
count()
Syntax: count(x) or count(DISTINCT x) or count(*)
Behavior:
count(x): Returns the number of non-NULL values in column xcount(*): Returns the total number of rows in the groupcount(DISTINCT x): Returns the number of distinct non-NULL values
Return Type: INTEGER
NULL Handling: NULLs are excluded; returns 0 if no non-NULL values
Special Cases:
count(DISTINCT *)is a syntax errorcount(*)counts all rows, including those with all NULL columns
Test Coverage: test/evidence/slt_lang_aggfunc.test:28-31 test/evidence/slt_lang_aggfunc.test:198-234
Evidence References: R-34280-42283, R-13776-21310
avg()
Syntax: avg(x) or avg(DISTINCT x)
Behavior:
- Returns the average of all non-NULL values
- Always returns a floating-point value, even for integer inputs
Return Type: REAL (always)
NULL Handling: Returns NULL if all values are NULL
Type Coercion: Non-numeric strings interpreted as 0
Test Coverage: test/evidence/slt_lang_aggfunc.test:33-36 test/evidence/slt_lang_aggfunc.test:83-90 test/evidence/slt_lang_aggfunc.test:168-182
Evidence References: R-20409-33051, R-17177-10067, R-40597-22164
sum()
Syntax: sum(x) or sum(DISTINCT x)
Behavior:
- Returns the sum of all non-NULL values
- Return type depends on input: INTEGER if all inputs are integers, REAL otherwise
- Throws "integer overflow" exception on overflow with all-integer input
Return Type: INTEGER (all-integer input) or REAL (mixed input)
NULL Handling: Returns NULL if all values are NULL
Overflow Behavior: Integer overflow causes exception; test demonstrates with 1<<63 values
Test Coverage: test/evidence/slt_lang_aggfunc.test:38-41 test/evidence/slt_lang_aggfunc.test:375-394 test/evidence/slt_lang_aggfunc.test:434-464
Evidence References: R-24943-34514, R-19660-56479, R-33611-59266
total()
Syntax: total(x) or total(DISTINCT x)
Behavior:
- Similar to
sum()but always returns REAL - Returns 0.0 instead of NULL when all values are NULL
- Never throws integer overflow exception
Return Type: REAL (always)
NULL Handling: Returns 0.0 if all values are NULL
Comparison with sum():
| Scenario | sum() | total() |
|---|---|---|
| All NULL | NULL | 0.0 |
| All integers | INTEGER | REAL |
| Integer overflow | Exception | No exception |
Test Coverage: test/evidence/slt_lang_aggfunc.test:43-46 test/evidence/slt_lang_aggfunc.test:420-432 test/evidence/slt_lang_aggfunc.test:489-499
Evidence References: R-24943-34514, R-44223-43966, R-07734-01023
min()
Syntax: min(x) or min(DISTINCT x)
Behavior:
- Returns the minimum non-NULL value
- "Minimum" is the first value that would appear in
ORDER BY x - Uses column's collation for TEXT comparisons
Return Type: Matches input column type
NULL Handling: Returns NULL if all values are NULL
DISTINCT Effect: No practical difference (min is min regardless of duplicates)
Test Coverage: test/evidence/slt_lang_aggfunc.test:48-51 test/evidence/slt_lang_aggfunc.test:325-370
Evidence References: R-16028-39081, R-30311-39793, R-10396-30188
max()
Syntax: max(x) or max(DISTINCT x)
Behavior:
- Returns the maximum non-NULL value
- "Maximum" is the last value that would appear in
ORDER BY x - Uses column's collation for TEXT comparisons
Return Type: Matches input column type
NULL Handling: Returns NULL if all values are NULL
DISTINCT Effect: No practical difference (max is max regardless of duplicates)
Test Coverage: test/evidence/slt_lang_aggfunc.test:53-56 test/evidence/slt_lang_aggfunc.test:277-323
Evidence References: R-52585-35928, R-13053-11096, R-50775-16353
group_concat()
Syntax: group_concat(x) or group_concat(x, sep) or group_concat(DISTINCT x)
Behavior:
- Concatenates all non-NULL values into a single string
- Default separator is comma (",")
- Optional second parameter specifies custom separator
- DISTINCT can only be used with single-parameter form
Return Type: TEXT
NULL Handling:
- NULL values are excluded from concatenation
- Returns NULL if all values are NULL
Separator Rules:
- Default: ","
- Custom: second parameter (e.g.,
group_concat(x, ':')) - DISTINCT with custom separator is syntax error
Examples:
| Query | Input [1,0,2,2] | Result |
|---|---|---|
group_concat(x) | [1, 0, 2, 2] | "1,0,2,2" |
group_concat(DISTINCT x) | [1, 0, 2, 2] | "1,0,2" |
group_concat(x, ':') | [1, 0, 2, 2] | "1:0:2:2" |
group_concat(DISTINCT x, ':') | [1, 0, 2, 2] | Syntax error |
Test Coverage: test/evidence/slt_lang_aggfunc.test:58-62 test/evidence/slt_lang_aggfunc.test:236-275
Evidence References: R-56088-25150, R-08600-21007, R-39910-14723
Sources: test/evidence/slt_lang_aggfunc.test:1-500
flowchart TD
LargeInt["Large Integer Input\n(1<<63)"]
SumFunc["sum()
function"]
TotalFunc["total()
function"]
AllInt{"All inputs\nare INTEGER?"}
Overflow["Integer Overflow\nException Thrown"]
FloatResult["Convert to REAL\nReturn approximation"]
NoException["No Exception\nReturn REAL result"]
LargeInt --> SumFunc
LargeInt --> TotalFunc
SumFunc --> AllInt
AllInt -->|Yes| Overflow
AllInt -->|No| FloatResult
TotalFunc --> NoException
Integer Overflow Handling
The test suite includes specific validation for integer overflow behavior when using large values.
Overflow Test Data:
- Inserts
1<<63(9223372036854775808) twice - Inserts
-1once - Expected overflow when summing
Test Results:
| Function | Input Type | Result Behavior |
|---|---|---|
sum(x) | All INTEGER | Exception (empty result) |
sum(DISTINCT x) | All INTEGER | Converts to REAL: -9.223...e18 |
total(x) | All INTEGER | No exception: -1.844...e19 |
total(DISTINCT x) | All INTEGER | No exception: -9.223...e18 |
Evidence References:
- R-08904-24719 (sum() throws integer overflow)
- R-19553-64528 (total() never throws overflow)
Sources: test/evidence/slt_lang_aggfunc.test:466-499
Test Execution Directives
The aggregate function test file uses specific directives to control test execution and result comparison.
Key Directives Used:
| Directive | Example | Purpose |
|---|---|---|
statement ok | CREATE TABLE t1(...) | Execute statement, expect success |
query I nosort | Integer result, no sorting | Query returning integer values |
query R nosort | Real result, no sorting | Query returning floating-point values |
query T nosort | Text result, no sorting | Query returning text values |
label-NULL | Special NULL indicator | Label for expected NULL result |
label-sum | Descriptive label | Label for sum result queries |
skipif sqlite | Platform control | Skip on SQLite |
halt | Stop execution | Halt test execution |
statement error | Expected error | Expect statement to fail |
NOT INDEXED | Query hint | Force table scan, avoid index |
Platform Restriction:
The file begins with a platform check that halts execution on non-SQLite databases:
skipif sqlite
halt
This inverse logic (skip if SQLite, then halt) means: "If this is NOT SQLite, halt execution."
Sources: test/evidence/slt_lang_aggfunc.test:1-500
Evidence-Based Testing
The test file validates specific SQLite documentation requirements. Each test section includes comments with "EVIDENCE-OF" markers.
Evidence Reference Format:
Each comment references a specific requirement from SQLite's documentation:
- Format:
# EVIDENCE-OF: R-XXXXX-XXXXX <description> - The R-number is a unique identifier for that requirement
Evidence Coverage Map:
| Evidence ID | Requirement | Test Lines |
|---|---|---|
| R-00466-56349 | DISTINCT keyword support in aggregates | 24-31 |
| R-00171-59428 | DISTINCT filters duplicates before aggregation | 64-82 |
| R-31453-41389 | count(DISTINCT X) returns distinct count | 64-82 |
| R-20409-33051 | avg() returns average of non-NULL values | 83-90 |
| R-29052-00975 | Non-numeric strings interpreted as 0 | 92-166 |
| R-17177-10067 | avg() always returns floating point | 168-182 |
| R-40597-22164 | avg() returns NULL if all NULL | 184-196 |
| R-34280-42283 | count(X) counts non-NULL values | 198-220 |
| R-13776-21310 | count(*) counts all rows | 222-234 |
| R-56088-25150 | group_concat() concatenates non-NULL | 236-248 |
| R-08600-21007 | group_concat() custom separator | 250-261 |
| R-39910-14723 | group_concat() default comma separator | 263-275 |
| R-52585-35928 | max() returns maximum value | 277-289 |
| R-13053-11096 | max() uses ORDER BY ordering | 291-309 |
| R-50775-16353 | max() returns NULL if all NULL | 311-323 |
| R-16028-39081 | min() returns minimum non-NULL | 325-337 |
| R-30311-39793 | min() uses ORDER BY ordering | 339-356 |
| R-10396-30188 | min() returns NULL if all NULL | 358-370 |
| R-24943-34514 | sum()/total() return sum of non-NULL | 372-394 |
| R-44223-43966 | sum() NULL vs total() 0 for all-NULL | 396-418 |
| R-07734-01023 | total() always returns floating point | 420-432 |
| R-19660-56479 | sum() returns integer for all-integer | 434-446 |
| R-33611-59266 | sum() returns float for mixed input | 448-464 |
| R-08904-24719 | sum() throws overflow exception | 466-488 |
| R-19553-64528 | total() never throws overflow | 489-499 |
Sources: test/evidence/slt_lang_aggfunc.test:1-500