This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Aggregate Function Tests

Loading…

Aggregate Function Tests

Relevant source files

test/evidence/slt_lang_aggfunc.test

Purpose and Scope

This document describes the aggregate function test suite in the SQL Logic Test corpus. The test file validates the behavior of SQLite’s aggregate functions including count(), avg(), sum(), total(), min(), max(), and group_concat(). Tests cover standard aggregate operations, the DISTINCT keyword, NULL handling, type coercion, and return type behavior.

For query operator tests including IN/NOT IN operators, see IN and NOT IN Operator Tests. For other SQL language evidence tests, see SQL Language Evidence Tests.

Sources: test/evidence/slt_lang_aggfunc.test:1-500

Test File Overview

The aggregate function tests are located in a single file that validates compliance with SQLite documentation requirements. Each test includes “EVIDENCE-OF” comments referencing specific SQLite documentation clauses.

Property	Value
File Location	`test/evidence/slt_lang_aggfunc.test`
Hash Threshold	8
Platform Support	SQLite only (halts on other databases)
Test Count	~100+ individual queries
Primary Table	`t1(x INTEGER, y VARCHAR(8))`

Platform Execution:

Sources: test/evidence/slt_lang_aggfunc.test:1-19

Test Data Structure

The tests use a simple table structure with carefully designed data to validate edge cases:

Initial Test Data:

erDiagram
    t1 {INTEGER x "Primary test column"\nVARCHAR y "Labels for filtering"}
    
    t1_data {INTEGER x "1, 0, NULL, 2, 2, 4.0, 1<<63, 1<<63, -1"\nVARCHAR y "true, false, NULL, true, true, true, true, true, true"}
    
    t1_index {INDEX t1i1 "ON t1(x) - created then dropped"}
    
    t1 ||--|| t1_data : contains
    t1 ||--o{ t1_index : has

x	y	Purpose
1	‘true’	Non-zero integer
0	‘false’	Zero value
NULL	‘NULL’	NULL handling
2	‘true’	Duplicate value (added later)
2	‘true’	Duplicate value (added later)
4.0	‘true’	Floating point value
1<<63	‘true’	Large integer (overflow testing)
1<<63	‘true’	Large integer duplicate
-1	‘true’	Negative value

Sources: test/evidence/slt_lang_aggfunc.test:3-16 test/evidence/slt_lang_aggfunc.test:71-77 test/evidence/slt_lang_aggfunc.test:452-478

DISTINCT Keyword Support

All single-argument aggregate functions support the DISTINCT keyword, which filters duplicate values before aggregation.

DISTINCT Behavior Examples:

stateDiagram-v2
    [*] --> ReadRow : Aggregate function execution
    
    ReadRow --> CheckDistinct : Process row
    
    CheckDistinct --> NormalAgg : No DISTINCT
    CheckDistinct --> DistinctFilter : DISTINCT keyword present
    
    DistinctFilter --> SeenBefore{"Value already\nseen?"}
    SeenBefore -->|Yes| ReadRow : Skip row
    SeenBefore -->|No| AddToSet : Add to seen set
    
    AddToSet --> NormalAgg : Process value
    NormalAgg --> Accumulate : Update aggregate
    
    Accumulate --> ReadRow : Next row
    ReadRow --> [*] : End of data

Function	Without DISTINCT	With DISTINCT	Data
`count(x)`	4	3	[1, 0, 2, 2]
`avg(x)`	1.25	1.0	[1, 0, 2, 2]
`sum(x)`	5	3	[1, 0, 2, 2]
`group_concat(x)`	“1,0,2,2”	“1,0,2”	[1, 0, 2, 2]

Evidence Reference: R-00466-56349 and R-00171-59428

Sources: test/evidence/slt_lang_aggfunc.test:24-82

NULL Handling Behavior

Each aggregate function handles NULL values differently. NULL values are filtered out before aggregation, but the return value when all inputs are NULL varies by function.

NULL Processing Matrix:

Function	NULL Input Handling	All-NULL Result	Mixed NULL/Non-NULL Result
`count(x)`	Excluded from count	0	Count of non-NULL values
`count(*)`	N/A (counts rows)	0 (if no rows)	Count of all rows
`avg(x)`	Excluded from average	NULL	Average of non-NULL values
`sum(x)`	Excluded from sum	NULL	Sum of non-NULL values
`total(x)`	Excluded from sum	0.0	Sum of non-NULL values
`min(x)`	Excluded from min	NULL	Minimum non-NULL value
`max(x)`	Excluded from max	NULL	Maximum non-NULL value
`group_concat(x)`	Excluded from concat	NULL	Concatenation of non-NULL

Test Examples:

The test validates NULL handling by querying rows where y='null', which contains only a NULL value for x:

SELECT avg(x) FROM t1 WHERE y='null' → NULL
SELECT sum(x) FROM t1 WHERE y='null' → NULL
SELECT total(x) FROM t1 WHERE y='null' → 0
SELECT count(x) FROM t1 WHERE y='null' → 0

Sources: test/evidence/slt_lang_aggfunc.test:184-196 test/evidence/slt_lang_aggfunc.test:396-418

Type Coercion and String Handling

Aggregate functions interpret non-numeric strings as 0 when performing arithmetic operations.

String Coercion Test Results:

flowchart LR
    Input["Input Value"]
CheckType{"Value Type"}
IsNumber["Use numeric value"]
IsString["Check if looks\nlike number"]
IsNull["Exclude from\naggregation"]
LooksNumeric{"Parseable\nas number?"}
UseZero["Interpret as 0"]
ParseValue["Parse to number"]
Input --> CheckType
 
   CheckType -->|INTEGER/REAL| IsNumber
 
   CheckType -->|TEXT/BLOB| IsString
 
   CheckType -->|NULL| IsNull
    
 
   IsString --> LooksNumeric
 
   LooksNumeric -->|No| UseZero
 
   LooksNumeric -->|Yes| ParseValue

Testing with y VARCHAR(8) column containing values: ‘true’, ‘false’, ‘NULL’

Function	Result	Explanation
`count(y)`	5	Counts non-NULL strings
`avg(y)`	0	Strings interpreted as 0
`sum(y)`	0	All strings → 0, sum is 0
`total(y)`	0	All strings → 0, sum is 0.0
`min(y)`	0	Minimum of converted values
`max(y)`	0	Maximum of converted values
`group_concat(y)`	‘true,false,NULL,true,true’	Concatenates as strings

Evidence Reference: R-29052-00975

Sources: test/evidence/slt_lang_aggfunc.test:92-166

graph TD
    AvgFunc["avg()"]
SumFunc["sum()"]
TotalFunc["total()"]
CountFunc["count()"]
OtherFunc["min() / max() /\ngroup_concat()"]
AvgAlways["Always REAL\n(floating point)"]
SumCheck{"All inputs\nare INTEGER?"}
SumInt["Return INTEGER"]
SumFloat["Return REAL"]
TotalAlways["Always REAL"]
CountAlways["Always INTEGER"]
OtherMatch["Return type matches\ninput column type"]
AvgFunc --> AvgAlways
 
   SumFunc --> SumCheck
 
   SumCheck -->|Yes| SumInt
 
   SumCheck -->|No| SumFloat
 
   TotalFunc --> TotalAlways
 
   CountFunc --> CountAlways
 
   OtherFunc --> OtherMatch
    
    style AvgAlways fill:#f9f9f9
    style SumInt fill:#f9f9f9
    style SumFloat fill:#f9f9f9
    style TotalAlways fill:#f9f9f9
    style CountAlways fill:#f9f9f9
    style OtherMatch fill:#f9f9f9

Return Type Behavior

Aggregate functions return different types depending on the input data and function semantics.

Return Type Rules:

Type Coercion Examples:

Query	Input Data	Expected Result	Result Type
`SELECT avg(x) FROM t1`	[1, 0, 2, 2] (all INT)	1.250	REAL (R)
`SELECT sum(x) FROM t1`	[1, 0, 2, 2] (all INT)	5	INTEGER (I)
`SELECT sum(x) FROM t1`	[1, 0, 4.0, 2, 2] (mixed)	9.000	REAL (R)
`SELECT total(x) FROM t1`	[1, 0, 2, 2] (all INT)	5.000	REAL (R)

Evidence References:

R-17177-10067 (avg always returns REAL)
R-19660-56479 (sum returns INTEGER for all-integer input)
R-33611-59266 (sum returns REAL if any non-integer input)
R-07734-01023 (total always returns REAL)

Sources: test/evidence/slt_lang_aggfunc.test:168-182 test/evidence/slt_lang_aggfunc.test:420-464

Aggregate Functions Reference

count()

Syntax: count(x) or count(DISTINCT x) or count(*)

Behavior:

count(x): Returns the number of non-NULL values in column x
count(*): Returns the total number of rows in the group
count(DISTINCT x): Returns the number of distinct non-NULL values

Return Type: INTEGER

NULL Handling: NULLs are excluded; returns 0 if no non-NULL values

Special Cases:

count(DISTINCT *) is a syntax error
count(*) counts all rows, including those with all NULL columns

Test Coverage: test/evidence/slt_lang_aggfunc.test:28-31 test/evidence/slt_lang_aggfunc.test:198-234

Evidence References: R-34280-42283, R-13776-21310

avg()

Syntax: avg(x) or avg(DISTINCT x)

Behavior:

Returns the average of all non-NULL values
Always returns a floating-point value, even for integer inputs

Return Type: REAL (always)

NULL Handling: Returns NULL if all values are NULL

Type Coercion: Non-numeric strings interpreted as 0

Test Coverage: test/evidence/slt_lang_aggfunc.test:33-36 test/evidence/slt_lang_aggfunc.test:83-90 test/evidence/slt_lang_aggfunc.test:168-182

Evidence References: R-20409-33051, R-17177-10067, R-40597-22164

sum()

Syntax: sum(x) or sum(DISTINCT x)

Behavior:

Returns the sum of all non-NULL values
Return type depends on input: INTEGER if all inputs are integers, REAL otherwise
Throws “integer overflow” exception on overflow with all-integer input

Return Type: INTEGER (all-integer input) or REAL (mixed input)

NULL Handling: Returns NULL if all values are NULL

Overflow Behavior: Integer overflow causes exception; test demonstrates with 1<<63 values

Test Coverage: test/evidence/slt_lang_aggfunc.test:38-41 test/evidence/slt_lang_aggfunc.test:375-394 test/evidence/slt_lang_aggfunc.test:434-464

Evidence References: R-24943-34514, R-19660-56479, R-33611-59266

total()

Syntax: total(x) or total(DISTINCT x)

Behavior:

Similar to sum() but always returns REAL
Returns 0.0 instead of NULL when all values are NULL
Never throws integer overflow exception

Return Type: REAL (always)

NULL Handling: Returns 0.0 if all values are NULL

Comparison with sum():

Scenario	sum()	total()
All NULL	NULL	0.0
All integers	INTEGER	REAL
Integer overflow	Exception	No exception

Test Coverage: test/evidence/slt_lang_aggfunc.test:43-46 test/evidence/slt_lang_aggfunc.test:420-432 test/evidence/slt_lang_aggfunc.test:489-499

Evidence References: R-24943-34514, R-44223-43966, R-07734-01023

min()

Syntax: min(x) or min(DISTINCT x)

Behavior:

Returns the minimum non-NULL value
“Minimum” is the first value that would appear in ORDER BY x
Uses column’s collation for TEXT comparisons

Return Type: Matches input column type

NULL Handling: Returns NULL if all values are NULL

DISTINCT Effect: No practical difference (min is min regardless of duplicates)

Test Coverage: test/evidence/slt_lang_aggfunc.test:48-51 test/evidence/slt_lang_aggfunc.test:325-370

Evidence References: R-16028-39081, R-30311-39793, R-10396-30188

max()

Syntax: max(x) or max(DISTINCT x)

Behavior:

Returns the maximum non-NULL value
“Maximum” is the last value that would appear in ORDER BY x
Uses column’s collation for TEXT comparisons

Return Type: Matches input column type

NULL Handling: Returns NULL if all values are NULL

DISTINCT Effect: No practical difference (max is max regardless of duplicates)

Test Coverage: test/evidence/slt_lang_aggfunc.test:53-56 test/evidence/slt_lang_aggfunc.test:277-323

Evidence References: R-52585-35928, R-13053-11096, R-50775-16353

group_concat()

Syntax: group_concat(x) or group_concat(x, sep) or group_concat(DISTINCT x)

Behavior:

Concatenates all non-NULL values into a single string
Default separator is comma (“,”)
Optional second parameter specifies custom separator
DISTINCT can only be used with single-parameter form

Return Type: TEXT

NULL Handling:

NULL values are excluded from concatenation
Returns NULL if all values are NULL

Separator Rules:

Default: “,”
Custom: second parameter (e.g., group_concat(x, ':'))
DISTINCT with custom separator is syntax error

Examples:

Query	Input [1,0,2,2]	Result
`group_concat(x)`	[1, 0, 2, 2]	“1,0,2,2”
`group_concat(DISTINCT x)`	[1, 0, 2, 2]	“1,0,2”
`group_concat(x, ':')`	[1, 0, 2, 2]	“1:0:2:2”
`group_concat(DISTINCT x, ':')`	[1, 0, 2, 2]	Syntax error

Test Coverage: test/evidence/slt_lang_aggfunc.test:58-62 test/evidence/slt_lang_aggfunc.test:236-275

Evidence References: R-56088-25150, R-08600-21007, R-39910-14723

Sources: test/evidence/slt_lang_aggfunc.test:1-500

flowchart TD
    LargeInt["Large Integer Input\n(1<<63)"]
SumFunc["sum()
function"]
TotalFunc["total()
function"]
AllInt{"All inputs\nare INTEGER?"}
Overflow["Integer Overflow\nException Thrown"]
FloatResult["Convert to REAL\nReturn approximation"]
NoException["No Exception\nReturn REAL result"]
LargeInt --> SumFunc
 
   LargeInt --> TotalFunc
    
 
   SumFunc --> AllInt
 
   AllInt -->|Yes| Overflow
 
   AllInt -->|No| FloatResult
    
 
   TotalFunc --> NoException

Integer Overflow Handling

The test suite includes specific validation for integer overflow behavior when using large values.

Overflow Test Data:

Inserts 1<<63 (9223372036854775808) twice
Inserts -1 once
Expected overflow when summing

Test Results:

Function	Input Type	Result Behavior
`sum(x)`	All INTEGER	Exception (empty result)
`sum(DISTINCT x)`	All INTEGER	Converts to REAL: -9.223…e18
`total(x)`	All INTEGER	No exception: -1.844…e19
`total(DISTINCT x)`	All INTEGER	No exception: -9.223…e18

Evidence References:

R-08904-24719 (sum() throws integer overflow)
R-19553-64528 (total() never throws overflow)

Sources: test/evidence/slt_lang_aggfunc.test:466-499

Test Execution Directives

The aggregate function test file uses specific directives to control test execution and result comparison.

Key Directives Used:

Directive	Example	Purpose
`statement ok`	`CREATE TABLE t1(...)`	Execute statement, expect success
`query I nosort`	Integer result, no sorting	Query returning integer values
`query R nosort`	Real result, no sorting	Query returning floating-point values
`query T nosort`	Text result, no sorting	Query returning text values
`label-NULL`	Special NULL indicator	Label for expected NULL result
`label-sum`	Descriptive label	Label for sum result queries
`skipif sqlite`	Platform control	Skip on SQLite
`halt`	Stop execution	Halt test execution
`statement error`	Expected error	Expect statement to fail
`NOT INDEXED`	Query hint	Force table scan, avoid index

Platform Restriction:

The file begins with a platform check that halts execution on non-SQLite databases:

skipif sqlite
halt

This inverse logic (skip if SQLite, then halt) means: “If this is NOT SQLite, halt execution.”

Sources: test/evidence/slt_lang_aggfunc.test:1-500

Evidence-Based Testing

The test file validates specific SQLite documentation requirements. Each test section includes comments with “EVIDENCE-OF” markers.

Evidence Reference Format:

Each comment references a specific requirement from SQLite’s documentation:

Format: # EVIDENCE-OF: R-XXXXX-XXXXX <description>
The R-number is a unique identifier for that requirement

Evidence Coverage Map:

Evidence ID	Requirement	Test Lines
R-00466-56349	DISTINCT keyword support in aggregates	24-31
R-00171-59428	DISTINCT filters duplicates before aggregation	64-82
R-31453-41389	count(DISTINCT X) returns distinct count	64-82
R-20409-33051	avg() returns average of non-NULL values	83-90
R-29052-00975	Non-numeric strings interpreted as 0	92-166
R-17177-10067	avg() always returns floating point	168-182
R-40597-22164	avg() returns NULL if all NULL	184-196
R-34280-42283	count(X) counts non-NULL values	198-220
R-13776-21310	count(*) counts all rows	222-234
R-56088-25150	group_concat() concatenates non-NULL	236-248
R-08600-21007	group_concat() custom separator	250-261
R-39910-14723	group_concat() default comma separator	263-275
R-52585-35928	max() returns maximum value	277-289
R-13053-11096	max() uses ORDER BY ordering	291-309
R-50775-16353	max() returns NULL if all NULL	311-323
R-16028-39081	min() returns minimum non-NULL	325-337
R-30311-39793	min() uses ORDER BY ordering	339-356
R-10396-30188	min() returns NULL if all NULL	358-370
R-24943-34514	sum()/total() return sum of non-NULL	372-394
R-44223-43966	sum() NULL vs total() 0 for all-NULL	396-418
R-07734-01023	total() always returns floating point	420-432
R-19660-56479	sum() returns integer for all-integer	434-446
R-33611-59266	sum() returns float for mixed input	448-464
R-08904-24719	sum() throws overflow exception	466-488
R-19553-64528	total() never throws overflow	489-499

Sources: test/evidence/slt_lang_aggfunc.test:1-500

Dismiss

Refresh this wiki

Enter email to refresh

Keyboard shortcuts

sqlite-sqllogictest-corpus Documentation