This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Usage Guide
Purpose and Scope
This guide provides practical instructions for using the sqlite-sqllogictest-corpus repository to extract and utilize SQLite's official SQL Logic Test corpus. It covers local extraction workflows, CI/CD integration patterns, and common usage scenarios.
For architectural details about the extraction pipeline, see System Architecture. For detailed information about the test file format and directives, see Test File Format Specification. For building and running the extractor in detail, see Building and Running the Extractor. For CI/CD integration examples, see Integrating with CI/CD Systems.
Prerequisites
Before using the corpus extraction system, ensure the following are available:
| Requirement | Purpose | Minimum Version |
|---|---|---|
| Docker Engine | Container runtime for slt-gen image | 20.10+ |
| Git | Repository cloning and version control | 2.30+ |
| Disk Space | Test corpus storage | ~500 MB free |
| Network Access | Fossil repository cloning | HTTPS outbound |
The extraction process requires no additional dependencies, as all tools (Fossil SCM, bash, tcl) are bundled within the Docker image.
Sources: Dockerfile:1-36 README.md:1-27
Local Extraction Workflow
Docker Image Build Process
The following diagram illustrates the multi-stage process for building the slt-gen Docker image:
flowchart TD
START["docker build -t slt-gen ."] --> BASE["FROM debian:stable-slim"]
BASE --> PKGS["apt-get install\nfossil, bash, build-essential,\nca-certificates, curl, tcl"]
PKGS --> CLONE["fossil clone\nhttps://www.sqlite.org/sqllogictest/\n/src/sqllogictest.fossil"]
CLONE --> OPEN["fossil open\n/src/sqllogictest.fossil"]
OPEN --> SCRIPT["Create /usr/local/bin/slt-extract\nCopy /src/test to destination"]
SCRIPT --> ENTRY["ENTRYPOINT slt-extract"]
ENTRY --> COMPLETE["Image: slt-gen\nReady for extraction"]
style START fill:#f9f9f9
style COMPLETE fill:#f9f9f9
Sources: Dockerfile:1-36
Building the Image
Execute the build command from the repository root:
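The command matches the `docker build -t slt-gen .` step shown in the diagram above:

```shell
# Build the extractor image; the Fossil repository is cloned during the
# image build, so network access to sqlite.org is required.
docker build -t slt-gen .
```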
The build process performs the following operations:
- Base Layer : Initializes Debian stable-slim environment Dockerfile1
- Dependency Installation : Installs Fossil SCM and build tools Dockerfile:5-12
- Repository Cloning : Clones the official sqllogictest Fossil repository Dockerfile:15-17
- Extractor Script : Creates the slt-extract utility Dockerfile:20-31
- Entry Point Configuration : Sets the container entry point to slt-extract Dockerfile35
Build output confirms successful layer creation and Fossil repository synchronization.
Sources: Dockerfile:1-36 README.md:7-11
Volume Mounting Architecture
The extraction process uses Docker volume mounting to transfer files from the container to the host filesystem:
flowchart LR
subgraph HOST["Host Filesystem"]
CWD["$PWD"]
TEST_DIR["test/"]
end
subgraph CONTAINER["slt-gen Container"]
SRC["/src/test/\nFossil repository tests"]
WORK["/work/test/\nVolume mount point"]
EXTRACT["slt-extract script"]
end
CWD --> TEST_DIR
TEST_DIR -.->|Volume mount -v $PWD/test:/work/test| WORK
SRC -->|cp -R| WORK
EXTRACT -->|Executes copy| SRC
WORK -.->|Persists to| TEST_DIR
style HOST fill:#f9f9f9
style CONTAINER fill:#f9f9f9
Sources: Dockerfile:20-31 .github/workflows/update-corpus.yml:28-31
Running the Extractor
Execute the extraction sequence with the following commands:
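The full sequence, as used by the automated update workflow:

```shell
# Start from a clean directory so stale files never linger in the corpus.
rm -rf test
mkdir test

# Run the extractor; --rm removes the container afterward, and the volume
# mount makes /work/test inside the container persist to ./test on the host.
docker run --rm -v "$PWD/test:/work/test" slt-gen
```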
Command breakdown:
| Command | Purpose | File Reference |
|---|---|---|
| `rm -rf test` | Remove existing test directory | .github/workflows/update-corpus.yml29 |
| `mkdir test` | Create fresh test directory | .github/workflows/update-corpus.yml30 |
| `docker run --rm` | Run container with auto-removal | .github/workflows/update-corpus.yml31 |
| `-v "$PWD/test:/work/test"` | Mount host directory to container | .github/workflows/update-corpus.yml31 |
| `slt-gen` | Image name (runs slt-extract entrypoint) | .github/workflows/update-corpus.yml31 |
The slt-extract script copies all files from /src/test (Fossil repository content) to /work/test (the volume-mounted host directory) Dockerfile:24-28.
Expected output:
copied corpus to /work/test
Sources: .github/workflows/update-corpus.yml:28-31 Dockerfile:20-31 README.md:13-19
Working with Extracted Tests
Directory Structure Post-Extraction
After successful extraction, the test/ directory contains the complete corpus organized by test category:
test/
├── evidence/
│ ├── slt_lang_createtrigger.test
│ ├── slt_lang_createview.test
│ ├── slt_lang_dropindex.test
│ ├── slt_lang_droptable.test
│ ├── slt_lang_droptrigger.test
│ ├── slt_lang_dropview.test
│ ├── slt_lang_reindex.test
│ ├── slt_lang_replace.test
│ ├── slt_lang_update.test
│ ├── slt_lang_aggfunc.test
│ ├── in1.test
│ └── in2.test
└── index/
└── between/
├── 1/
├── 10/
└── 100/
For detailed organization structure, see Test Organization Structure.
Sources: Dockerfile:24-28
Verifying Extraction Success
Validate the extraction by checking for expected test files:
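A quick sanity check, assuming the corpus was extracted to `./test`:

```shell
# Count extracted test files and spot-check the two top-level categories.
find test -name '*.test' | wc -l
ls test/evidence | head
ls test/index | head
```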
The corpus should contain hundreds of .test files organized into evidence/ and index/ subdirectories.
Sources: .github/workflows/update-corpus.yml:36-44
Automated Update Workflow
sequenceDiagram
participant SCHED as "GitHub Scheduler"
participant WF as "update-corpus.yml"
participant DOCKER as "Docker Engine"
participant FOSSIL as "Fossil Repository"
participant GIT as "Git Repository"
Note over SCHED: Every Monday 06:00 UTC
SCHED->>WF: Trigger workflow
WF->>WF: Checkout repository\nactions/checkout@v4
WF->>DOCKER: docker build -t slt-gen .
DOCKER->>FOSSIL: fossil clone\nhttps://www.sqlite.org/sqllogictest/
FOSSIL-->>DOCKER: Repository data
DOCKER-->>WF: Image ready
WF->>WF: rm -rf test && mkdir test
WF->>DOCKER: docker run --rm\n-v $PWD/test:/work/test slt-gen
DOCKER->>WF: Extraction complete
WF->>WF: git status --porcelain
alt "Changes detected & not PR"
WF->>GIT: git add test
WF->>GIT: git commit -m 'Update corpus'
WF->>GIT: git push
else "No changes or is PR"
WF->>WF: Skip commit
end
GitHub Actions Integration
The repository includes a pre-configured GitHub Actions workflow, shown in the sequence diagram above, that automates corpus updates.
Sources: .github/workflows/update-corpus.yml:1-45
Workflow Triggers
The workflow activates under three conditions:
| Trigger Type | Schedule/Event | Purpose | Configuration |
|---|---|---|---|
| Scheduled | Every Monday 06:00 UTC | Weekly automatic updates | .github/workflows/update-corpus.yml:4-6 |
| Manual | workflow_dispatch | On-demand corpus refresh | .github/workflows/update-corpus.yml7 |
| Pull Request | Path changes | Testing workflow modifications | .github/workflows/update-corpus.yml:8-12 |
The scheduled trigger uses cron syntax "0 6 * * 1" .github/workflows/update-corpus.yml6
Sources: .github/workflows/update-corpus.yml:3-12
flowchart TD
EXTRACT["Extraction complete"] --> STATUS["git status --porcelain"]
STATUS --> CHECK_EVENT{"Event type?"}
CHECK_EVENT -->|pull_request| SKIP["Skip commit\nPrevent PR pollution"]
CHECK_EVENT -->|schedule or workflow_dispatch| CHECK_CHANGES{"Changes\ndetected?"}
CHECK_CHANGES -->|Output exists| COMMIT["git config user\ngit add test\ngit commit\ngit push"]
CHECK_CHANGES -->|No output| NO_UPDATE["Echo 'No updates'\nSkip commit"]
SKIP --> END["Workflow complete"]
COMMIT --> END
NO_UPDATE --> END
Change Detection and Commit Logic
The workflow checks for changes before committing, as illustrated in the flowchart above, to avoid unnecessary commits.
The conditional logic at .github/workflows/update-corpus.yml34 prevents commits during pull request builds, while .github/workflows/update-corpus.yml36 uses git status --porcelain to detect file changes.
Sources: .github/workflows/update-corpus.yml:33-44
Integration Patterns
CI/CD Pipeline Integration
The corpus can be integrated into various continuous integration systems using the extraction workflow.
Pattern 1: Direct Docker Execution
For CI systems with Docker support (GitHub Actions, GitLab CI, Circle CI):
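A sketch of this pattern; `<repository-url>` is a placeholder for the sqlite-sqllogictest-corpus clone URL:

```shell
# docker build accepts a Git URL directly, so no local checkout is needed.
docker build -t slt-gen <repository-url>

# Extract into the CI workspace.
mkdir -p test
docker run --rm -v "$PWD/test:/work/test" slt-gen
```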
This pattern builds the extractor image directly from the repository URL without cloning.
Sources: .github/workflows/update-corpus.yml:24-31 README.md:7-19
Pattern 2: Repository Cloning
For systems preferring Git-based workflows:
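A sketch of the Git-based variant (again with `<repository-url>` as a placeholder):

```shell
# Clone the extraction infrastructure, then build and run from the checkout.
git clone <repository-url> sqlite-sqllogictest-corpus
cd sqlite-sqllogictest-corpus
docker build -t slt-gen .
mkdir -p test
docker run --rm -v "$PWD/test:/work/test" slt-gen
```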
This pattern provides version control for the extraction infrastructure.
Sources: README.md:1-27
Pattern 3: Pre-extracted Corpus Usage
For CI systems with limited Docker support:
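Because the extracted corpus is committed to the repository, a plain clone suffices (`<repository-url>` is a placeholder):

```shell
# No Docker required: the committed test/ directory is the corpus.
git clone --depth 1 <repository-url> corpus
ls corpus/test/evidence | head
```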
This pattern uses the pre-extracted corpus committed to the repository, avoiding Docker dependency.
Sources: .github/workflows/update-corpus.yml39
Local Development Workflow
For iterative local testing during SQL engine development:
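A sketch of the loop; `my-sql-engine` is a hypothetical stand-in for your own test runner:

```shell
# One-time setup.
docker build -t slt-gen .

# Periodic refresh to pick up upstream changes.
rm -rf test && mkdir test
docker run --rm -v "$PWD/test:/work/test" slt-gen

# Iterate against the corpus (hypothetical runner invocation).
my-sql-engine --run-tests test/
```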
The build step is executed once; extraction is refreshed periodically to obtain upstream updates.
Sources: README.md:5-19
Common Usage Scenarios
Scenario 1: Database Vendor Compliance Testing
Database vendors can validate SQL standard compliance:
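A minimal sketch using the sqllogictest conditional directive names (`onlyif`/`skipif`); the inline sample stands in for an extracted `test/` directory:

```shell
# Create a tiny stand-in corpus with one conditionally gated file.
mkdir -p demo/evidence
cat > demo/evidence/sample.test <<'EOF'
onlyif sqlite
statement ok
CREATE TABLE t1(a INTEGER)

skipif mssql
query I rowsort
SELECT a FROM t1
----
EOF

# Count files gated to (or away from) specific engines.
onlyif_files=$(grep -rl '^onlyif' demo | wc -l)
skipif_files=$(grep -rl '^skipif' demo | wc -l)
echo "files with onlyif: $onlyif_files"
echo "files with skipif: $skipif_files"
```

Point the `grep` at `test/` after extraction to survey real engine-specific coverage.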
For conditional execution details, see Conditional Execution Directives.
Sources: README.md:5-19
Scenario 2: SQL Parser Development
Parser developers can use the corpus for syntax validation:
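A sketch: the `awk` logic assumes SQL follows a `statement` or `query` directive line and ends at a blank line or `----` results separator; the inline sample stands in for a corpus file:

```shell
# Inline sample standing in for an extracted .test file.
cat > sample.test <<'EOF'
statement ok
CREATE TABLE t1(a INTEGER, b TEXT)

query I rowsort
SELECT a FROM t1
----
EOF

# Keep only the SQL bodies, dropping directives and expected results.
awk '
  /^(statement|query)/ {grab=1; next}  # directive line starts a SQL body
  /^(----|$)/          {grab=0}        # separator / blank line ends it
  grab                 {print}
' sample.test > statements.sql
cat statements.sql
```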
This extracts raw SQL statements without execution context for parser-only validation.
Sources: Dockerfile:24-28
Scenario 3: Research and SQL Semantics Analysis
Researchers can analyze SQL behavior patterns:
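A sketch that tallies directive frequencies; the directive list is an assumption based on the sqllogictest format, and the inline sample stands in for the extracted corpus files:

```shell
# Inline sample standing in for extracted aggregate-function tests.
cat > sample-agg.test <<'EOF'
statement ok
CREATE TABLE t1(x INTEGER)

query I nosort
SELECT count(*) FROM t1
----

query I nosort
SELECT avg(x) FROM t1
----
EOF

# Tabulate how often each directive appears, most frequent first.
grep -hE '^(statement|query|onlyif|skipif|hash-threshold)' sample-agg.test \
  | awk '{print $1}' | sort | uniq -c | sort -rn
```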
For aggregate function test details, see Aggregate Function Tests.
Sources: Dockerfile:24-28
Troubleshooting
Common Issues and Solutions
| Issue | Symptom | Solution | Reference |
|---|---|---|---|
| Build failure | Docker build fails with network error | Check HTTPS access to sqlite.org | Dockerfile15 |
| Empty test directory | test/ directory created but empty | Verify volume mount path is absolute | Dockerfile25 |
| Permission errors | Cannot write to mounted volume | Ensure Docker has write permissions to host directory | .github/workflows/update-corpus.yml31 |
| Stale corpus | Tests outdated compared to upstream | Rebuild image to re-clone Fossil repository | Dockerfile:15-17 |
Verifying Fossil Repository Clone
To confirm successful Fossil repository cloning during image build:
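One way, assuming standard coreutils are present in the debian:stable-slim base:

```shell
# Override the slt-extract entry point to list the cloned test tree.
docker run --rm --entrypoint ls slt-gen /src/test
```

A non-empty listing confirms the clone and `fossil open` steps succeeded.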
The Fossil repository clone occurs at Dockerfile:15-17 during image build.
Sources: Dockerfile:14-17
Manual Extraction Script Execution
For debugging extraction issues, the slt-extract script can be executed manually:
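A sketch of an interactive debugging session:

```shell
# Drop into a shell instead of the default slt-extract entry point.
docker run --rm -it --entrypoint /bin/bash \
  -v "$PWD/test:/work/test" slt-gen

# Then, inside the container, run the extractor by hand:
#   slt-extract
#   ls /work/test
```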
The script implementation is defined at Dockerfile:20-31
Sources: Dockerfile:20-33
Advanced Usage
Custom Extraction Destination
Override the default extraction path by passing an argument:
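A sketch, assuming the argument is an absolute path inside the container:

```shell
# Mount a custom host directory and pass its container-side path
# to slt-extract as the destination.
mkdir -p corpus
docker run --rm -v "$PWD/corpus:/work/corpus" slt-gen /work/corpus
```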
The slt-extract script accepts a destination path parameter Dockerfile25
Sources: Dockerfile25
Selective Test Extraction
Extract only specific test categories using volume filtering:
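One approach: extract everything to a scratch directory, then keep only the category you need (`evidence/` here):

```shell
# Full extraction into a throwaway directory.
scratch=$(mktemp -d)
docker run --rm -v "$scratch:/work/test" slt-gen

# Keep only the evidence/ category, then clean up.
mkdir -p test
cp -R "$scratch/evidence" test/
rm -rf "$scratch"
```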
Alternatively, modify the slt-extract script for selective copying before building the image.
Sources: Dockerfile:24-28
Image Size Optimization
The built image contains the full Fossil repository and build tools. For production CI/CD, consider:
- Multi-stage builds : Separate Fossil clone from final extraction
- Volume caching : Cache built image to avoid rebuilds
- Pre-extracted artifacts : Use repository-committed tests directly
The current single-stage design prioritizes simplicity over size optimization Dockerfile:1-36
Sources: Dockerfile:1-36
Next Steps
After extracting the corpus:
- Implement test runner : Parse and execute .test files against your SQL engine. See Test File Format Specification for format details.
- Filter by database platform : Use conditional directives to select relevant tests. See Conditional Execution Directives.
- Set up automated testing : Integrate extraction into CI/CD pipeline. See Integrating with CI/CD Systems.
- Explore test categories : Review available test types and coverage. See Test Corpus Reference.
Sources: README.md:1-27