This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Usage Guide

Purpose and Scope

This guide provides practical instructions for using the sqlite-sqllogictest-corpus repository to extract and utilize SQLite's official SQL Logic Test corpus. It covers local extraction workflows, CI/CD integration patterns, and common usage scenarios.

For architectural details about the extraction pipeline, see System Architecture. For detailed information about the test file format and directives, see Test File Format Specification. For building and running the extractor in detail, see Building and Running the Extractor. For CI/CD integration examples, see Integrating with CI/CD Systems.


Prerequisites

Before using the corpus extraction system, ensure the following are available:

| Requirement | Purpose | Minimum Version |
| --- | --- | --- |
| Docker Engine | Container runtime for the slt-gen image | 20.10+ |
| Git | Repository cloning and version control | 2.30+ |
| Disk Space | Test corpus storage | ~500 MB free |
| Network Access | Fossil repository cloning | HTTPS outbound |

The extraction process requires no additional dependencies, as all tools (Fossil SCM, bash, Tcl) are bundled within the Docker image.

Sources: Dockerfile:1-36 README.md:1-27


Local Extraction Workflow

Docker Image Build Process

The following diagram illustrates the multi-stage process for building the slt-gen Docker image:

```mermaid
flowchart TD
    START["docker build -t slt-gen ."] --> BASE["FROM debian:stable-slim"]
    BASE --> PKGS["apt-get install\nfossil, bash, build-essential,\nca-certificates, curl, tcl"]
    PKGS --> CLONE["fossil clone\nhttps://www.sqlite.org/sqllogictest/\n/src/sqllogictest.fossil"]
    CLONE --> OPEN["fossil open\n/src/sqllogictest.fossil"]
    OPEN --> SCRIPT["Create /usr/local/bin/slt-extract\nCopy /src/test to destination"]
    SCRIPT --> ENTRY["ENTRYPOINT slt-extract"]
    ENTRY --> COMPLETE["Image: slt-gen\nReady for extraction"]
    style START fill:#f9f9f9
    style COMPLETE fill:#f9f9f9
```

Sources: Dockerfile:1-36

Building the Image

Execute the build command from the repository root:
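A minimal invocation, assuming Docker is installed and the working directory contains the Dockerfile:

```shell
# Build the extractor image; the "slt-gen" tag is referenced by later commands
docker build -t slt-gen .
```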

The build process performs the following operations:

  1. Base Layer: Initializes the Debian stable-slim environment Dockerfile:1
  2. Dependency Installation: Installs Fossil SCM and build tools Dockerfile:5-12
  3. Repository Cloning: Clones the official sqllogictest Fossil repository Dockerfile:15-17
  4. Extractor Script: Creates the slt-extract utility Dockerfile:20-31
  5. Entry Point Configuration: Sets the container entry point to slt-extract Dockerfile:35

Build output confirms successful layer creation and Fossil repository synchronization.

Sources: Dockerfile:1-36 README.md:7-11


Volume Mounting Architecture

The extraction process uses Docker volume mounting to transfer files from the container to the host filesystem:

```mermaid
flowchart LR
    subgraph HOST["Host Filesystem"]
        CWD["$PWD"]
        TEST_DIR["test/"]
    end

    subgraph CONTAINER["slt-gen Container"]
        SRC["/src/test/\nFossil repository tests"]
        WORK["/work/test/\nVolume mount point"]
        EXTRACT["slt-extract script"]
    end

    CWD --> TEST_DIR
    TEST_DIR -.->|"Volume mount\n-v $PWD/test:/work/test"| WORK
    SRC -->|"cp -R"| WORK
    EXTRACT -->|"Executes copy"| SRC
    WORK -.->|"Persists to"| TEST_DIR

    style HOST fill:#f9f9f9
    style CONTAINER fill:#f9f9f9
```

Sources: Dockerfile:20-31 .github/workflows/update-corpus.yml:28-31

Running the Extractor

Execute the extraction sequence with the following commands:
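The full sequence, assuming the slt-gen image has already been built:

```shell
# Start from a clean test/ directory so stale files never linger
rm -rf test
mkdir test

# Run the extractor; --rm discards the container once the copy completes
docker run --rm -v "$PWD/test:/work/test" slt-gen
```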

Command breakdown:

| Command | Purpose | File Reference |
| --- | --- | --- |
| rm -rf test | Remove existing test directory | .github/workflows/update-corpus.yml:29 |
| mkdir test | Create fresh test directory | .github/workflows/update-corpus.yml:30 |
| docker run --rm | Run container with auto-removal | .github/workflows/update-corpus.yml:31 |
| -v "$PWD/test:/work/test" | Mount host directory into container | .github/workflows/update-corpus.yml:31 |
| slt-gen | Image name (runs the slt-extract entrypoint) | .github/workflows/update-corpus.yml:31 |

The slt-extract script copies all files from /src/test (Fossil repository content) to /work/test (the volume-mounted host directory), as defined at Dockerfile:24-28.

Expected output:

```
copied corpus to /work/test
```

Sources: .github/workflows/update-corpus.yml:28-31 Dockerfile:20-31 README.md:13-19


Working with Extracted Tests

Directory Structure Post-Extraction

After successful extraction, the test/ directory contains the complete corpus organized by test category:

```
test/
├── evidence/
│   ├── slt_lang_createtrigger.test
│   ├── slt_lang_createview.test
│   ├── slt_lang_dropindex.test
│   ├── slt_lang_droptable.test
│   ├── slt_lang_droptrigger.test
│   ├── slt_lang_dropview.test
│   ├── slt_lang_reindex.test
│   ├── slt_lang_replace.test
│   ├── slt_lang_update.test
│   ├── slt_lang_aggfunc.test
│   ├── in1.test
│   └── in2.test
└── index/
    └── between/
        ├── 1/
        ├── 10/
        └── 100/
```

For detailed organization structure, see Test Organization Structure.

Sources: Dockerfile:24-28

Verifying Extraction Success

Validate the extraction by checking for expected test files:
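One way to sketch this check in shell; the helper name is illustrative, and the exact file count varies with the upstream corpus:

```shell
# Check the expected top-level layout of an extracted corpus directory
corpus_ok() {
  [ -d "$1/evidence" ] && [ -d "$1/index" ]
}

# Report how many .test files were extracted
echo "test files: $(find test -name '*.test' 2>/dev/null | wc -l)"
corpus_ok test && echo "corpus layout OK" || echo "corpus layout incomplete"
```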

The corpus should contain hundreds of .test files organized into evidence/ and index/ subdirectories.

Sources: .github/workflows/update-corpus.yml:36-44


Automated Update Workflow

GitHub Actions Integration

The repository includes a pre-configured GitHub Actions workflow that automates corpus updates:

```mermaid
sequenceDiagram
    participant SCHED as "GitHub Scheduler"
    participant WF as "update-corpus.yml"
    participant DOCKER as "Docker Engine"
    participant FOSSIL as "Fossil Repository"
    participant GIT as "Git Repository"

    Note over SCHED: Every Monday 06:00 UTC
    SCHED->>WF: Trigger workflow
    WF->>WF: Checkout repository\nactions/checkout@v4
    WF->>DOCKER: docker build -t slt-gen .
    DOCKER->>FOSSIL: fossil clone\nhttps://www.sqlite.org/sqllogictest/
    FOSSIL-->>DOCKER: Repository data
    DOCKER-->>WF: Image ready
    WF->>WF: rm -rf test && mkdir test
    WF->>DOCKER: docker run --rm\n-v $PWD/test:/work/test slt-gen
    DOCKER->>WF: Extraction complete
    WF->>WF: git status --porcelain
    alt "Changes detected & not PR"
        WF->>GIT: git add test
        WF->>GIT: git commit -m 'Update corpus'
        WF->>GIT: git push
    else "No changes or is PR"
        WF->>WF: Skip commit
    end
```

Sources: .github/workflows/update-corpus.yml:1-45

Workflow Triggers

The workflow activates under three conditions:

| Trigger Type | Schedule/Event | Purpose | Configuration |
| --- | --- | --- | --- |
| Scheduled | Every Monday 06:00 UTC | Weekly automatic updates | .github/workflows/update-corpus.yml:4-6 |
| Manual | workflow_dispatch | On-demand corpus refresh | .github/workflows/update-corpus.yml:7 |
| Pull Request | Path changes | Testing workflow modifications | .github/workflows/update-corpus.yml:8-12 |

The scheduled trigger uses the cron expression "0 6 * * 1" (.github/workflows/update-corpus.yml:6).

Sources: .github/workflows/update-corpus.yml:3-12

Change Detection and Commit Logic

The workflow implements intelligent change detection to avoid unnecessary commits:

```mermaid
flowchart TD
    EXTRACT["Extraction complete"] --> STATUS["git status --porcelain"]
    STATUS --> CHECK_EVENT{"Event type?"}
    CHECK_EVENT -->|pull_request| SKIP["Skip commit\nPrevent PR pollution"]
    CHECK_EVENT -->|schedule or workflow_dispatch| CHECK_CHANGES{"Changes\ndetected?"}
    CHECK_CHANGES -->|Output exists| COMMIT["git config user\ngit add test\ngit commit\ngit push"]
    CHECK_CHANGES -->|No output| NO_UPDATE["Echo 'No updates'\nSkip commit"]
    SKIP --> END["Workflow complete"]
    COMMIT --> END
    NO_UPDATE --> END
```

The conditional logic at .github/workflows/update-corpus.yml:34 prevents commits during pull request builds, while .github/workflows/update-corpus.yml:36 uses git status --porcelain to detect file changes.

Sources: .github/workflows/update-corpus.yml:33-44


Integration Patterns

CI/CD Pipeline Integration

The corpus can be integrated into various continuous integration systems using the extraction workflow.

Pattern 1: Direct Docker Execution

For CI systems with Docker support (GitHub Actions, GitLab CI, Circle CI):
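A sketch of this pattern; the OWNER segment of the URL is an illustrative placeholder, not the real repository path, and Docker fetches the Git URL itself:

```shell
# Build straight from a Git URL -- no local clone required
# (OWNER below is a placeholder for the actual repository owner)
docker build -t slt-gen https://github.com/OWNER/sqlite-sqllogictest-corpus.git

# Extract into the CI workspace
mkdir -p test
docker run --rm -v "$PWD/test:/work/test" slt-gen
```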

This pattern builds the extractor image directly from the repository URL without cloning.

Sources: .github/workflows/update-corpus.yml:24-31 README.md:7-19

Pattern 2: Repository Cloning

For systems preferring Git-based workflows:
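A sketch under the same assumptions (OWNER is an illustrative placeholder):

```shell
# Clone the extraction infrastructure, then build and run it locally
git clone https://github.com/OWNER/sqlite-sqllogictest-corpus.git
cd sqlite-sqllogictest-corpus
docker build -t slt-gen .
mkdir -p test
docker run --rm -v "$PWD/test:/work/test" slt-gen
```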

This pattern provides version control for the extraction infrastructure.

Sources: README.md:1-27

Pattern 3: Pre-extracted Corpus Usage

For CI systems with limited Docker support:
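A sketch (OWNER is an illustrative placeholder); because the corpus is committed under test/, a shallow clone is sufficient:

```shell
# Fetch only the latest snapshot; the committed test/ tree is used as-is
git clone --depth 1 https://github.com/OWNER/sqlite-sqllogictest-corpus.git
ls sqlite-sqllogictest-corpus/test
```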

This pattern uses the pre-extracted corpus committed to the repository, avoiding Docker dependency.

Sources: .github/workflows/update-corpus.yml:39

Local Development Workflow

For iterative local testing during SQL engine development:
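One possible loop, assuming the image tag from the earlier build step; rebuilding with --no-cache is one way to force a fresh Fossil clone:

```shell
# One-time setup
docker build -t slt-gen .

# Repeat as needed: refresh the corpus from upstream
docker build --no-cache -t slt-gen .   # re-clones the Fossil repository
rm -rf test && mkdir test
docker run --rm -v "$PWD/test:/work/test" slt-gen
```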

The build step is executed once; extraction is refreshed periodically to obtain upstream updates.

Sources: README.md:5-19


Common Usage Scenarios

Scenario 1: Database Vendor Compliance Testing

Database vendors can validate SQL standard compliance:
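A hedged sketch: "sltrunner" and "--engine mydb" stand in for whatever harness the vendor uses to execute sqllogictest files; neither is part of this repository:

```shell
# Run every evidence test through a hypothetical vendor test harness
# ("sltrunner" and "--engine mydb" are illustrative names, not real tools)
for f in test/evidence/*.test; do
  sltrunner --engine mydb "$f" || echo "FAILED: $f"
done
```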

For conditional execution details, see Conditional Execution Directives.

Sources: README.md:5-19

Scenario 2: SQL Parser Development

Parser developers can use the corpus for syntax validation:
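A minimal sketch of such extraction, assuming the standard sqllogictest record layout (a statement or query directive line, SQL lines, then a blank line or a "----" results separator). It is demonstrated on an inline sample, but the helper reads any .test file on stdin:

```shell
# Pull out the SQL lines that follow "statement"/"query" directives
extract_sql() {
  awk '
    /^(statement|query)/ { grab = 1; next }   # a directive starts a record
    /^----$/ || /^$/     { grab = 0 }         # results or blank line ends it
    grab                 { print }            # everything between is SQL
  '
}

# Demo on an inline sample; normally: extract_sql < test/evidence/in1.test
extract_sql <<'EOF'
statement ok
CREATE TABLE t1(a INTEGER)

query I nosort
SELECT a FROM t1
----
EOF
```

The demo prints the two SQL statements from the sample record, stripped of all directive and result context.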

This extracts raw SQL statements without execution context for parser-only validation.

Sources: Dockerfile:24-28

Scenario 3: Research and SQL Semantics Analysis

Researchers can analyze SQL behavior patterns:
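One way to sketch such an analysis, assuming the directive keywords used by the sqllogictest format:

```shell
# Tally how often each directive keyword opens a line across a corpus directory
count_directives() {
  grep -rhoE '^(statement ok|statement error|query|halt|hash-threshold)' "$1" \
    | sort | uniq -c | sort -rn
}

# Example: count_directives test/evidence
```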

For aggregate function test details, see Aggregate Function Tests.

Sources: Dockerfile:24-28


Troubleshooting

Common Issues and Solutions

| Issue | Symptom | Solution | Reference |
| --- | --- | --- | --- |
| Build failure | Docker build fails with a network error | Check HTTPS access to sqlite.org | Dockerfile:15 |
| Empty test directory | test/ directory created but empty | Verify the volume mount path is absolute | Dockerfile:25 |
| Permission errors | Cannot write to mounted volume | Ensure Docker has write permission on the host directory | .github/workflows/update-corpus.yml:31 |
| Stale corpus | Tests outdated compared to upstream | Rebuild the image to re-clone the Fossil repository | Dockerfile:15-17 |

Verifying Fossil Repository Clone

To confirm successful Fossil repository cloning during image build:
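A sketch using entrypoint overrides to peek inside the built image; the paths follow the Dockerfile stages shown earlier:

```shell
# List extracted test paths from the image's Fossil checkout
docker run --rm --entrypoint ls slt-gen /src/test

# Confirm the Fossil clone file exists inside the image
docker run --rm --entrypoint ls slt-gen -l /src/sqllogictest.fossil
```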

The Fossil repository clone occurs at Dockerfile:15-17 during image build.

Sources: Dockerfile:14-17

Manual Extraction Script Execution

For debugging extraction issues, the slt-extract script can be executed manually:
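One way to do this interactively, assuming the script location from the Dockerfile:

```shell
# Drop into a shell inside the container, bypassing the default entrypoint
docker run --rm -it -v "$PWD/test:/work/test" --entrypoint bash slt-gen

# Then, inside the container, run the extractor by hand:
#   /usr/local/bin/slt-extract /work/test
```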

The script implementation is defined at Dockerfile:20-31.

Sources: Dockerfile:20-33


Advanced Usage

Custom Extraction Destination

Override the default extraction path by passing an argument:
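A sketch, assuming the destination-path argument the script accepts:

```shell
# Mount a host directory at a custom container path and pass that path
# as the destination argument to slt-extract
mkdir -p corpus-out
docker run --rm -v "$PWD/corpus-out:/out" slt-gen /out
```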

The slt-extract script accepts a destination path parameter (Dockerfile:25).

Sources: Dockerfile:25

Selective Test Extraction

Extract only specific test categories using volume filtering:
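One hedged approach: mount only the subtree of interest, so everything else the script copies stays inside the throwaway container. This relies on the copy step tolerating a pre-existing destination subdirectory:

```shell
# Persist only the index/ category; other copied files vanish with the container
mkdir -p test/index
docker run --rm -v "$PWD/test/index:/work/test/index" slt-gen
```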

Alternatively, modify the slt-extract script for selective copying before building the image.

Sources: Dockerfile:24-28

Image Size Optimization

The built image contains the full Fossil repository and build tools. For production CI/CD, consider:

  1. Multi-stage builds : Separate Fossil clone from final extraction
  2. Volume caching : Cache built image to avoid rebuilds
  3. Pre-extracted artifacts : Use repository-committed tests directly

The current single-stage design prioritizes simplicity over size optimization (Dockerfile:1-36).

Sources: Dockerfile:1-36


Next Steps

After extracting the corpus:

Sources: README.md:1-27