This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Building and Running the Extractor
Purpose and Scope
This page provides step-by-step instructions for building the slt-gen Docker image and running the extraction process to obtain a local copy of the SQLite sqllogictest corpus. It covers the practical mechanics of using the Docker-based extraction system locally, including prerequisites, build commands, and execution steps.
For information about how the automated update workflow uses this extraction system in CI/CD, see Automated Update Workflow. For information about integrating the corpus into your own CI/CD pipelines after extraction, see Integrating with CI/CD Systems.
Prerequisites
The extraction system requires the following tools to be installed on your local machine:
| Tool | Purpose | Minimum Version |
|---|---|---|
| Docker | Container runtime for building and executing the slt-gen image | 20.10+ |
| Bash | Shell for executing extraction commands | 4.0+ |
| Git | (Optional) Version control if cloning this repository | 2.0+ |
The Docker image itself handles all other dependencies internally, including the Fossil SCM client and the SQLite repository cloning process.
Sources: README.md:1-27 Dockerfile:1-36
Building the Docker Image
Build Command
The Docker image is built using the standard docker build command with the tag slt-gen:
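A minimal form of the build command, consistent with the tag described above, is:

```bash
# Build the extractor image and tag it slt-gen
docker build -t slt-gen .
```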
This command must be executed from the repository root directory where the Dockerfile is located.
Sources: README.md:9-11
Build Process Overview
Build Process: Docker Image Construction Pipeline
Sources: Dockerfile:1-36
Build Steps Breakdown
The build process executes the following stages as defined in the Dockerfile:
1. Base Image Selection (Dockerfile:1-3)
   - Uses `debian:stable-slim` as the foundation
   - Sets `DEBIAN_FRONTEND=noninteractive` to prevent prompts during package installation
2. System Dependencies Installation (Dockerfile:5-12)
   - Installs `bash`, `build-essential`, `ca-certificates`, `curl`, `fossil`, and `tcl`
   - Cleans up the apt cache to minimize image size
3. Fossil Repository Cloning (Dockerfile:14-17)
   - Sets the working directory to `/src`
   - Executes `fossil clone https://www.sqlite.org/sqllogictest/` to download the repository
   - Opens the fossil file with `fossil open` to extract the working tree
   - Configures the default user as `root`
4. Extraction Script Creation (Dockerfile:19-33)
   - Switches the working directory to `/work`
   - Creates the `/usr/local/bin/slt-extract` bash script using heredoc syntax
   - Makes the script executable with `chmod +x`
5. Entrypoint Configuration (Dockerfile:35)
   - Sets `ENTRYPOINT` to execute `slt-extract` when the container runs
Sources: Dockerfile:1-36
Running the Extraction
Extraction Command
To extract the test corpus to a local test/ directory:
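A command consistent with the flags described below would be:

```bash
# Run the extractor, mounting the host's test/ directory at /work/test
docker run --rm -v "$(pwd)/test:/work/test" slt-gen
```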
The --rm flag automatically removes the container after execution. The -v flag mounts the local test/ directory into the container at /work/test.
Sources: README.md:15-19
Volume Mount Mapping
Volume Mapping: Host and Container Filesystem Relationship
Sources: README.md:18 Dockerfile:25-28
Extraction Process Flow
Execution Flow: Container Lifecycle and File Extraction
Sources: Dockerfile:20-31 README.md:18
Understanding the slt-extract Script
The extraction logic is implemented in the slt-extract bash script, which is embedded directly in the Dockerfile at Dockerfile:20-31.
Script Components
| Component | Code Reference | Description |
|---|---|---|
| Shebang | #!/usr/bin/env bash | Specifies bash as the interpreter |
| Error Handling | set -euo pipefail | Exit on error, undefined variables, and pipe failures |
| Source Path | src_root="/src/test" | Location of test corpus in Fossil working tree |
| Destination Path | dest_root="${1:-/work/test}" | Target directory with default value |
| Directory Creation | mkdir -p "$dest_root" | Ensures destination directory exists |
| File Copy | cp -R "$src_root/." "$dest_root/" | Recursively copies all test files |
| Confirmation Output | echo "copied corpus to $dest_root" | Reports successful extraction |
Sources: Dockerfile:20-31
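Reconstructed from the components listed above, the embedded script is roughly the following (exact whitespace and quoting in the Dockerfile heredoc may differ):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Source: Fossil working tree baked into the image; destination defaults to /work/test
src_root="/src/test"
dest_root="${1:-/work/test}"

mkdir -p "$dest_root"
cp -R "$src_root/." "$dest_root/"
echo "copied corpus to $dest_root"
```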
Script Invocation
The script is invoked as the container's entrypoint and accepts an optional argument to override the default destination path:
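For example, you can mount a different container path and pass it as the argument; the corpus/ directory and /data mount point below are illustrative:

```bash
# The argument after the image name is forwarded to slt-extract as the destination
docker run --rm -v "$(pwd)/corpus:/data" slt-gen /data
```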
Sources: Dockerfile:25 Dockerfile:35
Extraction Output Structure
After successful extraction, the test/ directory contains the complete corpus organized into subdirectories:

```mermaid
graph TD
    TEST["test/"]
    EVIDENCE["evidence/\nSQL language specification tests"]
    INDEX["index/\nQuery optimization tests"]
    EVIDENCE_DDL["DDL tests\nslt_lang_create*, slt_lang_drop*"]
    EVIDENCE_DML["DML tests\nslt_lang_update, slt_lang_replace"]
    EVIDENCE_DQL["DQL tests\nin1.test, in2.test, slt_lang_aggfunc"]
    INDEX_BETWEEN["between/\nBETWEEN operator suites"]
    INDEX_OTHER["Other index tests"]
    TEST --> EVIDENCE
    TEST --> INDEX
    EVIDENCE --> EVIDENCE_DDL
    EVIDENCE --> EVIDENCE_DML
    EVIDENCE --> EVIDENCE_DQL
    INDEX --> INDEX_BETWEEN
    INDEX --> INDEX_OTHER
```

Extracted Corpus: Directory Structure and Test Categories
For detailed information about test organization and taxonomy, see Test Organization Structure.
Sources: README.md:21
Common Usage Patterns
Pattern 1: Fresh Extraction
This is the standard usage pattern for obtaining a clean copy of the corpus:
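Assuming neither the image nor a local test/ directory exists yet:

```bash
# Build the image, then extract the corpus into ./test
docker build -t slt-gen .
docker run --rm -v "$(pwd)/test:/work/test" slt-gen
```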
Sources: README.md:9-19
Pattern 2: Incremental Update
If the Docker image has already been built, you can update the corpus without rebuilding:
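Re-running the container refreshes the local copy from the Fossil snapshot captured at image build time (see the troubleshooting section below if you need newer upstream content):

```bash
# Re-extract from the existing slt-gen image; overwrites files already in ./test
docker run --rm -v "$(pwd)/test:/work/test" slt-gen
```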
Sources: README.md:15-19
Pattern 3: Custom Destination Directory
Extract to a directory other than test/:
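One way to do this is to change only the host side of the volume mount; the my-corpus name below is illustrative:

```bash
# Files still land in /work/test inside the container, but appear in ./my-corpus on the host
docker run --rm -v "$(pwd)/my-corpus:/work/test" slt-gen
```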
The script always writes to /work/test inside the container, but the volume mount determines where files appear on the host.
Sources: Dockerfile:25 README.md:18
Pattern 4: One-Off Extraction Without Cleanup
If you want to inspect the container without auto-removal:
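Omitting --rm keeps the stopped container around for inspection; the container name below is an arbitrary example:

```bash
# Run without --rm so the stopped container can be inspected afterwards
docker run --name slt-gen-inspect -v "$(pwd)/test:/work/test" slt-gen

# Inspect its output, then remove the container manually
docker logs slt-gen-inspect
docker rm slt-gen-inspect
```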
Note: This pattern is not recommended for regular use because stopped containers accumulate and must be removed manually.
Sources: README.md:18
Verification and Validation
Verifying Successful Extraction
After extraction, verify the corpus is present:
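A quick check is to count the extracted .test files and list the top-level subdirectories:

```bash
# Count test files and confirm the expected subdirectories exist
find test -name '*.test' | wc -l
ls test/evidence test/index
```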
Expected output should show hundreds of .test files organized into the evidence/ and index/ subdirectories.
Sources: README.md:21
Validating Test File Format
Test files should contain SQL Logic Test directives. Example validation:
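One possible spot-check, using a file name drawn from the directory structure above, is to look for statement and query directives:

```bash
# Print the first few sqllogictest directives from a sample test file
grep -E -m 5 '^(statement|query)' test/evidence/slt_lang_update.test
```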
For detailed information about test file format, see Test File Format Specification.
Sources: Referenced from overall architecture understanding
Troubleshooting
Issue: Docker Build Fails During Fossil Clone
Symptom: Build fails at the fossil clone step with network errors.
Cause: Network connectivity issues or upstream repository unavailable.
Solution:
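Possible remediation steps are to confirm the upstream repository is reachable from your network, then retry the build without cached layers:

```bash
# Verify connectivity to the upstream Fossil repository
curl -I https://www.sqlite.org/sqllogictest/

# Retry the build, bypassing any partially cached layers
docker build --no-cache -t slt-gen .
```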
Sources: Dockerfile:15
Issue: Volume Mount Permissions Error
Symptom: docker run fails with permission denied errors when writing to mounted volume.
Cause: Docker container runs as root but host directory has restricted permissions.
Solution:
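On Linux hosts, pre-creating the output directory with writable permissions (and reclaiming ownership of any root-owned files from earlier runs) usually resolves this:

```bash
# Ensure the mount target exists and is writable
mkdir -p test
chmod u+rwx test

# If a previous run left root-owned files behind, take ownership back
sudo chown -R "$(id -u):$(id -g)" test
```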
Sources: README.md:18
Issue: Empty test/ Directory After Extraction
Symptom: Container executes successfully but test/ directory remains empty.
Cause: Incorrect volume mount path or destination parameter mismatch.
Solution:
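Check that the host side of the mount is an absolute path and that the container side matches the script's default destination, /work/test:

```bash
# Use an absolute host path and the default container destination
docker run --rm -v "$(pwd)/test:/work/test" slt-gen

# Optionally confirm the mount is visible inside the container
docker run --rm -v "$(pwd)/test:/work/test" --entrypoint ls slt-gen /work/test
```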
Sources: Dockerfile:25 README.md:18
Issue: Fossil Clone Creates Outdated Corpus
Symptom: Extracted tests don't match latest upstream changes.
Cause: Docker layer caching prevents fresh Fossil clone.
Solution:
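Rebuilding with the cache disabled forces a fresh Fossil clone, after which the extraction can be re-run:

```bash
# Invalidate cached layers so the fossil clone runs again
docker build --no-cache -t slt-gen .
docker run --rm -v "$(pwd)/test:/work/test" slt-gen
```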
Sources: Dockerfile:15-17
Issue: Insufficient Disk Space
Symptom: Extraction fails with "no space left on device" error.
Cause: The Fossil repository and extracted tests require significant disk space (typically 1-2 GB).
Solution:
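Check free space on the Docker host and reclaim unused images, containers, and build cache:

```bash
# Inspect free disk space and Docker's own usage
df -h
docker system df

# Remove unused containers, images, networks, and build cache
docker system prune
```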
Sources: General Docker troubleshooting knowledge
Build and Extraction Internals
Docker Layer Architecture
The Dockerfile creates multiple layers during the build process:
Docker Build: Layer Caching Strategy
Layers 1-2 are typically cached and reused between builds. Layer 3 (the Fossil clone) is an ordinary RUN instruction, so Docker reuses its cached result even after the upstream repository changes; pass --no-cache to force a fresh clone. Layers 4-7 depend on Layer 3 and rebuild whenever it does.
Sources: Dockerfile:1-36
File System Paths Summary
| Path | Location | Purpose |
|---|---|---|
| /src | Container | Fossil repository working directory |
| /src/sqllogictest.fossil | Container | Fossil repository database file |
| /src/test/ | Container | Extracted Fossil working tree |
| /work | Container | Container working directory |
| /work/test/ | Container | Default extraction destination |
| /usr/local/bin/slt-extract | Container | Extraction script executable |
| $PWD/test/ | Host | Mounted volume for output |
Sources: Dockerfile:14 Dockerfile:19 Dockerfile:24-25 Dockerfile:20 README.md:18
Next Steps
After successfully extracting the corpus:
1. Explore Test Files: Navigate through `test/evidence/` and `test/index/` to understand test organization. See Test Organization Structure.
2. Understand Test Format: Review the SQL Logic Test file format and directives. See Test File Format Specification.
3. Run Tests: Integrate tests into your SQL engine testing workflow. See Integrating with CI/CD Systems.
4. Handle Cross-Database Compatibility: Learn about platform-specific test execution. See Cross-Database Compatibility.
Sources: Overall documentation structure