This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Building and Running the Extractor

Purpose and Scope

This page provides step-by-step instructions for building the slt-gen Docker image and running the extraction process to obtain a local copy of the SQLite sqllogictest corpus. It covers the practical mechanics of using the Docker-based extraction system locally, including prerequisites, build commands, and execution steps.

For information about how the automated update workflow uses this extraction system in CI/CD, see Automated Update Workflow. For information about integrating the corpus into your own CI/CD pipelines after extraction, see Integrating with CI/CD Systems.


Prerequisites

The extraction system requires the following tools to be installed on your local machine:

| Tool | Purpose | Minimum Version |
|---|---|---|
| Docker | Container runtime for building and executing the slt-gen image | 20.10+ |
| Bash | Shell for executing extraction commands | 4.0+ |
| Git | (Optional) Version control if cloning this repository | 2.0+ |

The Docker image itself handles all other dependencies internally, including the Fossil SCM client and the SQLite repository cloning process.

Sources: README.md:1-27 Dockerfile:1-36


Building the Docker Image

Build Command

The Docker image is built with the standard docker build command and tagged slt-gen.

This command must be executed from the repository root directory where the Dockerfile is located.
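A minimal sketch of the build step, assuming the conventional `docker build -t <tag> <context>` form described above. The `build_image` wrapper is an illustrative name and not part of the repository; it only exists so the command can be sourced without running immediately.

```shell
# Build the image from the current directory, which is assumed to be the
# repository root containing the Dockerfile. build_image is a hypothetical
# helper name.
build_image() {
  docker build -t slt-gen .
}

# Usage (from the repository root):
#   build_image
```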

Sources: README.md:9-11

Build Process Overview

Build Process: Docker Image Construction Pipeline

Sources: Dockerfile:1-36

Build Steps Breakdown

The build process executes the following stages as defined in the Dockerfile:

  1. Base Image Selection Dockerfile:1-3

    • Uses debian:stable-slim as the foundation
    • Sets DEBIAN_FRONTEND=noninteractive to prevent prompts during package installation
  2. System Dependencies Installation Dockerfile:5-12

    • Installs: bash, build-essential, ca-certificates, curl, fossil, tcl
    • Cleans up apt cache to minimize image size
  3. Fossil Repository Cloning Dockerfile:14-17

    • Sets working directory to /src
    • Executes fossil clone https://www.sqlite.org/sqllogictest/ to download the repository
    • Opens the fossil file with fossil open to extract the working tree
    • Configures default user as root
  4. Extraction Script Creation Dockerfile:19-33

    • Switches working directory to /work
    • Creates /usr/local/bin/slt-extract bash script using heredoc syntax
    • Makes the script executable with chmod +x
  5. Entrypoint Configuration Dockerfile:35

    • Sets ENTRYPOINT to execute slt-extract when the container runs

Sources: Dockerfile:1-36


Running the Extraction

Extraction Command

To extract the test corpus to a local test/ directory, run the slt-gen image with docker run, passing the --rm flag and a -v volume mount.

The --rm flag automatically removes the container after execution. The -v flag mounts the local test/ directory into the container at /work/test.
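The invocation described above can be sketched as follows. The `run_extraction` wrapper is an illustrative name, and creating the host directory up front is a precaution added here: when the source of a `-v` bind mount is missing, Docker creates it itself, usually owned by root.

```shell
# Run the extractor and write the corpus to ./test on the host.
# run_extraction is a hypothetical helper name.
run_extraction() {
  # Pre-create the output directory so it is owned by the current user.
  mkdir -p "$PWD/test"
  docker run --rm -v "$PWD/test:/work/test" slt-gen
}

# Usage:
#   run_extraction
```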

Sources: README.md:15-19

Volume Mount Mapping

Volume Mapping: Host and Container Filesystem Relationship

Sources: README.md:18 Dockerfile:25-28

Extraction Process Flow

Execution Flow: Container Lifecycle and File Extraction

Sources: Dockerfile:20-31 README.md:18


Understanding the slt-extract Script

The extraction logic is implemented in the slt-extract bash script, which is embedded directly in the Dockerfile at Dockerfile:20-31.

Script Components

| Component | Code Reference | Description |
|---|---|---|
| Shebang | #!/usr/bin/env bash | Specifies bash as the interpreter |
| Error Handling | set -euo pipefail | Exit on error, undefined variables, and pipe failures |
| Source Path | src_root="/src/test" | Location of test corpus in Fossil working tree |
| Destination Path | dest_root="${1:-/work/test}" | Target directory with default value |
| Directory Creation | mkdir -p "$dest_root" | Ensures destination directory exists |
| File Copy | cp -R "$src_root/." "$dest_root/" | Recursively copies all test files |
| Confirmation Output | echo "copied corpus to $dest_root" | Reports successful extraction |

Sources: Dockerfile:20-31
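Assembled from the components in the table, the script would look roughly like the sketch below. It is written to a scratch file with a heredoc so it can be tried outside the container. The scratch path and the `SLT_SRC_ROOT` override are assumptions added for local experimentation; according to the table, the original hard-codes `/src/test`.

```shell
# Scratch location for the sketch; inside the image the script lives at
# /usr/local/bin/slt-extract.
script="${TMPDIR:-/tmp}/slt-extract-sketch.sh"

cat > "$script" <<'EOF'
#!/usr/bin/env bash
set -euo pipefail

# SLT_SRC_ROOT is a hypothetical override for running outside the
# container; the original uses src_root="/src/test".
src_root="${SLT_SRC_ROOT:-/src/test}"
# First argument overrides the default destination.
dest_root="${1:-/work/test}"

mkdir -p "$dest_root"
# The trailing "/." copies the directory contents, not the directory itself.
cp -R "$src_root/." "$dest_root/"
echo "copied corpus to $dest_root"
EOF
chmod +x "$script"
```

Running it locally as `SLT_SRC_ROOT=/some/corpus bash "$script" /tmp/out` copies the contents of the source directory into `/tmp/out` and prints the confirmation line.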

Script Invocation

The script is invoked as the container's entrypoint and accepts an optional argument to override the default destination path:
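A sketch of passing that argument, assuming the ENTRYPOINT is declared in exec form so that anything after the image name is forwarded to slt-extract. The container path `/data`, the host directory `corpus`, and the function name are arbitrary examples.

```shell
# Extract into /data inside the container instead of the default /work/test.
# The mount target and the script argument must name the same path.
extract_to_custom_path() {
  mkdir -p "$PWD/corpus"
  docker run --rm -v "$PWD/corpus:/data" slt-gen /data
}
```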

Sources: Dockerfile:25 Dockerfile:35


Extraction Output Structure

After successful extraction, the test/ directory contains the complete corpus organized into subdirectories:

Extracted Corpus: Directory Structure and Test Categories

• test/
    • evidence/ (SQL language specification tests)
        • DDL tests: slt_lang_create*, slt_lang_drop*
        • DML tests: slt_lang_update, slt_lang_replace
        • DQL tests: in1.test, in2.test, slt_lang_aggfunc
    • index/ (query optimization tests)
        • between/ (BETWEEN operator suites)
        • Other index tests

For detailed information about test organization and taxonomy, see Test Organization Structure.

Sources: README.md:21


Common Usage Patterns

Pattern 1: Fresh Extraction

This is the standard usage pattern for obtaining a clean copy of the corpus:
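As a rough sketch, the pattern chains the build and the run so that the extraction only starts if the build succeeds. The function name is illustrative.

```shell
# Build the image, then extract the corpus into ./test.
# Assumes the current directory is the repository root.
fresh_extraction() {
  docker build -t slt-gen . &&
    mkdir -p "$PWD/test" &&
    docker run --rm -v "$PWD/test:/work/test" slt-gen
}
```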

Sources: README.md:9-19

Pattern 2: Incremental Update

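If the Docker image has already been built, you can re-run the extraction without rebuilding. This copies the corpus snapshot that was cloned when the image was built; to pick up upstream changes, rebuild the image first.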
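One way to sketch this is to check that the image exists before running it. The check uses `docker image inspect`, and the function name is an assumption for illustration.

```shell
# Re-run the extraction from an existing slt-gen image.
reextract() {
  # Fail early with a hint if the image has not been built yet.
  if ! docker image inspect slt-gen >/dev/null 2>&1; then
    echo "image slt-gen not found; build it first" >&2
    return 1
  fi
  mkdir -p "$PWD/test"
  docker run --rm -v "$PWD/test:/work/test" slt-gen
}
```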

Sources: README.md:15-19

Pattern 3: Custom Destination Directory

To extract to a host directory other than test/, change the host side of the volume mount.

By default the script writes to /work/test inside the container, and the volume mount determines where the files appear on the host.
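A small sketch of this pattern, with a hypothetical helper that takes the host directory as its argument. It resolves the directory to an absolute path because `-v` treats a bare relative name as a named volume rather than a bind mount.

```shell
# Extract the corpus into an arbitrary host directory.
# extract_to_host_dir is a hypothetical helper name.
extract_to_host_dir() {
  host_dir="${1:?usage: extract_to_host_dir <host-directory>}"
  mkdir -p "$host_dir"
  # Resolve to an absolute path for the bind mount.
  abs_dir="$(cd "$host_dir" && pwd)"
  docker run --rm -v "$abs_dir:/work/test" slt-gen
}

# Usage:
#   extract_to_host_dir ./my-corpus
```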

Sources: Dockerfile:25 README.md:18

Pattern 4: One-Off Extraction Without Cleanup

If you want to inspect the container without auto-removal:
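A sketch that simply omits `--rm` and gives the container a name so it is easy to find afterwards. The name `slt-extract-debug` and the function name are arbitrary choices for this example.

```shell
# Run the extraction but keep the stopped container for inspection.
extract_keep_container() {
  docker run --name slt-extract-debug -v "$PWD/test:/work/test" slt-gen
}

# Inspect the output afterwards, then remove the container manually:
#   docker logs slt-extract-debug
#   docker rm slt-extract-debug
```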

Note: This pattern is not recommended for regular use as it leaves stopped containers accumulating.

Sources: README.md:18


Verification and Validation

Verifying Successful Extraction

After extraction, verify the corpus is present:
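A possible check, sketched with `find` so that it works on nested directories. Both helper names are illustrative, and the default directory `test` matches the mount used on this page.

```shell
# Count .test files under a corpus directory (default: ./test).
count_test_files() {
  find "${1:-test}" -type f -name '*.test' | wc -l | tr -d ' '
}

# List the top-level subdirectories of the corpus.
list_corpus_dirs() {
  find "${1:-test}" -mindepth 1 -maxdepth 1 -type d | sort
}

# Usage:
#   count_test_files test
#   list_corpus_dirs test
```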

Expected output should show hundreds of .test files organized into the evidence/ and index/ subdirectories.

Sources: README.md:21

Validating Test File Format

Test files should contain SQL Logic Test directives. Example validation:
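A lightweight heuristic, not a full parser: the sketch below only checks that a file contains at least one record header of the form `statement ok`, `statement error`, or `query <types>`, where the type string is built from the sqllogictest column codes T, I, and R. The function name is an assumption.

```shell
# Succeed if the file contains at least one sqllogictest record header.
looks_like_slt() {
  grep -Eq '^(statement (ok|error)|query [TIR]+)' "$1"
}

# Usage: report extracted files that do not match the heuristic.
#   find test -type f -name '*.test' | while IFS= read -r f; do
#     looks_like_slt "$f" || echo "no directives found: $f"
#   done
```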

For detailed information about test file format, see Test File Format Specification.



Troubleshooting

Issue: Docker Build Fails During Fossil Clone

Symptom: Build fails at the fossil clone step with network errors.

Cause: Network connectivity issues or upstream repository unavailable.

Solution:
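One possible approach, sketched under the assumption that the failure is transient: retry the build a few times with a pause in between. The function name, the default of three attempts, and the ten-second delay are arbitrary choices.

```shell
# Retry the image build; arguments are the attempt count and the delay
# in seconds between attempts.
retry_build() {
  attempts="${1:-3}"
  delay="${2:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    docker build -t slt-gen . && return 0
    echo "build attempt $i of $attempts failed" >&2
    i=$((i + 1))
    # Pause only if another attempt follows.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

If every attempt fails, checking whether https://www.sqlite.org/sqllogictest/ is reachable from the build host is a reasonable next step.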

Sources: Dockerfile:15

Issue: Volume Mount Permissions Error

Symptom: docker run fails with permission denied errors when writing to mounted volume.

Cause: Docker container runs as root but host directory has restricted permissions.

Solution:
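A cautious sketch that may help: create the output directory yourself before running the container and make sure the current user can write to it. The helper name is illustrative, and whether this is sufficient depends on the host setup (rootless Docker and SELinux behave differently).

```shell
# Create the host output directory and ensure the owner has full access.
prepare_output_dir() {
  dir="${1:-test}"
  mkdir -p "$dir" && chmod u+rwx "$dir"
}

# Usage:
#   prepare_output_dir test
#   docker run --rm -v "$PWD/test:/work/test" slt-gen
# On SELinux hosts, adding the :z option to the mount
# ("$PWD/test:/work/test:z") is a commonly used alternative.
```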

Sources: README.md:18

Issue: Empty test/ Directory After Extraction

Symptom: Container executes successfully but test/ directory remains empty.

Cause: Incorrect volume mount path or destination parameter mismatch.

Solution:
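A diagnostic sketch: first confirm that the corpus exists inside the image by overriding the entrypoint with `ls`, then re-run the extraction with a mount whose container side matches the script's destination. The function name is an assumption.

```shell
# List the corpus as stored in the image, bypassing slt-extract.
list_image_corpus() {
  docker run --rm --entrypoint ls slt-gen /src/test
}

# If the listing is populated, re-run with an absolute host path and the
# default container path so the mount and the destination agree:
#   docker run --rm -v "$PWD/test:/work/test" slt-gen
```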

Sources: Dockerfile:25 README.md:18

Issue: Fossil Clone Creates Outdated Corpus

Symptom: Extracted tests don't match latest upstream changes.

Cause: Docker layer caching prevents fresh Fossil clone.

Solution:
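A minimal sketch using the `--no-cache` build flag, which forces every layer, including the Fossil clone, to be rebuilt. The function name is illustrative.

```shell
# Rebuild the image from scratch so the Fossil clone runs again.
rebuild_without_cache() {
  docker build --no-cache -t slt-gen .
}

# Usage: rebuild, then extract again.
#   rebuild_without_cache && docker run --rm -v "$PWD/test:/work/test" slt-gen
```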

Sources: Dockerfile:15-17

Issue: Insufficient Disk Space

Symptom: Extraction fails with "no space left on device" error.

Cause: The Fossil repository and extracted tests require significant disk space (typically 1-2 GB).

Solution:
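A sketch for finding out where the space went before deleting anything. It reports free space for the current filesystem and Docker's own usage summary; the function name is an assumption.

```shell
# Show free disk space and Docker's storage usage.
show_disk_usage() {
  # Free space on the filesystem holding the current directory.
  df -h .
  # Space used by images, containers, volumes, and build cache.
  docker system df
}

# To reclaim space from unused Docker data (asks for confirmation):
#   docker system prune
```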



Build and Extraction Internals

Docker Layer Architecture

The Dockerfile creates multiple layers during the build process:

Docker Build: Layer Caching Strategy

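Layers 1-2 are typically cached and reused between builds. Layer 3 (Fossil clone) is cached the same way: Docker reuses it as long as the instruction and the preceding layers are unchanged, even when the upstream repository has new commits, so a fresh clone requires building with --no-cache. Layers 4-7 depend on Layer 3 and will rebuild if Layer 3 changes.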

Sources: Dockerfile:1-36

File System Paths Summary

| Path | Location | Purpose |
|---|---|---|
| /src | Container | Fossil repository working directory |
| /src/sqllogictest.fossil | Container | Fossil repository database file |
| /src/test/ | Container | Extracted Fossil working tree |
| /work | Container | Container working directory |
| /work/test/ | Container | Default extraction destination |
| /usr/local/bin/slt-extract | Container | Extraction script executable |
| $PWD/test/ | Host | Mounted volume for output |

Sources: Dockerfile:14 Dockerfile:19 Dockerfile:24-25 Dockerfile:20 README.md:18


Next Steps

After successfully extracting the corpus:

  1. Explore Test Files: Navigate through test/evidence/ and test/index/ to understand test organization. See Test Organization Structure.

  2. Understand Test Format: Review the SQL Logic Test file format and directives. See Test File Format Specification.

  3. Run Tests: Integrate tests into your SQL engine testing workflow. See Integrating with CI/CD Systems.

  4. Handle Cross-Database Compatibility: Learn about platform-specific test execution. See Cross-Database Compatibility.
