This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

System Architecture

Purpose and Scope

This document describes the overall architecture of the sqlite-sqllogictest-corpus system, which provides automated synchronization of SQLite's official sqllogictest corpus from a Fossil repository to a Git repository. The architecture consists of three primary layers: source repository access, Docker-based extraction, and GitHub Actions automation.

Architectural Overview

The system implements a scheduled pull-based synchronization model that extracts test files from SQLite's Fossil repository and commits them to a Git repository. The architecture eliminates manual intervention through containerization and scheduled automation.

graph TB
    subgraph Source["Fossil Source Repository"]
        FOSSIL["sqlite.org/sqllogictest/\nFossil SCM"]
    end

    subgraph Automation["GitHub Actions Layer"]
        WORKFLOW["update-corpus.yml\nWorkflow Definition"]
        SCHEDULE["Cron: 0 6 * * 1\nMonday 06:00 UTC"]
        DISPATCH["workflow_dispatch\nManual Trigger"]
    end

    subgraph Container["Docker Build & Extract Layer"]
        DOCKERFILE["Dockerfile\nImage Definition"]
        IMAGE["slt-gen\nDocker Image"]
        EXTRACT["slt-extract\nBash Script"]
    end

    subgraph Storage["Filesystem Storage"]
        TEST_DIR["test/\nMounted Volume"]
        EVIDENCE["test/evidence/\nSQL Tests"]
        INDEX["test/index/\nOptimization Tests"]
    end

    subgraph VCS["Version Control"]
        REPO["Git Repository\nsqlite-sqllogictest-corpus"]
    end

    FOSSIL -->|fossil clone| IMAGE
    SCHEDULE -->|triggers| WORKFLOW
    DISPATCH -->|triggers| WORKFLOW
    WORKFLOW -->|docker build -t slt-gen .| DOCKERFILE
    DOCKERFILE -->|creates| IMAGE
    IMAGE -->|contains| EXTRACT
    WORKFLOW -->|docker run --rm -v| IMAGE
    IMAGE -->|executes| EXTRACT
    EXTRACT -->|cp -R /src/test/.| TEST_DIR
    TEST_DIR -->|contains| EVIDENCE
    TEST_DIR -->|contains| INDEX
    WORKFLOW -->|git add/commit/push| REPO
    TEST_DIR -.->|persisted to| REPO

System Components

Sources: .github/workflows/update-corpus.yml:1-44 Dockerfile:1-36 README.md:1-27

Component Architecture

Fossil Repository Access Layer

The system accesses SQLite's canonical sqllogictest repository, which is hosted on Fossil SCM. The repository at https://www.sqlite.org/sqllogictest/ holds the upstream test corpus and serves as the single source of truth.

graph LR
    APT["apt-get install fossil"]
    CLONE["fossil clone\nhttps://www.sqlite.org/sqllogictest/"]
    FOSSIL_FILE["/src/sqllogictest.fossil\nLocal Clone"]
    OPEN["fossil open\n--user root"]
    CHECKOUT["/src/test/\nWorking Directory"]

    APT -->|installs| CLONE
    CLONE -->|creates| FOSSIL_FILE
    FOSSIL_FILE -->|input to| OPEN
    OPEN -->|populates| CHECKOUT

The Docker image clones this repository at build time using the fossil client installed via Debian's package manager.
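
A minimal sketch of that build-time step, written as a shell function so it can be read in isolation. The function name clone_corpus and its optional directory parameter are hypothetical. The URL, the /src/sqllogictest.fossil path, and the --user root option come from the description on this page, but the exact command line and options used in the Dockerfile may differ.

```shell
#!/bin/sh
# Hypothetical helper: clone the upstream Fossil repository and open a checkout.
# Assumes the fossil client is already installed (for example via apt-get).
clone_corpus() {
    src_dir="${1:-/src}"                      # assumed default, matching /src in the text
    repo_file="$src_dir/sqllogictest.fossil"

    mkdir -p "$src_dir" || return 1
    # Clone the remote repository into a single local .fossil file.
    fossil clone --user root https://www.sqlite.org/sqllogictest/ "$repo_file" || return 1
    # Open the clone inside src_dir; the checkout (including test/) lands there.
    (cd "$src_dir" && fossil open --user root "$repo_file")
}
```

Calling clone_corpus with no argument would target /src, which is where the extraction script expects to find the test/ directory.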

Sources: Dockerfile:5-17

Docker Container Architecture

The Dockerfile defines a Debian-based image that encapsulates the entire extraction process. The image is tagged as slt-gen during the build phase.

| Component | Purpose | Implementation |
|---|---|---|
| Base Image | Minimal footprint for build tools | debian:stable-slim |
| Fossil Client | Version control access | Installed via apt-get |
| Working Directory | Volume mount point for output | /work |
| Source Directory | Contains cloned Fossil repository | /src |
| Extraction Script | Bash script that copies test files | /usr/local/bin/slt-extract |

graph TD
    ENTRY["ENTRYPOINT slt-extract"]
    ARG["dest_root=${1:-/work/test}"]
    SRC["src_root=/src/test"]
    MKDIR["mkdir -p $dest_root"]
    COPY["cp -R $src_root/. $dest_root/"]
    ECHO["echo copied corpus..."]

    ENTRY -->|executes with argument| ARG
    ARG --> SRC
    SRC --> MKDIR
    MKDIR --> COPY
    COPY --> ECHO

The slt-extract script serves as the container's entrypoint and accepts a single optional argument, the destination directory, which defaults to /work/test.
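
The logic in the diagram above can be sketched as a small shell function. This is an illustration under stated assumptions, not the script shipped in the image: the function name slt_extract and the SLT_SRC_ROOT override are hypothetical (the page describes a fixed source of /src/test), and the real script may add error handling that is not shown here.

```shell
#!/bin/sh
# Sketch of the extraction logic as a function; in the image it would run as
# the body of /usr/local/bin/slt-extract.
slt_extract() {
    # Destination defaults to /work/test, the mounted output directory.
    dest_root="${1:-/work/test}"
    # SLT_SRC_ROOT is a hypothetical override added so the sketch can be tried
    # outside the container; the text describes a fixed /src/test.
    src_root="${SLT_SRC_ROOT:-/src/test}"

    mkdir -p "$dest_root" || return 1
    # The trailing "/." copies the directory's contents (dotfiles included)
    # instead of creating a nested test/ directory inside the destination.
    cp -R "$src_root/." "$dest_root/" || return 1
    echo "copied corpus to $dest_root"
}
```

Outside the container, setting SLT_SRC_ROOT to any directory of sample files and calling slt_extract ./out exercises the same copy semantics.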

Sources: Dockerfile:1-36 Dockerfile:20-35

GitHub Actions Workflow Layer

The update-corpus.yml workflow orchestrates the entire update cycle through four discrete steps executed on an ubuntu-latest runner.

graph TD
    CRON["schedule:\ncron: 0 6 * * 1"]
    MANUAL["workflow_dispatch:\nManual Execution"]
    PR["pull_request:\npaths filter"]
    WORKFLOW["update job\nruns-on: ubuntu-latest"]

    CRON -->|Monday 06:00 UTC| WORKFLOW
    MANUAL -->|on-demand| WORKFLOW
    PR -->|when specific files change| WORKFLOW

Workflow Triggers

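The workflow runs on three triggers: a weekly schedule (cron 0 6 * * 1, Mondays at 06:00 UTC), a manual workflow_dispatch, and pull requests. The pull request trigger uses a paths filter and fires only when one of three files changes: .github/workflows/update-corpus.yml, Dockerfile, or README.md.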

Sources: .github/workflows/update-corpus.yml:3-12

Execution Steps

The workflow executes four sequential steps:

  1. Repository Checkout - Uses actions/checkout@v4 to clone the Git repository
  2. Image Build - Executes docker build -t slt-gen . to create the extraction container
  3. Corpus Refresh - Removes existing test/ directory, recreates it, and runs the container with volume mount
  4. Change Commit - Conditionally commits and pushes changes if files were modified and the event is not a pull request
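
Steps 2 and 3 can be reproduced by hand from the repository root. The sketch below wraps them in a hypothetical function, refresh_corpus; the docker commands themselves are the ones named on this page. It assumes Docker is installed and that the current directory contains the Dockerfile.

```shell
#!/bin/sh
# Sketch of the image build and corpus refresh steps; refresh_corpus is a
# hypothetical name, not something defined by the workflow.
refresh_corpus() {
    # Step 2: build the extraction image from the Dockerfile in this directory.
    docker build -t slt-gen . || return 1

    # Step 3: start from an empty test/ so stale files do not linger.
    rm -rf test
    mkdir test || return 1
    # Mount test/ at /work/test; --rm discards the container afterwards.
    docker run --rm -v "$PWD/test:/work/test" slt-gen
}
```

After refresh_corpus returns, git status --porcelain shows whether the upstream corpus has changed since the last commit.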

Sources: .github/workflows/update-corpus.yml:18-44

sequenceDiagram
    participant Runner as "ubuntu-latest runner"
    participant Git as "Git Repository"
    participant Docker as "Docker Engine"
    participant Volume as "test/ directory"
    
    Runner->>Git: actions/checkout@v4
    Git-->>Runner: Repository checked out
    
    Runner->>Docker: docker build -t slt-gen .
    Docker-->>Runner: Image slt-gen created
    
    Runner->>Volume: rm -rf test
    Runner->>Volume: mkdir test
    Runner->>Docker: docker run --rm -v $PWD/test:/work/test slt-gen
    Docker->>Volume: Extract files to mounted volume
    Docker-->>Runner: Container exit
    
    Runner->>Git: git status --porcelain
    
    alt changes detected and not PR
        Runner->>Git: git add test
        Runner->>Git: git commit -m "Update sqllogictest corpus"
        Runner->>Git: git push
    else no changes or is PR
        Runner->>Runner: echo "No updates; skipping commit"
    end

Conditional Commit Logic

The commit step checks the output of git status --porcelain and commits only when the refreshed test/ directory differs from the checked-in copy, which avoids empty commits.

The step runs only when github.event_name != 'pull_request', preventing commits during PR validation runs.
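
The change-detection branch can be sketched in shell as follows. The function name commit_if_changed is hypothetical, and setting the bot identity through git config is an assumption about how the workflow applies the author details listed under System Permissions and Security. The pull request check is not part of the shell logic; GitHub Actions evaluates it before the step starts.

```shell
#!/bin/sh
# Sketch of the commit step; commit_if_changed is a hypothetical name.
commit_if_changed() {
    # --porcelain prints one line per changed path and nothing for a clean tree.
    if [ -n "$(git status --porcelain)" ]; then
        # Assumed: the author identity is configured just before committing.
        git config user.name "github-actions[bot]"
        git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
        git add test
        git commit -m "Update sqllogictest corpus"
        git push
    else
        echo "No updates; skipping commit"
    fi
}
```

Because the check relies on empty versus non-empty output, it treats new, modified, and deleted files the same way: any of them triggers a commit.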

Sources: .github/workflows/update-corpus.yml:34-44

Data Flow Architecture

The complete data flow follows a unidirectional path from the Fossil repository through Docker containerization to the Git repository.

Sources: Dockerfile:14-28 .github/workflows/update-corpus.yml:24-41

flowchart LR
    subgraph Upstream
        FOSSIL_SRC["sqlite.org/sqllogictest\nFossil Repository"]
    end

    subgraph "Build Phase"
        DOCKER_BUILD["docker build -t slt-gen ."]
        FOSSIL_CLONE["fossil clone + fossil open"]
        SRC_TEST["/src/test/\nFiles in Image"]
    end

    subgraph "Extract Phase"
        DOCKER_RUN["docker run --rm -v"]
        SLT_EXTRACT["slt-extract script"]
        CP_COMMAND["cp -R /src/test/. /work/test/"]
    end

    subgraph "Persist Phase"
        MOUNTED_VOL["$PWD/test/\nHost Filesystem"]
        GIT_ADD["git add test"]
        GIT_COMMIT["git commit"]
        GIT_PUSH["git push"]
    end

    subgraph Downstream
        GIT_REMOTE["GitHub Repository\nsqlite-sqllogictest-corpus"]
    end

    FOSSIL_SRC -->|cloned during build| FOSSIL_CLONE
    DOCKER_BUILD --> FOSSIL_CLONE
    FOSSIL_CLONE --> SRC_TEST
    SRC_TEST -.->|embedded in image| DOCKER_RUN
    DOCKER_RUN --> SLT_EXTRACT
    SLT_EXTRACT --> CP_COMMAND
    CP_COMMAND -->|volume mount| MOUNTED_VOL
    MOUNTED_VOL --> GIT_ADD
    GIT_ADD --> GIT_COMMIT
    GIT_COMMIT --> GIT_PUSH
    GIT_PUSH --> GIT_REMOTE

Volume Mount Strategy

The system uses a Docker volume mount (a bind mount created with the -v flag) to bridge container execution with host filesystem persistence. The mount configuration "$PWD/test:/work/test" maps the test/ subdirectory of the host's current working directory to /work/test inside the container.

This approach ensures:

  • Test files persist after the container exits (the --rm flag removes the container but leaves the data in the mounted directory)
  • The workflow can detect changes using git status --porcelain
  • No separate copy from container to host is required after extraction

Sources: .github/workflows/update-corpus.yml:31 Dockerfile:24-28

Build-Time vs. Runtime Execution

The architecture separates concerns between build-time and runtime operations:

Build-Time Operations

Executed during docker build -t slt-gen .:

  • Installation of system dependencies (fossil, bash, build-essential)
  • Cloning of Fossil repository to /src/sqllogictest.fossil
  • Opening Fossil repository to /src working directory
  • Creation of /usr/local/bin/slt-extract script
  • Setting script executable permissions

Runtime Operations

Executed during docker run --rm -v "$PWD/test:/work/test" slt-gen:

  • Execution of slt-extract entrypoint
  • Directory creation: mkdir -p /work/test
  • Recursive copy: cp -R /src/test/. /work/test/
  • Status output: echo "copied corpus to /work/test"

This separation ensures the expensive Fossil clone operation occurs once during image build, while extraction can execute repeatedly with minimal overhead.
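
One way to see the split in practice is to build the image once and then extract into several host directories. The sketch below is illustrative only: extract_to and the /work/out container path are hypothetical, and it assumes the ENTRYPOINT is declared in exec form so that arguments after the image name reach slt-extract as its destination argument.

```shell
#!/bin/sh
# Sketch: reuse one built image for repeated extractions.
# extract_to is a hypothetical helper, not part of the repository.
extract_to() {
    host_dir="$1"
    mkdir -p "$host_dir" || return 1
    # docker -v needs an absolute host path, so resolve it first.
    abs_dir="$(cd "$host_dir" && pwd)" || return 1
    # Assumed: the trailing /work/out is passed to slt-extract as dest_root.
    docker run --rm -v "$abs_dir:/work/out" slt-gen /work/out
}

# Example session (commented out so the sketch has no side effects):
#   docker build -t slt-gen .    # slow: clones the Fossil repository
#   extract_to ./corpus-a        # fast: copies files out of the image
#   extract_to ./corpus-b
```

Note that the corpus inside the image is frozen at build time, so picking up upstream changes requires rebuilding the image rather than re-running the container.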

Sources: Dockerfile:5-35 .github/workflows/update-corpus.yml:28-31

System Permissions and Security

The workflow requires write access to the repository through the contents: write permission, enabling automated commits. The commit author is configured as github-actions[bot] with email 41898282+github-actions[bot]@users.noreply.github.com.

The Fossil repository is configured with default user root to avoid interactive prompts during clone and open operations.

Sources: .github/workflows/update-corpus.yml:14-15 .github/workflows/update-corpus.yml:37-38 Dockerfile:15-17