This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
System Architecture
Purpose and Scope
This document describes the overall architecture of the sqlite-sqllogictest-corpus system, which provides automated synchronization of SQLite's official sqllogictest corpus from a Fossil repository to a Git repository. The architecture consists of three primary layers: source repository access, Docker-based extraction, and GitHub Actions automation.
For detailed information about specific subsystems:
- Extraction implementation details: see Corpus Extraction Pipeline
- Workflow scheduling and commit logic: see Automated Update Workflow
- Test directory structure: see Test Organization Structure
Architectural Overview
The system implements a scheduled pull-based synchronization model that extracts test files from SQLite's Fossil repository and commits them to a Git repository. The architecture eliminates manual intervention through containerization and scheduled automation.
graph TB
subgraph Source["Fossil Source Repository"]
FOSSIL["sqlite.org/sqllogictest/\nFossil SCM"]
end
subgraph Automation["GitHub Actions Layer"]
WORKFLOW["update-corpus.yml\nWorkflow Definition"]
SCHEDULE["Cron: 0 6 * * 1\nMonday 06:00 UTC"]
DISPATCH["workflow_dispatch\nManual Trigger"]
end
subgraph Container["Docker Build & Extract Layer"]
DOCKERFILE["Dockerfile\nImage Definition"]
IMAGE["slt-gen\nDocker Image"]
EXTRACT["slt-extract\nBash Script"]
end
subgraph Storage["Filesystem Storage"]
TEST_DIR["test/\nMounted Volume"]
EVIDENCE["test/evidence/\nSQL Tests"]
INDEX["test/index/\nOptimization Tests"]
end
subgraph VCS["Version Control"]
REPO["Git Repository\nsqlite-sqllogictest-corpus"]
end
FOSSIL -->|fossil clone| IMAGE
SCHEDULE -->|triggers| WORKFLOW
DISPATCH -->|triggers| WORKFLOW
WORKFLOW -->|docker build -t slt-gen .| DOCKERFILE
DOCKERFILE -->|creates| IMAGE
IMAGE -->|contains| EXTRACT
WORKFLOW -->|docker run --rm -v| IMAGE
IMAGE -->|executes| EXTRACT
EXTRACT -->|cp -R /src/test/.| TEST_DIR
TEST_DIR -->|contains| EVIDENCE
TEST_DIR -->|contains| INDEX
WORKFLOW -->|git add/commit/push| REPO
TEST_DIR -.->|persisted to| REPO
System Components
Sources: .github/workflows/update-corpus.yml:1-44 Dockerfile:1-36 README.md:1-27
Component Architecture
Fossil Repository Access Layer
The system accesses SQLite's canonical sqllogictest repository, which is hosted on Fossil SCM. The repository at https://www.sqlite.org/sqllogictest/ holds the upstream test corpus and serves as the single source of truth.
graph LR
APT["apt-get install fossil"]
CLONE["fossil clone\nhttps://www.sqlite.org/sqllogictest/"]
FOSSIL_FILE["/src/sqllogictest.fossil\nLocal Clone"]
OPEN["fossil open\n--user root"]
CHECKOUT["/src/test/\nWorking Directory"]
APT -->|installs| CLONE
CLONE -->|creates| FOSSIL_FILE
FOSSIL_FILE -->|input to| OPEN
OPEN -->|populates| CHECKOUT
The Docker image clones this repository at build time using the fossil client installed via Debian's package manager.
Sources: Dockerfile:5-17
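The clone-and-open sequence in the diagram above can be sketched in shell. This is a minimal sketch under stated assumptions, not the Dockerfile's actual contents: the function name `clone_corpus` and its three parameters are hypothetical, and the `fossil` client is assumed to be on the PATH already (for example after `apt-get install -y fossil`).

```shell
# Sketch of the build-time clone. clone_corpus and its arguments are
# hypothetical names used for illustration only.
# Usage: clone_corpus URL REPO_FILE CHECKOUT_DIR
clone_corpus() {
    url=$1
    repo_file=$2
    checkout_dir=$3

    mkdir -p "$checkout_dir" || return 1

    # --user is Fossil's global option for the default user; "root"
    # mirrors the user this document says the image is configured with.
    fossil clone "$url" "$repo_file" --user root || return 1

    # Open the clone inside the checkout directory. The subshell keeps
    # the caller's working directory unchanged.
    (cd "$checkout_dir" && fossil open "$repo_file" --user root)
}
```

With the paths named in this document, the call would look like `clone_corpus https://www.sqlite.org/sqllogictest/ /src/sqllogictest.fossil /src`, which would leave the corpus under /src/test.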
Docker Container Architecture
The Dockerfile defines a Debian-based image that encapsulates the entire extraction process. The image is tagged as slt-gen during the build phase.
| Component | Purpose | Implementation |
|---|---|---|
| Base Image | Minimal footprint for build tools | debian:stable-slim |
| Fossil Client | Version control access | Installed via apt-get |
| Working Directory | Volume mount point for output | /work |
| Source Directory | Contains cloned Fossil repository | /src |
| Extraction Script | Bash script that copies test files | /usr/local/bin/slt-extract |
graph TD
ENTRY["ENTRYPOINT slt-extract"]
ARG["dest_root=${1:-/work/test}"]
SRC["src_root=/src/test"]
MKDIR["mkdir -p $dest_root"]
COPY["cp -R $src_root/. $dest_root/"]
ECHO["echo copied corpus..."]
ENTRY -->|executes with argument| ARG
ARG --> SRC
SRC --> MKDIR
MKDIR --> COPY
COPY --> ECHO
The slt-extract script serves as the container's entrypoint and accepts a single optional argument for the destination directory, which defaults to /work/test.
Sources: Dockerfile:1-36 Dockerfile:20-35
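The flow in the diagram above can be sketched as a shell function. The variable names `dest_root` and `src_root` come from the diagram; the function name `slt_extract` and the `SLT_SRC_ROOT` override are hypothetical additions that make the sketch runnable outside the container, and the exact wording of the status message is an assumption.

```shell
# Sketch of the extraction entrypoint. SLT_SRC_ROOT is a hypothetical
# override for local testing; the image is described as using /src/test.
slt_extract() {
    dest_root=${1:-/work/test}
    src_root=${SLT_SRC_ROOT:-/src/test}

    mkdir -p "$dest_root" || return 1

    # The trailing "/." copies the contents of src_root, dotfiles
    # included, instead of nesting a second directory inside dest_root.
    cp -R "$src_root/." "$dest_root/" || return 1

    echo "copied corpus to $dest_root"
}
```

Copying `$src_root/.` rather than `$src_root` means the destination ends up holding evidence/ and index/ directly, which matches the test/ layout shown in the overview diagram.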
GitHub Actions Workflow Layer
The update-corpus.yml workflow orchestrates the entire update cycle through four discrete steps executed on an ubuntu-latest runner.
graph TD
CRON["schedule:\ncron: 0 6 * * 1"]
MANUAL["workflow_dispatch:\nManual Execution"]
PR["pull_request:\npaths filter"]
WORKFLOW["update job\nruns-on: ubuntu-latest"]
CRON -->|Monday 06:00 UTC| WORKFLOW
MANUAL -->|on-demand| WORKFLOW
PR -->|when specific files change| WORKFLOW
Workflow Triggers
The workflow monitors changes to three specific paths in pull requests: .github/workflows/update-corpus.yml, Dockerfile, and README.md.
Sources: .github/workflows/update-corpus.yml:3-12
Execution Steps
The workflow executes four sequential steps:
- Repository Checkout - Uses `actions/checkout@v4` to clone the Git repository
- Image Build - Executes `docker build -t slt-gen .` to create the extraction container
- Corpus Refresh - Removes the existing `test/` directory, recreates it, and runs the container with a volume mount
- Change Commit - Conditionally commits and pushes changes if files were modified and the event is not a pull request
Sources: .github/workflows/update-corpus.yml:18-44
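The Image Build and Corpus Refresh steps can be sketched as shell commands run from the repository root. The `docker` commands are the ones named in this document; the wrapper function `refresh_corpus` is hypothetical, and the workflow file may arrange these commands differently.

```shell
# Sketch of the build and refresh steps as they might run on the runner.
# refresh_corpus is a hypothetical wrapper for illustration.
refresh_corpus() {
    # Build the extraction image from the Dockerfile in the current directory.
    docker build -t slt-gen . || return 1

    # Start from an empty test/ so that files no longer present in the
    # image do not linger in the working tree.
    rm -rf test
    mkdir test || return 1

    # Mount the fresh directory at the entrypoint's default destination.
    docker run --rm -v "$PWD/test:/work/test" slt-gen
}
```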
sequenceDiagram
participant Runner as "ubuntu-latest runner"
participant Git as "Git Repository"
participant Docker as "Docker Engine"
participant Volume as "test/ directory"
Runner->>Git: actions/checkout@v4
Git-->>Runner: Repository checked out
Runner->>Docker: docker build -t slt-gen .
Docker-->>Runner: Image slt-gen created
Runner->>Volume: rm -rf test
Runner->>Volume: mkdir test
Runner->>Docker: docker run --rm -v $PWD/test:/work/test slt-gen
Docker->>Volume: Extract files to mounted volume
Docker-->>Runner: Container exit
Runner->>Git: git status --porcelain
alt changes detected and not PR
Runner->>Git: git add test
Runner->>Git: git commit -m "Update sqllogictest corpus"
Runner->>Git: git push
else no changes or is PR
Runner->>Runner: echo "No updates; skipping commit"
end
Conditional Commit Logic
The workflow avoids empty commits by checking the output of `git status --porcelain` and committing only when it reports changes.
The commit step also runs only when `github.event_name != 'pull_request'`, which prevents commits during PR validation runs.
Sources: .github/workflows/update-corpus.yml:34-44
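The commit logic can be sketched as follows. The git commands, commit message, and bot identity are those named in this document; the function `commit_if_changed` and its event-name argument are hypothetical stand-ins for the workflow's `github.event_name` condition, which the workflow itself may express differently.

```shell
# Sketch of the change-detection step. commit_if_changed and its
# argument are hypothetical; they stand in for the workflow's
# github.event_name check.
commit_if_changed() {
    event_name=$1

    # Porcelain output is empty when the working tree is clean.
    if [ "$event_name" != "pull_request" ] && [ -n "$(git status --porcelain)" ]; then
        git config user.name "github-actions[bot]"
        git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
        git add test
        git commit -m "Update sqllogictest corpus"
        git push
    else
        echo "No updates; skipping commit"
    fi
}
```

A scheduled run would correspond to `commit_if_changed schedule`, while `commit_if_changed pull_request` always takes the skip branch regardless of the working tree state.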
Data Flow Architecture
The complete data flow follows a unidirectional path from the Fossil repository through Docker containerization to the Git repository.
Sources: Dockerfile:14-28 .github/workflows/update-corpus.yml:24-41
flowchart LR
subgraph Upstream
FOSSIL_SRC["sqlite.org/sqllogictest\nFossil Repository"]
end
subgraph "Build Phase"
DOCKER_BUILD["docker build -t slt-gen ."]
FOSSIL_CLONE["fossil clone + fossil open"]
SRC_TEST["/src/test/\nFiles in Image"]
end
subgraph "Extract Phase"
DOCKER_RUN["docker run --rm -v"]
SLT_EXTRACT["slt-extract script"]
CP_COMMAND["cp -R /src/test/. /work/test/"]
end
subgraph "Persist Phase"
MOUNTED_VOL["$PWD/test/\nHost Filesystem"]
GIT_ADD["git add test"]
GIT_COMMIT["git commit"]
GIT_PUSH["git push"]
end
subgraph Downstream
GIT_REMOTE["GitHub Repository\nsqlite-sqllogictest-corpus"]
end
FOSSIL_SRC -->|cloned during build| FOSSIL_CLONE
DOCKER_BUILD --> FOSSIL_CLONE
FOSSIL_CLONE --> SRC_TEST
SRC_TEST -.->|embedded in image| DOCKER_RUN
DOCKER_RUN --> SLT_EXTRACT
SLT_EXTRACT --> CP_COMMAND
CP_COMMAND -->|volume mount| MOUNTED_VOL
MOUNTED_VOL --> GIT_ADD
GIT_ADD --> GIT_COMMIT
GIT_COMMIT --> GIT_PUSH
GIT_PUSH --> GIT_REMOTE
Volume Mount Strategy
The system uses a Docker volume mount (a bind mount of a host directory) to bridge container execution with host filesystem persistence. The mount configuration "$PWD/test:/work/test" maps the test/ subdirectory of the host's current working directory to the container's /work/test path.
This approach ensures:
- Test files persist after container termination (the `--rm` flag removes the container but preserves the data in the mounted directory)
- The workflow can detect changes using `git status --porcelain`
- No data copying between container and host is required after extraction
Sources: .github/workflows/update-corpus.yml:31 Dockerfile:24-28
Build-Time vs. Runtime Execution
The architecture separates concerns between build-time and runtime operations:
Build-Time Operations
Executed during `docker build -t slt-gen .`:
- Installation of system dependencies (`fossil`, `bash`, `build-essential`)
- Cloning of the Fossil repository to `/src/sqllogictest.fossil`
- Opening the Fossil repository in the `/src` working directory
- Creation of the `/usr/local/bin/slt-extract` script
- Setting executable permissions on the script
Runtime Operations
Executed during `docker run --rm -v "$PWD/test:/work/test" slt-gen`:
- Execution of the `slt-extract` entrypoint
- Directory creation: `mkdir -p /work/test`
- Recursive copy: `cp -R /src/test/. /work/test/`
- Status output: `echo "copied corpus to /work/test"`
This separation ensures the expensive Fossil clone operation occurs once during image build, while extraction can execute repeatedly with minimal overhead.
Sources: Dockerfile:5-35 .github/workflows/update-corpus.yml:28-31
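Because the clone is baked into the image, one built image can serve several extractions. The sketch below illustrates this with the entrypoint's optional destination argument. The helper `extract_to` and the container path /work/out are hypothetical, and the sketch assumes the image declares its ENTRYPOINT in exec form so that arguments after the image name reach slt-extract.

```shell
# Sketch: reuse one built image for several extractions. extract_to is a
# hypothetical helper, and /work/out is an arbitrary container path.
# host_dir must be an absolute path so Docker treats it as a bind mount
# rather than a named volume.
extract_to() {
    host_dir=$1
    mkdir -p "$host_dir" || return 1

    # Assumes an exec-form ENTRYPOINT: the trailing argument is passed
    # to slt-extract as its destination directory.
    docker run --rm -v "$host_dir:/work/out" slt-gen /work/out
}
```

After a single `docker build -t slt-gen .`, calls such as `extract_to "$PWD/copy-a"` and `extract_to "$PWD/copy-b"` would each run only the copy step, without contacting the Fossil server again.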
System Permissions and Security
The workflow requires write access to the repository through the contents: write permission, enabling automated commits. The commit author is configured as github-actions[bot] with email 41898282+github-actions[bot]@users.noreply.github.com.
The Fossil repository is configured with default user root to avoid interactive prompts during clone and open operations.
Sources: .github/workflows/update-corpus.yml:14-15 .github/workflows/update-corpus.yml:37-38 Dockerfile:15-17