This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Automated Update Workflow
Purpose and Scope
This document describes the GitHub Actions workflow that synchronizes the test corpus from SQLite's sqllogictest Fossil repository once a week. The workflow runs the Docker-based extraction process (detailed in Corpus Extraction Pipeline) and commits any resulting changes, keeping the Git repository in sync with the upstream source.
For information about the extracted test file organization, see Test Organization Structure. For practical usage of the extraction system outside of automation, see Building and Running the Extractor.
Workflow Triggers
The workflow defined in .github/workflows/update-corpus.yml:3-12 supports three distinct trigger mechanisms:
| Trigger Type | Configuration | Purpose |
|---|---|---|
| Scheduled | cron: "0 6 * * 1" | Automated weekly refresh every Monday at 06:00 UTC |
| Manual | workflow_dispatch | On-demand corpus updates triggered through GitHub UI |
| Pull Request | Paths: workflow file, Dockerfile, README | Validates workflow changes before merge |
The scheduled trigger ensures regular synchronization without manual intervention, while the manual trigger allows immediate updates when upstream changes are known. The pull request trigger enables CI validation when modifying the automation infrastructure itself.
Sources: .github/workflows/update-corpus.yml:3-12
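To make the schedule concrete, the following Python sketch computes the next run time described by the cron expression 0 6 * * 1. It is an illustration only: the function name next_scheduled_run is hypothetical, and the sketch ignores the fact that GitHub may start scheduled runs later than the nominal time during periods of high load.

```python
from datetime import datetime, timedelta, timezone

# Fields of the cron expression "0 6 * * 1" (GitHub evaluates schedules in UTC):
# minute=0, hour=6, any day of month, any month, day-of-week=1 (Monday).
CRON_MINUTE = 0
CRON_HOUR = 6
PY_MONDAY = 0  # datetime.weekday() numbers Monday as 0, whereas cron uses 1


def next_scheduled_run(now: datetime) -> datetime:
    """Return the first Monday 06:00 UTC strictly after `now`.

    Hypothetical helper for illustration; `now` must be timezone-aware.
    """
    now_utc = now.astimezone(timezone.utc)
    days_ahead = (PY_MONDAY - now_utc.weekday()) % 7
    candidate = (now_utc + timedelta(days=days_ahead)).replace(
        hour=CRON_HOUR, minute=CRON_MINUTE, second=0, microsecond=0
    )
    if candidate <= now_utc:
        # Already at or past this Monday's slot: the next run is a week later.
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    wednesday_noon = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
    print(next_scheduled_run(wednesday_noon))  # 2024-05-06 06:00:00+00:00
```

A run triggered manually through workflow_dispatch is independent of this calculation and does not shift the weekly schedule.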
Workflow Architecture
The following diagram maps the GitHub Actions job structure to the actual code entities defined in the workflow file:
```mermaid
graph TB
    subgraph "GitHub Actions Workflow: update-corpus.yml"
        TRIGGER[/"Trigger Events"/]
        subgraph "Job: update"
            RUNNER["runs-on: ubuntu-latest"]
            STEP1["Step: Checkout repository\nuses: actions/checkout@v4"]
            STEP2["Step: Build extractor image\ndocker build -t slt-gen ."]
            STEP3["Step: Refresh corpus\nrm -rf test; mkdir test\ndocker run --rm -v $PWD/test:/work/test slt-gen"]
            STEP4["Step: Commit and push changes\nif: github.event_name != 'pull_request'"]
            RUNNER --> STEP1
            STEP1 --> STEP2
            STEP2 --> STEP3
            STEP3 --> STEP4
        end
    end
    TRIGGER -->|schedule workflow_dispatch pull_request| RUNNER
    STEP2 -.->|Builds from| DOCKERFILE[Dockerfile]
    STEP3 -.->|Writes to| TESTDIR["$PWD/test/"]
    STEP4 -.->|Commits| GITREPO["Git Repository"]
    style TRIGGER fill:#f9f9f9
    style RUNNER fill:#f9f9f9
    style STEP1 fill:#f9f9f9
    style STEP2 fill:#f9f9f9
    style STEP3 fill:#f9f9f9
    style STEP4 fill:#f9f9f9
```
Sources: .github/workflows/update-corpus.yml:17-44
Execution Pipeline
Step 1: Repository Checkout
The workflow begins by checking out the repository with actions/checkout@v4. The checkout action does not grant any permissions itself; write access for the final commit step comes from the permissions: contents: write declaration at .github/workflows/update-corpus.yml:14-15.
Sources: .github/workflows/update-corpus.yml:21-22 .github/workflows/update-corpus.yml:14-15
Step 2: Docker Image Build
The extractor image is built from the Dockerfile in the repository root. The build command at .github/workflows/update-corpus.yml:25 (docker build -t slt-gen .) tags the image as slt-gen; the image contains the Fossil clone of the sqllogictest repository and the slt-extract utility defined at Dockerfile:20-33.
Sources: .github/workflows/update-corpus.yml:24-25 Dockerfile:1-36
Step 3: Corpus Extraction
The extraction step performs a clean refresh of the test corpus. The commands at .github/workflows/update-corpus.yml:28-31 delete the existing test/ directory and recreate it empty (rm -rf test; mkdir test) before extraction, so test files that have been removed upstream do not linger in the repository. The Docker volume mount (-v "$PWD/test:/work/test") lets the container write directly to the host filesystem, so the extracted files persist after the container exits.
Sources: .github/workflows/update-corpus.yml:27-31 Dockerfile:20-35
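The build and extraction steps above can be mirrored in Python to show the order of operations. This is a hedged sketch, not the workflow's code: the helper names are hypothetical, reset_test_dir assumes test/, when present, is a directory, and refresh_corpus assumes the Docker CLI is installed. The argument lists correspond to docker build -t slt-gen . and docker run --rm -v "$PWD/test:/work/test" slt-gen.

```python
import shutil
import subprocess
from pathlib import Path

IMAGE_TAG = "slt-gen"              # tag applied by the build step
CONTAINER_TEST_DIR = "/work/test"  # mount point inside the container


def build_argv() -> list[str]:
    """Equivalent of: docker build -t slt-gen ."""
    return ["docker", "build", "-t", IMAGE_TAG, "."]


def reset_test_dir(workspace: Path) -> Path:
    """Equivalent of: rm -rf test; mkdir test (assumes test/ is a directory)."""
    test_dir = workspace.resolve() / "test"
    if test_dir.exists():
        shutil.rmtree(test_dir)  # drop files that no longer exist upstream
    test_dir.mkdir()
    return test_dir


def run_argv(test_dir: Path) -> list[str]:
    """Equivalent of: docker run --rm -v "$PWD/test:/work/test" slt-gen"""
    return ["docker", "run", "--rm", "-v", f"{test_dir}:{CONTAINER_TEST_DIR}", IMAGE_TAG]


def refresh_corpus(workspace: Path) -> None:
    """Build the image, then extract into a freshly emptied test/ directory."""
    subprocess.run(build_argv(), cwd=workspace, check=True)
    test_dir = reset_test_dir(workspace)
    # check=True raises CalledProcessError on a non-zero exit, i.e. fail fast.
    subprocess.run(run_argv(test_dir), check=True)
```

Docker requires an absolute host path for a bind mount, which is why the sketch resolves the workspace; in the workflow, $PWD already expands to an absolute path.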
Step 4: Change Detection and Commit
Before committing, the workflow checks whether the extraction actually changed anything, which avoids unnecessary commits:
```mermaid
stateDiagram-v2
    [*] --> CheckEventType : Step 4 executes
    CheckEventType --> SkipCommit : if pull_request
    CheckEventType --> CheckChanges : if NOT pull_request
    CheckChanges --> RunGitStatus : git status --porcelain
    RunGitStatus --> EvaluateOutput : Check output
    EvaluateOutput --> ConfigureGit : Output non-empty
    EvaluateOutput --> LogSkip : Output empty
    ConfigureGit --> SetUserName : git config user.name
    SetUserName --> SetUserEmail : git config user.email
    SetUserEmail --> StageFiles : git add test
    StageFiles --> Commit : git commit -m message
    Commit --> Push : git push
    Push --> [*]
    LogSkip --> [*] : Echo skip message
    SkipCommit --> [*] : Conditional skipped
```
The conditional at .github/workflows/update-corpus.yml:34 prevents commits during pull request runs, so the workflow can be tested safely without polluting the repository history. The git status --porcelain command at .github/workflows/update-corpus.yml:36 produces machine-readable output; piped to grep ., the pipeline succeeds only if that output contains at least one non-empty line, that is, only if something changed. Because porcelain output also lists untracked files, newly extracted test files are detected along with modified and deleted ones. This pattern prevents empty commits when the upstream corpus has not changed.
The git config user.name and user.email settings at .github/workflows/update-corpus.yml:37-38 attribute the automated commits to the standard GitHub Actions bot identity (github-actions[bot]).
Sources: .github/workflows/update-corpus.yml:33-44
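The decision logic in the state diagram can be expressed as a small Python model. It is a sketch, not the workflow's actual script: has_changes reproduces the git status --porcelain | grep . test, while the function names, the commit message, and the bot e-mail address are hypothetical placeholders.

```python
import subprocess
from pathlib import Path

BOT_NAME = "github-actions[bot]"
# Assumption: the noreply address commonly paired with github-actions[bot];
# the real workflow may configure a different value.
BOT_EMAIL = "41898282+github-actions[bot]@users.noreply.github.com"
COMMIT_MESSAGE = "Update test corpus"  # hypothetical placeholder


def has_changes(porcelain_output: str) -> bool:
    """Mirror `grep .`: true if any line contains at least one character."""
    return any(line for line in porcelain_output.splitlines())


def decide(event_name: str, porcelain_output: str) -> str:
    """Return 'skipped', 'no-changes', or 'commit', following the state diagram."""
    if event_name == "pull_request":
        return "skipped"  # the step-level if: is false, so the step never runs
    if not has_changes(porcelain_output):
        return "no-changes"  # the workflow only echoes a skip message
    return "commit"


def commit_commands() -> list[list[str]]:
    """Git commands run, in order, when decide() returns 'commit'."""
    return [
        ["git", "config", "user.name", BOT_NAME],
        ["git", "config", "user.email", BOT_EMAIL],
        ["git", "add", "test"],
        ["git", "commit", "-m", COMMIT_MESSAGE],
        ["git", "push"],
    ]


def porcelain_status(repo: Path) -> str:
    """Capture `git status --porcelain` for a working tree (requires git)."""
    return subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
```

Keeping the decision in a pure function (decide) separate from the subprocess call makes the branching testable without a Git checkout.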
File System Operations
The workflow manipulates the following filesystem locations:
| Path | Operation | Purpose |
|---|---|---|
| $PWD/test/ | Delete, recreate | Clean slate for extraction |
| $PWD/test/ | Docker volume mount | Container write target |
| /work/test/ (container) | Mount point | Container-side path for extraction |
| /src/test/ (container) | Source directory | Fossil repository test location |

The mapping between container paths and host paths is established by the volume specification -v "$PWD/test:/work/test" at .github/workflows/update-corpus.yml:31. Only files written under /work/test reach the host; /src/test and the rest of the container's filesystem stay inside the container.
Sources: .github/workflows/update-corpus.yml:28-31 Dockerfile:24-28
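The bind mount amounts to a prefix substitution, sketched below in Python. The function name and the example paths are hypothetical; only the /work/test mount point and the host-side test/ directory come from the workflow.

```python
from pathlib import PurePosixPath

# Container side of -v "$PWD/test:/work/test".
CONTAINER_MOUNT = PurePosixPath("/work/test")


def container_to_host(container_path: PurePosixPath, workspace: PurePosixPath) -> PurePosixPath:
    """Translate a container path under /work/test to its location under $PWD/test.

    Raises ValueError for paths outside the mount (for example /src/test),
    because files written there never reach the host filesystem.
    """
    relative = container_path.relative_to(CONTAINER_MOUNT)  # ValueError if outside the mount
    return workspace / "test" / relative
```

PurePosixPath.relative_to compares whole path components, so a sibling such as /work/testing is rejected rather than mistaken for a path inside the mount.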
Workflow Permissions
The workflow requests write access to repository contents with a permissions: contents: write declaration.
This declaration at .github/workflows/update-corpus.yml:14-15 grants the workflow's GITHUB_TOKEN write access to repository contents. Without it, the token falls back to the repository's default permissions; if those are read-only, the git push at .github/workflows/update-corpus.yml:41 is rejected with a permission error.
GitHub creates the GITHUB_TOKEN automatically for each workflow run. actions/checkout uses it by default and persists it in the local Git configuration, which is how the later push authenticates. The token is scoped to the repository and expires when the job finishes.
Sources: .github/workflows/update-corpus.yml:14-15
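Error Handling
The workflow fails fast: when any step exits with a non-zero status, the job fails and the remaining steps are skipped. Because a step-level if: condition carries GitHub's implicit success() check, a failed build or extraction never reaches the commit step, so a partially emptied test/ directory cannot be committed. Shell options inside the container and the Docker flags used to run it feed into this behavior:
```mermaid
graph TB
    BUILD["Docker build failure\n(Build extractor image)"] --> STEP_FAIL["Step exits with non-zero status"]
    EXTRACT["slt-extract error\n(set -euo pipefail)"] --> DOCKER_RUN["docker run exits non-zero\n(Refresh corpus)"]
    DOCKER_RUN --> STEP_FAIL
    GIT_PUSH["git push failure\n(Commit and push changes)"] --> STEP_FAIL
    STEP_FAIL --> FAIL["Workflow fails; remaining steps skipped"]
    FAIL --> NOTIFY["GitHub Actions UI shows failure"]
    FAIL --> EMAIL["Email notification (if configured)"]
```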
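The --rm flag at .github/workflows/update-corpus.yml:31 removes the container when it exits, whether or not the extraction succeeded. Inside the container, the slt-extract script uses set -euo pipefail at Dockerfile:22 to halt on any failed command, reference to an undefined variable, or failure within a pipeline; docker run returns the resulting non-zero exit status, which fails the Refresh corpus step.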
Sources: .github/workflows/update-corpus.yml:31 Dockerfile:20-33
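The job-level behavior can be modeled in a few lines of Python. This is a simplified sketch of GitHub Actions' default semantics, not runner code: the Step and run_job names are hypothetical, each step's argv stands in for its run: script, and a step's condition is evaluated only while every earlier step has succeeded, mirroring the implicit success() check GitHub adds to steps with an if: condition.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    argv: list[str]
    # Stand-in for a step-level if: expression (hypothetical modeling choice).
    condition: Callable[[], bool] = lambda: True


def run_job(steps: list[Step]) -> tuple[str, list[str]]:
    """Run steps in order and stop at the first non-zero exit status.

    Returns the job result and the names of the steps that actually ran.
    """
    ran: list[str] = []
    for step in steps:
        if not step.condition():
            continue  # condition false: this step is skipped, the job continues
        ran.append(step.name)
        if subprocess.run(step.argv).returncode != 0:
            return "failure", ran  # later steps never run
    return "success", ran


if __name__ == "__main__":
    py = sys.executable
    steps = [
        Step("Build extractor image", [py, "-c", "pass"]),
        Step("Refresh corpus", [py, "-c", "raise SystemExit(1)"]),  # simulated extraction failure
        Step("Commit and push changes", [py, "-c", "pass"]),
    ]
    print(run_job(steps))  # ('failure', ['Build extractor image', 'Refresh corpus'])
```

Real runners add features this model omits, such as continue-on-error and status functions like always() and failure().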
Testing Workflow Modifications
The pull request trigger is restricted to the workflow file, the Dockerfile, and the README (.github/workflows/update-corpus.yml:8-12), which enables safe testing of changes to the automation.
When any of these files is modified in a pull request, the workflow runs but skips the commit step because of the conditional at .github/workflows/update-corpus.yml:34. This allows verification that:
- The Docker image builds successfully
- The extraction process completes without errors
- The expected test files are generated
For pull requests, the workflow triggers only on changes to these infrastructure files, not on modifications to the extracted corpus itself, which avoids unnecessary CI runs.
Sources: .github/workflows/update-corpus.yml:8-12 .github/workflows/update-corpus.yml:34
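The effect of the path filter can be sketched in Python. The actual patterns live at .github/workflows/update-corpus.yml:8-12; the tuple below is an assumption (in particular, README.md stands in for the README entry), and fnmatch only approximates GitHub's glob syntax, which treats * and ** differently. The rule illustrated is GitHub's documented one: a pull request triggers the workflow if at least one changed file matches at least one pattern.

```python
from fnmatch import fnmatchcase

# Assumed patterns; the real list is in update-corpus.yml:8-12.
PR_PATHS = (
    ".github/workflows/update-corpus.yml",
    "Dockerfile",
    "README.md",
)


def pr_triggers_workflow(changed_files: list[str], patterns: tuple[str, ...] = PR_PATHS) -> bool:
    """True if any changed file matches any path pattern."""
    return any(fnmatchcase(path, pattern) for path in changed_files for pattern in patterns)
```

With these literal patterns, a pull request that touches only files under test/ does not match, which is the behavior described above.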