
This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Automated Update Workflow


Purpose and Scope

This document describes the GitHub Actions workflow that synchronizes the test corpus with SQLite's sqllogictest Fossil repository every week. The workflow drives the Docker-based extraction process (detailed in Corpus Extraction Pipeline) and commits the results so that the Git repository stays up to date with upstream.

For information about the extracted test file organization, see Test Organization Structure. For practical usage of the extraction system outside of automation, see Building and Running the Extractor.


Workflow Triggers

The workflow defined in .github/workflows/update-corpus.yml:3-12 supports three distinct trigger mechanisms:

| Trigger type | Configuration | Purpose |
|---|---|---|
| Scheduled | cron: "0 6 * * 1" | Automated weekly refresh every Monday at 06:00 UTC |
| Manual | workflow_dispatch | On-demand corpus updates triggered through the GitHub UI |
| Pull request | paths: workflow file, Dockerfile, README | Validates workflow changes before merge |

The scheduled trigger ensures regular synchronization without manual intervention, while the manual trigger allows immediate updates when upstream changes are known. The pull request trigger enables CI validation when modifying the automation infrastructure itself.

Sources: .github/workflows/update-corpus.yml:3-12
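To make the schedule concrete, the Python function below computes the next time that matches the cron expression "0 6 * * 1". GitHub evaluates cron schedules in UTC, and cron numbers weekdays from Sunday = 0, so 1 means Monday. The function name next_weekly_run is hypothetical and the sketch only models this one "minute hour * * weekday" shape, not general cron syntax. Actual scheduled runs can start later than the computed time when GitHub Actions is under load.

```python
from datetime import datetime, timedelta, timezone


def next_weekly_run(now: datetime, cron_dow: int = 1, hour: int = 6) -> datetime:
    """Next UTC time strictly after `now` matching cron "0 <hour> * * <cron_dow>".

    cron numbers weekdays Sunday=0 .. Saturday=6, while Python's
    datetime.weekday() uses Monday=0 .. Sunday=6.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now_utc = now.astimezone(timezone.utc)
    target_weekday = (cron_dow - 1) % 7  # cron 1 (Monday) -> Python 0
    # Start from today's slot at <hour>:00 UTC, then move forward to the target weekday.
    candidate = now_utc.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(target_weekday - candidate.weekday()) % 7)
    if candidate <= now_utc:
        # This week's slot has already passed (or is exactly now): use next week's.
        candidate += timedelta(days=7)
    return candidate
```

For a Wednesday such as 2024-01-03 12:00 UTC, the function returns Monday 2024-01-08 06:00 UTC.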


Workflow Architecture

The following diagram maps the GitHub Actions job structure to the actual code entities defined in the workflow file:

Sources: .github/workflows/update-corpus.yml:17-44

graph TB
    subgraph "GitHub Actions Workflow: update-corpus.yml"
        TRIGGER[/"Trigger Events"/]

        subgraph "Job: update"
            RUNNER["runs-on: ubuntu-latest"]
            STEP1["Step: Checkout repository\nuses: actions/checkout@v4"]
            STEP2["Step: Build extractor image\ndocker build -t slt-gen ."]
            STEP3["Step: Refresh corpus\nrm -rf test; mkdir test\ndocker run --rm -v $PWD/test:/work/test slt-gen"]
            STEP4["Step: Commit and push changes\nif: github.event_name != 'pull_request'"]
            RUNNER --> STEP1
            STEP1 --> STEP2
            STEP2 --> STEP3
            STEP3 --> STEP4
        end
    end

    TRIGGER -->|schedule, workflow_dispatch, pull_request| RUNNER
    STEP2 -.->|Builds from| DOCKERFILE[Dockerfile]
    STEP3 -.->|Writes to| TESTDIR["$PWD/test/"]
    STEP4 -.->|Commits| GITREPO["Git Repository"]

    style TRIGGER fill:#f9f9f9
    style RUNNER fill:#f9f9f9
    style STEP1 fill:#f9f9f9
    style STEP2 fill:#f9f9f9
    style STEP3 fill:#f9f9f9
    style STEP4 fill:#f9f9f9

Execution Pipeline

Step 1: Repository Checkout

The job begins by checking out the repository with the actions/checkout@v4 action. The permissions: contents: write declaration at .github/workflows/update-corpus.yml:14-15 gives the job's token write access to the repository, which the final commit step needs in order to push.

Sources: .github/workflows/update-corpus.yml:21-22 .github/workflows/update-corpus.yml:14-15

Step 2: Docker Image Build

The slt-gen image is built from the Dockerfile in the repository root by running docker build -t slt-gen . at .github/workflows/update-corpus.yml:25. The resulting image contains a Fossil clone of the sqllogictest repository and the slt-extract utility defined at Dockerfile:20-33.

Sources: .github/workflows/update-corpus.yml:24-25 Dockerfile:1-36
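The hypothetical Python helper below sketches this step. It wraps docker build in subprocess.run with check=True, which raises CalledProcessError on a nonzero exit status, much as a failing command fails an Actions run step. The name build_image and the injectable run parameter (there so the sketch can be exercised without Docker) are assumptions, not part of the repository; the workflow itself simply runs the shell command.

```python
import subprocess
from typing import Callable


def build_image(tag: str = "slt-gen", context: str = ".",
                run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> str:
    """Build the extractor image and return its tag (hypothetical helper)."""
    # Equivalent to the workflow's: docker build -t slt-gen .
    # check=True turns a nonzero exit status into CalledProcessError.
    run(["docker", "build", "-t", tag, context], check=True)
    return tag
```

Returning the tag makes the link between the two steps explicit: the image name produced here is the one the extraction step passes to docker run.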

Step 3: Corpus Extraction

The extraction step performs a clean refresh of the test corpus. The commands at .github/workflows/update-corpus.yml:28-31 delete and recreate the test/ directory (rm -rf test; mkdir test) and then run the extractor with docker run --rm -v "$PWD/test:/work/test" slt-gen. Starting from an empty directory means that test files removed upstream are also removed from the repository instead of lingering from earlier runs. The bind mount lets the container write directly to the host filesystem, so the extracted files persist after the container exits.

Sources: .github/workflows/update-corpus.yml:27-31 Dockerfile:20-35
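A minimal Python sketch of the clean-refresh logic follows, assuming repo_root points at the checked-out repository. refresh_corpus and its injectable run parameter are hypothetical names; the real workflow performs these operations as shell commands.

```python
import os
import shutil
import subprocess
from typing import Callable


def refresh_corpus(repo_root: str, image: str = "slt-gen",
                   run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> list[str]:
    """Recreate <repo_root>/test and run the extractor into it (hypothetical helper)."""
    # Absolute host path, playing the role of "$PWD/test" in the workflow.
    test_dir = os.path.abspath(os.path.join(repo_root, "test"))
    # rm -rf test: a missing directory is not an error.
    if os.path.isdir(test_dir):
        shutil.rmtree(test_dir)
    # mkdir test: start from an empty directory so upstream deletions propagate.
    os.mkdir(test_dir)
    # docker run --rm -v "$PWD/test:/work/test" slt-gen
    argv = ["docker", "run", "--rm", "-v", f"{test_dir}:/work/test", image]
    run(argv, check=True)
    return argv
```

Resolving the path with os.path.abspath gives Docker an absolute bind-mount source, which is what $PWD provides in the workflow.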

Step 4: Change Detection and Commit

The commit step checks for changes first, so a run that produces no changes also produces no commit:

stateDiagram-v2
    [*] --> CheckEventType : Step 4 executes
    
    CheckEventType --> SkipCommit : if pull_request
    CheckEventType --> CheckChanges : if NOT pull_request
    
    CheckChanges --> RunGitStatus : git status --porcelain
    
    RunGitStatus --> EvaluateOutput : Check output
    
    EvaluateOutput --> ConfigureGit : Output non-empty
    EvaluateOutput --> LogSkip : Output empty
    
    ConfigureGit --> SetUserName : git config user.name
    SetUserName --> SetUserEmail : git config user.email
    SetUserEmail --> StageFiles : git add test
    StageFiles --> Commit : git commit -m message
    Commit --> Push : git push
    Push --> [*]
    
    LogSkip --> [*] : Echo skip message
    SkipCommit --> [*] : Conditional skipped

The if: github.event_name != 'pull_request' conditional at .github/workflows/update-corpus.yml:34 skips this step on pull request runs, so workflow changes can be tested without adding commits to the repository history. The git status --porcelain command at .github/workflows/update-corpus.yml:36 prints one line per modified, added, deleted, or untracked file, and nothing at all when the working tree is clean. Piping it to grep . succeeds only if at least one line is present, so the commit runs only when the extraction actually changed something, and no empty commit is created when the upstream corpus is unchanged.

The git config user.name and user.email settings at .github/workflows/update-corpus.yml:37-38 attribute the automated commits to the standard GitHub Actions bot user, github-actions[bot].

Sources: .github/workflows/update-corpus.yml:33-44
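The commit logic can be sketched in Python as follows. commit_if_changed and has_changes are hypothetical names; the bot e-mail address and the commit message are assumed placeholders (the actual values are set in the workflow file); and the pull request check, which the workflow expresses as a step-level if:, is folded into the function for illustration.

```python
import subprocess
from typing import Callable

Runner = Callable[..., subprocess.CompletedProcess]

BOT_NAME = "github-actions[bot]"
BOT_EMAIL = "github-actions[bot]@users.noreply.github.com"  # assumed value


def has_changes(porcelain: str) -> bool:
    """Mirror `git status --porcelain | grep .`: true if any line is non-empty."""
    return any(line for line in porcelain.splitlines())


def commit_if_changed(event_name: str, run: Runner = subprocess.run,
                      message: str = "Update test corpus") -> str:  # placeholder message
    """Commit and push test/ unless this is a PR run or nothing changed."""
    # The workflow skips the whole step for pull requests via `if:`.
    if event_name == "pull_request":
        return "skipped: pull_request"
    status = run(["git", "status", "--porcelain"],
                 check=True, capture_output=True, text=True)
    if not has_changes(status.stdout):
        print("No corpus changes; skipping commit")
        return "skipped: no changes"
    for argv in (
        ["git", "config", "user.name", BOT_NAME],
        ["git", "config", "user.email", BOT_EMAIL],
        ["git", "add", "test"],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ):
        run(argv, check=True)  # a failure raises and aborts the remaining commands
    return "committed"
```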


File System Operations

The workflow manipulates the following filesystem locations:

| Path | Operation | Purpose |
|---|---|---|
| $PWD/test/ | Delete, recreate | Clean slate for extraction |
| $PWD/test/ | Docker volume mount | Container write target |
| /work/test/ (container) | Mount point | Container-side path for extraction |
| /src/test/ (container) | Source directory | Fossil repository test location |

The mapping between container paths and host paths is established through the volume specification -v "$PWD/test:/work/test" at .github/workflows/update-corpus.yml:31.

Sources: .github/workflows/update-corpus.yml:28-31 Dockerfile:24-28


Workflow Permissions

The workflow needs write access to repository contents in order to push its commits. The permissions: contents: write declaration at .github/workflows/update-corpus.yml:14-15 grants that access to the job's GITHUB_TOKEN. Without it, the token would fall back to the repository's default permissions; if those are read-only, the git push at .github/workflows/update-corpus.yml:41 fails with a permission-denied error.

The push authenticates with the GITHUB_TOKEN that GitHub creates at the start of each job. actions/checkout stores this token in the local Git configuration by default (its persist-credentials input defaults to true), which is why git push needs no explicit credentials. The token is scoped to the repository and expires when the job finishes.

Sources: .github/workflows/update-corpus.yml:14-15


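Error Handling

The workflow fails fast: any command that exits with a nonzero status fails its step, and the remaining steps are skipped.

graph TB
    SHELL["Fail-fast execution\n(nonzero exit fails the step)"] --> DOCKER_BUILD["Docker build failure"]
    SHELL --> DOCKER_RUN["Docker run failure\n(slt-extract: set -euo pipefail)"]
    SHELL --> GIT_PUSH["Git push failure"]
    DOCKER_BUILD --> FAIL["Workflow fails"]
    DOCKER_RUN --> FAIL
    GIT_PUSH --> FAIL
    FAIL --> NOTIFY["GitHub Actions UI shows failure"]
    FAIL --> EMAIL["Email notification (if configured)"]

Inside the container, the slt-extract script enables set -euo pipefail at Dockerfile:22: -e exits when a command fails, -u treats references to unset variables as errors, and -o pipefail makes a pipeline report failure if any command in it fails, not just the last one. Because docker run exits with the container's exit status, a failure inside slt-extract fails the extraction step. The --rm flag at .github/workflows/update-corpus.yml:31 removes the container when it exits, whether the extraction succeeded or not. On the host side, GitHub runs run steps on Linux runners under bash -e when no shell is specified, so a failing docker build or git push also stops its step.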

Sources: .github/workflows/update-corpus.yml:31 Dockerfile:20-33
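The pipefail option is the least obvious of the three, so the Python function below models the exit-status rule the Bash manual gives for pipelines: without pipefail, a pipeline's status is that of its last command; with pipefail, it is the status of the rightmost command that exited nonzero, or zero if every command succeeded. This is an illustrative model with a hypothetical name, not code from the repository. Under -e, a nonzero pipeline status then terminates the script.

```python
from typing import Sequence


def pipeline_status(exit_codes: Sequence[int], pipefail: bool) -> int:
    """Exit status bash assigns to a pipeline `cmd1 | cmd2 | ...`.

    exit_codes holds each command's exit status, left to right.
    """
    if not exit_codes:
        raise ValueError("a pipeline needs at least one command")
    if not pipefail:
        # Default behavior: only the last command's status counts.
        return exit_codes[-1]
    # With `set -o pipefail`: rightmost nonzero status, or 0 if all succeeded.
    for code in reversed(exit_codes):
        if code != 0:
            return code
    return 0
```

For example, the failure in false | true is invisible by default: pipeline_status([1, 0], pipefail=False) returns 0, while pipefail=True returns 1.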


Testing Workflow Modifications

The pull request trigger enables safe testing of workflow changes. Its paths filter at .github/workflows/update-corpus.yml:8-12 covers the workflow file, the Dockerfile, and the README. When a pull request modifies any of these files, the workflow runs but skips the commit step because of the conditional at .github/workflows/update-corpus.yml:34. This verifies that:

  • The Docker image builds successfully
  • The extraction process completes without errors
  • The expected test files are generated

On pull requests, the workflow runs only when these infrastructure files change; pull requests that touch only the extracted corpus do not trigger it, which avoids unnecessary CI runs.

Sources: .github/workflows/update-corpus.yml:8-12 .github/workflows/update-corpus.yml:34
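The trigger decision for pull requests can be modeled with a short Python function. GitHub starts a workflow that has a paths filter when at least one changed file matches an entry in the filter. The entries below, in particular the README.md file name, are assumptions based on the description above, and the function name is hypothetical. The sketch matches exact paths only, whereas GitHub's filter also accepts glob patterns.

```python
from typing import Iterable

# Assumed filter entries; the real list is at update-corpus.yml:8-12.
PR_TRIGGER_PATHS = frozenset({
    ".github/workflows/update-corpus.yml",
    "Dockerfile",
    "README.md",  # assumed README file name
})


def pr_triggers_workflow(changed_files: Iterable[str]) -> bool:
    """True if a pull request changing these files would start the workflow."""
    # One matching path is enough; other changed files do not prevent the run.
    return any(path in PR_TRIGGER_PATHS for path in changed_files)
```

Under these assumptions, a pull request that only regenerates files under test/ returns False, while one that edits the Dockerfile returns True.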