Automate publishing documentation using Hugo and GitHub pages

Many of you knows my work on TFS Aggregator. Since the beginning we opted for Markdown as the format for the project documentation, at the beginning they were some files in a doc folder, then I moved the content to the project’s GitHub Wiki, today I use the same files to generate the GitHub pages at https://tfsaggregator.github.io/intro/.

In this post I will describe how this latter step works in detail to publish our open source project’s documentation.

Elements

Let’s describe the scenario elements.

The documentation authors updates the content a source repository in GitHub. This content is processed using the delicious Hugo static generator to produce the web site, images, styles and all. The web site is hosted in GitHub Pages; publication is controlled via the homonymous repository tfsaggregator.github.io.

I decided to use Visual Studio Team Services (VSTS) Build feature to setup Continuous Integration so that any change in the source repository regenerates and republishes the GitHub Pages site.

Hugo documentation has a recipe to publish on GitHub Pages, so the task appeared feasible.

Build Definition

The build process is made of six steps.

Build Steps

We will examine them one by one soon.

Note

The screenshots show the old Build Editor that will be soon replaced.

The first requirement is the connection to GitHub, a Service Endpoint in VSTS terms.

GitHub Service Endpoint

The source control definition

Git Repo definition

has the Checkout submodules option set. This is critical because I defined the repository tfsaggregator.github.io as a sub-module of the main tfsaggregator-docs-hugo repository. Also the Clean option guarantees that there cannot be leftovers if the agent is reused by subsequent builds.

Last notable thing, the build triggers only when the master branch changes. This allows the team to share and locally test changes in an isolated branch.

CI Trigger

Build Steps

The first step guarantees that the Git sub-module in the public directory, i.e. the tfsaggregator.github.io repository, uses the master branch.

Step 1 – Switch branch

Then the build downloads a fresh copy of the Hugo executable.

Step 2 – Download Hugo exe

Few lines of Powershell is all we need: Invoke-WebRequest to download the zip file and ZipFile.ExtractToDirectory method to unpack it.

$hugoZip = "$env:TEMP\hugo.zip"
$hugoBinFolder = "$env:TEMP\hugo-bin"
Invoke-WebRequest -Uri $env:HugoURL -OutFile $hugoZip
[System.Reflection.Assembly]::LoadWithPartialName('System.IO.Compression.FileSystem') | Out-Null
[System.IO.Compression.ZipFile]::ExtractToDirectory($hugoZip, $hugoBinFolder)

This is required as Hugo is not installed on the Hosted build agent; in alternative you use a custom agent with pre-installed software. Yet another alternative is to use the technique I described in Mixing TFVC and Git.

Before using Hugo to generate the site, we need another piece, hugo-lunr, to produce the search index file.

Step 3 – Get hugo-lunr

hugo-lunr is a NodeJS tool to

Generate lunr.js index files for Hugo static site search

The search feature in the documentation site requires these index files to lookup words’ positions. NodeJS is pre-installed on VSTS Hosted build agent but hugo-lunr is not. After the NodeJS library is downloaded, the build runs it

Step 4 – Generate Index

to generate the index. The step is deceptively simple as it is based on package.json that specify the name of the generated file and the type of files to parse.

{
    "scripts": {
        "index": "lunr-hugo -i \"content/**/*.md\" -o static/json/search.json -l yaml"
    }
}

Finally we get at the core step, site generation using Hugo.

Step 5 – Run Hugo

At this point only the local repository has changed, i.e. the clone of tfsaggregator.github.io present in the Hosted agent. The last step is to push tfsaggregator.github.io changes to GitHub.

Step 6 – Publish to GitHub Pages

This seems complex, but basically are three git commands: add and commit followed by push.

param( $GitHubUser, $GitHubToken )
cd public
Write-Host "git add -A"
git add -A
Write-Host "git commit"
git commit -m "Site rebuilt by build #$env:BUILD_BUILDNUMBER"
Write-Host "git push"
$b =[Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes(("{0}:{1}" -f $GitHubUser,$GitHubToken)))
git -c http.extraheader="AUTHORIZATION: basic $b" push origin master
cd ..

There are two interesting items here. The commit message records the build identifier so it will be easier to trace the history.

The second notable thing is the use of http.extraheader to pass authentication data to git. I asked GitHub for a Personal Access Token and saved in the Build Variables.

Build Variables

SYSTEM_ACCESSTOKEN is of no use here as it allows access to VSTS while we need access to GitHub.

A more elegant solution would be using a custom Build Task that wraps the git push command and access the existing Service Endpoint connection data. Using Basic Authentication is safe because the GitHub remote uses HTTPS.

Final comments

I am great fan of static web sites: they consume minimal resources, are easily cached, the best performances, no worry for security breaches, and Hugo is the best amongst the tools I tried, recommended.

Hope you found this interesting!