Walk is a Python module that provides a mechanism for running commands, where commands are not run if Walk can infer that they would not change any generated files.
-
An
LD_PRELOAD
library or syscall tracing is used to detect what files a command reads and/or writes. -
Tested and used on Linux and OpenBSD.
-
Install with:
pip install walkbuild
-
Also provided is an example Python script
walkbuild/walkfg.py
which uses Walk to work as a build system for the Flightgear open-source flight simulator.
Walk allows a build system to be written simply as a list of commands to be run, with no need for explicit dependency information.
By specifying these commands as calls to the function
walkbuild.walk.system()
, one ensures that they will not be run if the command
would not modify any existing generated files. So only the commands that are
necessary to bring things up to date, will be run.
For example to build a project consisting of two .c
files, one could do (in
Python):
import walkbuild.walk as walk
walk.system( 'cc -c -o foo.o foo.c', 'foo.o.walk')
walk.system( 'cc -c -o bar.o bar.c', 'bar.o.walk')
walk.system( 'cc -o myapp foo.o bar.o', 'myapp.walk')
-
Look at file content not modification times
Walk uses md5 hashes to detect changes to files, instead of file modification times.
This means that Walk will not do unnecessary builds if a file is merely touched without its contents being changed (for example from a
git
operation). -
Allow control over ordering of commands
Unlike conventional build systems, one can control the order in which commands are run. For example the
walkbuild/walkfg.py
Flightgear build script compiles newer source files first, which often finds compilation errors more quickly when one is developing. -
Detect non-trivial changes to commands
By default commands are always re-run if the command itself has changed.
One can provide a custom comparison function, which allows one to avoid re-running commands if they are changed in only a trivial way. For example the
walkbuild/walkfg.py
Flightgear build script ignores changes to the compiler's-W*
warning flags. -
Verbose output only on failure
By default Walk only shows commands if they have failed. This gives concise output but, unlike
make -s
for example, it still allows one to see the details of failed commands.[Command output is always shown by default.]
The Concurrent
class allows commands to be run concurrently on multiple
threads. One can use the .join()
method to wait for scheduled commands to
complete.
For example:
import walkbuild.walk as walk
# Create multiple internal worker threads.
#
walk_concurrent = walk.Concurrent( num_threads=3)
# Schedule commands to be run concurrently.
#
walk_concurrent.system( 'cc -c -o foo.o foo.c', 'foo.o.walk')
walk_concurrent.system( 'cc -c -o bar.o bar.c', 'bar.o.walk')
...
# Wait for all scheduled commands to complete.
#
walk_concurrent.join()
# Run more commands.
#
walk.system( 'cc -o myapp foo.o bar.o', 'myapp.walk')
walk_concurrent.end()
Walk supports two ways of finding out what files a command (or its sub-commands) opened for reading and/or writing:
-
An
LD_PRELOAD
library which intercepts functions such asopen()
,rename()
etc. -
Running commands under a syscall tracer:
- Linux
strace
. - OpenBSD
ktrace
.
- Linux
On Linux the LD_PRELOAD
approach doesn't work due to the ld
linker
appearing to open the output file using a direct syscall (which cannot be
easily intercepted by the preload library), so Walk defaults to using strace
.
On OpenBSD both approaches work but Walk defaults to LD_PRELOAD
as it appears
to be slightly faster.
If using the LD_PRELOAD
approach, Walk automatically builds the library in
/tmp
as required (Walk contains the C source code).
The first time Walk runs a command, it creates a per-command .walk
file which
contains the command itself, plus md5 hashes of all files that the command (or
its child commands) read or wrote.
On subsequent invocations of the command, Walk checks for changes to the md5
hashes of the files listed in the .walk
file. It also looks at whether the
command itself has changed.
If the command is unchanged and all of the hashes are unchanged, Walk does not run the command.
Otherwise Walk runs the command and recreates the .walk
file.
Walk is careful to handle previous failure to open input files (for example a failure to open for reading) where the file now exists - in this case it will always run the command.
Walk is resilient to being interrupted by signals or system crashes, because
it always write a zero-length .walk
file before re-running a command. If
the build process is killed before the command completes, then the next time
Walk runs it will find this zero-lenth .walk
file and know the command was
interrupted, and will always re-run the command.
Walk is primarily a python module, but can also be used from the command line:
walkbuild/walk.py <args> <walk-path> <command> ...
For example:
walkbuild/walk.py myapp.exe.walk cc -Wall -W -o myapp.exe foo.c bar.c
For more information run:
./walkbuild/walk.py -h
- Development: https://github.com/cgdae/walk
- Docs: https://walk.readthedocs.io/
- Pypi: https://pypi.org/project/walkbuild/
- https://code.google.com/archive/p/fabricate/
- https://github.com/buildsome/buildsome/
- https://github.com/kgaughan/memoize.py
- https://gittup.org/tup/
It might be possible to use the information in .walk
files to do automatic
command ordering: look at existing build files to find dependency information
between commands (i.e. find commands whose output files are read by other
commands) and run commands in the right order without the caller needing to
specify anything other than an unordered list of commands.
This could be extended to do automatic concurrency - run multiple commands concurrently when they are known to not depend on each otheer.
Dependency information in walk files is not available the first time a build is run, and might become incorrect if commands or input files are changed. So we would always have to re-scan walk files after commands have completed, and re-run commands in the correct order as required. But most of the time this wouldn't be necessary.
A large part of the Walk build script for building Flightgear is concerned with selecting the source files to compile and link together.
It might be possible to write code that finds the unresolved and defined symbols after each compilation and stores this information in .walk files (or a separate file next to each .walk file. Then one could tell the script to compile the file that contains main() and have it automatically look for, and build, other files that implement the unresolved symbols.
We would need help to resolve situations where more than one file implements the same symbol. And perhaps heuristics could be used to find likely source files by grepping for missing symbols names.
Copyright (C) 2020-2022 Julian Smith.
SPDX-License-Identifier: GPL-3.0-only