Understanding the suite of copy functions in bazel-lib
I finally got around to learning Bazel this year, and one very popular library of helper functions for it is bazel-lib. It contains multiple functions for copying files and directories, with subtle differences between them that can be hard to tease apart at a glance, especially for a newcomer. Here, I present the mental model I constructed after studying them.
DirectoryPathInfo
First, we need to understand the distinction between the built-in `DefaultInfo` provider and the `DirectoryPathInfo` provider defined by bazel-lib and used in some (but not all!) of the helper functions.
`DefaultInfo` has one primary attribute, `files`, which holds a `depset` of `File` objects. `File`s are like paths, but get special treatment from Bazel. In particular, actions get easy access to them through `ctx.file` and `ctx.files`, short-cutting manual access through the `DefaultInfo` provider, e.g. by `ctx.attr.my_file[DefaultInfo].files.to_list()[0]`.
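To make the shortcut concrete, here is a minimal sketch of a rule implementation that reaches the same `File` both ways. The rule name `show_file` and the attribute name `my_file` are hypothetical, and the sketch assumes the attribute is declared with `allow_single_file = True`, which is what makes `ctx.file.my_file` available.

```starlark
def _show_file_impl(ctx):
    # Shortcut: available because the attribute sets allow_single_file.
    via_shortcut = ctx.file.my_file

    # Manual route through the DefaultInfo provider of the same target.
    via_provider = ctx.attr.my_file[DefaultInfo].files.to_list()[0]

    # Write both paths to an output so the two routes can be compared.
    out = ctx.actions.declare_file(ctx.label.name + ".txt")
    ctx.actions.write(
        output = out,
        content = via_shortcut.path + "\n" + via_provider.path + "\n",
    )
    return [DefaultInfo(files = depset([out]))]

show_file = rule(
    implementation = _show_file_impl,
    attrs = {
        "my_file": attr.label(allow_single_file = True, mandatory = True),
    },
)
```

When `my_file` points at a single file, both lines of the output should show the same path.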
Actions get handles to `File` objects in only two ways:
- Declaring them as outputs with `ctx.actions.declare_file` and `ctx.actions.declare_directory`.
- Reading them as inputs by pulling them out of the providers of the targets passed in through attributes, e.g. with `ctx.file`. Apart from source files, all of these `File`s were declared as outputs by other actions.
Thus, every `File` that is not a source file is a declared output of some action.
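As an illustration of the first route, the sketch below declares a directory output and fills it with two files. The rule name `make_tree` and the file names are hypothetical. Note that only the directory is declared, so `a.txt` and `b.txt` exist on disk but have no `File` objects of their own.

```starlark
def _make_tree_impl(ctx):
    # The only declared output: a directory File (a tree artifact).
    out_dir = ctx.actions.declare_directory(ctx.label.name)

    ctx.actions.run_shell(
        outputs = [out_dir],
        # $1 is the first entry of `arguments`, i.e. the directory path.
        command = "mkdir -p \"$1\" && echo hello > \"$1/a.txt\" && echo world > \"$1/b.txt\"",
        arguments = [out_dir.path],
    )
    return [DefaultInfo(files = depset([out_dir]))]

make_tree = rule(implementation = _make_tree_impl)
```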
`DirectoryPathInfo` is similar, in that it represents a path, but different because (a) it is a provider, and (b) it is constructed in just one way: by combining a `File` that represents a directory (technically called a "tree artifact") and a string relative path.
bazel-lib uses `DirectoryPathInfo` to refer to files and directories created by other actions that may not have been declared as outputs. That is, when an action creates output files but only declares the output directory they were created in, `DirectoryPathInfo` lets us refer to those files that we know are there, without creating a `File` object. If we wanted a `File` object, we would have to create it by copying the file to a new output. `DirectoryPathInfo` lets us refer to specific files without copying them.
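A rule that consumes a `DirectoryPathInfo` might look roughly like the sketch below. The rule name `char_count` and attribute name `src` are hypothetical, and the load label assumes the library is available under the repository name `@aspect_bazel_lib`. The key point is that the action takes the whole directory as its input and builds the inner path as a plain string.

```starlark
load("@aspect_bazel_lib//lib:directory_path.bzl", "DirectoryPathInfo")

def _char_count_impl(ctx):
    info = ctx.attr.src[DirectoryPathInfo]

    # There is no File for the inner file, so join the two parts by hand.
    inner_path = info.directory.path + "/" + info.path

    out = ctx.actions.declare_file(ctx.label.name + ".txt")
    ctx.actions.run_shell(
        # The directory is the declared input, not the file inside it.
        inputs = [info.directory],
        outputs = [out],
        command = "wc -c < \"$1\" > \"$2\"",
        arguments = [inner_path, out.path],
    )
    return [DefaultInfo(files = depset([out]))]

char_count = rule(
    implementation = _char_count_impl,
    attrs = {
        "src": attr.label(providers = [DirectoryPathInfo], mandatory = True),
    },
)
```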
Rules
The rest of these functions are rules that return either:
- a `DefaultInfo` provider with one or more `File` objects, or
- a `DirectoryPathInfo` provider pointing to one path under a directory `File` object
In this context, it is important to know that source files and directories are implicitly targets providing a `DefaultInfo` with exactly one `File` object representing that file or directory.
directory_path
`directory_path` returns a `DirectoryPathInfo` constructed from a target providing a `DefaultInfo` with exactly one directory, and a string relative path.
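In a BUILD file, that could look like the sketch below. The target `:tree` is a hypothetical target whose `DefaultInfo` holds a single directory containing `a.txt`, and the load label assumes the repository name `@aspect_bazel_lib`.

```starlark
load("@aspect_bazel_lib//lib:directory_path.bzl", "directory_path")

# Refers to a.txt inside the directory produced by :tree, without copying it.
directory_path(
    name = "a_txt",
    directory = ":tree",
    path = "a.txt",
)
```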
output_files
`output_files` returns a `DefaultInfo` with a set of files selected from another target's `DefaultInfo` provider, or from one group in its `OutputGroupInfo` provider. Files are selected by matching their "short path". All of the files must exist.
copy_file
`copy_file` returns a `DefaultInfo` with exactly one file copied from a target providing either (a) a `DirectoryPathInfo` pointing to a file or (b) a `DefaultInfo` with exactly one file (not a directory).
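Both kinds of source can be sketched as follows. `README.md` stands for any source file in the package, `:a_txt` is a hypothetical target providing a `DirectoryPathInfo`, and the load label assumes the repository name `@aspect_bazel_lib`.

```starlark
load("@aspect_bazel_lib//lib:copy_file.bzl", "copy_file")

# Case (b): the source is a target with exactly one file in its DefaultInfo.
copy_file(
    name = "copy_readme",
    src = "README.md",
    out = "README.copy.md",
)

# Case (a): the source is a DirectoryPathInfo pointing at a file in a directory.
copy_file(
    name = "copy_a_txt",
    src = ":a_txt",
    out = "a.copy.txt",
)
```

The second form is how a file that only exists inside a tree artifact gets a `File` object of its own.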
copy_directory
`copy_directory` returns a `DefaultInfo` with exactly one directory copied from a target providing the same. In this way, it is like `copy_file` except that it does not accept a `DirectoryPathInfo`. I do not know why, and this seems like a gap in capabilities.
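For comparison with `copy_file`, a minimal sketch is shown here. The target `:tree` is a hypothetical target providing a single directory, and the load label assumes the repository name `@aspect_bazel_lib`.

```starlark
load("@aspect_bazel_lib//lib:copy_directory.bzl", "copy_directory")

# Copies the whole directory produced by :tree into a new output directory.
copy_directory(
    name = "tree_copy",
    src = ":tree",
    out = "tree_copied",
)
```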
copy_to_directory
`copy_to_directory` returns a `DefaultInfo` with exactly one directory, but it is so complicated that I never finished figuring out how it works. I know that, unlike `copy_directory`, it accepts `DirectoryPathInfo` targets in addition to `DefaultInfo` targets, but I ended up writing my own simpler rule for copying selections of files into a directory.
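For orientation only, the simplest invocation I can sketch mixes both kinds of sources. The targets `:a_txt` (providing a `DirectoryPathInfo`) and `:tree` (providing a directory) are hypothetical, and the load label assumes the repository name `@aspect_bazel_lib`. The sketch leaves every filtering and path-rewriting attribute at its default, so it says nothing about where each file ends up inside the output directory.

```starlark
load("@aspect_bazel_lib//lib:copy_to_directory.bzl", "copy_to_directory")

copy_to_directory(
    name = "bundle",
    srcs = [
        "README.md",  # a plain source file
        ":a_txt",     # a DirectoryPathInfo target
        ":tree",      # a target providing a directory
    ],
)
```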