scm.exlib

Overview

scm.exlib is a general framework for writing exheres that check out code from a source code management system. It is designed to make the interface and feature set as uniform as reasonably possible across different SCMs, to share as much logic as possible between different backend implementations, and to easily support multiple checkouts in a single exheres.

Usage

Basics

An exheres using scm.exlib sets one or more variables describing the code that should be checked out, and then requires the exlib. While it is possible to require scm directly, it is more common to require the backend exlib, named scm-backend (for example, scm-git); this is a shortcut that avoids the need to define the TYPE variable explicitly.
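As a minimal sketch of the basic pattern, the fragment below sets the primary repository and then requires the git backend. The repository URL is a placeholder rather than a real project, and the `require` stand-in on the first line exists only so that the fragment can be tried outside the package manager. The explicit form in the trailing comment is inferred from the description of TYPE and should be treated as an assumption.

```bash
# Stand-in so the fragment can be run in plain bash; a real exheres gets
# `require` from the package manager and does not need this line.
type require >/dev/null 2>&1 || require() { echo "would require: $*"; }

# Placeholder URL for a hypothetical project.
SCM_REPOSITORY="https://example.org/scm/example.git"

# Usual form: requiring the backend exlib selects the backend implicitly.
require scm-git

# Explicit form that should amount to the same thing (use it instead of
# the line above, not in addition to it):
#   SCM_TYPE="git"
#   require scm
```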

Multiple repositories

Most commonly, an exheres will only perform a single checkout. However, sometimes it is necessary to fetch code from multiple sources. In this case, one repository is designated the “primary” repository (this term is also used for the only repository in the one-checkout case) and the rest are called “secondary” and each given their own name, listed in SCM_SECONDARY_REPOSITORIES. Names must be non-empty and consist only of letters, digits and underscores. For example:

SCM_REPOSITORY="git://anongit.freedesktop.org/git/xorg/xserver"
SCM_REVISION="3336ff91de2aa35277178f39b8d025e324ae5122"
SCM_BRANCH="xgl-0-0-1"
SCM_CHECKOUT_TO="xorg-server"

SCM_mesa_REPOSITORY="git://anongit.freedesktop.org/git/mesa/mesa"
SCM_mesa_REVISION="ad6351a994fd14af9d07da4f06837a7f9b9d0de4"
SCM_mesa_BRANCH="master"

SCM_SECONDARY_REPOSITORIES="mesa"
require scm-git

Here, REPOSITORY is set for both repositories, as it must be, and REVISION, because we want to build a specific version rather than whatever happens to be the latest. BRANCH is not required here, as the default for the git backend is to fetch all branches, but it is more efficient to fetch only the one that is needed. CHECKOUT_TO is set for the primary repository so that the local git clone can be shared with the xorg-server package.
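The naming rule for secondary repositories can be checked mechanically. The helper below is a hypothetical sketch and not part of scm.exlib; it assumes that "letters" means ASCII letters, which is what a bash variable name can contain.

```bash
# Hypothetical helper, not part of scm.exlib: succeeds if its argument is
# non-empty and contains only ASCII letters, digits and underscores.
sketch_valid_repository_name() {
    local LC_ALL=C
    [[ -n $1 && $1 != *[!A-Za-z0-9_]* ]]
}

for name in mesa libplasma lib-plasma ""; do
    if sketch_valid_repository_name "${name}"; then
        echo "'${name}' is acceptable"
    else
        echo "'${name}' is not acceptable"
    fi
done
```

The restriction follows from the way names are used: each one becomes part of a bash variable name such as SCM_mesa_REPOSITORY.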

More elaborate multiple repository usage

Occasionally it is useful to set the variables after requiring the exlib. For example, when defining many similar repositories in a loop, it is convenient to use the scm_set_var function, but this is not available until scm.exlib is loaded. In this case, set SCM_NO_AUTOMATIC_FINALISE to a non-empty value before the require, and call scm_finalise after setting all the variables. Example:

SCM_NO_AUTOMATIC_FINALISE=1
require amarok scm-svn

SCM_REPOSITORY="svn://anonsvn.kde.org/home/kde/"
SCM_SUBPATH="extragear/multimedia/${PN}"

SCM_SECONDARY_REPOSITORIES="libplasma animators popupdropper"
for SCM_THIS in ${SCM_SECONDARY_REPOSITORIES}; do
    scm_set_var REPOSITORY "${SCM_REPOSITORY}"
    SCM_EXTERNAL_REFS+=" src/context/${SCM_THIS/lib}:${SCM_THIS}"
done

SCM_libplasma_SUBPATH="KDE/kdebase/workspace/libs/plasma"
SCM_animators_SUBPATH="KDE/kdebase/workspace/plasma/animators"
SCM_popupdropper_SUBPATH="playground/libs/popupdropper/popupdropper"

scm_finalise

This example also demonstrates the use of SUBPATH to fetch a single project from a large SVN repository, and EXTERNAL_REFS to deal with the svn:externals feature of SVN (externals are not fetched automatically, because all SCM checkouts should be controlled by scm.exlib).
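To see what the loop in the example builds, the EXTERNAL_REFS part can be run on its own in plain bash. This is only the string handling; nothing from scm.exlib is involved, and the variable is initialised to empty here purely for the demonstration.

```bash
SCM_EXTERNAL_REFS=""
for SCM_THIS in libplasma animators popupdropper; do
    # ${SCM_THIS/lib} removes the first occurrence of "lib", if there is one
    SCM_EXTERNAL_REFS+=" src/context/${SCM_THIS/lib}:${SCM_THIS}"
done

# Print one subdirectory:repository term per line:
#   src/context/plasma:libplasma
#   src/context/animators:animators
#   src/context/popupdropper:popupdropper
printf '%s\n' ${SCM_EXTERNAL_REFS}
```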

Exheres-defined variables

Global exheres-defined variables

Variables defined in this section are defined at most once per exheres. All are simple bash variables. Unless otherwise specified, variables default to being unset/empty.

SCM_SECONDARY_REPOSITORIES
A space-separated list of the names of the secondary repositories used by the exheres. May contain option? ( ) conditionals, similar to those in dependencies.
SCM_NO_PRIMARY_REPOSITORY
If non-empty, disables the use of the primary repository (inheritance of the TYPE variable into secondary repositories still functions). Intended for cases where all checkouts should be subject to options.
SCM_NO_AUTOMATIC_FINALISE
If non-empty, prevents scm_finalise from being called automatically by scm.exlib. See the example under “More elaborate multiple repository usage” for a situation where this is useful.
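A sketch of how these global variables might be combined when every checkout depends on an option. The option names (doc, examples), repository names and URLs are all hypothetical, and the options are assumed to be declared elsewhere in the exheres.

```bash
# No unconditional checkout: everything is controlled by options.
SCM_NO_PRIMARY_REPOSITORY=1

# Each secondary repository is only used when its option is enabled.
SCM_SECONDARY_REPOSITORIES="doc? ( manual ) examples? ( samples )"

SCM_manual_REPOSITORY="https://example.org/scm/manual.git"
SCM_samples_REPOSITORY="https://example.org/scm/samples.git"

# In an exheres this would be followed by: require scm-git
```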

Per-repository exheres-defined variables

Variables described in this section can be defined separately for each repository. Unless otherwise specified, variables default to being unset/empty, and secondary repositories do not automatically inherit values from the primary repository. After requiring scm.exlib, some of these may automatically be sanitised or assigned default values.

To define a variable for the primary repository, set the bash variable SCM_variable-name, for example SCM_REPOSITORY or SCM_BRANCH.

To define a variable for a secondary repository, set the bash variable SCM_repository-name_variable-name, for example SCM_mesa_REPOSITORY or SCM_mesa_BRANCH.
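The naming scheme can be expressed as a small function. The following is a hypothetical re-implementation for illustration, not the code used by scm.exlib; it relies on SCM_THIS (described under “Miscellaneous variables”) to select the repository.

```bash
# Hypothetical sketch: map a per-repository variable name to the name of
# the underlying bash variable for the repository selected by SCM_THIS.
sketch_var_name() {
    echo "SCM_${SCM_THIS:+${SCM_THIS}_}${1}"
}

SCM_THIS=""
sketch_var_name REPOSITORY    # prints SCM_REPOSITORY

SCM_THIS="mesa"
sketch_var_name REPOSITORY    # prints SCM_mesa_REPOSITORY

# Setting and reading a variable through the computed name.
name=$(sketch_var_name BRANCH)
printf -v "${name}" '%s' "master"
echo "${!name}"               # prints master
```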

Each entry below gives the general definition of the variable first, followed by any notes that apply only to particular backends (`bzr`, `cvs`, `darcs`, `git`, `hg`, `mtn`, `svn`).

Generic variables

`TYPE`
The name of the backend. For secondary repositories, defaults to that set for the primary repository, which is itself usually set by `require`ing the backend directly.

`CHECKOUT_TO`
The directory into which the remote repository will be cloned/checked out/etc. Defaults to the package name for the primary repository, or the repository name for secondary repositories. If the backend doesn't support multiple branches in a single checkout and `${SLOT}` is not equal to `0`, then the default value will also have `:${SLOT}` at the end. Be sure to set this variable manually if the default behaviour isn't sufficient to prevent different branches from "fighting" over the space. Should always be set to a relative path in the exheres; `scm.exlib` will automatically prepend [`${SCM_HOME}`](#SCM_HOME)`/` to the value.
`bzr`: A shared repository is created in this directory, then the branch is cloned to a subdirectory named by the [`BRANCH`](#BRANCH) variable; therefore, multiple branches _can_ be stored simultaneously.
`mtn`: The Monotone database will be stored in this directory, named with the base-name of the directory and a `.mtn` extension.

`UNPACK_TO`
The directory under `${WORKBASE}` to which the repository contents will be copied at the start of the build. Defaults to `${WORKBASE}/${PNV}` for the primary repository, and `${WORKBASE}/`_`name`_ for secondary repositories. Note that, although this should always be a subdirectory of `${WORKBASE}`, it will _not_ be added automatically, unlike [`CHECKOUT_TO`](#CHECKOUT_TO). Can be overridden automatically if the repository is mentioned in [`EXTERNAL_REFS`](#EXTERNAL_REFS) for some other repository.

`REPOSITORY`
The address (often a URI) of the repository, in the appropriate backend-specific syntax. Must be set.
`bzr`: This should specify the portion of the URI common to all branches in the "same" repository (it doesn't matter whether or not a shared repository is actually present on the remote end); the remainder should be placed in the [`BRANCH`](#BRANCH) variable. `lp:` URIs are _not_ supported, because they require network access to do anything useful with them, including to tell whether or not network access is required. Since they are just aliases for the "real" URI, you can use `HOME=/var/empty bzr info lp:`_`whatever`_ to find out what to specify instead.
`cvs`: This should be set to the appropriate value of `CVSROOT`, aka the `-d` global option to `cvs`.
`mtn`: This should be a `mtn://`, `ssh://`, `ssh+ux://` or `file://` URI. The older bare-hostname syntax is not supported, but simply prefixing the value with `mtn://` will work.
`svn`: This should be the Subversion URI up to but _not_ including the `/trunk/`, `/branches/` or `/tags/` path component. The remainder is specified by and/or inferred by the presence or absence of the [`BRANCH`](#BRANCH), [`TAG`](#TAG) and [`SUBPATH`](#SUBPATH) variables. See [`SVN_RAW_URI`](#SVN_RAW_URI) if this is not appropriate. Do not include peg revisions here; use [`REVISION`](#REVISION) instead.

`BRANCH`
The branch within the repository that should be fetched and copied to the build directory.
`bzr`: Defaults to `trunk`; use `.` to specify that the branch lives in the root of the repository.
`darcs`: Not supported, as darcs uses a "repository = branch" model.
`git`: Specifies which branch to fetch, and if neither [`TAG`](#TAG) nor [`REVISION`](#REVISION) is set, also which branch head to copy to the build directory. Defaults to `master` if neither [`TAG`](#TAG) nor [`REVISION`](#REVISION) is set, otherwise defaults to empty (in which case all branches will be fetched).
`hg`: Specifies the branch within a particular repository. With Mercurial, it is common to use the "repository = branch" model instead, which should be handled using [`REPOSITORY`](#REPOSITORY), and [`CHECKOUT_TO`](#CHECKOUT_TO) if multiple branches should be allowed to coexist. Defaults to `default` if neither [`TAG`](#TAG) nor [`REVISION`](#REVISION) is set.
`mtn`: Specifies which branch to fetch from the server. If [`TAG`](#TAG) and [`REVISION`](#REVISION) are both unset, also specifies the branch head to copy to the build directory (if there is more than one head, one will be chosen and a warning generated). If [`TAG`](#TAG) is set, only revisions that are on the specified branch _and_ have the specified tag will be considered. If either [`TAG`](#TAG) or [`REVISION`](#REVISION) is set this may be empty, in which case all branches will be fetched; however, this is inefficient and may be rejected by the server. This must be a literal branch name; any wildcard characters will automatically be escaped.

`TAG`
The symbolic name of the revision that should be copied to the build directory. It is generally assumed that tags are fixed at creation, even if the SCM doesn't enforce this. Please throw things at your upstream if this is not true.
`mtn`: If [`BRANCH`](#BRANCH) is also specified then only revisions that have the specified tag _and_ are on the specified branch will be considered. It is an error for there to be more than one revision in the repository with the specified tag, after filtering by branch if applicable.

`REVISION`
The identifier of the revision that should be copied to the build directory, using the appropriate backend-specific syntax. In general, this must be a literal identifier, _not_ (for SCMs that support such things) an expression that evaluates to an identifier and may even change meaning over time. Can be set automatically if the repository is mentioned in [`EXTERNAL_REFS`](#EXTERNAL_REFS) for some other repository.
`bzr`: Must be either a revision identifier or a non-dotted revision number (the latter is only supported to allow things like `${PV#*_p}`; in all other cases, the globally unique identifier is preferred).
`cvs`: A timestamp of the form _`YYYY`_`.`_`MM`_`.`_`DD`_`.`_`hh`_`.`_`mm`_`.`_`ss`_.
`darcs`: Not supported, as darcs does not have revision identifiers. See [`DARCS_CONTEXT_FILE`](#DARCS_CONTEXT_FILE) for an alternative.
`git`, `hg`, `mtn`: Must be lower-case and unabbreviated.

`SUBPATH`
The subdirectory of the repository that should be checked out and copied to the work directory.
`cvs`: The CVS module that should be checked out. Defaults to the package name for the primary repository, and the repository name for secondary repositories.
`bzr`, `darcs`, `git`, `hg`, `mtn`: Not supported.

`EXTERNAL_REFS`
For SCMs that allow a particular subdirectory of the repository to reference another repository, a space-separated list of terms of the form _`subdirectory`_`:`_`repository`_, indicating that the specified repository matches the external reference for _`subdirectory`_. _`repository`_ may also be blank, to specify that this reference should be ignored. The external repositories are _not_ fetched automatically, so that all SCM checkouts will be accounted for by the `scm.exlib` framework (for example, recording the exact revisions of all checked out code, adding any extra dependencies needed to support all the protocols being used, etc); this variable helps make it easier to keep the secondary repository definitions in the exheres consistent with the actual requirements, as mismatches will generate errors rather than silently doing the wrong thing. For convenience, [`UNPACK_TO`](#UNPACK_TO) and [`REVISION`](#REVISION) for the referenced secondary repositories will be set automatically to the appropriate values: the former to extract the secondary repository in the appropriate subdirectory of the main repository, the latter to fetch the exact revision mentioned in the external reference (which would frequently become out of date if it had to be specified manually).
`git`: For use with git submodules. It is not necessary for [`REPOSITORY`](#REPOSITORY) in the referenced repository to match the URI specified in `.gitmodules`, as a git submodule always references a revision ID, which is the same no matter where it was fetched from. This can be useful if the `.gitmodules` URI refers to a slow server and/or protocol, even though a faster alternative is available. In particular, some repositories set the submodule URI to `.git`, to refer to another branch of the same repository. This can be handled by declaring multiple repositories with the same values of [`CHECKOUT_TO`](#CHECKOUT_TO) and [`REPOSITORY`](#REPOSITORY).
`svn`: For use with the `svn:externals` feature.
`bzr`, `cvs`, `darcs`, `hg`, `mtn`: Not supported.

`METADATA_UNNEEDED`
If non-empty, indicates that the package's build system doesn't use the SCM metadata (for example, to record exactly which revision was built), and therefore it can be reduced or excluded. Normally it isn't necessary to specify this even if the metadata isn't required, but it can be beneficial for large repositories. The exact effect depends on the backend, but will usually do one or more of: 1) speed up and/or reduce bandwidth requirements for the initial checkout and/or updates; 2) reduce the disk space used for the checkout under [`${SCM_HOME}`](#SCM_HOME); 3) speed up copying the code to `${WORK}`; 4) reduce the disk space requirements under `${WORK}`. If no effect is specified here for a particular backend then this variable does nothing, but it is not an error to set it anyway.
`svn`: Causes `svn export` to be used instead of `cp` to copy the code to `${WORK}`, thereby excluding the `.svn` directories. This uses significantly less disk space for large repositories, as these directories include an unmodified copy of each file in the repository.

Backend-specific variables

`DARCS_CONTEXT_FILE`
A file containing the output of `darcs changes --context` specifying the repository state that should be copied to the build directory. Usually stored in `${FILES}`.

`GIT_TAG_SIGNING_KEYS`
An array listing the names of files containing public keys that should be used to verify signed tags. If specified, the tag _must_ be signed by one of these keys. Usually created with `gpg --export --armor `_`email-address`_ and stored in `${FILES}`.

`MTN_SEED`
URI of an initial database that can be downloaded instead of fetching from scratch. (This is not in `DOWNLOADS` as it is expected to be updated regularly, breaking integrity checking.) If it ends in `.gz`, `.bz2`, `.lzma` or `.xz`, it will be decompressed by the appropriate decompressor.

`SVN_PASSWORD`
The password to pass to SVN. If empty, an explicit empty password is used. Only meaningful if [`SVN_USERNAME`](#SVN_USERNAME) is set.

`SVN_RAW_URI`
If non-empty, the URI specified in [`REPOSITORY`](#REPOSITORY) will be used as-is, without assuming the standard trunk/branches/tags layout; [`BRANCH`](#BRANCH), [`TAG`](#TAG) and [`SUBPATH`](#SUBPATH) must all be empty.

`SVN_USERNAME`
The username to pass to SVN. See also [`SVN_PASSWORD`](#SVN_PASSWORD).
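The default for CHECKOUT_TO described above can be sketched as follows. The helper is hypothetical and not part of scm.exlib; whether a backend can hold several branches in one checkout is passed in as a plain argument, and the SCM_HOME value is a placeholder.

```bash
# Hypothetical helper, not part of scm.exlib.
#   $1: package name (primary) or repository name (secondary)
#   $2: value of ${SLOT}
#   $3: "yes" if the backend can hold several branches in one checkout
sketch_default_checkout_to() {
    local dir=$1 slot=$2 multi_branch=$3
    if [[ ${multi_branch} != yes && ${slot} != 0 ]]; then
        dir+=":${slot}"
    fi
    echo "${dir}"
}

SCM_HOME="/var/cache/fetched/scm"   # placeholder for ${FETCHEDDIR}/scm

# The relative default, with ${SCM_HOME}/ prepended as scm.exlib would do.
echo "${SCM_HOME}/$(sketch_default_checkout_to xorg-server 0 yes)"   # .../scm/xorg-server
echo "${SCM_HOME}/$(sketch_default_checkout_to amarok 4 no)"         # .../scm/amarok:4
```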

scm.exlib-defined variables

scm.exlib defines more variables than are listed here, but the remainder are for internal use only.

Global scm.exlib-defined variables

These are simple bash variables, and are defined after requiring scm.exlib, either directly or via a backend.

SCM_HOME
Equal to ${FETCHEDDIR}/scm. The default directory for storing local checkouts. Exheres will rarely need to refer to this directly.

Per-repository scm.exlib-defined variables

These are mapped to bash variables in the same way as those defined by the exheres. They are generally not defined immediately; see the description of each variable to find out when it becomes available.

Each entry below gives the general definition of the variable first, followed by any notes that apply only to particular backends (`bzr`, `cvs`, `darcs`, `git`, `hg`, `mtn`, `svn`).

Generic variables

`ACTUAL_REVISION`
After a repository has been checked out and copied into `${WORK}`, identifies the specific revision that is present, in whatever format is appropriate for the SCM in question.
`bzr`: This is always a globally unique revision identifier, not a revision number.
`cvs`: Not supported, as CVS does not have repository-wide revision identifiers (and timestamps are not suitable due to non-atomicity). See [`CVS_ACTUAL_REVISION_LIST`](#CVS_ACTUAL_REVISION_LIST) for an alternative.
`darcs`: Not supported, as darcs does not have revision identifiers. See [`DARCS_ACTUAL_CONTEXT`](#DARCS_ACTUAL_CONTEXT) for an alternative.

Backend-specific variables

`CVS_ACTUAL_REVISION_LIST`
After the repository is checked out and copied to `${WORK}`, lists the name and CVS revision of each file that is present. The format is one file per line, where each line contains the filename relative to the checkout root, followed by a `:` and a space, followed by the revision. The lines are sorted according to the `C` locale.

`DARCS_ACTUAL_CONTEXT`
After the repository is checked out and copied to `${WORK}`, contains the darcs context identifying the files present, as generated by `darcs changes --context`. Note: currently empty if [`TAG`](#TAG) is set.
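The line format of `CVS_ACTUAL_REVISION_LIST` is simple enough to split with parameter expansion. The sample data below is made up for illustration; in an exheres the value would presumably come from the per-repository variable instead.

```bash
# Made-up sample in the documented "filename: revision" format.
revision_list=$'Makefile: 1.4\nsrc/main.c: 1.12'

while IFS= read -r line; do
    file=${line%: *}         # everything before the final ": "
    revision=${line##*: }    # everything after the final ": "
    echo "${file} is at CVS revision ${revision}"
done <<< "${revision_list}"
```

Splitting at the final `: ` keeps the parse correct even if a filename itself happens to contain a colon followed by a space.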

Miscellaneous variables

SCM_THIS
Defines the currently “active” repository. If unset or empty, refers to the primary repository, otherwise refers to the named secondary repository. Exheres may set this when calling the various functions that access per-repository variables.

scm.exlib-defined functions

scm.exlib defines more functions than are listed here, but the remainder are for internal use only.

Exported phase functions

As is usual with exlib-defined phase functions, these need only be called explicitly if the exheres or another exlib needs to define its own version of the phase function.

scm_src_fetch_extra
Checks out or updates all the repositories used by the exheres.
scm_src_unpack
Copies the code from the local checkout of each repository to ${WORK}.
scm_pkg_info
For built packages (installed or binary packages), may display detailed information about the exact version of the code that was built, if the SCM does not support short unambiguous global revision identifiers. For unbuilt packages, does nothing.
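As a sketch of the situation described above, an exheres that needs its own unpack phase would call the exlib's version explicitly. The extra step is a placeholder; only scm_src_unpack comes from scm.exlib.

```bash
src_unpack() {
    # Let scm.exlib copy every repository from its local checkout into place.
    scm_src_unpack

    # Exheres-specific work would follow here (placeholder).
    :
}
```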

Other functions

Note: most of these functions are rarely needed in exheres. In particular, scm_get_var and scm_set_var need only be used if the name of the variable or repository is not fixed at authoring time; otherwise, it is acceptable and usually preferable to reference the underlying bash variable directly. See the example under “More elaborate multiple repository usage” for an exception.

The “active repository” referred to by the function descriptions is determined by SCM_THIS.

scm_finalise
Performs various global-scope operations. Called automatically when scm.exlib is loaded, except when SCM_NO_AUTOMATIC_FINALISE is set; see the example under “More elaborate multiple repository usage” for such a situation.
scm_for_each
Takes one or more arguments denoting a command, and runs the command once for each active repository (the primary, if SCM_NO_PRIMARY_REPOSITORY is not set, and any secondaries except those disabled by option conditionals) with SCM_THIS set appropriately.
scm_var_name
Takes a single argument naming a per-repository variable, and outputs the name of the bash variable corresponding to the specified variable for the active repository.
scm_get_var
Takes a single argument naming a per-repository variable, and outputs the value of the specified variable for the active repository.
scm_set_var
Takes two arguments, the first naming a per-repository variable and the second specifying a value, and sets the specified variable to the specified value for the active repository.
scm_modify_var
Takes at least two arguments, the first naming a per-repository variable and the remainder denoting a command. Runs the command with the current value of the specified variable for the active repository as an extra, final argument, and sets the specified variable to the output of the command.
scm_get_array
Takes two arguments, one naming a per-repository variable and one naming a bash array in the caller’s scope, and sets the contents of the array to the value of the specified variable for the active repository.
scm_set_array
Takes at least one argument, the first naming a per-repository variable and the rest giving the elements of an array, and sets the specified variable for the active repository to that array value.
scm_trim_slashes
Takes zero or more of -scheme, -leading and -trailing, followed by exactly one string argument. Outputs its string argument with sequences of duplicate / characters replaced by single /s. If -scheme is specified, any text up to and including the first occurrence of :// will be unchanged. If -leading is specified, also removes any / characters at the start of the string. If -trailing is specified, also removes any / characters at the end of the string.
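The behaviour described for scm_trim_slashes can be approximated in plain bash. The function below is a hypothetical sketch written from the description, not the implementation in scm.exlib; in particular, applying -leading to the part after the scheme when both options are given is an assumption.

```bash
# Hypothetical sketch of the documented behaviour of scm_trim_slashes.
sketch_trim_slashes() {
    local scheme= leading= trailing= prefix= rest
    while [[ $# -gt 1 ]]; do
        case $1 in
            -scheme)   scheme=1 ;;
            -leading)  leading=1 ;;
            -trailing) trailing=1 ;;
            *) echo "unknown option: $1" >&2; return 1 ;;
        esac
        shift
    done
    rest=$1

    # With -scheme, keep everything up to and including the first "://".
    if [[ -n ${scheme} && ${rest} == *://* ]]; then
        prefix=${rest%%://*}://
        rest=${rest#*://}
    fi

    # Collapse each run of slashes into a single slash.
    rest=$(printf '%s' "${rest}" | tr -s '/')

    if [[ -n ${leading} ]]; then rest=${rest#/}; fi
    if [[ -n ${trailing} ]]; then rest=${rest%/}; fi
    echo "${prefix}${rest}"
}

sketch_trim_slashes -scheme -trailing "svn://anonsvn.kde.org//home/kde/"
# prints svn://anonsvn.kde.org/home/kde
sketch_trim_slashes -leading "//extragear///multimedia/"
# prints extragear/multimedia/
```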

User configuration

scm.exlib allows users to customise some aspects of its behaviour by defining variables and functions in the package manager configuration (such as /etc/paludis/bashrc) or calling environment. Exheres should generally not define or reference these directly.

User-defined variables

SCM_OFFLINE
If non-empty, disables network access, using the existing checkout(s). If they are missing or insufficient, the build is aborted.
SCM_MIN_UPDATE_DELAY
If non-empty, must be a positive integer specifying the minimum number of hours between updates of any given checkout. Ignored if the existing checkout is insufficient.
SCM_SVN_CONFIG_DIR
If non-empty, the path to an SVN user configuration directory, as is usually stored in ~/.subversion. This can be used to configure proxy settings (as SVN does not recognise the standard environment variables), accept SSL certificates that would normally be rejected, etc. The directory can be created with the default contents by running svn --config-dir dir --version && chmod a+rx dir/auth. The contents are documented in the Subversion book (although many of the settings are not relevant for scm.exlib).
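A sketch of how these variables might look in the package manager's bashrc. Both values are arbitrary examples, and the directory path is a placeholder that would have to be created as described above.

```bash
# Update any given checkout at most once every 12 hours.
SCM_MIN_UPDATE_DELAY=12

# Placeholder path to a Subversion configuration directory.
SCM_SVN_CONFIG_DIR="/etc/paludis/subversion"
```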

User-defined functions

scm_user_customize
If defined, will be called just before performing any operations, to give the user an opportunity to override settings defined by the exheres. May (in most cases, must) use exheres variables such as CATEGORY, PN, SLOT, PV (possibly via the ever function) and so on to adjust the customisations for each package, either in the function body itself or to conditionally define it.

Note that scm_user_customize should be used with caution — if used to fetch code that differs too much from what the exheres expects, the build may fail. It is also unlikely that user modifications to SCM_NO_PRIMARY_REPOSITORY, SCM_SECONDARY_REPOSITORIES or the per-repository TYPE variable will work as expected. In such cases, a custom exheres will be required instead.

scm_user_customize examples

To override the repository/branch for a couple of specific packages, perhaps to use an experimental version of the package or a local work-in-progress:

case "${CATEGORY}/${PN}" in
    dev-libs/pinktrace)
        scm_user_customize () {
            SCM_BRANCH="easy"
        }
        ;;
    sys-apps/paludis)
        scm_user_customize () {
            SCM_REPOSITORY="file:///home/alip/src/paludis/.git"
        }
        ;;
esac

To enable offline mode automatically when no network connection is available according to NetworkManager (this example also allows per-package settings to be defined with one function per package, while still coexisting with the non-package-specific code):

scm_user_customize() {
    # Set SCM_OFFLINE if not connected
    esandbox allow_net --connect "unix:/run/dbus/system_bus_socket" >&/dev/null
    if [[ $(nmcli -t -f STATE g status) != connected* ]]; then
        export SCM_OFFLINE=1
    fi
    esandbox disallow_net --connect "unix:/run/dbus/system_bus_socket" >&/dev/null

    # Call the per-package scm_user_customize hook
    if type scm_user_customize_${CATEGORY}---${PN} >&/dev/null; then
        scm_user_customize_${CATEGORY}---${PN}
    fi
}

Note that SCM_OFFLINE doesn’t really need to be set using scm_user_customize, as it is purely a user variable and therefore will not be overwritten by the exheres no matter how it is set, but this method allows the NetworkManager call to be skipped when not relevant.
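For completeness, a per-package hook matching the naming scheme used by the function above might look like this. The package and branch are borrowed from the earlier example; the definition relies on bash accepting hyphens in function names, which the generic function above already assumes.

```bash
# Picked up by the generic scm_user_customize when CATEGORY is "dev-libs"
# and PN is "pinktrace".
scm_user_customize_dev-libs---pinktrace() {
    SCM_BRANCH="easy"
}
```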

Writing backends

TODO


Copyright 2009, 2010, 2011 David Leverton

Copyright 2011 Ali Polatel

Copyright 2011 Alex Elsayed

This work is licensed under the Creative Commons Attribution Share Alike 3.0 License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by-sa/3.0/; or (b) send a letter to Creative Commons, 171 2nd Street, Suite 300, San Francisco, California, 94105, USA.