docs/infrastructure-alibi.md (9 additions, 9 deletions)
@@ -14,11 +14,11 @@ To ensure that results are always reproducible, the machine setup is enforced an
* The user is the only active user on the underlying hardware, eliminating system load that might otherwise be caused by other users.
* The system state corresponds to the one described in the system's initial Puppet manifest. This ensures that no processes or containers from previous users are still running on the hardware, and guarantees a consistent software stack.

-# Installing the AliBI system
+## Installing the AliBI system

The AliBI system relies on a CERN OpenStack VM for the _head node_ (`alibilogin01.cern.ch`) and a bare-metal server as the _compute node_ (`alibicompute01.cern.ch`). The software stack and machine state are formalized using Puppet manifests and fully integrated into the CERN configuration management ecosystem. The setup process is fully described below.

-## AliBI head node
+### AliBI head node

* On `aiadm.cern.ch`, enter the OpenStack _Release Testing_ environment by running
@@ -38,11 +38,11 @@ The AliBI system relies on a CERN OpenStack VM for the _head node_ (`alibilogin0
openstack server set --property landb-alias=alibi alibilogin01
```
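
As a hedged illustration (not part of the original recipe), one way to verify from the same OpenStack environment that the `landb-alias` property was applied could be:

```bash
# Show the VM's properties; landb-alias=alibi should be listed.
openstack server show alibilogin01 -c name -c properties

# Once LANDB/DNS has propagated (this may take a while), the alias should resolve.
host alibi.cern.ch
```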

-## AliBI compute node
+### AliBI compute node

The compute node is a physical machine outside the CERN datacenter, which makes provisioning a bit more complicated.

-### Registrations (only for first time set up)
+#### Registrations (only for first time set up)

* Register the machine in CERN [LANDB](https://network.cern.ch)
* Create an entry for the machine in [Foreman](https://judy.cern.ch/):
@@ -75,7 +75,7 @@ The compute node is a physical machine outside the CERN datacenter, which makes
* Enabled: `YES`
* Hardware Model: `ProLiant DL380 Gen10`

-### Prepare installation
+#### Prepare installation

* Based on the Foreman entry, a provisioning template in the form of a _kickstart file_ is generated; it is updated every time the configuration in Foreman changes.
* Since the compute node is outside of the CERN datacenter, it does not have direct access to this file, so it needs to be downloaded and self-hosted for the duration of the installation (see the sketch below).
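
A minimal sketch of the self-hosting step, assuming the kickstart file has already been saved locally (e.g. from the host's page in Foreman) as `alibicompute01.ks` (file name and port below are placeholders):

```bash
# Serve the kickstart file temporarily over HTTP from a machine that the
# compute node can reach during installation.
mkdir -p /tmp/ks && cp alibicompute01.ks /tmp/ks/
cd /tmp/ks && python3 -m http.server 8080
# The installer can then fetch it from http://<this-host>:8080/alibicompute01.ks
```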
@@ -94,7 +94,7 @@ The compute node is a physical machine outside the CERN datacenter, which makes
* Set Foreman environment to `alibuild/alibi`.

-### Installation
+#### Installation

* Get IPMI/iLO access to the physical server
* Boot the machine in network boot (PXE) mode (see the sketch below)
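
As a hedged sketch (the BMC hostname and credentials below are placeholders; the iLO web console works just as well), the PXE boot could be triggered with `ipmitool`:

```bash
# Select PXE as the next boot device on the remote BMC, then power-cycle.
ipmitool -I lanplus -H alibicompute01-ipmi.cern.ch -U <user> -P '<password>' \
    chassis bootdev pxe
ipmitool -I lanplus -H alibicompute01-ipmi.cern.ch -U <user> -P '<password>' \
    chassis power cycle
```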
@@ -123,7 +123,7 @@ The compute node is a physical machine outside the CERN datacenter, which makes
* At this point you will notice that the `post installation` section of the installation has not been completed automatically. Since all commands are bash, they can be executed manually by copy & paste, or extracted and run as a separate script (see the sketch after this list).
* Afterwards, the machine state should reflect the Puppet manifests and can be fully monitored using the CERN Foreman infrastructure.
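
A minimal sketch of running the post-installation section manually, assuming the kickstart file is available on the machine as `alibicompute01.ks` (a placeholder name) and uses the standard `%post` / `%end` delimiters:

```bash
# Extract the %post section (without its delimiters) into a script and run it.
sed -n '/^%post/,/^%end/p' alibicompute01.ks | sed '1d;$d' > post-install.sh
bash -ex post-install.sh
```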

-## Installation of packages via puppet
+### Installation of packages via puppet

* Packages are installed via Puppet. The configuration / manifests are taken from a special `alibi` branch of a central git repository.
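
As a hedged sketch, once the node is registered in Foreman with the right hostgroup and environment, installing or updating the packages amounts to triggering a Puppet run on the node:

```bash
# Apply the manifests (the alibi branch/environment is selected via Foreman)
# and report what changed.
puppet agent -t
```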

docs/infrastructure-frontend.md (6 additions, 6 deletions)
@@ -4,7 +4,7 @@ layout: main
categories: infrastructure
---

-# Frontend setup
+## Frontend setup

The ALICE build infrastructure is exposed via SSO.
@@ -14,9 +14,9 @@ runs apache and does the reverse proxying to the actual service.
The machine is set up in the CERN/IT Puppet + OpenStack facility in the hostgroup
`alibuild/frontend`.

-# Disaster recovering
+## Disaster recovering

-## Starting the frontend
+### Starting the frontend

The quick recipe to restart the frontend is:
@@ -47,7 +47,7 @@ The quick recipe to restart the frontend is:
and they need to have the right IP address registered there.

-## Enabling / disabling one host in the load balancing
+### Enabling / disabling one host in the load balancing

Machines in the `alibuild/frontend` hostgroup participate in a load-balanced DNS alias. In order to do so, they must be in roger state `production`. To set this:
@@ -68,7 +68,7 @@ You can check their load balanced score with:
/usr/local/sbin/lbclient -d TRACE
```
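
As an illustrative sketch (the hostname is a placeholder and the exact `roger` options may differ; check `roger --help` on `aiadm.cern.ch`), inspecting and setting the roger state might look like:

```bash
# Show the current roger state of one of the frontend machines.
roger show alibuildfrontend01.cern.ch

# Move it (back) into production so it rejoins the load-balanced alias.
roger update alibuildfrontend01.cern.ch --appstate production
```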

-# CERN Single Sign-On (SSO) authentication
+## CERN Single Sign-On (SSO) authentication

Some web applications use Apache's OIDC support to authenticate with CERN SSO. Apache then sets [various `OIDC_CLAIM_*` headers][headers] on the forwarded requests.
@@ -77,7 +77,7 @@ See also [the CERN SSO documentation][cern-sso].
Applications must be configured on the CERN SSO side through the [Application Portal][app-portal] and on the ALICE side through our Puppet-generated Apache configuration, specifically the file `it-puppet-hostgroup-alibuild/data/hostgroup/alibuild/frontend.yaml`.

docs/infrastructure-known-tradeoffs.md (9 additions, 9 deletions)
@@ -6,9 +6,9 @@ categories: infrastructure
This is a list of known issues or tradeoffs in our build infrastructure. We document them and try very hard to find a viable solution to all of them; however, so far each solution has either been unaffordable or has had even worse drawbacks, so we decided to simply live with these issues when they happen. Any contribution to improve the situation is welcome.

-# PR checking
+## PR checking

-## PR checking dies due to external services (e.g. CCDB) being down
+### PR checking dies due to external services (e.g. CCDB) being down

Sometimes checks fail because external services are down. Dealing with them in a proper way would imply mocking the service, but:
17
As a mitigation we run our tests continuously, rebuilding broken tests when there are no pending ones.

-## PR checks can affect each other, even if unrelated
+### PR checks can affect each other, even if unrelated

In order to save time, we run our checks in the same build area, so that we rebuild only what changed between one build and another. Due to limitations in CMake or undetected missing dependencies, we can however end up in a state where a given test interferes with another, in particular:
* When libraries / dictionaries are moved around
* When a missing / implicit dependency is present and the order in which PRs are built in the PR checker happens, by chance, to be a working one.

-## PR checks introduce relocation issues a few days after merging
+### PR checks introduce relocation issues a few days after merging

In order to save time, PR checkers do their best to reuse pre-built tarballs which are downloaded from a central server. However, by design this requires packages to be fully relocatable, in particular:
@@ -36,23 +36,23 @@ Failing that the net result will be that a relocation issue will be present and
Rebuilding a PR twice in two different locations is deemed too expensive.
Doing proper sandboxing requires changing the tools we have to something like Bazel.

-## Errors appear in the PR checker which are not there in local builds
+### Errors appear in the PR checker which are not there in local builds

Some of the recipes use environment variables (in particular `ALIBUILD_O2_TESTS`) to trigger different behaviors, e.g. to increase the amount of testing being done or to enable / disable special features. We should try to minimize their usage; unfortunately, they are still widely used.

-## PRs take long to complete all tests
+### PRs take long to complete all tests

By construction you are limited by the longest path, and even if we try to minimize the amount of work done, one ultimately has to choose between minimizing false negatives and performance. Work is currently being done to reduce unneeded tests, in particular for the analysis. A proper solution would be to use a tool which requires specifying all the hidden dependencies and takes advantage of that. However, this most likely means moving away from CMake, and so far that has not been considered a viable solution.

-# RPM generation
+## RPM generation

-## Updatable RPM packages have conflicting files
+### Updatable RPM packages have conflicting files

Updatable RPMs are generated from the tarballs of the various packages which are also deployed on CVMFS. Those tarballs are built and installed in a separate, per-package location, in order to allow multiple coexisting installations. This means that conflicting files can be introduced without any prior warning at RPM generation time. The alternative, i.e. installing everything in a single location, would either move the problem to its conjugate for the CVMFS installation, or mean that what is installed on CVMFS differs from what is packaged in the updatable RPMs, duplicating CI and debugging issues.
## Externals

-## Old / own version of externals
+### Old / own version of externals

Sometimes the externals provided in alidist are either old, or provide a rebuild of a commonly available tool. In general this happens because we still need to support Run 2 production requirements (including ROOT5 and XRootD3) and we prefer to maintain a single set of tools, rather than split our configuration management.

docs/infrastructure-logs.md (4 additions, 4 deletions)
@@ -15,13 +15,13 @@ which is an SSO protected url exposed by machines in the `alibuild/frontend` pup
For the SSO access you need to be an ALICE member, while for the S3 endpoint you need to be in the `alice-vm-admin`
egroup.

-# Essential operation guides
+## Essential operation guides

* [Creating the bucket](#creating-the-bucket)
* [Updating the policy](#updating-the-policy)
* [Accessing the logs programmatically](#accessing-the-logs-programmatically)

-## Creating the bucket
+### Creating the bucket

Creating the bucket should not be needed unless some disaster happens. The current instructions to do so are:
@@ -32,7 +32,7 @@ Creating the bucket should not be needed unless some disaster happens. The curre
* Set the access policy to the contents of `ali-marathon/s3/alice-build-logs-policy.json`.
* Verify that, using the `ali-bot` access_key / secret_key, you can write files (see the sketch below).
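
A minimal sketch of that verification, assuming `s3cmd` is already configured (e.g. via `s3cmd --configure`) with the `ali-bot` credentials and the CERN S3 endpoint:

```bash
# Upload a small probe object, check that it is listed, then remove it again.
echo "write test" > /tmp/probe.txt
s3cmd put /tmp/probe.txt s3://alice-build-logs/probe.txt
s3cmd ls s3://alice-build-logs/
s3cmd del s3://alice-build-logs/probe.txt
```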

-## Updating the policy
+### Updating the policy

If you need to update the S3 access permission policy, e.g. because the frontend IP changes, edit `ali-marathon/s3/alice-build-logs-policy.json` and then apply it to the `s3://alice-build-logs`
If you get an actual reply, rather than permission denied, it means the machine can access the logs.

-## Accessing the logs programmatically
+### Accessing the logs programmatically

Accessing the logs programmatically can be done via any S3-enabled client, e.g. `s3cmd` (command line) or `boto3` (Python). Ask the usual suspects for the access key and secret. An example of how new logs can be pushed via `boto3` is at <https://github.com/alisw/ali-bot/blob/master/report-pr-errors#L175-L194>.
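
For example, a hedged `s3cmd` sketch for reading logs (the object path is a placeholder; the credentials are the ones mentioned above):

```bash
# List the available logs and download one of them.
s3cmd ls s3://alice-build-logs/
s3cmd get s3://alice-build-logs/<path/to/log.txt> ./log.txt
```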