
[Archery] C++ Benchmark preserve improvements #49913

@AntoinePrv


Describe the enhancement requested

With archery benchmark [run|diff] --preserve, the build and source folders are preserved in a directory laid out as follows. In my own use, I found several improvements that would make working with archery benchmark easier.

<TMP>/arrow-archery-xlzqaz4l/<GIT STR>/
  - arrow/
  - build/

A - Set the preserve directory

On macOS, <TMP> resolves to something like this:

/var/folders/9c/thrhbgqx2xb2xqvfk_2m6pgh0000gn/T/

This is impossible to find again without parsing the archery log.

On top of this, I sometimes want finer control over where the cache is stored, either to make it convenient to inspect and reuse, or because the default path structure does not fit the use case (e.g. benchmarking the same commit with xsimd 14.1 and 14.2, or with different compiler options).

I propose adding an optional CLI argument --preserve-dir <PATH> to explicitly control where the preserved directories are stored (<TMP> in the layout above).
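A minimal sketch of how --preserve-dir could plug in, written in Python since Archery is a Python tool. It assumes, as the random arrow-archery-xlzqaz4l suffix suggests, that the preserve folder is created with tempfile.mkdtemp; make_preserve_root is a hypothetical helper, not existing archery code. Only the parent directory (<TMP>) changes, so the random suffix still keeps two runs from overwriting each other.

```python
import tempfile
from pathlib import Path
from typing import Optional


def make_preserve_root(preserve_dir: Optional[str] = None,
                       prefix: str = "arrow-archery-") -> Path:
    """Create the randomly named preserve folder (hypothetical helper).

    preserve_dir=None keeps the current behaviour: the folder is created
    in the system temporary directory. Otherwise it is created under the
    user-chosen --preserve-dir location.
    """
    parent: Optional[str] = None  # None lets mkdtemp use gettempdir()
    if preserve_dir is not None:
        # Expand "~" and create the parent if it does not exist yet.
        parent_path = Path(preserve_dir).expanduser()
        parent_path.mkdir(parents=True, exist_ok=True)
        parent = str(parent_path)
    # mkdtemp appends a random suffix to the prefix, e.g. arrow-archery-xlzqaz4l.
    return Path(tempfile.mkdtemp(prefix=prefix, dir=parent))
```

Passing None reproduces today's location, so the option would be fully backward compatible.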

B - Always preserve the benchmark output with --preserve

When --preserve is set, I recommend we always store the benchmark timings JSON file in the preserve directory:

  • It is small compared to the build directory, so we should save it eagerly in case it is needed later.
  • It makes results easier to keep track of. Currently we have to come up with an explicit name for --output. With a copy stored automatically next to the build, the results are associated with the commit and the preserve-dir path, and the compilation context (compiler flags, etc.) can be retrieved from the build directory.

This greatly reduces the cognitive load of choosing names and tracking which file corresponds to which settings, and it shortens the archery benchmark commands we type.
This would be independent of --output, which would still work as before.

There is also more information we should store, such as the invocation command.
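A rough sketch of what storing these artifacts could look like, assuming the results are already available as a JSON-serializable dict. The helper save_run_artifacts and the file names benchmark.json and invocation.txt are hypothetical choices for illustration, not an existing archery convention.

```python
import json
import shlex
import sys
from pathlib import Path
from typing import List, Optional


def save_run_artifacts(preserve_root: Path, results: dict,
                       argv: Optional[List[str]] = None) -> Path:
    """Write results and the invocation next to the preserved build.

    Hypothetical helper; file names are illustrative only.
    """
    if argv is None:
        argv = sys.argv  # the command line that launched the run
    preserve_root.mkdir(parents=True, exist_ok=True)
    results_path = preserve_root / "benchmark.json"
    # Always written, independently of any --output destination.
    results_path.write_text(json.dumps(results, indent=2))
    # shlex.join quotes arguments so the command can be pasted back into a shell.
    (preserve_root / "invocation.txt").write_text(shlex.join(argv) + "\n")
    return results_path
```

Quoting with shlex.join keeps arguments containing spaces (such as an --output path) reproducible when the stored command is re-run.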

C - Resolve git string (breaking)

Right now, with archery benchmark run main, the path created is:

<TMP>/arrow-archery-xlzqaz4l/main/

I suggest automatically replacing it with the resolved commit SHA:

<TMP>/arrow-archery-xlzqaz4l/<GIT SHA>/

At first glance, this makes it slightly harder to tell from the folder name that main was intended. In practice, though, I believe the branch name is more misleading than helpful: main is a moving target, its meaning can change within a single day of work, and we forget "which main" a folder refers to. This is even more error-prone with feature branches, and with remote copies used for benchmarking on different platforms.
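A minimal sketch of the resolution step, assuming git is on PATH and the rev string is resolved against the source checkout. It relies on git rev-parse --verify <rev>^{commit}, which prints the full commit SHA and fails if the string does not name a commit; resolve_commit and preserve_subdir are hypothetical names, not archery functions.

```python
import subprocess
from pathlib import Path


def resolve_commit(rev: str, repo: str = ".") -> str:
    """Resolve a branch, tag or short SHA to the full commit SHA.

    Hypothetical helper; raises CalledProcessError if rev is not a commit.
    """
    out = subprocess.run(
        # "^{commit}" peels tags and rejects non-commit objects.
        ["git", "-C", repo, "rev-parse", "--verify", f"{rev}^{{commit}}"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def preserve_subdir(preserve_root: Path, rev: str, repo: str = ".") -> Path:
    """Name the per-revision folder after the resolved SHA (hypothetical)."""
    return preserve_root / resolve_commit(rev, repo)
```

The human-readable name is not lost if the invocation command is also stored, as suggested in B.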

Component(s)

Archery
