This patch refactors the generate_test_report script, namely turning it
into a proper library, and pulling the script/unittests out into
separate files, as is standard with most python scripts. The main
purpose of this is to enable reusing the library for the new Github
premerge.
Reviewers: tstellar, DavidSpickett, Keenuts, lnihlen
Reviewed By: DavidSpickett
Pull Request: https://github.com/llvm/llvm-project/pull/133196
Before this patch, making a change to a runtime directory (like libcxx)
would cause the project to be added to the LLVM_ENABLE_PROJECTS CMake
flag which is illegal as they can only be built as part of
LLVM_ENABLE_RUNTIMES. This patch fixes that behavior. Test added.
This patch adds a python script, compute_projects, and associated unit
tests for computing the projects and runtimes that need to be tested in
premerge. Rewriting in Python opens up a couple new
improvements/opportunities:
1. I personally find python to be much easier to work with than shell
scripts for tasks like this. Particularly it becomes a lot easier to
work with paths with proper array support.
2. Unit testing becomes easier which makes it a lot easier to reason
about behavior changes, especially in review.
3. Most of the configuration is now setup in some dictionaries, which
makes changes much easier to apply for most of the common changes.
This preserves the behavior of the existing premerge scripts as much as
possible.
Reviewers: ldionne, lnihlen, Endilll, tstellar, Keenuts
Reviewed By: Keenuts
Pull Request: https://github.com/llvm/llvm-project/pull/132634
Between C++26 and Clang modules build of runtimes, which are used as an
additional testing for Clang, the only difference are
`LIBCXX_TEST_PARAMS` and `LIBCXXABI_TEST_PARAMS`. Both of them are
transformed into actual lit configuration lines, and put into
`SERIALIZED_LIT_PARAMS`, which end up in `libcxx/test/cmake-bridge.cfg`
via `configure_file` command. Notably, it seems that they are not used
in any other way.
I checked that if those variables are changed, subsequent runs of CMake
configuration step regenerate `cmake-bridge.cfg` with the new values.
Which means that we don't need to do clean builds for every runtimes
configuration we want to test.
I hope that together with #131913, this will be enough to alleviate
Linux CI pains we're having, and we wouldn't have to make a tough choice
between C++26 and Clang modules builds for pre-merge CI.
This patch cleans up the runtimes build in premerge due to queuing
delays, dropping the C++03 testing, but keeping the C++20 and Modules
configurations as they are deemed important by clang contributors.
This patch also makes it easier in the future when we need to rework the
runtimes build to anticipate the deprecation of building most of the
runtimes with LLVM_ENABLE_PROJECTS.
This reverts commit 95d28fe503cc3d2bc0bb980442d3defaf199ea5a.
I did not fully realize the implications of this change when reviewing.
With how it is set up currently, it causes clang and all of the runtimes
to be built and tested everytime a change to MLIR is made. This is a
large regression in build/test time, which seems to have been causing
large queueing delays.
Reverting for now. Once we rework the runtimes build for premerge (which
I hope to do soon, ideally in the next week), I will make sure flang-rt
gets added in.
This patch bumps the maximum number of metrics to look through when
collecting metrics data. We are currently running into issues where we
are losing data due to the most recent 1000 workflows not containing the
workflows that we actually need to query. Just double it for now.
I plan on monitoring this reasonably closely to ensure we do not run
into issues, mainly API rate limits.
The current container focuses on Github metrics. Before deprecating
BuildKite, we want to make sure the new infra quality is better, or at
least the same.
Being able to compare buildkite metrics with github metrics on grafana
will allow us to easily present the comparison.
BuildKite API allows filtering, but doesn't allow changing the result
ordering. Meaning we are left with builds ordered by IDs. This means a
completed job can appear before a running job in the list. 2 solutions
from there:
- keep the cursor on the oldest running workflow
- keep a list of running workflows to compare.
Because there is no guarantees in workflow ordering, waiting for the
oldest build to complete before reporting any newer build could mean
delaying the more recent build completion reporting by a few hours. And
because grafana cannot ingest metrics older than 2 hours, this is not an
option.
Thus we leave with the second solution: remember what jobs were running
during the last iteration, and record them as soon as they are
completed. Buildkite has at most ~100 pending jobs, so keeping all those
IDs should be OK.
Flang's runtime can now be built using LLVM's LLVM_ENABLE_RUNTIMES
mechanism, with the intent to remove the old mechanism in #124126.
Update the pre-merge builders to use the new mechanism.
In the current form, #124126 actually will add
LLVM_ENABLE_RUNTIMES=flang-rt implicitly, so no change is strictly
needed. I still think it is a good idea to do it explicitly and in
advance.
On Windows, flang-rt also requires compiler-rt, but which is not
building on Windows anyway.
Yesterday, the monitoring reported a job queued for 23h59. After some
checks, it appears no such job existed: the age of the workflows on
completion was at most 5 hours during the last 48 hours.
After some digging, I found out GitHub could return a job with a start
date slightly before the creation date, or completion date before start
date.
This would cause python to compute a negative timedelta, which would
then be reported in grafana as a full 24h delta due to the conversions.
Adding code to ignore negative delta, but logging them.
Before this patch, the job/workflow name impacted the metric name,
meaning a change in the workflow definition could break monitoring. This
patch adds a map to get a stable name on metrics from a workflow name.
In addition, it reworks a bit how we track the last processed workflow:
the github queries are broken if filtering is applied, meaning we have a
list of workflow, ordered by 'created_at', which mixes completed &
running workflows.
We have no guarantees over the order of completion, meaning we cannot
stop at the first completed job we found (even per-workflow).
This PR processed the last 1000 workflows, but allows an early stop if
the created_at time is older than 8 hours. This means we could miss
long-running workflows (>8 hours), and if the number of workflows
started before another one completes becomes high (>1000), we'll miss
it.
To detect this kind of behavior, a new metric is added "oldest workflow
processed", which should at least indicate if the depth is too small.
An alternative without arbitrary cut would be to initially parse all
workflows, and then record the last non-completed one we find and always
start from the last (moving the lower bound as they complete). But LLVM
has forever-queued workflows runs (>1 years), hence this would cause us
to iterate over a very large number of jobs.
---------
Signed-off-by: Nathan Gauër <brioche@google.com>
The current container focuses on Github metrics. Before deprecating
BuildKite, we want to make sure the new infra quality is better, or at
least the same.
Being able to compare buildkite metrics with github metrics on grafana
will allow us to easily present the comparison.
This PR requires https://github.com/llvm/llvm-zorg/pull/400 to be merged
first.
This patch adds some logging information for individual workflow jobs inside
the metrics container. This is mainly intended for debugging why we seem to be
missing metrics from some workflows within Grafana.
This patch makes the metrics container use the python logging library. This
is more of what we want given we're essentially just logging the status of
things. It also means we do not have to explicitly specify an output file
and lets us control verbosity a bit more cleanly.
The metrics script includes some logic to only read look at workflows up
to the most recent workflow it has seen previously. This was broken in a
previous patch when workflow metrics began to be emitted per job. The
logic ending the metrics gathering would never trigger, so we would
continually fetch more and more workflows until OOM.
This patch makes it so that skipped steps do not cause a job to be
considered failed. The windows premerge jobs currently skip the
build/test step if there are no projects to build/test. These show up as
failures in the dashboard even though everything executed perfectly
fine.
Reviewers: lnihlen, Keenuts
Reviewed By: lnihlen
Pull Request: https://github.com/llvm/llvm-project/pull/127279
Currently the metrics container is crashing reasonably often with
incomplete read/connection broken errors. Try moving the creation of the
Github Object into the main loop to see if recreating the object that
maybe handles some connection state fixes the issue.
Reviewers: Keenuts, lnihlen
Reviewed By: lnihlen
Pull Request: https://github.com/llvm/llvm-project/pull/127276
This patch removes an extra heartbeat metric in the metrics python file. Before
it was performed twice, once in the main function, and once in the
get_sampled_workflow_metrics function. We only need one to keep everything
happy, and I've chosen to keep the one in get_sampled_workflow_metrics as it
seems a more appropriate place to keep it.
Reviewers: Keenuts, lnihlen
Reviewed By: lnihlen
Pull Request: https://github.com/llvm/llvm-project/pull/127275
This patch makes it so that the metrics container counts the number of in
progress and queued jobs at the job level rather than at the workflow
level. This helps us distinguish windows versus linux load and also lets
us filter out the MacOS jobs that only run in the release branch.
Reviewers: Keenuts, lnihlen
Reviewed By: lnihlen
Pull Request: https://github.com/llvm/llvm-project/pull/127274
This patch makes it so that the metrics script can support multiple jobs
in a single workflow. This is needed so that we do not crash on an
assertion now that the windows job has been enabled within the premerge
workflow.
This patch makes it so that the caller of monolithic-windows.sh can set
the maximum number of parallel compile/link jobs in an environment
variable rather than manually specifying it inside of the CMake.
Additionally, the env variable definitions for CC, CXX, and LD are sunk
into the shell script due to those config options being pretty inherent
to what the pipeline is testing.
This is intended to make things more flexible/useable for the new
premerge CI pipeline, particularly as we are looking at using larger
runners and want the increased flexibility to experiment.
This commits allows the container to report 3 additional metrics at
every sampling event:
- a heartbeat
- the size of the workflow queue (filtered)
- the number of running workflows (filtered)
The heartbeat is a simple metric allowing us to monitor the metrics
health. Before this commit, a new metrics was pushed only when a
workflow was completed. This meant we had to wait a few hours
before noticing if the metrics container was unable to push metrics.
In addition to this, this commits adds a sampling of the workflow
queue size and running count. This should allow us to better understand
the load, and improve the autoscale values we pick for the cluster.
---------
Signed-off-by: Nathan Gauër <brioche@google.com>
In this build:
https://buildkite.com/llvm-project/github-pull-requests/builds/126961
The builds actually failed, probably because prerequisite of a test
suite failed to build.
However they still ran other tests and all those passed. This meant that
the test reports were green even though the build was red. On some level
this is technically correct, but it is very misleading in practice.
So I've also passed the build script's return code, as it was when we
entered the on exit handler, to the generator, so that when this happens
again, the report will draw the viewer's attention to the overall
failure. There will be a link in the report to the build's log file, so
the next step to investigate is clear.
It would be nice to say "tests failed and there was some other build
error", but we cannot tell what the non-zero return code was caused by.
Could be either.
The script handles the following situations now:
| Have Result Files? | Tests reported failed? | Return code | Report |
|--------------------|------------------------|-------------|-----------------------------------------------------------------------------|
| Yes | No | 0 | Success style report. |
| Yes | Yes | 0 | Shouldn't happen, but if it did, failure style report
showing the failures. |
| Yes | No | 1 | Failure style report, showing no failures but noting
that the build failed. |
| Yes | Yes | 1 | Failure style report, showing the test failures. |
| No | ? | 0 | No test report, success shown in the normal build
display. |
| No | ? | 1 | No test report, failure shown in the normal build
display. |
This patch makes the metrics job also detect failures in individual
steps. This is necessary now that we are setting continue-on-error in
the premerge jobs to prevent sending out unnecessary email to detect
what jobs actually fail.
LLVM Premerge Checks is running on the new GCP cluster. Tracking its
metrics will allow us to determine the stability of the presubmit and
make sure the new infra is working as intended.
---------
Signed-off-by: Nathan Gauër <brioche@google.com>
This patch modifies the monolithic shell scrips to only run if the
buildkite-agent application is present. This allows for running the
scripts to completion outside of buildkite (eg inside of a GHA
pipeline).
This patch refactors some common functionality present in the CI scripts
to a separate shell script. This is mainly intended to make it easier to
reuse this functionality inside of a Github Actions pipeline as we make
the switch.
This patch includes the script that pulls information from Github and
pushes it to Grafana. This is currently running in the cluster and
pushes information to
https://llvm.grafana.net/public-dashboards/6a1c1969b6794e0a8ee5d494c72ce2cd.
This script is designed to accept other jobs relatively easily and can
be easily modified to look at other metrics.
This resulted in the style being None and despite the report being
empty as well, we tried to send it to the agent and Python can't
send None as an argument.
To fix this return "success" style and also check whether the
report has any content before calling the agent.
This reverts commit 8a1ca6cad9cd0e972c322910cdfbbe9552c6c7ca.
I have fixed 2 things:
* The report is now sent by stdin so we do not hit the limit on the size
of command line arguments.
* The report is limited to 1MB in size and if we exceed that we fall back
to listing only the totals with a note telling you to check the full log.
This reverts commit e74a002433b4cf7f891ceedb61bd862867218a8b.
As it is failing on Linux with "OSError: [Errno 7] Argument list too long: 'buildkite-agent'".
The CI builds now send the results of every lit run to a unique file.
This means we can read them all to make a combined report for all
tests.
This report will be shown as an "annotation" in the build results:
https://buildkite.com/docs/agent/v3/cli-annotate#creating-an-annotation
Here is an example:
https://buildkite.com/llvm-project/github-pull-requests/builds/112660
(make sure it is showing "All" instead of "Failures")
This is an alternative to using the existing Buildkite plugin:
https://github.com/buildkite-plugins/junit-annotate-buildkite-plugin
As the plugin is:
* Specific to Buildkite, and we may move away from Buildkite.
* Requires docker, unless we were to fork it ourselves.
* Does not let you customise the report format unless again,
we make our own fork.
Annotations use GitHub's flavour of Markdown so the main code in the
script generates that text. There is an extra "style" argument generated
to make the formatting nicer in Buildkite.
"context" is the name of the annotation that will be created. By using
different context names for Linux and Windows results we get 2 separate
annotations.
The script also handles calling the buildkite-agent. This makes passing
extra arguments to the agent easier, rather than piping the output of
this script into the agent.
In the future we can remove the agent part of it and simply use
the report content. Either printed to stdout or as a comment on
the GitHub PR.
In this patch I'm using a new lit option so that the pipeline writes
many results files, one for each time lit is run:
```
--use-unique-output-file-name
When enabled, lit will add a unique element to the output file name, before the extension. For example "results.xml" will become "results.<something>.xml". The
"<something>" is not ordered in any way and is chosen so that existing files are not overwritten. [Default: Off]
```
(I added this to lit recently)
Alternatives were considered:
* mkfifo - does not work on bash for Windows.
* tail -f - does not print full content on file truncation
* lit wrapper script - more complication than using an option to lit
itself
* ninja/mv file/ninja/mv file etc - lots of changes needed to make the
scripts build each target separately
And after feedback I decided that using an option to lit itself is the
cleanest way to go. It can be removed when we no longer need it.
If I run the Linux build after this change:
```
$ bash ./.ci/monolithic-linux.sh "clang;lldb;lld" "check-lldb-shell check-lld" "libcxx;libcxxabi" "check-libcxx check-libcxxabi"
```
I get multiple test result files. In my case some tests fail so runtimes
aren't checked, but all projects are so there is 1 file for lldb and one
for lld:
```
$ ls build/*.xml
build/test-results.klc82utf.xml build/test-results.majylh73.xml
```
This change just collects the XML files as artifacts. Once I know that's
working, I can set up test reporting to make a summary of them.
Fixes#110265
Adding check-all causes us to run some tests twice if a project specific
target like check-clang is also added.
check-pstl is an alternative but as far as I can tell, check-all does
not include this so we have not been running the tests in CI anyway.
When I tried to run check-pstl locally I got a lot of compiler errors
but have not found any instructions on how to setup a correct build
environment. Even if such instructions exist, it's probably more than we
want to do in CI.
According to Louis Dionne, the project is probably not active. So if
it's ever revived it'll be up to the new contributors to enable testing.
Instead of "check-all" which leads to us running some tests twice if
there are other "check-..." targets. For example on one of my PRs this
script produced:
```
commands:
- './.ci/monolithic-linux.sh "clang;clang;lld;clang-tools-extra;compiler-rt;llvm" "check-all check-clang check-clang-tools" "libcxx;libcxxabi;libunwind" "check-cxx check-cxxabi check-unwind"'
commands:
- 'C:\BuildTools\Common7\Tools\VsDevCmd.bat -arch=amd64 -host_arch=amd64'
- 'bash .ci/monolithic-windows.sh "clang;clang-tools-extra;llvm" "check-clang check-clang-tools"'
```
Which meant that Linux ran the clang and clang-tools tests twice. These
extra tests were about 24% of the test run and increased testing time
(on my local machine) by 45%.
This problem can also happen with other projects but there isn't a
simple fix like this one at the moment.
* pstl has a check-pstl target but it is not part of check-all and when
I tried it locally I couldn't build it.
* libclc has no check- target.
I will deal with those projects later.
This is a temporary measure to alleviate Linux pre-commit CI waiting
times that started snowballing
[recently](https://discourse.llvm.org/t/long-wait-for-linux-presubmit-testing/79547/5).
My [initial
estimate](https://github.com/llvm/llvm-project/pull/94208#issuecomment-2155972973)
of 4 additional minutes spent per built seems to be in the right
ballpark, but looks like that was the last straw to break camel's back.
It seems that CI load got past the tipping point, and now it's not able
to burn through the queue over the night on workdays.
I don't intend to overthrow the consensus we reached in #94208, but it
shouldn't come at the expense of the whole LLVM community. I'll enable
this back as soon as we have news that we got more capacity for Linux
pre-commit CI.
This patch removes LLDB from a list of projects that are excluded from
building and testing on pre-merge CI on Linux.
Windows environment needs to be prepared in order to test LLDB
(https://github.com/llvm/llvm-project/pull/94208#issuecomment-2146256857),
but we don't have enough maintenance resources to do that at the moment.
Because LLDB has been in the list of projects that need to be tested on
Clang changes, this PR make this happen on Linux. This seems to be the
consensus in the discussion of this PR.
Flang triggers some OOM on Windows CI right now. This is disruptive to
MLIR and LLVM changes that don't touch Flang, as such we disable
building Flang on Windows only for these PR that don't touch flang. The
testing on Linux is unchanged, and the post-merge Windows testing is
still fully covering here.
Following the discussion in
https://github.com/llvm/llvm-project/pull/93233#issuecomment-2127920882,
this patch merges `clang-ci` pipeline into main `GitHub Pull Requests`
pipeline. `clang-ci` enables additional test coverage for Clang by
compiling it, and then using it to compile and test libc++, libc++abi,
and libunwind in C++03, C++26, and Clang Modules modes.
Additional work we skip and total time savings we should see:
1. Checking out the repo to generate the clang-ci pipeline (2 minutes)
2. Building Clang (3.5 minutes)
3. Uploading the artifacts once, then downloading them 3 times and
unpacking 3 times (0.5 minutes)
Note that because previously-split jobs for each mode are now under a
single Linux job, it now takes around 8 minutes more see the Linux CI
results despite total time savings.
The primary goal of this patch is to reduce the load of CI by removing
duplicated work. I consider this goal achieved. I could keep the job
parallelism we had (3 libc++ jobs depending on a main Linux job), but I
don't consider it worth the effort and opportunity cost, because
parallelism is not helping once the pool of builders is fully
subscribed.