llvm-project

mirror of https://github.com/llvm/llvm-project.git synced 2025-04-17 16:56:37 +00:00

Author	SHA1	Message	Date
Nathan Gauër	5d50af3f03	Revert "[CI] Extend metrics container to log BuildKite metrics" (#130770 ) Reverts llvm/llvm-project#129699	2025-03-11 14:15:44 +01:00
Nathan Gauër	3df8be3ee9	[CI] Extend metrics container to log BuildKite metrics (#129699 ) The current container focuses on Github metrics. Before deprecating BuildKite, we want to make sure the new infra quality is better, or at least the same. Being able to compare buildkite metrics with github metrics on grafana will allow us to easily present the comparison. This PR requires https://github.com/llvm/llvm-zorg/pull/400 to be merged first.	2025-03-11 14:11:07 +01:00
Aiden Grossman	cef6dbbe54	[CI] Add Logging for Workflow Jobs This patch adds some logging information for individual workflow jobs inside the metrics container. This is mainly intended for debugging why we seem to be missing metrics from some workflows within Grafana.	2025-03-01 03:06:57 +00:00
Aiden Grossman	3c518940b0	[CI] Make Metrics Container Use Python Logging This patch makes the metrics container use the python logging library. This is more of what we want given we're essentially just logging the status of things. It also means we do not have to explicitly specify an output file and lets us control verbosity a bit more cleanly.	2025-03-01 03:03:24 +00:00
Aiden Grossman	b24e14093d	[CI] Keep Track of Workflow Name Instead of Job Name The metrics script includes some logic to only look at workflows up to the most recent workflow it has seen previously. This was broken in a previous patch when workflow metrics began to be emitted per job. The logic ending the metrics gathering would never trigger, so we would continually fetch more and more workflows until OOM.	2025-02-15 06:16:08 +00:00
Aiden Grossman	d7b89b0dca	[CI] Do Not Consider a Job Failed if Steps Were Skipped This patch makes it so that skipped steps do not cause a job to be considered failed. The windows premerge jobs currently skip the build/test step if there are no projects to build/test. These show up as failures in the dashboard even though everything executed perfectly fine. Reviewers: lnihlen, Keenuts Reviewed By: lnihlen Pull Request: https://github.com/llvm/llvm-project/pull/127279	2025-02-14 19:14:56 -08:00
Aiden Grossman	97d2cfeab3	[CI] Try Moving Github Object Into Loop Currently the metrics container is crashing reasonably often with incomplete read/connection broken errors. Try moving the creation of the Github Object into the main loop to see if recreating the object that maybe handles some connection state fixes the issue. Reviewers: Keenuts, lnihlen Reviewed By: lnihlen Pull Request: https://github.com/llvm/llvm-project/pull/127276	2025-02-14 19:12:16 -08:00
Aiden Grossman	4aeb2f1c79	[CI] Remove Duplicate Heartbeat in Metrics Script This patch removes an extra heartbeat metric in the metrics python file. Before it was performed twice, once in the main function, and once in the get_sampled_workflow_metrics function. We only need one to keep everything happy, and I've chosen to keep the one in get_sampled_workflow_metrics as it seems a more appropriate place to keep it. Reviewers: Keenuts, lnihlen Reviewed By: lnihlen Pull Request: https://github.com/llvm/llvm-project/pull/127275	2025-02-14 19:10:51 -08:00
Aiden Grossman	2d878ccf54	[CI] Track Queue/In Progress Metrics By Job Rather Than Workflow This patch makes it so that the metrics container counts the number of in progress and queued jobs at the job level rather than at the workflow level. This helps us distinguish windows versus linux load and also lets us filter out the MacOS jobs that only run in the release branch. Reviewers: Keenuts, lnihlen Reviewed By: lnihlen Pull Request: https://github.com/llvm/llvm-project/pull/127274	2025-02-14 19:08:45 -08:00
Aiden Grossman	7f24b9acd1	[CI] Support multiple jobs in metrics container (#124457 ) This patch makes it so that the metrics script can support multiple jobs in a single workflow. This is needed so that we do not crash on an assertion now that the windows job has been enabled within the premerge workflow.	2025-01-27 17:05:05 +01:00
Aiden Grossman	280c7d7198	[CI] Increase Configurability of Monolithic Windows Build (#124328 ) This patch makes it so that the caller of monolithic-windows.sh can set the maximum number of parallel compile/link jobs in an environment variable rather than manually specifying it inside of the CMake. Additionally, the env variable definitions for CC, CXX, and LD are sunk into the shell script due to those config options being pretty inherent to what the pipeline is testing. This is intended to make things more flexible/useable for the new premerge CI pipeline, particularly as we are looking at using larger runners and want the increased flexibility to experiment.	2025-01-24 15:37:36 -08:00
Nathan Gauër	13b44283e9	[CI] Add queue size, running count metrics (#122714 ) This commit allows the container to report 3 additional metrics at every sampling event: - a heartbeat - the size of the workflow queue (filtered) - the number of running workflows (filtered) The heartbeat is a simple metric allowing us to monitor the metrics health. Before this commit, a new metric was pushed only when a workflow was completed. This meant we had to wait a few hours before noticing if the metrics container was unable to push metrics. In addition to this, this commit adds a sampling of the workflow queue size and running count. This should allow us to better understand the load, and improve the autoscale values we pick for the cluster. --------- Signed-off-by: Nathan Gauër <brioche@google.com>	2025-01-16 11:41:49 +01:00
Nathan Gauër	05f9cdd58d	[CI] Remove Check Clang Format from watched workflows (#122740 ) This was useful to test metrics before we had an actual workflow, now it generates noise. Signed-off-by: Nathan Gauër <brioche@google.com>	2025-01-14 11:09:48 +01:00
David Spickett	1b199d1990	[ci] Handle the case where all reported tests pass but the build is still a failure (#120264 ) In this build: https://buildkite.com/llvm-project/github-pull-requests/builds/126961 The builds actually failed, probably because a prerequisite of a test suite failed to build. However they still ran other tests and all those passed. This meant that the test reports were green even though the build was red. On some level this is technically correct, but it is very misleading in practice. So I've also passed the build script's return code, as it was when we entered the on exit handler, to the generator, so that when this happens again, the report will draw the viewer's attention to the overall failure. There will be a link in the report to the build's log file, so the next step to investigate is clear. It would be nice to say "tests failed and there was some other build error", but we cannot tell what the non-zero return code was caused by. Could be either. The script handles the following situations now (Have Result Files? / Tests reported failed? / Return code: Report): Yes / No / 0: Success style report; Yes / Yes / 0: Shouldn't happen, but if it did, failure style report showing the failures; Yes / No / 1: Failure style report, showing no failures but noting that the build failed; Yes / Yes / 1: Failure style report, showing the test failures; No / ? / 0: No test report, success shown in the normal build display; No / ? / 1: No test report, failure shown in the normal build display.	2025-01-13 09:05:18 +00:00
Aiden Grossman	eabf9313d4	[CI] Detect step failures in metrics job (#122564 ) This patch makes the metrics job also detect failures in individual steps. This is necessary to detect which jobs actually fail now that we are setting continue-on-error in the premerge jobs to prevent sending out unnecessary email.	2025-01-11 14:04:03 -08:00
Nathan Gauër	3bcfa1a579	[Github] Add LLVM Premerge Checks to the watchlist (#120230 ) LLVM Premerge Checks is running on the new GCP cluster. Tracking its metrics will allow us to determine the stability of the presubmit and make sure the new infra is working as intended. --------- Signed-off-by: Nathan Gauër <brioche@google.com>	2024-12-18 09:58:56 +01:00
Aiden Grossman	a24645463b	[CI] Only upload test results if buildkite-agent is present (#119954 ) This patch modifies the monolithic shell scripts to only upload test results if the buildkite-agent application is present. This allows for running the scripts to completion outside of buildkite (eg inside of a GHA pipeline).	2024-12-16 01:01:05 -08:00
Aiden Grossman	d6cc140dfd	[CI] Refactor common functionality into separate script (#119530 ) This patch refactors some common functionality present in the CI scripts to a separate shell script. This is mainly intended to make it easier to reuse this functionality inside of a Github Actions pipeline as we make the switch.	2024-12-13 01:20:02 -08:00
David Spickett	71fd5288d2	[ci] Include a log download link when test report is truncated (#117985 ) Now "Download" will be a link to the file so people don't have to know to open the build tab and find the download button. This is a URL from a real build: https://buildkite.com/organizations/llvm-project/pipelines/github-pull-requests/builds/123979/jobs/01937132-0fc3-4c95-a884-2fc0048cb9a7/download.txt And this is how we can build it: https://buildkite.com/organizations/{BUILDKITE_ORGANIZATION_SLUG}/pipelines/{BUILDKITE_PIPELINE_SLUG}/builds/{BUILDKITE_BUILD_NUMBER}/jobs/{BUILDKITE_JOB_ID}/download.txt Given these env vars that were set in that job: BUILDKITE_ORGANIZATION_SLUG="llvm-project" BUILDKITE_PIPELINE_SLUG="github-pull-requests" BUILDKITE_BUILD_NUMBER="123979" BUILDKITE_JOB_ID="01937132-0fc3-4c95-a884-2fc0048cb9a7" In theory these will always be available but: 1. Rather safe than sorry with this script, I don't want to make a passing build a failure because this script failed. 2. It would get very annoying if you had to set all these to test the script locally.	2024-12-11 09:46:34 +00:00
Aiden Grossman	77c2b00553	[CI] Upstream metrics script and container definition (#117461 ) This patch includes the script that pulls information from Github and pushes it to Grafana. This is currently running in the cluster and pushes information to https://llvm.grafana.net/public-dashboards/6a1c1969b6794e0a8ee5d494c72ce2cd. This script is designed to accept other jobs relatively easily and can be easily modified to look at other metrics.	2024-11-29 11:15:44 -08:00
David Spickett	3b8426d340	[ci] Fix unit tests for test report generator Last time I fixed a bug here I forgot to update them.	2024-11-28 09:26:30 +00:00
David Spickett	6a12b43ac0	[ci] Fix error when no junit files are passed to report generator This resulted in the style being None and despite the report being empty as well, we tried to send it to the agent and Python can't send None as an argument. To fix this return "success" style and also check whether the report has any content before calling the agent.	2024-11-18 09:08:41 +00:00
David Spickett	889b3c9487	Reland "[ci] New script to generate test reports as Buildkite Annotations (#113447 )" This reverts commit 8a1ca6cad9cd0e972c322910cdfbbe9552c6c7ca. I have fixed 2 things: * The report is now sent by stdin so we do not hit the limit on the size of command line arguments. * The report is limited to 1MB in size and if we exceed that we fall back to listing only the totals with a note telling you to check the full log.	2024-11-13 10:39:57 +00:00
David Spickett	8a1ca6cad9	Revert "[ci] New script to generate test reports as Buildkite Annotations (#113447 )" This reverts commit e74a002433b4cf7f891ceedb61bd862867218a8b. As it is failing on Linux with "OSError: [Errno 7] Argument list too long: 'buildkite-agent'".	2024-11-12 16:29:55 +00:00
David Spickett	e74a002433	[ci] New script to generate test reports as Buildkite Annotations (#113447 ) The CI builds now send the results of every lit run to a unique file. This means we can read them all to make a combined report for all tests. This report will be shown as an "annotation" in the build results: https://buildkite.com/docs/agent/v3/cli-annotate#creating-an-annotation Here is an example: https://buildkite.com/llvm-project/github-pull-requests/builds/112660 (make sure it is showing "All" instead of "Failures") This is an alternative to using the existing Buildkite plugin: https://github.com/buildkite-plugins/junit-annotate-buildkite-plugin As the plugin is: * Specific to Buildkite, and we may move away from Buildkite. * Requires docker, unless we were to fork it ourselves. * Does not let you customise the report format unless again, we make our own fork. Annotations use GitHub's flavour of Markdown so the main code in the script generates that text. There is an extra "style" argument generated to make the formatting nicer in Buildkite. "context" is the name of the annotation that will be created. By using different context names for Linux and Windows results we get 2 separate annotations. The script also handles calling the buildkite-agent. This makes passing extra arguments to the agent easier, rather than piping the output of this script into the agent. In the future we can remove the agent part of it and simply use the report content. Either printed to stdout or as a comment on the GitHub PR.	2024-11-12 13:34:47 +00:00
David Spickett	f539d92dca	[ci] Write test results to unique file names (#113160 ) In this patch I'm using a new lit option so that the pipeline writes many results files, one for each time lit is run: ``` --use-unique-output-file-name When enabled, lit will add a unique element to the output file name, before the extension. For example "results.xml" will become "results.<something>.xml". The "<something>" is not ordered in any way and is chosen so that existing files are not overwritten. [Default: Off] ``` (I added this to lit recently) Alternatives were considered: * mkfifo - does not work on bash for Windows. * tail -f - does not print full content on file truncation * lit wrapper script - more complication than using an option to lit itself * ninja/mv file/ninja/mv file etc - lots of changes needed to make the scripts build each target separately And after feedback I decided that using an option to lit itself is the cleanest way to go. It can be removed when we no longer need it. If I run the Linux build after this change: ``` $ bash ./.ci/monolithic-linux.sh "clang;lldb;lld" "check-lldb-shell check-lld" "libcxx;libcxxabi" "check-libcxx check-libcxxabi" ``` I get multiple test result files. In my case some tests fail so runtimes aren't checked, but all projects are so there is 1 file for lldb and one for lld: ``` $ ls build/*.xml build/test-results.klc82utf.xml build/test-results.majylh73.xml ``` This change just collects the XML files as artifacts. Once I know that's working, I can set up test reporting to make a summary of them.	2024-11-12 13:24:44 +00:00
David Spickett	90149204bd	[ci] Don't add check-all target when pstl project is enabled (#111803 ) Fixes #110265 Adding check-all causes us to run some tests twice if a project specific target like check-clang is also added. check-pstl is an alternative but as far as I can tell, check-all does not include this so we have not been running the tests in CI anyway. When I tried to run check-pstl locally I got a lot of compiler errors but have not found any instructions on how to setup a correct build environment. Even if such instructions exist, it's probably more than we want to do in CI. According to Louis Dionne, the project is probably not active. So if it's ever revived it'll be up to the new contributors to enable testing.	2024-10-10 14:26:46 +01:00
David Spickett	10008f731d	[ci] Don't add a testing target for libclc (#111547 ) According to https://github.com/llvm/llvm-project/pull/111369#issuecomment-2400152471 there is no testing to be done here. Adding "check-all" only risks duplicating tests if other project specific "check-" targets are also added.	2024-10-09 09:16:37 +01:00
David Spickett	5be1024ea7	[ci] Use check-compiler-rt target for testing compiler-rt (#111515 ) Instead of "check-all" which leads to us running some tests twice if there are other "check-..." targets. For example on one of my PRs this script produced: ``` commands: - './.ci/monolithic-linux.sh "clang;clang;lld;clang-tools-extra;compiler-rt;llvm" "check-all check-clang check-clang-tools" "libcxx;libcxxabi;libunwind" "check-cxx check-cxxabi check-unwind"' commands: - 'C:\BuildTools\Common7\Tools\VsDevCmd.bat -arch=amd64 -host_arch=amd64' - 'bash .ci/monolithic-windows.sh "clang;clang-tools-extra;llvm" "check-clang check-clang-tools"' ``` Which meant that Linux ran the clang and clang-tools tests twice. These extra tests were about 24% of the test run and increased testing time (on my local machine) by 45%. This problem can also happen with other projects but there isn't a simple fix like this one at the moment. * pstl has a check-pstl target but it is not part of check-all and when I tried it locally I couldn't build it. * libclc has no check- target. I will deal with those projects later.	2024-10-09 09:15:56 +01:00
Vlad Serebrennikov	a4f6b7dfa4	[lldb] Stop testing LLDB on Clang changes in pre-commit CI (#95537 ) This is a temporary measure to alleviate Linux pre-commit CI waiting times that started snowballing [recently](https://discourse.llvm.org/t/long-wait-for-linux-presubmit-testing/79547/5). My [initial estimate](https://github.com/llvm/llvm-project/pull/94208#issuecomment-2155972973) of 4 additional minutes spent per build seems to be in the right ballpark, but it looks like that was the last straw that broke the camel's back. It seems that CI load got past the tipping point, and now it's not able to burn through the queue overnight on workdays. I don't intend to overthrow the consensus we reached in #94208, but it shouldn't come at the expense of the whole LLVM community. I'll enable this back as soon as we have news that we got more capacity for Linux pre-commit CI.	2024-06-14 20:33:38 +04:00
Vlad Serebrennikov	d4eed43bad	Enable LLDB tests in Linux pre-merge CI (#94208 ) This patch removes LLDB from a list of projects that are excluded from building and testing on pre-merge CI on Linux. Windows environment needs to be prepared in order to test LLDB (https://github.com/llvm/llvm-project/pull/94208#issuecomment-2146256857), but we don't have enough maintenance resources to do that at the moment. Because LLDB has been in the list of projects that need to be tested on Clang changes, this PR makes this happen on Linux. This seems to be the consensus in the discussion of this PR.	2024-06-08 16:23:17 +04:00
Mehdi Amini	49ef21d767	Remove debug print from CI generation script (NFC)	2024-05-29 22:02:30 -07:00
Mehdi Amini	e4b424afc4	[CI] Disable Flang from pre-commit tests when Flang files are not touched on Windows Only (#93729 ) Flang triggers some OOM on Windows CI right now. This is disruptive to MLIR and LLVM changes that don't touch Flang, so we disable building Flang on Windows only for PRs that don't touch Flang. The testing on Linux is unchanged, and the post-merge Windows testing is still fully covering here.	2024-05-29 16:27:06 -06:00
Lucile Rose Nihlen	d9dec10937	[ci] limit parallel windows compile jobs to 24 (#93329 ) This is an experiment to see if we can prevent some of the compiler OOMs happening without unduly impacting the Windows build latency.	2024-05-28 19:53:21 +00:00
Vlad Serebrennikov	1de1ee9cba	[clang][ci] Move libc++ testing into the main PR pipeline (#93318 ) Following the discussion in https://github.com/llvm/llvm-project/pull/93233#issuecomment-2127920882, this patch merges `clang-ci` pipeline into main `GitHub Pull Requests` pipeline. `clang-ci` enables additional test coverage for Clang by compiling it, and then using it to compile and test libc++, libc++abi, and libunwind in C++03, C++26, and Clang Modules modes. Additional work we skip and total time savings we should see: 1. Checking out the repo to generate the clang-ci pipeline (2 minutes) 2. Building Clang (3.5 minutes) 3. Uploading the artifacts once, then downloading them 3 times and unpacking 3 times (0.5 minutes) Note that because previously-split jobs for each mode are now under a single Linux job, it now takes around 8 minutes more to see the Linux CI results despite total time savings. The primary goal of this patch is to reduce the load of CI by removing duplicated work. I consider this goal achieved. I could keep the job parallelism we had (3 libc++ jobs depending on a main Linux job), but I don't consider it worth the effort and opportunity cost, because parallelism is not helping once the pool of builders is fully subscribed.	2024-05-28 02:25:15 +04:00
Vlad Serebrennikov	243611ed4c	Disable compiling and testing Flang on Clang changes (#92740 ) This patch aims to rectify the Windows CI situation by decoupling Clang changes from Flang test suite, which is causing Windows CI to "pause" for 20 minutes (details can be found [here](https://discourse.llvm.org/t/flang-tests-are-extremely-slow-on-windows/78591/11)). This even seems desirable in the long run, because it was highlighted that the only part of Clang that Flang depends on is Driver ([Discourse post](https://discourse.llvm.org/t/flang-tests-are-extremely-slow-on-windows/78591/14)). Importantly, this patch leaves the question of _entirely_ disabling Flang tests on Windows CI out of scope.	2024-05-22 00:14:45 +04:00
Amir Ayupov	ced8497970	[ci] Add clang project dependency for bolt testing (#90262 )	2024-04-26 22:06:24 +02:00
Amir Ayupov	59bfc31068	[CI] Use trunk Clang in BOLT testing	2024-04-25 20:10:37 -07:00
Fraser Cormack	d0af554464	[CI] Fix libclc dependencies We need clang and llvm to build in-tree.	2024-04-18 07:01:13 +01:00
Marc Auberer	64f0410193	[CI] Hotfix: CI runs failing due to target escaping (#86897 ) My patch #86877 contains a mistake. Should have read the comment. Recent buildkite runs fail because of this, so it is a bit urgent.	2024-03-28 02:03:24 +01:00
Marc Auberer	0a17eedf7b	[CI][NFC] Fix shellcheck warnings in CI scripts (#86877 ) This fixes all shellcheck warnings we have in `monolithic-linux.sh` and `monolithic-windows.sh`. All of them have to do with [SC2086](https://www.shellcheck.net/wiki/SC2086) - Double quote to prevent globbing and word splitting.	2024-03-27 23:53:25 +01:00
Mehdi Amini	d35f944dde	Add missing clang to the monolithic pre-merge build (#85354 ) Clang has a custom separate pipeline integrated with libc++ that only runs in release mode. It means that changes which touch only clang won't run the clang tests in the configuration used by LLVM premerge and will break it unknowingly.	2024-03-14 22:06:45 -07:00
Connor Sughrue	a950c06d98	[CI] Run pre-merge build with -k 0 placed after "${BUILD_DIR}" (#84846 ) #84828 added `-k 0` to pre-merge CI so that if one job fails the others would continue building. This pull request fixes the location of `-k 0` in the ninja command line. Resolves #84842 and #83371	2024-03-11 18:41:50 -04:00
Mehdi Amini	65fd664daf	Run pre-merge build with -k 0 to ensure all tests runs (#84828 ) The -k option allows the build to continue after failures as much as possible. This is useful here because when we run > ninja check-llvm check-clang we would like the clang tests to run even if there is a failure in an llvm test. The downside is that a build failure in one file that would prevent any test from running does not prevent building more targets, potentially wasting build resources. Fixes #83371	2024-03-11 14:00:03 -07:00
Lucile Rose Nihlen	cd4e246616	repair and re-enable Windows buildkite presubmit (#82393 )	2024-02-20 15:30:38 -05:00
Tom Stellard	4ad9f5be83	ci: Temporarily disable the buildkite job on Windows (#81538 ) The failure rate is too high. See https://discourse.llvm.org/t/rfc-future-of-windows-pre-commit-ci/76840	2024-02-13 07:45:55 -08:00
Louis Dionne	5aad789481	[ci] Diff against origin/BASE-BRANCH Otherwise, when the base branch is not something that the CI runner has checked out, that reference to e.g. release/18.x is ambiguous.	2024-01-25 16:48:08 -05:00
Louis Dionne	3b76289182	[ci] Fix the base branch we use to determine changes (#79503 ) We should diff against the base branch, not always against `main`. This allows the BuildKite pre-commit CI to work properly when we target other branches, such as `release/18.x`.	2024-01-25 16:38:53 -05:00
Louis Dionne	5e894771d9	[ci] Remove unused generate-buildkite-pipeline-scheduled script (#79320 ) The "scheduled build" pipeline on BuildKite had been disabled for months and doesn't exist anymore, so this script is effectively dead code. When we set up a cron-activated build again, we should do it using Github actions (which could trigger a BK pipeline if needed). Keeping this script around just creates additional confusion about what's used and what's not used for doing CI.	2024-01-24 17:58:03 +01:00
Louis Dionne	ca8605a78b	[ci] Remove bits that are unused since we stopped using Phabricator	2024-01-24 10:46:34 -05:00
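
The situations listed in commit 1b199d1990 (result files present or not, tests reported failed or not, build return code) can be sketched as a small decision function. This is only an illustration of that table, not the code in the repository: the function name `choose_report`, the style strings `"success"` and `"failure"`, and the note texts are hypothetical.

```python
from typing import Optional, Tuple


def choose_report(
    have_result_files: bool, tests_failed: bool, return_code: int
) -> Tuple[Optional[str], str]:
    """Return (style, note) for the cases described in commit 1b199d1990.

    Hypothetical helper: the names and strings here are assumptions made
    for illustration and are not taken from the actual report generator.
    """
    if not have_result_files:
        # No result files: no test report is produced, and the normal
        # build display shows success or failure on its own.
        return None, "no test report"
    if tests_failed:
        # Failures are shown regardless of the return code (a return code
        # of 0 with failed tests should not happen, but is handled).
        return "failure", "show the test failures"
    if return_code != 0:
        # All reported tests passed, yet the build itself failed.
        return "failure", "no test failures, but the build failed"
    return "success", "all reported tests passed"


if __name__ == "__main__":
    # Walk every combination of the three inputs.
    for files in (True, False):
        for failed in (False, True):
            for rc in (0, 1):
                print(files, failed, rc, choose_report(files, failed, rc))
```

Checking the test failures before the return code keeps the more specific information (which tests failed) in the report whenever it is available.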


64 Commits