Before this commit, we only pushed a queue/running count when the value
was not zero. This made building Grafana alerting a bit harder.
This changes it to always upload a value for watched workflows.
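A minimal sketch of the idea, with illustrative metric and variable names (the container's actual identifiers may differ):

```python
import time

def sample_counts(queued_count: int, running_count: int):
    now_ns = time.time_ns()
    # Always emit a sample, even when the value is 0, so the Grafana
    # series stays continuous and alert rules can evaluate it.
    return [
        ("workflow_queue_size", queued_count, now_ns),
        ("running_workflow_count", running_count, now_ns),
    ]
```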
This patch uses an env variable instead of the --break-system-packages
flag. This keeps the configuration uniform across the old and new
premerge systems, as the old premerge container does not recognize the
--break-system-packages flag. An env variable works on the new premerge
system and has no impact on the old one.
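A sketch of the mechanism, with an illustrative requirements-file path: pip maps its long options to `PIP_*` environment variables, so the variable below is equivalent to passing --break-system-packages on new pip, while old pip ignores the unknown variable.

```python
import os
import subprocess

# Equivalent to `pip install --break-system-packages ...` on new pip;
# old pip simply ignores the unrecognized environment variable.
env = dict(os.environ, PIP_BREAK_SYSTEM_PACKAGES="1")
subprocess.run(["pip", "install", "-r", "requirements.txt"], env=env, check=True)
```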
This reverts commit 23fb048ce35f672d8db3f466a2522354bbce66e5.
This broke the new premerge system as it appears the pip installations within
the CI image do not support this option. Buildkite was unaffected.
These changes are mostly pushed by the gnsyncbot directly to main and
thus don't go through a PR, but we still test on main to see if main is
broken. Given these touch llvm/, they end up burning a decent amount of
testing time for no real benefit, so I think it makes sense to exclude
them from premerge testing explicitly.
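A minimal sketch of such an exclusion, assuming a hypothetical prefix list; the real premerge configuration may express this differently:

```python
EXCLUDED_PREFIXES = ("llvm/utils/gn/",)

def affects_premerge_testing(modified_file: str) -> bool:
    # GN build files are synced by gnsyncbot and cannot break LLVM tests.
    return not modified_file.startswith(EXCLUDED_PREFIXES)
```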
This patch fixes the monolithic linux build on Ubuntu 24.04. Newer
versions of Debian/Ubuntu emit a warning when installing packages at
the system level using pip, as doing so can interfere with Python
packages installed through the system package manager. We do not use
any Python packages installed through the system package manager, so we
simply suppress the warning (which is otherwise promoted to an error)
by passing the --break-system-packages flag.
This patch adds rich test failure information to the Github output,
using the same library that is used for the buildkite pipeline.
Eventually I think we want to add more information like reproduction
information using the containers, but that is very divergent between
Github and Buildkite, so we probably want to wait until we've switched
over before doing that.
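A hedged sketch of how this can look on the GitHub side: GITHUB_STEP_SUMMARY is the standard file GitHub Actions provides for rich Markdown output, while the library function name and the junit glob here are assumptions.

```python
import glob
import os

import generate_test_report_lib  # the shared library mentioned above

junit_files = glob.glob("build/test-results/*.xml")  # illustrative path
report = generate_test_report_lib.generate_report("Premerge", junit_files)
with open(os.environ["GITHUB_STEP_SUMMARY"], "a") as summary:
    summary.write(report)
```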
Reviewers: Keenuts, tstellar, lnihlen, DavidSpickett
Reviewed By: DavidSpickett, Keenuts
Pull Request: https://github.com/llvm/llvm-project/pull/133197
Currently, when someone touches a docs directory in a subproject, it is
treated as if the source code of that project got touched, so the
project is built and tested, along with all of its enumerated
dependents. This is wasteful, particularly for patches that only touch
docs in places like LLVM, where we might spend an hour of node time
doing nothing useful: changes to the docs shouldn't cause test
failures, and there is already another workflow that checks the
documentation build completes successfully.
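A minimal sketch of the exclusion, assuming hypothetical helper names inside compute_projects.py:

```python
def _is_docs_only(modified_file: str) -> bool:
    # e.g. "llvm/docs/LangRef.rst" -> the second path component is "docs".
    parts = modified_file.split("/")
    return len(parts) > 1 and parts[1] == "docs"

def projects_touched(modified_files: list[str]) -> set[str]:
    projects = set()
    for modified_file in modified_files:
        if _is_docs_only(modified_file):
            continue  # Docs changes cannot break tests; skip them entirely.
        projects.add(modified_file.split("/")[0])
    return projects
```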
Reviewers: Keenuts, tstellar, lnihlen
Reviewed By: tstellar
Pull Request: https://github.com/llvm/llvm-project/pull/133185
This patch migrates the CI over to the new compute_projects.py script
for calculating what projects need to be tested based on a change to
LLVM.
Reviewers: lnihlen, ldionne, tstellar, Endilll, joker-eph, Keenuts
Reviewed By: Keenuts, tstellar
Pull Request: https://github.com/llvm/llvm-project/pull/132642
This patch refactors the generate_test_report script, turning it into a
proper library and pulling the script/unittests out into separate
files, as is standard for most Python projects. The main purpose of
this is to enable reusing the library for the new Github premerge.
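A sketch of the resulting layout, with assumed file and function names: the report construction lives in a library module that both Buildkite and GitHub premerge can import, and the Buildkite entry point shrinks to a thin CLI wrapper.

```python
# generate_test_report_buildkite.py (illustrative)
import argparse

import generate_test_report_lib

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("title")
    parser.add_argument("return_code", type=int)
    parser.add_argument("junit_files", nargs="*")
    args = parser.parse_args()
    print(
        generate_test_report_lib.generate_report(
            args.title, args.return_code, args.junit_files
        )
    )
```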
Reviewers: tstellar, DavidSpickett, Keenuts, lnihlen
Reviewed By: DavidSpickett
Pull Request: https://github.com/llvm/llvm-project/pull/133196
Before this patch, making a change to a runtime directory (like libcxx)
would cause the project to be added to the LLVM_ENABLE_PROJECTS CMake
flag, which is invalid, as runtimes can only be built as part of
LLVM_ENABLE_RUNTIMES. This patch fixes that behavior. Test added.
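A minimal sketch of the corrected routing, with an assumed runtimes list:

```python
RUNTIMES = {"libcxx", "libcxxabi", "libunwind", "compiler-rt"}

def add_to_build(name: str, projects: set, runtimes: set) -> None:
    # libcxx and friends may only appear in LLVM_ENABLE_RUNTIMES, never
    # in LLVM_ENABLE_PROJECTS, so route them to the runtimes list.
    (runtimes if name in RUNTIMES else projects).add(name)
```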
This patch adds a Python script, compute_projects, and associated unit
tests for computing the projects and runtimes that need to be tested in
premerge. Rewriting in Python opens up a couple of new
improvements/opportunities:
1. I personally find Python much easier to work with than shell
scripts for tasks like this. In particular, proper array support makes
working with paths much easier.
2. Unit testing becomes easier, which makes it a lot easier to reason
about behavior changes, especially in review.
3. Most of the configuration is now set up in dictionaries, which
makes the most common changes much easier to apply (sketched below).
This preserves the behavior of the existing premerge scripts as much as
possible.
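A hedged sketch of the dictionary-driven configuration; the real mapping in compute_projects.py is larger and the entries here are illustrative:

```python
PROJECT_DEPENDENTS = {
    "llvm": {"clang", "lld", "mlir"},
    "clang": {"clang-tools-extra", "lldb"},
}

def dependents_of(project: str) -> set[str]:
    # Adding or removing a dependency edge is a one-line dictionary change.
    return PROJECT_DEPENDENTS.get(project, set())
```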
Reviewers: ldionne, lnihlen, Endilll, tstellar, Keenuts
Reviewed By: Keenuts
Pull Request: https://github.com/llvm/llvm-project/pull/132634
Between the C++26 and Clang modules builds of the runtimes, which are
used as additional testing for Clang, the only differences are
`LIBCXX_TEST_PARAMS` and `LIBCXXABI_TEST_PARAMS`. Both of them are
transformed into actual lit configuration lines and put into
`SERIALIZED_LIT_PARAMS`, which ends up in `libcxx/test/cmake-bridge.cfg`
via the `configure_file` command. Notably, it seems that they are not
used in any other way.
I checked that if those variables are changed, subsequent runs of CMake
configuration step regenerate `cmake-bridge.cfg` with the new values.
This means that we don't need to do clean builds for every runtimes
configuration we want to test.
I hope that together with #131913, this will be enough to alleviate
Linux CI pains we're having, and we wouldn't have to make a tough choice
between C++26 and Clang modules builds for pre-merge CI.
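A sketch of what this enables, with illustrative paths and parameter values: re-running only the configure step swaps the test params, and `configure_file` regenerates `cmake-bridge.cfg` in the same build tree.

```python
import subprocess

def configure(build_dir: str, test_params: str) -> None:
    subprocess.run(
        [
            "cmake",
            "-B", build_dir,
            f"-DLIBCXX_TEST_PARAMS={test_params}",
            f"-DLIBCXXABI_TEST_PARAMS={test_params}",
        ],
        check=True,
    )

# The same build directory can test C++26 first and Clang modules after,
# with no clean build in between.
configure("build/runtimes", "std=c++26")
configure("build/runtimes", "enable_modules=clang")
```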
This patch cleans up the runtimes build in premerge due to queuing
delays, dropping the C++03 testing, but keeping the C++20 and Modules
configurations as they are deemed important by clang contributors.
This patch also makes it easier to rework the runtimes build in the
future, anticipating the deprecation of building most of the runtimes
with LLVM_ENABLE_PROJECTS.
This reverts commit 95d28fe503cc3d2bc0bb980442d3defaf199ea5a.
I did not fully realize the implications of this change when reviewing.
With how it is set up currently, it causes clang and all of the runtimes
to be built and tested every time a change to MLIR is made. This is a
large regression in build/test time, which seems to have been causing
large queueing delays.
Reverting for now. Once we rework the runtimes build for premerge (which
I hope to do soon, ideally in the next week), I will make sure flang-rt
gets added in.
This patch bumps the maximum number of workflows to look through when
collecting metrics data. We are currently running into issues where we
are losing data due to the most recent 1000 workflows not containing the
workflows that we actually need to query. Just double it for now.
I plan on monitoring this reasonably closely to ensure we do not run
into issues, mainly API rate limits.
The current container focuses on Github metrics. Before deprecating
BuildKite, we want to make sure the new infra quality is better, or at
least the same.
Being able to compare buildkite metrics with github metrics on grafana
will allow us to easily present the comparison.
The BuildKite API allows filtering, but doesn't allow changing the
result ordering, meaning we are left with builds ordered by ID. As a
result, a completed job can appear before a running job in the list.
Two solutions from there:
- keep the cursor on the oldest running workflow
- keep a list of running workflows to compare.
Because there are no guarantees on workflow ordering, waiting for the
oldest build to complete before reporting any newer build could mean
delaying the more recent build's completion report by a few hours. And
because Grafana cannot ingest metrics older than 2 hours, this is not
an option.
Thus we are left with the second solution: remember which jobs were running
during the last iteration, and record them as soon as they are
completed. Buildkite has at most ~100 pending jobs, so keeping all those
IDs should be OK.
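A minimal sketch of that bookkeeping, with illustrative names and states:

```python
running_builds: set[str] = set()

def report_build(build: dict) -> None:
    print("completed:", build["id"])  # placeholder for the Grafana push

def process_builds(builds: list[dict]) -> None:
    global running_builds
    still_running = set()
    for build in builds:
        if build["state"] in ("running", "scheduled"):
            still_running.add(build["id"])
        elif build["id"] in running_builds:
            # Completed since the last iteration: safe to report now.
            report_build(build)
    running_builds = still_running  # stays small: ~100 pending at most
```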
Flang's runtime can now be built using LLVM's LLVM_ENABLE_RUNTIMES
mechanism, with the intent to remove the old mechanism in #124126.
Update the pre-merge builders to use the new mechanism.
In the current form, #124126 actually will add
LLVM_ENABLE_RUNTIMES=flang-rt implicitly, so no change is strictly
needed. I still think it is a good idea to do it explicitly and in
advance.
On Windows, flang-rt also requires compiler-rt, but flang-rt is not
building on Windows anyway.
Yesterday, the monitoring reported a job queued for 23h59. After some
checks, it appears no such job existed: the age of the workflows on
completion was at most 5 hours during the last 48 hours.
After some digging, I found out GitHub could return a job with a start
date slightly before the creation date, or completion date before start
date.
This would cause python to compute a negative timedelta, which would
then be reported in grafana as a full 24h delta due to the conversions.
This adds code to ignore negative deltas, but log them.
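A hedged sketch of the guard, with illustrative names:

```python
import logging

def job_timings(created_at, started_at, completed_at):
    """Return (queue_seconds, run_seconds), or None for inconsistent dates."""
    queue_time = (started_at - created_at).total_seconds()
    run_time = (completed_at - started_at).total_seconds()
    if queue_time < 0 or run_time < 0:
        # A negative timedelta would wrap around to ~24h after the
        # conversions, so drop the sample but keep a trace of it.
        logging.warning("Negative timing (%s, %s); ignoring.", queue_time, run_time)
        return None
    return queue_time, run_time
```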
Before this patch, the job/workflow name impacted the metric name,
meaning a change in the workflow definition could break monitoring. This
patch adds a map to derive a stable metric name from a workflow name.
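A minimal sketch of the map; the actual entries in the metrics container may differ:

```python
GITHUB_WORKFLOW_TO_TRACK = {
    "LLVM Premerge Checks": "github_llvm_premerge_checks",
}

def metric_name_for(workflow_name: str):
    # Renaming the workflow in its YAML no longer renames the Grafana series.
    return GITHUB_WORKFLOW_TO_TRACK.get(workflow_name)
```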
In addition, it reworks a bit how we track the last processed workflow:
the github queries are broken if filtering is applied, meaning we have a
list of workflows, ordered by 'created_at', which mixes completed &
running workflows.
We have no guarantees over the order of completion, meaning we cannot
stop at the first completed job we find (even per-workflow).
This PR processes the last 1000 workflows, but allows an early stop if
the created_at time is older than 8 hours. This means we could miss
long-running workflows (>8 hours), and if the number of workflows
started before another one completes becomes high (>1000), we'll miss
it.
To detect this kind of behavior, a new metric is added "oldest workflow
processed", which should at least indicate if the depth is too small.
An alternative without an arbitrary cutoff would be to initially parse
all workflows, record the oldest non-completed one we find, and always
start from it (moving the lower bound as workflows complete). But LLVM
has forever-queued workflow runs (>1 year old), hence this would cause
us to iterate over a very large number of jobs.
---------
Signed-off-by: Nathan Gauër <brioche@google.com>
The current container focuses on Github metrics. Before deprecating
BuildKite, we want to make sure the new infra quality is better, or at
least the same.
Being able to compare buildkite metrics with github metrics on grafana
will allow us to easily present the comparison.
This PR requires https://github.com/llvm/llvm-zorg/pull/400 to be merged
first.
This patch adds some logging information for individual workflow jobs inside
the metrics container. This is mainly intended for debugging why we seem to be
missing metrics from some workflows within Grafana.
This patch makes the metrics container use the python logging library. This
is more of what we want given we're essentially just logging the status of
things. It also means we do not have to explicitly specify an output file
and lets us control verbosity a bit more cleanly.
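A minimal sketch of the switch; the format and level chosen by the container are not specified here:

```python
import logging

logging.basicConfig(
    level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s"
)
logging.info("Uploaded %d metrics to Grafana.", 42)
```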
The metrics script includes some logic to only look at workflows up
to the most recent workflow it has seen previously. This was broken in a
previous patch when workflow metrics began to be emitted per job. The
logic ending the metrics gathering would never trigger, so we would
continually fetch more and more workflows until OOM.
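A hedged sketch of a fixed termination condition, comparing workflow run IDs rather than per-job entries; names are illustrative:

```python
def new_runs_since(runs, last_seen_id):
    new_runs = []
    for run in runs:
        if run.id == last_seen_id:
            break  # Everything past this point was processed last pass.
        new_runs.append(run)
    return new_runs
```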
This patch makes it so that skipped steps do not cause a job to be
considered failed. The windows premerge jobs currently skip the
build/test step if there are no projects to build/test. These show up as
failures in the dashboard even though everything executed perfectly
fine.
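A minimal sketch of the check, assuming step objects with a `conclusion` field:

```python
def job_failed(steps) -> bool:
    # "skipped" (e.g. nothing to build/test on Windows) no longer counts;
    # only a step that actually failed marks the job as failed.
    return any(step.conclusion == "failure" for step in steps)
```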
Reviewers: lnihlen, Keenuts
Reviewed By: lnihlen
Pull Request: https://github.com/llvm/llvm-project/pull/127279
Currently the metrics container is crashing reasonably often with
incomplete read/connection broken errors. Try moving the creation of the
Github object into the main loop to see if recreating the object, which
may hold some connection state, fixes the issue.
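A hedged sketch of the change using PyGithub; the sampling interval and the per-iteration work are illustrative:

```python
import os
import time

import github  # PyGithub

SAMPLING_INTERVAL_SECONDS = 300

while True:
    # Recreate the client each iteration so any stale connection state
    # from a previous incomplete read is discarded.
    gh = github.Github(auth=github.Auth.Token(os.environ["GITHUB_TOKEN"]))
    repo = gh.get_repo("llvm/llvm-project")
    # ... collect and upload metrics using `repo` ...
    time.sleep(SAMPLING_INTERVAL_SECONDS)
```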
Reviewers: Keenuts, lnihlen
Reviewed By: lnihlen
Pull Request: https://github.com/llvm/llvm-project/pull/127276
This patch removes an extra heartbeat metric in the metrics python file. Before
it was performed twice, once in the main function, and once in the
get_sampled_workflow_metrics function. We only need one to keep
everything happy, and I've chosen to keep the one in
get_sampled_workflow_metrics as it seems the more appropriate place.
Reviewers: Keenuts, lnihlen
Reviewed By: lnihlen
Pull Request: https://github.com/llvm/llvm-project/pull/127275
This patch makes it so that the metrics container counts the number of in
progress and queued jobs at the job level rather than at the workflow
level. This helps us distinguish windows versus linux load and also lets
us filter out the MacOS jobs that only run in the release branch.
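A hedged sketch of the job-level counting, given a PyGithub Repository; the MacOS filtering and the bound on how many runs to scan are elided:

```python
def count_active_jobs(repo):
    """Count queued/running jobs across active runs of a PyGithub Repository."""
    queued_count = running_count = 0
    for run in repo.get_workflow_runs():
        if run.status not in ("queued", "in_progress"):
            continue
        for job in run.jobs():
            # Counting at the job level separates windows and linux load.
            if job.status == "queued":
                queued_count += 1
            elif job.status == "in_progress":
                running_count += 1
    return queued_count, running_count
```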
Reviewers: Keenuts, lnihlen
Reviewed By: lnihlen
Pull Request: https://github.com/llvm/llvm-project/pull/127274
This patch makes it so that the metrics script can support multiple jobs
in a single workflow. This is needed so that we do not crash on an
assertion now that the windows job has been enabled within the premerge
workflow.
This patch makes it so that the caller of monolithic-windows.sh can set
the maximum number of parallel compile/link jobs in an environment
variable rather than manually specifying it inside the CMake invocation.
Additionally, the env variable definitions for CC, CXX, and LD are sunk
into the shell script due to those config options being pretty inherent
to what the pipeline is testing.
This is intended to make things more flexible/useable for the new
premerge CI pipeline, particularly as we are looking at using larger
runners and want the increased flexibility to experiment.
This commit allows the container to report 3 additional metrics at
every sampling event:
- a heartbeat
- the size of the workflow queue (filtered)
- the number of running workflows (filtered)
The heartbeat is a simple metric allowing us to monitor the metrics
health. Before this commit, a new metric was pushed only when a
workflow was completed. This meant we had to wait a few hours
before noticing if the metrics container was unable to push metrics.
In addition to this, this commit adds a sampling of the workflow
queue size and running count. This should allow us to better understand
the load, and improve the autoscale values we pick for the cluster.
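A minimal sketch of the heartbeat sample; the metric name and tuple shape are assumptions following the pattern above:

```python
import time

def heartbeat_sample():
    # A constant 1 pushed on every sampling event: if this series goes
    # stale, the metrics container itself is the problem.
    return ("metrics_container_heartbeat", 1, time.time_ns())
```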
---------
Signed-off-by: Nathan Gauër <brioche@google.com>
In this build:
https://buildkite.com/llvm-project/github-pull-requests/builds/126961
The builds actually failed, probably because a prerequisite of a test
suite failed to build.
However they still ran other tests and all those passed. This meant that
the test reports were green even though the build was red. On some level
this is technically correct, but it is very misleading in practice.
So I've also passed the build script's return code, as it was when we
entered the on exit handler, to the generator, so that when this happens
again, the report will draw the viewer's attention to the overall
failure. There will be a link in the report to the build's log file, so
the next step to investigate is clear.
It would be nice to say "tests failed and there was some other build
error", but we cannot tell what the non-zero return code was caused by.
Could be either.
The script handles the following situations now:
| Have Result Files? | Tests reported failed? | Return code | Report |
|--------------------|------------------------|-------------|--------|
| Yes                | No                     | 0           | Success style report. |
| Yes                | Yes                    | 0           | Shouldn't happen, but if it did, failure style report showing the failures. |
| Yes                | No                     | 1           | Failure style report, showing no failures but noting that the build failed. |
| Yes                | Yes                    | 1           | Failure style report, showing the test failures. |
| No                 | ?                      | 0           | No test report, success shown in the normal build display. |
| No                 | ?                      | 1           | No test report, failure shown in the normal build display. |
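A minimal sketch implementing the table, where the style strings follow Buildkite's annotation styles:

```python
def choose_report(have_result_files: bool, failures: int, return_code: int):
    if not have_result_files:
        # No annotation at all; the normal build display shows pass/fail.
        return None
    if return_code == 0 and failures == 0:
        return "success"
    # Test failures, a failed build step, or both: draw attention to it.
    return "error"
```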
This patch makes the metrics job also detect failures in individual
steps. This is necessary now that we are setting continue-on-error in
the premerge jobs to prevent sending out unnecessary emails; we now
have to look at individual steps to detect which jobs actually fail.
LLVM Premerge Checks is running on the new GCP cluster. Tracking its
metrics will allow us to determine the stability of the presubmit and
make sure the new infra is working as intended.
---------
Signed-off-by: Nathan Gauër <brioche@google.com>
This patch modifies the monolithic shell scripts to only invoke the
buildkite-agent application if it is present. This allows for running
the scripts to completion outside of buildkite (e.g. inside of a GHA
pipeline).
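A Python rendering of the guard for illustration; the shell scripts themselves can do the equivalent with `command -v buildkite-agent`:

```python
import shutil

def under_buildkite() -> bool:
    # Skip agent-specific steps when the binary is absent, so the same
    # script runs to completion inside a GHA pipeline.
    return shutil.which("buildkite-agent") is not None
```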
This patch refactors some common functionality present in the CI scripts
to a separate shell script. This is mainly intended to make it easier to
reuse this functionality inside of a Github Actions pipeline as we make
the switch.
This patch includes the script that pulls information from Github and
pushes it to Grafana. This is currently running in the cluster and
pushes information to
https://llvm.grafana.net/public-dashboards/6a1c1969b6794e0a8ee5d494c72ce2cd.
This script is designed to accept other jobs relatively easily and can
be modified to track other metrics with little effort.
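A hedged sketch of the upload path: Grafana Cloud accepts InfluxDB line protocol over HTTP, but the endpoint and credential handling below are illustrative, not the container's actual configuration.

```python
import requests

def upload_metric(name: str, value: int, timestamp_ns: int, user: str, key: str):
    response = requests.post(
        "https://influx.example.grafana.net/api/v1/push/influx/write",
        headers={"Content-Type": "text/plain"},
        data=f"{name} value={value} {timestamp_ns}",
        auth=(user, key),
    )
    response.raise_for_status()
```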
This resulted in the style being None, and although the report was
empty as well, we still tried to send it to the agent, and Python
can't pass None as an argument.
To fix this, return "success" style and also check whether the report
has any content before calling the agent.
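A minimal sketch of the fix; the failure marker checked here is an assumption:

```python
def get_style(report: str) -> str:
    # An empty report previously fell through with a None style, which
    # could not be passed to the agent as an argument.
    if not report:
        return "success"
    return "error" if ":x:" in report else "success"

def maybe_send(report: str, send_to_agent) -> None:
    if report:  # Only invoke the agent when there is content to show.
        send_to_agent(report, get_style(report))
```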
This reverts commit 8a1ca6cad9cd0e972c322910cdfbbe9552c6c7ca.
I have fixed 2 things:
* The report is now sent via stdin, so we do not hit the limit on the
size of command-line arguments (see the sketch below).
* The report is limited to 1MB in size; if we exceed that, we fall back
to listing only the totals, with a note telling you to check the full log.
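A hedged sketch of both fixes together; the truncation message wording is illustrative:

```python
import subprocess

MAX_ANNOTATION_BYTES = 1024 * 1024

def annotate(report: str, style: str) -> None:
    body = report.encode("utf-8")
    if len(body) > MAX_ANNOTATION_BYTES:
        # Fall back to totals only, pointing the reader at the full log.
        body = b"Report too large; see the full build log for details."
    # Stdin avoids the OS limit on command-line argument size.
    subprocess.run(
        ["buildkite-agent", "annotate", "--style", style],
        input=body,
        check=True,
    )
```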
This reverts commit e74a002433b4cf7f891ceedb61bd862867218a8b.
As it is failing on Linux with "OSError: [Errno 7] Argument list too long: 'buildkite-agent'".
The CI builds now send the results of every lit run to a unique file.
This means we can read them all to make a combined report for all
tests.
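A hedged sketch of the aggregation, assuming the junitparser package and an illustrative glob:

```python
import glob

from junitparser import JUnitXml

combined = JUnitXml()
for path in glob.glob("build/test-results/*.xml"):
    combined += JUnitXml.fromfile(path)
# `combined` now covers every lit run and feeds the report described below.
```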
This report will be shown as an "annotation" in the build results:
https://buildkite.com/docs/agent/v3/cli-annotate#creating-an-annotation
Here is an example:
https://buildkite.com/llvm-project/github-pull-requests/builds/112660
(make sure it is showing "All" instead of "Failures")
This is an alternative to using the existing Buildkite plugin:
https://github.com/buildkite-plugins/junit-annotate-buildkite-plugin
As the plugin is:
* Specific to Buildkite, and we may move away from Buildkite.
* Requires docker, unless we were to fork it ourselves.
* Does not let you customise the report format unless, again,
we make our own fork.
Annotations use GitHub's flavour of Markdown so the main code in the
script generates that text. There is an extra "style" argument generated
to make the formatting nicer in Buildkite.
"context" is the name of the annotation that will be created. By using
different context names for Linux and Windows results we get 2 separate
annotations.
The script also handles calling the buildkite-agent. This makes passing
extra arguments to the agent easier, rather than piping the output of
this script into the agent.
In the future we can remove the agent part of it and simply use
the report content. Either printed to stdout or as a comment on
the GitHub PR.