llvm-project/offload/test/offloading/memory_manager.cpp

[OpenMP] Introduce target memory manager

This patch introduces a target memory manager so that target memory is no longer freed the moment it becomes unused, because the overhead of device memory allocation and deallocation is very high. On CUDA devices, cuMemFree even blocks context switches on the device, which hurts concurrent kernel execution.

The memory manager can be viewed as a memory pool. It divides the pool into multiple buckets by allocation size, so that allocations and frees routed to different buckets do not interfere with each other. This version uses an exact-size-match policy to find a free buffer. Whether best-fit would work better is an open question; best-fit is arguably a poor choice for target memory management because GPU computations usually involve gigabytes of data, so it can waste memory badly. For example, given a free buffer of 1960 MB and a request for 1200 MB, best-fit would hand back the free buffer and waste 760 MB.
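A minimal sketch of this bucketing scheme (illustrative only, not the actual libomptarget code: MemoryPool, deviceAllocate, and deviceFree are hypothetical names, and the real manager tracks the size of each outstanding pointer itself rather than taking it from the caller):

#include <cstddef>
#include <map>
#include <mutex>
#include <vector>

// Hypothetical plugin hooks; the real manager calls into the target plugin.
void *deviceAllocate(std::size_t Size);
void deviceFree(void *Ptr);

class MemoryPool {
  // One bucket per allocation size. Lookups require an exact size match, so a
  // request never reuses a larger cached buffer, avoiding the best-fit waste
  // described above.
  std::map<std::size_t, std::vector<void *>> FreeBuckets;
  std::mutex Mtx;

public:
  void *allocate(std::size_t Size) {
    {
      std::lock_guard<std::mutex> G(Mtx);
      auto It = FreeBuckets.find(Size);
      if (It != FreeBuckets.end() && !It->second.empty()) {
        void *Ptr = It->second.back();
        It->second.pop_back();
        return Ptr; // Reuse a previously released buffer of the same size.
      }
    }
    return deviceAllocate(Size); // Nothing cached: allocate fresh memory.
  }

  // "Freeing" only returns the buffer to its bucket; device memory stays live.
  void release(std::size_t Size, void *Ptr) {
    std::lock_guard<std::mutex> G(Mtx);
    FreeBuckets[Size].push_back(Ptr);
  }
};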
A fresh device allocation happens only when no cached buffer fits, and device memory is actually freed in two cases:

1. The program ends. One wrinkle here is that the plugin library is destroyed before the memory manager, so the manager's final calls into the target plugin will not succeed.
2. The device runs out of memory while serving a new request. The manager then walks the free buffers starting from the bucket with the largest base size, frees one buffer, and immediately retries the allocation; if the retry succeeds, it returns right away rather than draining the entire free list. A sketch of this fallback follows.
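The out-of-memory fallback in case 2 could look like the following method, extending the MemoryPool sketch above (again an assumption-laden illustration, not the real implementation):

  // When the device is out of memory, free one cached buffer at a time,
  // starting from the largest bucket, and retry the allocation after each.
  void *allocateWithFallback(std::size_t Size) {
    if (void *Ptr = allocate(Size))
      return Ptr; // Either a cached buffer or a successful fresh allocation.
    std::lock_guard<std::mutex> G(Mtx);
    // Walk the buckets from the largest base size downward.
    for (auto It = FreeBuckets.rbegin(); It != FreeBuckets.rend(); ++It) {
      while (!It->second.empty()) {
        deviceFree(It->second.back()); // Genuinely release device memory.
        It->second.pop_back();
        if (void *Ptr = deviceAllocate(Size))
          return Ptr; // Retry succeeded: stop freeing immediately.
      }
    }
    return nullptr; // Still out of memory after draining every bucket.
  }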
Update: a threshold (8 KB by default) controls what sizes of memory the manager handles. It can also be configured through the environment variable LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD.

Reviewed By: jdoerfert, ye-luo, JonChesterfield
Differential Revision: https://reviews.llvm.org/D81054
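The threshold check might sit in front of the pool roughly like this; the 8 KB default comes from the commit message, while the exact policy (requests larger than the threshold bypass the pool entirely) is an assumption of this sketch:

#include <cstdlib>

// Parse LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD once; default to 8 KB.
static std::size_t managerThreshold() {
  static const std::size_t Threshold = [] {
    if (const char *Env = std::getenv("LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD"))
      return static_cast<std::size_t>(std::strtoull(Env, nullptr, 10));
    return static_cast<std::size_t>(1 << 13); // 8 KB default.
  }();
  return Threshold;
}

void *targetAlloc(MemoryPool &Pool, std::size_t Size) {
  if (Size > managerThreshold()) // Assumed policy: large requests skip the pool.
    return deviceAllocate(Size);
  return Pool.allocateWithFallback(Size);
}

The test below exercises this machinery by allocating and freeing device buffers of many different sizes from concurrent host threads, then verifying that data written through reused allocations round-trips correctly.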
// RUN: %libomptarget-compilexx-run-and-check-generic
// REQUIRES: nvidiagpu
#include <omp.h>

#include <cassert>
#include <iostream>

int main(int argc, char *argv[]) {
  // Hammer the allocator: 16 host threads each allocate and immediately free
  // buffers of doubling sizes, letting the memory manager cache the freed ones.
#pragma omp parallel for
  for (int i = 0; i < 16; ++i) {
    for (int n = 1; n < (1 << 13); n <<= 1) {
      void *p = omp_target_alloc(n * sizeof(int), 0);
      omp_target_free(p, 0);
    }
  }

  // Same allocation pattern, but now each buffer is written on the device,
  // copied back, and checked, so reused buffers must carry correct data.
#pragma omp parallel for
  for (int i = 0; i < 16; ++i) {
    for (int n = 1; n < (1 << 13); n <<= 1) {
      int *p = (int *)omp_target_alloc(n * sizeof(int), 0);

#pragma omp target teams distribute parallel for is_device_ptr(p)
      for (int j = 0; j < n; ++j) {
        p[j] = i;
      }

      int buffer[n];
#pragma omp target teams distribute parallel for is_device_ptr(p)             \
    map(from : buffer)
      for (int j = 0; j < n; ++j) {
        buffer[j] = p[j];
      }

      for (int j = 0; j < n; ++j) {
        assert(buffer[j] == i);
      }

      omp_target_free(p, 0);
    }
  }

  std::cout << "PASS\n";

  return 0;
}
// CHECK: PASS