Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [X] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [X] I carefully followed the README.md.
- [X] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [X] I reviewed the Discussions, and have a new bug or useful enhancement to share.
Feature Description
As of today, ggml-cuda.cu tries to take advantage of CUDA VMM when possible:
https://github.com/ggerganov/llama.cpp/blob/784e11dea1f5ce9638851b2b0dddb107e2a609c8/ggml-cuda.cu#L116
This is not necessarily possible/desired, e.g.:
https://forums.developer.nvidia.com/t/potential-nvshmem-allocated-memory-performance-issue/275416/4
Would you mind if I added a cmake/make option/define to build ggml-cuda with (default) or without VMM support?
Best
WT
Motivation
To build llama.cpp/ggml without VMM.
Possible Implementation
Add a CMake option `GGML_USE_CUDA_VMM` (default `ON`); a minimal sketch follows.
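On the CMake side, something along these lines might work (the option name comes from this proposal; the `add_compile_definitions` wiring is my assumption, not existing build code):

```cmake
# Hypothetical option, default ON to preserve current behavior
option(GGML_USE_CUDA_VMM "ggml: enable CUDA virtual memory management" ON)

if (GGML_USE_CUDA_VMM)
    # forward the setting to the compiler (assumed wiring)
    add_compile_definitions(GGML_USE_CUDA_VMM)
endif()
```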
Then guard the VMM check at line 115:
```cpp
#if !defined(GGML_USE_HIPBLAS) && defined(GGML_USE_CUDA_VMM)
    // query VMM support only when it is enabled at build time
    CUdevice device;
    CU_CHECK(cuDeviceGet(&device, id));
    CU_CHECK(cuDeviceGetAttribute(&device_vmm, CU_DEVICE_ATTRIBUTE_VIRTUAL_MEMORY_MANAGEMENT_SUPPORTED, device));
    ...
```
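With both pieces in place, a build without VMM would be a matter of, e.g.:

```
cmake .. -DGGML_USE_CUDA_VMM=OFF
```

Since `device_vmm` defaults to 0 when this check is compiled out, the existing non-VMM allocation path should be used automatically.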