macOS: queue for munmap operations #1993
Executing many mmap/munmap calls alternately can cause a huge load on
macOS. In order to reduce it, we should temporarily store munmap
operations in a queue and process them all at once when the queue is
filled. When the program terminates, we can discard any remaining munmap
operations as corresponding mmaped regions are automatically reclaimed.
Add a queue for munmap operations to perform them all at once.
Here are some example timings. On the Linux kernel repository that
requires about 1700 mmap/munmap calls:
time git ls-tree -r -l --full-tree 211ddde > /dev/null
Before:
real 0m2.083s
user 0m0.201s
sys 0m1.873s
After:
real 0m0.243s
user 0m0.179s
sys 0m0.052s
On a private repository that requires about 943000 mmap/munmap calls:
time git ls-tree -r -l --full-tree xxxxxxx > /dev/null
Before:
real 27m15.138s
user 0m5.084s
sys 27m9.636s
After:
real 0m24.209s
user 0m3.055s
sys 0m21.123s
Signed-off-by: Koji Nakamaru <[email protected]>
On the Git mailing list, Torsten Bögershausen wrote:
Some comments inline, all up to improvements
On Mon, Oct 20, 2025 at 10:35:02PM +0000, Koji Nakamaru via GitGitGadget wrote:
> From: Koji Nakamaru <[email protected]>
>
> Executing many mmap/munmap calls alternately can cause a huge load on
> macOS. In order to reduce it, we should temporarily store munmap
> operations in a queue and process them all at once when the queue is
> filled. When the program terminates, we can discard any remaining munmap
> operations as corresponding mmaped regions are automatically reclaimed.
>
> Add a queue for munmap operations to perform them all at once.
>
Suggestions for rewording:
In order to reduce the peak load store all munmap operations in a queue.
Process them all at once (and more efficient) when the queue is filled.
The queue may be ignored when the git process terminates. The operating
system will do all munmap() when the process exits.
> Here are some example timings. On the Linux kernel repository that
> requires about 1700 mmap/munmap calls:
>
> time git ls-tree -r -l --full-tree 211ddde > /dev/null
>
> Before:
> real 0m2.083s
> user 0m0.201s
> sys 0m1.873s
>
> After:
> real 0m0.243s
> user 0m0.179s
> sys 0m0.052s
>
> On a private repository that requires about 943000 mmap/munmap calls:
>
> time git ls-tree -r -l --full-tree xxxxxxx > /dev/null
>
> Before:
> real 27m15.138s
> user 0m5.084s
> sys 27m9.636s
>
> After:
> real 0m24.209s
> user 0m3.055s
> sys 0m21.123s
>
> Signed-off-by: Koji Nakamaru <[email protected]>
> ---
> macOS: queue for munmap operations
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1993%2FKojiNakamaru%2Ffeature%2Fosx-queued-munmap-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1993/KojiNakamaru/feature/osx-queued-munmap-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1993
>
> Makefile | 1 +
> compat/osxmmap.c | 49 +++++++++++++++++++++++++++++
> compat/posix.h | 7 +++++
> contrib/buildsystems/CMakeLists.txt | 4 +++
> meson.build | 2 ++
> 5 files changed, 63 insertions(+)
> create mode 100644 compat/osxmmap.c
>
> diff --git a/Makefile b/Makefile
> index f79c905bdc..058bc83753 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1654,6 +1654,7 @@ ifeq ($(uname_S),Darwin)
> COMPAT_CFLAGS += -DAPPLE_COMMON_CRYPTO
> endif
> PTHREAD_LIBS =
> + COMPAT_OBJS += compat/osxmmap.o
> endif
>
> ifdef NO_LIBGEN_H
> diff --git a/compat/osxmmap.c b/compat/osxmmap.c
> new file mode 100644
> index 0000000000..5f9cf633ca
> --- /dev/null
> +++ b/compat/osxmmap.c
> @@ -0,0 +1,49 @@
> +#include <pthread.h>
> +#include "../git-compat-util.h"
> +/* We need original mmap/munmap here. */
> +#undef mmap
> +#undef munmap
> +
> +/*
> + * OSX doesn't have any specific setting like Linux's vm.max_map_count,
> + * so COUNT_MAX can be any large number. We here set it to the default
> + * value of Linux's vm.max_map_count.
> + */
> +#define COUNT_MAX (65530)
Why the parentheses?
And would a less generic name be better, like
MAX_UNMAP_COUNT
> +
> +struct munmap_queue {
> + void *start;
> + size_t length;
> +};
> +
> +void *git_mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset)
> +{
> + /*
> + * We can simply discard munmap operations in the queue by
> + * restricting mmap arguments.
> + */
Should I read this as
The munmap queue is only meant to defer read-only mappings.
And that is what Git does at the moment.
> + if (start != NULL || flags != MAP_PRIVATE || prot != PROT_READ)
> + die("invalid usage of mmap");
> + return mmap(start, length, prot, flags, fd, offset);
> +}
> +
> +int git_munmap(void *start, size_t length)
> +{
> + static pthread_mutex_t mutex;
> + static struct munmap_queue *queue;
> + static int count;
> + int i;
> +
> + pthread_mutex_lock(&mutex);
> + if (!queue)
> + queue = xmalloc(COUNT_MAX * sizeof(struct munmap_queue));
> + queue[count].start = start;
> + queue[count].length = length;
> + if (++count == COUNT_MAX) {
> + for (i = 0; i < COUNT_MAX; i++)
> + munmap(queue[i].start, queue[i].length);
> + count = 0;
> + }
> + pthread_mutex_unlock(&mutex);
> + return 0;
> +}
> diff --git a/compat/posix.h b/compat/posix.h
> index 067a00f33b..3fa1218289 100644
> --- a/compat/posix.h
> +++ b/compat/posix.h
> @@ -278,6 +278,13 @@ int git_munmap(void *start, size_t length);
>
> #include <sys/mman.h>
>
> +#if defined(__APPLE__)
I think it would be better to have a global Makefile knob here.
Which
a) allows to take out this patch once the MacOs kernel is improved
b) allows to hook in this code for other OS
Something like DEFER_MUNMAPS - better suggestions welcome
> +#define mmap git_mmap
> +#define munmap git_munmap
> +void *git_mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);
> +int git_munmap(void *start, size_t length);
> +#endif
> +
> #endif /* NO_MMAP || USE_WIN32_MMAP */
>
> #ifndef MAP_FAILED
> diff --git a/contrib/buildsystems/CMakeLists.txt b/contrib/buildsystems/CMakeLists.txt
> index edb0fc04ad..5c08f2fe5c 100644
> --- a/contrib/buildsystems/CMakeLists.txt
> +++ b/contrib/buildsystems/CMakeLists.txt
> @@ -271,6 +271,10 @@ if(CMAKE_SYSTEM_NAME STREQUAL "Windows")
> compat/strdup.c)
> set(NO_UNIX_SOCKETS 1)
>
> +elseif(CMAKE_SYSTEM_NAME STREQUAL "Darwin")
> + list(APPEND compat_SOURCES
> + compat/osxmmap.c)
> +
> elseif(CMAKE_SYSTEM_NAME STREQUAL "Linux")
> add_compile_definitions(PROCFS_EXECUTABLE_PATH="/proc/self/exe" HAVE_DEV_TTY )
> list(APPEND compat_SOURCES unix-socket.c unix-stream-server.c compat/linux/procinfo.c)
> diff --git a/meson.build b/meson.build
> index cee9424475..b9b6e731b1 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -1275,6 +1275,8 @@ elif host_machine.system() == 'windows'
> else
> libgit_sources += 'compat/mingw.c'
> endif
> +elif host_machine.system() == 'darwin'
> + libgit_sources += 'compat/osxmmap.c'
> endif
>
> if host_machine.system() == 'linux'
>
> base-commit: 4253630c6f07a4bdcc9aa62a50e26a4d466219d1
> --
> gitgitgadget
On the Git mailing list, Jeff King wrote:
On Mon, Oct 20, 2025 at 10:35:02PM +0000, Koji Nakamaru via GitGitGadget wrote:
> From: Koji Nakamaru <[email protected]>
>
> Executing many mmap/munmap calls alternately can cause a huge load on
> macOS. In order to reduce it, we should temporarily store munmap
> operations in a queue and process them all at once when the queue is
> filled. When the program terminates, we can discard any remaining munmap
> operations as corresponding mmaped regions are automatically reclaimed.
>
> Add a queue for munmap operations to perform them all at once.
>
> Here are some example timings. On the Linux kernel repository that
> requires about 1700 mmap/munmap calls:
>
> time git ls-tree -r -l --full-tree 211ddde > /dev/null
Why is it doing so many mmap calls? Do you have a ton of loose objects?
We have to mmap loose objects individually (because they're all in
separate files), but each pack only gets a single map (well, there's a
window parameter, but it's 1GB on 64-bit systems, so you should get a
handful of maps at most).
If you run "git gc", how does the resulting ls-tree perform? I have only
27 mmap() calls on my system.
I know that running "git gc" is relatively expensive, but it is also
bringing other optimizations (like the fact that we don't have to open()
and map each of those files in the first place!).
> On a private repository that requires about 943000 mmap/munmap calls:
>
> time git ls-tree -r -l --full-tree xxxxxxx > /dev/null
Ditto here. I'd be curious how well packed the repo is, and how it does
after a repack. If it has a very large packfile, you might also try:
git config core.packedGitWindowSize 4G
or similar (though for just an ls-tree, we should only be looking at
tree objects, which in general I'd expect to be in a confined area of
the packfile; so the 1GB window is probably plenty).
> +int git_munmap(void *start, size_t length)
> +{
> + static pthread_mutex_t mutex;
> + static struct munmap_queue *queue;
> + static int count;
> + int i;
> +
> + pthread_mutex_lock(&mutex);
> + if (!queue)
> + queue = xmalloc(COUNT_MAX * sizeof(struct munmap_queue));
> + queue[count].start = start;
> + queue[count].length = length;
> + if (++count == COUNT_MAX) {
> + for (i = 0; i < COUNT_MAX; i++)
> + munmap(queue[i].start, queue[i].length);
> + count = 0;
> + }
> + pthread_mutex_unlock(&mutex);
> + return 0;
> +}
Does batching those unmaps actually make them faster? Or is it just that
the commands you showed did not fill the queue, so we essentially just
leaked all of those maps until the program exited?
If the latter, then I'd wonder:
1. Does this increase memory pressure, since the OS has no idea we're
not actually interested in those maps anymore? Some of them can be
quite large, if the command is looking at blobs.
2. How does it perform on a command that actually fills the queue? I
guess something like "git log --raw" might do it (though if my
guesses above are right, you'd need on the order of 64,000 loose
trees).
-Peff
On the Git mailing list, Koji Nakamaru wrote:
Thank you for pointing out the unusually many mmap calls and other details. As
discussed below, the root cause was simply my ~/.gitconfig. This patch
may be useful in some rare edge cases, but it is a somewhat unusual hack, so
I'm withdrawing it.
On Tue, Oct 21, 2025 at 5:07 PM Jeff King <[email protected]> wrote:
>
> On Mon, Oct 20, 2025 at 10:35:02PM +0000, Koji Nakamaru via GitGitGadget wrote:
>
> > From: Koji Nakamaru <[email protected]>
> >
> > Executing many mmap/munmap calls alternately can cause a huge load on
> > macOS. In order to reduce it, we should temporarily store munmap
> > operations in a queue and process them all at once when the queue is
> > filled. When the program terminates, we can discard any remaining munmap
> > operations as corresponding mmaped regions are automatically reclaimed.
> >
> > Add a queue for munmap operations to perform them all at once.
> >
> > Here are some example timings. On the Linux kernel repository that
> > requires about 1700 mmap/munmap calls:
> >
> > time git ls-tree -r -l --full-tree 211ddde > /dev/null
>
> Why is it doing so many mmap calls? Do you have a ton of loose objects?
> We have to mmap loose objects individually (because they're all in
> separate files), but each pack only gets a single map (well, there's a
> window parameter, but it's 1GB on 64-bit systems, so you should get a
> handful of maps at most).
>
> If you run "git gc", how does the resulting ls-tree perform? I have only
> 27 mmap() calls on my system.
>
> I know that running "git gc" is relatively expensive, but it is also
> bringing other optimizations (like the fact that we don't have to open()
> and map each of those files in the first place!).
>
> > On a private repository that requires about 943000 mmap/munmap calls:
> >
> > time git ls-tree -r -l --full-tree xxxxxxx > /dev/null
>
> Ditto here. I'd be curious how well packed the repo is, and how it does
> after a repack. If it has a very large packfile, you might also try:
>
> git config core.packedGitWindowSize 4G
>
> or similar (though for just an ls-tree, we should only be looking at
> tree objects, which in general I'd expect to be in a confined area of
> the packfile; so the 1GB window is probably plenty).
Following your suggestion, I investigated the number of mmap calls in
other environments and found much smaller counts. I tracked how
xmmap_gently() was called in packfile.c and found
settings->packed_git_window_size was different between environments. My
~/.gitconfig defined "packedGitLimit = 128m" and this caused many calls.
> > +int git_munmap(void *start, size_t length)
> > +{
> > + static pthread_mutex_t mutex;
> > + static struct munmap_queue *queue;
> > + static int count;
> > + int i;
> > +
> > + pthread_mutex_lock(&mutex);
> > + if (!queue)
> > + queue = xmalloc(COUNT_MAX * sizeof(struct munmap_queue));
> > + queue[count].start = start;
> > + queue[count].length = length;
> > + if (++count == COUNT_MAX) {
> > + for (i = 0; i < COUNT_MAX; i++)
> > + munmap(queue[i].start, queue[i].length);
> > + count = 0;
> > + }
> > + pthread_mutex_unlock(&mutex);
> > + return 0;
> > +}
>
> Does batching those unmaps actually make them faster? Or is it just that
> the commands you showed did not fill the queue, so we essentially just
> leaked all of those maps until the program exited?
>
> If the latter, then I'd wonder:
>
> 1. Does this increase memory pressure, since the OS has no idea we're
> not actually interested in those maps anymore? Some of them can be
> quite large, if the command is looking at blobs.
>
> 2. How does it perform on a command that actually fills the queue? I
> guess something like "git log --raw" might do it (though if my
> guesses above are right, you'd need on the order of 64,000 loose
> trees).
In my extreme cases, this batching makes them faster. Queue flushing has
occurred several times for the private repository case and not occurred
for the Linux kernel case. Though I haven't investigated in detail,
memory pressure doesn't seem to be critical (and it could also be
possible to adopt smarter thresholds).
I tested git log --raw for the Linux kernel repository. For reference,
the results are shown below:
without "packedGitLimit = 128m":
mmap 9
# without batching
real 1m3.970s
user 1m2.232s
sys 0m1.725s
# with batching
real 1m5.991s
user 0m58.637s
sys 0m4.315s
with "packedGitLimit = 128m":
mmap 3072538
# without batching
(It took too long so I stopped the execution)
real 518m6.928s
user 0m41.126s
sys 517m24.072s
# with batching
real 2m26.276s
user 1m8.495s
sys 1m3.230s
On the Git mailing list, Koji Nakamaru wrote:
Thank you for the detailed suggestions. As I discussed in another thread,
the root cause of the many mmap/munmap calls was simply my ~/.gitconfig, so
I'm withdrawing this patch. I'll answer some of your comments below.
On Tue, Oct 21, 2025 at 3:26 PM Torsten Bögershausen <[email protected]> wrote:
>
> Some comments inline, all up to improvements
>
> On Mon, Oct 20, 2025 at 10:35:02PM +0000, Koji Nakamaru via GitGitGadget wrote:
> > From: Koji Nakamaru <[email protected]>
> >
> > Executing many mmap/munmap calls alternately can cause a huge load on
> > macOS. In order to reduce it, we should temporarily store munmap
> > operations in a queue and process them all at once when the queue is
> > filled. When the program terminates, we can discard any remaining munmap
> > operations as corresponding mmaped regions are automatically reclaimed.
> >
> > Add a queue for munmap operations to perform them all at once.
> >
>
> Suggestions for rewording:
> In order to reduce the peak load store all munmap operations in a queue.
> Process them all at once (and more efficient) when the queue is filled.
> The queue may be ignored when the git process terminates. The operating
> system will do all munmap() when the process exits.
Thank you, it is much clearer.
> > Here are some example timings. On the Linux kernel repository that
> > requires about 1700 mmap/munmap calls:
> >
> > time git ls-tree -r -l --full-tree 211ddde > /dev/null
> >
> > Before:
> > real 0m2.083s
> > user 0m0.201s
> > sys 0m1.873s
> >
> > After:
> > real 0m0.243s
> > user 0m0.179s
> > sys 0m0.052s
> >
> > On a private repository that requires about 943000 mmap/munmap calls:
> >
> > time git ls-tree -r -l --full-tree xxxxxxx > /dev/null
> >
> > Before:
> > real 27m15.138s
> > user 0m5.084s
> > sys 27m9.636s
> >
> > After:
> > real 0m24.209s
> > user 0m3.055s
> > sys 0m21.123s
> >
> > Signed-off-by: Koji Nakamaru <[email protected]>
> > ---
> > macOS: queue for munmap operations
> >
> > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1993%2FKojiNakamaru%2Ffeature%2Fosx-queued-munmap-v1
> > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1993/KojiNakamaru/feature/osx-queued-munmap-v1
> > Pull-Request: https://github.com/gitgitgadget/git/pull/1993
> >
> > Makefile | 1 +
> > compat/osxmmap.c | 49 +++++++++++++++++++++++++++++
> > compat/posix.h | 7 +++++
> > contrib/buildsystems/CMakeLists.txt | 4 +++
> > meson.build | 2 ++
> > 5 files changed, 63 insertions(+)
> > create mode 100644 compat/osxmmap.c
> >
> > diff --git a/Makefile b/Makefile
> > index f79c905bdc..058bc83753 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -1654,6 +1654,7 @@ ifeq ($(uname_S),Darwin)
> > COMPAT_CFLAGS += -DAPPLE_COMMON_CRYPTO
> > endif
> > PTHREAD_LIBS =
> > + COMPAT_OBJS += compat/osxmmap.o
> > endif
> >
> > ifdef NO_LIBGEN_H
> > diff --git a/compat/osxmmap.c b/compat/osxmmap.c
> > new file mode 100644
> > index 0000000000..5f9cf633ca
> > --- /dev/null
> > +++ b/compat/osxmmap.c
> > @@ -0,0 +1,49 @@
> > +#include <pthread.h>
> > +#include "../git-compat-util.h"
> > +/* We need original mmap/munmap here. */
> > +#undef mmap
> > +#undef munmap
> > +
> > +/*
> > + * OSX doesn't have any specific setting like Linux's vm.max_map_count,
> > + * so COUNT_MAX can be any large number. We here set it to the default
> > + * value of Linux's vm.max_map_count.
> > + */
> > +#define COUNT_MAX (65530)
>
> Why the parentheses?
> And would a less generic name be better, like
> MAX_UNMAP_COUNT
The parentheses are not required but I prefer them as discussed in [1].
I agree MAX_UNMAP_COUNT is more clear.
> > +
> > +struct munmap_queue {
> > + void *start;
> > + size_t length;
> > +};
> > +
> > +void *git_mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset)
> > +{
> > + /*
> > + * We can simply discard munmap operations in the queue by
> > + * restricting mmap arguments.
> > + */
> Should I read this as
> The munmap queue is only meant to defer read-only mappings.
> And that is what Git does at the moment.
Yes. This part is actually borrowed from compat/mmap.c and I've also
verified that the predicate is valid by searching all mmap calls.
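The predicate in question can be isolated as follows. This is an illustrative sketch, not code from the patch: is_discardable_mapping and checked_mmap are hypothetical names, and assert() stands in for the die() call used in the diff. The assumption behind the check is that a read-only, private mapping placed by the kernel has no modified pages to write back, so skipping its munmap at process exit should lose nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * True only for the one kind of mapping the queue is meant to handle:
 * kernel-chosen address, private, read-only.
 */
static bool is_discardable_mapping(const void *start, int prot, int flags)
{
	return start == NULL && flags == MAP_PRIVATE && prot == PROT_READ;
}

/* Wrapper in the spirit of the patch's git_mmap(): refuse anything else. */
static void *checked_mmap(void *start, size_t length, int prot, int flags,
			  int fd, off_t offset)
{
	assert(is_discardable_mapping(start, prot, flags));
	return mmap(start, length, prot, flags, fd, offset);
}
```

Note that the comparison is exact rather than a bit test, so a call that adds any other flag to MAP_PRIVATE, or any other protection bit to PROT_READ, is rejected as well.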
> > + if (start != NULL || flags != MAP_PRIVATE || prot != PROT_READ)
> > + die("invalid usage of mmap");
> > + return mmap(start, length, prot, flags, fd, offset);
> > +}
> > +
> > +int git_munmap(void *start, size_t length)
> > +{
> > + static pthread_mutex_t mutex;
> > + static struct munmap_queue *queue;
> > + static int count;
> > + int i;
> > +
> > + pthread_mutex_lock(&mutex);
> > + if (!queue)
> > + queue = xmalloc(COUNT_MAX * sizeof(struct munmap_queue));
> > + queue[count].start = start;
> > + queue[count].length = length;
> > + if (++count == COUNT_MAX) {
> > + for (i = 0; i < COUNT_MAX; i++)
> > + munmap(queue[i].start, queue[i].length);
> > + count = 0;
> > + }
> > + pthread_mutex_unlock(&mutex);
> > + return 0;
> > +}
> > diff --git a/compat/posix.h b/compat/posix.h
> > index 067a00f33b..3fa1218289 100644
> > --- a/compat/posix.h
> > +++ b/compat/posix.h
> > @@ -278,6 +278,13 @@ int git_munmap(void *start, size_t length);
> >
> > #include <sys/mman.h>
> >
> > +#if defined(__APPLE__)
> I think it would be better to have a global Makefile knob here.
> Which
> a) allows to take out this patch once the MacOs kernel is improved
> b) allows to hook in this code for other OS
> Something like DEFER_MUNMAPS - better suggestions welcome
I followed your suggestion and adjusted the code, Makefile, etc. (locally).
> > [snip]
[1] https://stackoverflow.com/questions/9081479/is-there-a-good-reason-for-always-enclosing-a-define-in-parentheses-in-c
On the Git mailing list, Jeff King wrote:
On Wed, Oct 22, 2025 at 10:21:32AM +0900, Koji Nakamaru wrote:
> > Ditto here. I'd be curious how well packed the repo is, and how it does
> > after a repack. If it has a very large packfile, you might also try:
> >
> > git config core.packedGitWindowSize 4G
> >
> > or similar (though for just an ls-tree, we should only be looking at
> > tree objects, which in general I'd expect to be in a confined area of
> > the packfile; so the 1GB window is probably plenty).
>
> Following your suggestion, I investigated the number of mmap calls in
> other environments and found much smaller counts. I tracked how
> xmmap_gently() was called in packfile.c and found
> settings->packed_git_window_size was different between environments. My
> ~/.gitconfig defined "packedGitLimit = 128m" and this caused many calls.
Ah, very interesting. Yes, I think that helps explain why there were so
many mmap calls. I don't think there's a good reason to lower that
number in general, assuming the OS is reasonably good at dropping mapped
pages from RAM when there's memory pressure.
> In my extreme cases, this batching makes them faster. Queue flushing has
> occurred several times for the private repository case and not occurred
> for the Linux kernel case. Though I haven't investigated in detail,
> memory pressure doesn't seem to be critical (and it could also be
> possible to adopt smarter thresholds).
OK, that's quite interesting that batching makes such a difference. I
guess somebody with more knowledge of macOS kernel internals could
probably explain it. Though it sounds like your problem was sufficiently
solved by dropping the extra config, it's a good fact for us to know
about in general.
-Peff
cc: Torsten Bögershausen [email protected]
cc: Jeff King [email protected]