
@GeorgianaElena
Member

Fixes #6639

Running fatrace showed that the following files were written at startup:

root@storage-quota-home-nfs-57f6f75bd5-gfl64:/export# fatrace -f W -c
unknown(0): CW  /export/staging/georgianaelena-402i2c-2eorg/.jupyter/migrated
unknown(0): W   /export/staging/georgianaelena-402i2c-2eorg/.npm/_logs/2025-11-10T13_07_33_076Z-debug-0.log
unknown(0): CW  /export/staging/georgianaelena-402i2c-2eorg/.npm/_logs/2025-11-10T13_07_33_076Z-debug-0.log
unknown(0): W   /export/staging/georgianaelena-402i2c-2eorg/.jupyter/lab/workspaces/default-37a8.jupyterlab-workspace
unknown(0): CW  /export/staging/georgianaelena-402i2c-2eorg/.jupyter/lab/workspaces/default-37a8.jupyterlab-workspace
unknown(0): W   /export/staging/georgianaelena-402i2c-2eorg/.jupyter/lab/workspaces/default-37a8.jupyterlab-workspace
unknown(0): W   /export/staging/georgianaelena-402i2c-2eorg/.jupyter/lab/workspaces/default-37a8.jupyterlab-workspace
unknown(0): CW  /export/staging/georgianaelena-402i2c-2eorg/.jupyter/lab/workspaces/default-37a8.jupyterlab-workspace
unknown(0): CW  /export/staging/georgianaelena-402i2c-2eorg/.jupyter/migrated
unknown(0): W   /export/staging/georgianaelena-402i2c-2eorg/.npm/_logs/2025-11-10T13_10_02_551Z-debug-0.log
unknown(0): CW  /export/staging/georgianaelena-402i2c-2eorg/.npm/_logs/2025-11-10T13_10_02_551Z-debug-0.log
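A capture like the one above can be condensed into a per-path tally, which makes it easier to see which files are rewritten repeatedly (here, the workspace file). This is a minimal sketch: summarize_fatrace is a hypothetical helper, and the line format, process(pid): FLAGS path, is inferred from the sample above rather than taken from the fatrace documentation.

```python
import re
from collections import Counter

# Matches lines like "unknown(0): CW  /export/.../.jupyter/migrated".
# The shape is inferred from the sample output above, not from fatrace docs.
LINE_RE = re.compile(r"^(?P<proc>.+?)\((?P<pid>\d+)\): (?P<flags>\S+)\s+(?P<path>\S.*)$")


def summarize_fatrace(lines):
    """Count events whose flags include W, per path (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and "W" in m.group("flags"):
            counts[m.group("path")] += 1
    return counts


sample = [
    "unknown(0): CW  /export/staging/user/.jupyter/migrated",
    "unknown(0): W   /export/staging/user/.npm/_logs/debug-0.log",
    "unknown(0): CW  /export/staging/user/.npm/_logs/debug-0.log",
]
for path, n in summarize_fatrace(sample).most_common():
    print(n, path)
```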

Unfortunately, the location of the migrated file is hardcoded relative to the .jupyter config directory, so we have to move the entire .jupyter config to /tmp.

Is that acceptable? I'm specifically worried that, if workspaces are meant to be persisted, this new setup will make that impossible.


@yuvipanda
Member

I looked into why migrate was being called at all, and found https://github.com/jupyter/jupyter_core/blob/2e63fd3928a979844fa0c2a247ee1937bbae9a99/jupyter_core/application.py#L166 as the likely culprit. However, that code looks like it should handle being unable to write the migrated marker file just fine, because it catches OSError.

Can you tell me when the fatrace was run? Was it against a user who was already at full quota? I think filling a user's home directory up to quota and then trying to start their server will yield useful information from both the server logs and fatrace, because some of these writes may catch OSError and be handled just fine.
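One way to set up that reproduction is to fill the home directory until writes start failing. The sketch below is a hypothetical helper meant only for a throwaway test user: it appends fixed-size chunks to a filler file until a write fails with ENOSPC (the errno seen in this thread) or EDQUOT (which some quota implementations return instead), with an optional max_bytes cap as a safety stop.

```python
import errno
import os
import tempfile

# errno values that mean "out of space"; EDQUOT is not defined on every platform.
FULL_ERRNOS = {errno.ENOSPC, getattr(errno, "EDQUOT", errno.ENOSPC)}


def fill_until_full(path, chunk_size=1 << 20, max_bytes=None):
    """Append zero bytes to `path` until writes fail with a "full" error.

    Hypothetical test helper. Stops early once `max_bytes` have been written
    (if given) and returns the number of bytes successfully written.
    """
    chunk = b"\0" * chunk_size
    written = 0
    try:
        with open(path, "ab") as f:
            while max_bytes is None or written < max_bytes:
                n = chunk_size if max_bytes is None else min(chunk_size, max_bytes - written)
                f.write(chunk[:n])
                f.flush()
                os.fsync(f.fileno())  # push data to storage so quota errors surface now
                written += n
    except OSError as err:
        if err.errno not in FULL_ERRNOS:
            raise
    return written


# Dry run against a temporary file with a small cap, so nothing real fills up.
with tempfile.TemporaryDirectory() as tmp:
    print(fill_until_full(os.path.join(tmp, "filler.bin"), chunk_size=1000, max_bytes=2500))
```

On the test user itself, calling it without a cap (for example on a file under /home/jovyan) would keep writing until the quota is hit; deleting the filler file afterwards frees the space again.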

I agree that we should not screw up workspaces if we can avoid it. It also looks like the workspaces directory can be set separately, via https://github.com/jupyterlab/jupyterlab_server/blob/f64f554291a09f072e479ff52ae2212084aaac39/jupyterlab_server/config.py#L273.
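The interaction between the two settings can be modeled in a few lines. This is a simplified sketch, not the jupyter_core or JupyterLab lookup code. It assumes the config directory comes from JUPYTER_CONFIG_DIR (falling back to ~/.jupyter), that the migrated marker always sits directly inside it, and that the workspaces directory defaults to <config dir>/lab/workspaces unless JUPYTERLAB_WORKSPACES_DIR is set; the workspaces_dir setting linked above would be the config-file route. The function jupyter_paths is hypothetical, and the variable names are worth checking against the deployed versions.

```python
from pathlib import Path


def jupyter_paths(env, home):
    """Hypothetical, simplified model of where these startup files land.

    Ignores JUPYTER_NO_CONFIG, platform differences and CLI overrides.
    """
    cfg = Path(env.get("JUPYTER_CONFIG_DIR") or Path(home) / ".jupyter")
    workspaces = env.get("JUPYTERLAB_WORKSPACES_DIR") or cfg / "lab" / "workspaces"
    return {
        # The marker's file name is fixed, so only its parent directory can move.
        "migrated": cfg / "migrated",
        "workspaces": Path(workspaces),
    }


home = "/home/jovyan"
# Moving only the config dir drags the workspaces into /tmp as well...
print(jupyter_paths({"JUPYTER_CONFIG_DIR": "/tmp/.jupyter"}, home))
# ...while also pinning the workspaces dir keeps them in the persistent home.
print(jupyter_paths({
    "JUPYTER_CONFIG_DIR": "/tmp/.jupyter",
    "JUPYTERLAB_WORKSPACES_DIR": f"{home}/.jupyter/lab/workspaces",
}, home))
```

Under these assumptions, pinning the workspaces directory could address the persistence concern, at the cost of one more setting to keep in sync with the config directory.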

@GeorgianaElena
Member Author

@yuvipanda, the server does start if .jupyter/migrated exists, even if the storage quota is hit. So the OSError catching does its job.

This is the fatrace output from starting a server and then opening a new notebook, after I had reached the quota I had set for my user.

unknown(0): CW  /export/staging/georgianaelena-402i2c-2eorg/.jupyter/migrated
unknown(0): W   /export/staging/georgianaelena-402i2c-2eorg/.jupyter/lab/user-settings/@jupyterlab/filebrowser-extension/browser.jupyterlab-settings
unknown(0): CW  /export/staging/georgianaelena-402i2c-2eorg/.jupyter/lab/user-settings/@jupyterlab/filebrowser-extension/browser.jupyterlab-settings
unknown(0): CW  /export/staging/georgianaelena-402i2c-2eorg/.jupyter/lab/user-settings/@jupyterlab/filebrowser-extension/browser.jupyterlab-settings
unknown(0): CW  /export/staging/georgianaelena-402i2c-2eorg/.local/share/jupyter/nbsignatures.db
unknown(0): CW  /export/staging/georgianaelena-402i2c-2eorg/.local/share/jupyter/nbsignatures.db
unknown(0): CW  /export/staging/georgianaelena-402i2c-2eorg/.local/share/jupyter/nbsignatures.db
unknown(0): CW  /export/staging/georgianaelena-402i2c-2eorg/.local/share/jupyter/nbsignatures.db

@GeorgianaElena
Member Author

The error OSError: [Errno 28] No space left on device: '/home/jovyan/.jupyter/migrated' shows up when .jupyter/migrated does not exist at all. I checked with an empty migrated file, and the server started just fine.

I believe this happens when a user starts their server for the first time on a full home directory disk, not when they hit their quota. Otherwise, the migrated file should already be there if they had started their server before, right?
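The behaviour described here can be summarized with a simplified model of the marker check. This is a sketch, not jupyter_core's code, and it ignores the other conditions under which migration is skipped. It assumes, consistent with the observations above, that an existing migrated file (even an empty one) is only opened and never rewritten, while a missing one leads to a fresh write that is not guarded against OSError. The names marker_state and ensure_marker are hypothetical.

```python
import tempfile
from pathlib import Path


def marker_state(config_dir):
    """Classify the 'migrated' marker in `config_dir` (hypothetical helper).

    Opening an existing file in r+ mode should not need any free space,
    so this check keeps working on a full quota.
    """
    marker = Path(config_dir) / "migrated"
    try:
        with open(marker, "r+"):
            pass
    except FileNotFoundError:
        return "missing"     # first start: a marker write will follow
    except OSError:
        return "unusable"    # exists but cannot be opened: give up quietly
    return "present"         # already migrated: nothing is written


def ensure_marker(config_dir, timestamp):
    """Write the marker only if it is missing (hypothetical helper).

    The write is deliberately unguarded: on a full disk, this is where
    OSError [Errno 28] would surface.
    """
    if marker_state(config_dir) != "missing":
        return False
    Path(config_dir).mkdir(parents=True, exist_ok=True)
    (Path(config_dir) / "migrated").write_text(timestamp, encoding="utf-8")
    return True


with tempfile.TemporaryDirectory() as tmp:
    print(marker_state(tmp))                  # missing
    ensure_marker(tmp, "2025-11-10T13:07:33Z")
    print(marker_state(tmp))                  # present
```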

In that case, would solving this also solve 2i2c-org/jupyterhub-home-nfs#41?

Am I missing something that would explain why I can't reproduce this with the quota?

@yuvipanda
Member

2i2c-org/jupyterhub-home-nfs#41 should be unrelated, as that's mostly about the underlying disk when it's 100% full, rather than anything about an individual user's quota.

@yuvipanda
Member

So I filled up my quota on the openscapes staging hub, and in the logs I see:

[W 2025-11-12 23:10:21.550 LabApp] wrote error: "[Errno 28] No space left on device: '/home/jovyan/.jupyter/lab/workspaces/auto-c-7f4b.jupyterlab-workspace'"
    Traceback (most recent call last):
      File "/srv/conda/envs/notebook/lib/python3.11/site-packages/jupyterlab_server/workspaces_handler.py", line 220, in put
        self.manager.save(space_name, raw)
      File "/srv/conda/envs/notebook/lib/python3.11/site-packages/jupyterlab_server/workspaces_handler.py", line 164, in save
        workspace_path.write_text(raw, encoding="utf-8")
      File "/srv/conda/envs/notebook/lib/python3.11/pathlib.py", line 1078, in write_text
        with self.open(mode='w', encoding=encoding, errors=errors, newline=newline) as f:
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/srv/conda/envs/notebook/lib/python3.11/pathlib.py", line 1044, in open
        return io.open(self, mode, buffering, encoding, errors, newline)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    OSError: [Errno 28] No space left on device: '/home/jovyan/.jupyter/lab/workspaces/auto-c-7f4b.jupyterlab-workspace'
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "/srv/conda/envs/notebook/lib/python3.11/site-packages/tornado/web.py", line 1846, in _execute
        result = method(*self.path_args, **self.path_kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/srv/conda/envs/notebook/lib/python3.11/site-packages/tornado/web.py", line 3375, in wrapper
        return method(self, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/srv/conda/envs/notebook/lib/python3.11/site-packages/jupyterlab_server/workspaces_handler.py", line 224, in put
        raise web.HTTPError(500, str(e)) from e
    tornado.web.HTTPError: HTTP 500: Internal Server Error ([Errno 28] No space left on device: '/home/jovyan/.jupyter/lab/workspaces/auto-c-7f4b.jupyterlab-workspace')
[W 2025-11-12 23:10:22.242 LabApp] wrote error: "[Errno 28] No space left on device: '/home/jovyan/.jupyter/lab/workspaces/auto-c-7f4b.jupyterlab-workspace'"
    Traceback (most recent call last):
      File "/srv/conda/envs/notebook/lib/python3.11/site-packages/jupyterlab_server/workspaces_handler.py", line 220, in put
        self.manager.save(space_name, raw)
      File "/srv/conda/envs/notebook/lib/python3.11/site-packages/jupyterlab_server/workspaces_handler.py", line 164, in save
        workspace_path.write_text(raw, encoding="utf-8")
      File "/srv/conda/envs/notebook/lib/python3.11/pathlib.py", line 1078, in write_text
        with self.open(mode='w', encoding=encoding, errors=errors, newline=newline) as f:
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/srv/conda/envs/notebook/lib/python3.11/pathlib.py", line 1044, in open
        return io.open(self, mode, buffering, encoding, errors, newline)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    OSError: [Errno 28] No space left on device: '/home/jovyan/.jupyter/lab/workspaces/auto-c-7f4b.jupyterlab-workspace'
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "/srv/conda/envs/notebook/lib/python3.11/site-packages/tornado/web.py", line 1846, in _execute
        result = method(*self.path_args, **self.path_kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/srv/conda/envs/notebook/lib/python3.11/site-packages/tornado/web.py", line 3375, in wrapper
        return method(self, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/srv/conda/envs/notebook/lib/python3.11/site-packages/jupyterlab_server/workspaces_handler.py", line 224, in put
        raise web.HTTPError(500, str(e)) from e
    tornado.web.HTTPError: HTTP 500: Internal Server Error ([Errno 28] No space left on device: '/home/jovyan/.jupyter/lab/workspaces/auto-c-7f4b.jupyterlab-workspace')
[W 2025-11-12 23:10:24.274 LabApp] wrote error: "[Errno 28] No space left on device: '/home/jovyan/.jupyter/lab/workspaces/auto-c-7f4b.jupyterlab-workspace'"
    Traceback (most recent call last):
      File "/srv/conda/envs/notebook/lib/python3.11/site-packages/jupyterlab_server/workspaces_handler.py", line 220, in put
        self.manager.save(space_name, raw)
      File "/srv/conda/envs/notebook/lib/python3.11/site-packages/jupyterlab_server/workspaces_handler.py", line 164, in save
        workspace_path.write_text(raw, encoding="utf-8")
      File "/srv/conda/envs/notebook/lib/python3.11/pathlib.py", line 1078, in write_text
        with self.open(mode='w', encoding=encoding, errors=errors, newline=newline) as f:
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/srv/conda/envs/notebook/lib/python3.11/pathlib.py", line 1044, in open
        return io.open(self, mode, buffering, encoding, errors, newline)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    OSError: [Errno 28] No space left on device: '/home/jovyan/.jupyter/lab/workspaces/auto-c-7f4b.jupyterlab-workspace'
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "/srv/conda/envs/notebook/lib/python3.11/site-packages/tornado/web.py", line 1846, in _execute
        result = method(*self.path_args, **self.path_kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/srv/conda/envs/notebook/lib/python3.11/site-packages/tornado/web.py", line 3375, in wrapper
        return method(self, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/srv/conda/envs/notebook/lib/python3.11/site-packages/jupyterlab_server/workspaces_handler.py", line 224, in put
        raise web.HTTPError(500, str(e)) from e
    tornado.web.HTTPError: HTTP 500: Internal Server Error ([Errno 28] No space left on device: '/home/jovyan/.jupyter/lab/workspaces/auto-c-7f4b.jupyterlab-workspace')
[I 2025-11-12 23:10:34.345 ServerApp] New terminal with automatic name: 1
[W 2025-11-12 23:10:34.980 LabApp] wrote error: "[Errno 28] No space left on device: '/home/jovyan/.jupyter/lab/workspaces/auto-c-7f4b.jupyterlab-workspace'"
    Traceback (most recent call last):
      File "/srv/conda/envs/notebook/lib/python3.11/site-packages/jupyterlab_server/workspaces_handler.py", line 220, in put
        self.manager.save(space_name, raw)
      File "/srv/conda/envs/notebook/lib/python3.11/site-packages/jupyterlab_server/workspaces_handler.py", line 164, in save
        workspace_path.write_text(raw, encoding="utf-8")
      File "/srv/conda/envs/notebook/lib/python3.11/pathlib.py", line 1078, in write_text
        with self.open(mode='w', encoding=encoding, errors=errors, newline=newline) as f:
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/srv/conda/envs/notebook/lib/python3.11/pathlib.py", line 1044, in open
        return io.open(self, mode, buffering, encoding, errors, newline)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    OSError: [Errno 28] No space left on device: '/home/jovyan/.jupyter/lab/workspaces/auto-c-7f4b.jupyterlab-workspace'
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "/srv/conda/envs/notebook/lib/python3.11/site-packages/tornado/web.py", line 1846, in _execute
        result = method(*self.path_args, **self.path_kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/srv/conda/envs/notebook/lib/python3.11/site-packages/tornado/web.py", line 3375, in wrapper
        return method(self, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/srv/conda/envs/notebook/lib/python3.11/site-packages/jupyterlab_server/workspaces_handler.py", line 224, in put
        raise web.HTTPError(500, str(e)) from e
    tornado.web.HTTPError: HTTP 500: Internal Server Error ([Errno 28] No space left on device: '/home/jovyan/.jupyter/lab/workspaces/auto-c-7f4b.jupyterlab-workspace')

However, the server still starts. So it looks like none of these errors actually prevent the server from starting!

However, when I filled up my cloudbank staging hub, I saw this in the logs:

Defaulted container "notebook" out of: notebook, block-cloud-metadata (init), block-nfs-access (init)
Activating profile: /srv/conda/etc/profile.d/conda.sh
/srv/conda/envs/notebook/lib/python3.11/subprocess.py:1016: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stdout = io.open(c2pread, 'rb', bufsize)
INFO: Executing provided command
Traceback (most recent call last):
  File "/usr/local/bin/repo2docker-entrypoint", line 114, in <module>
    main()
  File "/usr/local/bin/repo2docker-entrypoint", line 95, in main
    tee(chunk)
  File "/usr/local/bin/repo2docker-entrypoint", line 81, in tee
    f.flush()
OSError: [Errno 28] No space left on device

So that's coming from https://github.com/jupyterhub/repo2docker/blob/main/repo2docker/buildpacks/repo2docker-entrypoint.

I think the outcome of my investigation is:

  1. Jupyter Server is able to recover from a full disk! This is excellent news :)
  2. repo2docker-built images (that don't use a Dockerfile) are not able to recover.

I think the path forward here is:

  1. Make a PR to https://github.com/jupyterhub/repo2docker/blob/main/repo2docker/buildpacks/repo2docker-entrypoint to handle the disk being full gracefully (just stop writing the logs)
  2. Merge it
  3. Verify that repo2docker-action uses the newest repo2docker
  4. Rebuild the image and see if that fixes the issue
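For step 1, the change could look roughly like the sketch below. It is not the actual repo2docker-entrypoint code: only the tee(chunk) and f.flush() calls are known from the traceback. SafeTee is a hypothetical class that copies output to a primary stream and a log file, and on the first OSError from the log file it closes the log and keeps forwarding output to the primary stream only. In a real entrypoint the streams would presumably be sys.stdout.buffer and a log file opened in binary append mode; here in-memory buffers stand in for both.

```python
import errno
import io
import sys


class SafeTee:
    """Copy chunks to `out` and to a log file; drop the log after an OSError.

    Hypothetical sketch of "just stop writing the logs" on a full disk.
    """

    def __init__(self, out, log):
        self.out = out
        self.log = log  # becomes None once writing to it has failed

    def _drop_log(self, err):
        log, self.log = self.log, None
        try:
            log.close()  # closing may flush again and fail; that is fine
        except OSError:
            pass
        print(f"log file disabled: {err}", file=sys.stderr)

    def write(self, chunk):
        # The process output itself is always forwarded first.
        self.out.write(chunk)
        self.out.flush()
        if self.log is not None:
            try:
                self.log.write(chunk)
                self.log.flush()
            except OSError as err:
                self._drop_log(err)


class FullLog(io.BytesIO):
    """Stand-in for a log file on a full disk: every write fails with ENOSPC."""

    def write(self, b):
        raise OSError(errno.ENOSPC, "No space left on device")


out = io.BytesIO()
tee = SafeTee(out, FullLog())
tee.write(b"hello\n")
tee.write(b"world\n")
print(out.getvalue(), tee.log)
```

Writing to the primary stream first keeps the container's output flowing even when the log file is unusable, and disabling the log permanently (rather than retrying) avoids a failed write on every subsequent chunk.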



Development

Successfully merging this pull request may close these issues.

Make sure that user servers can start even with disks full

2 participants