Add optional data mounting #149

eleftherioszisis · 2025-11-07T08:21:17Z

Allow specifying a local asset store when instantiating a Client. If an asset's file exists under the store's location, it is symlinked from there instead of being downloaded.

Example:

from entitysdk import Client, LocalAssetStore, models
client = Client(..., local_store=LocalAssetStore(prefix="/data"))
client.download_file(...)

If asset.full_path exists at /data/{full_path}, the file is symlinked instead of being downloaded via the API. This works for client.download_content, client.download_file, and all methods that use them.
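The lookup described above can be sketched roughly as follows. This is not the entitysdk implementation: `fetch_file`, `download_via_api`, and the argument names are hypothetical, and only the standard library is used.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def fetch_file(
    full_path: str,
    output_path: Path,
    store_prefix: Path | None,
    download_via_api: Callable[[Path], None],
) -> str:
    """Symlink from the local store when possible, otherwise download.

    Hypothetical helper; returns "symlink" or "download" to show which branch ran.
    """
    output_path.parent.mkdir(parents=True, exist_ok=True)
    if store_prefix is not None:
        candidate = store_prefix / full_path
        if candidate.exists():
            # The asset is already on disk: link to it instead of fetching bytes.
            output_path.symlink_to(candidate)
            return "symlink"
    # No store configured, or the asset is not in it: fall back to the API.
    download_via_api(output_path)
    return "download"
```

The caller gets a path at `output_path` in both cases, so code that only reads the file does not need to know which branch was taken.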

codecov · 2025-11-07T14:02:37Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

Flag	Coverage Δ
pytest	`100.00% <100.00%> (ø)`

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
src/entitysdk/__init__.py	`100.00% <100.00%> (ø)`
src/entitysdk/client.py	`100.00% <100.00%> (ø)`
src/entitysdk/core.py	`100.00% <100.00%> (ø)`
src/entitysdk/store.py	`100.00% <100.00%> (ø)`

... and 9 files with indirect coverage changes

james-isbister · 2025-11-07T15:20:58Z

Seems like a nice simple way of specifying the data should be mounted.

Also it seems quite handy that it only creates symlinks if the file exists in the mounted directory. I remember there was some discussion that in some cases we might want to raise an error if the mount doesn't exist? Maybe this could be specified as an optional bool in the DataMount constructor, if it is desirable in some cases? If you recommend this, I think it would be better for the DataMount object to do the check than to do it manually in every obi-one endpoint.

Also, do we need some custom logic for the staging functions? At the moment, would the staging functions create symlinks when they call the download functions internally, and then try to edit the files at those symlinks? We probably want the symlinks to be created first through the download functions; then, in the editing step (e.g. for circuit_config.json), the symlink is deleted and the files to be edited are copied to the specified download locations and edited there.
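The copy-before-edit step proposed above could look roughly like this. `materialize` is a hypothetical helper used for illustration only; it is not part of entitysdk.

```python
import shutil
from pathlib import Path


def materialize(path: Path) -> Path:
    """Replace a symlink with a private, regular copy of its target."""
    if path.is_symlink():
        target = path.resolve(strict=True)  # the real file the link points to
        path.unlink()  # removes the link only, never the target
        # copyfile copies contents but not permission bits, so the copy
        # gets the default mode instead of inheriting a read-only one.
        shutil.copyfile(target, path)
    return path
```

After `materialize(path)`, writes go to the copy at the link's former location and the file in the store is left untouched.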

eleftherioszisis · 2025-11-08T10:39:40Z

> Also it seems quite handy that it only creates symlinks if the file exists in the mounted directory. I remember there was some discussion that in some cases we might want to raise an error if the mount doesn't exist?

When a DataMount is created it checks if the prefix (/data) passed to it exists, otherwise it raises. Is that what you mean? Or do you mean it should raise if a file does not exist in the mount?
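The two cases can be told apart with a sketch like the one below. `StoreSketch` is a hypothetical stand-in for the object discussed here; the exception type and the `contains` method are assumptions, not the library's API.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class StoreSketch:
    """Hypothetical stand-in for the store object discussed in this thread."""

    prefix: str

    def __post_init__(self) -> None:
        # Case 1: the prefix itself is missing, so fail at construction time.
        if not Path(self.prefix).exists():
            raise FileNotFoundError(f"Store prefix does not exist: {self.prefix}")

    def contains(self, full_path: str) -> bool:
        # Case 2: a single file is missing. This is not an error here;
        # the caller can fall back to downloading.
        return (Path(self.prefix) / full_path).exists()
```

With this split, a misconfigured prefix surfaces immediately, while a file that is absent from the store only changes which code path the download takes.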

> Also, do we need some custom logic for the staging functions? At the moment, would the staging functions create symlinks when they call the download functions internally, and then try to edit the files at those symlinks? We probably want the symlinks to be created first through the download functions; then, in the editing step (e.g. for circuit_config.json), the symlink is deleted and the files to be edited are copied to the specified download locations and edited there.

If there is such a case, the staging functions should not work directly on input data (links) but rather separate inputs/outputs. iirc for circuit_config.json specifically I think it is downloaded with client.download_content, in which case the file from the DataMount is used to read the bytes of the file but there is no symlink made.

In any case, given that the data/ mount point will be read-only, if staging tries to modify these files, it will raise an error.

james-isbister · 2025-11-10T07:34:03Z

> When a DataMount is created it checks if the prefix (/data) passed to it exists, otherwise it raises

Yes, that's what I meant. Seems good if that's always the behaviour.

> If there is such a case, the staging functions should not work directly on input data (links) but rather separate inputs/outputs. iirc for circuit_config.json specifically I think it is downloaded with client.download_content, in which case the file from the DataMount is used to read the bytes of the file but there is no symlink made.

Yes, I didn't mean that the data in the mount would be edited, but rather a file would be created at the path of the symlink (not at the path the symlink points to). But I see now the distinction that the files which are edited during staging use download_content rather than download_file. At least stage_simulation uses download_content for the simulation_config: https://github.com/openbraininstitute/entitysdk/blob/04f80af09e9e93c2ad9a4b09c4f8407737fe4978/src/entitysdk/staging/simulation.py#L55C4-L55C5

This seems like a good pattern, thanks for clarifying!
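A rough sketch of that pattern, with inputs read as bytes and edited files written to a separate output location. `stage_config` and its parameters are hypothetical; `read_content` stands in for whatever returns the file's bytes, whether from the store or from the API.

```python
import json
from pathlib import Path
from typing import Callable


def stage_config(
    read_content: Callable[[], bytes],
    output_path: Path,
    overrides: dict,
) -> Path:
    """Read a config as bytes, edit it in memory, write a new regular file."""
    config = json.loads(read_content())  # no symlink is created for the input
    config.update(overrides)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(json.dumps(config, indent=2))
    return output_path
```

Because the edited file is created fresh at `output_path`, nothing ever writes through a link into the read-only store.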

mgeplf

The implementation looks fine; my concern is that the word Mount doesn't express the intent of what is going on - or at least it confused me at first.

To me, mounting is the act of attaching a filesystem. What is happening here can happen regardless of where the data originates - it's more of an AssetStore or AssetDepot, or perhaps a cache or an indirection?

I hate to nitpick about this, as it's quite an invasive change. Perhaps a better description would help make it clearer?

eleftherioszisis · 2025-11-12T14:08:58Z

> The implementation looks fine; my concern is that the word Mount doesn't express the intent of what is going on - or at least it confused me at first.
>
> To me, mounting is the act of attaching a filesystem. What is happening here can happen regardless of where the data originates - it's more of an AssetStore or AssetDepot, or perhaps a cache or an indirection?
>
> I hate to nitpick about this, as it's quite an invasive change. Perhaps a better description would help make it clearer?

I couldn't come up with a good enough name that describes the role of that object. I think DataCache feels closer to its true meaning.

mgeplf · 2025-11-12T14:18:34Z

> I couldn't come up with a good enough name that describes the role of that object. I think DataCache feels closer to its true meaning.

Ok. I'd still be tempted to name it something with Asset in the name, since it's only assets being saved, not general data.

I'm willing to do the change.

eleftherioszisis · 2025-11-12T14:49:37Z

After some discussion we converged on LocalAssetStore.
If there are no objections, we'll go with that.

eleftherioszisis added 7 commits November 6, 2025 12:51

Add support for a data mount

2f7f44f

Add data mount support for download file and content

b7245cc

Fix test

70a2c3f

More tests

8602cf1

Fix lint

d0ecdd9

Expose DataMount

fe4caff

Fix py310

a509418

eleftherioszisis marked this pull request as ready for review November 7, 2025 14:02

eleftherioszisis requested a review from GianlucaFicarelli November 7, 2025 14:02

eleftherioszisis requested review from james-isbister and mgeplf November 7, 2025 14:02

eleftherioszisis mentioned this pull request Nov 10, 2025

Optionally use mounted data for EntitySDK staging and downloading #137

Closed

mgeplf reviewed Nov 12, 2025

eleftherioszisis added 2 commits November 12, 2025 16:18

Rename DataMount -> LocalAssetStore

5336b12

Finishing touches.

3cbced5

mgeplf approved these changes Nov 13, 2025

eleftherioszisis merged commit ab29917 into main Nov 13, 2025
12 checks passed

eleftherioszisis deleted the mount branch November 13, 2025 07:59

4 participants
