✨ ENH: Add Neighbourhood Querying Support To AnnotationStore
#540
Adds a new method for performing neighbourhood queries on an annotation store.
Codecov Report
Coverage Diff (`develop` vs #540):

| | develop | #540 | +/- |
|---|---|---|---|
| Coverage | 99.74% | 99.74% | |
| Files | 63 | 63 | |
| Lines | 6744 | 6781 | +37 |
| Branches | 1107 | 1117 | +10 |
| Hits | 6727 | 6764 | +37 |
| Misses | 8 | 8 | |
| Partials | 9 | 9 | |
Please can you also fix the documentation issue highlighted in #532?
- Fix missing eol
measty left a comment
I've had another look through and made one comment about the boxpoint-boxpoint implementation which may or may not help. It is purely about efficiency, not functionality, so I am happy to approve this as is. If the change in the comment actually improves the benchmark, it could be included in this PR before it is merged, or added in a separate one.
Storing as int would cause some queries on points to be inaccurate.
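The reasoning behind this reply can be illustrated with a small sketch. It assumes the suggestion was to store bounding-box centres as integers (the comment being replied to is not shown here). Two hypothetical centres that are 5.3 units apart fall outside a distance of 5. After rounding, however, they are exactly 5 apart and pass the test, which produces a false neighbour.

```python
import math


def within(p, q, distance):
    """True if points p and q are at most `distance` apart."""
    return math.hypot(p[0] - q[0], p[1] - q[1]) <= distance


def to_int(p):
    """Round an (x, y) point to integer coordinates (hypothetical int storage)."""
    return (round(p[0]), round(p[1]))


# Hypothetical bounding-box centres of two annotations, 5.3 units apart.
a = (0.6, 0.0)
b = (5.9, 0.0)

exact = within(a, b, 5.0)                    # False: 5.3 > 5
rounded = within(to_int(a), to_int(b), 5.0)  # True: (1, 0) and (6, 0) are 5 apart
print(exact, rounded)  # False True
```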
measty left a comment
The updates look good; happy to approve.
## 1.4.0 (2023-04-24)

### Major Updates and Feature Improvements

- Adds Python 3.11 support [experimental] #500
  - Python 3.11 is not fully supported by `pytorch` pytorch/pytorch#86566 and `openslide` openslide/openslide-python#188
- Removes Python 3.7 support
  - This allows upgrading all the dependencies which were dependent on an older version of Python.
- Adds Neighbourhood Querying Support To AnnotationStore #540
  - This enables easy and efficient querying of annotations within a neighbourhood of other annotations.
- Adds `MultiTaskSegmentor` engine #424
- Fixes an issue with stain augmentation to apply augmentation to only tissue regions.
  - #546 contributed by @navidstuv
- Filters logger output to stdout instead of stderr.
  - Fixes #255
- Allows import of some modules at higher level for improved usability
  - `WSIReader` can now be imported as `from tiatoolbox.wsicore import WSIReader`
  - `WSIMeta` can now be imported as `from tiatoolbox.wsicore import WSIMeta`
  - `HoVerNet`, `HoVerNetPlus`, `IDaRS`, `MapDe`, `MicroNet`, `NuClick`, `SCCNN` can now be imported as `from tiatoolbox.models import HoVerNet, HoVerNetPlus, IDaRS, MapDe, MicroNet, NuClick, SCCNN`
- Improves `PatchExtractor` performance. Updates `WSIPatchDataset` to be consistent. #571
- Updates documentation for `License` for clarity on source code and model weights license.

### Changes to API

- Updates SCCNN architecture to make it consistent with other models. #544

### Bug Fixes and Other Changes

- Fixes Parsing Missing Omero Version NGFF Metadata #568
  - Fixes #535 raised by @benkamphaus
- Fixes reading of DICOM WSIs at the correct level #564
  - Fixes #529
- Fixes `scipy`, `matplotlib`, `scikit-image` deprecated code
- Fixes breaking changes in `DICOMWSIReader` to make it compatible with latest `wsidicom` version. #539, #580
- Updates `shapely` dependency to version >=2.0.0 and fixes any breaking changes.
- Fixes bug with `DictionaryStore.bquery` and `geometry=None`, i.e. only a where predicate given.
  - Partly Fixes #532 raised by @blaginin
- Fixes local tests for Windows/Linux
- Fixes `flake8`, `deepsource` errors.
- Uses `logger` instead of `warnings` and `print` statements to properly log runs.

### Development related changes

- Upgrades dependencies which are dependent on Python 3.7
- Moves `requirements*.txt` files to `requirements` folder
- Removes `tox`
- Uses `pyproject.toml` for `bdist_wheel`, `pytest` and `isort`
- Adds `joblib` and `numba` as dependencies.

Enable easy and efficient querying of annotations within a neighbourhood of other annotations. Where possible, this is planned to be executed in the query domain (e.g. by SQLite, outside of the Python GIL), which should be faster than a user writing a for loop and performing many queries for this common use case.
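The difference between a user-side loop and a query executed in the query domain can be sketched with Python's built-in `sqlite3` module. The sketch makes several assumptions. It uses a hypothetical plain table of bounding boxes, and it is not meant to reflect `SQLiteStore`'s actual schema, indexing, or generated SQL. The class filters stand in for the "centre" and "neighbour" predicates. A neighbour is any box that overlaps a centre box after the centre has been grown by a distance on every side. Both versions below return the same pairs. The SQL version, however, does all of the pairing inside SQLite with a single self-join.

```python
import sqlite3

# Hypothetical annotations: (id, class, min_x, min_y, max_x, max_y).
ROWS = [
    (1, "A", 0.0, 0.0, 2.0, 2.0),
    (2, "B", 3.0, 0.0, 5.0, 2.0),      # 1 unit to the right of annotation 1
    (3, "B", 20.0, 20.0, 22.0, 22.0),  # far from annotations 1 and 2
    (4, "A", 19.0, 19.0, 20.5, 20.5),  # overlaps annotation 3
]
DISTANCE = 1.5


def boxes_near(a, b, d):
    """True if box b overlaps box a after growing a by d on every side."""
    return (b[2] <= a[4] + d and b[4] >= a[2] - d
            and b[3] <= a[5] + d and b[5] >= a[3] - d)


# User-side approach: scan every candidate for every centre annotation.
loop_pairs = sorted(
    (a[0], b[0])
    for a in ROWS if a[1] == "A"  # selects the neighbourhood centres
    for b in ROWS if b[1] == "B"  # filters the candidate neighbours
    if boxes_near(a, b, DISTANCE)
)

# Query-domain approach: one self-join executed by SQLite.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE annotations (id INTEGER PRIMARY KEY, cls TEXT, "
    "min_x REAL, min_y REAL, max_x REAL, max_y REAL)"
)
conn.executemany("INSERT INTO annotations VALUES (?, ?, ?, ?, ?, ?)", ROWS)
sql_pairs = conn.execute(
    """
    SELECT a.id, b.id
    FROM annotations AS a
    JOIN annotations AS b
      ON b.min_x <= a.max_x + :d AND b.max_x >= a.min_x - :d
     AND b.min_y <= a.max_y + :d AND b.max_y >= a.min_y - :d
    WHERE a.cls = 'A' AND b.cls = 'B'
    ORDER BY a.id, b.id
    """,
    {"d": DISTANCE},
).fetchall()
conn.close()

print(loop_pairs)  # [(1, 2), (4, 3)]
print(sql_pairs)   # [(1, 2), (4, 3)]
```

Without a spatial index, this join still compares every pair inside the database. Its purpose here is only to show how the loop moves out of Python.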
Docs
Initial Concept
Pseudo query example
This is now implemented as:
Query Modes
In this PR
- Adds the bounding box intersection geometry predicate to `DictionaryStore`. This special geometry predicate was added as an optimized special case which is true if the bounding box of the query geometry intersects the bounding box of the stored geometry. It had never been implemented for `DictionaryStore`.
- Adds an `nquery` function which allows for querying for annotations which are within the neighbourhood of other annotations. This is done by supplying a query geometry and a `where` predicate to select the initial annotations to use as the centre of each neighbourhood. A second predicate, `n_where`, together with the `distance` and `mode` arguments, is used to find neighbours.
  - `distance` is the size of the neighbourhood to search (see also `mode`). For a polygon-polygon check, it is applied as a buffer to the geometry; for point-point checks, it is a radius; for box-box checks, it is added to each side of the bounding box.
  - `n_where` is a predicate, similar to `where`, but used for filtering neighbours after selecting the annotations to query around via `where`.
  - `mode` may be "poly-poly" (full, expensive polygon-polygon intersection), "box-box" (bounding box intersection) or "boxpoint-boxpoint" (centres of the bounding boxes within `distance`).

Currently, the only supported modes are "poly-poly", "box-box", and "boxpoint-boxpoint". Others are possible, but would complicate implementation of these two sets. This would lose information about exactly which annotation was within the neighbourhood of another, but would be more efficient at finding intersections of two groups. Perhaps this could simply be shown as a 'recipe' in the documentation.

Benchmarking
Some initial benchmarking shows that `SQLiteStore` significantly outperforms `DictionaryStore`. In this test, an $n \times n$ grid of cell boundaries is generated and overlaid with another $n \times n$ grid. A query is then performed to find overlapping geometries via either any overlap of the bounding boxes (box-box) or the centres of the bounding boxes lying within a distance $k$ of each other (boxpoint-boxpoint). So the largest test here, where $n=100$, is a grid of $100 \times 100$ artificial cell boundaries labelled as class "A", overlaid with another $100 \times 100$ grid of cell boundaries labelled as class "B", for a total of $20{,}000$ geometries.
This plot was produced from runs on a 6-core Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz:
It is curious that the "boxpoint-boxpoint" query mode is slower than "poly-poly". I suspect this may be due to Shapely 2.0 optimisations that vectorise batch geometry operations, or perhaps some condition in the polygon-polygon intersection test allows it to fail fast.
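The grid benchmark described above can be roughed out in pure Python. This is a brute-force sketch that also spells out what the two cheap modes test. It assumes square boxes stand in for cell boundaries. The pitch (10), box side (8), and shift of the second grid (1) are hypothetical values that the PR does not give, and `n` is reduced from 100 to 10. In "box-box" mode, the "A" box is grown by $k$ on every side and then tested for overlap. In "boxpoint-boxpoint" mode, the bounding-box centres are compared. The double loop performs $n^2 \times n^2$ comparisons ($10^8$ at $n = 100$), which is the kind of cost a query engine with an index can avoid.

```python
import math
from typing import Callable, List, Tuple

Bounds = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def make_grid(n: int, pitch: float, side: float, shift: float) -> List[Bounds]:
    """n x n grid of square boxes of width `side`, placed every `pitch` units."""
    return [
        (i * pitch + shift, j * pitch + shift,
         i * pitch + shift + side, j * pitch + shift + side)
        for i in range(n)
        for j in range(n)
    ]


def box_box(a: Bounds, b: Bounds, k: float) -> bool:
    """Grow box a by k on every side, then test for overlap with box b."""
    return (b[0] <= a[2] + k and b[2] >= a[0] - k
            and b[1] <= a[3] + k and b[3] >= a[1] - k)


def boxpoint_boxpoint(a: Bounds, b: Bounds, k: float) -> bool:
    """Test whether the bounding-box centres are at most k apart."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return math.hypot(ax - bx, ay - by) <= k


def count_pairs(grid_a: List[Bounds], grid_b: List[Bounds], k: float,
                test: Callable[[Bounds, Bounds, float], bool]) -> int:
    """Brute force: compare every "A" box with every "B" box."""
    return sum(1 for a in grid_a for b in grid_b if test(a, b, k))


n = 10  # the largest run described above uses n = 100
grid_a = make_grid(n, pitch=10.0, side=8.0, shift=0.0)  # class "A"
grid_b = make_grid(n, pitch=10.0, side=8.0, shift=1.0)  # class "B", offset by 1

total = len(grid_a) + len(grid_b)                   # 2 * n**2 geometries
overlaps = count_pairs(grid_a, grid_b, 0.0, box_box)
near_boxes = count_pairs(grid_a, grid_b, 2.0, box_box)
near_centres = count_pairs(grid_a, grid_b, 2.0, boxpoint_boxpoint)
print(total, overlaps, near_boxes, near_centres)  # 200 100 361 100
```

With $k = 2$, box-box also picks up the diagonal and adjacent shifted neighbours. Boxpoint-boxpoint, by contrast, only matches each box's shifted twin, because the next-nearest centre is about 9 units away. The poly-poly mode is left out because it needs a geometry library. With Shapely polygons, the equivalent test would be along the lines of `a.buffer(k).intersects(b)`.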
Notes
To-Do