Skip to content

Conversation

@John-P
Copy link
Contributor

@John-P John-P commented Feb 16, 2023

Enable easy and efficient querying of annotations within a neighbourhood of other annotations. This is planned to be executed in the query domain (e.g. by sqlite, outside of the Python GIL) where possible which should be faster than a user writing a for loop and performing many queries for this common use case.

Docs

Initial Concept

Pseudo query example

SELECT purple WITHIN 10 points of red

Neighbourhood Querying drawio

This is now implemented as:

results = store.nquery(
  where="props['color'] == 'red'",
  n_where="props['color'] == 'purple'",
  distance=10,
  # Use centre points of bounding boxes for distance
  mode="boxpoint-boxpoint",
)

Query Modes

poly-poly box-box
boxpoint-boxpoint

In this PR

  • Fix special "bbox_intersects" for DictionaryStore. This special geometry predicate was added as a special optimized case which is true if the bounding box of the query geometry intersects the bounding box of the stored geometry. This was never implemented for DicstionaryStore.
  • Add geometry predicate "bbox_centre_within" which functions similarly to the special "bbox_intersects" predicate. "bbox_centre_within" returns True if the centre of the bounding box of the stored geometry is within the query bounds.
  • Add nquery function which allows for querying for annotations which are within the neighbourhood of other annotations. This is done by supplying a query geometry and a where predicate to select the initial annotations to use as the centre of each neighbourhood. A second n_where predicate, distance, and mode argument are used to find neighbours.
    • distance is the size of the neighbourhood to search (see also mode). For a polygon-polygon check, this is via a buffer applied to the geometry, for point-point checks this is a radius, for box-box checks this is added to each side of the bounding box.
    • n_where is a predicate, similar to where, but used for filtering neighbours after selecting the annotations to query around via where.
    • mode may be "poly-poly" (full expensive polygon-polygon intersection), "box-box" (bounding box intersection) or "boxpoint-boxpoint" (centre of bounding box within distance). Currently, supported modes are only "poly-poly", "box-box", and "boxpoint-boxpoint". Others are possible, but would complicate implementation of these two sets. This would lose information about exactly which annotation was within the neighbourhood of another, but would be more efficient at finding intersections of two groups. Perhaps this could simply be shown as a 'recipe' in the documentation.

Benchmarking

Some initial benchmarking shows that SQLiteStore significantly outperforms DictionaryStore.

In this test an $n \times n$ grid of cell boundaries is generated and overlaid with another $n \times n$ grid. A query is then performed to find overlapping geometries via either any overlap of the bounding boxes (box-box) or the centre of the bounding boxes (boxpoint-boxpoint) within a distance $k$. Therefore, the largest test here where $n=100$ is a grid of $100 \times 100$ artificial cell boundaries labelled as class "A", overlaid with another $100 \times 100$ grid of cell boundaries labelled with class "B" for a total of $20,000$ geometries.

This plot is produced from runs on a 6 core Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz CPU:

nquery2

It is curious that the "boxpoint-boxpoint" query mode is slower than "poly-poly". I have a feeling that it may have something to do with Shapely 2.0 optimisations where it can vectorise batch geometry operations, or maybe there is some condition for the polygon-polygon intersection that allows for fast failing of tests.

Notes

  • It is already much faster for SQLiteStore but this could be likely improved further by using a subquery.
  • The implementation currently performs one query to get all annotations which will be the centre of $m$ secondary queries performed via a loop in Python. This could be refactored to be a nested sqlite subquery which will likely be faster. I think this optimisation could be a new PR though as this basic functionality is already usable.
  • This is currently implemented in the base class, so all subclasses get it for 'free'.
  • Another approach which may be faster for some cases (but more limited) is to query for initial annotations, then apply a distance buffer, merge overlapping geometries (reduce total number assuming many overlaps), query for all candidate matches once, compute intesection.

To-Do

  • Test coverage (it's as good as I can get it for now)
  • Fix linter issues
  • Add diagrams to docstrings?
  • Add support for um units as distance? (Decided to do in a follow up PR)
  • Improve docstrings and add more examples? (suggestions welcome)
  • More tests? (suggestions welcome)

Adds a new method for performing neighbourhood queries to an annotation store.
@John-P John-P added the enhancement New feature or request label Feb 16, 2023
@John-P John-P requested a review from measty February 16, 2023 15:31
@John-P John-P marked this pull request as draft February 16, 2023 15:32
@John-P John-P requested a review from shaneahmed February 16, 2023 15:32
@John-P John-P changed the title ✨ Add Neighbourhood Querying Support To AnnotationStore ✨ WIP Add Neighbourhood Querying Support To AnnotationStore Feb 16, 2023
@codecov
Copy link

codecov bot commented Feb 16, 2023

Codecov Report

Merging #540 (0c9f6f3) into develop (bb78d2e) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff            @@
##           develop     #540   +/-   ##
========================================
  Coverage    99.74%   99.74%           
========================================
  Files           63       63           
  Lines         6744     6781   +37     
  Branches      1107     1117   +10     
========================================
+ Hits          6727     6764   +37     
  Misses           8        8           
  Partials         9        9           
Impacted Files Coverage Δ
tiatoolbox/annotation/storage.py 99.63% <100.00%> (+0.01%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@John-P John-P changed the title ✨ WIP Add Neighbourhood Querying Support To AnnotationStore ✨ WIP: Add Neighbourhood Querying Support To AnnotationStore Feb 17, 2023
@shaneahmed
Copy link
Member

Please can you also fix documentation issue as highlighted here #532

@shaneahmed shaneahmed mentioned this pull request Mar 3, 2023
4 tasks
@John-P John-P linked an issue Mar 10, 2023 that may be closed by this pull request
@measty
Copy link
Collaborator

measty commented Mar 24, 2023

Looks good! also tested it out with a real world example:
image
Glands with at least 1 neutrophil nearby

@John-P
Copy link
Contributor Author

John-P commented Mar 24, 2023

Looks good! also tested it out with a real world example: image Glands with at least 1 neutrophil nearby

Nice! This might be nice to put in the docs as an example. Perhaps with all glands + neutrophils shown also.

@shaneahmed shaneahmed added this to the Release v1.4.0 milestone Apr 10, 2023
- Fix missing eol
Copy link
Member

@shaneahmed shaneahmed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @John-P. I have made an online commit to fix eol.
@measty Please let me know when you are happy with this PR. I will merge it then.

Copy link
Collaborator

@measty measty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had another look through, and made one comment about the boxpoint-boxpoint implementation which may or may not help. But its purely to do with efficiency and not functionality, so am happy to approve this as is. If the change in the comment actually improves the benchmark, it could be incuded in this PR before its merged, or added in a separate one.

@John-P John-P requested review from measty and shaneahmed May 3, 2023 18:14
Copy link
Collaborator

@measty measty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updates look good, happy to approve

@shaneahmed shaneahmed merged commit 6039197 into develop May 4, 2023
@shaneahmed shaneahmed deleted the feature-neighbourhood-query branch May 4, 2023 13:48
This was referenced May 5, 2023
shaneahmed added a commit that referenced this pull request May 5, 2023
## 1.4.0 (2023-04-24)

### Major Updates and Feature Improvements

- Adds Python 3.11 support \[experimental\] #500
  - Python 3.11 is not fully supported by `pytorch` pytorch/pytorch#86566 and `openslide` openslide/openslide-python#188
- Removes Python 3.7 support
  - This allows upgrading all the dependencies which were dependent on an older version of Python.
- Adds Neighbourhood Querying Support To AnnotationStore #540
  - This enables easy and efficient querying of annotations within a neighbourhood of other annotations.
- Adds `MultiTaskSegmentor` engine #424
- Fixes an issue with stain augmentation to apply augmentation to only tissue regions.
  - #546 contributed by @navidstuv
- Filters logger output to stdout instead of stderr.
  - Fixes #255
- Allows import of some modules at higher level for improved usability
  - `WSIReader` can now be imported as `from tiatoolbox.wsicore import WSIReader`
  - `WSIMeta` can now be imported as `from tiatoolbox.wsicore import WSIMeta`
  - `HoVerNet`, `HoVerNetPlus`, `IDaRS`, `MapDe`, `MicroNet`, `NuClick`, `SCCNN` can now be imported as \`from tiatoolbox.models import HoVerNet, HoVerNetPlus, IDaRS, MapDe, MicroNet, NuClick, SCCNN
- Improves `PatchExtractor` performance. Updates `WSIPatchDataset` to be consistent. #571
- Updates documentation for `License` for clarity on source code and model weights license.

### Changes to API

- Updates SCCNN architecture to make it consistent with other models. #544

### Bug Fixes and Other Changes

- Fixes Parsing Missing Omero Version NGFF Metadata #568
  - Fixes #535 raised by @benkamphaus
- Fixes reading of DICOM WSIs at the correct level #564
  - Fixes #529
- Fixes `scipy`, `matplotlib`, `scikit-image` deprecated code
- Fixes breaking changes in `DICOMWSIReader` to make it compatible with latest `wsidicom` version. #539, #580
- Updates `shapely` dependency to version >=2.0.0 and fixes any breaking changes.
- Fixes bug with `DictionaryStore.bquery` and `geometry=None`, i.e. only a where predicate given.
  - Partly Fixes #532 raised by @blaginin
- Fixes local tests for Windows/Linux
- Fixes `flake8`, `deepsource` errors.
- Uses `logger` instead of `warnings` and `print` statements to properly log runs.

### Development related changes

- Upgrades dependencies which are dependent on Python 3.7
- Moves `requirements*.txt` files to `requirements` folder
- Removes `tox`
- Uses `pyproject.toml` for `bdist_wheel`, `pytest` and `isort`
- Adds `joblib` and `numba` as dependencies.
shaneahmed added a commit that referenced this pull request May 5, 2023
## 1.4.0 (2023-04-24)

### Major Updates and Feature Improvements

- Adds Python 3.11 support \[experimental\] #500
  - Python 3.11 is not fully supported by `pytorch` pytorch/pytorch#86566 and `openslide` openslide/openslide-python#188
- Removes Python 3.7 support
  - This allows upgrading all the dependencies which were dependent on an older version of Python.
- Adds Neighbourhood Querying Support To AnnotationStore #540
  - This enables easy and efficient querying of annotations within a neighbourhood of other annotations.
- Adds `MultiTaskSegmentor` engine #424
- Fixes an issue with stain augmentation to apply augmentation to only tissue regions.
  - #546 contributed by @navidstuv
- Filters logger output to stdout instead of stderr.
  - Fixes #255
- Allows import of some modules at higher level for improved usability
  - `WSIReader` can now be imported as `from tiatoolbox.wsicore import WSIReader`
  - `WSIMeta` can now be imported as `from tiatoolbox.wsicore import WSIMeta`
  - `HoVerNet`, `HoVerNetPlus`, `IDaRS`, `MapDe`, `MicroNet`, `NuClick`, `SCCNN` can now be imported as \`from tiatoolbox.models import HoVerNet, HoVerNetPlus, IDaRS, MapDe, MicroNet, NuClick, SCCNN
- Improves `PatchExtractor` performance. Updates `WSIPatchDataset` to be consistent. #571
- Updates documentation for `License` for clarity on source code and model weights license.

### Changes to API

- Updates SCCNN architecture to make it consistent with other models. #544

### Bug Fixes and Other Changes

- Fixes Parsing Missing Omero Version NGFF Metadata #568
  - Fixes #535 raised by @benkamphaus
- Fixes reading of DICOM WSIs at the correct level #564
  - Fixes #529
- Fixes `scipy`, `matplotlib`, `scikit-image` deprecated code
- Fixes breaking changes in `DICOMWSIReader` to make it compatible with latest `wsidicom` version. #539, #580
- Updates `shapely` dependency to version >=2.0.0 and fixes any breaking changes.
- Fixes bug with `DictionaryStore.bquery` and `geometry=None`, i.e. only a where predicate given.
  - Partly Fixes #532 raised by @blaginin
- Fixes local tests for Windows/Linux
- Fixes `flake8`, `deepsource` errors.
- Uses `logger` instead of `warnings` and `print` statements to properly log runs.

### Development related changes

- Upgrades dependencies which are dependent on Python 3.7
- Moves `requirements*.txt` files to `requirements` folder
- Removes `tox`
- Uses `pyproject.toml` for `bdist_wheel`, `pytest` and `isort`
- Adds `joblib` and `numba` as dependencies.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Inconsistent API / Broken LSP

4 participants