NEW: Annotation Stores #135

John-P · 2021-08-19T22:59:32Z

Add classes which act as a mutable mapping (dictionary-like) structure and enable efficient management of annotations. An annotation here is defined as a geometry and some associated JSON data. Currently supported features are:

- A common interface for multiple backends:
  - SQLite (with an rtree index for fast spatial queries)
  - Pure Python dict (stored as geoJSON on disk)
- Serialisation and deserialisation to and from disk
- Compression (SQLite only)
- Conversion to and from other formats:
  - geoJSON
  - JSON Lines / ndjson
  - Pandas dataframe
- Spatial queries using a bounding box or shapely Polygon
- Customisable binary shape predicate (defaults to intersection)
- Query predicates on properties using:
  - A subset of python (fastest, compatible across backends with some limitations)
  - A pickled function
  - A python callable (slowest but easiest and can be any Python code)
- Custom indexes (SQLite only) for accelerated queries

Example (`SQLiteStore`)

from shapely.geometry import Polygon
from tiatoolbox.annotation.storage import SQLiteStore, Annotation

store = SQLiteStore("polygons.db")
# Create a test geometry (polygon, point, or line string)
triangle = Polygon([(0, 0), (1, 1), (0, 2)])

# Store an annotation geometry with a class label
key = store.append(Annotation(triangle, {"class": 1}))
# The value returned is a unique key (UUID4 by default)

# Get the stored annotation (tuple of geometry and properties dict returned)
print(store[key])

# Store an annotation geometry with a custom key
store.append(Annotation(triangle), key="foo")
print(store["foo"])
# or use __setitem__ syntax:
store[key] = Annotation(triangle)

# Change the properties of an annotation in the store
store.patch(key, {"class": 2})

# Query in a bounding box
results = store.query([0, 0, 128, 128])

# Query using any polygon
results = store.query(Polygon.from_bounds(0, 0, 128, 128))

# Query in a bounding box but return only the indexes (rather than the polygons and properties).
# This can be much faster than returning geometries or doing a geometry query,
# as the indexes are small and neither the geometries nor the properties have to
# be decoded for the query to run.
results = store.query_index([0, 0, 128, 128])

# Query with a predicate statement
results = store.query([0, 0, 128, 128], where="props['class']==4")

# Create an index (SQLite only)
# Can give significant speedup (100x in some tests) even for simple property access e.g.
store.create_index("example_index", "props['class']==4")
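
To illustrate the difference between a string predicate and a callable predicate from the feature list, here is a minimal plain-Python sketch. It is not the store's implementation: the `matches` helper and the sample `annotations` dict are hypothetical, and the pickled-function variant is left out. The string form is evaluated with only `props` in scope.

```python
from typing import Any, Callable, Dict, Union

Properties = Dict[str, Any]
Predicate = Union[str, Callable[[Properties], bool]]


def matches(where: Predicate, props: Properties) -> bool:
    """Return True if `props` satisfies the predicate `where` (hypothetical helper)."""
    if callable(where):
        # A Python callable: can be any code, but runs once per annotation.
        return bool(where(props))
    # A string expression: evaluated with only `props` in scope.
    # NOTE: eval() with emptied builtins is NOT a security sandbox; illustration only.
    return bool(eval(where, {"__builtins__": {}}, {"props": props}))


# Stand-in for the properties of three stored annotations.
annotations = {
    "a": {"class": 4},
    "b": {"class": 1},
    "c": {"class": 4, "score": 0.9},
}

by_expression = [
    key for key, props in annotations.items() if matches("props['class']==4", props)
]
by_callable = [
    key
    for key, props in annotations.items()
    if matches(lambda p: p.get("score", 0) > 0.5, props)
]
```

A string expression has the advantage that a backend could, in principle, translate it into its own query language, whereas a callable always has to run in Python.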

To-Dos

John-P · 2021-08-27T10:46:10Z

At the moment, export to various formats is handled by sub-classes, each with its own load and dump. I am considering replacing these with a single class that has to_format and from_format functions, as with Pandas DataFrames. This would make it easier to read in one format and output to another. Currently that can only be done by changing the class after reading (casting), which is a pain to do in Python.
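
A rough sketch of what the single-class design could look like, assuming annotations are held as GeoJSON-style feature dicts. The class name `FormatAgnosticStore` and the methods `from_ndjson`, `to_ndjson` and `to_geojson` are hypothetical names for illustration, not the API in this PR.

```python
import json
from typing import Any, Dict, Iterable


class FormatAgnosticStore:
    """Hypothetical store: one class that reads and writes several formats."""

    def __init__(self) -> None:
        self._features: Dict[str, Dict[str, Any]] = {}

    @classmethod
    def from_ndjson(cls, lines: Iterable[str]) -> "FormatAgnosticStore":
        """Build a store from JSON Lines text, one feature per line."""
        store = cls()
        for number, line in enumerate(lines):
            if line.strip():  # skip blank lines
                store._features[str(number)] = json.loads(line)
        return store

    def to_ndjson(self) -> str:
        """Serialise every feature as one JSON document per line."""
        return "\n".join(json.dumps(f) for f in self._features.values())

    def to_geojson(self) -> str:
        """Serialise every feature as a single GeoJSON FeatureCollection."""
        return json.dumps(
            {"type": "FeatureCollection", "features": list(self._features.values())}
        )


# Read one format, write another, without changing class in between.
ndjson = (
    '{"type": "Feature", "geometry": {"type": "Point", "coordinates": [1, 2]}, '
    '"properties": {"class": 1}}\n'
)
store = FormatAgnosticStore.from_ndjson(ndjson.splitlines())
collection = json.loads(store.to_geojson())
```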

vqdang · 2021-08-27T17:31:07Z

Escalating here for tracking. Basically, I want to do this but I haven't found a way to do it via the current API.

inst_dict = {
  UUID: {
     'box' : number[],
     'contour' : number[],
  }
}
store  = SQLite3RTreeStore('dumb.db')
store.remove(list_of_uuid)
store.append(list_of_uuid)

Given the sample API in the OP, I guess you would like users to do this?

box_store = SQLite3RTreeStore('dumb.db')
box_store.append([v['box'] for v in inst_dict.values()])

contour_store = SQLite3RTreeStore('dumb2.db')
contour_store.append([v['contour'] for v in inst_dict.values()])

But removal will require UUID to sync. Also, may need to write to the store based on UUID rather than simply appending any geometries.

John-P · 2021-08-27T21:09:45Z

> Escalating here for tracking. Basically, I want to do this but I haven't found a way to do it via the current API.
>
> inst_dict = {
>   UUID: {
>      'box' : number[],
>      'contour' : number[],
>   }
> }
> store  = SQLite3RTreeStore('dumb.db')
> store.remove(list_of_uuid)
> store.append(list_of_uuid)
> ...

You can currently add a list of polygons. However, you cannot add a list of just indexes/UUIDs because the rtree data structure requires the geometry. You can do something equivalent like this:

store  = SQLite3RTreeStore('dumb.db')
list_of_ids = store.append(list_of_polygons)
store.remove(list_of_ids)

I haven't exposed a way to specify the ID at the moment, as it is generated as a hash of the geometry when appending, to avoid duplicate geometries. This could be changed to use UUIDs instead, or a manual ID could be allowed, but that would require some extra error handling etc.
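
As a small illustration of hash-derived IDs, the sketch below hashes a WKT string so that appending the same geometry twice yields the same key. The choice of SHA-1 over WKT text, and the `geometry_key` and `append` helpers, are assumptions made for this example; the hashing used in the PR may differ.

```python
import hashlib
from typing import Dict, Tuple


def geometry_key(wkt: str) -> str:
    """Derive a deterministic key from the geometry text (assumed scheme)."""
    return hashlib.sha1(wkt.encode("utf-8")).hexdigest()


# Stand-in for the store: key -> (geometry as WKT, properties).
store: Dict[str, Tuple[str, dict]] = {}


def append(wkt: str, properties: dict) -> str:
    """Add an annotation; an identical geometry overwrites the earlier entry."""
    key = geometry_key(wkt)
    store[key] = (wkt, properties)
    return key


triangle = "POLYGON ((0 0, 1 1, 0 2, 0 0))"
first = append(triangle, {"class": 1})
second = append(triangle, {"class": 2})  # same geometry, so same key
```

Two consequences of this scheme are visible here: the second append replaces the first, so two annotations cannot share a geometry, and the same shape written with a different vertex order would hash to a different key.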

In your above sample, the geometry would not be serialised to disk. Therefore it would not be known when loaded again later and it would not be possible to spatially query the data. Additionally, storing the geometry using the class (rather than just using a bounding box) allows for optimised polygon intersection queries.

John-P · 2021-08-27T21:20:36Z

> ...
> Given the sample API in the OP, I guess you would like users to do this?
>
> box_store = SQLite3RTreeStore('dumb.db')
> box_store.append([v['box'] for v in inst_dict.values()])
>
> contour_store = SQLite3RTreeStore('dumb2.db')
> contour_store.append([v['contour'] for v in inst_dict.values()])
>
> ...

Yes, as currently implemented you could either make two stores (one for boxes and one for polygons), or store them as separate annotations in the same store. However, I am unsure why you need to store both. You can get the bounding box from the polygon via polygon.bounds. The bounding box is already stored for the rtree indexing (it is found at append time via polygon.bounds). This could be exposed if it is useful. However, it would add complexity, and I cannot currently see why it is required.
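
For reference, shapely's `polygon.bounds` returns a `(minx, miny, maxx, maxy)` tuple. The plain-Python function below computes the same thing from a list of vertices, just to show that the box is fully derivable from the contour; the `bounds` function here is a stand-in written for this example, not shapely itself.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def bounds(coords: Iterable[Point]) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) for a sequence of (x, y) vertices."""
    xs, ys = zip(*coords)
    return (min(xs), min(ys), max(xs), max(ys))


triangle = [(0, 0), (1, 1), (0, 2)]
box = bounds(triangle)  # (0, 0, 1, 2)
```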

John-P · 2021-08-27T22:08:39Z

> ...
> But removal will require UUID to sync. Also, may need to write to the store based on UUID rather than simply appending any geometries.

It sounds like you want something to act as an rtree index without storing the geometry, while you handle the annotations in memory in a separate structure such as a list or dictionary. For this I would suggest simply using an RTree class (as in shapely or the rtree package). I could add a class to do this in memory with sqlite if you like (essentially the same as the current class but with no storage of geometry or properties to disk, just the rtree and an ID string). However, you would lose the benefits of optimised storage of geometry on disk, fast queries on large numbers of annotations (more than could fit in memory), etc.
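
A sketch of what such an ID-only, in-memory index might look like using the standard-library `sqlite3` module. The table and function names are hypothetical. It tries SQLite's rtree virtual table and, as an assumption for portability, falls back to an ordinary table if the rtree module is not compiled in; the queries are the same either way.

```python
import sqlite3
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)

con = sqlite3.connect(":memory:")
# rtree rows are keyed by integer, so map each string ID to an integer row id.
con.execute("CREATE TABLE keys (id INTEGER PRIMARY KEY, key TEXT UNIQUE)")
try:
    con.execute("CREATE VIRTUAL TABLE boxes USING rtree(id, min_x, max_x, min_y, max_y)")
except sqlite3.OperationalError:
    # Fallback when this SQLite build lacks rtree: same columns, no spatial index.
    con.execute(
        "CREATE TABLE boxes (id INTEGER PRIMARY KEY, "
        "min_x REAL, max_x REAL, min_y REAL, max_y REAL)"
    )


def add(key: str, box: Box) -> None:
    """Index a bounding box under a caller-supplied ID string."""
    cur = con.execute("INSERT INTO keys (key) VALUES (?)", (key,))
    min_x, min_y, max_x, max_y = box
    con.execute(
        "INSERT INTO boxes VALUES (?, ?, ?, ?, ?)",
        (cur.lastrowid, min_x, max_x, min_y, max_y),
    )


def remove(key: str) -> None:
    """Drop an ID from the index; unknown IDs are ignored."""
    row = con.execute("SELECT id FROM keys WHERE key = ?", (key,)).fetchone()
    if row is not None:
        con.execute("DELETE FROM boxes WHERE id = ?", row)
        con.execute("DELETE FROM keys WHERE id = ?", row)


def query(box: Box) -> List[str]:
    """Return the IDs whose bounding boxes intersect `box`."""
    min_x, min_y, max_x, max_y = box
    rows = con.execute(
        "SELECT key FROM boxes JOIN keys USING (id) "
        "WHERE boxes.max_x >= ? AND boxes.min_x <= ? "
        "AND boxes.max_y >= ? AND boxes.min_y <= ? ORDER BY key",
        (min_x, max_x, min_y, max_y),
    )
    return [key for (key,) in rows]


add("nucleus-1", (0, 0, 10, 10))
add("nucleus-2", (100, 100, 120, 120))
hits = query((5, 5, 50, 50))
remove("nucleus-1")
hits_after_removal = query((5, 5, 50, 50))
```

The geometries and properties stay in the caller's own structure, which is exactly the trade-off described above: nothing is persisted and polygon-level intersection tests are left to the caller.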

This PR is aimed more at creating a way to read and write a large number of annotations to and from disk efficiently (fast and with low memory usage, so that you can work with more annotations than would fit in memory at once). Here the store class effectively is your dict of geometries and properties. It handles the generation of IDs, spatial indexing, and reading and writing from disk for you. You would only keep another list or dict in memory as a working set, e.g. for performing operations on a subset of the annotations before updating the store class.
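
To make the "store is your dict" idea concrete, here is a deliberately tiny sketch of a dictionary-like class backed by SQLite, built on `collections.abc.MutableMapping`. `TinyStore` is a hypothetical name, geometries are kept as WKT strings to avoid a shapely dependency, and there is no spatial index; it only illustrates the mapping interface and ID generation, not the classes in this PR.

```python
import json
import sqlite3
import uuid
from collections.abc import MutableMapping
from typing import Any, Dict, Iterator, Tuple

Value = Tuple[str, Dict[str, Any]]  # (geometry as WKT, properties)


class TinyStore(MutableMapping):
    """Hypothetical dict-like annotation store backed by SQLite."""

    def __init__(self, path: str = ":memory:") -> None:
        self.con = sqlite3.connect(path)
        self.con.execute(
            "CREATE TABLE IF NOT EXISTS annotations "
            "(key TEXT PRIMARY KEY, wkt TEXT, properties TEXT)"
        )

    def append(self, wkt: str, properties: Dict[str, Any]) -> str:
        """Store an annotation under a freshly generated UUID4 key."""
        key = str(uuid.uuid4())
        self[key] = (wkt, properties)
        return key

    def __setitem__(self, key: str, value: Value) -> None:
        wkt, properties = value
        self.con.execute(
            "INSERT OR REPLACE INTO annotations VALUES (?, ?, ?)",
            (key, wkt, json.dumps(properties)),
        )
        self.con.commit()

    def __getitem__(self, key: str) -> Value:
        row = self.con.execute(
            "SELECT wkt, properties FROM annotations WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0], json.loads(row[1])

    def __delitem__(self, key: str) -> None:
        cur = self.con.execute("DELETE FROM annotations WHERE key = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)
        self.con.commit()

    def __iter__(self) -> Iterator[str]:
        rows = self.con.execute("SELECT key FROM annotations").fetchall()
        return (key for (key,) in rows)

    def __len__(self) -> int:
        return self.con.execute("SELECT COUNT(*) FROM annotations").fetchone()[0]


store = TinyStore()
key = store.append("POINT (1 2)", {"class": 1})
stored = store[key]
count_before = len(store)
del store[key]
count_after = len(store)
```

Because only five methods are defined, the rest of the dictionary interface (`in`, `.items()`, `.pop()`, `.update()` and so on) comes from `MutableMapping` for free.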

shaneahmed

Thanks @John-P
Please make the requested changes; we can merge then.

…/tiatoolbox into feature-annotation-store

New line at the end of docstring.

shaneahmed

Thanks @john

### Major Updates and Feature Improvements

- Adds nucleus instance segmentation base class
- Adds [HoVerNet](https://www.sciencedirect.com/science/article/abs/pii/S1361841519301045) architecture
- Adds multi-task segmentor [HoVerNet+](https://arxiv.org/abs/2108.13904) model
- Adds <a href="https://www.thelancet.com/journals/landig/article/PIIS2589-7500(2100180-1/fulltext">IDaRS</a> pipeline
- Adds [SlideGraph](https://arxiv.org/abs/2110.06042) pipeline
- Adds PCam patch classification models
- Adds support for stain augmentation feature
- Adds classes and functions under `tiatoolbox.tools.graph` to enable construction of graphs in a format which can be used with PyG (PyTorch Geometric).
- Adds classes which act as a mutable mapping (dictionary-like) structure and enable efficient management of annotations. (#135)
- Adds example notebook for adding advanced models
- Adds classes which can generate zoomify tiles from a WSIReader object.
- Adds WSI viewer using Zoomify/WSIReader API (#212)
- Adds README to example page for clarity
- Adds support to override or specify mpp and power

### Changes to API

- Replaces `models.controller` API with `models.engine`
- Replaces `CNNPatchPredictor` with `PatchPredictor`

### Bug Fixes and Other Changes

- Fixes `filter_coordinates` reading wrong resolutions for patch extraction
- For `PatchPredictor`
  - `ioconfig` will supersede everything
  - If `ioconfig` is not provided
    - If `model` is pretrained (defined in `pretrained_model.yaml`)
      - Use the yaml ioconfig
      - Any other input patch reading arguments will overwrite the yaml ioconfig (at the same keyword).
    - If `model` is not defined, all input patch reading arguments must be provided, else an exception will be thrown.
- Improves performance of mask based patch extraction

### Development related changes

- Improve tests performance for Travis runs
- Adds feature detection mechanism to detect the platform and installed packages etc.
- On demand imports for some libraries for performance
- Improves performance of mask based patch extraction

Co-authored-by: Shan Raza

@tialab

- Bump version: 0.8.0 → 1.0.0

John-P and others added 2 commits August 13, 2021 21:45

DEV: Add Shapely Dependency

a28dcc9

NEW: Add Annotation Store Classes

afba277

John-P marked this pull request as draft August 19, 2021 23:00

shaneahmed added the enhancement label Aug 20, 2021

BUG: Fix Indexing In DictionaryStore getitem

0cee4fc

shaneahmed reviewed Aug 20, 2021

tiatoolbox/annotation/storage.py

shaneahmed reviewed Aug 20, 2021

tests/test_annotation_stores.py

shaneahmed reviewed Aug 20, 2021

tiatoolbox/annotation/storage.py

shaneahmed assigned John-P Aug 20, 2021

Merge branch 'develop' into feature-annotation-store

d525fbf

John-P added this to the Release v1.0.0 milestone Aug 25, 2021

John-P added 11 commits August 26, 2021 00:09

ENH: Add Call To Super Init From Subclasses

d0d7a5e

ENH: Avoid Shadowing Variable In Scope

70e0ca0

EHN: Change Dict Generators To Comprehensions

fd05039

ENH: Remove Redundant list()

f5b9f0d

ENH: Make Overridden Method Signature Match

793615d

DOC: Remove Docstring Whitespace

9585f49

MAINT: Remove Unnecessary Literal

ac7e49b

DOC: Add Docstring Blank End Lines

f456ed7

MAINT: Rename Import To Abovoid Shadowing io

c0db627

ENH: Refactor Dict Comprehension

a293670

MAINT: Add Guard To Next In Comprehension

09f8665

shaneahmed requested a review from ghadjigeorghiou August 27, 2021 05:26

shaneahmed reviewed Aug 27, 2021

tests/test_annotation_stores.py

shaneahmed reviewed Aug 27, 2021

tiatoolbox/annotation/storage.py (outdated)