-
Notifications
You must be signed in to change notification settings - Fork 102
ENH: Improve Mask-Based Patch Extraction Speed & Add Units Conversion #217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…enhance-patchextraction
Codecov Report
@@ Coverage Diff @@
## develop #217 +/- ##
===========================================
+ Coverage 99.81% 99.85% +0.03%
===========================================
Files 53 53
Lines 4870 5346 +476
Branches 800 924 +124
===========================================
+ Hits 4861 5338 +477
Misses 2 2
+ Partials 7 6 -1
Continue to review full report at Codecov.
|
tiatoolbox/utils/transforms.py
Outdated
| return np.add(bounds, padding * signs) | ||
|
|
||
|
|
||
| def convert_resolution(input_res, input_unit, baseline_mpp=None, baseline_power=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something doesn't seem right about this, but I need a while to properly go through it. Also, returning conversion to all units seems a bit like doing a lot of unnecessary work if only one conversion is being used.. Surely a conversion from one unit to another would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @John-P , I hope you find what's not right about this sooner :-D
Re returning conversions to all units: first off, most of the units are needed to be calculated during the calculation of one of them so why not keep all of them?
Also, if we allow requesting for a specific output unit, then have added an unnecessary argument to the function whereas we should still write the function like this (calculate all units) and select the desired unit from the dictionary. Otherwise, the function would comprise many if statement to do the right conversion!
if input_unit is such and requested_unit is such:
do this calculation
This part should be repeated 4x4=16 times (imaging writing tests for it)! At least, I don't know a better way to write it.
Not to mention, the whole thing really need a small amount of computation (it tooks 12.2 ns on my laptop). So I guess we don't need to be worry about it.
However, if you insist on returning only one desired unit, I'm happy to add another line of code the end of the function, extracting the desired unit from the output dictionary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I notice that the output in terms of levels is not right. Levels do not always decrease by half each time. Some TIFF have a downsample of 4x (and it can be different between levels in the same image). To do this correctly, you'd need the information in WSIMeta.
…enhance-patchextraction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addess changes before merging. Resolution thingy needed to be dealed with by WSIMetadata. Also reading mask should be done as a different mode for coord_space (receiving coordinates at WSI target resolution but actual read done only at thumb resolution w/o any rescaling). These will need to be done via another PR given the urgency of finishing the paper.
tiatoolbox/utils/transforms.py
Outdated
| "level": output_level, | ||
| "baseline": unit_baseline_scale, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return a warning to stricly denote these values can totally be garbage. Or set to None to straight out crashing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default values for the parameters have been set to None. User/developer can check the values when they call the function according to their application.
tiatoolbox/utils/transforms.py
Outdated
| return np.add(bounds, padding * signs) | ||
|
|
||
|
|
||
| def convert_resolution(input_res, input_unit, baseline_mpp=None, baseline_power=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
baseline => target to make it more clear, also use dictionary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please refer to my response to John above.
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
…enhance-patchextraction
b17bcf8 to
c4f3471
Compare
|
Thanks @vqdang for the review. Re:
I believe this function can be called in a WSIMeta method, where there is access to WSI metadata and we can control what data should be fed into the Re:
I agree with you and it will be good to have such an option in future. However, for that we need to consider scaling the |
…enhance-patchextraction
In the current implementation, the
filter_coordinatesfunction is used to filter out generated coordinates based on the input mask. This is done by reading the mask region frommask_readerobject (which is a VirtualWSI) at the same resolution and region requested in the patch extraction. Therefore, this process can take very long specially if the whole slide mask region is big and user requests to extract small patches from highest resolution (not to mention the increase in time due to overlap in patch extraction). For example, if you want extract 256x256 patches at 40x from the below WSI, it might take up to 10 minutes for checking filter the coordinatesIn this PR, I have introduced
filter_coordinates_fastwhich uses the overview ofmask_readerobject (a numpy array) at a fixed-low resolution (mpp=8 or power=1.25) to speed of the region mask reading process and hence improve the speed of coordinate filtering and whole patch extraction processes. For example, the above image takes about 0.5 seconds (at least 1,000x improvement in speed in comparison to previous method).To be able to do this, I also introduce a method to
WSIReaderclass that converts resolution from one unit to all other units, because thefilter_coordinates_fastexpects its input mask resolution to be inmpp,power, orbaselineunits.