
tiff_to_zarr for geotiff with compression: zarr reads strange values #317

@croth1

Description

imagecodecs               2023.1.23          py38h5408aff_0    conda-forge
zarr                      2.13.6               pyhd8ed1ab_0    conda-forge
kerchunk                  0.1.0+32.g39658f8          pypi_0    pypi

When I convert GeoTIFFs written by rasterio with tiff_to_zarr, it only seems to work for uncompressed files. As soon as I enable compression, the values read back through zarr are wrong:

import warnings
from pathlib import Path

import numpy as np
import rasterio as rio
import zarr
from kerchunk.tiff import tiff_to_zarr

warnings.simplefilter("ignore")

tiff_file = Path("data.tif").absolute()

x, y, z = 512, 512, 5


opts = dict(tiled=True, blockxsize=256, blockysize=256, dtype=np.uint32)

compressions = ["NONE", "ZSTD", "LZW", "DEFLATE", "LERC"]
for compression in compressions:
    with rio.open(
        tiff_file,
        "w",
        driver="GTiff",
        width=x,
        height=y,
        count=z,
        compress=compression,
        **opts,
    ) as out:
        datablock = np.arange(x * y * z, dtype=np.uint32).reshape(z, y, x)  # (bands, rows, cols)
        out.write(datablock)

    # sanity check - reading the full block back into memory
    with rio.open(tiff_file) as tiff_in:
        block2 = tiff_in.read()
        np.testing.assert_array_equal(block2, datablock)

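    arr = None  # reset so a failure below never reports the previous iteration's array
    try:
        zarr_file = Path("data_zarr.json").absolute()
        tiff_to_zarr(
            f"file://{tiff_file}",
            target=f"file://{zarr_file}",
            target_options=dict(mode="w"),
        )

        arr = zarr.open("reference://", storage_options=dict(fo=str(zarr_file)))
        # arr is (rows, cols, bands); datablock is (bands, rows, cols)
        np.testing.assert_equal(arr[42, 42, 4], datablock[4, 42, 42])
        print(f"{compression}: success.")

    except Exception as e:
        # getattr avoids a NameError/stale value if tiff_to_zarr itself raised
        print(f"{compression}: failed ({getattr(arr, 'compressor', None)})- ({e}).")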
NONE: success.
ZSTD: failed (None)- (
Items are not equal:
 ACTUAL: 466472691
 DESIRED: 1070122).
LZW: failed (None)- (
Items are not equal:
 ACTUAL: 282600147
 DESIRED: 1070122).
DEFLATE: failed (None)- (
Items are not equal:
 ACTUAL: 1692767126
 DESIRED: 1070122).
LERC: failed (None)- (
Items are not equal:
 ACTUAL: 1075052576
 DESIRED: 1070122).
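As a quick check on the numbers above: datablock is a C-ordered np.arange reshaped to (bands, rows, cols) = (5, 512, 512), so every element equals its own flat index. This plain NumPy sketch (independent of kerchunk and zarr) recomputes the DESIRED value and confirms that each ACTUAL value lies outside the range the array can contain.

```python
import numpy as np

bands, rows, cols = 5, 512, 512

# datablock = np.arange(...).reshape(bands, rows, cols), so datablock[b, r, c]
# is simply the C-order flat index of (b, r, c).
expected = int(np.ravel_multi_index((4, 42, 42), (bands, rows, cols)))
print(expected)  # 4*512*512 + 42*512 + 42 → 1070122

# arange stops one short of the element count, so this is the largest value.
max_value = bands * rows * cols - 1
print(max_value)  # → 1310719

# ACTUAL values reported for the compressed variants, copied from the output above.
actual = {"ZSTD": 466472691, "LZW": 282600147, "DEFLATE": 1692767126, "LERC": 1075052576}
out_of_range = {name: value > max_value for name, value in actual.items()}
print(out_of_range)  # every entry is True
```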

The values read back are far larger than anything datablock can contain: its largest value is 512*512*5 - 1 = 1310719. Also, for some reason the opened array has no compressor set (the "(None)" in each failure message is arr.compressor), although the reference file generated by kerchunk does specify one. Any ideas what I am doing wrong? Excerpt of data_zarr.json after the last (LERC) run:

".zarray": "{\n \"chunks\": [\n  256,\n  256,\n  5\n ],\n \"compressor\": {\n  \"id\": \"imagecodecs_lerc\"\n },\n \"dtype\": \"<u4\",\n \"fill_value\": 0,\n \"filters\": null,\n \"order\": \"C\",\n \"shape\": [\n  512,\n  512,\n  5\n ],\n \"zarr_format\": 2\n}",
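In a kerchunk reference set the ".zarray" entry is itself a JSON document stored as a string, which is why it appears escaped above. The sketch below, using only the standard library, decodes it and works out which chunk holds the element that was checked (arr[42, 42, 4]) and where that element sits inside the decompressed chunk. The chunk-key format assumes zarr v2's default "." dimension separator, and the helper names chunk_key and offset_in_chunk are hypothetical, written just for this illustration. Because zarr takes its compressor from ".zarray", a decoded compressor of imagecodecs_lerc next to arr.compressor being None suggests the array was opened from a different (for example an earlier, uncompressed) reference set than the one on disk. One possible, unconfirmed cause in a loop like the one above is fsspec reusing a cached filesystem instance created with identical arguments; adding skip_instance_cache=True to storage_options would rule that out.

```python
import json

# The decoded ".zarray" string from the kerchunk output above.
zarray_text = (
    '{\n "chunks": [\n  256,\n  256,\n  5\n ],\n "compressor": {\n  "id": "imagecodecs_lerc"\n },\n'
    ' "dtype": "<u4",\n "fill_value": 0,\n "filters": null,\n "order": "C",\n'
    ' "shape": [\n  512,\n  512,\n  5\n ],\n "zarr_format": 2\n}'
)
meta = json.loads(zarray_text)
print(meta["compressor"]["id"])  # → imagecodecs_lerc


def chunk_key(index, chunks, sep="."):
    """Hypothetical helper: zarr v2 chunk key for an element index (default '.' separator)."""
    return sep.join(str(i // c) for i, c in zip(index, chunks))


def offset_in_chunk(index, chunks):
    """Hypothetical helper: C-order element offset of `index` inside its decompressed chunk."""
    offset = 0
    for i, size in zip(index, chunks):
        offset = offset * size + (i % size)
    return offset


element = (42, 42, 4)  # arr[42, 42, 4] from the reproducer
print(chunk_key(element, meta["chunks"]))  # → 0.0.0
print(offset_in_chunk(element, meta["chunks"]))  # → 53974 (elements; x4 bytes for "<u4")
```

If the chunk bytes were read without running them through the codec, the four bytes at that offset would come from the compressed stream rather than the pixel data, which would be consistent with the out-of-range values in the output.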
