Out of range when extending on a slice of string array imported through FFI #5896

@viirya

Describe the bug

When slicing a string array, we advance the pointer of the offset buffer and leave the data buffer pointer unchanged. When exporting the slice through FFI, we simply export the advanced pointer of the offset buffer.

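When importing the array, we calculate the length of the data buffer as the difference between the last offset and the first offset in the (sliced) offset buffer. This calculated length is not correct: the data buffer pointer was never advanced, so the offsets are still relative to the start of the original data buffer.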

For example, suppose the original string array's data buffer is 346536 bytes, so its last offset is 346536. We take a slice of 8192 strings from it, and the sliced offsets are [147456, ..., 294912]. The calculated length is 294912 - 147456 = 147456, but the actual length of the data buffer is 346536. So the data buffer of the imported array has an incorrect length.
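The arithmetic above can be sketched in plain Rust without the arrow crate. This is only an illustration: the function names are hypothetical, and the 18-byte string width is an assumption chosen so that 8192 strings span exactly 294912 - 147456 bytes; the issue does not state the individual string lengths.

```rust
/// Data-buffer length as the import path described above infers it:
/// last offset minus first offset of the (possibly sliced) offsets buffer.
/// (Hypothetical helper, not an arrow-rs API.)
fn inferred_data_len(offsets: &[i32]) -> usize {
    match (offsets.first(), offsets.last()) {
        (Some(&first), Some(&last)) => (last - first) as usize,
        _ => 0,
    }
}

/// Smallest data-buffer length that keeps every offset in range when the
/// data pointer still points at the start of the original buffer.
/// (Hypothetical helper, not an arrow-rs API.)
fn min_valid_data_len(offsets: &[i32]) -> usize {
    offsets.last().map_or(0, |&last| last as usize)
}

/// Offsets for the slice in the example: 8192 strings starting at offset
/// 147456. ASSUMPTION: every string is 18 bytes long.
fn example_slice_offsets() -> Vec<i32> {
    (0..=8192).map(|i| 147_456 + 18 * i).collect()
}
```

For the example slice, `inferred_data_len` gives 147456, while `min_valid_data_len` gives 294912, so the inferred length cannot cover even the last string of the slice, let alone the full 346536-byte buffer.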

This hasn't caused issues so far because most of the time we access the imported data buffer through pointers (and we don't actually check the range). But in cases where we access the data as a slice (i.e., []), for example when extending the array, it causes a runtime panic like:

---- ffi::tests_from_ffi::test_extend_imported_string_slice stdout ----
thread 'ffi::tests_from_ffi::test_extend_imported_string_slice' panicked at arrow-data/src/transform/variable_size.rs:38:29:
range end index 10890 out of range for slice of length 5500
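The panic above is the ordinary bounds check on slice indexing. As a minimal sketch of the mechanism (a hypothetical helper on plain byte slices, not the code in arrow-data/src/transform/variable_size.rs), the checked accessor below returns `None` in the situation where `&data[start..end]` would panic:

```rust
/// Returns the bytes of value `i`, or `None` if the offsets point outside
/// `data`. Indexing with `&data[start..end]` instead would panic with
/// "range end index ... out of range for slice of length ...".
/// (Hypothetical helper for illustration only.)
fn value_bytes<'a>(data: &'a [u8], offsets: &[i32], i: usize) -> Option<&'a [u8]> {
    let start = usize::try_from(*offsets.get(i)?).ok()?;
    let end = usize::try_from(*offsets.get(i + 1)?).ok()?;
    data.get(start..end)
}
```

With a 30-byte buffer and sliced offsets `[10, 15, 20]`, the import logic would infer a length of 10. Reading value 0 from `&data[..10]` then yields `None`, whereas reading it from the full 30-byte buffer succeeds.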

Note that test_extend_imported_string_slice is a new test I added in apache/arrow#5895.

It also means the exported array possibly cannot be imported and used by other Arrow implementations such as Java Arrow (apache/arrow-java#74), because Java Arrow's buffer implementation, ArrowBuf, may check the index range.

To Reproduce

Expected behavior

Additional context

The bug was found while debugging apache/datafusion-comet#540.

Labels: arrow (Changes to the arrow crate), bug