
Optionally disable data validation for arrow-ipc #6933

@totoroyyb

Which part is this question about
Regarding the library API usage.

Describe your question
I am using the high-level API (FileReader and FileDecoder) to read IPC files via mmap. I have noticed that the validate_data() call in the array-building process (here) adds significant overhead.

I am targeting an ultra-low-latency scenario. With validate_data() enabled, reading a 2.2 GB IPC file (via mmap) takes 290 ms. Without it, which I tested locally by commenting the call out, the same read takes 3.8 ms. That 3.8 ms latency is nearly identical to what I measured with the C++ Arrow implementation, and I suspect the C++ codebase does not perform this sanity check (though I am not entirely sure).

The functions for "unchecked" building are here in the codebase, but they are not reachable from the high-level API, so I cannot easily disable validation without constructing my own arrays and everything built on top of them.
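To make the trade-off concrete, here is a minimal, std-only sketch of the checked versus unchecked construction pattern. Everything in it (`StringColumn`, `ColumnError`, `try_new`, `new_unchecked`) is a hypothetical toy for illustration and is not the arrow-rs API. It only assumes a string-like layout of an offsets buffer plus a values buffer, where validation has to scan every byte while the unchecked path just stores the two slices.

```rust
use std::str;

/// Hypothetical toy column (NOT an arrow-rs type): string values stored as
/// one contiguous byte buffer plus an offsets buffer.
pub struct StringColumn<'a> {
    offsets: &'a [usize],
    values: &'a [u8],
}

#[derive(Debug, PartialEq)]
pub enum ColumnError {
    EmptyOffsets,
    OffsetsNotMonotonic(usize),
    OffsetOutOfBounds(usize),
    InvalidUtf8(usize),
}

impl<'a> StringColumn<'a> {
    /// Checked constructor: walks every offset pair and every byte, so its
    /// cost grows with the size of the data.
    pub fn try_new(offsets: &'a [usize], values: &'a [u8]) -> Result<Self, ColumnError> {
        if offsets.is_empty() {
            return Err(ColumnError::EmptyOffsets);
        }
        for (i, w) in offsets.windows(2).enumerate() {
            let (start, end) = (w[0], w[1]);
            if start > end {
                return Err(ColumnError::OffsetsNotMonotonic(i));
            }
            if end > values.len() {
                return Err(ColumnError::OffsetOutOfBounds(i));
            }
            if str::from_utf8(&values[start..end]).is_err() {
                return Err(ColumnError::InvalidUtf8(i));
            }
        }
        Ok(Self { offsets, values })
    }

    /// Unchecked constructor: constant time, no scan of the buffers.
    ///
    /// # Safety
    /// The caller must guarantee that every slot described by `offsets`
    /// is valid UTF-8 inside `values`.
    pub unsafe fn new_unchecked(offsets: &'a [usize], values: &'a [u8]) -> Self {
        Self { offsets, values }
    }

    /// Number of values in the column.
    pub fn len(&self) -> usize {
        self.offsets.len().saturating_sub(1)
    }

    /// Returns value `i`; panics if `i` or its offsets are out of range.
    pub fn value(&self, i: usize) -> &'a str {
        let bytes = &self.values[self.offsets[i]..self.offsets[i + 1]];
        // SAFETY: UTF-8 validity was either checked in `try_new` or vouched
        // for by the caller of `new_unchecked`.
        unsafe { str::from_utf8_unchecked(bytes) }
    }
}

fn main() {
    let values = b"helloarrow";
    let offsets = [0, 5, 10];

    // Validated path: safe for untrusted input, pays for a full scan.
    let checked = StringColumn::try_new(&offsets, values).expect("valid column");

    // Trusted path: skips the scan; only sound if the buffers are known good.
    let trusted = unsafe { StringColumn::new_unchecked(&offsets, values) };

    assert_eq!(checked.value(1), trusted.value(1));
    println!("{} rows, second = {}", trusted.len(), trusted.value(1));
}
```

What I am asking for is essentially an opt-in equivalent of the `unsafe` path at the reader level, so that callers who trust their own files can skip the scan while validation stays the default.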

Is there a better way to achieve this?

Additional context
Low latency is critical in my case, so I am trying to avoid any additional overhead (perhaps with the C++ codebase as the baseline).


Labels: question
