-
Notifications
You must be signed in to change notification settings - Fork 1k
Define a "arrow-pyrarrow" crate to implement the "pyarrow" feature. #7694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
20ac057 to
c8e4396
Compare
This follows the pattern of other parts of the arrow-rs codebase: arrow-array, arrow-schema, etc. With this change, polyglot codebases can use pyarrow without making all their crates that use arrow pull in pyarrow (& pyo3). Issue: apache#7668.
c8e4396 to
4d04947
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @brunal and @kylebarron -- this looks great to me
I think the benefit of this PR is that other crates can use pyarrow without pulling in all of the |
|
FYI @linhr |
|
Thanks again @brunal and @kylebarron -- I merged this one in to get it into |
|
I hit some error probably related to this PR when making a release candidate: |
…yarrow` and `parquet-variant` (#7745) Note this targets a release branch, not main I have a different proposed fix for `main`: - #7742 I will also make a fix for parquet-variant test failures # Which issue does this PR close? - Related to #7394 - Related to #7736 - Related to #7746 # Rationale for this change `cargo test --all` requires python and pyarrow installed, where it did not in previous versions of arrow due to breaking out `arrow-pyarrow` into its own crate in - #7694 Also the new `parquet-variant` crate tests fail as part of the verification script too -- see ttps://github.com//issues/7746 In order to get a script that can automatically verify a release candidate, let's ignore this new module for now. More details - #7736 - #7742 Note that the arrow-pyarrow tests do run as part of CI and succeed on this branch # What changes are included in this PR? 1. Exclude running the `arrow-pyarrow` and `parquet-variant` tests as part of `verify-release-acndidate` # Testing I verified locally that `./dev/release/verify-release-candidate.sh 55.2.0 1` passes with this script # Are there any user-facing changes? No this is a development process only change
# Which issue does this PR close? - Part of #7394 # Rationale for this change - #7694 adds a new `arrow-pyarrow` crate, so we need to add it to the list of things to release # What changes are included in this PR? # Are these changes tested? no, but they are doc only changes # Are there any user-facing changes? no
…`arrow-pyarrow-testing` crate (#7742) # Which issue does this PR close? - Related to #7394 - Closes #7736 # Rationale for this change At its core, if someone isn't using / modifying the pyarrow integration for arrow-rs they shouldn't have to install / configure python to get the tests working in `arrow-rs` - after the change in #7694 Running `cargo test --workspace` now also runs tests that require python to be setup and the `pyarrow` module to be installed. This is problematic because: 1. Some people may not have that environment setup 2. Apparently you can not use virtualenvs with py03 in Mac due to PyO3/pyo3#1741 The second item was very confusing for me while I tried to debug what going on as I ket getting an error about pyarrow not being installed, even though it was installed in my `venv`: ``` thread 'test_to_pyarrow' panicked at arrow-pyarrow/tests/pyarrow.rs:43:6: called `Result::unwrap()` on an `Err` value: PyErr { type: <class 'ModuleNotFoundError'>, value: ModuleNotFoundError("No module named 'pyarrow'"), traceback: None } ``` # What changes are included in this PR? 1. Move the tests that require pyarrow to be installed into `arrow-pyarrow-testing`, which is not part of the workspace and thus not run with `cargo test --all` 2. Remove `cargo test --exclude arrow-pyarrow` 3. Add documentation on rationale and hints about running the test # Frequently Asked Questions ## Why not add ` --exclude arrow-pyarrow` to `verify_release_candidate.sh`? While the minimal fix would be to add ` --exclude arrow-pyarrow` to verify_release_candidate.sh this requires all users of arrow to remember to add `--exclude arrow-pyarrow` to their tests even if they don't care about python ## Why not in `pyarrow-arrow-integration-testing` ? I did not put this test in `pyarrow-arrow-integration-testing` because that module doesn't compile for me with the stock python install Somehow python needs to be installed with the ability to make dynamic libraries that I haven't figured out and don't really want to. It seems maybe related to https://pyo3.rs/v0.18.1/getting_started#python (thanks to @Xuanwo for the pointer in PyO3/pyo3#2136 / apache/opendal#1675) ``` (venv) root@5e8d0406fabe:/arrow-rs/arrow-pyarrow-integration-testing# cargo test --test pyarrow warning: `/arrow-rs/arrow-pyarrow-integration-testing/.cargo/config` is deprecated in favor of `config.toml` note: if you need to support cargo 1.38 or earlier, you can symlink `config` to `config.toml` Compiling target-lexicon v0.13.2 Compiling flatbuffers v25.2.10 Compiling pyo3-build-config v0.24.2 Compiling arrow-ipc v55.2.0 (/arrow-rs/arrow-ipc) Compiling pyo3-macros-backend v0.24.2 Compiling pyo3-ffi v0.24.2 Compiling pyo3 v0.24.2 Compiling pyo3-macros v0.24.2 Compiling arrow-pyarrow v55.2.0 (/arrow-rs/arrow-pyarrow) Compiling arrow v55.2.0 (/arrow-rs/arrow) Compiling arrow-pyarrow-integration-testing v0.1.0 (/arrow-rs/arrow-pyarrow-integration-testing) error: linking with `cc` failed: exit status: 1 | = note: "cc" "/tmp/rustc0jx15I/symbols.o" "<43 object files omitted>" "-Wl,--as-needed" "-Wl,-Bstatic" "<sysroot>/lib/rustlib/aarch64-unknown-linux-gnu/lib/{libtest-*,libgetopts-*,libunicode_width-*,librustc_std_workspace_std-*}.rlib" "/arrow-rs/arrow-pyarrow-integration-testing/target/debug/deps/{libarrow-7996898a6777f964.rlib,libarrow_row-63508de6e52f4d4d.rlib,libarrow_pyarrow-8b510eeadc952ad2.rlib,libpyo3-c463c3a2243eeab9.rlib,libmemoffset-836dc1ddd866c614.rlib,libpyo3_ffi-fbf18d9f712874be.rlib,libunindent-2b8a456e13fa9700.rlib,libarrow_json-a7b4960a4b4d1cb5.rlib,libsimdutf8-7e080cbee40e41cd.rlib,libserde_json-0288fe0f1ec1bcde.rlib,libindexmap-ebb707a4eec26692.rlib,libequivalent-4762261bc2781d11.rlib,libarrow_ipc-085ebaaded386ff8.rlib,libflatbuffers-1f88fdf138129305.rlib,libarrow_csv-e5d679eef2b85a1b.rlib,libcsv-2288f6dec5308d9c.rlib,libitoa-fa5c9b2503c605f5.rlib,libserde-33ccdec93d601cce.rlib,libcsv_core-122def45831e6a2c.rlib,libarrow_string-17ebd7a5409511da.rlib,libregex-97f4021e65bafbca.rlib,libregex_automata-b62a0db5ace54d45.rlib,libaho_corasick-547ec01718db652c.rlib,libregex_syntax-f3065c7bb7c4592a.rlib,libmemchr-547fa7a4048cbc2e.rlib,libarrow_cast-0b7117723b343c65.rlib,libatoi-c9a52adfe9dd2564.rlib,libryu-243c2c0ae3ed75b4.rlib,libbase64-1cab23258b68443b.rlib,liblexical_core-c2a41d0a6941285f.rlib,liblexical_write_float-9d65854ce5ab8f07.rlib,liblexical_write_integer-894216b914487c18.rlib,liblexical_parse_float-274078b1af50d567.rlib,liblexical_parse_integer-781bcb0a42285559.rlib,liblexical_util-4a71e416d58e0125.rlib,libstatic_assertions-4f12831487497211.rlib,libarrow_arith-00bcbf2ec2eb3322.rlib,libarrow_ord-fcdece9f7e87a9bf.rlib,libarrow_select-31b4c3cfa277427b.rlib,libarrow_array-c2dc23f827508dc6.rlib,libahash-d573f36c088b3179.rlib,libgetrandom-31b11224f8e2ea08.rlib,liblibc-045cef5bc264baa9.rlib,libonce_cell-83f4df333969eacb.rlib,libzerocopy-0db6330db7e4b762.rlib,libhashbrown-2c7527cd2fd4322d.rlib,libchrono-6d1bc7062186f166.rlib,libiana_time_zone-875b10f893e8f81d.rlib,libarrow_data-be090acb3cb83adb.rlib,libarrow_schema-25469a5878e8c886.rlib,libbitflags-2614952e3652d907.rlib,libarrow_buffer-8eb56dc26cbe25a3.rlib,libbytes-8b0f150f04d16150.rlib,libhalf-ed72603b54882276.rlib,libcfg_if-9dbfdc9eaf8f6a2d.rlib,libnum-436acb7880d5b290.rlib,libnum_iter-87f263003ea3e8dd.rlib,libnum_rational-d812b535c653cc6e.rlib,libnum_complex-c12c249f79450951.rlib,libnum_bigint-ff983ebd6646ce72.rlib,libnum_integer-f946a0e48063a631.rlib,libnum_traits-d0a5f363c632fb69.rlib}.rlib" "<sysroot>/lib/rustlib/aarch64-unknown-linux-gnu/lib/{libstd-*,libpanic_unwind-*,libobject-*,libmemchr-*,libaddr2line-*,libgimli-*,librustc_demangle-*,libstd_detect-*,libhashbrown-*,librustc_std_workspace_alloc-*,libminiz_oxide-*,libadler2-*,libunwind-*,libcfg_if-*,liblibc-*,liballoc-*,librustc_std_workspace_core-*,libcore-*,libcompiler_builtins-*}.rlib" "-Wl,-Bdynamic" "-lgcc_s" "-lutil" "-lrt" "-lpthread" "-lm" "-ldl" "-lc" "-L" "/tmp/rustc0jx15I/raw-dylibs" "-Wl,--eh-frame-hdr" "-Wl,-z,noexecstack" "-L" "<sysroot>/lib/rustlib/aarch64-unknown-linux-gnu/lib" "-o" "/arrow-rs/arrow-pyarrow-integration-testing/target/debug/deps/pyarrow-00909cd9e7866a35" "-Wl,--gc-sections" "-pie" "-Wl,-z,relro,-z,now" "-nodefaultlibs" = note: some arguments are omitted. use `--verbose` to show all linker arguments = note: /usr/bin/ld: /arrow-rs/arrow-pyarrow-integration-testing/target/debug/deps/libarrow_pyarrow-8b510eeadc952ad2.rlib(arrow_pyarrow-8b510eeadc952ad2.8xxa5xo5oql7wlj24034o033n.rcgu.o): in function `<pyo3::instance::Borrowed<pyo3::types::tuple::PyTuple> as pyo3::call::PyCallArgs>::call_positional': /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/pyo3-0.24.2/src/call.rs:213: undefined reference to `PyObject_Call' ``` # Are there any user-facing changes? If there are user-facing changes then we may require documentation to be updated before approving the PR. If there are any breaking changes to public APIs, please call them out. --------- Co-authored-by: Copilot <[email protected]>
…yarrow` and `parquet-variant` (apache#7745) Note this targets a release branch, not main I have a different proposed fix for `main`: - apache#7742 I will also make a fix for parquet-variant test failures # Which issue does this PR close? - Related to apache#7394 - Related to apache#7736 - Related to apache#7746 # Rationale for this change `cargo test --all` requires python and pyarrow installed, where it did not in previous versions of arrow due to breaking out `arrow-pyarrow` into its own crate in - apache#7694 Also the new `parquet-variant` crate tests fail as part of the verification script too -- see ttps://github.com/apache/issues/7746 In order to get a script that can automatically verify a release candidate, let's ignore this new module for now. More details - apache#7736 - apache#7742 Note that the arrow-pyarrow tests do run as part of CI and succeed on this branch # What changes are included in this PR? 1. Exclude running the `arrow-pyarrow` and `parquet-variant` tests as part of `verify-release-acndidate` # Testing I verified locally that `./dev/release/verify-release-candidate.sh 55.2.0 1` passes with this script # Are there any user-facing changes? No this is a development process only change
…yarrow` and `parquet-variant` (apache#7745) Note this targets a release branch, not main I have a different proposed fix for `main`: - apache#7742 I will also make a fix for parquet-variant test failures # Which issue does this PR close? - Related to apache#7394 - Related to apache#7736 - Related to apache#7746 # Rationale for this change `cargo test --all` requires python and pyarrow installed, where it did not in previous versions of arrow due to breaking out `arrow-pyarrow` into its own crate in - apache#7694 Also the new `parquet-variant` crate tests fail as part of the verification script too -- see ttps://github.com/apache/issues/7746 In order to get a script that can automatically verify a release candidate, let's ignore this new module for now. More details - apache#7736 - apache#7742 Note that the arrow-pyarrow tests do run as part of CI and succeed on this branch # What changes are included in this PR? 1. Exclude running the `arrow-pyarrow` and `parquet-variant` tests as part of `verify-release-acndidate` # Testing I verified locally that `./dev/release/verify-release-candidate.sh 55.2.0 1` passes with this script # Are there any user-facing changes? No this is a development process only change
…yarrow` and `parquet-variant` (apache#7745) Note this targets a release branch, not main I have a different proposed fix for `main`: - apache#7742 I will also make a fix for parquet-variant test failures # Which issue does this PR close? - Related to apache#7394 - Related to apache#7736 - Related to apache#7746 # Rationale for this change `cargo test --all` requires python and pyarrow installed, where it did not in previous versions of arrow due to breaking out `arrow-pyarrow` into its own crate in - apache#7694 Also the new `parquet-variant` crate tests fail as part of the verification script too -- see ttps://github.com/apache/issues/7746 In order to get a script that can automatically verify a release candidate, let's ignore this new module for now. More details - apache#7736 - apache#7742 Note that the arrow-pyarrow tests do run as part of CI and succeed on this branch # What changes are included in this PR? 1. Exclude running the `arrow-pyarrow` and `parquet-variant` tests as part of `verify-release-acndidate` # Testing I verified locally that `./dev/release/verify-release-candidate.sh 55.2.0 1` passes with this script # Are there any user-facing changes? No this is a development process only change
This follows the pattern of other parts of the arrow-rs codebase: arrow-array, arrow-schema, etc.
With this change, polyglot codebases can use pyarrow without making all their crates that use arrow pull in pyarrow (& pyo3).
It also allows interfacing with PyArrow without pulling in Arrow.
Which issue does this PR close?
Closes #7668.
Rationale for this change
Part of a codebase can use pyarrow without arrow pulling in pyo3 across the codebase.
Are there any user-facing changes?
Nope.