Skip to content

Conversation

@brunal
Copy link
Contributor

@brunal brunal commented Jun 17, 2025

This follows the pattern of other parts of the arrow-rs codebase: arrow-array, arrow-schema, etc.

With this change, polyglot codebases can use pyarrow without making all their crates that use arrow pull in pyarrow (& pyo3).

It also allows interfacing with PyArrow without pulling in Arrow.

Which issue does this PR close?

Closes #7668.

Rationale for this change

Part of a codebase can use pyarrow without arrow pulling in pyo3 across the codebase.

Are there any user-facing changes?

Nope.

@brunal brunal marked this pull request as draft June 17, 2025 19:27
@github-actions github-actions bot added the arrow Changes to the arrow crate label Jun 17, 2025
@brunal brunal force-pushed the standalone-pyarrow branch 2 times, most recently from 20ac057 to c8e4396 Compare June 18, 2025 19:44
This follows the pattern of other parts of the arrow-rs codebase:
arrow-array, arrow-schema, etc.

With this change, polyglot codebases can use pyarrow without making all
their crates that use arrow pull in pyarrow (& pyo3).

Issue: apache#7668.
@brunal brunal force-pushed the standalone-pyarrow branch from c8e4396 to 4d04947 Compare June 18, 2025 20:04
@brunal brunal marked this pull request as ready for review June 18, 2025 20:07
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @brunal and @kylebarron -- this looks great to me

@alamb
Copy link
Contributor

alamb commented Jun 20, 2025

With this change, polyglot codebases can use pyarrow without making all their crates that use arrow pull in pyarrow (& pyo3).

I think the benefit of this PR is that other crates can use pyarrow without pulling in all of the arrow crate, only the parts they need for pyarrow

@alamb
Copy link
Contributor

alamb commented Jun 20, 2025

FYI @linhr

@alamb alamb merged commit 469c7ee into apache:main Jun 20, 2025
33 checks passed
@alamb
Copy link
Contributor

alamb commented Jun 20, 2025

Thanks again @brunal and @kylebarron -- I merged this one in to get it into

@alamb
Copy link
Contributor

alamb commented Jun 22, 2025

I hit some error probably related to this PR when making a release candidate:

alamb added a commit that referenced this pull request Jun 23, 2025
…yarrow` and `parquet-variant` (#7745)

Note this targets a release branch, not main

I have a different proposed fix for `main`:
- #7742

I will also make a fix for parquet-variant test failures

# Which issue does this PR close?

- Related to #7394
- Related to #7736
- Related to #7746

# Rationale for this change

`cargo test --all` requires python and pyarrow installed, where it did
not in previous versions of arrow due to breaking out `arrow-pyarrow`
into its own crate in
- #7694

Also the new `parquet-variant` crate tests fail as part of the
verification script too -- see
ttps://github.com//issues/7746


In order to get a script that can automatically verify a release
candidate, let's ignore this new module for now.

More details 
- #7736 
- #7742

Note that the arrow-pyarrow tests do run as part of CI and succeed on
this branch


# What changes are included in this PR?

1. Exclude running the `arrow-pyarrow` and `parquet-variant` tests as
part of `verify-release-acndidate`


# Testing
I verified locally that `./dev/release/verify-release-candidate.sh
55.2.0 1` passes with this script

# Are there any user-facing changes?
No this is a development process only change
alamb added a commit that referenced this pull request Jun 27, 2025
# Which issue does this PR close?

- Part of #7394

# Rationale for this change

- #7694 adds a new
`arrow-pyarrow` crate, so we need to add it to the list of things to
release

# What changes are included in this PR?



# Are these changes tested?
no, but they are doc only changes

# Are there any user-facing changes?

no
alamb added a commit that referenced this pull request Jul 7, 2025
…`arrow-pyarrow-testing` crate (#7742)

# Which issue does this PR close?

- Related to #7394
- Closes #7736

# Rationale for this change

At its core, if someone isn't using / modifying the pyarrow integration
for arrow-rs they shouldn't have to install / configure python to get
the tests working in `arrow-rs`

- after the change in #7694

Running `cargo test --workspace` now also runs tests that require python
to be setup and the `pyarrow` module to be installed. This is
problematic because:
1. Some people may not have that environment setup
2. Apparently you can not use virtualenvs with py03 in Mac due to
PyO3/pyo3#1741

The second item was very confusing for me while I tried to debug what
going on as I ket getting an error about pyarrow not being installed,
even though it was installed in my `venv`:

```
thread 'test_to_pyarrow' panicked at arrow-pyarrow/tests/pyarrow.rs:43:6:
called `Result::unwrap()` on an `Err` value: PyErr { type: <class 'ModuleNotFoundError'>, value: ModuleNotFoundError("No module named 'pyarrow'"), traceback: None }
```

# What changes are included in this PR?

1. Move the tests that require pyarrow to be installed into
`arrow-pyarrow-testing`, which is not part of the workspace and thus not
run with `cargo test --all`
2. Remove `cargo test --exclude arrow-pyarrow`
3. Add documentation on rationale and hints about running the test

# Frequently Asked Questions

## Why not add ` --exclude arrow-pyarrow` to
`verify_release_candidate.sh`?

While the minimal fix would be to add ` --exclude arrow-pyarrow` to
verify_release_candidate.sh this requires all users of arrow to remember
to add `--exclude arrow-pyarrow` to their tests even if they don't care
about python

## Why not in `pyarrow-arrow-integration-testing` ?
I did not put this test in `pyarrow-arrow-integration-testing` because
that module doesn't compile for me with the stock python install

Somehow python needs to be installed with the ability to make dynamic
libraries that I haven't figured out and don't really want to. It seems
maybe related to https://pyo3.rs/v0.18.1/getting_started#python (thanks
to @Xuanwo for the pointer in PyO3/pyo3#2136 /
apache/opendal#1675)


```
(venv) root@5e8d0406fabe:/arrow-rs/arrow-pyarrow-integration-testing# cargo test --test pyarrow
warning: `/arrow-rs/arrow-pyarrow-integration-testing/.cargo/config` is deprecated in favor of `config.toml`
note: if you need to support cargo 1.38 or earlier, you can symlink `config` to `config.toml`
   Compiling target-lexicon v0.13.2
   Compiling flatbuffers v25.2.10
   Compiling pyo3-build-config v0.24.2
   Compiling arrow-ipc v55.2.0 (/arrow-rs/arrow-ipc)
   Compiling pyo3-macros-backend v0.24.2
   Compiling pyo3-ffi v0.24.2
   Compiling pyo3 v0.24.2
   Compiling pyo3-macros v0.24.2
   Compiling arrow-pyarrow v55.2.0 (/arrow-rs/arrow-pyarrow)
   Compiling arrow v55.2.0 (/arrow-rs/arrow)
   Compiling arrow-pyarrow-integration-testing v0.1.0 (/arrow-rs/arrow-pyarrow-integration-testing)
error: linking with `cc` failed: exit status: 1
  |
  = note:  "cc" "/tmp/rustc0jx15I/symbols.o" "<43 object files omitted>" "-Wl,--as-needed" "-Wl,-Bstatic" "<sysroot>/lib/rustlib/aarch64-unknown-linux-gnu/lib/{libtest-*,libgetopts-*,libunicode_width-*,librustc_std_workspace_std-*}.rlib" "/arrow-rs/arrow-pyarrow-integration-testing/target/debug/deps/{libarrow-7996898a6777f964.rlib,libarrow_row-63508de6e52f4d4d.rlib,libarrow_pyarrow-8b510eeadc952ad2.rlib,libpyo3-c463c3a2243eeab9.rlib,libmemoffset-836dc1ddd866c614.rlib,libpyo3_ffi-fbf18d9f712874be.rlib,libunindent-2b8a456e13fa9700.rlib,libarrow_json-a7b4960a4b4d1cb5.rlib,libsimdutf8-7e080cbee40e41cd.rlib,libserde_json-0288fe0f1ec1bcde.rlib,libindexmap-ebb707a4eec26692.rlib,libequivalent-4762261bc2781d11.rlib,libarrow_ipc-085ebaaded386ff8.rlib,libflatbuffers-1f88fdf138129305.rlib,libarrow_csv-e5d679eef2b85a1b.rlib,libcsv-2288f6dec5308d9c.rlib,libitoa-fa5c9b2503c605f5.rlib,libserde-33ccdec93d601cce.rlib,libcsv_core-122def45831e6a2c.rlib,libarrow_string-17ebd7a5409511da.rlib,libregex-97f4021e65bafbca.rlib,libregex_automata-b62a0db5ace54d45.rlib,libaho_corasick-547ec01718db652c.rlib,libregex_syntax-f3065c7bb7c4592a.rlib,libmemchr-547fa7a4048cbc2e.rlib,libarrow_cast-0b7117723b343c65.rlib,libatoi-c9a52adfe9dd2564.rlib,libryu-243c2c0ae3ed75b4.rlib,libbase64-1cab23258b68443b.rlib,liblexical_core-c2a41d0a6941285f.rlib,liblexical_write_float-9d65854ce5ab8f07.rlib,liblexical_write_integer-894216b914487c18.rlib,liblexical_parse_float-274078b1af50d567.rlib,liblexical_parse_integer-781bcb0a42285559.rlib,liblexical_util-4a71e416d58e0125.rlib,libstatic_assertions-4f12831487497211.rlib,libarrow_arith-00bcbf2ec2eb3322.rlib,libarrow_ord-fcdece9f7e87a9bf.rlib,libarrow_select-31b4c3cfa277427b.rlib,libarrow_array-c2dc23f827508dc6.rlib,libahash-d573f36c088b3179.rlib,libgetrandom-31b11224f8e2ea08.rlib,liblibc-045cef5bc264baa9.rlib,libonce_cell-83f4df333969eacb.rlib,libzerocopy-0db6330db7e4b762.rlib,libhashbrown-2c7527cd2fd4322d.rlib,libchrono-6d1bc7062186f166.rlib,libiana_time_zone-875b10f893e8f81d.rlib,libarrow_data-be090acb3cb83adb.rlib,libarrow_schema-25469a5878e8c886.rlib,libbitflags-2614952e3652d907.rlib,libarrow_buffer-8eb56dc26cbe25a3.rlib,libbytes-8b0f150f04d16150.rlib,libhalf-ed72603b54882276.rlib,libcfg_if-9dbfdc9eaf8f6a2d.rlib,libnum-436acb7880d5b290.rlib,libnum_iter-87f263003ea3e8dd.rlib,libnum_rational-d812b535c653cc6e.rlib,libnum_complex-c12c249f79450951.rlib,libnum_bigint-ff983ebd6646ce72.rlib,libnum_integer-f946a0e48063a631.rlib,libnum_traits-d0a5f363c632fb69.rlib}.rlib" "<sysroot>/lib/rustlib/aarch64-unknown-linux-gnu/lib/{libstd-*,libpanic_unwind-*,libobject-*,libmemchr-*,libaddr2line-*,libgimli-*,librustc_demangle-*,libstd_detect-*,libhashbrown-*,librustc_std_workspace_alloc-*,libminiz_oxide-*,libadler2-*,libunwind-*,libcfg_if-*,liblibc-*,liballoc-*,librustc_std_workspace_core-*,libcore-*,libcompiler_builtins-*}.rlib" "-Wl,-Bdynamic" "-lgcc_s" "-lutil" "-lrt" "-lpthread" "-lm" "-ldl" "-lc" "-L" "/tmp/rustc0jx15I/raw-dylibs" "-Wl,--eh-frame-hdr" "-Wl,-z,noexecstack" "-L" "<sysroot>/lib/rustlib/aarch64-unknown-linux-gnu/lib" "-o" "/arrow-rs/arrow-pyarrow-integration-testing/target/debug/deps/pyarrow-00909cd9e7866a35" "-Wl,--gc-sections" "-pie" "-Wl,-z,relro,-z,now" "-nodefaultlibs"
  = note: some arguments are omitted. use `--verbose` to show all linker arguments
  = note: /usr/bin/ld: /arrow-rs/arrow-pyarrow-integration-testing/target/debug/deps/libarrow_pyarrow-8b510eeadc952ad2.rlib(arrow_pyarrow-8b510eeadc952ad2.8xxa5xo5oql7wlj24034o033n.rcgu.o): in function `<pyo3::instance::Borrowed<pyo3::types::tuple::PyTuple> as pyo3::call::PyCallArgs>::call_positional':
          /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/pyo3-0.24.2/src/call.rs:213: undefined reference to `PyObject_Call'
```



# Are there any user-facing changes?

If there are user-facing changes then we may require documentation to be
updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.

---------

Co-authored-by: Copilot <[email protected]>
berkaysynnada pushed a commit to synnada-ai/arrow-rs that referenced this pull request Jul 17, 2025
…yarrow` and `parquet-variant` (apache#7745)

Note this targets a release branch, not main

I have a different proposed fix for `main`:
- apache#7742

I will also make a fix for parquet-variant test failures

# Which issue does this PR close?

- Related to apache#7394
- Related to apache#7736
- Related to apache#7746

# Rationale for this change

`cargo test --all` requires python and pyarrow installed, where it did
not in previous versions of arrow due to breaking out `arrow-pyarrow`
into its own crate in
- apache#7694

Also the new `parquet-variant` crate tests fail as part of the
verification script too -- see
ttps://github.com/apache/issues/7746


In order to get a script that can automatically verify a release
candidate, let's ignore this new module for now.

More details 
- apache#7736 
- apache#7742

Note that the arrow-pyarrow tests do run as part of CI and succeed on
this branch


# What changes are included in this PR?

1. Exclude running the `arrow-pyarrow` and `parquet-variant` tests as
part of `verify-release-acndidate`


# Testing
I verified locally that `./dev/release/verify-release-candidate.sh
55.2.0 1` passes with this script

# Are there any user-facing changes?
No this is a development process only change
berkaysynnada pushed a commit to synnada-ai/arrow-rs that referenced this pull request Aug 26, 2025
…yarrow` and `parquet-variant` (apache#7745)

Note this targets a release branch, not main

I have a different proposed fix for `main`:
- apache#7742

I will also make a fix for parquet-variant test failures

# Which issue does this PR close?

- Related to apache#7394
- Related to apache#7736
- Related to apache#7746

# Rationale for this change

`cargo test --all` requires python and pyarrow installed, where it did
not in previous versions of arrow due to breaking out `arrow-pyarrow`
into its own crate in
- apache#7694

Also the new `parquet-variant` crate tests fail as part of the
verification script too -- see
ttps://github.com/apache/issues/7746


In order to get a script that can automatically verify a release
candidate, let's ignore this new module for now.

More details 
- apache#7736 
- apache#7742

Note that the arrow-pyarrow tests do run as part of CI and succeed on
this branch


# What changes are included in this PR?

1. Exclude running the `arrow-pyarrow` and `parquet-variant` tests as
part of `verify-release-acndidate`


# Testing
I verified locally that `./dev/release/verify-release-candidate.sh
55.2.0 1` passes with this script

# Are there any user-facing changes?
No this is a development process only change
berkaysynnada pushed a commit to synnada-ai/arrow-rs that referenced this pull request Aug 26, 2025
…yarrow` and `parquet-variant` (apache#7745)

Note this targets a release branch, not main

I have a different proposed fix for `main`:
- apache#7742

I will also make a fix for parquet-variant test failures

# Which issue does this PR close?

- Related to apache#7394
- Related to apache#7736
- Related to apache#7746

# Rationale for this change

`cargo test --all` requires python and pyarrow installed, where it did
not in previous versions of arrow due to breaking out `arrow-pyarrow`
into its own crate in
- apache#7694

Also the new `parquet-variant` crate tests fail as part of the
verification script too -- see
ttps://github.com/apache/issues/7746


In order to get a script that can automatically verify a release
candidate, let's ignore this new module for now.

More details 
- apache#7736 
- apache#7742

Note that the arrow-pyarrow tests do run as part of CI and succeed on
this branch


# What changes are included in this PR?

1. Exclude running the `arrow-pyarrow` and `parquet-variant` tests as
part of `verify-release-acndidate`


# Testing
I verified locally that `./dev/release/verify-release-candidate.sh
55.2.0 1` passes with this script

# Are there any user-facing changes?
No this is a development process only change
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Deliver pyarrow as a standalone crate

3 participants