@notfilippo (Contributor) commented Oct 8, 2024

Update the logical-types branch to reflect changes in main (especially the CI fix).

cc @jayzhan211 @alamb

OussamaSaoudi and others added 30 commits October 1, 2024 18:29
* Make  support schemas

* Set default name to table

* Remove print statements and stale comment

* Add tests for create table

* Fix typo

* Update datafusion/sql/src/statement.rs

Co-authored-by: Jonah Gao <[email protected]>

* convert create_external_table to objectname

* Add sqllogic tests

* Fix failing tests

---------

Co-authored-by: Jonah Gao <[email protected]>
* Fix Regex signature types

* Uncomment the shared tests in string_query.slt.part and removed tests copies everywhere else

* Test `LIKE` and `MATCH` with flags; Remove new tests from regexp.slt
Remove conditions where unnecessary.
Refactor to improve readability.
* Remove aggregate functions dependency on frontend

DataFusion is a SQL query engine and also a reusable library for
building query engines. The core functionality should not depend on
frontend related functionalities like `sqlparser` or `datafusion-sql`.

* Remove duplicate license header
* rm clone

Signed-off-by: jayzhan211 <[email protected]>

* fmt

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
… constants (apache#12702)

* Refactor tests for union sorting properties

* update doc test

* Undo import reordering

* remove unnecessary static lifetimes
…nerics (apache#12703)

* Reduce code duplication in `PrimitiveGroupValueBuilder` with const generics

* Fix docs
* Disallow duplicated qualified field names

* Fix tests
…ster) (apache#12675)

* add bench

* replace macro with generic function

* remove duplicated code

* optimize base64/hex decode
)

* support to query partitioned table for dynamic file catalog

* cargo clippy

* split partitions inferring to another function
…odes (apache#12685)

* Update trait `UserDefinedLogicalNodeCore`

Signed-off-by: Austin Liu <[email protected]>

* Update corresponding interface

Signed-off-by: Austin Liu <[email protected]>

Add rewrite rule for `push-down-limit` for `Extension`

Signed-off-by: Austin Liu <[email protected]>

* Add rewrite rule for `push-down-limit` for `Extension` and tests

Signed-off-by: Austin Liu <[email protected]>

* Update corresponding interface

Signed-off-by: Austin Liu <[email protected]>

* Reorganize to match guard

Signed-off-by: Austin Liu <[email protected]>

* Clean up

Signed-off-by: Austin Liu <[email protected]>

Clean up

Signed-off-by: Austin Liu <[email protected]>

---------

Signed-off-by: Austin Liu <[email protected]>
…12686)

* Add union resolving for nested struct arrays

* Add test

* Change test

* Reproduce index error

* fmt

---------

Co-authored-by: Andrew Lamb <[email protected]>
…pache#12693)

* Adds macro for udwf singleton

* Adds a doc comment parameter to macro

* Add doc comment for `create_udwf` macro

* Uses default constructor

* Update `Cargo.lock` in `datafusion-cli`

* Fixes: expand `$FN_NAME` in doc strings

* Adds example for macro usage

* Renames macro

* Improve doc comments

* Rename udwf macro

* Minor: doc copy edits

* Adds macro for creating fluent-style expression API

* Adds support for 1 or more parameters in expression function

* Rewrite doc comments

* Rename parameters

* Minor: formatting

* Adds doc comment for `create_udwf_expr` macro

* Improve example docs

* Hides extraneous code in doc comments

* Add a one-line readme

* Adds doc test assertions + minor formatting fixes

* Adds common macro for defining user-defined window functions

* Adds doc comment for `define_udwf_and_expr`

* Defines `RowNumber` using common macro

* Add usage example for common macro

* Adds usage for custom constructor

* Add examples for remaining patterns

* Improve doc comments for usage examples

* Rewrite inner line docs

* Rewrite `create_udwf_expr!` doc comments

* Minor doc improvements

* Fix doc test and usage example

* Add inline comments for macro patterns

* Minor: change doc comment in example
…pache#12705)

* Support unparsing plans with both Aggregation and Window functions (apache#35)

* Fix unparsing for aggregation grouping sets

* Add test for grouping set unparsing

* Update datafusion/sql/src/unparser/utils.rs

Co-authored-by: Jax Liu <[email protected]>

* Update datafusion/sql/src/unparser/utils.rs

Co-authored-by: Jax Liu <[email protected]>

* Update

* More tests

---------

Co-authored-by: Jax Liu <[email protected]>
In 1b3608d `strpos` signature was
modified to indicate it supports dictionary as input argument, but the
invoke method doesn't support them.
…provide an "out of the box" query engine (apache#12666)

* Update DataFusion introduction to show that DataFusion offers packaged versions for end users

* change order

* Update README.md

Co-authored-by: Andrew Lamb <[email protected]>

* refine wording and update user guide for consistency

* prettier

---------

Co-authored-by: Andrew Lamb <[email protected]>
…on (apache#12668)

* Initial work on apache#12432 to allow for generation of udf docs from embedded documentation in the code

* Add missing license header.

* Fixed examples.

* Fixing a really weird RustRover/wsl ... something. No clue what happened there.

* permission change

* Cargo fmt update.

* Refactored Documentation to allow it to be used in a const.

* Add documentation for syntax_example

* Refactoring Documentation based on PR feedback.

* Cargo fmt update.

* Doc update

* Fixed copy/paste error.

* Minor text updates.

---------

Co-authored-by: Andrew Lamb <[email protected]>
* imdb dataset

* cargo fmt

* Add 113 queries for IMDB(JOB)

Signed-off-by: Austin Liu <[email protected]>

* Add `get_query_sql` from `query_id` string

Signed-off-by: Austin Liu <[email protected]>

* Fix CSV reader & Remove Parquet partition

Signed-off-by: Austin Liu <[email protected]>

* Add benchmark IMDB runner

Signed-off-by: Austin Liu <[email protected]>

* Add `run_imdb` script

Signed-off-by: Austin Liu <[email protected]>

* Add checker for imdb option

Signed-off-by: Austin Liu <[email protected]>

* Add SLT for IMDB

Signed-off-by: Austin Liu <[email protected]>

* Fix `get_query_sql()` for CI roundtrip test

Signed-off-by: Austin Liu <[email protected]>

Fix `get_query_sql()` for CI roundtrip test

Signed-off-by: Austin Liu <[email protected]>

Fix `get_query_sql()` for CI roundtrip test

Signed-off-by: Austin Liu <[email protected]>

* Clean up

Signed-off-by: Austin Liu <[email protected]>

* Add missing license

Signed-off-by: Austin Liu <[email protected]>

* Add IMDB(JOB) queries `2b` to `5c`

Signed-off-by: Austin Liu <[email protected]>

* Add `INCLUDE_IMDB` in CI verify-benchmark-results

Signed-off-by: Austin Liu <[email protected]>

* Prepare IMDB dataset

Signed-off-by: Austin Liu <[email protected]>

Prepare IMDB dataset

Signed-off-by: Austin Liu <[email protected]>

* use uint as id type

* format

* Separate `tpch` and `imdb` benchmarking CI jobs

Signed-off-by: Austin Liu <[email protected]>

Fix path

Signed-off-by: Austin Liu <[email protected]>

Fix path

Signed-off-by: Austin Liu <[email protected]>

Remove `tpch` in `imdb` benchmark

Signed-off-by: Austin Liu <[email protected]>

* Remove IMDB(JOB) slt in CI

Signed-off-by: Austin Liu <[email protected]>

Remove IMDB(JOB) slt in CI

Signed-off-by: Austin Liu <[email protected]>

---------

Signed-off-by: Austin Liu <[email protected]>
Co-authored-by: DouPache <[email protected]>
…ache#12722)

* Minor: avoid clone while calculating union equivalence properties

* Update datafusion/physical-expr/src/equivalence/properties.rs

* fmt
* simplify streaming_merge function parameters

* revert test change

* change StreamingMergeConfig into builder pattern
…ith null fields. (apache#12729)

* test: reproducer for missing schema metadata on cross join

* fix: pass thru schema metadata on cross join

* fix: preserve metadata when transforming to view types

* test: reproducer for missing field metadata in left hand NULL field of union

* fix: preserve field metadata from right side of union

* chore: safe indexing
* cleanup make array coercion rule

Signed-off-by: jayzhan211 <[email protected]>

* change to type union resolution

Signed-off-by: jayzhan211 <[email protected]>

* change value too

Signed-off-by: jayzhan211 <[email protected]>

* fix tpyo

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
…te (apache#12747)

* Add `DocumentationBuilder::with_standard_expression` to reduce copy/paste

* fix doc

* fix standard argument

* Update docs

* Improve documentation to explain what is different
* fix `equal_to` in `PrimitiveGroupValueBuilder`.

* fix typo.

* add uts.

* reduce calling of `is_null`.
alamb and others added 13 commits October 6, 2024 11:33
* Fix stack overflow calculating projected orderings

* fix docs
* Update to arrow/parquet 53.1.0

* Update some API

* update for changed file sizes

* Use non deprecated APIs

* Use ParquetMetadataReader from @etseidl

* remove upstreamed implementation

* Update CSV schema

* Use upstream is_null and is_not_null kernels
* Add support for serializing and deserializing Substrait ExtendedExpr message

* Address clippy reviews

* Reuse existing rename method
…ache#12571)

* Fix grouping sets behavior when data contains nulls

* PR suggestion comment

* Update new test case

* Add grouping_id to the logical plan

* Add doc comment next to INTERNAL_GROUPING_ID

* Fix unparsing of Aggregate with grouping sets

---------

Co-authored-by: Andrew Lamb <[email protected]>
….md to code (apache#12775)

* Added documentation for string and unicode functions.

* Fixed issues with aliases.

* Cargo fmt.

* Minor doc fixes.

* Update docs for var_pop/samp

---------

Co-authored-by: Andrew Lamb <[email protected]>
…integer (apache#12751)

* fix sig

Signed-off-by: jayzhan211 <[email protected]>

* fix

Signed-off-by: jayzhan211 <[email protected]>

* fix error

Signed-off-by: jayzhan211 <[email protected]>

* fix all signature

Signed-off-by: jayzhan211 <[email protected]>

* fix all signature

Signed-off-by: jayzhan211 <[email protected]>

* change default type

Signed-off-by: jayzhan211 <[email protected]>

* clippy

Signed-off-by: jayzhan211 <[email protected]>

* fix docs

Signed-off-by: jayzhan211 <[email protected]>

* rm deadcode

Signed-off-by: jayzhan211 <[email protected]>

* cleanup

Signed-off-by: jayzhan211 <[email protected]>

* cleanup

Signed-off-by: jayzhan211 <[email protected]>

* rm test

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
@github-actions bot added labels on Oct 8, 2024: documentation, sql, development-process, logical-expr, physical-expr, optimizer, core, sqllogictest, substrait, common, proto, functions

@alamb (Contributor) left a comment

Looks good to me @notfilippo -- thanks.

The CI appears to be failing on the logical-types branch -- perhaps you can make a follow on PR to fix that?

@alamb alamb merged commit f475a0f into apache:logical-types Oct 8, 2024
24 of 28 checks passed
@alamb (Contributor) commented Oct 8, 2024

Merging as this is to a feature branch

@notfilippo (Contributor, Author)

> The CI appears to be failing on the logical-types branch -- perhaps you can make a follow on PR to fix that?

Filed this PR: #12820
