Skip to content

Conversation

trueleo
Copy link
Contributor

@trueleo trueleo commented Apr 30, 2025

Summary

This PR introduces early support for nested types, such as Struct and List, similar to those found in DataFusion. The primary goal is to explore and validate an approach for encoding these complex types.

Changes

  • Refactored encode_value to support recursion, enabling encoding of arbitrary nested types.
  • Initial handling logic for Struct and List types (not fully tested yet).

Note

I couldn't find any official information about what client's are suppose to accept. Struct binary encoding is based on the logic found here

@trueleo
Copy link
Contributor Author

trueleo commented May 7, 2025

@sunng87 Current integration test is not good enough to make sure this feature works. Do you have any ideas on how we can improve tests for this ?

Specifically testing for binary/text encoding and other features like information schema

@trueleo trueleo marked this pull request as ready for review May 7, 2025 08:09
@sunng87
Copy link
Member

sunng87 commented May 7, 2025

@trueleo Can we retrieve the return value and do assertion on it?

@trueleo
Copy link
Contributor Author

trueleo commented May 7, 2025

@sunng87 I think the following can work out here.

In extended query doc, there is this note.

The choice between text and binary output is determined by the format codes given in Bind, regardless of the SQL command involved. The BINARY attribute in cursor declarations is irrelevant when using extended query protocol.

If a postgres client's can pick what format to return query data in, then the assertions can be done inside the integration test. This seems possible with python psycopg3
https://www.psycopg.org/psycopg3/docs/basic/params.html#binary-parameters-and-results

I will test this out and add it to current integration test if this works out.

[date(2012, 1, 1), None, date(2012, 1, 2)],
[datetime(2012, 1, 1), None, datetime(2012, 1, 2)],
(
(1, 1.0, "a", True, date(2012, 1, 1), datetime(2012, 1, 1))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sunng87 I added an integration test. This test passes for both binary and text format but with some hiccups. In binary format, it seems psycopg3 stringifies fields inside struct type.

@trueleo
Copy link
Contributor Author

trueleo commented May 23, 2025

@sunng87 Awaiting you review on this. I think this can be merged as it is but no hurries if you're busy. This PR does not cover list<list> types and beyond. Will like to know your thoughts and nitpicks.

@sunng87
Copy link
Member

sunng87 commented May 23, 2025

Sorry for the delay. I'm running low with my bandwidth. I will try time in this weekend or next week for this. Thank you!

Copy link
Member

@sunng87 sunng87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks quite good to me. There may be some minor clones can be eliminated but I will take over and refactor it by myself. Thank you for patch!

@sunng87 sunng87 merged commit ad15769 into datafusion-contrib:master May 25, 2025
6 checks passed
@trueleo trueleo deleted the support_struct branch June 9, 2025 09:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants