Introduce a `Vec<u8>` based row-wise representation for DataFusion

**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

**_Many pipeline-breaking operators are inherently row-based:_** 

For sort that would shuffle records around, re-order would cause random memory access patterns for each column in the current columnar organization. The performance will deteriorate as the number of columns grows. Besides, the compound sort key also requires us to access different columns.
On the other hand, row-based representation avoids this problem (performance deteriorates with payload column number growth). we can check [here](https://dl.acm.org/doi/10.1145/1409360.1409380) for more explanations.

For hashtable entries that we buffer aggregation state, we are already utilizing a row-based format indirectly -- We use `Vec<ScalarValue>` as a state for each key. Vector of `ScalarValue` is mostly stored continuously in memory but faced with two kinds of inefficiency: 1. memory overhead introduced by `ScalarValue` enum (16bytes per field according to @alamb); 2. string or other non-primitive values stored on the heap elsewhere and accessed through pointers. 

```text
┌───────────────────────────────────────────────────────┐
│                                                       │
│ ┌────────────────┬────────────────┬────────────────┐  │
│ │  ScalarValue   │  ScalarValue   │  ScalarValue   │  │
│ │    ::Int(5)    │   ::Int(10)    │    ::Int(3)    │  │
│ └────────────────┴────────────────┴────────────────┘  │
│   Hash Table Entry                                    │
│   Vec<ScalarValue>                                    │
└───────────────────────────────────────────────────────┘
 When the keys are primitive values, they are stored     
               contiguously in the Vec                   


                           ┌ ─ ─ ─ ─ ─ ─ ─ ─ ┐           
                                  "foo"                  
                           │(heap allocation)│           
                            ─ ─ ─ ─ ─ ─ ─ ─ ─            
                                    ▲                    
                            ┌───────┘                    
┌───────────────────────────┼───────────────────────────┐
│                           │                           │
│ ┌────────────────┬────────────────┬────────────────┐  │
│ │  ScalarValue   │  ScalarValue   │  ScalarValue   │  │
│ │    ::Int(5)    │ ::Utf8("foo")  │    ::Int(3)    │  │
│ └────────────────┴────────────────┴────────────────┘  │
│   Hash Table Entry                                    │
│   Vec<ScalarValue>                                    │
└───────────────────────────────────────────────────────┘
  When the keys have strings/binary data, the variable   
   length data is stored non contiguously in the Vec     
```
_I quote these two great diagrams above from @alamb. Thanks again!_


For join, whether hash-based or sort-based, would suffer from similar problems as above.

**Describe the solution you'd like**

1. A `Vec<u8>` based representation for tuple, store all columns continuously in memory, for row-logic operations.
2. Efficient coding/decoding method from/to columnar arrow data.
3. Access cells in `vec<u8>` tuple efficiently. 

We could refer to PostgreSQL / DuckDB / Spark for the row format design. But note Spark's `UnsafeRow` incurs a lot of memory overhead due to its 8-byte alignment.

**Describe alternatives you've considered**
Not to introduce or use `Vec<ScalarValue>` with overhead.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Introduce a `Vec<u8>` based row-wise representation for DataFusion #1708

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Introduce a Vec<u8> based row-wise representation for DataFusion #1708

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Introduce a `Vec<u8>` based row-wise representation for DataFusion #1708