Make a faster way to check column existence in optimizer (not `is_err()`) #5309

**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
Related to https://github.com/apache/arrow-datafusion/issues/5157

There are many places in the code that use fallible functions on `DFSchema` to check if a column exists:
https://docs.rs/datafusion-common/18.0.0/datafusion_common/struct.DFSchema.html#method.index_of
https://docs.rs/datafusion-common/18.0.0/datafusion_common/struct.DFSchema.html#method.index_of_column_by_name
https://docs.rs/datafusion-common/18.0.0/datafusion_common/struct.DFSchema.html#method.field_from_column

For example, there is code that looks like this (it calls `is_ok()` or `is_err()` and totally discards the error along with its string):
```rust
input_schema.field_from_column(col).is_ok()
```

This is problematic because these functions return a `DataFusionError` that not only holds an allocated `String` but has often gone through a lot of effort to construct a nice error message, all of which is then thrown away. You can see these calls appearing in the trace on https://github.com/apache/arrow-datafusion/issues/5157
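
To make the cost concrete, here is a minimal sketch that uses a toy lookup instead of the real `DFSchema`. The function names and the error message format are illustrative assumptions, not DataFusion's code. The point is only that the failure path formats and allocates a `String` which the caller immediately drops.

```rust
/// Toy stand-in for a fallible schema lookup (hypothetical, not DataFusion's API).
/// On failure it formats an error message listing all valid field names.
fn index_of(fields: &[&str], name: &str) -> Result<usize, String> {
    fields.iter().position(|f| *f == name).ok_or_else(|| {
        // Allocates and formats a String even if the caller only wants yes/no.
        format!(
            "No field named '{}'. Valid fields are {}",
            name,
            fields.join(", ")
        )
    })
}

/// Existence check in the style described above:
/// on a miss, the error String is built and then discarded by `is_ok()`.
fn exists_via_result(fields: &[&str], name: &str) -> bool {
    index_of(fields, name).is_ok()
}
```

Every call to `exists_via_result` that misses pays for the `format!` and the `join`, even though neither result is ever read.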

As part of making the optimizer faster (related to https://github.com/apache/arrow-datafusion/issues/5157), we need to avoid these string allocations.

Thus I propose:

1. Add new functions for checking existence that return a `bool` rather than a `Result`
2. Replace the uses of `is_ok()` / `is_err()` on the fallible functions with calls to the new functions

For example, 
```rust
impl DFSchema {
  // existing function that returns Result
  pub fn field_from_column(&self, column: &Column) -> Result<&DFField> {...}

  // new function that returns bool  <---- Add this new function
  pub fn has_column(&self, column: &Column) -> bool {...}
}
```

And then replace the places in the code that have the pattern

```rust
input_schema.field_from_column(col).is_ok()
```

With 
```rust
input_schema.has_column(col)
```
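
One possible shape for such a pair of methods, sketched on a simplified schema type. `ToySchema` and `ToyColumn` are hypothetical stand-ins invented for this illustration; the real `DFSchema`, `Column`, and `DFField` may have more going on, so treat this as a sketch of the idea rather than the actual implementation.

```rust
/// Hypothetical, simplified column reference: optional qualifier plus name.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyColumn {
    pub relation: Option<String>,
    pub name: String,
}

/// Hypothetical, simplified schema: a flat list of qualified fields.
pub struct ToySchema {
    fields: Vec<ToyColumn>,
}

impl ToySchema {
    pub fn new(fields: Vec<ToyColumn>) -> Self {
        Self { fields }
    }

    /// Shared matching rule: names must be equal, and if the column
    /// carries a qualifier it must equal the field's qualifier.
    fn matches(field: &ToyColumn, column: &ToyColumn) -> bool {
        field.name == column.name
            && match &column.relation {
                Some(q) => field.relation.as_ref() == Some(q),
                None => true,
            }
    }

    /// Result-returning lookup: the error path allocates a message.
    pub fn field_from_column(&self, column: &ToyColumn) -> Result<&ToyColumn, String> {
        self.fields
            .iter()
            .find(|f| Self::matches(f, column))
            .ok_or_else(|| format!("No field named {:?}", column.name))
    }

    /// Bool-returning check: same matching rule, no error construction.
    pub fn has_column(&self, column: &ToyColumn) -> bool {
        self.fields.iter().any(|f| Self::matches(f, column))
    }
}
```

Because both methods go through the same `matches` helper, `has_column(c)` agrees with `field_from_column(c).is_ok()` by construction, which is the property a replacement like this needs to preserve.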



**Describe the solution you'd like**
Ideally someone would do this transition one function on DFSchema at a time (not one giant PR please 🙏 )

**Describe alternatives you've considered**
There are more involved proposals for larger changes to `DFSchema`, but simply avoiding the error construction in these checks might help a lot

**Additional context**
I think this is a good first exercise, as the desired change is well spelled out and it is a software engineering exercise rather than one that requires deep DataFusion expertise
