Disambiguate duplicated column names when reading data #342

@holgerbrandl

For example, DataFrame.readExcel() fails to read sheets with duplicated column names. Having such duplicates in data is bad practice, but that is how data quite frequently ends up on one's desk.

kdf should follow the approach implemented in other tabular-data APIs and disambiguate (or repair) duplications, e.g. by renaming duplicated column names as follows:

  • foo (first appearance)
  • foo_1 (second appearance)
  • foo_2 (third appearance)
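
The renaming scheme above could be sketched roughly as follows. This is only an illustration, not an existing kdf function: the name `repairNames` is hypothetical, and skipping suffixes that would collide with a name already present in the header is an assumption about the desired behavior.

```kotlin
// Hypothetical helper, not part of any existing API.
// Keeps the first appearance of each name and appends _1, _2, ... to later ones.
fun repairNames(names: List<String>): List<String> {
    // All names that are already taken, including the original header names,
    // so that a generated "foo_1" never clashes with a real "foo_1" column.
    val taken = names.toMutableSet()
    // Names whose first appearance has already been emitted unchanged.
    val seen = mutableSetOf<String>()
    // Next suffix to try for each duplicated base name.
    val nextSuffix = mutableMapOf<String, Int>()

    return names.map { name ->
        if (seen.add(name)) {
            name // first appearance stays as-is
        } else {
            var n = nextSuffix[name] ?: 1
            var candidate = "${name}_$n"
            while (candidate in taken) { // assumption: skip suffixes already in use
                n++
                candidate = "${name}_$n"
            }
            nextSuffix[name] = n + 1
            taken += candidate
            candidate
        }
    }
}
```

With this sketch, `repairNames(listOf("foo", "bar", "foo", "foo"))` yields `foo`, `bar`, `foo_1`, `foo_2`, matching the list above.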

Such functionality is also referred to as a name-repair strategy; see, e.g., the name_repair parameter documented at https://readr.tidyverse.org/reference/read_delim.html

For API consistency, the repair should be applied by (or be available in) all DataFrame.read* methods.

Optionally, the user could be given more control over the repair strategy.
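
One way such user control might look is a small strategy interface. Everything in this sketch is an assumption for illustration: the `NameRepair` type and the strategy names `KEEP`, `CHECK_UNIQUE` and `SUFFIX` do not exist in kdf, and are only loosely modeled on the idea behind readr's name_repair.

```kotlin
// Hypothetical strategy type, not an existing API.
// A strategy maps the raw header names to the names used in the data frame.
fun interface NameRepair {
    fun repair(names: List<String>): List<String>

    companion object {
        // Leave names untouched, duplicates included.
        val KEEP = NameRepair { it }

        // Fail fast when the header contains duplicates.
        val CHECK_UNIQUE = NameRepair { names ->
            val duplicates = names.groupingBy { it }.eachCount().filterValues { it > 1 }.keys
            require(duplicates.isEmpty()) { "Duplicated column names: $duplicates" }
            names
        }

        // Rename later appearances to name_1, name_2, ...
        val SUFFIX = NameRepair { names ->
            val used = names.toMutableSet()
            val seen = mutableSetOf<String>()
            names.map { name ->
                if (seen.add(name)) name
                else generateSequence(1) { it + 1 }
                    .map { "${name}_$it" }
                    .first { used.add(it) } // first suffix not yet in use
            }
        }
    }
}
```

A read method could then take a parameter along the lines of `nameRepair: NameRepair = NameRepair.SUFFIX` (again a hypothetical signature), and callers with special needs could pass their own lambda, e.g. `NameRepair { names -> names.mapIndexed { i, n -> "${n}_$i" } }`.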

Labels: bug