-
Notifications
You must be signed in to change notification settings - Fork 76
Closed
Description
E.g. when using DataFrame.readExcel(), it fails to read sheets with duplicated column names. It's bad practice to have this type of duplication in data, but that's how data ends up on one's desk quite frequently.
kdf should follow the approach implemented in other tabular-data-APIs, to disambiguate (or repair) duplications, e.g by correcting duplicatedcolumn name
- foo (first appearance)
- foo_1 (second appearance)
- foo_2 (third appearance)
Such functionality is also referred to as name-repair strategy, e.g. see https://readr.tidyverse.org/reference/read_delim.html (name_repair)
The function should be applied/provided to/by all DataFrame.read* methods for API consistency.
Optionally, the user could be given more control over the repair strategy.
koperagen and Jolanrensen
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working