Improve `list.chunked()` + `List<List<T>>.toDataFrame` for parsing .srt and similar text files #1486
I agree that this is a popular use case. I ran into the same problem and handled it by parsing with the File API. It would be great if a tiny example, say with subtitles, were also added.
public fun <T> List<List<T>>.toDataFrame(containsColumns: Boolean = false): AnyFrame =
@Refine
@Interpretable("ValuesListsToDataFrame")
public fun <T> List<List<T>>.toDataFrame(header: List<String>? = null, containsColumns: Boolean = false): AnyFrame =
this is a breaking change, can we keep the old function and deprecate it?
Deprecated and moved from io to api package
Seems like a useful addition :) it's common in excel/csv's too to take the first row as headers unless headers are supplied explicitly, so it makes sense here too I guess
Creates a [`DataFrame`](DataFrame.md) from an [`Iterable`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-iterable/) of objects:

#### [`DataFrame`](DataFrame.md) from `List<List<T>>`:

This is useful for parsing text files. For example, the `.srt` subtitle format can be parsed like this:
cool! :D
useful feature and even better example :)
You still have a failing test, but other than that it looks good
Consider a file structured like this:
I believe this structure is popular. One example is .srt, and I personally have had to deal with it a lot.
It can be parsed into a dataframe:
Surprisingly, `toDataFrame` here is not the generic `Iterable.toDataFrame` but a completely different function.
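The chunking step described above can be sketched with the Kotlin standard library alone. This is only an illustration under stated assumptions: the sample lines, the `parseRecords` helper, and the column names are hypothetical, and each record is assumed to span exactly four lines (index, time range, a single line of text, blank separator). Real `.srt` files may contain multi-line text, which a fixed chunk size would not handle.

```kotlin
// Hypothetical sample input: two subtitle cues, four lines each
// (index, time range, text, blank separator). In practice the lines
// would typically come from something like File(path).readLines().
val sampleLines: List<String> = listOf(
    "1", "00:00:01,000 --> 00:00:02,500", "Hello there.", "",
    "2", "00:00:03,000 --> 00:00:04,200", "General Kenobi!", "",
)

// Hypothetical helper: group every 4 lines into one record and keep
// only the first 3 values, dropping the blank separator line.
fun parseRecords(lines: List<String>): List<List<String>> =
    lines
        .chunked(4)
        .filter { it.size >= 3 } // skip a truncated trailing chunk
        .map { it.take(3) }

fun main() {
    val rows = parseRecords(sampleLines)
    rows.forEach(::println)
    // Assumption: with the signature discussed in this PR, the rows could then
    // be turned into a dataframe with explicit column names, for example
    // rows.toDataFrame(header = listOf("n", "timestamp", "text"))
}
```

The result is a `List<List<String>>` in which every inner list is one row, which is the shape the `List<List<T>>.toDataFrame` overload takes as input.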

Problem: in its current shape it's not helpful.
I end up with something very close to what I want, but the required code change is somewhat non-trivial.
I'd either have to:
Or switch to a completely different route:
With the compiler plugin:
With this API change:
The plugin will understand the resulting schema too.