This repository was archived by the owner on Nov 11, 2022. It is now read-only.

FileBasedSource does not work on Windows #2

@jaroy

Description

Apologies if this is not the right place to submit issues.

The following code

Path path = Files.createTempFile("test", ".txt");
Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.create());
pipeline.apply(TextIO.Read.from(path.toString()));
pipeline.run();

throws an exception on a Windows client:

java.lang.RuntimeException: Failed to read from source: com.google.cloud.dataflow.sdk.runners.worker.TextSource@644baf4a
...
Caused by: java.io.IOException: No match for file pattern 'C:\Users\jroy\AppData\Local\Temp\test8262931969830113037.txt'
at com.google.cloud.dataflow.sdk.runners.worker.FileBasedSource.iterator(FileBasedSource.java:100)

The file exists, so this exception is unexpected.

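The bug is in FileIOChannelFactory::match(), which appends the value returned by File::getAbsolutePath() to a glob expression without escaping it. On Windows, getAbsolutePath() returns a path with backslash separators, and the glob syntax used by FileSystem::getPathMatcher() treats a backslash as an escape character. The separators are therefore consumed as escapes (\Users is read as Users, and so on), the resulting pattern no longer describes the file's actual path, and no matches are returned.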
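The mismatch can be reproduced without Dataflow using only java.nio. The following is a minimal sketch: the class GlobEscapeDemo and the helper escapeBackslashes are hypothetical names, not part of the SDK, and escaping every backslash is only one possible fix, assumed here for illustration rather than taken from the actual change to FileIOChannelFactory. It relies on the documented glob rule that "\\" matches a single backslash.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobEscapeDemo {

    // Hypothetical helper (not part of the Dataflow SDK): doubles every backslash so
    // the glob syntax of FileSystem.getPathMatcher reads each one as a literal
    // backslash instead of an escape. Wildcards such as '*' are left untouched.
    static String escapeBackslashes(String path) {
        return path.replace("\\", "\\\\");
    }

    // Returns true if the glob pattern matches the given path string.
    static boolean matches(String globPattern, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + globPattern);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        String file = "C:\\Users\\jroy\\AppData\\Local\\Temp\\test8262931969830113037.txt";

        // Unescaped: each "\X" is read as an escaped 'X', so the separators vanish.
        System.out.println(matches(file, file));                    // false
        // Escaped: every backslash in the pattern matches a literal backslash.
        System.out.println(matches(escapeBackslashes(file), file)); // true
        // A wildcard appended after the escaped directory still works.
        String dirGlob = escapeBackslashes("C:\\Users\\jroy\\AppData\\Local\\Temp\\") + "*.txt";
        System.out.println(matches(dirGlob, file));                 // true
    }
}
```

Escaping the directory part matters for wildcards too: left unescaped, a pattern ending in Temp\*.txt would read \* as a literal asterisk rather than a wildcard.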
