Extended workaround for handling symlinks in tar files #163
I would be glad if people could test this on non-Windows operating systems.
It seems that the verbose output of the tar listing on macOS differs from that of a Linux shell.
It seems I didn't have a proper example...
Looks OK to me now :-)
@staticfloat who decides on merging? If a symlink name starts with four digits followed by a blank, it might be a problem ... Even more robust would be to unzip first, add a marker file to the tar, and then list the contents...
…gen_7z` in order to avoid the "world-age problem"
The last two changes were made to make the symlink parsing more robust, as there are many formats around, which might even be language-dependent. Note: I consider this the last version for this PR and would recommend merging into master.
If this works well (we have an explicit test, so if it passes tests, it should) we should just dump the first copyderef code path and use only this one; it is clearly superior in pretty much every way, so I suggest just deleting the old codepaths, getting rid of the `2` at the end of your settings, and calling it good!
Thank you for your effort, and I'm sorry that it took me so long to review this!
src/Prefix.jl
Outdated
error("Could not list contents of tarball $(tarball_path)")
end
output = collect_stdout(oc)
# @info("copyderef2: \r\n" * output)
Zap this comment
src/PlatformEngines.jl
Outdated
gen_7z = (p) -> (unpack_7z(p), package_7z(p), list_7z(p), parse_7z_list, r"Path = ([^\r\n]+)\r?\n(?:[^\r\n]+\r?\n)+Symbolic Link = ([^\r\n]+)"s)
compression_engines = Tuple[]
write("demoTar.txt","Demo file for tar listing")
This should be a temporary file created with `tempname()`; otherwise we are potentially overwriting user files.
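A minimal sketch of that suggestion, assuming the demo file is only needed while the listing is probed. The helper name `with_demo_file` is hypothetical and not part of the PR:

```julia
# Hypothetical helper: create the demo file at a unique temporary path,
# hand the path to `f`, and remove the file again even if `f` throws.
function with_demo_file(f)
    demo_path = tempname()
    try
        write(demo_path, "Demo file for tar listing")
        return f(demo_path)
    finally
        rm(demo_path; force = true)
    end
end

# Usage: the callback receives the temporary path; nothing is written to the
# current working directory.
contents = with_demo_file(path -> read(path, String))
println(contents)
```

Because the path comes from `tempname()`, a user file that happens to be called `demoTar.txt` is never touched.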
…cument regex, modify excludeoption to also accept files with spaces
…s accidentally removed) and moved it to the correct location above `probe_symlink_creation(dest)` (was wrong before!)
src/PlatformEngines.jl
Outdated
Jjz = "j"
end
return `$tar_cmd -x$(Jjz)f $(tarball_path) --directory=$(out_path) $(excludelist=="" ? [] : ["--exclude-from=" * excludelist])`
excludeoption = excludelist=="" ? `` : `--exclude-from='$excludelist'`
Don't use backticks when interpolating into other backticks; just use Strings.
src/PlatformEngines.jl
Outdated
# the correct 7z given the path to the executable:
unpack_7z = (exe7z) -> begin
return (tarball_path, out_path) ->
return (tarball_path, out_path, excludelist = "") ->
Let's default `excludelist` to `nothing`
src/PlatformEngines.jl
Outdated
end
# Some tar's aren't smart enough to auto-guess decompression method. :(
unpack_tar = (tarball_path, out_path) -> begin
unpack_tar = (tarball_path, out_path, excludelist = "") -> begin
Let's default `excludelist` to `nothing`
src/PlatformEngines.jl
Outdated
# This is to work around filesystems that are mounted (such as SMBFS filesystems)
# that do not support symlinks.
excludelist = ""
Default `excludelist` to `nothing`
I dug a bit into how interpolation into commands works, and I think I now understand the behavior:
x = ``
`command -o$x` # evaluates to `command` and drops the option completely
x = "my file"
`command -o$x` # evaluates to `command '-omy file'`
So what about defaulting excludelist::Union{AbstractString, Cmd} = `` ?
Then we can skip the excludelist == nothing ? ... : ... check and simply use
pipeline(`$exe7z x $(tarball_path) -y -so`,
`$exe7z x -si -y -ttar -o$(out_path) -x@$(excludelist)`)
and
return `$tar_cmd -x$(Jjz)f $(tarball_path) --directory=$(out_path) --exclude-from=$(excludelist)`
I rewrote it and it works like a charm ...
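The same expansion can be observed with a plain `Vector{String}`. The following is only an illustrative sketch: the variable names are made up, and the `exec` field is read purely to inspect the resulting words.

```julia
# Interpolating a collection next to literal text yields one word per element,
# each combined with the literal prefix.
opts = ["a", "b"]
cmd_two = `command -o$opts`

# An empty collection yields no words at all, so the "-o" prefix vanishes too.
no_opts = String[]
cmd_none = `command -o$no_opts`

# A plain string is always exactly one word, even when it contains spaces.
file = "my file"
cmd_one = `command -o$file`

println(cmd_two.exec)
println(cmd_none.exec)
println(cmd_one.exec)
```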
So what's happening here is a kind of surprising interaction in the `Cmd` parser where it's seeing `command -o$x` as a cartesian product between arrays to be expanded, as described in the docs. Essentially, it looks at the `Cmd` like a container, and tries to iterate over it, but the empty `Cmd` is seen as an empty (zero-length) collection, so the product is empty as well and we never see the first part (the `"-o"`).
This is not very intuitive at all, and I would not be surprised if this changes in a future version of Julia, so I think it's best not to rely on it. Let's not use `Cmd` objects at all when interpolating into other `Cmd` objects and only use Strings; that's what is documented best, and is least likely to change in a future version.
Using `nothing` as the sentinel value for "do nothing" is pretty idiomatic, so I think that's better than using something else like a `Cmd` or an empty string; it fits in better with the rest of the APIs in Julia.
Thanks for putting so much work into this!
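One way the `nothing` sentinel could be combined with string-only interpolation is sketched below. The function name `sketch_unpack_cmd` and its positional arguments are hypothetical and do not reflect the actual `gen_unpack_cmd` implementation:

```julia
# Hypothetical sketch: build the optional argument as a vector of strings,
# which interpolates to zero words when it is empty.
function sketch_unpack_cmd(tar_cmd::AbstractString,
                           tarball_path::AbstractString,
                           out_path::AbstractString;
                           excludelist::Union{AbstractString, Nothing} = nothing)
    exclude_args = excludelist === nothing ? String[] :
                   ["--exclude-from=$(excludelist)"]
    return `$tar_cmd -xf $tarball_path --directory=$out_path $exclude_args`
end

plain = sketch_unpack_cmd("tar", "a.tar", "out")
excluded = sketch_unpack_cmd("tar", "a.tar", "out"; excludelist = "my list.txt")
println(plain)
println(excluded)
```

Only strings and string vectors are interpolated here, so the result does not depend on how an empty `Cmd` is expanded, and a file name containing spaces stays a single argument.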
…} and defaulting it to ``
…ws 10: option "-f -" for stdin/stdout, removed \r in parse_tar_list
I agree with your comment, so I reverted my changes. During testing I realised that Windows 10 has its own tar command, which outputs \r\n. The tests at lines 603-605 of runtests.jl don't pass if one chooses
fails due to different file separators. One could do something like
but I see that it's less elegant ;-) So please comment on whether these two changes (regex and runtests.jl) should be made.
src/PlatformEngines.jl
Outdated
"""
gen_unpack_cmd(tarball_path::AbstractString, out_path::AbstractString; excludelist::Union{AbstractString, Cmd} = ``)
gen_unpack_cmd(tarball_path::AbstractString, out_path::AbstractString; excludelist::AbstractString = nothing)
`nothing` is not an `AbstractString`, so this won't work; you need to have `excludelist` be a `Union{AbstractString, Nothing}`.
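A small sketch of the difference, using two throwaway functions (`f_bad` and `f_ok` are illustrative names only):

```julia
# The default value `nothing` does not satisfy the AbstractString annotation,
# so calling f_bad() without the keyword is expected to raise an error.
f_bad(; excludelist::AbstractString = nothing) = excludelist

# Widening the annotation to a Union admits both strings and `nothing`.
f_ok(; excludelist::Union{AbstractString, Nothing} = nothing) = excludelist

bad_call_threw = try
    f_bad()
    false
catch
    true
end

println(bad_call_threw)
println(f_ok() === nothing)
println(f_ok(excludelist = "exclude.txt"))
```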
src/PlatformEngines.jl
Outdated
automatically called upon first import of `BinaryProvider`.
"""
gen_unpack_cmd = (tarball_path::AbstractString, out_path::AbstractString; excludelist::Union{AbstractString, Cmd} = ``) ->
gen_unpack_cmd = (tarball_path::AbstractString, out_path::AbstractString; excludelist::AbstractString = nothing) ->
Same here
Stupid me! Thanks for your vigilance and patience - it's my first PR ;-)
You're doing a great job! Keep up the good work!
I think the cleanest solution for the file path separation issue on Windows is to translate the filenames to be consistent across all platforms within
Oops, needs some rework, probably tomorrow ... Just to make my point clear: The tar listing is OK and it is formatted with "/". The test
The regex that I had unintentionally left in a modified version of tar_list should have been
I just realised that there is no mechanism for erroring out in case that
Btw, all error messages in the existing code still use Also, if you find my regex explanations too verbose, I might cut them down. Or feel free to edit them yourself to your own taste.
Yes, I think we should make the error messages and error conditions consistent, but let's do that in a separate PR. Put the correct error message within this new method you've added here, but then correct the other things in a separate PR.
# Global variable that tells us whether tempdir() can have symlinks
# created within it.
tempdir_symlink_creation = false
Every time a `global` is removed from a file, an angel sheds a single tear of joy.
I'm not certain why the tests are failing on Windows. I may need to dig into that; it's possible this symlink thing isn't working for one of the test cases, but I don't have time to dig into it right now. If you have access to a Windows machine and can run the test cases locally, that might help.
Tests fail only if I manually choose tar as the compression engine. Tar on Windows behaves like a regular unix-y tar and lists its contents with forward slashes, whereas 7z uses backslashes.
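One possible way to make the two listings comparable is to normalise the separators before comparing. This is only a sketch; `normalize_separators` is a hypothetical helper, not a function from the PR:

```julia
# Hypothetical helper: map the backslashes printed by 7z to the forward
# slashes printed by tar, so both listings can be compared directly.
normalize_separators(path::AbstractString) = replace(path, '\\' => '/')

from_7z = normalize_separators("bin\\socrates")
from_tar = normalize_separators("bin/socrates")
println(from_7z == from_tar)
```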
Thanks @hhaensel! I will correct the
I am really happy to see this merged - and I am a bit proud, as this is my first PR ever :-)
I propose another workaround in addition to `copyderef`. `copyderef2` is selected automatically when neither `tempdir` nor `dest` supports symlinks. In that case the symlinks in the tar file are returned by the new function `list_tarball_symlinks`. For this purpose the generic `list` functions have been modified to support a keyword argument `verbose`. Regular expressions are used to parse the output and return a dictionary of symlinks. The symlinks are excluded during extraction, and the original source files of the symlinks are copied after extraction.
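The parsing step can be illustrated roughly as follows. This is a simplified sketch, not the regex used in the PR: it assumes a GNU-style verbose listing in which symlink entries start with `l` and end in `name -> target`, it does not handle link names containing spaces, and `sketch_parse_symlinks` is a hypothetical name.

```julia
# Hypothetical sketch: collect `link => target` pairs from a verbose listing.
function sketch_parse_symlinks(listing::AbstractString)
    symlinks = Dict{String, String}()
    for line in split(listing, r"\r?\n")      # tolerate Windows line endings
        startswith(line, "l") || continue     # only symlink entries
        m = match(r"^l\S+\s+.*\s(\S+) -> (.+)$", line)
        m === nothing && continue
        symlinks[m.captures[1]] = m.captures[2]
    end
    return symlinks
end

# Made-up listing used only for illustration.
listing = """
-rw-r--r-- user/group 25 2019-01-01 12:00 bin/demo.txt
lrwxrwxrwx user/group 0 2019-01-01 12:00 bin/link.txt -> demo.txt
"""
symlinks = sketch_parse_symlinks(listing)
println(symlinks)
```

The keys of such a dictionary could then be written to an exclude list for extraction, and each target copied to the link's location afterwards, as described above.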