Skip to content
This repository was archived by the owner on Aug 16, 2023. It is now read-only.

Conversation

hhaensel
Copy link
Contributor

I propose another workaround additional to copyderef.

copyderef2 is automatically detected in case that neither tempdir nor dest supports symlinks.
In that case the symlinks of the tarfile are returned by the new function list_tarball_symlinks.
For this purpose the generic list functions have been modified to support a keyword argument verbose.
Regular expressions are used to parse the output and return a dictionary of symlinks.
The symlinks are exluded during extraction and the original source files of the symlinks are copied after extraction.

@hhaensel
Copy link
Contributor Author

hhaensel commented May 28, 2019

I would be happy about people testing this on non-windows OS.
copyderef2 can be forced by an environment variable BINARYPROVIDER_COPYDEREF2, which needs to be set to true.
(ImageMagick uses symlinks ;-) )
Update:
Just realised that this functionality is tested by runtests.jl. But in order to work, line 777 needs to be modified:
"-" withenv("BINARYPROVIDER_COPYDEREF" => "true") do
"+" withenv("BINARYPROVIDER_COPYDEREF2" => "true") do

@hhaensel
Copy link
Contributor Author

It seems as if the verbose output of the tar listing on Mac-OS is different to that of a linux shell.
Therefore, my regex fails to identify the symlinks :-(
Tomorrow I will know, how a listing looks like and possibly adapt my regex...

@hhaensel
Copy link
Contributor Author

It seems I didn't have a proper example...

@hhaensel
Copy link
Contributor Author

Looks ok for me now :-)

@hhaensel
Copy link
Contributor Author

@staticfloat who is deciding on merging?
The regex should match most tar cases.

If a symlink starts with four numbers followed by a blank, it might be a problem ...
One idea to make it more robust is to create one tar list example by
echo a > a.txt && tar -cz a.txt | tar -tzv and then determine the date pattern of the tar output...

Even more robust would be to first unzip, add a marker file to the tar and then list the content...
But we could leave this up to later, if this comes up as a problem.

@hhaensel
Copy link
Contributor Author

hhaensel commented Jun 5, 2019

The last two changes were made to make the symlink parsing more robust as there are many formats around which might be even language dependent.

Idea
Check the listing of a demo file via `tar -c demoTar.txt | tar -tv`` and determine the number of parameters before the filename and finally construct a regex that filters the listing.

Note
gen_7z needs to contain the regex literally, because splicing into functions has a lot of pitfalls (world-age problem)
For the tar section this is not a problem, because the regex is spliced into the tuple as an element and not as part of a function.

I consider this the last version for this PR and would recommend merging into master.

Copy link
Member

@staticfloat staticfloat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this works well (we have an explicit test, so if it passes tests, it should) we should just dump the first copyderef code path and use only this one; it is clearly superior in pretty much every way, so I suggest just deleting the old codepaths, getting rid of the 2 at the end of your settings, and calling it good!

Thank you for your effort, and I'm sorry that it took me so long to review this!

src/Prefix.jl Outdated
error("Could not list contents of tarball $(tarball_path)")
end
output = collect_stdout(oc)
# @info("copyderef2: \r\n" * output)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zap this comment

gen_7z = (p) -> (unpack_7z(p), package_7z(p), list_7z(p), parse_7z_list, r"Path = ([^\r\n]+)\r?\n(?:[^\r\n]+\r?\n)+Symbolic Link = ([^\r\n]+)"s)
compression_engines = Tuple[]

write("demoTar.txt","Demo file for tar listing")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a temporary file created with tempname(); otherwise we are potentially overwriting user files.

…cument regex, modify excludeoption to also accept files with spaces
Helmut Haensel added 2 commits June 6, 2019 09:48
…s accidentilly removed) and moved it to the correct location above `probe_symlink_creation(dest)` (was wrong before!)
Jjz = "j"
end
return `$tar_cmd -x$(Jjz)f $(tarball_path) --directory=$(out_path) $(excludelist=="" ? [] : ["--exclude-from=" * excludelist])`
excludeoption = excludelist=="" ? `` : `--exclude-from='$excludelist'`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use backticks when interpolating into other backticks; just use Strings.

# the correct 7z given the path to the executable:
unpack_7z = (exe7z) -> begin
return (tarball_path, out_path) ->
return (tarball_path, out_path, excludelist = "") ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's default excludelist to nothing

end
# Some tar's aren't smart enough to auto-guess decompression method. :(
unpack_tar = (tarball_path, out_path) -> begin
unpack_tar = (tarball_path, out_path, excludelist = "") -> begin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's default excludelist to nothing

# This is to work around filesystems that are mounted (such as SMBFS filesystems)
# that do not support symlinks.

excludelist = ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default excludelist to nothing

Copy link
Contributor Author

@hhaensel hhaensel Jun 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I digged a bit into interpolation from commands and I think I now understand the behavior:

x = ``
`command -o$x` # will evaluate to `command` and drop the option completely
x = "my file"
`command -o$x` # will evaluate to `command '-omy file'`

So what about defaulting excludelist::Union{AbstractString, Cmd} = ``
Then we can skip the excludelist == nothing ? ... : ... and simply use

        pipeline(`$exe7z x $(tarball_path) -y -so`,
                 `$exe7z x -si -y -ttar -o$(out_path)  -x@$(excludelist)`)

and

return `$tar_cmd -x$(Jjz)f $(tarball_path) --directory=$(out_path) --exclude-from=$(excludelist)`

I rewrote it and it works like a charm ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what's happening here is a kind of surprising interaction in the Cmd parser where it's seeing command -o$x as a cartesian product between arrays to be expanded, as described in the docs here. Essentially, it looks at the Cmd like a container, and tries to iterate over it, but the empty Cmd is seen as a zero-dimensional object, and this disables all iteration, so we never see the first part (the "-o").

This is not very intuitive at all, and I would not be surprised if this changes in a future version of Julia, so I think it's best not to rely on it. Let's not use Cmd objects at all when interpolating into other Cmd objects and only use Strings; that's what is documented best, and is least likely to change in a future version.

Using nothing as the sentinel value for "do nothing" is pretty idiomatic, so I think thats better than using something else like a Cmd or an empty string; it fits in better with the rest of the APIs in Julia.

Thanks for putting so much work into this!

Helmut Haensel and others added 2 commits June 7, 2019 15:24
…ws 10: option "-f -" for stdin/stdout, removed \r in parse_tar_list
@hhaensel
Copy link
Contributor Author

hhaensel commented Jun 8, 2019

I can easily agree to your comment, so I reverted my changes.

During testing I realised that Windows 10 has its own tar command which outputs \r\n.
Therefore, I have removed trailing \r in parse_tar_list, I hope that's ok.
Alternatively, the parse_tar_list could be done with a regex.
lines = [m.captures[1] for m in eachmatch(r"^(.+?)(?<!/\r|/)\r?$"m, output)]
For 7z this is also possible with
r".+:\d{2}\s+\.\D+\d+\D+\d+\s+(?:\./|\.\)?(.+?)\r?$"m

The tests at 603-605 of runtests.jl don't pass if one choses tar or busybox tar in Windows, since

@test joinpath("bin", "bar.sh") in contents

fails due to different fileseparators. One could do something like

joinpath("etc/qux.conf") in replace.(contents, ["\\" => "/"])

but I see that it's less elegant ;-)

So please comment, whether these two changes (regex and runtest.jl) should be made.
Otherwise, I hope that we are done with this PR...


"""
gen_unpack_cmd(tarball_path::AbstractString, out_path::AbstractString; excludelist::Union{AbstractString, Cmd} = ``)
gen_unpack_cmd(tarball_path::AbstractString, out_path::AbstractString; excludelist::AbstractString = nothing)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nothing is not an AbstractString, so this won't work; you need to have excludelist be a Union{AbstractString, Nothing}.

automatically called upon first import of `BinaryProvider`.
"""
gen_unpack_cmd = (tarball_path::AbstractString, out_path::AbstractString; excludelist::Union{AbstractString, Cmd} = ``) ->
gen_unpack_cmd = (tarball_path::AbstractString, out_path::AbstractString; excludelist::AbstractString = nothing) ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stupid me! Thanks for your vigilance and patience - it's my first PR ;-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're doing a great job! Keep up the good work!

@staticfloat
Copy link
Member

I think the cleanest solution for the file path separation issue on Windows is to translate the filenames to be consistent across all platforms within list_tarball_files(). So far, we've been using UNIX-y tools so this hasn't been a problem, but now that we have Windows-native tools, we need to check. :P So let's just have list_tarball_files always use / since that is valid on all platforms. You can split the filenames by "\\" and then join them with "/" and return that.

@hhaensel
Copy link
Contributor Author

hhaensel commented Jun 10, 2019

Oops, needs some rework, probably tomorrow ...

Just to make my point clear: The tar-listing is ok and it is formatted with "/". The test
@test joinpath("bin", "bar.sh") in contents fails with the windows tar, because the tar uses / and joinpath uses \\

@hhaensel
Copy link
Contributor Author

The regex that I had unintentionally left in a modified version of tar_list should have been
r"^(?:\./|\.\\)?(.+?)(?<!/|/\r)\r?$"m ...
That would have worked. But the existing version is longer code but probably faster, so I left it unchanged except the removal of the \r line endings.

@hhaensel
Copy link
Contributor Author

I just realised that there is no mechanism for erroring out in case that probe_platform_engines!() has not been run.
What do you think about adding

    if !isdefined(BinaryProvider, :gen_symlink_parser)
        error("Call `probe_platform_engines!()` before `list_tarball_symlinks()`")
    end

Btw, all error messages in the existing code still use probe_platform_engines() instead of probe_platform_engines!()
Shall I replace these within that PR?

Also, if you find my regex explanations too verbose, I might cut them down. Or feel free to redact them yourself to your own taste.

Copy link
Member

@staticfloat staticfloat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we should make the error messages and error conditions consistent, but let's do that in a separate PR. Put the correct error message within this new method you've added here, but then correct the other things in a separate PR.


# Global variable that tells us whether tempdir() can have symlinks
# created within it.
tempdir_symlink_creation = false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every time a global is removed from a file, an angel sheds a single tear of joy.

@staticfloat
Copy link
Member

I'm not certain why the tests are failing on Windows. I may need to dig into that; it's possible this symlink thing isn't working for one of the test cases, but I don't have time to dig into it right now. If you have access to a windows machine and can run the test cases locally, that might help.

@hhaensel
Copy link
Contributor Author

hhaensel commented Jun 12, 2019

I'm not certain why the tests are failing on Windows. I may need to dig into that; it's possible this symlink thing isn't working for one of the test cases, but I don't have time to dig into it right now. If you have access to a windows machine and can run the test cases locally, that might help.

Tests are failing only, if I manually chose tar to be the compression engine. Tar on windows behaves like a regular unix-y tar and lists its contents with regular slashes, whereas 7z uses backslashes.
We could change that behavior but I can't judge, whether this would do something unexpected at a different place. In my symlinks workaround I need / for listing in excludelist but I use \ for the copying ...
Anyhow, I will now push the final version of this PR.

@staticfloat staticfloat merged commit 6b34650 into JuliaPackaging:master Jun 12, 2019
@staticfloat
Copy link
Member

Thanks @hhaensel! I will correct the / vs. \ problems in another PR. This has been a long time coming, but I'm glad you stuck it through!

@hhaensel
Copy link
Contributor Author

I am really happy to see this merged - and I am a bit proud, as this is my first PR ever :-)
@staticfloat Thanks for your support! I learned a lot from the discussion we had.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants