-
Notifications
You must be signed in to change notification settings - Fork 150
tree-in-dcache stuff #10249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tree-in-dcache stuff #10249
Conversation
|
Upstream branch: 026bcf9 |
AI reviewed your patch. Please fix the bug or email reply why it's not a bug. In-Reply-To-Subject: |
AI reviewed your patch. Please fix the bug or email reply why it's not a bug. In-Reply-To-Subject: |
|
Forwarding comment 3515386521 via email |
|
Forwarding comment 3515389253 via email |
fuse_ctl_remove_conn() used to decrement the link count of root manually; that got subsumed by simple_recursive_removal(), but in case when subdirectory creation has failed the latter won't get called. Just move the modification of parent's link count into fuse_ctl_add_dentry() to keep the things simple. Allows to get rid of the nlink argument as well... Fixes: fcaac5b "fuse_ctl: use simple_recursive_removal()" Signed-off-by: Al Viro <[email protected]> Acked-by: Miklos Szeredi <[email protected]>
If we have LOCKDOWN_TRACEFS, the function bails out - *after* having locked the parent directory and without bothering to undo that. Just check it before tracefs_start_creating()... Fixes: e247094 "tracefs/eventfs: Add missing lockdown checks" Acked-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Al Viro <[email protected]>
simple_recursive_removal(), but instead of victim dentry it takes parent + name. Used to be open-coded in fs/fuse/control.c, but there's no need to expose the guts of that thing there and there are other potential users, so let's lift it into libfs... Signed-off-by: Al Viro <[email protected]> Acked-by: Miklos Szeredi <[email protected]>
should be paired with simple_start_creating() - unlocks parent and drops dentry reference. Signed-off-by: Al Viro <[email protected]>
Some filesystems use a kinda-sorta controlled dentry refcount leak to pin dentries of created objects in dcache (and undo it when removing those). Reference is grabbed and not released, but it's not actually _stored_ anywhere. That works, but it's hard to follow and verify; among other things, we have no way to tell _which_ of the increments is intended to be an unpaired one. Worse, on removal we need to decide whether the reference had already been dropped, which can be non-trivial if that removal is on umount and we need to figure out if this dentry is pinned due to e.g. unlink() not done. Usually that is handled by using kill_litter_super() as ->kill_sb(), but there are open-coded special cases of the same (consider e.g. /proc/self). Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT) marking those "leaked" dentries. Having it set claims responsibility for +1 in refcount. The end result this series is aiming for: * get these unbalanced dget() and dput() replaced with new primitives that would, in addition to adjusting refcount, set and clear persistency flag. * instead of having kill_litter_super() mess with removing the remaining "leaked" references (e.g. for all tmpfs files that hadn't been removed prior to umount), have the regular shrink_dcache_for_umount() strip DCACHE_PERSISTENT of all dentries, dropping the corresponding reference if it had been set. After that kill_litter_super() becomes an equivalent of kill_anon_super(). Doing that in a single step is not feasible - it would affect too many places in too many filesystems. It has to be split into a series. Here we * introduce the new flag * teach shrink_dcache_for_umount() to handle it (i.e. remove and drop refcount on anything that survives to umount with that flag still set) * teach kill_litter_super() that anything with that flag does *not* need to be unpinned. Next commits will add primitives for maintaing that flag and convert the common helpers to those. After that - a long series of per-filesystem patches converting to those primitives. Signed-off-by: Al Viro <[email protected]>
* d_make_persistent(dentry, inode) - bump refcount, mark persistent and make hashed positive. Return value is a borrowed reference to dentry; it can be used until something removes persistency (at the very least, until the parent gets unlocked, but some filesystems may have stronger exclusion). * d_make_discardable() - remove persistency mark and drop reference. d_make_persistent() is similar to combination of d_instantiate(), dget() and setting flag. The only difference is that unlike d_instantiate() it accepts hashed and unhashed negatives alike. It is always called in strong locking environment (parent held exclusive, or, in some cases, dentry coming from d_alloc_name()); if we ever start using it with parent held only shared and dentry coming from d_alloc_parallel(), we'll need to copy the in-lookup logics from __d_add(). d_make_discardable() is eqiuvalent to combination of removing flag and dput(); since flag removal requires ->d_lock, there's no point trying to avoid taking that for refcount decrement as fast_dput() does. The slow path of dput() has been taken into a helper and reused in d_make_discardable() instead. Signed-off-by: Al Viro <[email protected]>
Note that simple_unlink() et.al. are used by many filesystems; for now they can not assume that persistency mark will have been set back when the object got created. Once all conversions are done we'll have them complain if called for something that had not been marked persistent. Signed-off-by: Al Viro <[email protected]>
Quite a bit is already done by infrastructure changes (simple_link(),
simple_unlink()) - all that is left is replacing d_instantiate() +
pinning dget() (in ->symlink() and ->mknod()) with d_make_persistent(),
and, in case of shmem, using simple_unlink() and simple_link() in
->unlink() and ->link() resp., instead of open-coding those there.
Since d_make_persistent() accepts (and hashes) unhashed ones, shmem
situation gets simpler - we no longer care whether ->lookup() has hashed
the sucker.
With that done, we don't need kill_litter_super() for these filesystems
anymore - by the umount time all remaining dentries will be marked
persistent and kill_litter_super() will boil down to call of
kill_anon_super().
The same goes for devtmpfs and rootfs - they are handled by
ramfs or by shmem, depending upon config.
NB: strictly speaking, both devtmpfs and rootfs ought to use
ramfs_kill_sb() if they end up using ramfs; that's a separate
story and the only impact of "just use kill_{litter,anon}_super()"
is that we fail to free their sb->s_fs_info... on reboot.
That's orthogonal to the changes in this series - kill_litter_super()
is identical to kill_anon_super() for those at this point.
Signed-off-by: Al Viro <[email protected]>
... and there's no need to remember those pointers anywhere - ->kill_sb() no longer needs to bother since kill_anon_super() will take care of them anyway and proc_pid_readdir() only wants the inumbers, which we had in a couple of static variables all along. Signed-off-by: Al Viro <[email protected]>
These are guaranteed to be empty by the time they are shut down; both are single-instance and there is an internal mount maintained for as long as there is any contents. Both have that internal mount pinned by every object in root. In other words, kill_litter_super() boils down to kill_anon_super() for those. Reviewed-by: Joel Becker <[email protected]> Acked-by: Paul Moore <paul@paul-moore> (LSM) Acked-by: Andreas Hindborg <[email protected]> (configfs) Signed-off-by: Al Viro <[email protected]>
entirely static tree, populated by simple_fill_super(). Can switch to kill_anon_super() without any other changes. Signed-off-by: Al Viro <[email protected]>
Entirely static tree populated by simple_fill_super(). Can use kill_anon_super() as-is. Acked-by: Casey Schaufler <[email protected]> Signed-off-by: Al Viro <[email protected]>
Very much ramfs-like; dget()+d_instantiate() -> d_make_persistent() (in two places) is all it takes. NB: might make sense to turn its ->put_super() into ->kill_sb(). Signed-off-by: Al Viro <[email protected]>
All modifications via normal VFS codepaths; just take care of making persistent in in mqueue_create_attr() and discardable in mqueue_unlink() and it doesn't need kill_litter_super() at all. mqueue_unlink() side is best handled by having it call simple_unlink() rather than duplicating its guts... Signed-off-by: Al Viro <[email protected]>
object creation goes through the normal VFS paths or approximation
thereof (user_path_create()/done_path_create() in case of bpf_obj_do_pin(),
open-coded simple_{start,done}_creating() in bpf_iter_link_pin_kernel()
at mount time), removals go entirely through the normal VFS paths (and
->unlink() is simple_unlink() there).
Enough to have bpf_dentry_finalize() use d_make_persistent() instead
of dget() and we are done.
Convert bpf_iter_link_pin_kernel() to simple_{start,done}_creating(),
while we are at it.
Signed-off-by: Al Viro <[email protected]>
All modifications via normal VFS codepaths; just take care of making persistent in ->create() and ->mkdir() and that's it (removal side doesn't need any changes, since it uses simple_rmdir() for ->rmdir() and calls simple_unlink() from ->unlink()). Signed-off-by: Al Viro <[email protected]>
objects are created in fuse_ctl_add_dentry() by d_alloc_name()+d_add(), removed by simple_remove_by_name(). What we return is a borrowed reference - it is valid until the call of fuse_ctl_remove_conn() and we depend upon the exclusion (on fuse_mutex) for safety. Return value is used only within the caller (fuse_ctl_add_conn()). Replace d_add() with d_make_persistent() + dput(). dput() is paired with d_alloc_name() and return value is the result of d_make_persistent(). Signed-off-by: Al Viro <[email protected]> Acked-by: Miklos Szeredi <[email protected]>
object creation by d_alloc_name()+d_add() in pstore_mkfile(), removal - via normal VFS codepaths (with ->unlink() using simple_unlink()) or in pstore_put_backend_records() via locked_recursive_removal() Replace d_add() with d_make_persistent()+dput() - that's what really happens there. The reference that goes into record->dentry is valid only until the unlink (and explicitly cleared by pstore_unlink()). Reviewed-by: Kees Cook <[email protected]> Signed-off-by: Al Viro <[email protected]>
A mix of persistent and non-persistent dentries in there. Strictly speaking, no need for kill_litter_super() anyway - it pins an internal mount whenever a persistent dentry is created, so at fs shutdown time there won't be any to deal with. However, let's make it explicit - replace d_instantiate() with d_make_persistent() + dput() (the latter in tracefs_end_creating(), where it folds with inode_unlock() into simple_done_creating()) for dentries we want persistent and have d_make_discardable() done either by simple_recursive_removal() (used by tracefs_remove()) or explicitly in eventfs_remove_events_dir(). Acked-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Al Viro <[email protected]>
similar to tracefs - simulation of normal codepath for creation, simple_recursive_removal() for removal. Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Al Viro <[email protected]>
It's called once, during binderfs mount, right after allocating root dentry. Checking that it hadn't been already called is only obfuscating things. Looks like that bogosity had been copied from devpts... Signed-off-by: Al Viro <[email protected]>
Objects are created either by d_alloc_name()+d_add() (in binderfs_ctl_create()) or by simple_start_creating()+d_instantiate(). Removals are by simple_recurisive_removal(). Switch d_add()/d_instantiate() to d_make_persistent() + dput(). Voila - kill_litter_super() is not needed anymore. Fold dput()+unlocking the parent into simple_done_creating(), while we are at it. NOTE: return value of binderfs_create_file() is borrowed; it may get stored in proc->binderfs_entry. See binder_release()... Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Al Viro <[email protected]>
creation/removal is via normal VFS paths; make ->mkdir() and ->symlink() use d_make_persistent(); ->rmdir() and ->unlink() - d_make_discardable() instead of dput() and that's it. d_make_persistent() works for unhashed just fine... Note that only persistent dentries are ever hashed there; unusual absense of ->d_delete() in dentry_operations is due to that - anything that has refcount reach 0 will be unhashed there, so it won't get to checking ->d_delete anyway. Signed-off-by: Al Viro <[email protected]>
removals are done with locked_recursive_removal(); switch creations to simple_start_creating()/d_make_persistent()/simple_done_creating() and take them to a helper (add_entry()), while we are at it - simpler control flow that way. Signed-off-by: Al Viro <[email protected]>
Don't bother to store the dentry of /policy_capabilities - it belongs to invariant part of tree and we only use it to populate that directory, so there's no reason to keep it around afterwards. Same situation as with /avc, /ss, etc. There are two directories that get replaced on policy load - /class and /booleans. These we need to stash (and update the pointers on policy reload); /policy_capabilities is not in the same boat. Acked-by: Paul Moore <[email protected]> Reviewed-by: Stephen Smalley <[email protected]> Tested-by: Stephen Smalley <[email protected]> Signed-off-by: Al Viro <[email protected]>
allocating dentry after the inode has been set up reduces the amount of boilerplate - "attach this inode under that name and this parent or drop inode in case of failure" simplifies quite a few places. Acked-by: Paul Moore <[email protected]> Reviewed-by: Stephen Smalley <[email protected]> Tested-by: Stephen Smalley <[email protected]> Signed-off-by: Al Viro <[email protected]>
Tree has invariant part + two subtrees that get replaced upon each policy load. Invariant parts stay for the lifetime of filesystem, these two subdirs - from policy load to policy load (serialized on lock_rename(root, ...)). All object creations are via d_alloc_name()+d_add() inside selinuxfs, all removals are via simple_recursive_removal(). Turn those d_add() into d_make_persistent()+dput() and that's mostly it. Acked-by: Paul Moore <[email protected]> Reviewed-by: Stephen Smalley <[email protected]> Tested-by: Stephen Smalley <[email protected]> Signed-off-by: Al Viro <[email protected]>
No need to return dentry from ffs_sb_create_file() or keep it around afterwards. Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Al Viro <[email protected]>
All files are regular; ep0 is there all along, other ep* may appear and go away during the filesystem lifetime; all of those are guaranteed to be gone by the time we umount it. Object creation is in ffs_sb_create_file(), removals - at ->kill_sb() time (for ep0) or by simple_remove_by_name() from ffs_epfiles_destroy() (for the rest of them). Switch ffs_sb_create_file() to simple_start_creating()/d_make_persistent()/ simple_done_creating() and that's it. Signed-off-by: Al Viro <[email protected]>
No need to return dentry from gadgetfs_create_file() or keep it around afterwards. Signed-off-by: Al Viro <[email protected]>
same as functionfs Signed-off-by: Al Viro <[email protected]>
hypfs dentries end up with refcount 2 when they are not busy. Refcount 1 is enough to keep them pinned, and going that way allows to simplify things nicely: * don't need to drop an extra reference before the call of kill_litter_super() in ->kill_sb(); all we need there is to reset the cleanup list - everything on it will be taken out automatically. * we can make use of simple_recursive_removal() on tree rebuilds; just make sure that only children of root end up in the cleanup list and hypfs_delete_tree() becomes much simpler Signed-off-by: Al Viro <[email protected]>
Every single caller only cares about PTR_ERR_OR_ZERO() of return value... Signed-off-by: Al Viro <[email protected]>
same story as for hypfs_create_str() Signed-off-by: Al Viro <[email protected]>
just have hypfs_create_file() do the usual simple_start_creating()/ d_make_persistent()/simple_done_creating() and that's it Signed-off-by: Al Viro <[email protected]>
Just use d_make_persistent() + dput() (and fold the latter into simple_finish_creating()) and that's it... NOTE: pipe->dentry is a borrowed reference - it does not contribute to dentry refcount. Signed-off-by: Al Viro <[email protected]>
One instance per net-ns. There's a fixed subset (several files in root, an optional symlink in root + initially empty /clients/) + per-client subdirectory in /clients/. Clients can appear only after the filesystem is there and they are all gone before it gets through ->kill_sb(). Fixed subset created in fill_super(), regular files by simple_fill_super(), then a subdirectory and a symlink - manually. It is removed by kill_litter_super(). Per-client subdirectories are created by nfsd_client_mkdir() (populated with client-supplied list of files in them). Removed by nfsd_client_rmdir(), which is simple_recursive_removal(). All dentries except for the ones from simple_fill_super() come from * nfsd_mkdir() (subdirectory, dentry from simple_start_creating()). Called from fill_super() (creates initially empty /clients) and from nfsd_client_mkdir (creates a per-client subdirectory in /clients). * _nfsd_symlink() (symlink, dentry from simple_start_creating()), called from fill_super(). * nfsdfs_create_files() (regulars, dentry from simple_start_creating()), called only from nfsd_client_mkdir(). Turn d_instatiate() + inode_unlock() into d_make_persistent() + simple_done_creating() in nfsd_mkdir(), _nfsd_symlink() and nfsdfs_create_files() and we are done. Signed-off-by: Al Viro <[email protected]>
Parallel to binderfs stuff: * use simple_start_creating()/simple_done_creating()/d_make_persistent() instead of manual inode_lock()/lookup_noperm()/d_instanitate()/inode_unlock(). * allocate inode first - simpler cleanup that way. * use simple_recursive_removal() instead of open-coding it. * switch to kill_anon_super() Signed-off-by: Al Viro <[email protected]>
|
Upstream branch: 026bcf9 |
Not used anymore. Signed-off-by: Al Viro <[email protected]>
securityfs uses simple_recursive_removal(), but does not bother to mark
dentries persistent. This is the only place where it still happens; get
rid of that irregularity.
* use simple_{start,done}_creating() and d_make_persitent(); kill_litter_super()
use was already gone, since we empty the filesystem instance before it gets
shut down.
Acked-by: Paul Moore <[email protected]>
Signed-off-by: Al Viro <[email protected]>
it's an unused alias for securityfs_remove() Acked-by: Paul Moore <[email protected]> Signed-off-by: Al Viro <[email protected]>
At this point there are very few call chains that might lead to d_make_discardable() on a dentry that hadn't been made persistent: calls of simple_unlink() and simple_rmdir() in configfs and apparmorfs. Both filesystems do pin (part of) their contents in dcache, but they are currently playing very unusual games with that. Converting them to more usual patterns might be possible, but it's definitely going to be a long series of changes in both cases. For now the easiest solution is to have both stop using simple_unlink() and simple_rmdir() - that allows to make d_make_discardable() warn when given a non-persistent dentry. Rather than giving them full-blown private copies (with calls of d_make_discardable() replaced with dput()), let's pull the parts of simple_unlink() and simple_rmdir() that deal with timestamps and link counts into separate helpers (__simple_unlink() and __simple_rmdir() resp.) and have those used by configfs and apparmorfs. Signed-off-by: Al Viro <[email protected]>
941587d to
4573275
Compare
|
At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=1021889 irrelevant now. Closing PR. |
Pull request for series with
subject: tree-in-dcache stuff
version: 3
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1021889