<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/fs/afs/callback.c, branch linux-5.1.y</title>
<subtitle>Hosts the 0x221E Linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.1.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.1.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2019-07-21T07:01:56Z</updated>
<entry>
<title>afs: Fix uninitialised spinlock afs_volume::cb_break_lock</title>
<updated>2019-07-21T07:01:56Z</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2019-06-20T15:49:35Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=7b0ef1ba26048ac3ca00408c6d689a717e65cf58'/>
<id>urn:sha1:7b0ef1ba26048ac3ca00408c6d689a717e65cf58</id>
<content type='text'>
[ Upstream commit 90fa9b64523a645a97edc0bdcf2d74759957eeee ]

Fix the cb_break_lock spinlock in the afs_volume struct by initialising it
when the volume record is allocated.

Also rename the lock to cb_v_break_lock to distinguish it from the lock of
the same name in the afs_server struct.
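
A minimal sketch of the shape of the fix, assuming the volume is set up in
an afs_alloc_volume()-style helper and that the lock is the rwlock implied
by the _raw_write_lock frame in the trace below:

        volume = kzalloc(sizeof(struct afs_volume), GFP_KERNEL);
        if (!volume)
                return ERR_PTR(-ENOMEM);

        /* Give lockdep a registered key before the lock is ever taken: */
        rwlock_init(&amp;volume-&gt;cb_v_break_lock);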

Without this, the following trace may be observed when a volume-break
callback is received:

  INFO: trying to register non-static key.
  the code is fine but needs lockdep annotation.
  turning off the locking correctness validator.
  CPU: 2 PID: 50 Comm: kworker/2:1 Not tainted 5.2.0-rc1-fscache+ #3045
  Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
  Workqueue: afs SRXAFSCB_CallBack
  Call Trace:
   dump_stack+0x67/0x8e
   register_lock_class+0x23b/0x421
   ? check_usage_forwards+0x13c/0x13c
   __lock_acquire+0x89/0xf73
   lock_acquire+0x13b/0x166
   ? afs_break_callbacks+0x1b2/0x3dd
   _raw_write_lock+0x2c/0x36
   ? afs_break_callbacks+0x1b2/0x3dd
   afs_break_callbacks+0x1b2/0x3dd
   ? trace_event_raw_event_afs_server+0x61/0xac
   SRXAFSCB_CallBack+0x11f/0x16c
   process_one_work+0x2c5/0x4ee
   ? worker_thread+0x234/0x2ac
   worker_thread+0x1d8/0x2ac
   ? cancel_delayed_work_sync+0xf/0xf
   kthread+0x11f/0x127
   ? kthread_park+0x76/0x76
   ret_from_fork+0x24/0x30

Fixes: 68251f0a6818 ("afs: Fix whole-volume callback handling")
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>afs: Fix in-progress ops to ignore server-level callback invalidation</title>
<updated>2019-04-13T07:37:37Z</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2019-04-13T07:37:37Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=eeba1e9cf31d064284dd1fa7bd6cfe01395bd03d'/>
<id>urn:sha1:eeba1e9cf31d064284dd1fa7bd6cfe01395bd03d</id>
<content type='text'>
The in-kernel afs filesystem client counts the number of server-level
callback invalidation events (CB.InitCallBackState* RPC operations) that it
receives from the server.  This is stored in cb_s_break in various
structures, including afs_server and afs_vnode.

If an inode is examined by afs_validate(), say, the afs_server copy is
compared, along with other break counters, to those in afs_vnode, and if
one or more of the counters do not match, it is considered that the
server's callback promise is broken.  At points where this happens,
AFS_VNODE_CB_PROMISED is cleared to indicate that the status must be
refetched from the server.

afs_validate() issues an FS.FetchStatus operation to get updated metadata -
and based on the updated data_version may invalidate the pagecache too.

However, the break counters are also used to determine whether to note a
new callback in the vnode (which would set the AFS_VNODE_CB_PROMISED flag)
and whether to cache the permit data included in the YFSFetchStatus record
by the server.
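
Conceptually, the test boils down to comparing a sum of break counters
noted when an operation was set up against the current sum; a minimal
sketch of the pre-fix idea, with the field names taken from the
description above and the exact arithmetic an assumption:

        /* Noted when the operation is set up (see step (2) below): */
        cb_break = vnode-&gt;cb_break + vnode-&gt;volume-&gt;cb_v_break +
                   server-&gt;cb_s_break;

        /* Later, the callback promise only counts as intact if no counter
         * was bumped in the meantime: */
        if (cb_break != vnode-&gt;cb_break + vnode-&gt;volume-&gt;cb_v_break +
                        server-&gt;cb_s_break)
                clear_bit(AFS_VNODE_CB_PROMISED, &amp;vnode-&gt;flags);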


The problem comes when the server sends us a CB.InitCallBackState op.  The
first such instance doesn't cause cb_s_break to be incremented, but rather
causes AFS_SERVER_FL_NEW to be cleared - but thereafter (say some hours
after last use, when all the volumes have been automatically unmounted and
the server has forgotten about the client[*]) this *will* likely cause an
increment.

 [*] There are other circumstances too, such as the server restarting or
     needing to make space in its callback table.

Note that the server won't send us a CB.InitCallBackState op until we talk
to it again.

So what happens is:

 (1) A mount for a new volume is attempted, an inode is created for the root
     vnode and vnode-&gt;cb_s_break and AFS_VNODE_CB_PROMISED aren't set
     immediately, as we don't have a nominated server to talk to yet - and
     we may iterate through a few to find one.

 (2) Before the operation happens, afs_fetch_status(), say, notes in the
     cursor (fc.cb_break) the break counter sum from the vnode, volume and
     server counters, but the server-&gt;cb_s_break is currently 0.

 (3) We send FS.FetchStatus to the server.  The server sends us back
     CB.InitCallBackState.  We increment server-&gt;cb_s_break.

 (4) Our FS.FetchStatus completes.  The reply includes a callback record.

 (5) xdr_decode_AFSCallBack()/xdr_decode_YFSCallBack() check to see whether
     the callback promise was broken by checking the break counter sum from
     step (2) against the current sum.

     This fails because of step (3), so we don't set the callback record
     and, importantly, don't set AFS_VNODE_CB_PROMISED on the vnode.

This does not preclude the syscall from progressing, and we don't loop here
rechecking the status, but rather assume it's good enough for one round
only and will need to be rechecked next time.

 (6) afs_validate() is triggered on the vnode, probably called from
     d_revalidate() checking the parent directory.

 (7) afs_validate() notes that AFS_VNODE_CB_PROMISED isn't set, so doesn't
     update vnode-&gt;cb_s_break and assumes the vnode to be invalid.

 (8) afs_validate() needs to call afs_fetch_status().  Go back to step (2)
     and repeat, every time the vnode is validated.

This primarily affects volume root dir vnodes.  Everything subsequent to
those inherits an already incremented cb_s_break upon mounting.


The issue is that we assume that the callback record and the cached permit
information in a reply from the server can't be trusted after getting a
server break - but this is wrong since the server makes sure things are
done in the right order, holding up our ops if necessary[*].

 [*] There is an extremely unlikely scenario where a reply from before the
     CB.InitCallBackState could get its delivery deferred till after - at
     which point we think we have a promise when we don't.  This, however,
     requires unlucky mass packet loss to one call.

AFS_SERVER_FL_NEW tries to paper over the cracks for the initial mount from
a server we've never contacted before, but this should be unnecessary.
The initial mount is also further insulated from the problem by querying
the server first with FS.GetCapabilities, which triggers the
CB.InitCallBackState.


Fix this by

 (1) Remove AFS_SERVER_FL_NEW.

 (2) In afs_calc_vnode_cb_break(), don't include cb_s_break in the
     calculation.

 (3) In afs_cb_is_broken(), don't include cb_s_break in the check (the
     resulting helpers are sketched below).
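
A minimal sketch of how the two helpers might end up looking; the exact
signatures are an assumption:

        static inline unsigned int afs_calc_vnode_cb_break(struct afs_vnode *vnode)
        {
                /* cb_s_break no longer contributes to the sum. */
                return vnode-&gt;cb_break + vnode-&gt;volume-&gt;cb_v_break;
        }

        static inline bool afs_cb_is_broken(unsigned int cb_break,
                                            const struct afs_vnode *vnode)
        {
                /* Only vnode- and volume-level breaks matter now. */
                return cb_break != afs_calc_vnode_cb_break(vnode);
        }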


Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
</entry>
<entry>
<title>afs: Implement YFS support in the fs client</title>
<updated>2018-10-23T23:41:08Z</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2018-10-19T23:57:58Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=30062bd13e3659a309d249a06d5f4ebb4a5c5251'/>
<id>urn:sha1:30062bd13e3659a309d249a06d5f4ebb4a5c5251</id>
<content type='text'>
Implement support for talking to YFS-variant fileservers in the cache
manager and the filesystem client.  These implement upgraded services on
the same port as their AFS services.

YFS fileservers provide expanded capabilities over AFS.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
</entry>
<entry>
<title>afs: Remove callback details from afs_callback_break struct</title>
<updated>2018-10-23T23:41:08Z</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2018-10-19T23:57:58Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=06aeb2971457b33c1123af9f307a55f3dc4052c9'/>
<id>urn:sha1:06aeb2971457b33c1123af9f307a55f3dc4052c9</id>
<content type='text'>
Remove unnecessary details of a broken callback, such as version, expiry
and type, from the afs_callback_break struct as they're not actually used
and make the list take more memory.
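
The record is thereby reduced to little more than the FID; a sketch of the
intent (the exact remaining fields are an assumption):

        struct afs_callback_break {
                struct afs_fid  fid;    /* File affected; version, expiry
                                         * and type are no longer kept */
        };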

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
</entry>
<entry>
<title>afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS</title>
<updated>2018-10-23T23:41:08Z</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2018-10-19T23:57:57Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=3b6492df4153b8550d347dfc581856138678a231'/>
<id>urn:sha1:3b6492df4153b8550d347dfc581856138678a231</id>
<content type='text'>
Increase the sizes of the volume ID to 64 bits and the vnode ID (inode
number equivalent) to 96 bits to allow the support of YFS.

This requires the iget comparator to check the vnode-&gt;fid rather than i_ino
and i_generation as i_ino is not sufficiently capacious.  It also requires
this data to be placed into the vnode cache key for fscache.

For the moment, just discard the top 32 bits of the vnode ID when returning
it through stat.
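
A rough sketch of what the iget5 test callback and the stat truncation
could look like under these constraints; helper and field names are
assumptions:

        static int afs_iget5_test(struct inode *inode, void *opaque)
        {
                struct afs_vnode *vnode = AFS_FS_I(inode);
                struct afs_fid *fid = opaque;

                /* Compare the full 64-bit vid and 96-bit vnode ID rather
                 * than the narrower i_ino and i_generation: */
                return vnode-&gt;fid.vid == fid-&gt;vid &amp;&amp;
                       vnode-&gt;fid.vnode == fid-&gt;vnode &amp;&amp;
                       vnode-&gt;fid.vnode_hi == fid-&gt;vnode_hi &amp;&amp;
                       vnode-&gt;fid.unique == fid-&gt;unique;
        }

        /* In -&gt;getattr(): only the bottom 32 bits of the vnode ID fit
         * into stat-&gt;ino for now, so the top bits are discarded. */
        stat-&gt;ino = (u32)vnode-&gt;fid.vnode;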

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
</entry>
<entry>
<title>afs: Optimise callback breaking by not repeating volume lookup</title>
<updated>2018-06-15T14:27:09Z</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2018-06-15T14:24:50Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=47ea0f2ebffd400d36ab5946ec8d6d6e08a67d53'/>
<id>urn:sha1:47ea0f2ebffd400d36ab5946ec8d6d6e08a67d53</id>
<content type='text'>
At the moment, afs_break_callbacks calls afs_break_one_callback() for each
separate FID it was given, and the latter looks up the volume individually
for each one.

However, this is inefficient if two or more FIDs have the same vid as we
could reuse the volume.  This is complicated by cell aliasing whereby we
may have multiple cells sharing a volume and can therefore have multiple
callback interests for any particular volume ID.

At the moment afs_break_one_callback() scans the entire list of volumes
we're getting from a server and breaks the appropriate callback in every
matching volume, regardless of cell.  This scan is done for every FID.

Optimise callback breaking by the following means:

 (1) Sort the FID list by vid so that all FIDs belonging to the same volume
     are clumped together.

     This is done through an indirection table, since we cannot
     insertion-sort the afs_callback_break array itself: FIDs are decoded
     straight into it, and the callback info decoded later corresponds to
     them by array index only.

     We also don't really want to bubblesort afterwards if we can avoid it
     (an index-table sort of this kind is sketched after this list).

 (2) Sort the server-&gt;cb_interests array by vid so that all the matching
     volumes are grouped together.  This permits the scan to stop after
     finding a record that has a higher vid.

 (3) When breaking FIDs, we try to hold server-&gt;cb_break_lock for as long
     as possible, caching the start point in the array for that volume
     group.

     It might make sense to add another layer in that list and have a
     refcounted volume ID anchor that has the matching interests attached
     to it rather than being in the list.  This would allow the lock to be
     dropped without losing the cursor.
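
A minimal sketch of the index-table sort from (1), with cb[] standing for
the afs_callback_break array in wire order, indices[] for the indirection
table and count for the number of FIDs decoded (all names illustrative):

        unsigned int i, j;

        for (i = 0; i &lt; count; i++) {
                /* Find where FID i belongs among the already-sorted
                 * indices, ordered by volume ID: */
                j = i;
                while (j &gt; 0 &amp;&amp;
                       cb[indices[j - 1]].fid.vid &gt; cb[i].fid.vid)
                        j--;

                /* Shift the tail of the index table up and slot i in;
                 * cb[] itself stays in wire order so the callback info
                 * decoded later still lines up by array index: */
                memmove(&amp;indices[j + 1], &amp;indices[j],
                        (i - j) * sizeof(indices[0]));
                indices[j] = i;
        }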

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
</entry>
<entry>
<title>afs: Fix whole-volume callback handling</title>
<updated>2018-05-14T14:15:18Z</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2018-05-12T21:31:33Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=68251f0a6818f3be19b1471f36c956ca97c1427d'/>
<id>urn:sha1:68251f0a6818f3be19b1471f36c956ca97c1427d</id>
<content type='text'>
It's possible for an AFS file server to issue a whole-volume notification
that callbacks on all the vnodes in the volume have been broken.  This is
done for R/O and backup volumes (which don't have per-file callbacks) and
for things like a volume being taken offline.

Fix callback handling to detect whole-volume notifications, to track them
across operations and to check for them during inode validation.
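
One way to picture the tracking: a per-volume break counter that a
whole-volume notification bumps and that inode validation then compares
against the vnode's last-seen value (field and lock names are
assumptions):

        /* On receipt of a whole-volume callback break from the server: */
        write_lock(&amp;volume-&gt;cb_break_lock);
        volume-&gt;cb_v_break++;
        write_unlock(&amp;volume-&gt;cb_break_lock);

        /* During inode validation, a volume-level break seen since this
         * vnode last noted a callback invalidates its promise: */
        if (vnode-&gt;cb_v_break != vnode-&gt;volume-&gt;cb_v_break)
                clear_bit(AFS_VNODE_CB_PROMISED, &amp;vnode-&gt;flags);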

Fixes: c435ee34551e ("afs: Overhaul the callback handling")
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
</entry>
<entry>
<title>afs: Fix refcounting in callback registration</title>
<updated>2018-05-14T12:17:35Z</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2018-05-10T07:43:04Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=d4a96bec7a7362834ef5c31d7b2cc9bf36eb0570'/>
<id>urn:sha1:d4a96bec7a7362834ef5c31d7b2cc9bf36eb0570</id>
<content type='text'>
The refcounting on afs_cb_interest struct objects in
afs_register_server_cb_interest() is wrong as it uses the server list
entry's callback interest pointer without regard for the fact that it
might be replaced at any time and the object thrown away.

Fix this by:

 (1) Put a lock on the afs_server_list struct that can be used to
     mediate access to the callback interest pointers in the servers array.

 (2) Keep a ref on the callback interest that we get from the entry (see
     the sketch after this list).

 (3) Drop the old reference held by vnode-&gt;cb_interest if we replace the
     pointer.
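
A minimal sketch of the take-a-ref-under-the-lock step from (2), assuming
a refcount inside afs_cb_interest and the server-list lock from (1) (names
are illustrative):

        /* The server-list lock guarantees the pointer can't be swapped
         * out and freed while we take our own reference on it: */
        read_lock(&amp;slist-&gt;lock);
        cbi = slist-&gt;servers[index].cb_interest;
        if (cbi)
                refcount_inc(&amp;cbi-&gt;usage);
        read_unlock(&amp;slist-&gt;lock);

        /* When installing it, drop whatever vnode-&gt;cb_interest held: */
        old = vnode-&gt;cb_interest;
        vnode-&gt;cb_interest = cbi;
        if (old)
                afs_put_cb_interest(net, old); /* assumed put helper */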

Fixes: c435ee34551e ("afs: Overhaul the callback handling")
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
</entry>
<entry>
<title>afs: Do better accretion of small writes on newly created content</title>
<updated>2018-04-09T20:54:48Z</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2018-04-06T13:17:26Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=5a8132761609bd7e42db642d6f157140d5bf2ae8'/>
<id>urn:sha1:5a8132761609bd7e42db642d6f157140d5bf2ae8</id>
<content type='text'>
Processes like ld that do lots of small writes that aren't necessarily
contiguous result in a lot of small StoreData operations to the server, the
idea being that if someone else changes the data on the server, we only
write our changes over that and not the space between.  Further, we don't
want to write back empty space if we can avoid it to make it easier for the
server to do sparse files.

However, making lots of tiny RPC ops is a lot less efficient for the server
than one big one because each op requires allocation of resources and the
taking of locks, so we want to compromise a bit.

Reduce the load by the following:

 (1) If a file is just created locally or has just been truncated with
     O_TRUNC locally, allow subsequent writes to the file to be merged with
     intervening space if that space doesn't cross an entire intervening
     page (see the sketch after this list).

 (2) Don't flush the file on -&gt;flush() but rather on -&gt;release() if the
     file was open for writing.
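
A minimal sketch of the merge test from (1), treating the already-dirty
extent within a page as [dirty_from, dirty_to) and the new write as
[from, to); the flag, variable and helper names are assumptions:

        if (test_bit(AFS_VNODE_NEW_CONTENT, &amp;vnode-&gt;flags)) {
                /* Locally created or truncated: nobody else can have data
                 * in the gap, so merge across it within this page. */
                dirty_from = min(dirty_from, from);
                dirty_to = max(dirty_to, to);
        } else if (to &lt; dirty_from || from &gt; dirty_to) {
                /* Not contiguous with the existing dirty data: write the
                 * old region back first rather than merging. */
                flush_existing_dirty_region(); /* assumed helper */
        }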

Just linking vmlinux.o, without this patch, looking in /proc/fs/afs/stats:

	file-wr : n=441 nb=513581204

and after the patch:

	file-wr : n=62 nb=513668555

there were 379 fewer StoreData RPC operations at the expense of an extra
87K being written.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
</entry>
<entry>
<title>afs: Prospectively look up extra files when doing a single lookup</title>
<updated>2018-04-09T20:12:31Z</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2018-04-09T20:12:31Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=5cf9dd55a0ec26428f2824aadd16bfa305a5b603'/>
<id>urn:sha1:5cf9dd55a0ec26428f2824aadd16bfa305a5b603</id>
<content type='text'>
When afs_lookup() is called, prospectively look up the next 50 uncached
fids also from that same directory and cache the results, rather than just
looking up the one file requested.

This allows us to use the FS.InlineBulkStatus RPC op to increase efficiency
by fetching up to 50 file statuses at a time.
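
Conceptually, the lookup gathers the requested FID plus up to 49 further
uncached FIDs from the same directory and hands them to one bulk status
call; a rough sketch, with the helpers and batch handling as assumptions:

        struct afs_fid fids[50];        /* FS.InlineBulkStatus batch limit */
        unsigned int nr = 0;

        fids[nr++] = fid_being_looked_up;

        /* Scan the directory for entries whose inodes aren't cached yet,
         * stopping once the batch is full: */
        while (nr &lt; 50 &amp;&amp; next_uncached_fid(dir, &amp;fids[nr]))
                nr++;

        /* One RPC returns up to 50 statuses, used to prime the cache. */
        do_inline_bulk_status(dir, fids, nr);   /* assumed helpers */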

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
</entry>
</feed>
