<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/include/linux/iomap.h, branch linux-5.17.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.17.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.17.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2022-01-26T17:19:20Z</updated>
<entry>
<title>xfs, iomap: limit individual ioend chain lengths in writeback</title>
<updated>2022-01-26T17:19:20Z</updated>
<author>
<name>Dave Chinner</name>
<email>dchinner@redhat.com</email>
</author>
<published>2022-01-26T17:19:20Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=ebb7fb1557b1d03b906b668aa2164b51e6b7d19a'/>
<id>urn:sha1:ebb7fb1557b1d03b906b668aa2164b51e6b7d19a</id>
<content type='text'>
Trond Myklebust reported soft lockups in XFS IO completion such as
this:

 watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [kworker/12:1:3106]
 CPU: 12 PID: 3106 Comm: kworker/12:1 Not tainted 4.18.0-305.10.2.el8_4.x86_64 #1
 Workqueue: xfs-conv/md127 xfs_end_io [xfs]
 RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x20
 Call Trace:
  wake_up_page_bit+0x8a/0x110
  iomap_finish_ioend+0xd7/0x1c0
  iomap_finish_ioends+0x7f/0xb0
  xfs_end_ioend+0x6b/0x100 [xfs]
  xfs_end_io+0xb9/0xe0 [xfs]
  process_one_work+0x1a7/0x360
  worker_thread+0x1fa/0x390
  kthread+0x116/0x130
  ret_from_fork+0x35/0x40

Ioends are processed as an atomic completion unit when all the
chained bios in the ioend have completed their IO. Logically
contiguous ioends can also be merged and completed as a single,
larger unit.  Both of these things can be problematic as both the
bio chains per ioend and the size of the merged ioends processed as
a single completion are both unbound.

If we have a large sequential dirty region in the page cache,
write_cache_pages() will keep feeding us sequential pages and we
will keep mapping them into ioends and bios until we get a dirty
page at a non-sequential file offset. These large sequential runs
can will result in bio and ioend chaining to optimise the io
patterns. The pages iunder writeback are pinned within these chains
until the submission chaining is broken, allowing the entire chain
to be completed. This can result in huge chains being processed
in IO completion context.

We get deep bio chaining if we have large contiguous physical
extents. We will keep adding pages to the current bio until it is
full, then we'll chain a new bio to keep adding pages for writeback.
Hence we can build bio chains that map millions of pages and tens of
gigabytes of RAM if the page cache contains big enough contiguous
dirty file regions. This long bio chain pins those pages until the
final bio in the chain completes and the ioend can iterate all the
chained bios and complete them.

OTOH, if we have a physically fragmented file, we end up submitting
one ioend per physical fragment that each have a small bio or bio
chain attached to them. We do not chain these at IO submission time,
but instead we chain them at completion time based on file
offset via iomap_ioend_try_merge(). Hence we can end up with unbound
ioend chains being built via completion merging.

XFS can then do COW remapping or unwritten extent conversion on that
merged chain, which involves walking an extent fragment at a time
and running a transaction to modify the physical extent information.
IOWs, we merge all the discontiguous ioends together into a
contiguous file range, only to then process them individually as
discontiguous extents.

This extent manipulation is computationally expensive and can run in
a tight loop, so merging logically contiguous but physically
discontigous ioends gains us nothing except for hiding the fact the
fact we broke the ioends up into individual physical extents at
submission and then need to loop over those individual physical
extents at completion.

Hence we need to have mechanisms to limit ioend sizes and
to break up completion processing of large merged ioend chains:

1. bio chains per ioend need to be bound in length. Pure overwrites
go straight to iomap_finish_ioend() in softirq context with the
exact bio chain attached to the ioend by submission. Hence the only
way to prevent long holdoffs here is to bound ioend submission
sizes because we can't reschedule in softirq context.

2. iomap_finish_ioends() has to handle unbound merged ioend chains
correctly. This relies on any one call to iomap_finish_ioend() being
bound in runtime so that cond_resched() can be issued regularly as
the long ioend chain is processed. i.e. this relies on mechanism #1
to limit individual ioend sizes to work correctly.

3. filesystems have to loop over the merged ioends to process
physical extent manipulations. This means they can loop internally,
and so we break merging at physical extent boundaries so the
filesystem can easily insert reschedule points between individual
extent manipulations.

Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reported-and-tested-by: Trond Myklebust &lt;trondmy@hammerspace.com&gt;
Reviewed-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Signed-off-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'libnvdimm-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm</title>
<updated>2022-01-12T23:46:11Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-01-12T23:46:11Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=3acbdbf42e943d85174401357a6b6243479d4c76'/>
<id>urn:sha1:3acbdbf42e943d85174401357a6b6243479d4c76</id>
<content type='text'>
Pull dax and libnvdimm updates from Dan Williams:
 "The bulk of this is a rework of the dax_operations API after
  discovering the obstacles it posed to the work-in-progress DAX+reflink
  support for XFS and other copy-on-write filesystem mechanics.

  Primarily the need to plumb a block_device through the API to handle
  partition offsets was a sticking point and Christoph untangled that
  dependency in addition to other cleanups to make landing the
  DAX+reflink support easier.

  The DAX_PMEM_COMPAT option has been around for 4 years and not only
  are distributions shipping userspace that understand the current
  configuration API, but some are not even bothering to turn this option
  on anymore, so it seems a good time to remove it per the deprecation
  schedule. Recall that this was added after the device-dax subsystem
  moved from /sys/class/dax to /sys/bus/dax for its sysfs organization.
  All recent functionality depends on /sys/bus/dax.

  Some other miscellaneous cleanups and reflink prep patches are
  included as well.

  Summary:

   - Simplify the dax_operations API:

      - Eliminate bdev_dax_pgoff() in favor of the filesystem
        maintaining and applying a partition offset to all its DAX iomap
        operations.

      - Remove wrappers and device-mapper stacked callbacks for
        -&gt;copy_from_iter() and -&gt;copy_to_iter() in favor of moving
        block_device relative offset responsibility to the
        dax_direct_access() caller.

      - Remove the need for an @bdev in filesystem-DAX infrastructure

      - Remove unused uio helpers copy_from_iter_flushcache() and
        copy_mc_to_iter() as only the non-check_copy_size() versions are
        used for DAX.

   - Prepare XFS for the pending (next merge window) DAX+reflink support

   - Remove deprecated DEV_DAX_PMEM_COMPAT support

   - Cleanup a straggling misuse of the GUID api"

* tag 'libnvdimm-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (38 commits)
  iomap: Fix error handling in iomap_zero_iter()
  ACPI: NFIT: Import GUID before use
  dax: remove the copy_from_iter and copy_to_iter methods
  dax: remove the DAXDEV_F_SYNC flag
  dax: simplify dax_synchronous and set_dax_synchronous
  uio: remove copy_from_iter_flushcache() and copy_mc_to_iter()
  iomap: turn the byte variable in iomap_zero_iter into a ssize_t
  memremap: remove support for external pgmap refcounts
  fsdax: don't require CONFIG_BLOCK
  iomap: build the block based code conditionally
  dax: fix up some of the block device related ifdefs
  fsdax: shift partition offset handling into the file systems
  dax: return the partition offset from fs_dax_get_by_bdev
  iomap: add a IOMAP_DAX flag
  xfs: pass the mapping flags to xfs_bmbt_to_iomap
  xfs: use xfs_direct_write_iomap_ops for DAX zeroing
  xfs: move dax device handling into xfs_{alloc,free}_buftarg
  ext4: cleanup the dax handling in ext4_fill_super
  ext2: cleanup the dax handling in ext2_fill_super
  fsdax: decouple zeroing from the iomap buffered I/O code
  ...
</content>
</entry>
<entry>
<title>iomap,xfs: Convert -&gt;discard_page to -&gt;discard_folio</title>
<updated>2021-12-18T05:06:07Z</updated>
<author>
<name>Matthew Wilcox (Oracle)</name>
<email>willy@infradead.org</email>
</author>
<published>2021-07-30T13:56:05Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=6e478521df535b9d5ef5eb84d4352f235bbbef99'/>
<id>urn:sha1:6e478521df535b9d5ef5eb84d4352f235bbbef99</id>
<content type='text'>
XFS has the only implementation of -&gt;discard_page today, so convert it
to use folios in the same patch as converting the API.

Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
</content>
</entry>
<entry>
<title>iomap: Add iomap_invalidate_folio</title>
<updated>2021-12-16T20:49:52Z</updated>
<author>
<name>Matthew Wilcox (Oracle)</name>
<email>willy@infradead.org</email>
</author>
<published>2021-04-28T11:51:36Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=8306a5f56305521d8b307b4ee1f69949fbb49279'/>
<id>urn:sha1:8306a5f56305521d8b307b4ee1f69949fbb49279</id>
<content type='text'>
Keep iomap_invalidatepage around as a wrapper for use in address_space
operations.

Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
</content>
</entry>
<entry>
<title>iomap: add a IOMAP_DAX flag</title>
<updated>2021-12-04T16:58:53Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2021-11-29T10:21:58Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=952da06375c8f3aa58474fff718d9ae8442531b9'/>
<id>urn:sha1:952da06375c8f3aa58474fff718d9ae8442531b9</id>
<content type='text'>
Add a flag so that the file system can easily detect DAX operations
based just on the iomap operation requested instead of looking at
inode state using IS_DAX.  This will be needed to apply the to be
added partition offset only for operations that actually use DAX,
but not things like fiemap that are based on the block device.
In the long run it should also allow turning the bdev, dax_dev
and inline_data into a union.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Reviewed-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Link: https://lore.kernel.org/r/20211129102203.2243509-25-hch@lst.de
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'gfs2-v5.15-rc5-mmap-fault' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2</title>
<updated>2021-11-02T19:25:03Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-11-02T19:25:03Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=c03098d4b9ad76bca2966a8769dcfe59f7f85103'/>
<id>urn:sha1:c03098d4b9ad76bca2966a8769dcfe59f7f85103</id>
<content type='text'>
Pull gfs2 mmap + page fault deadlocks fixes from Andreas Gruenbacher:
 "Functions gfs2_file_read_iter and gfs2_file_write_iter are both
  accessing the user buffer to write to or read from while holding the
  inode glock.

  In the most basic deadlock scenario, that buffer will not be resident
  and it will be mapped to the same file. Accessing the buffer will
  trigger a page fault, and gfs2 will deadlock trying to take the same
  inode glock again while trying to handle that fault.

  Fix that and similar, more complex scenarios by disabling page faults
  while accessing user buffers. To make this work, introduce a small
  amount of new infrastructure and fix some bugs that didn't trigger so
  far, with page faults enabled"

* tag 'gfs2-v5.15-rc5-mmap-fault' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Fix mmap + page fault deadlocks for direct I/O
  iov_iter: Introduce nofault flag to disable page faults
  gup: Introduce FOLL_NOFAULT flag to disable page faults
  iomap: Add done_before argument to iomap_dio_rw
  iomap: Support partial direct I/O on user copy failures
  iomap: Fix iomap_dio_rw return value for user copies
  gfs2: Fix mmap + page fault deadlocks for buffered I/O
  gfs2: Eliminate ip-&gt;i_gh
  gfs2: Move the inode glock locking to gfs2_file_buffered_write
  gfs2: Introduce flag for glock holder auto-demotion
  gfs2: Clean up function may_grant
  gfs2: Add wrapper for iomap_file_buffered_write
  iov_iter: Introduce fault_in_iov_iter_writeable
  iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable
  gup: Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable}
  powerpc/kvm: Fix kvm_use_magic_page
  iov_iter: Fix iov_iter_get_pages{,_alloc} page fault return value
</content>
</entry>
<entry>
<title>iomap: Add done_before argument to iomap_dio_rw</title>
<updated>2021-10-24T13:26:05Z</updated>
<author>
<name>Andreas Gruenbacher</name>
<email>agruenba@redhat.com</email>
</author>
<published>2021-07-24T10:26:41Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=4fdccaa0d184c202f98d73b24e3ec8eeee88ab8d'/>
<id>urn:sha1:4fdccaa0d184c202f98d73b24e3ec8eeee88ab8d</id>
<content type='text'>
Add a done_before argument to iomap_dio_rw that indicates how much of
the request has already been transferred.  When the request succeeds, we
report that done_before additional bytes were tranferred.  This is
useful for finishing a request asynchronously when part of the request
has already been completed synchronously.

We'll use that to allow iomap_dio_rw to be used with page faults
disabled: when a page fault occurs while submitting a request, we
synchronously complete the part of the request that has already been
submitted.  The caller can then take care of the page fault and call
iomap_dio_rw again for the rest of the request, passing in the number of
bytes already tranferred.

Signed-off-by: Andreas Gruenbacher &lt;agruenba@redhat.com&gt;
Reviewed-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
</content>
</entry>
<entry>
<title>iomap: Support partial direct I/O on user copy failures</title>
<updated>2021-10-24T13:26:05Z</updated>
<author>
<name>Andreas Gruenbacher</name>
<email>agruenba@redhat.com</email>
</author>
<published>2021-07-22T23:59:41Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=97308f8b0d867e9ef59528cd97f0db55ffdf5651'/>
<id>urn:sha1:97308f8b0d867e9ef59528cd97f0db55ffdf5651</id>
<content type='text'>
In iomap_dio_rw, when iomap_apply returns an -EFAULT error and the
IOMAP_DIO_PARTIAL flag is set, complete the request synchronously and
return a partial result.  This allows the caller to deal with the page
fault and retry the remainder of the request.

Signed-off-by: Andreas Gruenbacher &lt;agruenba@redhat.com&gt;
Reviewed-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
</content>
</entry>
<entry>
<title>block: switch polling to be bio based</title>
<updated>2021-10-18T12:17:36Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2021-10-12T11:12:24Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=3e08773c3841e9db7a520908cc2b136a77d275ff'/>
<id>urn:sha1:3e08773c3841e9db7a520908cc2b136a77d275ff</id>
<content type='text'>
Replace the blk_poll interface that requires the caller to keep a queue
and cookie from the submissions with polling based on the bio.

Polling for the bio itself leads to a few advantages:

 - the cookie construction can made entirely private in blk-mq.c
 - the caller does not need to remember the request_queue and cookie
   separately and thus sidesteps their lifetime issues
 - keeping the device and the cookie inside the bio allows to trivially
   support polling BIOs remapping by stacking drivers
 - a lot of code to propagate the cookie back up the submission path can
   be removed entirely.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Tested-by: Mark Wunderlich &lt;mark.wunderlich@intel.com&gt;
Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: replace the spin argument to blk_iopoll with a flags argument</title>
<updated>2021-10-18T12:17:36Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2021-10-12T11:12:19Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=ef99b2d37666b7a600baab9e1c4944436652b0a2'/>
<id>urn:sha1:ef99b2d37666b7a600baab9e1c4944436652b0a2</id>
<content type='text'>
Switch the boolean spin argument to blk_poll to passing a set of flags
instead.  This will allow to control polling behavior in a more fine
grained way.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Tested-by: Mark Wunderlich &lt;mark.wunderlich@intel.com&gt;
Link: https://lore.kernel.org/r/20211012111226.760968-10-hch@lst.de
[axboe: adapt to changed io_uring iopoll]
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
