<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/io_uring/io-wq.c, branch linux-6.18.y</title>
<subtitle>Hosts the 0x221E Linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.18.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.18.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2026-02-16T09:19:40Z</updated>
<entry>
<title>io_uring/io-wq: add exit-on-idle state</title>
<updated>2026-02-16T09:19:40Z</updated>
<author>
<name>Li Chen</name>
<email>me@linux.beauty</email>
</author>
<published>2026-02-02T14:37:53Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=f02693a40e40702cb6201145379e21634dfbe20d'/>
<id>urn:sha1:f02693a40e40702cb6201145379e21634dfbe20d</id>
<content type='text'>
commit 38aa434ab9335ce2d178b7538cdf01d60b2014c3 upstream.

io-wq uses an idle timeout to shrink the pool, but keeps the last worker
around indefinitely to avoid churn.

For tasks that use io_uring for file I/O and then stop using it,
this can leave an iou-wrk-* thread behind even after all io_uring
instances are gone. This is unnecessary overhead and also gets in the
way of process checkpoint/restore.

Add an exit-on-idle state that makes all io-wq workers exit as soon as
they become idle, and provide io_wq_set_exit_on_idle() to toggle it.
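
A minimal sketch of the idea (the bit name and the helper body here are
illustrative assumptions, not the actual patch):

/* Illustrative only: IO_WQ_BIT_EXIT_ON_IDLE is an assumed name. */
void io_wq_set_exit_on_idle(struct io_wq *wq, bool enable)
{
	if (enable)
		set_bit(IO_WQ_BIT_EXIT_ON_IDLE, &amp;wq-&gt;state);
	else
		clear_bit(IO_WQ_BIT_EXIT_ON_IDLE, &amp;wq-&gt;state);
	/*
	 * Idle workers then observe the bit and exit immediately
	 * instead of waiting out the keep-alive timeout.
	 */
}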

Signed-off-by: Li Chen &lt;me@linux.beauty&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>io_uring/io-wq: check IO_WQ_BIT_EXIT inside work run loop</title>
<updated>2026-01-30T09:32:15Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2026-01-20T14:42:50Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=bdf0bf73006ea8af9327cdb85cfdff4c23a5f966'/>
<id>urn:sha1:bdf0bf73006ea8af9327cdb85cfdff4c23a5f966</id>
<content type='text'>
commit 10dc959398175736e495f71c771f8641e1ca1907 upstream.

Currently this is checked before running the pending work. Normally this
is quite fine, as work items either end up blocking (which will create a
new worker for other items), or they complete fairly quickly. But syzbot
reports an issue where io-wq takes seemingly forever to exit, and with a
bit of debugging, this turns out to be because it queues a bunch of big
(2GB - 4096b) reads with a /dev/msr* file. Since this file type doesn't
support -&gt;read_iter(), loop_rw_iter() ends up handling them. Each read
returns 16MB of data read, which takes 20 (!!) seconds. With a bunch of
these pending, processing the whole chain can take a long time. Easily
longer than the syzbot uninterruptible sleep timeout of 140 seconds.
This then triggers a complaint off the io-wq exit path:

INFO: task syz.4.135:6326 blocked for more than 143 seconds.
      Not tainted syzkaller #0
      Blocked by coredump.
"echo 0 &gt; /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz.4.135       state:D stack:26824 pid:6326  tgid:6324  ppid:5957   task_flags:0x400548 flags:0x00080000
Call Trace:
 &lt;TASK&gt;
 context_switch kernel/sched/core.c:5256 [inline]
 __schedule+0x1139/0x6150 kernel/sched/core.c:6863
 __schedule_loop kernel/sched/core.c:6945 [inline]
 schedule+0xe7/0x3a0 kernel/sched/core.c:6960
 schedule_timeout+0x257/0x290 kernel/time/sleep_timeout.c:75
 do_wait_for_common kernel/sched/completion.c:100 [inline]
 __wait_for_common+0x2fc/0x4e0 kernel/sched/completion.c:121
 io_wq_exit_workers io_uring/io-wq.c:1328 [inline]
 io_wq_put_and_exit+0x271/0x8a0 io_uring/io-wq.c:1356
 io_uring_clean_tctx+0x10d/0x190 io_uring/tctx.c:203
 io_uring_cancel_generic+0x69c/0x9a0 io_uring/cancel.c:651
 io_uring_files_cancel include/linux/io_uring.h:19 [inline]
 do_exit+0x2ce/0x2bd0 kernel/exit.c:911
 do_group_exit+0xd3/0x2a0 kernel/exit.c:1112
 get_signal+0x2671/0x26d0 kernel/signal.c:3034
 arch_do_signal_or_restart+0x8f/0x7e0 arch/x86/kernel/signal.c:337
 __exit_to_user_mode_loop kernel/entry/common.c:41 [inline]
 exit_to_user_mode_loop+0x8c/0x540 kernel/entry/common.c:75
 __exit_to_user_mode_prepare include/linux/irq-entry-common.h:226 [inline]
 syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:256 [inline]
 syscall_exit_to_user_mode_work include/linux/entry-common.h:159 [inline]
 syscall_exit_to_user_mode include/linux/entry-common.h:194 [inline]
 do_syscall_64+0x4ee/0xf80 arch/x86/entry/syscall_64.c:100
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fa02738f749
RSP: 002b:00007fa0281ae0e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 00007fa0275e6098 RCX: 00007fa02738f749
RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007fa0275e6098
RBP: 00007fa0275e6090 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fa0275e6128 R14: 00007fff14e4fcb0 R15: 00007fff14e4fd98

There's really nothing wrong here, other than the fact that processing
these reads will take a LONG time. However, we can speed up the exit
by checking IO_WQ_BIT_EXIT inside the io_worker_handle_work() loop, as
syzbot will exit the ring after queueing up all of these reads. Then,
once the first item is processed, io-wq will simply cancel the rest.
That should avoid syzbot running into this complaint again.
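
As a rough sketch (loop body heavily abridged; the point is only where
the test_bit() check sits):

static void io_worker_handle_work(struct io_wq_acct *acct,
				  struct io_worker *worker)
{
	struct io_wq *wq = worker-&gt;wq;

	do {
		struct io_wq_work *work;

		/*
		 * Sketch: also bail out of the run loop once the wq is
		 * marked for exit, rather than only checking before it;
		 * the remaining queued work then gets cancelled.
		 */
		if (test_bit(IO_WQ_BIT_EXIT, &amp;wq-&gt;state))
			break;

		work = io_get_next_work(acct, worker);
		if (!work)
			break;
		/* ... run and complete the work item ... */
	} while (1);
}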

Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/68a2decc.050a0220.e29e5.0099.GAE@google.com/
Reported-by: syzbot+4eb282331cab6d5b6588@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>io_uring/io-wq: fix incorrect io_wq_for_each_worker() termination logic</title>
<updated>2026-01-17T15:35:14Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2026-01-05T14:42:48Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=2b9c15286a178190e5bc7ee0a1b200c38953fdf4'/>
<id>urn:sha1:2b9c15286a178190e5bc7ee0a1b200c38953fdf4</id>
<content type='text'>
commit e0392a10c9e80a3991855a81317da3039fcbe32c upstream.

A previous commit added this helper and had it terminate the walk when
false is returned from the handler. That is the opposite of the
intended behavior: the loop should abort when true is returned.

Fix this up by having io_wq_for_each_worker() keep iterating as long
as false is returned, and only abort once true is returned.
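
A sketch of the corrected semantics (list and field names here are
assumptions; the real helper also handles worker references):

static bool io_wq_for_each_worker(struct io_wq_acct *acct,
				  bool (*func)(struct io_worker *, void *),
				  void *data)
{
	struct io_worker *worker;

	list_for_each_entry_rcu(worker, &amp;acct-&gt;all_list, all_list) {
		/* true from the handler means "done", so stop the walk */
		if (func(worker, data))
			return true;
	}
	/* false all the way through: every worker was visited */
	return false;
}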

Cc: stable@vger.kernel.org
Fixes: 751eedc4b4b7 ("io_uring/io-wq: move worker lists to struct io_wq_acct")
Reported-by: Lewis Campbell &lt;info@lewiscampbell.tech&gt;
Reviewed-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>io_uring/io-wq: fix `max_workers` breakage and `nr_workers` underflow</title>
<updated>2025-09-15T16:46:13Z</updated>
<author>
<name>Max Kellermann</name>
<email>max.kellermann@ionos.com</email>
</author>
<published>2025-09-12T00:06:09Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=cd4ea81be3eb94047ad023c631afd9bd6c295400'/>
<id>urn:sha1:cd4ea81be3eb94047ad023c631afd9bd6c295400</id>
<content type='text'>
Commit 88e6c42e40de ("io_uring/io-wq: add check free worker before
create new worker") reused the variable `do_create` for something
else, abusing it for the free worker check.

This caused the value to effectively always be `true` at the time
`nr_workers &lt; max_workers` was checked, but it should really be
`false`.  This means the `max_workers` setting was ignored, and worse:
if the limit had already been reached, incrementing `nr_workers` was
skipped even though another worker would be created.

When later lots of workers exit, the `nr_workers` field could easily
underflow, making the problem worse because more and more workers
would be created without incrementing `nr_workers`.

The simple solution is to use a different variable for the free worker
check instead of using one variable for two different things.
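
In sketch form (helper and lock names are assumptions based on the
surrounding code), the fix keeps the free-worker result out of
do_create:

	bool do_create = false, need_new_worker;

	rcu_read_lock();
	/* the free-worker check gets its own variable ... */
	need_new_worker = !io_acct_activate_free_worker(acct);
	rcu_read_unlock();

	raw_spin_lock(&amp;acct-&gt;workers_lock);
	/*
	 * ... so do_create only becomes true when nr_workers is also
	 * incremented: max_workers is honoured again and the counter
	 * stays in sync with the workers actually created.
	 */
	if (need_new_worker &amp;&amp; acct-&gt;nr_workers &lt; acct-&gt;max_workers) {
		acct-&gt;nr_workers++;
		do_create = true;
	}
	raw_spin_unlock(&amp;acct-&gt;workers_lock);

	if (do_create)
		create_io_worker(wq, acct);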

Cc: stable@vger.kernel.org
Fixes: 88e6c42e40de ("io_uring/io-wq: add check free worker before create new worker")
Signed-off-by: Max Kellermann &lt;max.kellermann@ionos.com&gt;
Reviewed-by: Fengnan Chang &lt;changfengnan@bytedance.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/io-wq: add check free worker before create new worker</title>
<updated>2025-08-13T12:31:10Z</updated>
<author>
<name>Fengnan Chang</name>
<email>changfengnan@bytedance.com</email>
</author>
<published>2025-08-13T12:02:14Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=9d83e1f05c98bab5de350bef89177e2be8b34db0'/>
<id>urn:sha1:9d83e1f05c98bab5de350bef89177e2be8b34db0</id>
<content type='text'>
After commit 0b2b066f8a85 ("io_uring/io-wq: only create a new worker
if it can make progress"), in our production environment we still
observe that some io_worker threads keep being created and destroyed.
After analysis, this was confirmed to be caused by a more complex
scenario involving a large number of fsync operations, which can be
abstracted as frequent write + fsync operations on multiple files
within a single uring instance. Since write is a hashed operation
while fsync is not, and fsync is likely to block during execution, the
hash check in io_wqe_dec_running cannot handle such scenarios. The
same issue is likely to occur when hashed and non-hashed work are
queued at the same time.

Returning to the starting point of the issue: when a new work item
arrives, io_wq_enqueue may wake up free worker A, while
io_wq_dec_running may create worker B. Ultimately, only one of A and B
can obtain and process the task, leaving the other idle. The root
cause is the inconsistent checks performed by io_wq_enqueue and
io_wq_dec_running, so the problem can be resolved by also checking for
available workers in io_wq_dec_running.
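
Sketch of the idea (helper names are assumptions; early-exit paths
abridged): have io_wq_dec_running() first try to wake a free worker,
mirroring io_wq_enqueue:

static void io_wq_dec_running(struct io_worker *worker)
{
	struct io_wq_acct *acct = io_wq_get_acct(worker);
	struct io_wq *wq = worker-&gt;wq;
	bool need_worker;

	if (!atomic_dec_and_test(&amp;acct-&gt;nr_running))
		return;
	if (!io_acct_run_queue(acct))
		return;

	rcu_read_lock();
	/*
	 * Mirror io_wq_enqueue: prefer waking an existing free worker
	 * over creating a new one that would only race it for the work.
	 */
	need_worker = !io_acct_activate_free_worker(acct);
	rcu_read_unlock();
	if (!need_worker)
		return;

	atomic_inc(&amp;acct-&gt;nr_running);
	atomic_inc(&amp;wq-&gt;worker_refs);
	io_queue_worker_create(worker, acct, create_worker_cb);
}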

Signed-off-by: Fengnan Chang &lt;changfengnan@bytedance.com&gt;
Reviewed-by: Diangang Li &lt;lidiangang@bytedance.com&gt;
Link: https://lore.kernel.org/r/20250813120214.18729-1-changfengnan@bytedance.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: fix task leak issue in io_wq_create()</title>
<updated>2025-06-15T18:58:39Z</updated>
<author>
<name>Penglei Jiang</name>
<email>superman.xpt@gmail.com</email>
</author>
<published>2025-06-15T16:39:06Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=89465d923bda180299e69ee2800aab84ad0ba689'/>
<id>urn:sha1:89465d923bda180299e69ee2800aab84ad0ba689</id>
<content type='text'>
Add a missing put_task_struct() call in the error path.
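
In sketch form (structure abridged), the leak is the task reference
taken early in io_wq_create() that the error unwind never dropped:

struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data)
{
	struct io_wq *wq;
	int ret;

	wq = kzalloc(sizeof(struct io_wq), GFP_KERNEL);
	if (!wq)
		return ERR_PTR(-ENOMEM);

	/* reference taken here ... */
	wq-&gt;task = get_task_struct(data-&gt;task);

	ret = cpuhp_state_add_instance_nocalls(io_wq_online, &amp;wq-&gt;cpuhp_node);
	if (ret)
		goto err;
	/* ... remaining setup elided ... */
	return wq;
err:
	/* ... and must now also be dropped on the failure exit */
	put_task_struct(wq-&gt;task);
	kfree(wq);
	return ERR_PTR(ret);
}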

Cc: stable@vger.kernel.org
Fixes: 0f8baa3c9802 ("io-wq: fully initialize wqe before calling cpuhp_state_add_instance_nocalls()")
Signed-off-by: Penglei Jiang &lt;superman.xpt@gmail.com&gt;
Link: https://lore.kernel.org/r/20250615163906.2367-1-superman.xpt@gmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/io-wq: only create a new worker if it can make progress</title>
<updated>2025-05-23T12:14:10Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2025-05-23T12:08:49Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=0b2b066f8a854ff485319f697d16c5f565c32112'/>
<id>urn:sha1:0b2b066f8a854ff485319f697d16c5f565c32112</id>
<content type='text'>
Hashed work is serialized by io-wq, intended to be used for cases like
serializing buffered writes to a regular file, where the file system
will serialize the workers anyway with a mutex or similar. Since they
would be forcibly serialized and blocked, it's more efficient for io-wq
to handle these individually rather than issue them in parallel.

If a worker is currently handling a hashed work item and gets blocked,
don't create a new worker if the next work item is also hashed and
mapped to the same bucket. That new worker would not be able to make any
progress anyway.
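
A sketch of the check (helper names are assumptions): before spawning
a replacement for a blocked worker, compare the hash buckets of the
current and next work items:

/* Sketch only: helper name and flag plumbing are assumptions. */
static bool new_worker_can_make_progress(struct io_wq_work *cur,
					 struct io_wq_work *next)
{
	/*
	 * If the blocked worker holds hashed work and the next pending
	 * item hashes to the same bucket, io-wq will serialize them
	 * anyway, so a new worker would just sit and wait on the bucket.
	 */
	if (io_wq_is_hashed(cur) &amp;&amp; io_wq_is_hashed(next) &amp;&amp;
	    io_get_work_hash(cur) == io_get_work_hash(next))
		return false;
	return true;
}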

Reported-by: Fengnan Chang &lt;changfengnan@bytedance.com&gt;
Reported-by: Diangang Li &lt;lidiangang@bytedance.com&gt;
Link: https://lore.kernel.org/io-uring/20250522090909.73212-1-changfengnan@bytedance.com/
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/io-wq: ignore non-busy worker going to sleep</title>
<updated>2025-05-23T12:14:07Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2025-05-23T12:05:37Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=8343cae362e147a5d4505c2da0e161a4d9e9fbde'/>
<id>urn:sha1:8343cae362e147a5d4505c2da0e161a4d9e9fbde</id>
<content type='text'>
When an io-wq worker goes to sleep, it checks if there's work to do.
If there is, it'll create a new worker. But if this worker is currently
idle, it'll either get woken right back up immediately, or someone
else has already created the necessary worker to handle this work.

Only go through the worker creation logic if this worker is currently
handling a work item. That means it's being scheduled out as
part of handling that work, not just going to sleep on its own.
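
Sketch of the early-out (the field name and the exact hook it lives in
are assumptions): in the sleeping path, bail out unless the worker is
mid-work:

void io_wq_worker_sleeping(struct task_struct *tsk)
{
	struct io_worker *worker = tsk-&gt;worker_private;

	/*
	 * Sketch: only a worker that blocks while running a work item
	 * should consider creating a replacement; an idle worker going
	 * to sleep will simply be woken again when work arrives.
	 */
	if (!worker || !worker-&gt;cur_work)
		return;

	io_wq_dec_running(worker);
}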

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/io-wq: move hash helpers to the top</title>
<updated>2025-05-23T12:14:04Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2025-05-23T12:04:29Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=e37dfc0530815ead6e8ddab2a4ccce3be31af954'/>
<id>urn:sha1:e37dfc0530815ead6e8ddab2a4ccce3be31af954</id>
<content type='text'>
Just in preparation for using them higher up in the function, move
these generic helpers.

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/wq: avoid indirect do_work/free_work calls</title>
<updated>2025-04-21T11:06:58Z</updated>
<author>
<name>Caleb Sander Mateos</name>
<email>csander@purestorage.com</email>
</author>
<published>2025-03-29T16:15:24Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=9fe99eed91e8273d3750367af759fe11e9512759'/>
<id>urn:sha1:9fe99eed91e8273d3750367af759fe11e9512759</id>
<content type='text'>
struct io_wq stores do_work and free_work function pointers which are
called on each work item. But these function pointers are always set to
io_wq_submit_work and io_wq_free_work, respectively. So remove these
function pointers and just call the functions directly.
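
In sketch form, the per-item dispatch goes from indirect calls through
struct io_wq to direct calls to the only implementations that were
ever installed:

	/* Before (sketched): two indirect calls per work item. */
	wq-&gt;do_work(work);
	linked = wq-&gt;free_work(work);

	/*
	 * After: the pointers only ever held io_wq_submit_work and
	 * io_wq_free_work, so call them directly and let the compiler
	 * see the callees.
	 */
	io_wq_submit_work(work);
	linked = io_wq_free_work(work);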

Signed-off-by: Caleb Sander Mateos &lt;csander@purestorage.com&gt;
Link: https://lore.kernel.org/r/20250329161527.3281314-1-csander@purestorage.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
