<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/drivers/md/md-cluster.c, branch linux-6.9.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.9.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.9.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2023-10-12T16:16:19Z</updated>
<entry>
<title>md-cluster: check for timeout while a new disk adding</title>
<updated>2023-10-12T16:16:19Z</updated>
<author>
<name>Denis Plotnikov</name>
<email>den-plotnikov@yandex-team.ru</email>
</author>
<published>2023-09-25T12:59:40Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=1bbe254e4336c0944dd4fb6f0b8c9665b81de50f'/>
<id>urn:sha1:1bbe254e4336c0944dd4fb6f0b8c9665b81de50f</id>
<content type='text'>
A new disk adding may end up with timeout and a new disk won't be added.
Add returning the error in that case.

Found by Linux Verification Center (linuxtesting.org) with SVACE

Signed-off-by: Denis Plotnikov &lt;den-plotnikov@yandex-team.ru&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20230925125940.1542506-1-den-plotnikov@yandex-team.ru
</content>
</entry>
<entry>
<title>md: Hold mddev-&gt;reconfig_mutex when trying to get mddev-&gt;sync_thread</title>
<updated>2023-08-15T16:40:26Z</updated>
<author>
<name>Li Lingfeng</name>
<email>lilingfeng3@huawei.com</email>
</author>
<published>2023-08-03T07:17:11Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=7eb8ff02c1df279bf7f7f29b866beb655a9eebe9'/>
<id>urn:sha1:7eb8ff02c1df279bf7f7f29b866beb655a9eebe9</id>
<content type='text'>
Commit ba9d9f1a707f ("Revert "md: unlock mddev before reap sync_thread in
action_store"") removed the scenario of calling md_unregister_thread()
without holding mddev-&gt;reconfig_mutex, so add a lock holding check before
acquiring mddev-&gt;sync_thread by passing mdev to md_unregister_thread().

Signed-off-by: Li Lingfeng &lt;lilingfeng3@huawei.com&gt;
Reviewed-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Link: https://lore.kernel.org/r/20230803071711.2546560-1-lilingfeng@huaweicloud.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</content>
</entry>
<entry>
<title>md: protect md_thread with rcu</title>
<updated>2023-06-13T22:25:39Z</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2023-05-23T02:10:17Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=4469315439827290923fce4f3f672599cabeb366'/>
<id>urn:sha1:4469315439827290923fce4f3f672599cabeb366</id>
<content type='text'>
Currently, there are many places that md_thread can be accessed without
protection, following are known scenarios that can cause
null-ptr-dereference or uaf:

1) sync_thread that is allocated and started from md_start_sync()
2) mddev-&gt;thread can be accessed directly from timeout_store() and
   md_bitmap_daemon_work()
3) md_unregister_thread() from action_store().

Currently, a global spinlock 'pers_lock' is borrowed to protect
'mddev-&gt;thread' in some places, this problem can be fixed likewise,
however, use a global lock for all the cases is not good.

Fix this problem by protecting all md_thread with rcu.

Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20230523021017.3048783-6-yukuai1@huaweicloud.com
</content>
</entry>
<entry>
<title>fs: dlm: remove DLM_LSFL_FS from uapi</title>
<updated>2022-08-23T19:54:54Z</updated>
<author>
<name>Alexander Aring</name>
<email>aahringo@redhat.com</email>
</author>
<published>2022-08-15T19:43:25Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=12cda13cfd5310bbfefdfe32a82489228e2e0381'/>
<id>urn:sha1:12cda13cfd5310bbfefdfe32a82489228e2e0381</id>
<content type='text'>
The DLM_LSFL_FS flag is set in lockspaces created directly
for a kernel user, as opposed to those lockspaces created
for user space applications.  The user space libdlm allowed
this flag to be set for lockspaces created from user space,
but then used by a kernel user.  No kernel user has ever
used this method, so remove the ability to do it.

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</content>
</entry>
<entry>
<title>md: Fix spelling mistake in comments</title>
<updated>2022-08-02T23:14:44Z</updated>
<author>
<name>Zhang Jiaming</name>
<email>jiaming@nfschina.com</email>
</author>
<published>2022-07-02T01:54:11Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=9e26728b5fa9c6ffdb3f5612279bd3b2f7ea8c3c'/>
<id>urn:sha1:9e26728b5fa9c6ffdb3f5612279bd3b2f7ea8c3c</id>
<content type='text'>
There are 2 spelling mistakes in comments. Fix it.

Signed-off-by: Zhang Jiaming &lt;jiaming@nfschina.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>md: replace deprecated strlcpy &amp; remove duplicated line</title>
<updated>2022-04-25T21:00:36Z</updated>
<author>
<name>Heming Zhao</name>
<email>heming.zhao@suse.com</email>
</author>
<published>2022-04-01T02:13:17Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=92d9aac92b7cc92c770e736c70c3acae7b803278'/>
<id>urn:sha1:92d9aac92b7cc92c770e736c70c3acae7b803278</id>
<content type='text'>
This commit includes two topics:

1&gt; replace deprecated strlcpy

change strlcpy to strscpy for strlcpy is marked as deprecated in
Documentation/process/deprecated.rst

2&gt; remove duplicated strlcpy line

in md_bitmap_read_sb@md-bitmap.c there are two duplicated strlcpy(), the
history:

- commit cf921cc19cf7 ("Add node recovery callbacks") introduced the first
  usage of strlcpy().

- commit b97e92574c0b ("Use separate bitmaps for each nodes in the cluster")
  introduced the second strlcpy(). this time, the two strlcpy() are same,
   we can remove anyone safely.

- commit d3b178adb3a3 ("md: Skip cluster setup for dm-raid") added dm-raid
  special handling. And the "nodes" value is the key of this patch. but
  from this patch, strlcpy() which was introduced by b97e92574c0bf
  become necessary.

- commit 3c462c880b52 ("md: Increment version for clustered bitmaps") used
  clustered major version to only handle in clustered env. this patch
  could look a polishment for clustered code logic.

So cf921cc19cf7 became useless after d3b178adb3a3a, we could remove it
safely.

Signed-off-by: Heming Zhao &lt;heming.zhao@suse.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</content>
</entry>
<entry>
<title>md: fix spelling of "its"</title>
<updated>2022-01-06T16:37:03Z</updated>
<author>
<name>Randy Dunlap</name>
<email>rdunlap@infradead.org</email>
</author>
<published>2021-12-26T02:24:11Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=dd3dc5f416b7247a4b5d7bac6698be623c180572'/>
<id>urn:sha1:dd3dc5f416b7247a4b5d7bac6698be623c180572</id>
<content type='text'>
Use the possessive "its" instead of the contraction "it's"
in printed messages.

Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: linux-raid@vger.kernel.org
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'for-5.11/drivers-2020-12-14' of git://git.kernel.dk/linux-block</title>
<updated>2020-12-16T21:09:32Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-12-16T21:09:32Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=69f637c33560b02ae7313e0c142d847361cc723a'/>
<id>urn:sha1:69f637c33560b02ae7313e0c142d847361cc723a</id>
<content type='text'>
Pull block driver updates from Jens Axboe:
 "Nothing major in here:

   - NVMe pull request from Christoph:
        - nvmet passthrough improvements (Chaitanya Kulkarni)
        - fcloop error injection support (James Smart)
        - read-only support for zoned namespaces without Zone Append
          (Javier González)
        - improve some error message (Minwoo Im)
        - reject I/O to offline fabrics namespaces (Victor Gladkov)
        - PCI queue allocation cleanups (Niklas Schnelle)
        - remove an unused allocation in nvmet (Amit Engel)
        - a Kconfig spelling fix (Colin Ian King)
        - nvme_req_qid simplication (Baolin Wang)

   - MD pull request from Song:
        - Fix race condition in md_ioctl() (Dae R. Jeong)
        - Initialize read_slot properly for raid10 (Kevin Vigor)
        - Code cleanup (Pankaj Gupta)
        - md-cluster resync/reshape fix (Zhao Heming)

   - Move null_blk into its own directory (Damien Le Moal)

   - null_blk zone and discard improvements (Damien Le Moal)

   - bcache race fix (Dongsheng Yang)

   - Set of rnbd fixes/improvements (Gioh Kim, Guoqing Jiang, Jack Wang,
     Lutz Pogrell, Md Haris Iqbal)

   - lightnvm NULL pointer deref fix (tangzhenhao)

   - sr in_interrupt() removal (Sebastian Andrzej Siewior)

   - FC endpoint security support for s390/dasd (Jan Höppner, Sebastian
     Ott, Vineeth Vijayan). From the s390 arch guys, arch bits included
     as it made it easier for them to funnel the feature through the
     block driver tree.

   - Follow up fixes (Colin Ian King)"

* tag 'for-5.11/drivers-2020-12-14' of git://git.kernel.dk/linux-block: (64 commits)
  block: drop dead assignments in loop_init()
  sr: Remove in_interrupt() usage in sr_init_command().
  sr: Switch the sector size back to 2048 if sr_read_sector() changed it.
  cdrom: Reset sector_size back it is not 2048.
  drivers/lightnvm: fix a null-ptr-deref bug in pblk-core.c
  null_blk: Move driver into its own directory
  null_blk: Allow controlling max_hw_sectors limit
  null_blk: discard zones on reset
  null_blk: cleanup discard handling
  null_blk: Improve implicit zone close
  null_blk: improve zone locking
  block: Align max_hw_sectors to logical blocksize
  null_blk: Fail zone append to conventional zones
  null_blk: Fix zone size initialization
  bcache: fix race between setting bdev state to none and new write request direct to backing
  block/rnbd: fix a null pointer dereference on dev-&gt;blk_symlink_name
  block/rnbd-clt: Dynamically alloc buffer for pathname &amp; blk_symlink_name
  block/rnbd: call kobject_put in the failure path
  Documentation/ABI/rnbd-srv: add document for force_close
  block/rnbd-srv: close a mapped device from server side.
  ...
</content>
</entry>
<entry>
<title>md/cluster: fix deadlock when node is doing resync job</title>
<updated>2020-11-30T18:12:35Z</updated>
<author>
<name>Zhao Heming</name>
<email>heming.zhao@suse.com</email>
</author>
<published>2020-11-19T11:41:34Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=bca5b0658020be90b6b504ca514fd80110204f71'/>
<id>urn:sha1:bca5b0658020be90b6b504ca514fd80110204f71</id>
<content type='text'>
md-cluster uses MD_CLUSTER_SEND_LOCK to make node can exclusively send msg.
During sending msg, node can concurrently receive msg from another node.
When node does resync job, grab token_lockres:EX may trigger a deadlock:
```
nodeA                       nodeB
--------------------     --------------------
a.
send METADATA_UPDATED
held token_lockres:EX
                         b.
                         md_do_sync
                          resync_info_update
                            send RESYNCING
                             + set MD_CLUSTER_SEND_LOCK
                             + wait for holding token_lockres:EX

                         c.
                         mdadm /dev/md0 --remove /dev/sdg
                          + held reconfig_mutex
                          + send REMOVE
                             + wait_event(MD_CLUSTER_SEND_LOCK)

                         d.
                         recv_daemon //METADATA_UPDATED from A
                          process_metadata_update
                           + (mddev_trylock(mddev) ||
                              MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD)
                             //this time, both return false forever
```
Explaination:
a. A send METADATA_UPDATED
   This will block another node to send msg

b. B does sync jobs, which will send RESYNCING at intervals.
   This will be block for holding token_lockres:EX lock.

c. B do "mdadm --remove", which will send REMOVE.
   This will be blocked by step &lt;b&gt;: MD_CLUSTER_SEND_LOCK is 1.

d. B recv METADATA_UPDATED msg, which send from A in step &lt;a&gt;.
   This will be blocked by step &lt;c&gt;: holding mddev lock, it makes
   wait_event can't hold mddev lock. (btw,
   MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD keep ZERO in this scenario.)

There is a similar deadlock in commit 0ba959774e93
("md-cluster: use sync way to handle METADATA_UPDATED msg")
In that commit, step c is "update sb". This patch step c is
"mdadm --remove".

For fixing this issue, we can refer the solution of function:
metadata_update_start. Which does the same grab lock_token action.
lock_comm can use the same steps to avoid deadlock. By moving
MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD from lock_token to lock_comm.
It enlarge a little bit window of MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD,
but it is safe &amp; can break deadlock.

Repro steps (I only triggered 3 times with hundreds tests):

two nodes share 3 iSCSI luns: sdg/sdh/sdi. Each lun size is 1GB.
```
ssh root@node2 "mdadm -S --scan"
mdadm -S --scan
for i in {g,h,i};do dd if=/dev/zero of=/dev/sd$i oflag=direct bs=1M \
count=20; done

mdadm -C /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sdg /dev/sdh \
 --bitmap-chunk=1M
ssh root@node2 "mdadm -A /dev/md0 /dev/sdg /dev/sdh"

sleep 5

mkfs.xfs /dev/md0
mdadm --manage --add /dev/md0 /dev/sdi
mdadm --wait /dev/md0
mdadm --grow --raid-devices=3 /dev/md0

mdadm /dev/md0 --fail /dev/sdg
mdadm /dev/md0 --remove /dev/sdg
mdadm --grow --raid-devices=2 /dev/md0
```

test script will hung when executing "mdadm --remove".

```
 # dump stacks by "echo t &gt; /proc/sysrq-trigger"
md0_cluster_rec D    0  5329      2 0x80004000
Call Trace:
 __schedule+0x1f6/0x560
 ? _cond_resched+0x2d/0x40
 ? schedule+0x4a/0xb0
 ? process_metadata_update.isra.0+0xdb/0x140 [md_cluster]
 ? wait_woken+0x80/0x80
 ? process_recvd_msg+0x113/0x1d0 [md_cluster]
 ? recv_daemon+0x9e/0x120 [md_cluster]
 ? md_thread+0x94/0x160 [md_mod]
 ? wait_woken+0x80/0x80
 ? md_congested+0x30/0x30 [md_mod]
 ? kthread+0x115/0x140
 ? __kthread_bind_mask+0x60/0x60
 ? ret_from_fork+0x1f/0x40

mdadm           D    0  5423      1 0x00004004
Call Trace:
 __schedule+0x1f6/0x560
 ? __schedule+0x1fe/0x560
 ? schedule+0x4a/0xb0
 ? lock_comm.isra.0+0x7b/0xb0 [md_cluster]
 ? wait_woken+0x80/0x80
 ? remove_disk+0x4f/0x90 [md_cluster]
 ? hot_remove_disk+0xb1/0x1b0 [md_mod]
 ? md_ioctl+0x50c/0xba0 [md_mod]
 ? wait_woken+0x80/0x80
 ? blkdev_ioctl+0xa2/0x2a0
 ? block_ioctl+0x39/0x40
 ? ksys_ioctl+0x82/0xc0
 ? __x64_sys_ioctl+0x16/0x20
 ? do_syscall_64+0x5f/0x150
 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

md0_resync      D    0  5425      2 0x80004000
Call Trace:
 __schedule+0x1f6/0x560
 ? schedule+0x4a/0xb0
 ? dlm_lock_sync+0xa1/0xd0 [md_cluster]
 ? wait_woken+0x80/0x80
 ? lock_token+0x2d/0x90 [md_cluster]
 ? resync_info_update+0x95/0x100 [md_cluster]
 ? raid1_sync_request+0x7d3/0xa40 [raid1]
 ? md_do_sync.cold+0x737/0xc8f [md_mod]
 ? md_thread+0x94/0x160 [md_mod]
 ? md_congested+0x30/0x30 [md_mod]
 ? kthread+0x115/0x140
 ? __kthread_bind_mask+0x60/0x60
 ? ret_from_fork+0x1f/0x40
```

At last, thanks for Xiao's solution.

Cc: stable@vger.kernel.org
Signed-off-by: Zhao Heming &lt;heming.zhao@suse.com&gt;
Suggested-by: Xiao Ni &lt;xni@redhat.com&gt;
Reviewed-by: Xiao Ni &lt;xni@redhat.com&gt;
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
</content>
</entry>
<entry>
<title>md: remove a spurious call to revalidate_disk_size in update_size</title>
<updated>2020-11-16T15:34:15Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2020-11-16T14:57:12Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=94d91e7f8c221260790a482373d347ea85efb7b3'/>
<id>urn:sha1:94d91e7f8c221260790a482373d347ea85efb7b3</id>
<content type='text'>
None of the -&gt;resize methods updates the disk size, so calling
revalidate_disk_size here won't do anything.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Acked-by: Song Liu &lt;song@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
