<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/drivers/scsi/scsi_error.c, branch linux-6.9.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.9.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.9.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2024-02-05T21:15:20Z</updated>
<entry>
<title>scsi: core: Move scsi_host_busy() out of host lock if it is for per-command</title>
<updated>2024-02-05T21:15:20Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2024-02-03T02:45:21Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=4e6c9011990726f4d175e2cdfebe5b0b8cce4839'/>
<id>urn:sha1:4e6c9011990726f4d175e2cdfebe5b0b8cce4839</id>
<content type='text'>
Commit 4373534a9850 ("scsi: core: Move scsi_host_busy() out of host lock
for waking up EH handler") intended to fix a hard lockup issue triggered by
EH. The core idea was to move scsi_host_busy() out of the host lock when
processing individual commands for EH. However, a suggested style change
inadvertently caused scsi_host_busy() to remain under the host lock. Fix
this by calling scsi_host_busy() outside the lock.

Fixes: 4373534a9850 ("scsi: core: Move scsi_host_busy() out of host lock for waking up EH handler")
Cc: Sathya Prakash Veerichetty &lt;safhya.prakash@broadcom.com&gt;
Cc: Bart Van Assche &lt;bvanassche@acm.org&gt;
Cc: Ewan D. Milne &lt;emilne@redhat.com&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Link: https://lore.kernel.org/r/20240203024521.2006455-1-ming.lei@redhat.com
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
</content>
</entry>
<entry>
<title>scsi: core: Move scsi_host_busy() out of host lock for waking up EH handler</title>
<updated>2024-01-24T02:21:51Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2024-01-12T07:00:00Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=4373534a9850627a2695317944898eb1283a2db0'/>
<id>urn:sha1:4373534a9850627a2695317944898eb1283a2db0</id>
<content type='text'>
Inside scsi_eh_wakeup(), scsi_host_busy() is called and checked under the
host lock every time to decide whether the error handler kthread needs to
be woken up.

This can be too heavy in case of recovery, such as:

 - N hardware queues

 - queue depth is M for each hardware queue

 - each scsi_host_busy() iterates over (N * M) tag/requests

If recovery is triggered while all requests are in flight, the
scsi_eh_wakeup() calls are strictly serialized. By the time
scsi_eh_wakeup() is called for the last in-flight request,
scsi_host_busy() has run (N * M - 1) times and requests have been iterated
(N * M - 1) * (N * M) times.

If both N and M are big enough, a hard lockup can be triggered when
acquiring the host lock. This was observed on mpi3mr (128 hw queues, queue
depth 8169).
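
As a rough illustration of the cost described above, the following Python
sketch counts how many tag/request visits accumulate when every in-flight
request triggers one full scan. It is not kernel code, and the function
name is hypothetical; it only evaluates the formula quoted in this message.

```python
def total_tag_visits(hw_queues, queue_depth):
    """Evaluate (N * M - 1) * (N * M) for N hardware queues of depth M."""
    tags = hw_queues * queue_depth   # N * M tags/requests walked per scan
    scans = tags - 1                 # scans completed before the last wakeup
    return scans * tags


# Figures reported for mpi3mr in this commit message.
print(total_tag_visits(128, 8169))
```

With the reported mpi3mr figures the product is on the order of 10^12
visits, all of them serialized on the host lock.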

Fix the issue by calling scsi_host_busy() outside the host lock. We don't
need the host lock for getting the busy count because the host lock never
covers that.
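
The pattern can be sketched in Python as a toy model. This is an
illustration under stated assumptions, not the kernel implementation: the
Host class, its fields and both helper functions are hypothetical stand-ins,
and the wake-up test is modeled as an exact match between the busy count and
the failed count.

```python
import threading


class Host:
    """Toy stand-in for a SCSI host; every name here is hypothetical."""

    def __init__(self, in_flight):
        self.lock = threading.Lock()
        self.in_flight = list(in_flight)  # requests the expensive scan walks
        self.host_failed = 0
        self.eh_woken = False


def host_busy(host):
    # Expensive part: walks every in-flight request and takes no lock.
    return sum(1 for _ in host.in_flight)


def eh_scmd_add(host):
    # Do the scan first, then hold the lock only for the cheap bookkeeping.
    busy = host_busy(host)
    with host.lock:
        host.host_failed += 1
        if busy == host.host_failed:
            host.eh_woken = True


host = Host(in_flight=["cmd0", "cmd1", "cmd2"])
for _ in range(3):
    eh_scmd_add(host)
print(host.eh_woken)
```

The point of the ordering is that the time spent holding the lock no longer
grows with the number of in-flight requests.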

[mkp: Drop unnecessary 'busy' variables pointed out by Bart]

Cc: Ewan Milne &lt;emilne@redhat.com&gt;
Fixes: 6eb045e092ef ("scsi: core: avoid host-wide host_busy counter for scsi_mq")
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Link: https://lore.kernel.org/r/20240112070000.4161982-1-ming.lei@redhat.com
Reviewed-by: Ewan D. Milne &lt;emilne@redhat.com&gt;
Reviewed-by: Sathya Prakash Veerichetty &lt;safhya.prakash@broadcom.com&gt;
Tested-by: Sathya Prakash Veerichetty &lt;safhya.prakash@broadcom.com&gt;
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi</title>
<updated>2024-01-20T17:42:32Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-01-20T17:42:32Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=c25b24fa72c734f8cd6c31a13548013263b26286'/>
<id>urn:sha1:c25b24fa72c734f8cd6c31a13548013263b26286</id>
<content type='text'>
Pull SCSI updates from James Bottomley:
 "Final round of fixes that came in too late to send in the first
  request.

  It's nine bug fixes and one version update (because of a bug fix) and
  one set of PCI ID additions. There's one bug fix in the core which is
  really a one liner (except that an additional sdev pointer was added
  for convenience) and the rest are in drivers"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: target: core: Add TMF to tmr_list handling
  scsi: core: Kick the requeue list after inserting when flushing
  scsi: fnic: unlock on error path in fnic_queuecommand()
  scsi: fcoe: Fix unsigned comparison with zero in store_ctlr_mode()
  scsi: mpi3mr: Fix mpi3mr_fw.c kernel-doc warnings
  scsi: smartpqi: Bump driver version to 2.1.26-030
  scsi: smartpqi: Fix logical volume rescan race condition
  scsi: smartpqi: Add new controller PCI IDs
  scsi: ufs: qcom: Remove unnecessary goto statement from ufs_qcom_config_esi()
  scsi: ufs: core: Remove the ufshcd_hba_exit() call from ufshcd_async_scan()
  scsi: ufs: core: Simplify power management during async scan
</content>
</entry>
<entry>
<title>scsi: core: Kick the requeue list after inserting when flushing</title>
<updated>2024-01-12T02:37:43Z</updated>
<author>
<name>Niklas Cassel</name>
<email>cassel@kernel.org</email>
</author>
<published>2024-01-11T12:05:32Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=6df0e077d76bd144c533b61d6182676aae6b0a85'/>
<id>urn:sha1:6df0e077d76bd144c533b61d6182676aae6b0a85</id>
<content type='text'>
When libata calls ata_link_abort() to abort all queued ATA commands, it
calls blk_abort_request() on the SCSI command representing each QC.

This causes scsi_timeout() to be called, which calls scsi_eh_scmd_add() for
each SCSI command.

scsi_eh_scmd_add() sets the SCSI host to state recovery, and then adds the
command to shost-&gt;eh_cmd_q.

This will wake up the SCSI EH, and eventually the libata EH strategy
handler will be called, which calls scsi_eh_flush_done_q() to either flush
retry or flush finish each failed command.

The commands that are flush retried by scsi_eh_flush_done_q() are done so
using scsi_queue_insert().

Before commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if
necessary"), __scsi_queue_insert() called blk_mq_requeue_request() with the
second argument set to true, indicating that it should always kick/run the
requeue list after inserting.

After commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if
necessary"), __scsi_queue_insert() does not kick/run the requeue list after
inserting, if the current SCSI host state is recovery (which is the case in
the libata example above).

This optimization is probably fine in most cases, as I can only assume that
most often someone will eventually kick/run the queues.

However, that is not the case for scsi_eh_flush_done_q(), where we can see
that the request gets inserted to the requeue list, but the queue is never
started after the request has been inserted, leading to the block layer
waiting for the completion of a command that never gets to run.

Since scsi_eh_flush_done_q() is called by SCSI EH context, the SCSI host
state is most likely always in recovery when this function is called.

Thus, let scsi_eh_flush_done_q() explicitly kick the requeue list after
inserting a flush retry command, so that scsi_eh_flush_done_q() keeps the
same behavior as before commit 8b566edbdbfb ("scsi: core: Only kick the
requeue list if necessary").
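
The difference between the two behaviors can be sketched with a small
Python model. It is only an illustration: RequeueModel, queue_insert() and
flush_done_q() are hypothetical names that mimic the description above and
do not correspond to the block layer or SCSI API.

```python
class RequeueModel:
    """Toy model of a requeue list; not the block layer API."""

    def __init__(self):
        self.requeue_list = []
        self.dispatched = []

    def requeue(self, request, kick):
        self.requeue_list.append(request)
        if kick:
            self.kick()

    def kick(self):
        # Move everything parked on the requeue list back to dispatch.
        self.dispatched.extend(self.requeue_list)
        self.requeue_list.clear()


def queue_insert(queue, request, host_in_recovery):
    # Behavior described for after 8b566edbdbfb: no kick while in recovery.
    queue.requeue(request, kick=not host_in_recovery)


def flush_done_q(queue, retry_requests):
    for request in retry_requests:
        queue_insert(queue, request, host_in_recovery=True)
        # The fix described above: kick explicitly after inserting.
        queue.kick()


stuck = RequeueModel()
queue_insert(stuck, "retry-cmd", host_in_recovery=True)
print(stuck.requeue_list, stuck.dispatched)

fixed = RequeueModel()
flush_done_q(fixed, ["retry-cmd"])
print(fixed.requeue_list, fixed.dispatched)
```

In the first case the request stays parked on the requeue list; in the
second it reaches dispatch because of the explicit kick.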

Simple reproducer for the libata example above:
$ hdparm -Y /dev/sda
$ echo 1 &gt; /sys/class/scsi_device/0\:0\:0\:0/device/delete

Fixes: 8b566edbdbfb ("scsi: core: Only kick the requeue list if necessary")
Reported-by: Kevin Locke &lt;kevin@kevinlocke.name&gt;
Closes: https://lore.kernel.org/linux-scsi/ZZw3Th70wUUvCiCY@kevinlocke.name/
Signed-off-by: Niklas Cassel &lt;cassel@kernel.org&gt;
Link: https://lore.kernel.org/r/20240111120533.3612509-1-cassel@kernel.org
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Reviewed-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi</title>
<updated>2024-01-11T22:24:32Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-01-11T22:24:32Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=22d29f1112c85c1ad519a8c0403f7f7289cf060c'/>
<id>urn:sha1:22d29f1112c85c1ad519a8c0403f7f7289cf060c</id>
<content type='text'>
Pull SCSI updates from James Bottomley:
 "Updates to the usual drivers (ufs, mpi3mr, mpt3sas, lpfc, fnic,
  hisi_sas, arcmsr) plus the usual assorted minor fixes and updates.

  This time around there's only a single line update to the core, so
  nothing major and barely anything minor"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (135 commits)
  scsi: ufs: core: Simplify ufshcd_auto_hibern8_update()
  scsi: ufs: core: Rename ufshcd_auto_hibern8_enable() and make it static
  scsi: ufs: qcom: Fix ESI vector mask
  scsi: ufs: host: Fix kernel-doc warning
  scsi: hisi_sas: Correct the number of global debugfs registers
  scsi: hisi_sas: Rollback some operations if FLR failed
  scsi: hisi_sas: Check before using pointer variables
  scsi: hisi_sas: Replace with standard error code return value
  scsi: hisi_sas: Set .phy_attached before notifing phyup event HISI_PHYE_PHY_UP_PM
  scsi: ufs: core: Add sysfs node for UFS RTC update
  scsi: ufs: core: Add UFS RTC support
  scsi: ufs: core: Add ufshcd_is_ufs_dev_busy()
  scsi: ufs: qcom: Remove unused definitions
  scsi: ufs: qcom: Use ufshcd_rmwl() where applicable
  scsi: ufs: qcom: Remove support for host controllers older than v2.0
  scsi: ufs: qcom: Simplify ufs_qcom_{assert/deassert}_reset
  scsi: ufs: qcom: Initialize cycles_in_1us variable in ufs_qcom_set_core_clk_ctrl()
  scsi: ufs: qcom: Sort includes alphabetically
  scsi: ufs: qcom: Remove unused ufs_qcom_hosts struct array
  scsi: ufs: qcom: Use dev_err_probe() to simplify error handling of devm_gpiod_get_optional()
  ...
</content>
</entry>
<entry>
<title>scsi: core: Always send batch on reset or error handling command</title>
<updated>2023-12-19T02:09:41Z</updated>
<author>
<name>Alexander Atanasov</name>
<email>alexander.atanasov@virtuozzo.com</email>
</author>
<published>2023-12-15T12:10:08Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=066c5b46b6eaf2f13f80c19500dbb3b84baabb33'/>
<id>urn:sha1:066c5b46b6eaf2f13f80c19500dbb3b84baabb33</id>
<content type='text'>
In commit 8930a6c20791 ("scsi: core: add support for request batching") the
block layer bd-&gt;last flag was mapped to SCMD_LAST and used as an indicator
to send the batch for the drivers that implement this feature. However, the
error handling code was not updated accordingly.

scsi_send_eh_cmnd() is used to send error handling commands and request
sense. The problem is that request sense comes as a single command that
gets into the batch queue and times out. As a result the device goes
offline after several failed resets. This was observed on virtio_scsi
during a device resize operation.

[  496.316946] sd 0:0:4:0: [sdd] tag#117 scsi_eh_0: requesting sense
[  506.786356] sd 0:0:4:0: [sdd] tag#117 scsi_send_eh_cmnd timeleft: 0
[  506.787981] sd 0:0:4:0: [sdd] tag#117 abort

To fix this, always set the SCMD_LAST flag in scsi_send_eh_cmnd() and
scsi_reset_ioctl().
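
Why a single command can stall in a batching driver may be easier to see in
a toy model. The Python sketch below is an assumption-laden illustration:
BatchingDriver and send_eh_cmnd() are hypothetical names, and the "last"
argument merely plays the role that SCMD_LAST plays in the text above.

```python
class BatchingDriver:
    """Toy driver that only submits to hardware when a command is last."""

    def __init__(self):
        self.batch = []
        self.submitted = []

    def queuecommand(self, cmd, last):
        self.batch.append(cmd)
        if last:
            # Flush the whole batch to the (imaginary) device.
            self.submitted.extend(self.batch)
            self.batch.clear()


def send_eh_cmnd(driver, cmd, set_last):
    # set_last=False models the behavior before the fix.
    driver.queuecommand(cmd, last=set_last)
    # False means the command never reached the device and would time out.
    return cmd in driver.submitted


print(send_eh_cmnd(BatchingDriver(), "REQUEST SENSE", set_last=False))
print(send_eh_cmnd(BatchingDriver(), "REQUEST SENSE", set_last=True))
```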

Fixes: 8930a6c20791 ("scsi: core: add support for request batching")
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Alexander Atanasov &lt;alexander.atanasov@virtuozzo.com&gt;
Link: https://lore.kernel.org/r/20231215121008.2881653-1-alexander.atanasov@virtuozzo.com
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
</content>
</entry>
<entry>
<title>scsi: core: Add a precondition check in scsi_eh_scmd_add()</title>
<updated>2023-11-25T00:23:44Z</updated>
<author>
<name>Bart Van Assche</name>
<email>bvanassche@acm.org</email>
</author>
<published>2023-11-15T19:33:43Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=10b53db2db8dfda84b25833043f2b63123572af6'/>
<id>urn:sha1:10b53db2db8dfda84b25833043f2b63123572af6</id>
<content type='text'>
Calling scsi_eh_scmd_add() after SCMD_STATE_INFLIGHT has been cleared may
cause the error handler never to be woken up, because shost-&gt;host_failed
can then become larger than scsi_host_busy(shost). Hence complain if
scsi_eh_scmd_add() is called after SCMD_STATE_INFLIGHT has been cleared.
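
A minimal Python sketch of the failure mode, assuming the wake-up condition
is an exact match between the busy count and the failed count. Both function
names and the list-of-flags representation are hypothetical and exist only
for this illustration.

```python
def eh_would_wake(busy, host_failed):
    # Assumed wake-up condition: the two counters must match exactly.
    return busy == host_failed


def scmd_add(in_flight_flags, failed_index, host_failed):
    """Count one failed command and report whether EH would be woken.

    in_flight_flags[i] is True while command i still counts as busy.
    """
    if not in_flight_flags[failed_index]:
        # Models the complaint added by the precondition check.
        print("warning: command added to EH after in-flight was cleared")
    busy = sum(in_flight_flags)
    host_failed += 1
    return host_failed, eh_would_wake(busy, host_failed)


print(scmd_add([True], 0, 0))    # command still in flight
print(scmd_add([False], 0, 0))   # in-flight state already cleared
```

In the second call the failed count already exceeds the busy count, so the
assumed wake-up condition can no longer be met.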

Cc: Hannes Reinecke &lt;hare@suse.de&gt;
Cc: Damien Le Moal &lt;damien.lemoal@opensource.wdc.com&gt;
Cc: Mike Christie &lt;michael.christie@oracle.com&gt;
Cc: John Garry &lt;john.g.garry@oracle.com&gt;
Cc: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Link: https://lore.kernel.org/r/20231115193343.2262013-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
</content>
</entry>
<entry>
<title>scsi: sd: Handle read/write CDL timeout failures</title>
<updated>2023-05-22T21:05:19Z</updated>
<author>
<name>Niklas Cassel</name>
<email>niklas.cassel@wdc.com</email>
</author>
<published>2023-05-11T01:13:44Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=390e2d1a587405a522dc6b433d45648f895a352c'/>
<id>urn:sha1:390e2d1a587405a522dc6b433d45648f895a352c</id>
<content type='text'>
Commands using a duration limit descriptor that has limit policies set to a
value other than 0x0 may be failed by the device if one of the limits is
exceeded. For such commands, since the failure is the result of the user
duration limit configuration and workload, the commands should not be
retried and should be terminated immediately. Furthermore, to allow the
user to differentiate these "soft" failures from hard errors due to a
hardware problem, an error code other than EIO should be returned.

There are 2 cases to consider:

(1) The failure is due to a limit policy failing the command with a check
condition sense key, that is, any limit policy other than 0xD.  For this
case, scsi_check_sense() is modified to detect failures with the ABORTED
COMMAND sense key and the COMMAND TIMEOUT BEFORE PROCESSING or COMMAND
TIMEOUT DURING PROCESSING or COMMAND TIMEOUT DURING PROCESSING DUE TO ERROR
RECOVERY additional sense code. For these failures, a SUCCESS disposition
is returned so that scsi_finish_command() is called to terminate the
command.

(2) The failure is due to a limit policy set to 0xD, which results in the
command being terminated with a GOOD status, COMPLETED sense key, and DATA
CURRENTLY UNAVAILABLE additional sense code. To handle this case,
scsi_check_sense() is modified to return a SUCCESS disposition so that
scsi_finish_command() is called to terminate the command.  In addition,
scsi_decide_disposition() has to be modified to see if a command being
terminated with GOOD status has sense data.  This is as defined in SCSI
Primary Commands - 6 (SPC-6), so all according to spec, even if GOOD status
commands were not checked before.

If scsi_check_sense() detects sense data representing a duration limit,
scsi_check_sense() will set the newly introduced SCSI ML byte
SCSIML_STAT_DL_TIMEOUT. This SCSI ML byte is checked in scsi_noretry_cmd(),
so that a command that failed because of a CDL timeout cannot be
retried. The SCSI ML byte is also checked in scsi_result_to_blk_status() to
complete the command request with the BLK_STS_DURATION_LIMIT status, which
results in the user seeing ETIME errors for the failed commands.
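
The flow in the paragraph above can be summarized with a small Python
sketch. It is a simplified illustration, not the kernel logic: the enum
members only stand in for SCSIML_STAT_DL_TIMEOUT and BLK_STS_DURATION_LIMIT,
the function names are hypothetical, and every other failure is assumed to
map to a generic I/O error.

```python
from enum import Enum, auto


class MlByte(Enum):
    NONE = auto()
    DL_TIMEOUT = auto()       # stands in for SCSIML_STAT_DL_TIMEOUT


class BlkStatus(Enum):
    IOERR = auto()
    DURATION_LIMIT = auto()   # stands in for BLK_STS_DURATION_LIMIT


def noretry(ml_byte):
    # Simplification: only the CDL timeout case is modeled here.
    return ml_byte is MlByte.DL_TIMEOUT


def result_to_blk_status(ml_byte):
    if ml_byte is MlByte.DL_TIMEOUT:
        return BlkStatus.DURATION_LIMIT
    return BlkStatus.IOERR    # assumption: all other failures map to IOERR


def user_errno_name(status):
    names = {BlkStatus.DURATION_LIMIT: "ETIME", BlkStatus.IOERR: "EIO"}
    return names[status]


status = result_to_blk_status(MlByte.DL_TIMEOUT)
print(noretry(MlByte.DL_TIMEOUT), user_errno_name(status))
```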

Co-developed-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Signed-off-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Signed-off-by: Niklas Cassel &lt;niklas.cassel@wdc.com&gt;
Link: https://lore.kernel.org/r/20230511011356.227789-12-nks@flawful.org
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
</content>
</entry>
<entry>
<title>scsi: core: Allow libata to complete successful commands via EH</title>
<updated>2023-05-22T21:05:18Z</updated>
<author>
<name>Niklas Cassel</name>
<email>niklas.cassel@wdc.com</email>
</author>
<published>2023-05-11T01:13:37Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=3d848ca1ebc8d8864f25bd461914c93eff82a2d2'/>
<id>urn:sha1:3d848ca1ebc8d8864f25bd461914c93eff82a2d2</id>
<content type='text'>
In SCSI, we get the sense data as part of the completion; for ATA, however,
we need to fetch the sense data as an extra step. For an aborted ATA
command the sense data is fetched via libata's -&gt;eh_strategy_handler().

For Command Duration Limits policy 0xD:

  The device shall complete the command without error with the additional
  sense code set to DATA CURRENTLY UNAVAILABLE.

In order to handle this policy in libata, we intend to send a successful
command via SCSI EH, and let libata's -&gt;eh_strategy_handler() fetch the
sense data for the good command. This is similar to how we handle an
aborted ATA command, just that we need to read the Successful NCQ Commands
log instead of the NCQ Command Error log.

When we get a SATA completion with successful commands, ATA_SENSE will be
set, indicating that some commands in the completion have sense data.

The sense_valid bitmask in the Sense Data for Successful NCQ Commands log
indicates exactly which commands had sense data, which might be a subset of
all the commands that were completed in the same completion. (Yet all will
have ATA_SENSE set, since the status is per completion.)

The successful commands that have e.g. a "DATA CURRENTLY UNAVAILABLE" sense
data will have a SCSI ML byte set, so scsi_eh_flush_done_q() will not set
the scmd-&gt;result to DID_TIME_OUT for these commands. However, the
successful commands that did not have sense data must not get their result
marked as DID_TIME_OUT by SCSI EH either.

Add a new flag SCMD_FORCE_EH_SUCCESS, which tells SCSI EH to not mark a
command as DID_TIME_OUT, even if it has scmd-&gt;result == SAM_STAT_GOOD.
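
A minimal sketch of the decision this flag changes, written in Python as an
illustration only. The function name is hypothetical, a result of 0 models
SAM_STAT_GOOD with no other bytes set, and the string return value is just a
placeholder for marking the command as timed out.

```python
def finish_in_eh(result, force_eh_success):
    """Return the result a finished command leaves EH with (toy model)."""
    if result == 0 and not force_eh_success:
        # Default behavior: a good-looking result is marked as timed out.
        return "DID_TIME_OUT"
    # Either the result already carries information or the flag is set.
    return result


print(finish_in_eh(0, force_eh_success=False))
print(finish_in_eh(0, force_eh_success=True))
```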

This will be used by libata in a subsequent commit.

Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Signed-off-by: Niklas Cassel &lt;niklas.cassel@wdc.com&gt;
Link: https://lore.kernel.org/r/20230511011356.227789-5-nks@flawful.org
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
</content>
</entry>
<entry>
<title>scsi: core: Declare most SCSI host template pointers const</title>
<updated>2023-03-24T23:19:19Z</updated>
<author>
<name>Bart Van Assche</name>
<email>bvanassche@acm.org</email>
</author>
<published>2023-03-22T19:53:57Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=31435de9746670d884f84a3c094a401aa27747aa'/>
<id>urn:sha1:31435de9746670d884f84a3c094a401aa27747aa</id>
<content type='text'>
Prepare for constifying most SCSI host template pointers by constifying the
SCSI host template pointer arguments and variables in the SCSI core.

Reviewed-by: Benjamin Block &lt;bblock@linux.ibm.com&gt;
Reviewed-by: John Garry &lt;john.g.garry@oracle.com&gt;
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Hannes Reinecke &lt;hare@suse.de&gt;
Cc: Mike Christie &lt;michael.christie@oracle.com&gt;
Signed-off-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Link: https://lore.kernel.org/r/20230322195515.1267197-3-bvanassche@acm.org
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
</content>
</entry>
</feed>
