[SCSI] Handle disk devices which can not process medium access commands

We have experienced several devices which fail in a fashion we do not currently handle gracefully in SCSI. After a failure these devices will respond to the SCSI primary command set (INQUIRY, TEST UNIT READY, etc.) but any command accessing the storage medium will time out.

The following patch adds a callback that can be used by upper level drivers to inspect the results of an error handling command. This in turn has been used to implement additional checking in the SCSI disk driver.

If a medium access command fails twice but TEST UNIT READY succeeds both times in the subsequent error handling we will offline the device. The maximum number of failed commands required to take a device offline can be tweaked in sysfs.

Also add a new error flag to scsi_debug which allows this scenario to be easily reproduced.

[jejb: fix up integer parsing to use kstrtouint]
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
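
The offlining policy described above can be modelled in a few lines of user-space C. This is only a sketch, not the code from the disk driver: the type `disk_state`, the enums, and the functions `disk_eh_action` and `disk_medium_access_ok` are hypothetical names invented for illustration, and resetting the counter when a medium access command completes is an assumption of the sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel values; illustrative only. */
enum eh_result { EH_SUCCESS, EH_FAILED };
enum eh_opcode { OP_TEST_UNIT_READY, OP_OTHER };

struct disk_state {
	unsigned int medium_access_timed_out;    /* timeouts counted so far */
	unsigned int max_medium_access_timeouts; /* tunable threshold */
	bool offline;
};

/*
 * Model of an upper level driver callback that inspects the outcome of an
 * error handling command and may override it.
 */
enum eh_result disk_eh_action(struct disk_state *st,
			      bool failed_cmd_is_medium_access,
			      bool failed_cmd_timed_out,
			      enum eh_opcode eh_op,
			      enum eh_result eh_disp)
{
	assert(st != NULL);

	/* Only count: medium access timeout followed by a successful TUR. */
	if (!failed_cmd_is_medium_access || !failed_cmd_timed_out ||
	    eh_op != OP_TEST_UNIT_READY || eh_disp != EH_SUCCESS)
		return eh_disp;

	st->medium_access_timed_out++;

	/* Threshold reached: the device answers but the medium is unusable. */
	if (st->medium_access_timed_out >= st->max_medium_access_timeouts) {
		st->offline = true;
		return EH_FAILED;
	}

	return eh_disp;
}

/* Assumption of this sketch: a completed medium access resets the count. */
void disk_medium_access_ok(struct disk_state *st)
{
	assert(st != NULL);
	st->medium_access_timed_out = 0;
}
```

With a threshold of 2, the first qualifying TEST UNIT READY success leaves the device online, and the second marks it offline and turns the error handler's result into a failure.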
author: Martin K. Petersen <martin.petersen@oracle.com> 2012-02-09 13:48:53 -0500
committer: James Bottomley <JBottomley@Parallels.com> 2012-02-19 10:14:52 -0600
commit: 18a4d0a22ed6c54b67af7718c305cd010f09ddf8 (patch)
tree: 06e22a92290ff84b2c1d5abb09424493de384c4b /drivers/scsi/scsi_error.c
parent: a78e21dc5e9f896ecee5b1fbe189690dfcca38e1 (diff)
1 files changed, 9 insertions, 3 deletions
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index f66e90db3bee..2cfcbffa41fd 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -30,6 +30,7 @@
 #include <scsi/scsi_cmnd.h>
 #include <scsi/scsi_dbg.h>
 #include <scsi/scsi_device.h>
+#include <scsi/scsi_driver.h>
 #include <scsi/scsi_eh.h>
 #include <scsi/scsi_transport.h>
 #include <scsi/scsi_host.h>
@@ -141,11 +142,11 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
 	else if (host->hostt->eh_timed_out)
 		rtn = host->hostt->eh_timed_out(scmd);
 
+	scmd->result |= DID_TIME_OUT << 16;
+
 	if (unlikely(rtn == BLK_EH_NOT_HANDLED &&
-		     !scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD))) {
-		scmd->result |= DID_TIME_OUT << 16;
+		     !scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD)))
 		rtn = BLK_EH_HANDLED;
-	}
 
 	return rtn;
 }
@@ -778,6 +779,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 			     int cmnd_size, int timeout, unsigned sense_bytes)
 {
 	struct scsi_device *sdev = scmd->device;
+	struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
 	struct Scsi_Host *shost = sdev->host;
 	DECLARE_COMPLETION_ONSTACK(done);
 	unsigned long timeleft;
@@ -832,6 +834,10 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 	}
 
 	scsi_eh_restore_cmnd(scmd, &ses);
+
+	if (sdrv->eh_action)
+		rtn = sdrv->eh_action(scmd, cmnd, cmnd_size, rtn);
+
 	return rtn;
 }
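
The first hunk above makes the `DID_TIME_OUT` assignment unconditional, so the host byte is already set when the command is handed to error handling. One plausible reading is that this lets a later `eh_action` callback recognise that the original command timed out. The following user-space sketch shows how the host byte is packed into and read back from the result word. The constant and the `host_byte()` macro mirror the definitions in the kernel's SCSI headers, while `mark_timed_out`, `timed_out`, and `demo` are hypothetical helpers written for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirroring the kernel's SCSI headers. */
#define DID_TIME_OUT 0x03
#define host_byte(result) (((result) >> 16) & 0xff)

/* Hypothetical helper: set the host byte the way scsi_times_out() does. */
int mark_timed_out(int result)
{
	return result | (DID_TIME_OUT << 16);
}

/* Hypothetical helper: test a callback could apply to the result word. */
bool timed_out(int result)
{
	return host_byte(result) == DID_TIME_OUT;
}

/* Usage: the status byte in the low bits is left untouched. */
void demo(void)
{
	int result = mark_timed_out(0x02);

	assert(timed_out(result));
	assert((result & 0xff) == 0x02);
}
```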