HDD failure on ks39197


Abicelli
15.10.2014, 15.40
Hello,
I have detected a failing HDD on server ks39197 (IP 91.121.20.206):

smartctl -a -d ata /dev/sda
Code:
smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     ST2000DM001-9YN164
Serial Number:    W2F02W41
Firmware Version: CC4C
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Thu Oct 16 18:30:44 2014 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 600) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   092   083   006    Pre-fail  Always       -       119890984
  3 Spin_Up_Time            0x0003   092   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       26
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       551888123
  9 Power_On_Hours          0x0032   074   074   000    Old_age   Always       -       23276
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       25
183 Runtime_Bad_Block       0x0032   096   096   000    Old_age   Always       -       4
184 End-to-End_Error        0x0032   001   001   099    Old_age   Always   FAILING_NOW 258
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       1173
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   071   061   045    Old_age   Always       -       29 (Lifetime Min/Max 28/30)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       23
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1483
194 Temperature_Celsius     0x0022   029   040   000    Old_age   Always       -       29 (0 17 0 0)
197 Current_Pending_Sector  0x0012   099   001   000    Old_age   Always       -       216
198 Offline_Uncorrectable   0x0010   099   001   000    Old_age   Offline      -       216
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       173160196496088
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       16944690517461
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       237593359918220

SMART Error Log Version: 1
ATA Error Count: 1093 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1093 occurred at disk power-on lifetime: 22412 hours (933 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00      02:13:04.926  READ DMA EXT
  ef 10 02 00 00 00 a0 00      02:13:04.926  SET FEATURES [Reserved for Serial ATA]
  27 00 00 00 00 00 e0 00      02:13:04.926  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00      02:13:04.925  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      02:13:04.925  SET FEATURES [Set transfer mode]

Error 1092 occurred at disk power-on lifetime: 22412 hours (933 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00      02:13:02.094  READ DMA EXT
  ef 10 02 00 00 00 a0 00      02:13:02.093  SET FEATURES [Reserved for Serial ATA]
  27 00 00 00 00 00 e0 00      02:13:02.093  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00      02:13:02.093  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      02:13:02.066  SET FEATURES [Set transfer mode]

Error 1091 occurred at disk power-on lifetime: 22412 hours (933 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00      02:12:59.191  READ DMA EXT
  ef 10 02 00 00 00 a0 00      02:12:59.191  SET FEATURES [Reserved for Serial ATA]
  27 00 00 00 00 00 e0 00      02:12:59.191  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00      02:12:59.191  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      02:12:59.191  SET FEATURES [Set transfer mode]

Error 1090 occurred at disk power-on lifetime: 22412 hours (933 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00      02:12:56.324  READ DMA EXT
  ef 10 02 00 00 00 a0 00      02:12:56.324  SET FEATURES [Reserved for Serial ATA]
  27 00 00 00 00 00 e0 00      02:12:56.324  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00      02:12:56.324  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      02:12:56.324  SET FEATURES [Set transfer mode]

Error 1089 occurred at disk power-on lifetime: 22412 hours (933 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00      02:12:53.476  READ DMA EXT
  ef 10 02 00 00 00 a0 00      02:12:53.476  SET FEATURES [Reserved for Serial ATA]
  27 00 00 00 00 00 e0 00      02:12:53.476  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00      02:12:53.475  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      02:12:53.475  SET FEATURES [Set transfer mode]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     22405         -
# 2  Short offline       Completed without error       00%     22405         -
# 3  Extended offline    Interrupted (host reset)      90%     22404         -
# 4  Short offline       Completed without error       00%      2419         -
# 5  Short offline       Completed without error       00%      2410         -
# 6  Short offline       Completed without error       00%      2410         -
# 7  Short offline       Completed without error       00%        13         -
# 8  Short offline       Completed without error       00%         3         -
# 9  Short offline       Completed without error       00%         3         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
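
To make the reading of the attribute table above explicit, here is a minimal Python sketch that flags the rows pointing at a failing disk. The regular expression, the `WATCH_RAW` list, and the function name `suspicious_attributes` are my own assumptions for illustration; they are not part of smartmontools. The sample rows are copied from the output above.

```python
import re

# Rows copied verbatim from the smartctl attribute table in this post.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
184 End-to-End_Error        0x0032   001   001   099    Old_age   Always   FAILING_NOW 258
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       1173
197 Current_Pending_Sector  0x0012   099   001   000    Old_age   Always       -       216
198 Offline_Uncorrectable   0x0010   099   001   000    Old_age   Offline      -       216
"""

# ID, name, flag, VALUE, WORST, THRESH, type, updated, WHEN_FAILED, RAW_VALUE
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+0x[0-9a-fA-F]{4}\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"\S+\s+\S+\s+(?P<when_failed>\S+)\s+(?P<raw>\d+)"
)

# Assumption: raw counters that are normally 0 on a healthy disk.
WATCH_RAW = {
    "Reallocated_Sector_Ct",
    "Reported_Uncorrect",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def suspicious_attributes(text):
    """Return (attribute name, list of reasons) for every row that looks bad."""
    findings = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header or unrelated line
        value, thresh, raw = int(m["value"]), int(m["thresh"]), int(m["raw"])
        reasons = []
        if m["when_failed"] != "-":
            reasons.append(m["when_failed"])
        # A threshold of 000 means the attribute has no failure threshold.
        if thresh > 0 and value <= thresh:
            reasons.append("value<=thresh")
        if m["name"] in WATCH_RAW and raw > 0:
            reasons.append(f"raw={raw}")
        if reasons:
            findings.append((m["name"], reasons))
    return findings


if __name__ == "__main__":
    for name, reasons in suspicious_attributes(SAMPLE):
        print(name, ", ".join(reasons))
```

On the sample rows this reports End-to-End_Error (FAILING_NOW, normalized value 001 below threshold 099) plus the non-zero raw counts for Reported_Uncorrect, Current_Pending_Sector and Offline_Uncorrectable. The overall "PASSED" health line on its own therefore does not mean the disk is healthy.
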
fdisk -l

Code:
Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x0003b27a

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        1306    10485760+  83  Linux
/dev/sda2            1306      243136  1942498304   8e  Linux LVM
Partition 2 does not start on physical sector boundary.
/dev/sda3          243136      243201      525920   82  Linux swap / Solaris
Partition 3 does not start on physical sector boundary.

Disk /dev/dm-0: 1988.0 GB, 1988037181440 bytes
255 heads, 63 sectors/track, 241698 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Alignment offset: 3584 bytes
Disk identifier: 0x00000000

Disk /dev/dm-0 doesn't contain a valid partition table
root@ks39197:~#
cat /proc/mdstat

Code:
 
root@ks39197:~# 
Personalities : [linear] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>
"dmesg" LOG:

Code:
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000001
ata1.00: failed command: READ DMA EXT
ata1.00: cmd 25/00:08:81:f9:b1/00:00:6f:00:00/e0 tag 0 dma 4096 in
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: configured for UDMA/133
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000001
ata1.00: failed command: READ DMA EXT
ata1.00: cmd 25/00:08:81:f9:b1/00:00:6f:00:00/e0 tag 0 dma 4096 in
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: configured for UDMA/133
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000001
ata1.00: failed command: READ DMA EXT
ata1.00: cmd 25/00:08:81:f9:b1/00:00:6f:00:00/e0 tag 0 dma 4096 in
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: configured for UDMA/133
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000001
ata1.00: failed command: READ DMA EXT
ata1.00: cmd 25/00:08:81:f9:b1/00:00:6f:00:00/e0 tag 0 dma 4096 in
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: configured for UDMA/133
Descriptor sense data with sense descriptors (in hex):
ata1: EH complete
end_request: I/O error, dev sda, sector 1873934721
         res 51/40:00:89:f9:b1/00:00:6f:00:00/00 Emask 0x9 (media error)
ata1.00: error: { UNC }
         res 51/40:00:89:f9:b1/00:00:6f:00:00/00 Emask 0x9 (media error)
ata1.00: error: { UNC }
         res 51/40:00:89:f9:b1/00:00:6f:00:00/00 Emask 0x9 (media error)
ata1.00: error: { UNC }
         res 51/40:00:89:f9:b1/00:00:6f:00:00/00 Emask 0x9 (media error)
ata1.00: error: { UNC }
         res 51/40:00:89:f9:b1/00:00:6f:00:00/00 Emask 0x9 (media error)
ata1.00: error: { UNC }
         res 51/40:00:89:f9:b1/00:00:6f:00:00/00 Emask 0x9 (media error)
ata1.00: error: { UNC }
sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
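
For anyone who wants to cross-check the log above, here is a small Python sketch that decodes the `cmd` taskfile from the kernel messages into a 48-bit LBA and estimates which partition the sector falls in. It assumes the usual libata layout `CMD/FEAT:NSECT:LBAL:LBAM:LBAH/HOB_FEAT:HOB_NSECT:HOB_LBAL:HOB_LBAM:HOB_LBAH/DEV`. The names `decode_taskfile` and `locate_partition` are hypothetical helpers, and the partition table is typed in from the cylinder-based fdisk output in this post, so the boundaries are only approximate.

```python
# Geometry reported by fdisk above: 255 heads * 63 sectors/track.
SECTORS_PER_CYLINDER = 255 * 63  # 16065

# (device, first cylinder, last cylinder), copied from the fdisk output.
PARTITIONS = [
    ("/dev/sda1", 1, 1306),
    ("/dev/sda2", 1306, 243136),
    ("/dev/sda3", 243136, 243201),
]


def decode_taskfile(tf):
    """Return (command, lba, sector_count) from a libata 'cmd' string."""
    cmd, low, high, _device = tf.split("/")
    _feat, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    _hfeat, hnsect, hlbal, hlbam, hlbah = (int(x, 16) for x in high.split(":"))
    # 48-bit LBA: the "high order" bytes carry bits 24..47.
    lba = (
        (hlbah << 40) | (hlbam << 32) | (hlbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    count = (hnsect << 8) | nsect
    return int(cmd, 16), lba, count


def locate_partition(lba):
    """Map an LBA to a 1-based fdisk cylinder and the first matching partition."""
    cylinder = lba // SECTORS_PER_CYLINDER + 1
    for device, first, last in PARTITIONS:
        if first <= cylinder <= last:
            return cylinder, device
    return cylinder, None


if __name__ == "__main__":
    cmd, lba, count = decode_taskfile("25/00:08:81:f9:b1/00:00:6f:00:00/e0")
    print(hex(cmd), lba, count, count * 512)
    print(locate_partition(lba))
```

The decoded values are command 0x25 (READ DMA EXT), LBA 1873934721 and 8 sectors (4096 bytes). This agrees with the "end_request: I/O error, dev sda, sector 1873934721" line and the "dma 4096 in" field. The sector falls well inside the cylinder range of /dev/sda2, the LVM partition.
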
Because of the continuous errors, the server is unusable.
I am waiting for /dev/sda to be replaced.
Thank you.