Very slow software RAID-1


mac
02.07.2015, 04.43
Thank goodness!

PFM
01.07.2015, 19.21
Update:

After exchanging a few emails with SoYouStart technical support, disk "sda" was replaced today.

The "surprise" is that disk "sdb" has to be replaced as well! It turns out that one is damaged too.

Right now I am rebuilding the RAID-1 array by replicating sdb, and its partitions, onto sda.

Once that is done, I will ask them to go ahead and replace disk "sdb", and then I should be all set!
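
While a software RAID-1 rebuild like this is running, the kernel reports its progress in /proc/mdstat. The following is a minimal sketch, assuming the usual md progress line format; the sample text and the array and partition names (md2, sda2, sdb2) are illustrative and were not posted in this thread.

```python
import re

# Illustrative /proc/mdstat excerpt (NOT from this thread); names are assumptions.
SAMPLE_MDSTAT = """\
md2 : active raid1 sda2[2] sdb2[1]
      1952333760 blocks [2/1] [_U]
      [=>...................]  recovery =  8.3% (162345600/1952333760) finish=210.4min speed=141720K/sec
"""

# Matches the progress line that md prints while an array is recovering or resyncing.
PROGRESS_RE = re.compile(
    r"(recovery|resync)\s*=\s*([\d.]+)%.*?finish=([\d.]+)min.*?speed=(\d+)K/sec"
)


def rebuild_progress(mdstat_text):
    """Return (kind, percent, minutes_left, speed_k_per_sec) or None if nothing is rebuilding."""
    match = PROGRESS_RE.search(mdstat_text)
    if match is None:
        return None
    kind, percent, minutes, speed = match.groups()
    return kind, float(percent), float(minutes), int(speed)


if __name__ == "__main__":
    # On a real server, read the live file instead: open("/proc/mdstat").read()
    print(rebuild_progress(SAMPLE_MDSTAT))
```

The speed field is probably the most useful one here: a rebuild that reads from a disk with damaged sectors can be expected to run much slower than usual.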

mac
29.06.2015, 21.46
Code:
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Aborted by host               10%     23471         -
There is a problem: the test did not complete (unless you interrupted it yourself).
In my opinion the disk needs to be replaced.
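
Besides the aborted test, the attribute table in the same report points the same way, even though the overall health result still reads PASSED. Below is a small sketch that flags non-zero raw values for a few counters; the watch list is an assumption chosen for illustration, not an official threshold, and the sample rows are copied from the smartctl output posted in this thread.

```python
# Counters often treated as warning signs when their raw value is non-zero.
# This watch list is an assumption for illustration only.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reported_Uncorrect",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}

# Attribute rows copied from the smartctl report in this thread.
SAMPLE_ROWS = """\
  5 Reallocated_Sector_Ct   0x0033   091   091   036    Pre-fail  Always       -       12808
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       9566
197 Current_Pending_Sector  0x0012   100   026   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   026   000    Old_age   Offline      -       0
"""


def suspicious_attributes(table_text):
    """Return {attribute_name: raw_value} for watched attributes with a non-zero raw value."""
    found = {}
    for line in table_text.splitlines():
        fields = line.split()
        # A data row has at least 10 columns and starts with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            found[name] = int(raw)
    return found


if __name__ == "__main__":
    print(suspicious_attributes(SAMPLE_ROWS))
```

On the rows above this reports 12808 reallocated sectors and 9566 reported uncorrectable errors.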

PFM
29.06.2015, 21.32
Here is the result:

Code:
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-3.10.23mio-std-ipv6-64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda (SATA 3Gb/s, 4K Sectors)
Device Model:     ST2000DM001-9YN164
Serial Number:    Z1E1E2M3
LU WWN Device Id: 5 000c50 04e53c186
Firmware Version: CC4H
User Capacity:    2,000,398,934,016 bytes [2,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Mon Jun 29 22:31:06 2015 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (  17) The self-test routine was aborted by
                                        the host.
Total time to complete Offline
data collection:                (  575) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 222) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   120   084   006    Pre-fail  Always       -       241949592
  3 Spin_Up_Time            0x0003   095   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       47
  5 Reallocated_Sector_Ct   0x0033   091   091   036    Pre-fail  Always       -       12808
  7 Seek_Error_Rate         0x000f   076   058   030    Pre-fail  Always       -       73807005881
  9 Power_On_Hours          0x0032   074   074   000    Old_age   Always       -       23471
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       47
183 Runtime_Bad_Block       0x0032   097   097   000    Old_age   Always       -       3
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       9566
188 Command_Timeout         0x0032   100   098   000    Old_age   Always       -       12885164037
189 High_Fly_Writes         0x003a   098   098   000    Old_age   Always       -       2
190 Airflow_Temperature_Cel 0x0022   064   055   045    Old_age   Always       -       36 (Min/Max 31/40)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       46
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       182
194 Temperature_Celsius     0x0022   036   045   000    Old_age   Always       -       36 (0 20 0 0 0)
197 Current_Pending_Sector  0x0012   100   026   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   026   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       239895398340965
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       58651453345418
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       213276895057001

SMART Error Log Version: 1
ATA Error Count: 9566 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 9566 occurred at disk power-on lifetime: 23397 hours (974 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 58 15 71 02  Error: UNC at LBA = 0x02711558 = 40965464

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 58 15 71 42 00  23d+07:52:18.486  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00  23d+07:52:18.486  SET FEATURES [Reserved for Serial ATA]
  27 00 00 00 00 00 e0 00  23d+07:52:18.486  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00  23d+07:52:18.486  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  23d+07:52:18.486  SET FEATURES [Set transfer mode]

Error 9565 occurred at disk power-on lifetime: 23397 hours (974 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 58 15 71 02  Error: UNC at LBA = 0x02711558 = 40965464

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 58 15 71 42 00  23d+07:52:15.536  READ FPDMA QUEUED
  60 00 00 00 16 71 42 00  23d+07:52:15.536  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00  23d+07:52:15.535  SET FEATURES [Reserved for Serial ATA]
  27 00 00 00 00 00 e0 00  23d+07:52:15.535  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00  23d+07:52:15.535  IDENTIFY DEVICE

Error 9564 occurred at disk power-on lifetime: 23397 hours (974 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 58 15 71 02  Error: UNC at LBA = 0x02711558 = 40965464

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 00 0d 71 42 00  23d+07:52:12.604  READ FPDMA QUEUED
  60 00 00 00 0c 71 42 00  23d+07:52:12.603  READ FPDMA QUEUED
  60 00 00 00 0b 71 42 00  23d+07:52:12.602  READ FPDMA QUEUED
  60 00 00 00 0a 71 42 00  23d+07:52:12.602  READ FPDMA QUEUED
  60 00 00 00 08 71 42 00  23d+07:52:12.601  READ FPDMA QUEUED

Error 9563 occurred at disk power-on lifetime: 23397 hours (974 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00  23d+07:42:31.757  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00  23d+07:42:31.754  SET FEATURES [Reserved for Serial ATA]
  27 00 00 00 00 00 e0 00  23d+07:42:31.754  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00  23d+07:42:31.754  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  23d+07:42:31.754  SET FEATURES [Set transfer mode]

Error 9562 occurred at disk power-on lifetime: 23397 hours (974 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00  23d+07:42:28.937  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  23d+07:42:28.921  READ FPDMA QUEUED
  60 00 08 18 28 71 42 00  23d+07:42:28.911  READ FPDMA QUEUED
  60 00 08 80 07 71 42 00  23d+07:42:28.911  READ FPDMA QUEUED
  60 00 00 e0 0b 71 42 00  23d+07:42:28.911  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Aborted by host               10%     23471         -
# 2  Short offline       Completed without error       00%     23464         -
# 3  Short offline       Completed without error       00%     10512         -
# 4  Short offline       Completed without error       00%     10504         -
# 5  Short offline       Completed without error       00%     10504         -
# 6  Short offline       Completed without error       00%        26         -
# 7  Short offline       Completed without error       00%        22         -
# 8  Short offline       Completed without error       00%        22         -
# 9  Short offline       Completed without error       00%        21         -
#10  Short offline       Completed without error       00%        21         -
#11  Short offline       Completed without error       00%        13         -
#12  Short offline       Completed without error       00%        13         -
#13  Short offline       Completed without error       00%        11         -
#14  Short offline       Completed without error       00%         1         -
#15  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

PFM
29.06.2015, 15.15
OK, thanks Mac, I'll go ahead!

mac
29.06.2015, 15.14
This one reports no errors. Run a "long" test (it takes a few hours), then see whether it says "Completed without error" or something else.
You can view the report whenever you like; the Remaining column shows how much of the test is still left.
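
To check the Status and Remaining columns without reading the whole report, the self-test log lines can be picked out with a regular expression. This is only a sketch that assumes the column layout smartctl 5.43 prints in this thread; the two sample lines are copied from a report posted here.

```python
import re

# One self-test log row: "# N  <description>  <status>  NN%  <hours>  <LBA or ->".
# Columns are assumed to be separated by two or more spaces, as in this thread's output.
SELFTEST_RE = re.compile(
    r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s{2,}(\d+)%\s+(\d+)\s+(\S+)\s*$"
)

# Log lines copied from a smartctl report posted in this thread.
SAMPLE_LOG = """\
# 1  Extended offline    Aborted by host               10%     23471         -
# 2  Short offline       Completed without error       00%     23464         -
"""


def parse_selftest_log(text):
    """Return a list of dicts, one per self-test log row found in the text."""
    entries = []
    for line in text.splitlines():
        match = SELFTEST_RE.match(line)
        if match is None:
            continue
        num, description, status, remaining, hours, lba = match.groups()
        entries.append({
            "num": int(num),
            "description": description,
            "status": status,
            "remaining_percent": int(remaining),
            "lifetime_hours": int(hours),
            "first_error_lba": None if lba == "-" else lba,
        })
    return entries


if __name__ == "__main__":
    for entry in parse_selftest_log(SAMPLE_LOG):
        print(entry["description"], "->", entry["status"], f'({entry["remaining_percent"]}% remaining)')
```

Entry number 1 is always the most recent test, so that is the row to check after a long test.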

PFM
29.06.2015, 14.09
Hi, thanks.

I let it run for 8 minutes, then ran the SMART report again and got:

Code:
# smartctl -a /dev/sda
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-3.10.23mio-std-ipv6-64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda (SATA 3Gb/s, 4K Sectors)
Device Model:     ST2000DM001-9YN164
Serial Number:    Z1E1E2M3
LU WWN Device Id: 5 000c50 04e53c186
Firmware Version: CC4H
User Capacity:    2,000,398,934,016 bytes [2,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Mon Jun 29 15:08:06 2015 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  575) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 222) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   084   006    Pre-fail  Always       -       176465624
  3 Spin_Up_Time            0x0003   095   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       47
  5 Reallocated_Sector_Ct   0x0033   091   091   036    Pre-fail  Always       -       12808
  7 Seek_Error_Rate         0x000f   076   058   030    Pre-fail  Always       -       73806591393
  9 Power_On_Hours          0x0032   074   074   000    Old_age   Always       -       23464
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       47
183 Runtime_Bad_Block       0x0032   097   097   000    Old_age   Always       -       3
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       9566
188 Command_Timeout         0x0032   100   098   000    Old_age   Always       -       12885164037
189 High_Fly_Writes         0x003a   098   098   000    Old_age   Always       -       2
190 Airflow_Temperature_Cel 0x0022   062   055   045    Old_age   Always       -       38 (Min/Max 31/38)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       46
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       182
194 Temperature_Celsius     0x0022   038   045   000    Old_age   Always       -       38 (0 20 0 0 0)
197 Current_Pending_Sector  0x0012   100   026   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   026   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       180783763446109
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       58556762625275
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       213211805873756

SMART Error Log Version: 1
ATA Error Count: 9566 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 9566 occurred at disk power-on lifetime: 23397 hours (974 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 58 15 71 02  Error: UNC at LBA = 0x02711558 = 40965464

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 58 15 71 42 00  23d+07:52:18.486  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00  23d+07:52:18.486  SET FEATURES [Reserved for Serial ATA]
  27 00 00 00 00 00 e0 00  23d+07:52:18.486  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00  23d+07:52:18.486  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  23d+07:52:18.486  SET FEATURES [Set transfer mode]

Error 9565 occurred at disk power-on lifetime: 23397 hours (974 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 58 15 71 02  Error: UNC at LBA = 0x02711558 = 40965464

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 58 15 71 42 00  23d+07:52:15.536  READ FPDMA QUEUED
  60 00 00 00 16 71 42 00  23d+07:52:15.536  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00  23d+07:52:15.535  SET FEATURES [Reserved for Serial ATA]
  27 00 00 00 00 00 e0 00  23d+07:52:15.535  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00  23d+07:52:15.535  IDENTIFY DEVICE

Error 9564 occurred at disk power-on lifetime: 23397 hours (974 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 58 15 71 02  Error: UNC at LBA = 0x02711558 = 40965464

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 00 0d 71 42 00  23d+07:52:12.604  READ FPDMA QUEUED
  60 00 00 00 0c 71 42 00  23d+07:52:12.603  READ FPDMA QUEUED
  60 00 00 00 0b 71 42 00  23d+07:52:12.602  READ FPDMA QUEUED
  60 00 00 00 0a 71 42 00  23d+07:52:12.602  READ FPDMA QUEUED
  60 00 00 00 08 71 42 00  23d+07:52:12.601  READ FPDMA QUEUED

Error 9563 occurred at disk power-on lifetime: 23397 hours (974 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00  23d+07:42:31.757  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00  23d+07:42:31.754  SET FEATURES [Reserved for Serial ATA]
  27 00 00 00 00 00 e0 00  23d+07:42:31.754  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00  23d+07:42:31.754  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  23d+07:42:31.754  SET FEATURES [Set transfer mode]

Error 9562 occurred at disk power-on lifetime: 23397 hours (974 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00  23d+07:42:28.937  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  23d+07:42:28.921  READ FPDMA QUEUED
  60 00 08 18 28 71 42 00  23d+07:42:28.911  READ FPDMA QUEUED
  60 00 08 80 07 71 42 00  23d+07:42:28.911  READ FPDMA QUEUED
  60 00 00 e0 0b 71 42 00  23d+07:42:28.911  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     23464         -
# 2  Short offline       Completed without error       00%     10512         -
# 3  Short offline       Completed without error       00%     10504         -
# 4  Short offline       Completed without error       00%     10504         -
# 5  Short offline       Completed without error       00%        26         -
# 6  Short offline       Completed without error       00%        22         -
# 7  Short offline       Completed without error       00%        22         -
# 8  Short offline       Completed without error       00%        21         -
# 9  Short offline       Completed without error       00%        21         -
#10  Short offline       Completed without error       00%        13         -
#11  Short offline       Completed without error       00%        13         -
#12  Short offline       Completed without error       00%        11         -
#13  Short offline       Completed without error       00%         1         -
#14  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

mac
29.06.2015, 12.32
smartctl -t short /dev/sda
then leave it for 2-3 minutes and check the results with
smartctl -a /dev/sda
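
The same two steps can be scripted. This is a rough sketch, not a tested tool: the function name and the `run` parameter are hypothetical (the latter only exists so the function can be tried without a real disk), and it assumes smartctl is installed and that the script runs as root.

```python
import subprocess
import time


def short_test_report(device, wait_seconds=180, run=subprocess.run):
    """Start a short self-test, wait, then return the text of `smartctl -a`."""
    # No check=True here: smartctl uses its exit status as a bit mask, so a
    # non-zero value does not necessarily mean the command itself failed.
    run(["smartctl", "-t", "short", device], capture_output=True, text=True)
    # Give the drive time to finish; the thread suggests 2-3 minutes.
    time.sleep(wait_seconds)
    report = run(["smartctl", "-a", device], capture_output=True, text=True)
    return report.stdout


if __name__ == "__main__":
    print(short_test_report("/dev/sda"))
```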

PFM
29.06.2015, 12.04
I don't remember the command for the short test, could you help me please? Thanks again for your help!

mac
29.06.2015, 11.37
I would say so.
Try running a "short" test and see whether it completes or stops because of errors...

PFM
29.06.2015, 11.28
This is the S.M.A.R.T. report for disk sda, what do you think?

From what I can make out, there may be damaged sectors at some of the locations listed in the log...
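
Those locations appear in the error log as lines of the form "Error: UNC at LBA = 0x... = ...". A short sketch that collects the distinct addresses follows; the sample lines are copied from the report in this thread, and the function name is just an illustration.

```python
import re

# Matches e.g. "Error: UNC at LBA = 0x02711558 = 40965464".
LBA_RE = re.compile(r"Error: (\w+) at LBA = 0x([0-9a-fA-F]+) = (\d+)")

# Error log lines copied from the smartctl report in this thread.
SAMPLE_ERRORS = """\
  40 51 00 58 15 71 02  Error: UNC at LBA = 0x02711558 = 40965464
  40 51 00 58 15 71 02  Error: UNC at LBA = 0x02711558 = 40965464
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
"""


def failing_lbas(report_text):
    """Return the sorted distinct LBAs mentioned in the error log lines."""
    lbas = set()
    for _kind, hex_lba, dec_lba in LBA_RE.findall(report_text):
        lba = int(hex_lba, 16)
        # Keep only entries whose hex and decimal forms agree.
        if lba == int(dec_lba):
            lbas.add(lba)
    return sorted(lbas)


if __name__ == "__main__":
    print(failing_lbas(SAMPLE_ERRORS))
```

Note that 0x0fffffff is the largest value that fits in 28 bits, so that entry may be a saturated placeholder rather than a real sector address. Also, the log keeps only the five most recent errors out of 9566, so this list is far from complete.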

Code:
smartctl -a /dev/sda
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-3.10.23mio-std-ipv6-64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda (SATA 3Gb/s, 4K Sectors)
Device Model:     ST2000DM001-9YN164
Serial Number:    Z1E1E2M3
LU WWN Device Id: 5 000c50 04e53c186
Firmware Version: CC4H
User Capacity:    2,000,398,934,016 bytes [2,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Mon Jun 29 12:25:52 2015 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  575) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 222) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   084   006    Pre-fail  Always       -       140355672
  3 Spin_Up_Time            0x0003   095   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       47
  5 Reallocated_Sector_Ct   0x0033   091   091   036    Pre-fail  Always       -       12808
  7 Seek_Error_Rate         0x000f   076   058   030    Pre-fail  Always       -       73806220717
  9 Power_On_Hours          0x0032   074   074   000    Old_age   Always       -       23461
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       47
183 Runtime_Bad_Block       0x0032   097   097   000    Old_age   Always       -       3
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       9566
188 Command_Timeout         0x0032   100   098   000    Old_age   Always       -       12885164037
189 High_Fly_Writes         0x003a   098   098   000    Old_age   Always       -       2
190 Airflow_Temperature_Cel 0x0022   065   055   045    Old_age   Always       -       35 (Min/Max 31/37)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       46
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       182
194 Temperature_Celsius     0x0022   035   045   000    Old_age   Always       -       35 (0 20 0 0 0)
197 Current_Pending_Sector  0x0012   100   026   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   026   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       264496971012443
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       58451333989161
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       213181693395278

SMART Error Log Version: 1
ATA Error Count: 9566 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 9566 occurred at disk power-on lifetime: 23397 hours (974 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 58 15 71 02  Error: UNC at LBA = 0x02711558 = 40965464

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 58 15 71 42 00  23d+07:52:18.486  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00  23d+07:52:18.486  SET FEATURES [Reserved for Serial ATA]
  27 00 00 00 00 00 e0 00  23d+07:52:18.486  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00  23d+07:52:18.486  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  23d+07:52:18.486  SET FEATURES [Set transfer mode]

Error 9565 occurred at disk power-on lifetime: 23397 hours (974 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 58 15 71 02  Error: UNC at LBA = 0x02711558 = 40965464

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 58 15 71 42 00  23d+07:52:15.536  READ FPDMA QUEUED
  60 00 00 00 16 71 42 00  23d+07:52:15.536  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00  23d+07:52:15.535  SET FEATURES [Reserved for Serial ATA]
  27 00 00 00 00 00 e0 00  23d+07:52:15.535  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00  23d+07:52:15.535  IDENTIFY DEVICE

Error 9564 occurred at disk power-on lifetime: 23397 hours (974 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 58 15 71 02  Error: UNC at LBA = 0x02711558 = 40965464

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 00 0d 71 42 00  23d+07:52:12.604  READ FPDMA QUEUED
  60 00 00 00 0c 71 42 00  23d+07:52:12.603  READ FPDMA QUEUED
  60 00 00 00 0b 71 42 00  23d+07:52:12.602  READ FPDMA QUEUED
  60 00 00 00 0a 71 42 00  23d+07:52:12.602  READ FPDMA QUEUED
  60 00 00 00 08 71 42 00  23d+07:52:12.601  READ FPDMA QUEUED

Error 9563 occurred at disk power-on lifetime: 23397 hours (974 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00  23d+07:42:31.757  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00  23d+07:42:31.754  SET FEATURES [Reserved for Serial ATA]
  27 00 00 00 00 00 e0 00  23d+07:42:31.754  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00  23d+07:42:31.754  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  23d+07:42:31.754  SET FEATURES [Set transfer mode]

Error 9562 occurred at disk power-on lifetime: 23397 hours (974 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00  23d+07:42:28.937  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00  23d+07:42:28.921  READ FPDMA QUEUED
  60 00 08 18 28 71 42 00  23d+07:42:28.911  READ FPDMA QUEUED
  60 00 08 80 07 71 42 00  23d+07:42:28.911  READ FPDMA QUEUED
  60 00 00 e0 0b 71 42 00  23d+07:42:28.911  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     10512         -
# 2  Short offline       Completed without error       00%     10504         -
# 3  Short offline       Completed without error       00%     10504         -
# 4  Short offline       Completed without error       00%        26         -
# 5  Short offline       Completed without error       00%        22         -
# 6  Short offline       Completed without error       00%        22         -
# 7  Short offline       Completed without error       00%        21         -
# 8  Short offline       Completed without error       00%        21         -
# 9  Short offline       Completed without error       00%        13         -
#10  Short offline       Completed without error       00%        13         -
#11  Short offline       Completed without error       00%        11         -
#12  Short offline       Completed without error       00%         1         -
#13  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
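
For anyone reading the error log above: the "UNC at LBA" value can be rebuilt from the register dump, and the "974 days + 21 hours" figure is just the power-on hours split into days. Below is a minimal sketch, assuming the usual 28-bit layout (low nibble of DH, then CH, CL, SN); the function names are made up for illustration and are not part of smartctl.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Assemble a 28-bit LBA from the SN, CL, CH and DH register bytes.

    Assumption: only the low 4 bits of DH carry address bits 24..27.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def split_power_on_hours(hours):
    """Split a power-on lifetime in hours into (days, remaining_hours)."""
    return divmod(hours, 24)


# Register values copied from "Error 9566" in the log above.
lba = lba28_from_registers(sn=0x58, cl=0x15, ch=0x71, dh=0x02)
print(hex(lba), lba)

days, hours = split_power_on_hours(23397)
print(days, "days +", hours, "hours")
```

The last three errors all point at the same LBA, which suggests the drive keeps failing to read one specific sector rather than reporting scattered faults.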

mac
29.06.2015, 11.04
Yes, the commands are correct.

From rescue pro (but you can also do it from your server without rebooting), you can check the S.M.A.R.T. status of the disk (it takes very little time).
And, if needed, check the filesystem (to be done with caution).
From the earlier posts I gather that you have already seen the report from smartctl -a /dev/sda; however, I seem to recall that OVH "wants" (or at least prefers) the results of their own diagnostics... I would run it anyway.
Then send all the logs with the errors you have through a ticket and see what they tell you.

PFM
29.06.2015, 10.12
Thanks for your reply MAC, so to detach the SDA disk from the RAID-1 I should run the "inverse" commands:

Code:
# mdadm --manage /dev/md2 --remove /dev/sda2 
# mdadm --manage /dev/md3 --remove /dev/sda3
And then go into RescuePro and run the test on the individual HDDs?

I had tried RescuePro a few days ago, because I had received a RAID-1 degraded message and the machine had rebooted into RescuePro on its own.

However, the disk test (which, to be honest, lasted only a few seconds) did not detect any errors... admittedly, the test really lasted a fraction of a second, and I don't know whether that is normal...
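
A note on the removal commands above: as far as I know, mdadm refuses to hot-remove a member that is still active in the array, so the usual sequence is to mark the partition as faulty with --fail and only then --remove it. The sketch below only builds and prints that command sequence without running anything; the helper name removal_commands is hypothetical, and the array/partition pairs are the ones from this thread.

```python
def removal_commands(members):
    """Return mdadm command lines that fail and then remove each member.

    members: iterable of (array, partition) pairs.
    Nothing is executed here; the commands are only built as argument lists.
    """
    commands = []
    for array, partition in members:
        # Mark the member as faulty first, so that the array lets it go.
        commands.append(["mdadm", "--manage", array, "--fail", partition])
        commands.append(["mdadm", "--manage", array, "--remove", partition])
    return commands


# Pairs taken from this thread; adjust them to the actual layout.
SDA_MEMBERS = [("/dev/md2", "/dev/sda2"), ("/dev/md3", "/dev/sda3")]

for command in removal_commands(SDA_MEMBERS):
    print(" ".join(command))
```

Printing the commands first makes it easy to double-check the array and partition names before running them by hand as root.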

mac
28.06.2015, 14.25
I agree that this could be a hardware problem.
Remove it from the RAID, back up everything you cannot afford to lose, reboot into rescue, run a check on the disk, note down the serial number and, if necessary, open a ticket quoting the serial number of the disk.

PFM
28.06.2015, 11.43
Good morning,

the day before yesterday I noticed that one of the two 2TB SATA disks in my SoYouStart dedicated server, specifically the "sda" disk, had "dropped out" of the software RAID-1.

The software RAID was reported as "degraded".

Querying the disk with the command:

smartctl -a /dev/sda1

I noticed that it has a very high number of errors, 9566 to be precise [Error 9566 occurred at disk power-on lifetime: 23397 hours (974 days + 21 hours)]

So I added the "sda" disk back to the RAID with the commands:

Code:
mdadm --manage /dev/md2 --add /dev/sda2 
mdadm --manage /dev/md3 --add /dev/sda3
The system resynced the two partitions in about ten hours and finished correctly; checking with

cat /proc/mdstat

Code:
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md2 : active raid1 sda2[0] sdb2[1]
      20478912 blocks [2/2] [UU]

md3 : active raid1 sda3[0] sdb3[1]
      1932506048 blocks [2/2] [UU]
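
For reference, the "[2/2] [UU]" part is what tells you the mirror is complete: the first number is how many members the array expects, the second how many are active. Below is a rough sketch of how that could be checked automatically; the function names are invented for illustration, and the parsing assumes the mdstat layout shown above.

```python
import re

MDSTAT_SAMPLE = """\
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md2 : active raid1 sda2[0] sdb2[1]
      20478912 blocks [2/2] [UU]

md3 : active raid1 sda3[0] sdb3[1]
      1932506048 blocks [2/2] [UU]
"""


def array_states(mdstat_text):
    """Map each md array name to (expected_members, active_members)."""
    states = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if counts and current:
            states[current] = (int(counts.group(1)), int(counts.group(2)))
            current = None
    return states


def degraded_arrays(mdstat_text):
    """Return the names of arrays with fewer active members than expected."""
    return [name for name, (expected, active) in array_states(mdstat_text).items()
            if active < expected]


print(array_states(MDSTAT_SAMPLE))
print(degraded_arrays(MDSTAT_SAMPLE))
```

On a live server the text would come from reading /proc/mdstat instead of the sample string.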
Since last night, however, the "check" has started and it is running EXTREMELY SLOWLY, and I cannot tell whether this is "normal" or whether one of the two mirrored hard disks, most likely the "sda" disk, has a hardware problem.

These are the results of the cat /proc/mdstat command:

Code:
# cat /proc/mdstat

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md2 : active raid1 sda2[0] sdb2[1]
      20478912 blocks [2/2] [UU]

md3 : active raid1 sda3[0] sdb3[1]
      1932506048 blocks [2/2] [UU]
      [>....................]  check =  0.6% (12137920/1932506048) finish=210309.2min speed=151K/sec
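
To put the progress line in perspective: finish=210309.2min is roughly 146 days. The estimate can be reproduced approximately from the block counters and the speed, assuming (as I understand it) that mdstat counts 1 KiB blocks and reports speed in KiB per second. A minimal sketch; the function name is hypothetical.

```python
import re

# Progress line copied from the /proc/mdstat output above.
CHECK_LINE = ("[>....................]  check =  0.6% (12137920/1932506048) "
              "finish=210309.2min speed=151K/sec")


def check_eta_minutes(line):
    """Estimate the remaining minutes of an md check from one progress line."""
    match = re.search(r"\((\d+)/(\d+)\).*speed=(\d+)K/sec", line)
    if match is None:
        raise ValueError("not an mdstat progress line")
    done, total, speed = (int(group) for group in match.groups())
    # Remaining KiB divided by KiB/sec gives seconds; convert to minutes.
    return (total - done) / speed / 60


minutes = check_eta_minutes(CHECK_LINE)
print(f"{minutes:.0f} minutes, about {minutes / 1440:.0f} days")
```

The result differs slightly from the finish= value in the output because the displayed speed is rounded, but either way the check would take months at this rate.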
Any suggestions?

Thanks.