Thread: Fix bad blocks

  1. #11
    Quote: Originally posted by sunflashx
    Samsung's advanced RMA system is great. They'll ship you a new drive, you can yank one drive and immediately start your rebuild. Cost me $12 or so to ship the old drive back.
    And now that they're Seagate, it appears that the advanced RMA option isn't available, at least for the Samsung HD204UI. Does anyone know of a way to do this post-merger?

  2. #12
    Junior Member
    Join Date
    Jul 2012
    Location
    South France
    Posts
    12
    Hello,

    Sorry for digging up this old topic, but here's what I tried in order to deal with bad blocks:

    The sg3_utils package isn't included in FreeNAS by default (which is strange, since it's quite useful on a storage server), so I installed it manually:

    Code:
    # mount -uw /
    # mkdir /root/sg3_utils
    # cd /root/sg3_utils
    # wget http://ftp2.freebsd.org/pub/FreeBSD/ports/amd64/packages-8.2-release/sysutils/sg3_utils-1.28.tbz
    # pkg_add sg3_utils-1.28.tbz
    # mount -ur /
    Then I tried to verify the bad block listed in the SMART self-test log:
    Code:
    # smartctl -l selftest /dev/ada1
    
    === START OF READ SMART DATA SECTION ===
    SMART Self-test log structure revision number 1
    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Extended offline    Completed: read failure       30%     10685         2903215976
    
    # /usr/local/bin/sg_verify --lba=2903215976 /dev/ada1
    
    verify (10): transport: (pass2:ahcich2:0:0:0): VERIFY(10). CDB: 2f 0 ad b 8f 68 0 0 1 0
    (pass2:ahcich2:0:0:0): CAM status: CCB request was invalid
    
    Verify(10) failed near lba=2903215976 [0xad0b8f68]
    Now I'm stuck here... It seems that either my RAID controller (IBM ServeRAID C100) can't talk to the CAM framework, or my drive (WD2003FYYS-02W0B0) can't speak SCSI.
    Does anyone have a clue?

    Thanks.
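
For anyone comparing the numbers in the sg_verify output above: the hex value in brackets and bytes 2 to 5 of the VERIFY(10) CDB are just the decimal LBA from the self-test log written in hex (big-endian). A minimal sketch, not part of the original post, with hypothetical helper names `lba_to_hex` and `lba_to_cdb_bytes`:

```shell
#!/bin/sh
# Print a decimal LBA in the 0x... form that sg_verify reports.
lba_to_hex() {
    printf '0x%08x\n' "$1"
}

# Print the four big-endian LBA bytes (bytes 2-5) of a VERIFY(10) CDB.
# Bytes are zero-padded here; the CAM log above prints "b" rather than "0b".
lba_to_cdb_bytes() {
    printf '%02x %02x %02x %02x\n' \
        $(( ($1 >> 24) & 255 )) \
        $(( ($1 >> 16) & 255 )) \
        $(( ($1 >> 8) & 255 )) \
        $(( $1 & 255 ))
}

lba_to_hex 2903215976         # prints 0xad0b8f68
lba_to_cdb_bytes 2903215976   # prints ad 0b 8f 68
```

So the command was built correctly; the request was rejected before the LBA itself ever mattered.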

  3. #13
    Junior Member
    Join Date
    Jul 2012
    Location
    South France
    Posts
    12
    In the end, it turns out that Western Digital RE3/RE4 drives simply do not speak SCSI.

    I tried on my home NAS, which has an RE3 drive, and I got the same result:
    Code:
    [root@freenas] ~# sg_verify --lba=10340032 /dev/ada0
    verify (10): transport: (pass0:ahcich0:0:0:0): VERIFY(10). CDB: 2f 0 0 9d c6 c0
    (pass0:ahcich0:0:0:0): CAM status: CCB request was invalid
    
    Verify(10) failed near lba=10340032 [0x9dc6c0]
    So what is the only solution? Remove the disk from the zpool, fill it with zeros until the drive firmware notices that the sector is not writable and remaps it, then attach the disk to the zpool again and resilver?

  4. #14
    Junior Member
    Join Date
    Jul 2012
    Location
    South France
    Posts
    12
    Okay... new round.

    I've played around with dd and I think I succeeded.

    Here are some lines from my smartctl -a /dev/ada1 output beforehand:

    Code:
    ...
      5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
    ...
    196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       1
    
    ...
    # 2  Extended offline    Completed: read failure       30%     10899         2903215976
    ...
    I actually disabled the GEOM write protection (kern.geom.debugflags) and then zero-filled the sector with dd:
    Code:
    # sysctl kern.geom.debugflags=0x10
    # dd bs=512 seek=2903215976 if=/dev/zero of=/dev/ada1 count=1
    # sysctl kern.geom.debugflags=0x0
    Now my smartctl output says:
    Code:
    ...
      5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
    ...
    196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
    ...
    I'm running a long self-test right now to be sure.
    Hopefully this helps someone.

    If you have SATA drives, use the dd technique.
    If you have SAS drives, install the sg3_utils package I mentioned in this topic and follow this guide: http://smartmontools.sourceforge.net...khowto.html#bb

    Cheers.
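
A rough sketch of how the dd technique above could be scripted. It is not from the original post: the helper names `first_error_lbas` and `zero_sector_cmd` are made up, it assumes the drive reports 512-byte logical sectors (as the `bs=512` above does), and it only prints the dd command instead of running it, since a wrong seek value destroys good data.

```shell
#!/bin/sh
# Read `smartctl -l selftest` output on stdin and print the
# LBA_of_first_error (last column) of every entry with a read failure.
first_error_lbas() {
    awk '/^[ \t]*# *[0-9]+ / && /read failure/ { print $NF }'
}

# Print (do NOT run) the dd command that would zero one 512-byte sector.
zero_sector_cmd() {
    printf 'dd if=/dev/zero of=%s bs=512 seek=%s count=1\n' "$1" "$2"
}

# Demo on the self-test line quoted earlier in the thread.
first_error_lbas <<'EOF' | while read -r lba; do zero_sector_cmd /dev/ada1 "$lba"; done
# 1  Extended offline    Completed: read failure       30%     10685         2903215976
EOF
```

On a live system the input would come from `smartctl -l selftest /dev/ada1 | first_error_lbas`. Review the printed command before running it by hand.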

  5. #15
    Senior Member paleoN
    Join Date
    Apr 2012
    Posts
    1,087
    Quote: Originally posted by deajan
    I actually disabled the GEOM write protection (kern.geom.debugflags) and then zero-filled the sector with dd:
    Correct me if I'm wrong, but I don't believe it's necessary to disable geom unless the bad block was in one of the GPT labels. It's not clear what your steps were, but you would want to offline the disk before dd'ing it and online it afterwards.
    --
    FreeNAS-8.3.1-RELEASE-p2-x64 | SilverStone SST-KL04B | ASUS F1A75-V Pro | 9301 CT NIC
    AMD A6-3500 Llano CPU | 8GB DDR3 RAM | 4 x Seagate ST2000DM001 2TB (striped mirrors)

  6. #16
    Junior Member
    Join Date
    Jul 2012
    Location
    South France
    Posts
    12
    I actually tried taking the disk offline; even exporting the zpool didn't do the trick.
    As long as I did not change the sysctl parameter I suggested, every time I tried dd I ended up with:
    Code:
    dd: /dev/ada1: Operation not permitted
    I might be wrong too (I'm not a BSD expert at all), but I think kern.geom.debugflags provides protection against "raw" writes to a disk with tools like fdisk / gdisk or, in my case, dd.
    Last edited by deajan; 09-20-2012 at 09:23 AM.
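
Some background on the sysctl, as far as I know: geom(4) describes the 0x10 bit of kern.geom.debugflags as "allow foot shooting", meaning writes to a provider that is in use are permitted, which fits the "Operation not permitted" error above. A small sketch for checking that bit; the function name `foot_shooting_allowed` is made up for illustration:

```shell
#!/bin/sh
# Print "yes" if the 0x10 bit is set in the given
# kern.geom.debugflags value (decimal or 0x-prefixed hex), else "no".
foot_shooting_allowed() {
    if [ $(( $1 & 16 )) -ne 0 ]; then
        echo yes
    else
        echo no
    fi
}

foot_shooting_allowed 0x10   # value set before the dd: prints yes
foot_shooting_allowed 0x0    # value restored afterwards: prints no
```

On FreeBSD the current value can be passed in with `foot_shooting_allowed "$(sysctl -n kern.geom.debugflags)"`. Leaving the bit set longer than needed removes that safety net for every disk, which is why the flag gets reset to 0x0 right after the dd.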

  7. #17
    Senior Member paleoN
    Join Date
    Apr 2012
    Posts
    1,087
    Quote: Originally posted by deajan
    I actually tried taking the disk offline; even exporting the zpool didn't do the trick.
    You would offline the disk when you were working on it. Then when you online the disk it would resilver if needed. Unless you destroyed the partitions, geom would still protect the disk.

    Quote: Originally posted by deajan
    I might be wrong too (I'm not a BSD expert at all), but I think kern.geom.debugflags provides protection against "raw" writes to a disk with tools like fdisk / gdisk or, in my case, dd.
    Ah, I see now. Thanks.
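
The offline / dd / online order discussed here could be sketched like this. It is a dry run that only prints the commands. The pool name `tank` and member name `ada1p2` are placeholders, not taken from this thread, and per post #16 the dd step may still need the kern.geom.debugflags change around it.

```shell
#!/bin/sh
# Print (do NOT run) the sequence: offline the pool member, zero the bad
# sector, online the member again (ZFS resilvers if needed), check status.
# Arguments: pool name, pool member, whole-disk device, LBA to overwrite.
repair_plan() {
    printf 'zpool offline %s %s\n' "$1" "$2"
    printf 'dd if=/dev/zero of=%s bs=512 seek=%s count=1\n' "$3" "$4"
    printf 'zpool online %s %s\n' "$1" "$2"
    printf 'zpool status %s\n' "$1"
}

# "tank" and "ada1p2" are hypothetical; the LBA is the one from post #12.
repair_plan tank ada1p2 /dev/ada1 2903215976
```

The member name is whatever `zpool status` shows for that disk, which on FreeNAS is often a gptid label rather than a plain adaX name.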

  8. #18
    This was a good post and very helpful. I have a total of 12 of these drives: one suddenly got the click of death one day, and 11 more are showing pending sector reallocation counts. I doubt they are all bad, so I'm trying to repair the drives one at a time by taking them offline. I'm currently running the long test while they are still in the server; since the server is always on, I don't have to keep another machine turned on to repair them.
    So far I'm running "smartctl --test=long /dev/ada0", and it reported "Please wait 347 minutes for test to complete".
    Again, thank you for this post and the information contained within. Newbs like me appreciate it.
    SuperMicro X7DBN w/MV-SATA8 PCI-X card. 12 2TB Samsung F4's in a 2-pools-of-6-drives configuration, 12GB of FB DIMMs, dual 3.2GHz dual core Xeons MV's with hyper-threading, dual 800 watt hot swap power supplies.
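
When checking a dozen drives, it can help to pull a single counter out of the smartctl attribute table instead of reading the whole output each time. A minimal sketch, assuming the usual ten-column table layout shown earlier in the thread; `smart_raw_value` is a made-up helper name:

```shell
#!/bin/sh
# Read `smartctl -A` output on stdin and print the RAW_VALUE (column 10)
# of the attribute whose name (column 2) matches the first argument.
smart_raw_value() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Demo on the "before" lines quoted in post #14.
smart_raw_value Reallocated_Event_Count <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       1
EOF
```

For pending sectors, the attribute name smartctl normally prints is Current_Pending_Sector, so the call would look like `smartctl -A /dev/ada0 | smart_raw_value Current_Pending_Sector`, repeated per drive.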
