Hello,
Sorry if I'm digging up this old topic, but here's what I tried in order to deal with bad blocks:
The sg3_utils package isn't included in FreeNAS by default (which is rather strange, as it's quite useful on a storage server), so I installed it manually:
Code:
# mount -uw /
# mkdir /root/sg3_utils
# cd /root/sg3_utils
# wget http://ftp2.freebsd.org/pub/FreeBSD/ports/amd64/packages-8.2-release/sysutils/sg3_utils-1.28.tbz
# pkg_add sg3_utils-1.28.tbz
# mount -ur /
Then I tried to check the bad block listed in the self-test log:
Code:
# smartctl -l selftest /dev/ada1
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       30%     10685         2903215976
# /usr/local/bin/sg_verify --lba=2903215976 /dev/ada1
verify (10): transport: (pass2:ahcich2:0:0:0): VERIFY(10). CDB: 2f 0 ad b 8f 68 0 0 1 0
(pass2:ahcich2:0:0:0): CAM status: CCB request was invalid
Verify(10) failed near lba=2903215976 [0xad0b8f68]
Now I'm stuck here... It seems that either my RAID controller (IBM ServeRAID C100) can't talk to the CAM framework, or my drive (WD2003FYYS-02W0B0) can't speak SCSI.
Does anyone have a clue?
Thanks.
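For anyone following along, here is a minimal shell sketch for pulling the failing LBA out of the self-test log and converting it to the hex form that sg_verify prints. The function names first_error_lba and lba_to_hex are made up for illustration, and the parsing assumes the LBA is the last field of the "read failure" line, as in the log quoted above. It only works on text, so it does not touch any disk.

```shell
#!/bin/sh
# Print the LBA_of_first_error of the first self-test entry that reports a
# read failure. Expects `smartctl -l selftest` output on stdin.
# Assumption: the LBA is the last whitespace-separated field of that line.
first_error_lba() {
    awk '/read failure/ { print $NF; exit }'
}

# Convert a decimal LBA to the hex notation sg_verify uses in its messages.
lba_to_hex() {
    printf '0x%x\n' "$1"
}

# Example using the log line quoted in this thread (no disk access needed).
sample='# 1  Extended offline    Completed: read failure       30%     10685         2903215976'
lba=$(printf '%s\n' "$sample" | first_error_lba)
echo "$lba $(lba_to_hex "$lba")"
```

On a live system the input would presumably come from something like `smartctl -l selftest /dev/ada1 | first_error_lba`.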
In the end, it turns out that Western Digital RE3 / RE4 drives simply do not speak SCSI.
I tried on my home NAS, which has an RE3 drive, and got the same result:
Code:
[root@freenas] ~# sg_verify --lba=10340032 /dev/ada0
verify (10): transport: (pass0:ahcich0:0:0:0): VERIFY(10). CDB: 2f 0 0 9d c6 c0
(pass0:ahcich0:0:0:0): CAM status: CCB request was invalid
Verify(10) failed near lba=10340032 [0x9dc6c0]
So what is the only solution? Removing the disk from the zpool, filling it with zeros until the HDD firmware notices that the sector is not writable and remaps it, then attaching the disk to the zpool again and resilvering?
Okay... new round.
I've played around with dd and I think I had success:
These are some lines from my smartctl -a /dev/ada1 output before:
Code:
...
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
...
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 1
...
# 2 Extended offline Completed: read failure 30% 10899 2903215976
...
I then disabled the GEOM write protection (kern.geom.debugflags) and zero-filled the sector with dd:
Code:
# sysctl kern.geom.debugflags=0x10
# dd bs=512 seek=2903215976 if=/dev/zero of=/dev/ada1 count=1
# sysctl kern.geom.debugflags=0x0
Now my smartctl output says:
Code:
...
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
...
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
...
I'm running a long self-test right now to be sure.
Hopefully this helps someone.
If you have SATA drives, use the dd technique.
If you have SAS drives, install the sg3_utils package I mentioned in this topic and follow this guide: http://smartmontools.sourceforge.net...khowto.html#bb
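As a rough sketch of the dd technique, the helper below only prints the three commands for a given device and LBA, so they can be reviewed before being run by hand. The name zero_sector_cmds is hypothetical, and bs=512 assumes the LBA reported by SMART counts 512-byte logical sectors, as it did in this thread. Running the printed dd command overwrites that sector, so double-check the device and LBA first.

```shell
#!/bin/sh
# Print (do NOT run) the commands that zero a single 512-byte sector.
# zero_sector_cmds is a hypothetical helper; nothing here touches a disk.
zero_sector_cmds() {
    dev=$1
    lba=$2
    # Refuse anything that is not a plain non-negative integer.
    case $lba in
        ''|*[!0-9]*) echo "invalid LBA: $lba" >&2; return 1 ;;
    esac
    echo "sysctl kern.geom.debugflags=0x10"
    echo "dd bs=512 seek=$lba if=/dev/zero of=$dev count=1"
    echo "sysctl kern.geom.debugflags=0x0"
}

# Example with the values from this thread.
zero_sector_cmds /dev/ada1 2903215976
```

Printing instead of executing is a deliberate choice here: a typo in the LBA or device name would destroy good data, so a manual review step seems worth the extra copy and paste.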
Cheers.
--
FreeNAS-8.3.1-RELEASE-p2-x64 | SilverStone SST-KL04B | ASUS F1A75-V Pro | 9301 CT NIC
AMD A6-3500 Llano CPU | 8GB DDR3 RAM | 4 x Seagate ST2000DM001 2TB (striped mirrors)
I actually tried taking the disk offline; even exporting the zpool didn't do the trick.
As long as I did not change the sysctl parameter I suggested, every time I tried dd I ended up with:
Code:
dd: /dev/ada1: Operation not permitted
I might be wrong too (I'm not a BSD expert at all), but I think kern.geom.debugflags provides protection against "raw" writes to the disk with tools like fdisk / gdisk or, in my case, dd.
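To check whether that protection is currently lifted, a small sketch like the one below tests the 0x10 bit of a debugflags value. The function name raw_write_allowed is made up, and it assumes that 0x10 is the bit that matters, which is the value used earlier in this thread. The value is passed in as an argument so the example runs anywhere; on FreeBSD it could be read with `sysctl -n kern.geom.debugflags`.

```shell
#!/bin/sh
# Succeed if the given kern.geom.debugflags value has the 0x10 bit set.
# Assumption: 0x10 is the bit that permits raw writes to in-use disks.
raw_write_allowed() {
    [ $(( $1 & 0x10 )) -ne 0 ]
}

# Example with two fixed values instead of a live sysctl read.
for flags in 0 16; do
    if raw_write_allowed "$flags"; then
        echo "$flags: raw writes allowed"
    else
        echo "$flags: raw writes blocked"
    fi
done
```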
Last edited by deajan; 09-20-2012 at 09:23 AM.
This was a good post and very helpful. I have a total of 12 of these drives, one of which suddenly got the click of death one day, and 11 more that are showing pending sector reallocation counts. I doubt they are all bad, so I'm trying to repair the drives one at a time by taking them offline. I'm currently running the long test while they are still in the server; since the server is always online, I don't have to keep another machine turned on to repair them.
So far I'm running "smartctl --test=long /dev/ada0", and it reported "Please wait 347 minutes for test to complete".
Again, thank you for this post and the information contained within. Newbs like me appreciate it.
SuperMicro X7DBN w/MV-SATA8 PCI-X card. 12 2TB Samsung F4's in a 2 pools of 6 drives configuration, 12GB of FB DIMMs, dual 3.2Ghz dual core Xeons MV's with hyper-threading, dual 800 watt hot swap power supplies.