AHCI timeouts

Discussion in 'Bug Reporting' started by Durkatlon, Aug 27, 2011.

  1. Offline

    Durkatlon FreeNAS Aware

    Member Since:
    Aug 19, 2011
    Messages:
    322
    Message Count:
    322
    Likes Received:
    3
    Trophy Points:
    18
    Location:
    San Diego, CA
    Durkatlon, Aug 27, 2011

    I've tried to find information about this and there appear to be similar issues with other FreeBSD based configs, but haven't found anything specific. Basically I'm getting time out errors on the component disks of my ZFS mirror.

    Sometimes a drive never recovers from the timeout, and sometimes it comes back eventually. If it doesn't come back, a reboot is the only recourse. I have an identical motherboard in use under FreeNAS 0.7 without these problems, but that one has different drives (Western Digital as opposed to Seagate).

    The problem system is as follows:

    Motherboard: Via EPIA SN10000G (C7 processor 1GHz)
    Memory: 4GB (2x Kingston KVR677D2/2GR)
    Disks: 2x 1.5TB Seagate. ZFS mirror.
    FreeNAS8.0.1B4

    Below I've pasted the output of "camcontrol identify" for one of the disks, my dmesg output and my loader.conf.

    camcontrol identify
    Code (text):
    pass0: <ST1500DL003-9VT16L CC32> ATA-8 SATA 3.x device
    pass0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)

    protocol              ATA/ATAPI-8 SATA 3.x
    device model          ST1500DL003-9VT16L
    firmware revision     CC32
    serial number         5YD1VX7B
    WWN                   5000c5002f93cc0f
    cylinders             16383
    heads                 16
    sectors/track         63
    sector size           logical 512, physical 512, offset 0
    LBA supported         268435455 sectors
    LBA48 supported       2930277168 sectors
    PIO supported         PIO4
    DMA supported         WDMA2 UDMA6
    media RPM             5900

    Feature                      Support  Enabled   Value           Vendor
    read ahead                     yes  yes
    write cache                    yes  yes
    flush cache                    yes  yes
    overlap                        no
    Tagged Command Queuing (TCQ)   no   no
    Native Command Queuing (NCQ)   yes      32 tags
    SMART                          yes  yes
    microcode download             yes  yes
    security                       yes  no
    power management               yes  yes
    advanced power management      no   no
    automatic acoustic management  yes  yes 0/0x00  254/0xFE
    media status notification      no   no
    power-up in Standby            no   no
    write-read-verify              yes  no  0/0x0
    unload                         no   no
    free-fall                      no   no
    data set management (TRIM)     no
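    As a quick sanity check on these figures (an editorial sketch, not part of the original post): multiplying the LBA48 sector count by the 512-byte logical sector size should reproduce the 1430799MB that the dmesg output in this post reports for ada0 and ada1, assuming the kernel's "MB" means 2^20 bytes. The function name `capacity` is made up for illustration.

```python
SECTOR_BYTES = 512          # "sector size: logical 512" from camcontrol identify
LBA48_SECTORS = 2930277168  # "LBA48 supported" from camcontrol identify


def capacity(sectors, sector_bytes=SECTOR_BYTES):
    """Return the drive capacity in bytes, whole MiB and decimal TB."""
    total = sectors * sector_bytes
    return {"bytes": total, "MiB": total // 2**20, "TB": total / 10**12}


info = capacity(LBA48_SECTORS)
print(info)
```

    The MiB figure should line up with the size dmesg prints, and the decimal TB figure with the "1.5TB" on the drive label.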
    dmesg output, with some example timeouts at the end:
    Code (text):
    Copyright (c) 1992-2011 The FreeBSD Project.
    Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
    FreeBSD is a registered trademark of The FreeBSD Foundation.
    FreeBSD 8.2-RELEASE-p2 #0: Wed Jul 13 06:28:22 PDT 2011
        jpaetzel@servant.iXsystems.com:/b/home/jpaetzel/sf_freenas_build/obj.i386/i386/b/home/jpaetzel/sf_freenas_build/FreeBSD/src/sys/FREENAS.i386 i386
    Timecounter "i8254" frequency 1193182 Hz quality 0
    CPU: VIA C7 Processor 1000MHz (1009.88-MHz 686-class CPU)
      Origin = "CentaurHauls"  Id = 0x6d0  Family = 6  Model = d  Stepping = 0
      Features=0xa7c9bbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,CMOV,PAT,CLFLUSH,ACPI,MMX,FXSR,SSE,SSE2,TM,PBE>
      Features2=0x4181<SSE3,EST,TM2,xTPR>
      VIA Padlock Features=0xffcc<RNG,AES,AES-CTR,SHA1,SHA256,RSA>
    real memory  = 4294967296 (4096 MB)
    avail memory = 3403685888 (3246 MB)
    ACPI APIC Table: <022708 APIC1721>
    ioapic0 <Version 0.3> irqs 0-23 on motherboard
    ioapic1 <Version 0.3> irqs 24-47 on motherboard
    kbd1 at kbdmux0
    netsmb_dev: loaded
    cryptosoft0: <software crypto> on motherboard
    acpi0: <022708 RSDT1721> on motherboard
    acpi0: [ITHREAD]
    acpi0: Power Button (fixed)
    acpi0: reservation of fec00000, 1000 (3) failed
    acpi0: reservation of fee00000, 1000 (3) failed
    acpi0: reservation of 0, a0000 (3) failed
    acpi0: reservation of 100000, cfe00000 (3) failed
    Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
    acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
    cpu0: <ACPI CPU> on acpi0
    pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
    pci0: <ACPI PCI bus> on pcib0
    agp0: <VIA 3364 (P4M900) host to PCI bridge> on hostb0
    agp0: aperture size is 128M
    pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
    pci1: <ACPI PCI bus> on pcib1
    vgapci0: <VGA-compatible display> mem 0xd8000000-0xdfffffff,0xfd000000-0xfdffffff irq 16 at device 0.0 on pci1
    pcib2: <ACPI PCI-PCI bridge> irq 27 at device 2.0 on pci0
    pci2: <ACPI PCI bus> on pcib2
    pcib3: <ACPI PCI-PCI bridge> irq 31 at device 3.0 on pci0
    pci3: <ACPI PCI bus> on pcib3
    vge0: <VIA Networking Velocity Gigabit Ethernet> port 0xe800-0xe8ff mem 0xfeaffc00-0xfeaffcff irq 28 at device 0.0 on pci3
    miibus0: <MII bus> on vge0
    ip1000phy0: <IC Plus IP1001 10/100/1000 media interface> PHY 1 on miibus0
    ip1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto
    vge0: Ethernet address: 00:40:63:f4:95:ee
    vge0: [ITHREAD]
    ahci0: <VIA VT8251 AHCI SATA controller> port 0xdc00-0xdc07,0xd880-0xd883,0xd800-0xd807,0xd480-0xd483,0xd400-0xd40f mem 0xfcfffc00-0xfcffffff irq 21 at device 15.0 on pci0
    ahci0: [ITHREAD]
    ahci0: AHCI v1.00 with 4 3Gbps ports, Port Multiplier not supported
    ahcich0: <AHCI channel> at channel 0 on ahci0
    ahcich0: [ITHREAD]
    ahcich1: <AHCI channel> at channel 1 on ahci0
    ahcich1: [ITHREAD]
    ahcich2: <AHCI channel> at channel 2 on ahci0
    ahcich2: [ITHREAD]
    ahcich3: <AHCI channel> at channel 3 on ahci0
    ahcich3: [ITHREAD]
    atapci0: <VIA 8251 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 15.1 on pci0
    ata0: <ATA channel 0> on atapci0
    ata0: [ITHREAD]
    ata1: <ATA channel 1> on atapci0
    ata1: [ITHREAD]
    uhci0: <VIA 83C572 USB controller> port 0xcc00-0xcc1f irq 20 at device 16.0 on pci0
    uhci0: [ITHREAD]
    usbus0: <VIA 83C572 USB controller> on uhci0
    uhci1: <VIA 83C572 USB controller> port 0xd000-0xd01f irq 22 at device 16.1 on pci0
    uhci1: [ITHREAD]
    usbus1: <VIA 83C572 USB controller> on uhci1
    uhci2: <VIA 83C572 USB controller> port 0xd080-0xd09f irq 21 at device 16.2 on pci0
    uhci2: [ITHREAD]
    usbus2: <VIA 83C572 USB controller> on uhci2
    ehci0: <VIA VT6202 USB 2.0 controller> mem 0xfcfff800-0xfcfff8ff irq 22 at device 16.4 on pci0
    ehci0: [ITHREAD]
    usbus3: EHCI version 1.0
    usbus3: <VIA VT6202 USB 2.0 controller> on ehci0
    isab0: <PCI-ISA bridge> at device 17.0 on pci0
    isa0: <ISA bus> on isab0
    vr0: <VIA VT6102 Rhine II 10/100BaseTX> port 0xc800-0xc8ff mem 0xfcfff400-0xfcfff4ff irq 23 at device 18.0 on pci0
    vr0: Quirks: 0x0
    vr0: Revision: 0x7c
    miibus1: <MII bus> on vr0
    ukphy0: <Generic IEEE 802.3u media interface> PHY 1 on miibus1
    ukphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
    vr0: Ethernet address: 00:40:63:f4:95:ed
    vr0: [ITHREAD]
    acpi_button0: <Sleep Button> on acpi0
    acpi_button1: <Power Button> on acpi0
    pcib4: <ACPI Host-PCI bridge> on acpi0
    pci128: <ACPI PCI bus> on pcib4
    pcib5: <PCI-PCI bridge> at device 0.0 on pci128
    pci130: <PCI bus> on pcib5
    pcib6: <PCI-PCI bridge> at device 0.1 on pci128
    pci129: <PCI bus> on pcib6
    pci128: <multimedia, HDA> at device 1.0 (no driver attached)
    atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
    atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
    atkbd0: <AT Keyboard> irq 1 on atkbdc0
    kbd0 at atkbd0
    atkbd0: [GIANT-LOCKED]
    atkbd0: [ITHREAD]
    uart0: <16550 or compatible> port 0x3f8-0x3ff irq 3 flags 0x10 on acpi0
    uart0: [FILTER]
    uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
    uart1: [FILTER]
    pmtimer0 on isa0
    orm0: <ISA Option ROM> at iomem 0xce000-0xcefff pnpid ORM0000 on isa0
    sc0: <System console> at flags 0x100 on isa0
    sc0: VGA <16 virtual consoles, flags=0x300>
    vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
    ppc0: parallel port not found.
    est0: <Enhanced SpeedStep Frequency Control> on cpu0
    p4tcc0: <CPU Frequency Thermal Control> on cpu0
    Timecounter "TSC" frequency 1009877267 Hz quality 800
    Timecounters tick every 1.000 msec
    usbus0: 12Mbps Full Speed USB v1.0
    usbus1: 12Mbps Full Speed USB v1.0
    usbus2: 12Mbps Full Speed USB v1.0
    usbus3: 480Mbps High Speed USB v2.0
    ugen0.1: <VIA> at usbus0
    uhub0: <VIA UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
    ugen1.1: <VIA> at usbus1
    uhub1: <VIA UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1
    ugen2.1: <VIA> at usbus2
    uhub2: <VIA UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
    ugen3.1: <VIA> at usbus3
    uhub3: <VIA EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
    uhub0: 2 ports with 2 removable, self powered
    uhub1: 2 ports with 2 removable, self powered
    uhub2: 2 ports with 2 removable, self powered
    uhub3: 6 ports with 6 removable, self powered
    ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
    ada0: <ST1500DL003-9VT16L CC32> ATA-8 SATA 3.x device
    ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
    ada0: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C)
    ada1 at ahcich2 bus 0 scbus2 target 0 lun 0
    ada1: <ST1500DL003-9VT16L CC32> ATA-8 SATA 3.x device
    ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
    ada1: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C)
    ada2 at ata0 bus 0 scbus4 target 0 lun 0
    ada2: <CF Card Ver6.04> ATA-5 device
    ada2: 133.000MB/s transfers (UDMA6, PIO 512bytes)
    ada2: 3811MB (7806960 512 byte sectors: 16H 63S/T 7745C)
    Trying to mount root from ufs:/dev/ufs/FreeNASs1a
    ZFS filesystem version 4
    ZFS storage pool version 15
    vge0: link state changed to UP
    fuse4bsd: version 0.3.9-pre1, FUSE ABI 7.8
    ahcich2: Timeout on slot 24 port 0
    ahcich2: is 00000000 cs 01000000 ss 00000000 rs 01000000 tfd 50 serr 00000000
    ahcich2: AHCI reset: device not ready after 31000ms (tfd = 00000080)
    ahcich2: Timeout on slot 24 port 0
    ahcich2: is 00000000 cs 01000000 ss 00000000 rs 01000000 tfd 80 serr 001a0000
    ahcich0: Timeout on slot 22 port 0
    ahcich0: is 00000000 cs 00400000 ss 00000000 rs 00400000 tfd 50 serr 00000000
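    The two-line timeout reports at the end of this dmesg can be picked apart with a small script. This is only an editorial sketch under assumptions: it reads `cs`, `ss` and `rs` as 32-bit per-slot bitmasks (command issue, SATA active, and the driver's own record of running slots) and the low byte of `tfd` as the ATA status register. Those field meanings are an interpretation of the driver output, not something stated in the thread, and `decode` and `set_bits` are hypothetical helper names.

```python
import re

# ATA status register bits (low byte of tfd). Bit 4 (0x10) is left out on
# purpose because its meaning varies between ATA revisions.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}

REPORT_RE = re.compile(
    r"(?P<chan>ahcich\d+): is (?P<istat>[0-9a-f]+) cs (?P<cs>[0-9a-f]+) "
    r"ss (?P<ss>[0-9a-f]+) rs (?P<rs>[0-9a-f]+) "
    r"tfd (?P<tfd>[0-9a-f]+) serr (?P<serr>[0-9a-f]+)"
)


def set_bits(mask):
    """Return the positions of the bits set in a 32-bit mask."""
    return [bit for bit in range(32) if mask & (1 << bit)]


def decode(line):
    """Decode one 'ahcichN: is ... serr ...' line, or return None if it is not one."""
    m = REPORT_RE.search(line)
    if m is None:
        return None
    status = int(m.group("tfd"), 16) & 0xFF
    return {
        "channel": m.group("chan"),
        "cs_slots": set_bits(int(m.group("cs"), 16)),
        "ss_slots": set_bits(int(m.group("ss"), 16)),
        "rs_slots": set_bits(int(m.group("rs"), 16)),
        "status": [name for bit, name in STATUS_BITS.items() if status & bit],
        "serr": int(m.group("serr"), 16),
    }


first = decode("ahcich2: is 00000000 cs 01000000 ss 00000000 rs 01000000 tfd 50 serr 00000000")
second = decode("ahcich2: is 00000000 cs 01000000 ss 00000000 rs 01000000 tfd 80 serr 001a0000")
print(first)
print(second)
```

    Under that reading, the `cs` mask 01000000 has only bit 24 set, which agrees with "Timeout on slot 24", and `tfd 80` after the failed reset is the BSY bit, i.e. the drive never stopped reporting busy.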
  2. Offline

    Durkatlon, Aug 27, 2011

    Went over 10,000 char limit. Here is the loader.conf:

    Code (text):
    #
    # Boot loader file for FreeNAS.  This relies on a hacked beastie.4th.
    #
    autoboot_delay="2"
    loader_logo="freenas"
    #Fix booting from USB device bug
    kern.cam.boot_delay=10000

    # GEOM support
    geom_mirror_load="YES"
    geom_stripe_load="YES"
    geom_raid3_load="YES"
    #geom_raid5_load="YES"
    geom_gate_load="YES"
    ntfs_load="YES"
    smbfs_load="YES"

    hw.hptrr.attach_generic=0

    # Customization for buckyball
    vm.kmem_size="512M"
    vm.kmem_size_max="768M"
    vfs.zfs.prefetch_disable=1
    vfs.zfs.arc_max="40M"
    vfs.zfs.vdev.cache.size="5M"
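    For anyone comparing tunables between machines, here is a rough editorial sketch that reads loader.conf text like the above into a dictionary and converts the size suffixes to bytes. It assumes the simple `key=value` / `key="value"` form with `#` comments seen here, and K/M/G suffixes as powers of 1024; `parse_loader_conf` and `to_bytes` are made-up names, and the sample is a subset of the file above.

```python
import re

UNITS = {"": 1, "K": 2**10, "M": 2**20, "G": 2**30}


def parse_loader_conf(text):
    """Return {name: value} for every uncommented key=value line."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (assumes no '#' in values)
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def to_bytes(value):
    """Convert '512M'-style sizes to bytes (suffixes taken as powers of 1024)."""
    m = re.fullmatch(r"(\d+)([KMG]?)", value.strip().upper())
    if m is None:
        raise ValueError(f"not a size: {value!r}")
    return int(m.group(1)) * UNITS[m.group(2)]


SAMPLE = """\
#geom_raid5_load="YES"
kern.cam.boot_delay=10000
vm.kmem_size="512M"
vm.kmem_size_max="768M"
vfs.zfs.prefetch_disable=1
vfs.zfs.arc_max="40M"
vfs.zfs.vdev.cache.size="5M"
"""

settings = parse_loader_conf(SAMPLE)
arc_max = to_bytes(settings["vfs.zfs.arc_max"])
kmem = to_bytes(settings["vm.kmem_size"])
print(f"ARC cap is {arc_max / kmem:.1%} of vm.kmem_size")
```

    The point of the ratio is only to show how small the ARC cap is relative to kernel memory on this box; whether that is related to the timeouts is not established anywhere in the thread.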
  3. Offline

    Durkatlon, Aug 28, 2011

    2 updates.

    First, I tried changing the SATA controller's mode from AHCI to IDE. This changed the errors from AHCI timeouts to "Timeout waiting for write DRQ". This happens on both drives the first time they are accessed. It seemed like everything would be OK after that, but that's not quite true. If I reboot the NAS and then immediately start an rsync from a remote machine, there can be Input/Output errors.

    Second, I replaced the Seagate Barracudas with Western Digital Caviar Green drives. Interestingly enough, I got one EADS and one EARS drive because a Fry's open-box apparently contained the wrong kind of drive. I set the ZFS mirror up with the "Force 4096 byte sectors" checkbox turned on, so the EADS would also use the larger sector size. After replacing the drives I also set the SATA controller back to AHCI mode.

    I'm now copying everything back from a backup to the new drives, and so far so good. Haven't had any AHCI timeouts yet. We'll see what happens after I get all the data back on there and reboot it a million times and generally put it through its paces.
  4. Offline

    Tekkie

    Member Since:
    May 31, 2011
    Messages:
    332
    Message Count:
    332
    Likes Received:
    0
    Trophy Points:
    0
    Occupation:
    Software Development & Technology Management,
    Location:
    Santa Monica
    Tekkie, Aug 30, 2011

    AHCI timeouts indicate that one of your drives is either dying or dead, because it's no longer responding to commands sent to it.

    You should also see messages about some pass driver not being installed etc.
  5. Offline

    Durkatlon, Aug 30, 2011

    This is something different. I am getting these errors with 4 separate, identical, brand-new drives. Something about this particular drive firmware and the way it interacts with FreeBSD8. Could even be the combination of the drives and the SATA controller. I have another motherboard coming in a few days and I'll experiment further with these to see if it's the drives or the controller.
  6. Offline

    Durkatlon, Sep 6, 2011

    One more update on this. I built a second FreeNAS 8 system in which I put the same Seagate drives that were suffering time-outs on the first system with the Via EPIA board. The new system runs on an Asus E35M1-I Deluxe (FreeNAS 8.0.1-B4-amd64). I was able to import the zpool from the old system and I have been running this system for about a week now without any time-outs.

    At this point I suspect the SATA controller on the Via EPIA board is the culprit. Switching to Western Digital drives did not ultimately help on that board. The time-outs started happening there as well after I got the drives fully loaded with data. I am actually in the process of getting another E35M1-I to replace the mobo in the problem NAS.

    Interesting in all this is that I have 2 more systems that each have Via EPIA boards in them (an SN10000 and an SN18000 respectively) that are both running FreeNAS 0.7.2 (Shere). These systems have been humming along for well over a year with absolutely no problems.

    There clearly is a problem between the SATA controller on the EPIA boards and FreeNAS 8 (and consequently FreeBSD 8.2), but the same problem does not seem to happen with the older baselines of FreeBSD.

    If this problem could be addressed it would be great, but I suspect if it can be addressed at all it will fall to the FreeBSD driver developers, not the FreeNAS team.
  7. Offline

    Tekkie, Sep 7, 2011

    Could it be that the SATA controller on that mobo is just bad?
  8. Offline

    Durkatlon, Sep 7, 2011

    It's possible, but I've had this board for a while and use it for building temporary systems running various flavors of Linux. I've never seen these issues there.

    Once I get the new E350 motherboard swapped in, I'll put FreeNAS 7 on this EPIA board and see if the problems persist or if they go away.
  9. Offline

    djoole FreeNAS Aware

    Member Since:
    Oct 3, 2011
    Messages:
    150
    Message Count:
    150
    Likes Received:
    6
    Trophy Points:
    18
    Location:
    France
    djoole, Oct 9, 2011

    I have the same problem:
    Code (text):
    Oct 10 00:17:51 nas kernel: ahcich3: Timeout on slot 19 port 0
    Oct 10 00:17:51 nas kernel: ahcich3: is 00000000 cs 10000000 ss 1ff80000 rs 1ff80000 tfd 40 serr 00000000
    HDDs: Samsung EcoGreen F4 (HD204UI 2TB, with patched firmware, in case you ask)
    Mobo: Asus P8P67 R3.1 with Intel Cougar Point AHCI SATA controller

    I hope I won't be having a lot of these timeouts/freezes!
  10. Offline

    Durkatlon, Oct 9, 2011

    No good fixes for this problem unfortunately, djoole. I ended up abandoning that motherboard in favor of a different one, but continued to use the same drives etc. The problematic motherboard I have since used for other experimental setups with no problems. It appears to be an issue specifically with FreeBSD8.2-derived distributions. I use the same mobo with FreeNAS7 without any issues.
  11. Offline

    djoole, Oct 10, 2011

    Thanks for replying.
    I read somewhere else that the problem doesn't occur with the "old ahci driver".
    Do you know where this old driver is, how to install it, and what I would lose compared to the current one?

    I got another AHCI timeout today, this time on a Seagate drive, leading to a complete freeze of the pool. A soft reboot didn't work either; I had to hard reboot the computer.

    This is not acceptable: a NAS simply needs to be reliable, and right now it's the opposite.
    I'm starting to miss my Synology...
  12. Offline

    djoole, Oct 11, 2011

    Hi again.

    Now it's worse: even after a fresh hard reset, the minute I try to access my pool (even just for reading), the ada0 Seagate times out and the NAS freezes.

    At boot, smartd now reports errors on the ada0 drive.

    I put the disk in my PC and ran a diagnostic with the DOS version of SeaTools, but it found no errors.

    I think I have a compatibility problem with the mobo/drive/FreeNAS trio.

    Maybe using the old ahci driver would be a solution (as I read on other forums).

    But I don't have any clue how to use the old ahci driver.

    Please help :(
  13. Offline

    Durkatlon, Oct 11, 2011

    I think the old driver that people refer to is the one that came with FreeBSD 7.x (and by extension FreeNAS 0.7.x). One thing you could try is to switch your SATA controller to IDE mode in the BIOS. That should cause FreeNAS to not use AHCI. It's possible that your timeout message will change to "timeout waiting for write DRQ", but it's worth a shot.

    By the way, your symptoms sound just like mine. In my case after the initial time-out and freeze the system would be fine, provided I didn't set a spindown delay on the drives. In other words, the drives would have to be set up to spin constantly. Once a drive spins down, the next time it spins up you'll likely get the same time-out message.
  14. Offline

    djoole, Oct 11, 2011

    I'll try setting the controller to IDE in the BIOS, thanks.
    However, I have 2 AHCI controllers on the mobo:
    - Intel Cougar Point with 6 SATA ports
    - Marvell 9120 with 2 SATA ports

    In FreeNAS 8, I have a zpool with 2 vdevs:
    Code (text):
      pool: zepool
     state: ONLINE
     scrub: none requested
    config:

            NAME        STATE     READ WRITE CKSUM
            zepool      ONLINE       0     0     0
              raidz1    ONLINE       0     0     0
                ada4p2  ONLINE       0     0     0
                ada5p2  ONLINE       0     0     0
                ada6p2  ONLINE       0     0     0
                ada7p2  ONLINE       0     0     0
              raidz1    ONLINE       0     0     0
                ada0p2  ONLINE       0     0     0
                ada1p2  ONLINE       0     0     0
                ada2p2  ONLINE       0     0     0
                ada3p2  ONLINE       0     0     0

    errors: No known data errors
    ada0 and ada1 are on the Marvell controller.
    ada2 and ada3 (and ada 4 to 7) are on the Intel controller.
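    To make the layout easier to reason about, here is an editorial sketch (not from the original poster) that groups the disks in a `zpool status` listing by vdev and then looks up which controller each vdev depends on. The disk-to-controller table is typed in by hand from the description in this post; `parse_vdevs` and `controllers_per_vdev` are hypothetical helper names, and the parser only handles the simple raidz/mirror layout shown here.

```python
import re

# Disk-to-controller mapping as described in the post (entered by hand).
CONTROLLER = {"ada0": "Marvell", "ada1": "Marvell"}
CONTROLLER.update({f"ada{n}": "Intel" for n in range(2, 8)})

ZPOOL_STATUS = """\
  pool: zepool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zepool      ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada4p2  ONLINE       0     0     0
            ada5p2  ONLINE       0     0     0
            ada6p2  ONLINE       0     0     0
            ada7p2  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada0p2  ONLINE       0     0     0
            ada1p2  ONLINE       0     0     0
            ada2p2  ONLINE       0     0     0
            ada3p2  ONLINE       0     0     0

errors: No known data errors
"""


def parse_vdevs(status_text):
    """Group member disks under each raidz/mirror vdev of the config section."""
    vdevs = []
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "NAME":        # header row: the device tree starts here
            in_config = True
            continue
        if fields[0] == "errors:":     # end of the device tree
            in_config = False
        if not in_config:
            continue
        name = fields[0]
        if name.startswith(("raidz", "mirror")):
            vdevs.append({"type": name, "disks": []})
        elif vdevs:                    # skips the pool's own row, which comes first
            vdevs[-1]["disks"].append(name)
    return vdevs


def controllers_per_vdev(vdevs):
    """For each vdev, list the controllers its disks are attached to."""
    return [
        sorted({CONTROLLER[re.sub(r"p\d+$", "", disk)] for disk in vdev["disks"]})
        for vdev in vdevs
    ]


vdevs = parse_vdevs(ZPOOL_STATUS)
print(controllers_per_vdev(vdevs))
```

    The second raidz1 spans both controllers, so switching only the Marvell to IDE mode would change how two of its four disks are attached while leaving the other two on AHCI.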

    Is it okay if I switch only the Marvell controller to IDE mode?

    And what exactly is the difference between running my SATA drives in IDE mode and in AHCI mode?

    Also, at each boot I now get these messages:
    Code (text):
    Oct 11 19:44:09 nas smartd[1697]: Device: /dev/ada1, 315 Currently unreadable (pending) sectors
    Oct 11 19:44:11 nas smartd[1697]: Device: /dev/ada1, 315 Offline uncorrectable sectors
    What does this mean exactly? The SeaTools long test (which reads every sector of the drive) passed, so how can there be 315 bad sectors? Is it permanent?
    I didn't have these bad sectors before the AHCI timeouts started.
    At the first freeze + reboot there were 192 bad sectors, and now there are 315...
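    Tracking how the pending-sector count moves from boot to boot can be scripted. The following is an editorial sketch, not something the poster ran: it pulls the "Currently unreadable (pending)" counts out of smartd log lines and reports the growth between consecutive readings. The 192 lines in the sample are the ones quoted later in this thread; `pending_history` and `growth` are made-up helper names.

```python
import re

PENDING_RE = re.compile(
    r"smartd\[\d+\]: Device: (?P<dev>[^,\s]+), (?P<count>\d+) "
    r"Currently unreadable \(pending\) sectors"
)


def pending_history(lines):
    """Return {device: [pending counts in log order]}."""
    history = {}
    for line in lines:
        m = PENDING_RE.search(line)
        if m:
            history.setdefault(m.group("dev"), []).append(int(m.group("count")))
    return history


def growth(counts):
    """Differences between consecutive readings."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


LOG = [
    "nas smartd[1707]: Device: /dev/ada1, 192 Currently unreadable (pending) sectors",
    "nas smartd[1707]: Device: /dev/ada1, 192 Offline uncorrectable sectors",
    "Oct 11 19:44:09 nas smartd[1697]: Device: /dev/ada1, 315 Currently unreadable (pending) sectors",
    "Oct 11 19:44:11 nas smartd[1697]: Device: /dev/ada1, 315 Offline uncorrectable sectors",
]

history = pending_history(LOG)
deltas = growth(history["/dev/ada1"])
print(history, deltas)
```

    A count that keeps climbing between readings is the kind of trend worth quoting when asking the manufacturer for a replacement.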
  15. Offline

    William Grzybowski FreeNAS Guru

    Member Since:
    May 27, 2011
    Messages:
    1,662
    Message Count:
    1,662
    Likes Received:
    23
    Trophy Points:
    38
    Location:
    Curitiba, Brazil
    William Grzybowski, Oct 11, 2011

    Those SMART errors mean your ada1 disk is having problems with unreadable sectors (aka bad blocks); it is likely to die soon...

    AHCI is a protocol that allows a lot of things, hotplug being one of them, so IDE means no hotplug... And yes, you can set it just for the Marvell, if that is the only controller that has problems with FreeNAS.
  16. Offline

    djoole, Oct 11, 2011

    Here is the detailed hardware config of the NAS and the history of my problems, in case it helps in any way.
    Code (text):
    Asus P8P67 R3.1  - FreeNAS 8.0.1 RC2
      |
      |-Marvell 9120 controler
      |  |
      |  |-6Gb/s--ada0--WDC WD20EADS-00S2B0 01.00A01 (WD Caviar Green)---\
      |  |-6Gb/s--ada1--ST32000542AS CC34 (Seagate LP)--------------------|
      |                                                                   |
      |-Intel AHCI Cougar Point controler                                 |--RAIDZ1--|
         |                                                                |          |
         |-6Gb/s--ada2--ST32000542AS CC34 (Seagate LP)--------------------|          |__zepool
         |-6Gb/s--ada3--ST32000542AS CC34 (Seagate LP)-------------------/           |
         |-3Gb/s--ada4--SAMSUNG HD204UI 1AQ10001 (EcoGreen F4)-----------\           |
         |-3Gb/s--ada5--SAMSUNG HD204UI 1AQ10001 (EcoGreen F4)------------|__RAIDZ1__|
         |-3Gb/s--ada6--SAMSUNG HD204UI 1AQ10001 (EcoGreen F4)------------|
         |-3Gb/s--ada7--SAMSUNG HD204UI 1AQ10001 (EcoGreen F4)-----------/
    History:

    - Only the 4 Samsung drives in the NAS.
    Huge transfer from the old NAS (Syno with the other 4 disks) to FreeNAS.
    ==> AHCI timeouts in a loop on ada4, like this:
    Code (text):
    ahcich4: Timeout on slot 24 port 0
    ahcich4: is 00000000 cs 01000000 ss 00000000 rs 01000000 tfd 50 serr 00000000
    ahcich4: AHCI reset: device not ready after 31000ms (tfd = 00000080)
    ahcich4: Timeout on slot 24 port 0
    ahcich4: is 00000000 cs 01000000 ss 00000000 rs 01000000 tfd 80 serr 001a0000
    NAS frozen, nothing to do other than a hard reboot.

    - After the reboot, I launched a scrub to check the integrity of the pool.
    The scrub lasted only seconds. Thanks to the zpool status command I saw there had been a resilver (??) on ada4.
    I relaunched the scrub, which took much longer this time and ended without errors, although I noticed the number 5 under CKSUM for ada4.
    Since this Samsung F4EG was manufactured in Feb 2011 (Aug 2011 for the 3 others), I decided to patch the firmware, just in case.

    - Relaunched the Syno->FreeNAS transfer: no errors.

    - Moved the 4 drives from the Syno into FreeNAS as a second raidz1, added to the pool zepool ( :) )

    - Huge transfer to the pool
    ==> AHCI timeouts on ada1. NAS frozen, hard reboot, and after boot I got this message:
    Code (text):
    nas smartd[1707]: Device: /dev/ada1, 192 Currently unreadable (pending) sectors
    nas smartd[1707]: Device: /dev/ada1, 192 Offline uncorrectable sectors
    What does it mean? Has the drive been damaged?

    - Launched smartctl -t long /dev/ada0 to test the drive state
    ==> AHCI timeouts on ada1. NAS frozen, hard reboot, and after boot I got this message:
    Code (text):
    nas smartd[1707]: Device: /dev/ada1, 315 Currently unreadable (pending) sectors (+123)
    nas smartd[1707]: Device: /dev/ada1, 315 Offline uncorrectable sectors (+123)
    Is FreeNAS damaging my Seagate drive?

    - I decided to stop using the NAS in order to protect my data while waiting for this problem to be solved. Before that, I tried to recover some files from the pool via SMB, but each time I tried to read them I got AHCI timeouts on ada1, freezing the NAS.

    - I put the ada1 drive in my PC and ran a long test with the DOS version of SeaTools. The test PASSED!
    Now I don't understand anymore... are there bad sectors on the drive or not??

    - I put the drive back into the NAS (taking the opportunity to change the SATA cable, which seemed cheaper than the others) and launched smartctl -t long /dev/ada1

    And here I am, waiting for the long SMART test to finish. So far, no errors...

    According to SeaTools the drive is healthy, so the problem is elsewhere... (too many variables!)

    If AHCI timeouts occur again, I'll try setting the mobo's Marvell AHCI controller to IDE mode.

    If my experience rings a bell and you have any leads, don't hesitate to share them!
    For now I'm pretty fed up; it's amazing how dependent you get on a NAS at home!
  17. Offline

    djoole, Oct 11, 2011

    So if there are bad blocks on the drive, how could SeaTools not find them?
    Seagate is asking me to provide the SeaTools error code with the RMA in order to replace the drive..

    And how come the drive didn't have any bad sectors before the AHCI timeouts in FreeNAS?
    I didn't know software could damage hardware :/

    Well... I have to wait another 2 hours for the long SMART test to finish; we'll see.

    I don't need hotplug, so IDE should be good for me, as long as it doesn't decrease transfer speed.
  18. Offline

    ProtoSD FreeNAS Guru

    Member Since:
    Jul 1, 2011
    Messages:
    3,358
    Message Count:
    3,358
    Likes Received:
    7
    Trophy Points:
    38
    Location:
    Leaving FreeNAS
    ProtoSD, Oct 11, 2011

    In my experience, manufacturers' diagnostics *rarely* if *ever* find any real problems. It is my feeling that this is intentional, so they give the impression that their drives are better/more reliable. A few months ago I had a Hitachi drive that was making unusual clicking noises. I ran their diagnostics and they found nothing, but clearly there was a problem and I didn't trust the drive. Fortunately they didn't require an error code before I could request an RMA.

    It's possible the system you had the drive in before didn't report the errors you're seeing now, but they were still there. While I agree FreeNAS needs some improvements, I really doubt it is damaging your disks. While scrubbing causes a lot of disk activity, I believe it will only cause a drive that is already about to fail to fail more quickly. Never delay replacing a disk you don't have confidence in, because it's not worth the risk of losing your data.
  19. Offline

    djoole, Oct 11, 2011

    Okay.
    Will create an RMA tomorrow :) (actually there is a special error code to provide, corresponding to "Failed other diagnostic tools, I'm confident it is a bad drive")

    So I hope the AHCI timeouts I had on the Samsung were because of the buggy firmware, now patched, and the timeouts I had on the Seagate were because it's damaged.
    I'll let you know.
    For now, the long SMART test is still running with no timeouts; this is the first time I've been able to get the test this far (60% remaining).

    But is there any way FreeNAS can continue to work with the damaged drive (I think I'll have to wait a few weeks before getting the new drive)? Can't it "mark" the damaged sectors so it doesn't use them anymore?
    Do I have to run another scrub?
  20. Offline

    Durkatlon, Oct 11, 2011

    Just my $0.02, but you'll probably find a new drive will have the same issues. If anything the AHCI problems are probably related to the SMART errors. I am using the drives that gave me timeouts with a different mobo with absolutely no problems. This whole timeout business is something odd about FreeBSD8.2.
