2013-01-08

How to tell if your Intel system might be affected by the Cougar Point SATA bug


At the beginning of 2011 there was a big product recall, when Intel pulled motherboards based on the B2 stepping of its 6-series chipsets from the market. The affected chips were sold as H67, P67, HM67, HM65, Q67, Q65, B65, Z68, UM67, QS67 and QM67, and in their server incarnations as C202, C204 and C206. The stability of the 3 Gbps SATA ports degraded over time, while the 6 Gbps SATA ports remained fully functional. Anandtech has an article discussing the case.

On Linux boxes, the problem shows up in dmesg as messages like the following:

[19115.610095] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[19115.614138] ata3.00: configured for UDMA/33
[19115.614164] ata3: EH complete
[19115.614422] ata3.00: exception Emask 0x40 SAct 0x0 SErr 0x80800 action 0x6
[19115.614489] ata3.00: irq_stat 0x40000001
[19115.614543] ata3: SError: { HostInt 10B8B }
[19115.614598] ata3.00: failed command: IDENTIFY PACKET DEVICE
[19115.614661] ata3.00: cmd a1/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
[19115.614664]          res 51/04:01:00:00:00/00:00:00:00:00/00 Emask 0x41 (internal error)
[19115.614819] ata3.00: status: { DRDY ERR }
[19115.614873] ata3.00: error: { ABRT }
[19115.614931] ata3: hard resetting link
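The exception line reports the raw SError register as a hex value (SErr 0x80800), and the SError line spells out the same bits as flag names. A minimal Python sketch of that decoding follows. The bit table is modelled on the SERR_* constants in the Linux kernel's include/linux/ata.h; treat it as an assumption and check it against your own kernel sources.

```python
# Bit positions of the SATA SError register flags, modelled on the SERR_*
# constants in the Linux kernel's include/linux/ata.h (assumption: verify
# against your kernel). Listed in the order libata prints them.
SERROR_FLAGS = [
    (0, "RecovData"),
    (1, "RecovComm"),
    (8, "UnrecovData"),
    (9, "Persist"),
    (10, "Proto"),
    (11, "HostInt"),
    (16, "PHYRdyChg"),
    (17, "PHYInt"),
    (18, "CommWake"),
    (19, "10B8B"),
    (20, "Dispar"),
    (21, "BadCRC"),
    (22, "Handshk"),
    (23, "LinkSeq"),
    (24, "TrStaTrns"),
    (25, "UnrecFIS"),
    (26, "DevExch"),
]


def decode_serror(value: int) -> list[str]:
    """Return the names of the flags set in a raw SError value such as 0x80800."""
    return [name for bit, name in SERROR_FLAGS if value & (1 << bit)]


if __name__ == "__main__":
    # SErr value taken from the "exception Emask" line of the log.
    print(" ".join(decode_serror(0x80800)))  # prints "HostInt 10B8B"
```

0x80800 has bits 11 and 19 set, which decode to HostInt and 10B8B, the same flags the SError line prints.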

The lines to look at are the ones containing SError:
oshii:~# dmesg | grep SError | tail -n 10
[10873.423785] ata4: SError: { UnrecovData HostInt 10B8B BadCRC }
[10873.426642] ata3: SError: { UnrecovData HostInt 10B8B LinkSeq }
[10873.751681] ata3: SError: { UnrecovData HostInt 10B8B Handshk }
[10873.772417] ata4: SError: { UnrecovData HostInt 10B8B BadCRC }
[10874.094513] ata3: SError: { UnrecovData HostInt 10B8B Handshk }
[10874.430439] ata3: SError: { UnrecovData HostInt 10B8B Handshk }
[10874.764108] ata3: SError: { UnrecovData HostInt 10B8B Handshk }
[10875.090231] ata3: SError: { UnrecovData HostInt 10B8B Handshk }
[19115.294031] ata3: SError: { UnrecovData Proto HostInt 10B8B LinkSeq TrStaTrns }
[19115.614543] ata3: SError: { HostInt 10B8B }

According to many sites around the web, errors like these point to a bad SATA cable or a SATA power supply problem. The ata3 and ata4 strings identify the SATA ports that are experiencing the problem. The kernel numbers the ports starting from ata1, and ata1 and ata2 are the 6 Gbps ports, so the errors occurred on the first two 3 Gbps SATA ports on the board.
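To see at a glance which ports are complaining, the SError lines can be counted per port. Below is a rough Python sketch; the names serror_counts and port_kind are hypothetical, and port_kind hard-codes this post's assumption that ata1 and ata2 are the 6 Gbps ports.

```python
import re
from collections import Counter

# Matches libata lines such as:
#   [10873.423785] ata4: SError: { UnrecovData HostInt 10B8B BadCRC }
# An optional ".NN" suffix (e.g. "ata3.01") is accepted as well.
SERROR_RE = re.compile(r"\bata(\d+)(?:\.\d+)?: SError:")


def serror_counts(dmesg_text: str) -> Counter:
    """Count SError reports per ata port number (hypothetical helper)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        m = SERROR_RE.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts


def port_kind(port: int) -> str:
    # Assumption from this post: ata1 and ata2 are the 6 Gbps ports and
    # everything from ata3 up is a 3 Gbps port. Check your own board.
    return "6 Gbps" if port <= 2 else "3 Gbps"
```

Feed it the output of dmesg, for example `serror_counts(subprocess.run(["dmesg"], capture_output=True, text=True).stdout)`; reading dmesg may require root. For the ten lines shown above it counts 8 reports on ata3 and 2 on ata4, both 3 Gbps ports under this mapping.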

There are tools for Windows that check whether your system might be affected, but there are none for Linux. Those tools check the revision of the SATA controller. On Linux, though, all you can do is run lspci and check the revision yourself: 04 (the B2 stepping) means potential problems, 05 (the fixed B3 stepping) means your system is OK.

Example of an affected system:

oshii:~# lspci | grep SATA
00:1f.2 SATA controller: Intel Corporation Cougar Point 6 port SATA AHCI Controller (rev 04)

Example of a system that is OK:

motoko3:~# lspci | grep SATA
00:1f.2 SATA controller: Intel Corporation Cougar Point 6 port SATA AHCI Controller (rev 05)
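The same check can be scripted. The sketch below parses lspci output and applies the 04/05 rule from this post. The function name check_lspci is hypothetical, and the two chipset name strings are assumptions: the description lspci prints comes from the PCI ID database, and it differs between versions.

```python
import re

# Rule from this post: rev 04 may have the 3 Gbps SATA defect, rev 05 is fixed.
VERDICTS = {0x04: "possibly affected", 0x05: "OK"}

# Assumption: older pci.ids databases call the chipset "Cougar Point",
# newer ones "6 Series/C200 Series". Check what your lspci prints.
CHIPSET_NAMES = ("Cougar Point", "6 Series/C200 Series")
REV_RE = re.compile(r"\(rev ([0-9a-fA-F]{2})\)")


def check_lspci(output: str) -> list[tuple[str, str]]:
    """Classify every 6-series SATA controller line of `lspci` output."""
    results = []
    for line in output.splitlines():
        # Same idea as `lspci | grep SATA`, restricted to the 6-series parts.
        if "SATA" not in line or not any(n in line for n in CHIPSET_NAMES):
            continue
        m = REV_RE.search(line)
        if m is None:
            results.append((line.strip(), "no revision shown"))
            continue
        rev = int(m.group(1), 16)
        verdict = VERDICTS.get(rev, f"unknown revision {rev:02x}")
        results.append((line.strip(), verdict))
    return results
```

Passing it the oshii line above yields "possibly affected", and the motoko3 line yields "OK".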

If you use the 3 Gbps SATA ports and your Sandy Bridge Intel 6-series chipset has revision 04 of the SATA AHCI controller, you might want to claim your motherboard under warranty.