I don't know if I'll ever buy another Seagate drive. This is a tale of 3 months of computing misery.
Rewind
When I saw the news back in early July that Mac OS X 10.7 Lion had hit Gold Master (GM), the build expected to be released to paying customers, I immediately downloaded it and installed it on my main machine, an iMac.
A few days later, I filed Radar 9727925 (OpenRadar link). Lion was randomly ejecting my external Seagate HD on its own. This had never happened under any version of OS X 10.6 Snow Leopard. I had all my code on this drive, since the iMac's internal drive was nearly full, as well as a ton of media files I feed the Apple TV with. All my photos were on this drive too. I figured it was an obvious bug, some bizarre oversight, and should definitely get fixed. I thought it was bad enough that 10.7.0 would get the dubious distinction of needing a GM2.
That didn't happen. 10.7.0 shipped as expected, and my external disk kept ejecting with a scary modal error dialog like clockwork. I was frustrated. My excitement for Lion was high coming back from WWDC 2011, but I had stayed my hand on the primary machine until the OS reached GM. Of course, I had installed it to a different external drive for testing, and it was great on my iMac. I thought I was home free.
But I couldn't have been more wrong about Lion, or at least so I thought for three months…
The Long Wait
After filing my bug, Apple responded before 10.7.0 shipped, asking me to run a disk-debug command and capture the trace of what happened during the random ejection. I collected the information and supplied it to Apple. I eagerly awaited a response, but none came.
I installed 10.7.1 as soon as it came out, hopeful my bug would be stealth resolved. It wasn't, and I grew despondent. How could Apple leave this bug unfixed? I searched the Apple Support Communities. Clearly other people were affected by this. This was the worst Mac OS X bug I had experienced since I started using OS X with 10.3.x.
As soon as the first 10.7.2 builds started arriving in the Mac developer center, I deployed them. The iMac I was putting them on is the whole-house iTunes media server, so I was risking wife and kid wrath by making this move. They do not tolerate beta. Early 10.7.2 betas didn't resolve the bug. I started to panic. Could 10.7.2 make it through development without this issue getting fixed?
Desperation
If you've ever been to an Apple developer event, you know they say if you have any questions just email them. I've always admired that, and I don't know if I would feel comfortable encouraging public emails for help if I were in their shoes.
On August 30, 2011, I emailed Michael Jurewitz out of desperation:
Hi Mike, Can you figure out what the real status is of Radar 9727925? In short, the external drive where I keep a huge chunk of my files, including my code, keeps randomly ejecting a dozen or more times a day. Never happened in Snow Leopard. I hated to even think to ask you, but I filed this bug close to 3 months ago and it's driving me crazy. If anything it's gotten worse in the latest 10.7.2 seed. Yes I am on the bleeding edge hoping this one bug gets fixed. Thanks so much, Dave
I couldn't stand sending that email. I just about cried when I got the automated Out of Office reply. What was I going to do now? I remembered that I could email Apple and ask for a status update on the bug. I did that and hoped for the best, but I couldn't take it anymore. It was time to start deep debugging to prove whether the problem was specific to my environment.
Taking Matters Into My Own Hands
Deep Debugging OS X is really not that complicated. Here's the rough outline:
1. Verify your System disk for any corruption
2. Repair Disk Permissions
3. Create a new user account and see if you can reproduce the bug
4. Install a new copy of the current OS on an internal disk partition and see if you can reproduce the bug
5. Remove anything 3rd party that starts up with the OS, one at a time, until the bug stops, such as:
   - StartupItems
   - LoginItems
   - LaunchAgents
   - LaunchDaemons
   - Kernel Extensions (.kexts)
6. File a Radar or GTFO
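For the "remove anything 3rd party" step, it helps to see everything in one place before you start pulling things out. Here's a minimal sketch in Python, not something I ran at the time. The function name `list_startup_items` and the `root` parameter are my own inventions for illustration, and the directory list is an assumption: which folders matter varies by OS X version, and on Lion-era systems third-party kexts commonly sat next to Apple's own in /System/Library/Extensions, so that folder will be noisy. LoginItems are stored in preferences rather than a folder, so this doesn't cover them.

```python
from pathlib import Path

# Folders where startup code usually lives, relative to a root.
# This list is an assumption; adjust it for your OS X version.
STARTUP_DIRS = [
    "Library/StartupItems",
    "Library/LaunchAgents",
    "Library/LaunchDaemons",
    "Library/Extensions",
    "System/Library/Extensions",  # mixes Apple and 3rd party kexts
]


def list_startup_items(root="/"):
    """Return {folder: sorted item names} for each startup folder that exists under root."""
    found = {}
    for rel in STARTUP_DIRS:
        directory = Path(root) / rel
        if directory.is_dir():
            # Skip hidden files such as .DS_Store.
            found[rel] = sorted(
                p.name for p in directory.iterdir() if not p.name.startswith(".")
            )
    return found


if __name__ == "__main__":
    # Scan the system-wide folders, then the current user's ~/Library.
    for root in ("/", str(Path.home())):
        for folder, items in list_startup_items(root).items():
            print(Path(root) / folder)
            for name in items:
                print("   ", name)
```

The script only lists things. Deciding what is 3rd party, and removing items one at a time, is still on you.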
I started going through the whole debugging sequence just to be doubly sure this wasn't specific to my environment. I executed steps 1-4 and reproduced the bug. I was even more certain this had to be a Lion bug. With a bare OS X install, the external disk was randomly ejecting. This is the smoking gun, right? Wrong!
I went through all the 3rd party code in step 5 that was loading on my system. Something interesting caught my eye. Lion wasn't loading some Seagate kernel extensions since they didn't include 64-bit versions. I went out to Seagate's site and found GoFlex for Mac 1.1.2, but I didn't install it. Normally when debugging something, you try to remove 3rd party stuff to fix a problem; you don't add it back in. Besides, when I skimmed the description of this software, the usefulness of the package seemed limited to Drive Settings and Diagnostics, the only capitalized words in the summary that weren't MacOS or GoFlex. I'm sure I was trying to figure this out one night, bleary-eyed, well past midnight, so I was probably in a hurry due to exhaustion and didn't feel like reading the paragraph. My mistake! I put the problem on hold for a few more days.
The Workaround
A few days went by. I installed the latest 10.7.2 beta, which didn't resolve the bug, and brainstormed on what it could be. During one of these sessions, it dawned on me that the drive seemed to get ejected whenever it wasn't being used. Some kind of sleep timer was firing! I checked the Energy Saver System Preference, but "Put the hard disk(s) to sleep when possible" was unchecked. This must be where the bug is, I reasoned: Lion is always sleeping the drive when possible, but there is some incompatibility, thus the modal error dialog.
I hadn't really done much with Automator, but I figured this was the kind of thing it might be good at. I whipped up a quick workflow app that did a search for a string in file names on the external drive, and had it open as a LoginItem. Voila! The external drive stopped ejecting! It was a sleep timer. The relief was immediate. It was like I had just scored a touchdown, or for the non-sports inclined, like I had just solved an incredibly difficult puzzle. I could wait as long as it took for Apple to fix this Lion bug now. I wasn't really impacted anymore, and I stopped thinking about it. Only it wasn't a Lion bug at all, not really.
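If Automator isn't your thing, the same keep-the-drive-busy idea can be sketched in a few lines of Python. To be clear, this is not what I ran: my workflow did a file name search, while this sketch rewrites a tiny marker file on a schedule. The function name `keep_drive_awake`, the `.keepawake` file name, and the 60 second default are all assumptions for illustration; the interval just needs to be shorter than whatever the drive's idle timer is, and I never found out what that was.

```python
import os
import time
from pathlib import Path


def keep_drive_awake(volume, interval_seconds=60, iterations=None,
                     marker_name=".keepawake"):
    """Rewrite a small marker file on `volume` at a fixed interval.

    iterations=None loops forever; a number stops after that many writes.
    Returns the number of writes performed.
    """
    marker = Path(volume) / marker_name
    writes = 0
    while iterations is None or writes < iterations:
        with open(marker, "w") as fh:
            fh.write(str(time.time()))
            fh.flush()
            # Ask the OS to push the write to the device instead of
            # leaving it in a cache, so the drive actually sees activity.
            os.fsync(fh.fileno())
        writes += 1
        # Sleep between writes, but not after the final one.
        if iterations is None or writes < iterations:
            time.sleep(interval_seconds)
    return writes
```

You would call it with the mount point of the external drive, something like `keep_drive_awake("/Volumes/MyDrive")`, where the volume name is made up, and have it start at login the same way the Automator app did.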
Finally, The Solution
On October 3, 2011, Apple responded to my request for more information on the bug. I don't know if Michael Jurewitz was the invisible hand that got this moving, or it was just my place in the request for info queue, but I was excited when I received the mail. I paraphrase because of NDA restrictions. The message thanked me for my patience and told me that to fix the issue I must install GoFlex for Mac 1.1.2! How is it possible that to use a USB or FireWire drive (the GoFlex does both, which is the reason I bought it) in OS X Lion you need 3rd party software? I then went back to the Seagate site and read the description of this software again, and it hit me like a shot of vodka:
drivers to disable the built-in sleep timer on the drive