Sweating The Small Stuff

Two weeks ago, one of our long-term customers returned a late 2008 Xserve to us stating it was not recognizing PCI cards in either slot. I jumped at the chance to take a look at it since we don’t see many broken Xserves coming back in. Xserves are traditionally easy machines to work on. Many of the components are user-installable and the whole thing can be stripped down in about five minutes.

Thinking that I was going to find either a failed PCI slot or a failed Main Logic Board (more likely, since both slots supposedly were non-functional), I grabbed a PCI card for testing and powered on the Xserve; it booted to a Kernel Panic while loading the kernel (the part of the boot process where the grey Apple logo is on screen). This was not what I was anticipating. Still going along the lines of a potential issue with the PCI slots, I removed both PCI cards and rebooted the machine; Kernel Panic (KP). Ok, time to go back to basic troubleshooting.

First I attempted to boot to the 10.5 Server Install DVD; it KP’d to that as well as to an external hard drive with a known good boot volume. Then I swapped the RAM, which yielded no change. I then manually ran the EFI Firmware Update for that Xserve, but it wouldn’t accept it. Traditionally, with desktop Macs and Xserves, if the machine is experiencing Kernel Panics while loading the kernel and both the operating system and RAM have been ruled out, the issue is with the processor. Luckily, we had an identical Xserve in the shop that I was able to borrow some parts from. I swapped out the processor, but still no change. I was then able to successfully run Apple’s Service Diagnostics in EFI, which told me everything passed. Logically speaking, the issue should be the Main Logic Board at this point, so I ordered one up and let it go for the day.

The next day, Jon, another great SDE tech, installed the replacement logic board and, to his chagrin, was greeted with a lovely Kernel Panic on boot. Ugh. He let it sit, and the following day I was back in the office and started scouring the service manual for tips. All status lights were displaying their normal state, with the exception of the System Identifier Light, which blinked to let me know that I had the top cover removed. Next step, minimal system! I disconnected everything except for the MLB, processor/heat sink, power supply and distribution board, RAM, fan array and video card. I attempted to boot to my known-good external hard drive and still received a KP in return. For my next trick, I replaced all of the minimal system components with the parts from the identical Xserve that we had, with the exception of the replacement logic board and processor; still nada!

Just to be thorough (read: stubborn), I then proceeded to replace every component aside from the replacement logic board with the parts from the identical Xserve. My thought was to then work backwards eliminating one component at a time until I found the piece of hardware that was causing the issue. I never got that far. Even with all of the good components in place the same issue still occurred. At this point it was just about comical, and from being in situations like this before I felt it had to be something really simple that I was missing; but what?!

I called in two other techs and talked them through my process. We all stared at the machine for a bit and scratched our heads, but no ideas were generated. Then, an even more bizarre issue occurred. The external hard drive that I was using for testing has three partitions: two 10.5 and one 10.4 boot. During one last attempt at booting the machine the power button was pressed, but none of us bothered holding down the option key to get to the EFI boot manager. I turned around and realized the machine had successfully booted to the 10.4 partition and was functioning. This should not be possible; a late 2008 Xserve should not be able to boot into Tiger! At least from here I was able to verify that the firmware was up to date, but now I was even more confused.

It was time to call in the big guns. Feeling a little defeated, I picked up the phone and dialed Apple Enterprise Support, Apple’s tech line for help with servers and enterprise software. I explained my process and issue to the tech, who also seemed stumped. I’ll admit that my first call wasn’t terribly productive. The tech seemed to have trouble following my triage process and he ended up telling me to reinstall 10.5 Server on the internal hard drive and/or to try the firmware update again. Despite knowing neither should resolve the issue, I did both and then called back when that didn’t work. The second time I called I got a tech who seemed really interested in the case. He ended up putting me on hold while he “asked the room” for advice. The one unanimous answer was that Tiger should not boot on that model Xserve, and they suggested that I order yet another logic board, thinking the one I had received was defective.

Ok, one day of waiting for another board. It arrived, and I did the replacement this time. I was not surprised at all when I had yet another Kernel Panic staring back at me on boot. At this point I had the broken Xserve right across from the known-good Xserve that I was using as a parts-donor and, after stepping back for a moment, I saw the problem. At first, I didn’t believe it. Even while I was then “fixing” the broken Xserve I was grumbling about how stupid it was. When I booted the Xserve and it happily started up from its internal hard drive without a hitch, I was relieved, annoyed and a little embarrassed all at the same time. So, what did I notice?

Well, there are two processor slots, since these Xserves can be configured with one or two processors. The good Xserve properly had the processor in CPU A. The defective Xserve had the processor in CPU B. Of course it was panicking on boot! I suppose the only silver lining is that it is interesting to know that a late 2008 Xserve is able to boot into Tiger if its processor is in the wrong slot, but I can’t say that’s very useful information. After speaking with the customer, it was confirmed that they had a tech there who had upgraded the Xserve himself to two processors and he accidentally removed the wrong one before shipping the machine back to us. Since it’s incredibly uncommon for a customer to rearrange the processor configuration, it hadn’t dawned on me (or the three other techs looking over my shoulder) that the processor was in the wrong place.

The good news is that the original issue—the two non-working PCI-slots—was resolved by replacing the logic board. The machine is once again a happy, functioning Xserve and I have been re-taught the lesson that if a problem seems that convoluted there’s probably a simple solution that’s being overlooked.

Similar Posts

  • Happy Tuesday,

    Much of the state is digging out following record-breaking snowfall over the weekend (33″!). I didn’t even realize we were in the midst of a major weather event until I saw friends’ Facebook posts; only a few measly inches fell here in the Mad River Valley!

    I’m heading to Burlington tomorrow, where Katie reports still-unplowed roads are the norm. Our Service Writer in Burlington travels from Swanton every day and had a very crazy commute!

    This edition of Tech Tails has two meaty articles. Rebecca writes about an Xserve that had all of us baffled, Ed shares his tips to avoid and rectify system slowdowns, plus some tips for your gadgets in extreme temperatures.

    Happy new year from all of us at Small Dog Electronics!

    As always, thanks for reading and keep in touch.

    Matt
    “matt@smalldog.com”:mailto:matt@smalldog.com

  • Ten Tips for Dealing with Unexpected Mac Slowdowns

    A friend recently sent me an email, questioning why his MacBook Pro with 4GB of RAM was “getting slower and slower, with an increasing frequency of the appearance of the SRWOD (spinny rainbow wheel of death).” This is something I occasionally hear about, but haven’t experienced (except for Safari randomly bogging down for several seconds).

    Unfortunately, mysterious computer slowdowns can be difficult to diagnose. Overstuffed system cache, old temp files, corrupted preferences, a hard drive in the early stages of failure, and faulty RAM are always candidates for causing this problem. Here are some suggestions to resolve system slowdowns.

    Also, please make sure you have a solid backup of your Mac’s important data before proceeding. *I’ll say it again: make sure your Mac is backed up properly before proceeding.*

    1. Any Mac will slow down when its hard drive is almost full, regardless of processor speed. Simply moving some of your data (especially media files like movies, video podcasts, etc) to an external drive can greatly improve a Mac’s responsiveness.

    Read how to reclaim hard drive space in an old Kibbles article “by clicking here.”:http://www.smalldog.com/kibbles/kibbles_display.php?id=557

    2. Clear your Mac’s desktop. The OS has to draw each of those icons as a separate window, so when you have dozens of files littered on the desktop the system is taxed. Clearing the Mac’s desktop is proven to improve system performance.

    3. Make sure your computer is up to date with all the latest software and firmware updates from Apple. This can go a long way to improving system performance. To check this, click the Apple in the top left corner of the screen and select “Software Update…”

    4. Simply running a free maintenance program can often help bring a sluggish and flaky machine back to speed. These programs force the Mac’s regular Unix maintenance scripts to run; normally these run daily, weekly, and monthly early in the morning. “Click here for further reading on this.”:http://support.apple.com/kb/HT2319?viewlocale=en_US

    I use a program called Onyx (free) to run these scripts. You can get it for Tiger (10.4) and Leopard (10.5) as well as Snow Leopard. It’s effective and easy to use. It starts by checking the S.M.A.R.T. status of your hard drive, so you can determine if the drive is failing. This step takes several minutes. After that Onyx can flush system cache, etc.

    One catch about Onyx is that it has several options that most people shouldn’t use, such as the option for erasing bookmarks and internet browsing history. I do like and recommend Onyx, though–get it from Apple’s site “by clicking here (version for 10.6).”:http://www.apple.com/downloads/macosx/system_disk_utilities/onyx.html For 10.5 and older, “click here to find your version on VersionTracker.”:http://www.versiontracker.com/dyn/moreinfo/macosx/20070

    You can also download a simpler program called MacJanitor that will only run the maintenance scripts “by clicking here.”:http://personalpages.tds.net/%7Ebrian_hill/macjanitor.html When a tech diagnoses your Mac, he or she runs a battery of programs that are similar to Onyx. This takes several hours. However, Onyx does a great job for occasional repairs and maintenance.

    5. Check the health of your hard drive. I depend on Onyx to verify the S.M.A.R.T. status of my Mac’s hard drive. Immediately back up your computer if you think there’s a real issue with the drive. Then consider using a dedicated drive diagnostic/repair tool such as “Disk Warrior.”:http://www.smalldog.com/product/41941 If the drive is having issues and you’re going to replace it, consider using a 7200RPM model. A faster hard drive will result in a (slightly) faster Mac.

    6. Check the health of your Mac’s RAM. There are several ways to test the health of your Mac’s RAM. I use “Rember,”:http://www.kelleycomputing.net/rember/ which is a free program that is a front-end GUI to a basic Unix ‘memtest’ command. You can read more about testing RAM “by clicking here.”:http://www.macfixit.com/article.php?story=20050524014158525

    7. Deal with mutant applications. Ok, so maybe the word “mutant” is unfair. However, it’s always a good idea to delete applications that you don’t use. I use “AppCleaner”:http://www.freemacsoft.net/AppCleaner/index.php to do this.

    Also, many apps install helper programs that run by default whenever you start up your Mac. This typically happens in the background, without the user having to confirm anything. Often these aren’t needed and can hog system resources without having anything to show for it. To disable startup items you don’t use, navigate to System Preferences > Accounts > Login Items, select each item you don’t need, and click the minus (-) button to remove it. (The checkbox next to each item only hides it at login; it does not stop it from launching.)

    Finally, any active, running application uses system resources including CPU cycles, RAM and disk activity, even when it is in the background and you’re not using it. Some programs leak memory when they are running, which makes them gobble RAM over time.

    8. Use Activity Monitor and iStat Pro to analyze which system processes and applications are hogging system resources. You can download the “iStat Pro widget by clicking here.”:http://www.islayer.com/apps/istatpro/ Activity Monitor is found in the Utilities folder, which is nested in the Applications folder in OS X.

    9. If you have an Intel Mac, use Xslimmer to trim away the legacy PowerPC code from Universal binary applications. Read more “by clicking here.”:http://www.xslimmer.com/

    10. Programs that automatically perform syncing, indexing, and backup operations on your Mac can occasionally slow it down. They can sometimes cause minor drags that slow the system for a couple of seconds at a time.

    If none of these helps, the problem will likely be more time-consuming to resolve. At Small Dog, our techs run a battery of tests with several software and hardware tools to seek out and fix strange system slowdowns. Hopefully the above suggestions will keep you from having to send in your machine!
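
    For readers comfortable in Terminal, tips 1, 2 and 8 can be roughly approximated with a short script. This is only a sketch, not a replacement for Activity Monitor or a proper diagnostic: it assumes Python 3.7 or later is installed (it is not part of the OS X versions discussed here), and the helper names @percent_used@, @count_visible_items@ and @parse_ps_output@ are made up for this illustration.

```python
import os
import shutil
import subprocess


def percent_used(total_bytes, used_bytes):
    """Return how full a volume is, as a percentage rounded to one decimal."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return round(100.0 * used_bytes / total_bytes, 1)


def count_visible_items(folder):
    """Count entries in a folder, skipping hidden dotfiles such as .DS_Store."""
    return sum(1 for name in os.listdir(folder) if not name.startswith("."))


def parse_ps_output(text):
    """Turn 'pid rss command' lines into (rss_kb, pid, command) tuples, largest first."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        pid, rss_kb, command = parts
        rows.append((int(rss_kb), int(pid), command.strip()))
    return sorted(rows, reverse=True)


if __name__ == "__main__":
    # Tip 1: how full is the startup volume?
    usage = shutil.disk_usage("/")
    print("Startup volume is", percent_used(usage.total, usage.used), "% full")

    # Tip 2: how many icons does the desktop have to draw?
    desktop = os.path.expanduser("~/Desktop")
    if os.path.isdir(desktop):
        print("Items on the desktop:", count_visible_items(desktop))

    # Tip 8: the five processes holding the most resident memory.
    # -A lists every process; a trailing '=' on each field suppresses the header.
    listing = subprocess.run(
        ["ps", "-A", "-o", "pid=", "-o", "rss=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    for rss_kb, pid, command in parse_ps_output(listing)[:5]:
        print(f"{rss_kb / 1024:8.1f} MB  pid {pid:<6} {command}")
```

    The script only reads information and changes nothing on the Mac. What counts as "almost full" or "too many icons" is left to your judgment, since the tips above don't give hard thresholds.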

    __Editor’s note: Check out “this cheeky website”:http://marbleofdoom.com/about.html to log your time spent waiting for the “Spinning Beach Ball of Death!”__

  • Tip of the Week: Cold Weather Care

    During the colder months it’s important to keep in mind that cold objects entering a warm, moist environment (like your home or workplace) will become damp with condensation. As liquid exposure of any type can void your warranty and result in costly repair, and as Apple now installs liquid exposure indicators inside each of its products, it’s vital that you keep your electronic gear safe.

    If at all possible, do not keep your laptop, iPod, iPhone or other electronic gear in the car overnight in the cold. We’re beginning to see a few victims of condensation come through the shop, and it’s easy to avoid. If you find yourself with a moisture-covered device, the first thing to do is turn it off and remove the battery. iPod and iPhone users can only shut down and wait as their batteries are not removable.

    Legendary data recovery firm (and Small Dog data recovery partner) “Drive Savers”:www.drivesavers.com notes that this kind of exposure is particularly serious for hard drives: “Cold weather can wreak havoc on temperature-sensitive hard drives used in computers, game consoles, MP3 players and video recorders. Condensation buildup on the drive platters and frozen components can lead to drive failure and data loss.