RL Magazine
Edition 45
Technical Trends: Failures and Liars
by L. Bryant Underwood

Eric Arnum published a brilliant report on computing warranty rates on September 27th. If you do not subscribe to his weekly reports, you should; he consistently presents new insights backed by solid data. The most interesting chart in that report showed accrual and claim rates for computing and storage. When I looked at the data, my first thought was, ‘Great, now I have some backup data to reverse warranty accruals to cash.’ Always a good thing.

I shared the report with Byron, an Engineering Director at a major defense contractor. Byron’s take was much more interesting. His first thought was that the fear of lead-free (Pb-free) solder increasing failure rates seems to have been largely mitigated. For Byron and other defense and public-safety engineers, lead-free solder has always been a major reliability concern. It is great for the environment, but it creates a host of reliability issues, most of them related to its tendency to almost come to life. Under some conditions, lead-free solder grows long, thin filaments (tin whiskers) that can extend until they touch something and short it out. At low temperatures, the solder can simply crumble to powder (tin pest), and components fall off the board. None of this is good for spacecraft, missiles or fighter jets. But the data clearly reflect a very positive trend.

Shortly after reviewing this data, I spoke with several people who run large repair operations for contract manufacturers. They were mostly repairing tablets and cellphones, and all were struggling with high rates of false failures, also known as NTFs (no trouble found). Depending on the product, false-failure rates ran 20-40% at the low end, and for some products they approached 80%. Anecdotally, false-failure rates across these repair operations averaged just under 50%. How is that possible? How can the hardware side of technology improve so much while the massive costs of false failures seem to keep growing?
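To put numbers on the terms above: the false-failure (NTF) rate is the share of returned units in which repair finds no fault. The short Python sketch below computes it per product and as a volume-weighted average across products. The `ReturnBatch` class, the product names and the unit counts are all hypothetical, picked only to land inside the ranges quoted above; they are not data from these repair operations.

```python
from dataclasses import dataclass


@dataclass
class ReturnBatch:
    """Returned units for one product line (hypothetical figures)."""
    product: str
    units_returned: int
    units_ntf: int  # units where repair found no trouble (false failures)

    def ntf_rate(self) -> float:
        # Share of this product's returns that turned out to be false failures.
        if self.units_returned == 0:
            return 0.0
        return self.units_ntf / self.units_returned


def weighted_ntf_rate(batches: list[ReturnBatch]) -> float:
    """Volume-weighted NTF rate: total NTF units over total returned units."""
    total_returned = sum(b.units_returned for b in batches)
    if total_returned == 0:
        return 0.0
    return sum(b.units_ntf for b in batches) / total_returned


if __name__ == "__main__":
    # Hypothetical example data for illustration only, not from the article.
    batches = [
        ReturnBatch("tablet_a", 1000, 250),
        ReturnBatch("phone_b", 2000, 800),
        ReturnBatch("phone_c", 500, 390),
    ]
    for b in batches:
        print(f"{b.product}: {b.ntf_rate():.0%} NTF")
    print(f"overall: {weighted_ntf_rate(batches):.1%} NTF")
```

With these made-up counts the per-product rates are 25%, 40% and 78%, and the volume-weighted rate is about 41%, while a simple average of the three rates would be about 48%. Because the two can differ noticeably, it is worth asking how an "average" NTF figure was computed before comparing one repair operation with another.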

The Human Interface
It may not be obvious, but when machines deal with machines, failure rates are always lower. Most computing interconnections are based on defined network protocols and have a lot of fault tolerance built in. In addition, the M2M (machine-to-machine) interface runs at such high data rates compared to the human interface that individual errors can effectively be ignored. Stated another way, if a computer gets bad data, it will just wait 10 milliseconds and then request a resend. For people, things are not so easy. Someone using a cellphone is very sensitive to a host of noises and audio artifacts that disrupt speech intelligibility. One of the worst is the odd, phase-error-induced warbling you hear when the phone’s CODEC is trying to recover lost packets as you drive through an area of weak coverage. What happens next? The user blames the phone. It gets sent in for repair and, voilà, a false failure is born. But there is more to it than that, right? Yes, I believe there is.
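The machine-side behavior described above, where a receiver that gets bad data simply asks for it again, is the basic idea behind automatic repeat request (ARQ) schemes. The Python sketch below is a simplified stop-and-wait illustration, not any specific network protocol: each frame carries a CRC-32 checksum, the receiver rejects frames whose checksum does not match, and the sender resends after a timeout. The noisy channel, the 10 ms timeout constant (borrowed from the example above) and the retry limit are assumptions for illustration only, and time is simulated rather than actually slept.

```python
import random
import zlib

TIMEOUT_MS = 10   # assumed retransmit timeout, borrowed from the article's example
MAX_RETRIES = 20  # assumed cap so a dead link does not retry forever


def make_frame(payload: bytes) -> tuple[bytes, int]:
    """Attach a CRC-32 checksum so the receiver can detect corruption."""
    return payload, zlib.crc32(payload)


def noisy_channel(frame: tuple[bytes, int], rng: random.Random,
                  error_rate: float) -> tuple[bytes, int]:
    """Hypothetical lossy link: flips one payload bit with probability error_rate."""
    payload, crc = frame
    if payload and rng.random() < error_rate:
        data = bytearray(payload)
        i = rng.randrange(len(data))
        data[i] ^= 1 << rng.randrange(8)
        payload = bytes(data)
    return payload, crc


def receiver_accepts(frame: tuple[bytes, int]) -> bool:
    """Receiver side: acknowledge only if the checksum still matches."""
    payload, crc = frame
    return zlib.crc32(payload) == crc


def send_with_retry(payload: bytes, rng: random.Random,
                    error_rate: float) -> tuple[bytes, int, int]:
    """Stop-and-wait ARQ: resend after a timeout until the frame arrives intact.

    Returns (delivered payload, attempts used, simulated retry delay in ms).
    """
    frame = make_frame(payload)
    for attempt in range(1, MAX_RETRIES + 1):
        received = noisy_channel(frame, rng, error_rate)
        if receiver_accepts(received):
            return received[0], attempt, (attempt - 1) * TIMEOUT_MS
        # Bad data: no acknowledgment, so the sender waits out the timeout and resends.
    raise TimeoutError(f"gave up after {MAX_RETRIES} attempts")


if __name__ == "__main__":
    rng = random.Random(42)
    data, attempts, delay_ms = send_with_retry(b"hello, world", rng, error_rate=0.5)
    print(data, attempts, delay_ms)
```

Because the sender only moves on once a frame passes its checksum, a corrupted frame costs the machine a few milliseconds of delay rather than a visible failure. A person on a live voice call generally gets no such second chance, since audio that arrives too late to play is as bad as audio that never arrives.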

The ARM Miracle
In case you had not noticed, there are two main camps in computing: servers and PCs based on x86/Intel, and mobile devices based on the ARM fabless model. ARM products are based on RISC (reduced instruction set computing) and are designed to use less power. That low power consumption was a boon to mobile devices, but secondary benefits are now coming to the forefront. For a processor, power equals heat. If heat is low, I can add more capability, reduce product size and, in turn, lower cost. Those trends have clearly improved the functionality of all mobile devices.

With that said, we are at a bit of a crossroads today. Just a few years ago, mobile devices had several processors. The main processor handled management and the user interface, while housekeeping tasks such as audio processing, the wireless interface, camera control and graphics were each handled by dedicated ICs. Today, processors are becoming fast enough to handle all of these tasks on their own. This trend is also behind the recent news from companies like TI that they are shifting focus from mobile ICs to embedded systems. The upside is that costs go down, power requirements drop and size shrinks. The downside is that today’s processors and software are not yet fully ready for mobile devices with such pared-down circuitry. In my opinion, we are one to two years away from mobile devices that can multitask and manage all of the interface tasks transparently. I believe the “noise” caused by this partial capability is what is driving the current surge in false failures.

As you can imagine, the costs of moving this inventory back and forth are immense. The sustaining response is to ride it out as the products continue to improve. In the near term, the principal way to mitigate some of these costs is better training for call-center and retail staff, so that units never enter the repair process in the first place. On the positive side, this is a great opportunity to talk with your customers about reducing costs and finding ways to keep those liars (false failures) out of your supply chain.
Bryant Underwood manages Public Safety Sourcing for Cassidian Communications, an EADS North America company in Frisco, Texas.