Amplifier reliability is key to economical systems

The optical-networking industry has endured many changes in the past few years, but the demand for high reliability in optical products has remained constant. The costs associated with failure of network elements such as amplifiers can be significant (see Figure 1). Network suppliers expect failure rates of <500 FITs* (failures in time) for single-channel amplifiers and <1,500 FITs for fully controlled wideband amplifiers.

Figure 1. The cost of making a field repair to a failed optical amplifier is significantly higher than the initial cost of the amplifier.
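To put the 500-FIT and 1,500-FIT figures quoted above in more familiar terms, the short sketch below converts a FIT value into a mean time between failures and an approximate probability of failure in one year of continuous operation. It assumes a constant failure rate (an exponential lifetime model), which is a simplification and not something the standards bodies named here prescribe; the function names are illustrative.

```python
import math

HOURS_PER_YEAR = 8760      # 365 days x 24 hours
HOURS_PER_FIT_UNIT = 1e9   # 1 FIT = 1 failure per billion device-hours


def fit_to_mtbf_hours(fit):
    """Mean time between failures, in hours, for a given FIT rate."""
    return HOURS_PER_FIT_UNIT / fit


def annual_failure_probability(fit):
    """Probability that one unit fails within a year of continuous use.

    Assumes a constant failure rate (exponential model).
    """
    rate_per_hour = fit / HOURS_PER_FIT_UNIT
    return 1.0 - math.exp(-rate_per_hour * HOURS_PER_YEAR)


for fit in (500, 1500):
    mtbf_years = fit_to_mtbf_hours(fit) / HOURS_PER_YEAR
    p_fail = annual_failure_probability(fit)
    print(f"{fit} FIT: MTBF about {mtbf_years:.0f} years, "
          f"annual failure probability about {p_fail:.2%}")
```

Under this model a 500-FIT amplifier has well under a 1% chance of failing in any given year, so even a small rate matters mainly because of the size of the installed base and the cost of each field repair.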

Unfortunately, there is a misconception held by many manufacturers that successfully completing qualification testing assures product reliability. While the standards set forth by Telcordia Technologies, the International Electrotechnical Commission, and others provide very detailed and prescriptive measures for product qualification, meeting these requirements is not always predictive of actual field reliability.

Qualification testing establishes a minimum standard for market entry. This process is only part of a broader reliability program that identifies and eliminates problems in the design and manufacturing process. Without this program, qualification testing is time-consuming and expensive—and still does not ensure reliability.

The earliest versions of optical amplifiers were simple in design. Single-channel amplifiers consisted of fewer than 10 optical components. These devices typically included a single low-power (<150-mW) pump laser, a printed circuit board with simple electronic circuitry, and a few passive elements.

Today's WDM amplifiers are far more complicated, including full electronic controls to manage system transient response, multiple high-power (>450-mW) pump lasers, and between 30 and 60 optical components. All components are squeezed into a mechanical platform far smaller than the early designs. With the increased complexity and capability, amplifiers inherently contain more failure modes.
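One way to see why a higher component count brings more exposure to failure is a simple parts-count estimate: if any component failure takes the amplifier down and each component has a constant failure rate, the component FIT values simply add. The sketch below uses that assumption. All per-component FIT values and the two bills of material are hypothetical and chosen only to match the component counts mentioned above; they are not vendor data.

```python
def series_fit(parts):
    """Total FIT for a series system with constant failure rates.

    `parts` maps a component name to (FIT per part, quantity).
    """
    return sum(fit * qty for fit, qty in parts.values())


# Hypothetical early single-channel design: 9 optical parts plus a simple board.
early_amplifier = {
    "pump_laser": (100.0, 1),
    "passive_optics": (5.0, 8),
    "control_board": (50.0, 1),
}

# Hypothetical WDM design: 48 optical parts plus full control electronics.
wdm_amplifier = {
    "pump_laser": (100.0, 3),
    "passive_optics": (5.0, 45),
    "control_board": (150.0, 1),
}

print("early design:", series_fit(early_amplifier), "FIT")
print("WDM design:  ", series_fit(wdm_amplifier), "FIT")
```

Even with identical per-part failure rates, the larger design carries several times the total failure rate, which is why component selection and de-rating matter more as complexity grows.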

Changing market conditions have also created new reliability challenges. Design teams are under great pressure to deliver products on aggressive timelines and at lower cost. Business economics have produced leaner organizations with fewer resources to spend on lengthy data generation and analysis. For many manufacturers, reliability engineering is now reduced to qualification testing and integrated into the quality organization.

Pricing pressure forces amplifier makers to launch aggressive cost-reduction initiatives. In many cases, module assembly is transferred to low-labor-rate countries that face time and culture gaps with the rest of the organization. Often, individual compensation is tied to productivity measures and quotas that conflict with reliability and quality objectives.

This reality makes it hard to maintain reliability unless assembly-related failure modes are designed out ahead of time. As a result of these changes in the market, product reliability can slip through the cracks until a major issue arises. By then, a fix is typically late and very expensive.

A comprehensive reliability program is a multifunctional effort. The success of any program hinges on its ability to optimize the tradeoffs between performance, design complexity, manufacturability, reliability, and cost. Sacrificing one or more of these attributes for the sake of the others can put the success of the entire program in jeopardy.

Good program management ensures a proper balance, but only when the tradeoffs between these attributes are accurately quantified. Therefore, the reliability engineering group must be tightly linked to other functional groups: sales/marketing, design, components engineering, manufacturing, and field support.

A reliability program requires sound processes in at least five key areas: requirements definition, components selection and application, robust design practices, manufacturing practices, and reliability growth.

Reliability requirements are defined by evaluating the application and how the product is to be used. Many times, requirements are contained in the product specification and industry standards. Requirements definition can have tremendous detail in optical and electrical performance, yet lack specifics when it comes to reliability.

As a result, there are often a large number of unwritten requirements that must be identified and included in the design phase. Reliability issues in the field often arise due to conflicts between product capability and these unwritten requirements.

One example of the importance of requirements definition involves an early reliability issue. During system startup, the control circuitry allowed the amplifiers to idle with the pumps enabled despite the absence of an input signal. That resulted in significant levels of unabsorbed pump power exiting the erbium coil and impinging on optical components adjacent to the coils. These optical components were designed to operate in the signal band, at wavelengths of 1500–1600 nm, and were not designed to dissipate pump-band wavelengths. A number of component failures were observed as a result of optical power damage.

Design modifications to the control circuitry were made to shut down the amplifiers when the input signal is lost. Many field failures could have been avoided if it had been known that the amplifier could idle with the pumps enabled.
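The control-circuit fix described above amounts to an interlock: the pumps may run only while an input signal is present. A minimal sketch of that rule follows. The loss-of-signal threshold and the function name are hypothetical; a real design would take the threshold from the amplifier specification and would likely add hysteresis and timing requirements that are not shown here.

```python
# Hypothetical loss-of-signal threshold in dBm; illustrative value only.
LOSS_OF_SIGNAL_DBM = -30.0


def pumps_allowed(input_power_dbm, enable_requested):
    """Return True only if the pumps are requested AND an input signal exists.

    With no input signal, unabsorbed pump power would leave the erbium coil
    and reach components that are not designed for the pump band, so the
    pumps must stay off regardless of the enable request.
    """
    signal_present = input_power_dbm > LOSS_OF_SIGNAL_DBM
    return enable_requested and signal_present


# Startup with no input signal: the pumps stay off even though they are requested.
print(pumps_allowed(input_power_dbm=-45.0, enable_requested=True))
# Normal operation with a signal present: the pumps may run.
print(pumps_allowed(input_power_dbm=-20.0, enable_requested=True))
```

Writing the idle-with-no-input case down as an explicit requirement is what turns it from an unwritten assumption into something the design can be checked against.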

Almost 80% of field returns result from component failures (see Figure 2). Thus, a rigorous component management program is also essential. Such programs include developing detailed component specifications, using standard parts from approved vendors, and establishing a robust integration process.

Figure 2. Better FIT: A rigorous component management program can reduce component failures, which result in most field returns for optical amplifiers.

That can be achieved by establishing a component engineering function. Component engineers work closely with the design and reliability functions to define the functional requirements of each component and compare them with the stresses to which the component will be exposed. Component engineering then works with the suppliers to validate that the components meet the requirements.

Frequently, that involves specialized testing and life modeling to assure reliability. Supplier selection is based on design assessment, process and reliability assessment, and periodic auditing.

A strong reliability program also requires robust design practices. These practices incorporate reliability techniques that have been proved in multiple industries such as automotive, aerospace, and medical. Objectives include:

Ensuring the design is tolerant of the simultaneous component parametric changes expected over the service life of the product.
Minimizing sources for single point failures (i.e., single-failure modes that will cause complete product failure).
Ensuring that critical failure modes are detectable.

Techniques for evaluating the design include failure mode, effect, and criticality analysis (FMECA); worst case analysis; fault tree analysis; risk assessment; design validation; stress analysis (thermal, mechanical, electrical, optical); component de-rating; design similarity; reliability prediction; limit testing; and destructive physical analysis. Through this analysis, engineers confirm that the design will meet the reliability objectives.
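As an illustration of how a failure-mode review can support the objectives above, the sketch below ranks failure modes by risk priority number (RPN = severity x occurrence x detection), a scoring scheme commonly used in FMEA work, and flags single-point failures that are hard to detect. This is a swapped-in simplification and not necessarily the criticality method the authors use; the failure modes, scores, and thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int       # 1 (minor) to 10 (complete product failure)
    occurrence: int     # 1 (rare) to 10 (frequent)
    detection: int      # 1 (always detected) to 10 (undetectable)
    single_point: bool  # True if this mode alone causes product failure

    @property
    def rpn(self):
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Return failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


def needs_action(mode, rpn_limit=100, detection_limit=7):
    """Flag high-RPN modes and hard-to-detect single-point failures."""
    hidden_single_point = mode.single_point and mode.detection >= detection_limit
    return mode.rpn >= rpn_limit or hidden_single_point


# Hypothetical entries for illustration only.
modes = [
    FailureMode("pump laser wear-out", 9, 3, 2, True),
    FailureMode("isolator damage from unabsorbed pump power", 8, 4, 8, True),
    FailureMode("monitor photodiode drift", 4, 3, 6, False),
]

for m in rank_by_rpn(modes):
    flag = "ACTION" if needs_action(m) else "ok"
    print(f"{m.rpn:4d}  {flag:6s}  {m.name}")
```

The thresholds are a policy choice; the value of the exercise is that every failure mode gets an explicit severity, likelihood, and detectability judgment before the design is released.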

Manufacturability is a critical element of any reliability program, especially given the high labor content of optical-amplifier assembly and the trend toward offshore manufacturing. Generic build standards that simplify assembly and eliminate a significant number of technique-dependent processes must be established for products. These standards are a key input to the design for manufacturability (DFM) rules. Analytical techniques for assessing manufacturability include process FMECA as well as implementation of effective in-process screens and monitoring.

Design for reliability (DFR) and DFM rules must be adopted for a robust design and manufacturing process. There is a strong relationship between design rules and the requirements.

Once a robust design has been transferred to manufacturing, ongoing quality and reliability can only be sustained if process changes are controlled in manufacturing and at the supplier. Seemingly small changes (equipment modifications, cycle changes, fixturing) in a process can result in major field issues if not properly assessed and verified. Supplier audits, a change control process, process FMECA, and surveillance testing are useful tools to assess these changes.

Finally, effective monitoring of the quality and reliability of a product is essential to a company's ability to improve the reliability of future generations of product. This "reliability growth" process requires constant monitoring of failures, identification of root cause, identification of corrective action, and a means by which this information is fed into the development of future designs. Lessons learned from these efforts are then fed into the design rules, which are the medium for accomplishing reliability growth.
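The monitoring step described above can be sketched as two small calculations: an observed field failure rate in FIT from returns and cumulative operating hours, and a tally of confirmed root causes to show where corrective action should go first. The function name, the field data, and the root-cause labels below are hypothetical and serve only to illustrate the bookkeeping.

```python
from collections import Counter


def observed_fit(failures, units_in_service, hours_per_unit):
    """Observed failure rate in FIT (failures per billion device-hours)."""
    device_hours = units_in_service * hours_per_unit
    if device_hours <= 0:
        raise ValueError("cumulative operating hours must be positive")
    return failures / device_hours * 1e9


# Hypothetical field data: 12 confirmed failures across 10,000 units,
# each operated for one year (8,760 hours).
print(f"observed rate: {observed_fit(12, 10_000, 8760):.0f} FIT")

# Hypothetical root causes assigned during failure analysis.
root_causes = [
    "component", "component", "component", "component", "component",
    "component", "component", "component", "component",
    "assembly", "assembly", "design",
]
for cause, count in Counter(root_causes).most_common():
    print(f"{cause:10s} {count}")
```

Tracking both numbers over successive product generations gives the design-rule owners a direct measure of whether the lessons learned are actually reducing failures.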

Establishing an end-to-end reliability process, instead of relying only on qualification testing, ensures the reliable performance of optical amplifiers. The process starts with design and carries through the operational life of the product in the field. It is an integral part of the product and design from the beginning.

Reliability assurance is integrated into every functional area, with a small reliability engineering team located within the design organization. Although this team provides analysis and support for the entire business, it focuses on new and unproven design attributes and processes.

Effective reliability engineering requires knowing where to direct resources. By focusing on DFR and DFM and taking a proactive approach to identifying and preventing failure modes as part of the design process, the manufacturer can attain failure rates of <150 FITs for single-channel amplifiers and <650 FITs for controlled wideband amplifiers.

Why invest the time and energy required to implement these procedures? Why bring to bear all the additional resources necessary for a successful reliability program? The answer doesn't just lie in satisfying customers and, in doing so, edging out the competition. These processes also cut many iterations out of a company's design-process cycle time. Reducing "do-overs" and preventing reliability issues from ever occurring promotes better performance, shorter cycle times, and a superior overall product.

Doug Harshbarger is director of amplifier marketing, Wendy Bahn is manager of reliability engineering for optical line modules, William Denson is a certified reliability engineer and senior reliability project leader, and Dr. Reinaldo Gonzalez is a senior reliability engineer at Avanex (Fremont, CA). They can be reached via the company's Website, www.avanex.com.

*The reliability of a product is represented by its failure rate, often reported in FITs (failures in time); 1 FIT is one failure per billion cumulative operating hours.