Misunderstandings about protection strategies and differences in software implementations among vendors make SONET protection a headache.
John Brandte
Ncomm
Protecting SONET networks should be far easier than it has turned out to be. Standards bodies, vendors, and providers have been hard at work for some time trying to protect all aspects of SONET networks -- be it the facility equipment, the physical network links, or the switching equipment. But confusion about different protection mechanisms and unresolved incompatibility between vendors' solutions continue to stand in the way of effective protection, fail-over, and recovery in heterogeneous environments.
Protection switching -- for availability or reliability?
Let's begin by clarifying that protection switching, as defined in the standard, is a technique for addressing network availability (mean duration of failure) rather than reliability (mean time between failures).
Availability, or uptime, is measured by how quickly network operations are restored after a failure, with at least "five 9s" (99.999%) availability being the carrier standard. The motivation for implementing automatic protection switching (APS) is that even with the most reliable circuit, an outage -- even of short duration -- continues to be painful.
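To put the five-9s figure in perspective, the arithmetic can be sketched as follows. The function names are illustrative only, and the MTBF/MTTR formula is the usual steady-state definition of availability rather than anything taken from the APS standard.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(availability: float) -> float:
    """Yearly downtime permitted by an availability target given as a fraction (0..1)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and mean time to restore."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Compare three-, four- and five-9s targets.
    for target in (0.999, 0.9999, 0.99999):
        print(f"{target:.3%} -> {allowed_downtime_minutes(target):.2f} min/year")
```

Five 9s leaves a little over five minutes of outage per year. Because the restore time sits in the denominator, shortening each outage raises availability even when failures are no less frequent, which is the case for automatic switching.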
Backing up critical circuits has always been a key network design consideration, and automatic fail-over to alternative facilities has been available for decades. Leased-line analog circuits (1.2-19.2 kbits/sec) were backed up with dial-up analog circuits. Digital circuits (2.4-64 kbits/sec) were initially backed up by switched analog circuits, then by switched 4-wire and 2-wire digital service. With services like T1, using the idle leased line as a hot standby became more prevalent.
While protection methods like these met basic customer needs, they had an enormous downside -- each was proprietary to a single equipment provider. Heterogeneous environments were not an option. Changing vendors meant "forklift changes" not just in hardware but also in operations. This situation still exists today with technologies like T1/T3 and E1/E3, which have neither the open standards nor the uniform implementations of protection switching necessary to achieve cross-vendor interoperability.
Fortunately things are beginning to improve, thanks to newer physical layer WAN technologies implemented with SONET/SDH that, unlike T1/E1, have specific standards for APS.
Defining APS -- facility or equipment protection?
The term APS is often mistakenly used to describe two different kinds of protection -- that of equipment and that of the transmission facility. The methods used to achieve equipment and facility protection are different, and only facility protection is defined in the SONET APS standard.
Equipment protection switching accommodates potential hardware failures, while the transport facility (fiber/coax/copper) itself is still functional. Should the equipment fail, alternative hardware is substituted. Usually, protection ports are on different boards from the protected ports to avoid cascading failure and enable nondisruptive hardware repairs.
The SONET APS standard defines facility protection switching, which deals with transport link failure. Should the transport medium become compromised, a mechanism is put in place to supply an alternative physical path. Redundant facilities are provisioned that may be switched into the original port or a new port.
There are four standards that define APS for SONET/SDH transport. The general set for SONET is described in document T1.105; GR-253 addresses linear APS, while GR-1400 and GR-1230 specify unidirectional path-switched ring (UPSR) and bidirectional line-switched ring (BLSR), respectively. The SDH equivalents are outlined in ITU-T documents G.783, Q.784, G.826, and G.774.
SONET APS can be configured in linear (point-to-point) or ring network architectures, depending on the different needs of the application, traffic and incumbent equipment.
Linear APS at work
1+1 linear APS provides two redundant fiber links, each carrying identical traffic, with receivers at each end monitoring the bit streams and choosing the "best" link. It is costly because two receivers are required at each end point, twice as much fiber is needed, and no additional capacity is gained. Although 1+1 is the most expensive technique, it also offers the fastest recovery, often without any data loss.
1:1 works slightly differently and is less costly. Although 1:1 also requires a backup or "protection" fiber for each primary or "working" fiber, the protection fiber remains idle or carries low-priority traffic when not switched in.
1:n provides a single backup fiber for up to 14 primary fibers. The secondary fiber can carry low-priority traffic when not in use as a backup. This method is much less expensive than the other linear APS alternatives, because one secondary fiber provides coverage for multiple primary fibers.
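The sharing arrangement in 1:n can be illustrated with a small model. This is a sketch under stated assumptions: the class name and the arbitration rule (the first failed channel keeps the protection fiber; when it frees up, the lowest-numbered failed channel takes it) are hypothetical and are not the priority scheme of any standard or product.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

MAX_WORKING = 14  # upper limit for 1:n quoted in the text


@dataclass
class OneForNGroup:
    """One protection fiber shared by n working fibers (illustrative model)."""
    n: int
    failed: Set[int] = field(default_factory=set)
    protected: Optional[int] = None  # working channel now riding the protection fiber

    def __post_init__(self) -> None:
        if not 1 <= self.n <= MAX_WORKING:
            raise ValueError(f"n must be between 1 and {MAX_WORKING}")

    def fail(self, channel: int) -> None:
        self._check(channel)
        self.failed.add(channel)
        self._arbitrate()

    def repair(self, channel: int) -> None:
        self._check(channel)
        self.failed.discard(channel)
        if self.protected == channel:
            self.protected = None  # release the protection fiber
        self._arbitrate()

    @property
    def carrying_low_priority(self) -> bool:
        """The protection fiber is free for low-priority traffic when nothing is switched in."""
        return self.protected is None

    def _check(self, channel: int) -> None:
        if not 1 <= channel <= self.n:
            raise ValueError(f"channel must be between 1 and {self.n}")

    def _arbitrate(self) -> None:
        # Assumed rule: never preempt; hand a free protection fiber
        # to the lowest-numbered channel still in failure.
        if self.protected is None and self.failed:
            self.protected = min(self.failed)
```

The model makes the cost trade-off visible: a second simultaneous failure stays unprotected until the first is repaired, which is the price of sharing one backup among many working fibers.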
All linear APS solutions share the drawback of asymmetric delay. Additional buffering at the nodes needed to overcome this problem raises equipment costs. But linear APS is simple to install and provides adequate point-to-point availability.
Ring APS at work
There are two main types of ring APS: UPSR and BLSR. UPSR is the simpler of the two, with dual counter-rotating fiber links, each of which carries identical traffic. The receiving node monitors the two fibers and selects the better of the two signals based on criteria such as bit error rate and Alarm Indication Signal (AIS).
UPSR has several advantages: the receiving node makes all decisions with no interaction with either local or remote transmitters, no communications channel is needed, and service is virtually uninterrupted. The downsides are the need for redundant transceivers and the introduction of asymmetric delay.
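Receiver-side selection of this kind can be sketched in a few lines. The type and function names are hypothetical, and the hysteresis factor is an assumption added to keep the selector from toggling on marginal differences; it is not drawn from the standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathStatus:
    name: str
    ais: bool    # Alarm Indication Signal currently detected
    ber: float   # measured bit error rate


def select_path(current: PathStatus, other: PathStatus,
                hysteresis: float = 10.0) -> PathStatus:
    """Pick the signal to use, looking only at what the receiver itself observes."""
    if current.ais and not other.ais:
        return other        # current path is unusable, the other is not
    if other.ais:
        return current      # other path is unusable (or both are): stay put
    # Neither path shows AIS: move only if the other is clearly better.
    if current.ber > other.ber * hysteresis:
        return other
    return current
```

Note that nothing in the function needs information from the transmitter or from a signaling channel, which is why this style of selection needs no coordination protocol.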
BLSR is frequently used in core network applications and is more complicated, using line switching to redirect traffic to the protection fiber in the event of failure. BLSR uses the K1/K2 bytes along with other local indications to raise a flag to switch. Once the flag is raised, an independent "controller" communicates via the K1/K2 bytes with the local backup facility (through the backup SONET/SDH transceiver) and then contacts a far-end transceiver to prepare for transfer of traffic from the failed working facility to the protection facility.
What happens after the switch is synchronized and executed depends on whether the configuration is revertive or non-revertive. If revertive, traffic is automatically switched back to the original working facility once it is recognized as operationally sound. In non-revertive scenarios, traffic remains on the protection fiber until it's manually switched back.
Switches are generally configured as non-revertive. Bad lines often appear to be fixed for short time periods while the provider is troubleshooting and repairing line problems. Automatically switching back and forth repeatedly would disrupt service unnecessarily.
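The difference between the two modes, and the flapping problem that favors non-revertive operation, can be shown with a minimal selector. Everything here is illustrative: the class and method names are hypothetical, and the hold-off timer (with its arbitrary 300-second default) is an assumption added to show how a revertive switch might avoid chasing a line that only looks repaired.

```python
from enum import Enum
from typing import Optional


class Facility(Enum):
    WORKING = "working"
    PROTECTION = "protection"


class ApsSelector:
    def __init__(self, revertive: bool, wait_to_restore_s: float = 300.0) -> None:
        self.revertive = revertive
        self.wait_to_restore_s = wait_to_restore_s  # assumed hold-off before reverting
        self.active = Facility.WORKING
        self._working_ok = True
        self._healthy_since: Optional[float] = None

    def update(self, now_s: float, working_ok: bool) -> Facility:
        """Feed in the latest health of the working facility; returns the active facility."""
        self._working_ok = working_ok
        if not working_ok:
            self._healthy_since = None          # any relapse restarts the hold-off
            self.active = Facility.PROTECTION
            return self.active
        if self._healthy_since is None:
            self._healthy_since = now_s
        stable_for = now_s - self._healthy_since
        if (self.active is Facility.PROTECTION and self.revertive
                and stable_for >= self.wait_to_restore_s):
            self.active = Facility.WORKING      # automatic switch back
        return self.active

    def manual_switch_back(self) -> bool:
        """Operator action for non-revertive mode; refused while the working line is down."""
        if not self._working_ok:
            return False
        self.active = Facility.WORKING
        return True
```

In revertive mode a line that recovers briefly and fails again never triggers a switch back, because each relapse resets the hold-off. In non-revertive mode traffic stays on protection until an operator calls `manual_switch_back()`.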
All APS implementations must meet certain performance thresholds to be standards-compliant. Failure detection and switchover together must complete within a maximum restoral time of 60 msec: 10 msec to detect and 50 msec to switch.
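A test harness might check measured switch times against this budget along the following lines. This is a sketch, not a conformance test: the names are hypothetical, the three timestamps are assumed to come from the test equipment, and checking each phase against its own limit (rather than only the 60-msec total) is an assumption of this example.

```python
from dataclasses import dataclass

DETECT_BUDGET_MS = 10.0
SWITCH_BUDGET_MS = 50.0
RESTORAL_BUDGET_MS = DETECT_BUDGET_MS + SWITCH_BUDGET_MS  # 60 msec in total


@dataclass(frozen=True)
class RestoralReport:
    detect_ms: float   # failure onset to detection
    switch_ms: float   # detection to traffic restored on protection

    @property
    def total_ms(self) -> float:
        return self.detect_ms + self.switch_ms

    @property
    def compliant(self) -> bool:
        # Each phase is held to its own share of the budget.
        return (self.detect_ms <= DETECT_BUDGET_MS
                and self.switch_ms <= SWITCH_BUDGET_MS)


def measure(failure_at_ms: float, detected_at_ms: float,
            switched_at_ms: float) -> RestoralReport:
    """Build a report from three timestamps taken on a common clock."""
    if not failure_at_ms <= detected_at_ms <= switched_at_ms:
        raise ValueError("timestamps must be in chronological order")
    return RestoralReport(detect_ms=detected_at_ms - failure_at_ms,
                          switch_ms=switched_at_ms - detected_at_ms)
```

Under the per-phase assumption, a switch that detects slowly but completes quickly would still be flagged, even though its total falls inside 60 msec.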
There are several different criteria for making the switching decision, including:
- AIS
- Loss of pointer
- Unequipped (indicated in the C2 byte)
- Remote defect indication
- Bit-error ratio (severely errored seconds/errored seconds).
Where these indications are used depends on the type of APS. Line-level indicators are used for linear models, while both line- and path-level indicators are used for ring configurations. The K1/K2 bytes are used only when the backup facility is shared or when communication with the far end is required, typically in bidirectional and 1:n APS.
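For readers who have not looked inside the K1/K2 bytes, the following sketch unpacks them into their fields. The field layout and code values reflect the commonly documented linear APS encoding as best recalled here and should be treated as an assumption to verify against GR-253 or the ITU-T equivalent; the function and type names are hypothetical.

```python
from typing import NamedTuple

# Upper four bits of K1: type of request (assumed values, verify against the standard).
K1_REQUESTS = {
    0b1111: "lockout of protection",
    0b1110: "forced switch",
    0b1101: "signal fail (high priority)",
    0b1100: "signal fail (low priority)",
    0b1011: "signal degrade (high priority)",
    0b1010: "signal degrade (low priority)",
    0b1000: "manual switch",
    0b0110: "wait to restore",
    0b0100: "exercise",
    0b0010: "reverse request",
    0b0001: "do not revert",
    0b0000: "no request",
}

# Lower three bits of K2: operating mode or line-level alarm (assumed values).
K2_STATUS = {
    0b111: "AIS-L",
    0b110: "RDI-L",
    0b101: "bidirectional",
    0b100: "unidirectional",
}


class ApsBytes(NamedTuple):
    request: str
    requested_channel: int   # channel the request applies to
    bridged_channel: int     # channel currently bridged onto protection
    architecture: str        # "1+1" or "1:n"
    status: str


def decode_k1_k2(k1: int, k2: int) -> ApsBytes:
    if not (0 <= k1 <= 0xFF and 0 <= k2 <= 0xFF):
        raise ValueError("K1 and K2 are single bytes")
    return ApsBytes(
        request=K1_REQUESTS.get(k1 >> 4, "reserved"),
        requested_channel=k1 & 0x0F,
        bridged_channel=k2 >> 4,
        architecture="1:n" if k2 & 0x08 else "1+1",
        status=K2_STATUS.get(k2 & 0x07, "reserved"),
    )
```

With a four-bit channel field, and values at either end of the range set aside for special use, the 14-fiber ceiling on 1:n follows naturally. It also hints at where implementations can diverge: two vendors that treat a reserved code or a transient byte value differently will disagree about when to switch.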
Today's struggle for interoperability
In looking at the implementation considerations associated with SONET APS, it's clear that, despite standardization efforts, providing interoperable protection switching is still not simple. Providers and equipment vendors acknowledge this complexity and are eager to find a solution. Yet vendors, currently using custom implementations, are fighting a losing battle to provide reliable protection switching that is fully interoperable across vendor platforms.
The root of the problem lies in the software. The standards themselves are well defined, but each vendor's APS implementation interprets them in a slightly different way, and those small differences are enough to prevent full interoperability. In addition, WAN transport expertise may be outside a vendor's area of core competency, so the critical functions necessary to build stable APS source code implementations may not be well understood by developers who are not SONET experts.
Then consider that there are several critical areas that are extremely difficult to coordinate in cross-platform environments -- including timing, mapping, failure notification, alarm condition handling, and predictable performance -- and you're at the heart of the problem.
So what are the options? Well, carriers can continue to single-source equipment and avoid heterogeneous implementations. Or vendors and providers can hope that standards bodies or, more likely, a vendor consortium will define interoperability conformance details.
There is another alternative. Board manufacturers and software vendors are developing and delivering standards-compliant, interoperable APS implementations. Whether vendors choose embedded board-level firmware or source-code software to ease their interoperability woes, there is no longer any need to be held captive by single-sourcing, and more alternatives are becoming available every day.
John Brandte is vice president of marketing and business development at Ncomm (Salem, NH), a provider of APS and other telecom wide area networking source code, software and custom consulting. His experience in the communications industry spans engineering, standards development, business planning, and analysis.