The role of laser packaging in advancing AI technologies

Aug. 8, 2024
Optical I/O and co-packaged optics (CPO)-based solutions show great promise.

Vishal Chandrasekar / Ayar Labs

Recent growth in artificial intelligence (AI) deployments and applications underscores the need for large-bandwidth, highly efficient data transfer solutions. The thousands of GPUs deployed for AI training and inference are increasingly performance-bottlenecked by the traditional interconnect solutions used in data center networking. These approaches increase power consumption and costs and highlight the urgent need for innovative solutions.

Optical I/O and co-packaged optics (CPO)-based solutions are among the most promising advancements. They aid the shift towards large, interconnected AI clusters by optimizing data throughput and system performance. These solutions require innovative, highly advanced laser packaging strategies, which boost the performance, scalability, and reliability of systems that are foundational to the ongoing deployment of profitable AI infrastructure.

Laser packaging in silicon photonics

Silicon photonics has revolutionized data movement within and between logic and memory chips by using light instead of electrical signals. This has become increasingly important to deployments targeting generative AI. Traditional copper-based solutions increasingly constrain data flows across these systems, limiting the scale and data rates at which computing devices and memory capacity can be clustered.

Pluggable optics have been the most common method of converting electrical signals into optical signals (and vice versa) to address these shortcomings. CPO has successfully moved the I/O module away from the faceplate by integrating the module components into a single package alongside the compute or switch chip. Co-packaging brings the functions of the pluggable transceiver right next to the application-specific integrated circuit (ASIC), which reduces signal loss over copper links at high bandwidths.

Optical I/O presents a more integrated, energy-efficient solution for distributed computing systems, such as AI clusters that require high bandwidth density, low energy, and low interconnect latency. This is achieved by having a single electro-optical chiplet, packaged with the compute ASIC, perform the transmit, receive, and data conversion functions of a transceiver built with discrete modules.

CPO and optical I/O solutions can use integrated or remote light sources to provide optical input to their co-packaged modules or chiplets. Let’s examine the two options and explore their benefits and tradeoffs.

Integrated light sources

Integrated light sources describe approaches where the light source is co-located with the CPO module or optical I/O chiplet, close to the GPU or other compute ASIC. The lasers are either manufactured individually and co-packaged with the photonic integrated circuits (PICs) or monolithically fabricated with the PICs. Remote light sources, also known as disaggregated or external lasers, are by contrast independently packaged and physically separated from the CPO modules, optical I/O chiplets, and ASICs.

Modern AI systems rely on GPU or ASIC dies with very high power consumption, which drives up temperatures in their immediate surroundings. Integrated light sources are exposed to this heat because of their physical proximity, while systems with remote light sources can be engineered to give the lasers a more benign thermal environment.

Lasers, particularly at the high output powers necessary for advanced data rates, are the components most likely to fail within an optical connectivity solution when subject to high temperatures, potentially taking down the entire link. Remote light sources have the advantage of less demanding thermal environments, extending their operational lifespan and dramatically reducing failure rates and system downtime. 
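To make the temperature argument concrete, the sketch below uses the Arrhenius model, a standard way of estimating how wear-out failure rates scale with temperature. It is an illustration only: the activation energy of 0.7 eV and the two operating temperatures are hypothetical values chosen for the example, not figures from this article or from any specific laser datasheet.

```python
import math

# Boltzmann constant in eV/K
BOLTZMANN_EV_PER_K = 8.617333262e-5


def arrhenius_acceleration(t_cool_c: float, t_hot_c: float,
                           activation_energy_ev: float = 0.7) -> float:
    """Ratio of failure rates at t_hot_c versus t_cool_c (Arrhenius model).

    activation_energy_ev is an assumed, illustrative value; real devices
    need a value derived from the vendor's reliability data.
    """
    t_cool_k = t_cool_c + 273.15
    t_hot_k = t_hot_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_cool_k - 1.0 / t_hot_k)
    return math.exp(exponent)


if __name__ == "__main__":
    # Hypothetical case: a remote laser at 50 C versus an integrated laser at 85 C.
    factor = arrhenius_acceleration(t_cool_c=50.0, t_hot_c=85.0)
    print(f"Estimated failure-rate ratio (hot vs. cool): {factor:.1f}x")
```

Because the temperature term sits inside an exponential, even a modest reduction in laser operating temperature can translate into a large change in the expected failure rate, which is the core of the case for moving the light source away from the hot ASIC package.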

Furthermore, remote lasers can easily be removed, serviced, or replaced without interfering with other system components like the co-packaged GPU and CPO or optical I/O chiplets. Integrated light sources may not be serviceable at all, or may require significant modifications to the expensive ASIC package, increasing costs and system downtime.

The industry has established the External Laser Small Form-Factor Pluggable (ELSFP) specification, recognizing the importance of external lasers. This common form factor leverages the serviceability, replaceability, and ease-of-deployment benefits of pluggable modules and the cost, latency, and channel loss advantages of CPO solutions while uniting the ecosystem of suppliers and customers around a single form factor.

Factors to consider

From a cost and reliability perspective, the light source is often the most sensitive component of an optical connectivity solution. Designers and architects should prioritize a diversity of suppliers and standardized wavelength grids, like the O-band LR4 grid used for over two decades. This approach lowers design and supply risk and establishes the attractive high-volume cost structure essential for successfully deploying optical I/O.

The Continuous-Wave Wavelength Division Multiplexing Multi-Source Agreement (CW-WDM MSA) has brought together a broad spectrum of industry stakeholders, including laser suppliers, transceiver manufacturers, CPO and optical I/O connectivity suppliers, and others, to promote interoperability across solutions and reduce dependency on any single supplier or technology. Such standardization efforts are critical when enabling solutions for AI, HPC, and other high-value, large-volume applications.

Remote light sources' role

The advancement of AI technologies and the exponential growth of large language models (LLMs) demand new data transfer solutions between compute and memory elements to keep pace with the exponential increase in model sizes and token counts. Remote light sources are critical enabling technologies for optical I/O solutions that solve these bottlenecks.

Traditional networking systems rely heavily on switches for connectivity within large systems, which introduce latency and limit the size of high-bandwidth domains. The multi-wavelength, multi-port capabilities typically found in optical I/O solutions make it possible to offer direct, low-latency, high-bandwidth connections between several devices. This approach simplifies the system architecture by eliminating switches and enhances the speed and scalability of data exchange across the network, thus increasing the size of the high-bandwidth domain. 
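As a rough illustration of how multi-wavelength, multi-port links add up, the sketch below multiplies ports, wavelengths per port, and data rate per wavelength, and then splits the ports evenly across directly attached peers. The function names and the example figures (8 ports, 8 wavelengths, 32 Gbps per wavelength, 4 peers) are hypothetical and are not taken from this article or from any product specification.

```python
def aggregate_bandwidth_gbps(ports: int, wavelengths_per_port: int,
                             gbps_per_wavelength: float) -> float:
    """Total one-direction bandwidth: ports x wavelengths x per-wavelength rate."""
    return ports * wavelengths_per_port * gbps_per_wavelength


def per_peer_bandwidth_gbps(ports: int, wavelengths_per_port: int,
                            gbps_per_wavelength: float, peers: int) -> float:
    """Bandwidth to each directly attached peer when ports are split evenly."""
    if peers <= 0 or ports % peers != 0:
        raise ValueError("ports must divide evenly across a positive number of peers")
    return (ports // peers) * wavelengths_per_port * gbps_per_wavelength


if __name__ == "__main__":
    # Hypothetical chiplet: 8 ports, 8 wavelengths per port, 32 Gbps per wavelength.
    total = aggregate_bandwidth_gbps(8, 8, 32)
    per_peer = per_peer_bandwidth_gbps(8, 8, 32, peers=4)
    print(f"Aggregate: {total} Gbps, per peer across 4 direct links: {per_peer} Gbps")
```

The point of the exercise is that port count sets how many devices can be reached directly without a switch, while wavelength count and per-wavelength rate set how much bandwidth each of those direct links carries.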

One of the biggest challenges in AI architectures is the “memory wall”: the rapidly increasing memory-to-compute ratio leaves operational efficiency bottlenecked by the amount of High-Bandwidth Memory (HBM) that can be packaged alongside processor chips. As part of an optical I/O solution, remote light sources relieve this bottleneck by enabling disaggregated memory clusters connected to GPUs via ultra-low-latency, high-bandwidth links.

Laser packaging techniques are now a critical building block toward overcoming the bottlenecks holding back AI’s potential. Remote light sources are essential in delivering the efficiency, scalability, and performance required for next-generation AI. They will be the workhorses of an infrastructure set to handle the growing demands of LLMs and new advanced computing paradigms. As AI continues to push the boundaries of what’s technologically feasible, adopting optical I/O solutions will be instrumental in enabling future AI applications.
