CXL Gains Momentum at FMS 2024


The CXL Consortium has been a regular at FMS (which changed its name from "Flash Memory Summit" to "Future of Memory and Storage" this year). At FMS 2022, the consortium announced version 3.0 of the CXL specification, and CXL 3.1 followed at Supercomputing 2023. Initially a standard for host-to-device connectivity, CXL gradually absorbed competing standards such as OpenCAPI and Gen-Z. As a result, the specification has grown to cover a wide range of use cases, all built on top of the ubiquitous PCIe expansion bus. The CXL Consortium counts giants like AMD and Intel among its members, along with a large number of startups trying to play in different segments on the device side. At FMS 2024, CXL was prominently displayed in multiple vendor booths.

The migration of server platforms from DDR4 to DDR5, along with the rise of workloads that required large amounts of RAM (but were not particularly sensitive to memory bandwidth or latency), opened up memory expansion modules as one of the first sets of widely available CXL devices. Over the past few years, we’ve seen product announcements from Samsung and Micron in this area.

SK hynix CMM-DDR5 CXL memory module and HMSDK

At FMS 2024, SK hynix showcased its 128GB DDR5-based CMM-DDR5 CXL memory module. The company also detailed its related Heterogeneous Memory Software Development Kit (HMSDK), a set of libraries and tools at both the kernel and user levels that aim to improve the ease of use of CXL memory. This is achieved in part by taking into account the memory pyramid/hierarchy and moving data between the server’s main memory (DRAM) and the CXL device based on usage frequency.
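HMSDK handles this placement automatically, but the general idea can be sketched with plain libnuma calls, since Linux exposes a CXL memory expander as a CPU-less NUMA node. This is a minimal illustration of tier-aware placement under stated assumptions, not HMSDK's actual API; the node number below is a hypothetical stand-in for whatever node the CXL device enumerates as on a given system.

```c
/* Minimal sketch: keep hot data in local DRAM while pinning a large,
 * rarely accessed buffer to a CXL-attached NUMA node. Assumes Linux
 * with libnuma (build with -lnuma); node 2 is hypothetical - check
 * `numactl --hardware` on the target system. */
#include <numa.h>
#include <stdio.h>
#include <string.h>

#define CXL_NODE 2  /* hypothetical CXL expander node number */

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not available on this system\n");
        return 1;
    }

    size_t hot_sz  = 64UL << 20;  /* 64 MiB of frequently accessed data */
    size_t cold_sz = 4UL  << 30;  /* 4 GiB of rarely accessed data      */

    /* Hot data: let it land on the local DRAM node of the calling thread. */
    void *hot = numa_alloc_local(hot_sz);

    /* Cold data: place it on the CXL expander node to free up local DRAM. */
    void *cold = numa_alloc_onnode(cold_sz, CXL_NODE);

    if (!hot || !cold) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    memset(hot, 0, hot_sz);   /* touch pages so they are actually placed */
    memset(cold, 0, cold_sz);

    numa_free(hot, hot_sz);
    numa_free(cold, cold_sz);
    return 0;
}
```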

The CMM-DDR5 CXL memory module comes in the SDFF (E3.S 2T) form factor with a PCIe 5.0 x8 host interface. The internal memory is based on 1α DRAM technology, and the device promises DDR5-class bandwidth and latency within a single NUMA hop. Since these memory modules are meant for use in data centers and enterprises, the firmware includes RAS (reliability, availability, and serviceability) features along with secure boot and other management capabilities.
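The "single NUMA hop" claim is something an operator can check directly, since the expander shows up in the ACPI SRAT/SLIT tables that Linux exposes through libnuma. The sketch below (again assuming Linux with libnuma) prints the distance from CPU node 0 to every node; a CXL node typically reports a value somewhat above the local distance of 10.

```c
/* Print SLIT distances from node 0 to all nodes; a CXL memory expander
 * appears as a CPU-less node whose distance reflects the extra hop.
 * Build with: cc distances.c -lnuma */
#include <numa.h>
#include <stdio.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not available\n");
        return 1;
    }
    int max = numa_max_node();
    for (int node = 0; node <= max; node++) {
        /* numa_distance() returns the ACPI SLIT value; 10 means local. */
        printf("node 0 -> node %d: distance %d\n",
               node, numa_distance(0, node));
    }
    return 0;
}
```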

SK hynix also demonstrated Niagara 2.0, a hardware solution (currently FPGA-based) that enables memory pooling and sharing: multiple CXL memory devices are combined so that different hosts (CPUs and GPUs) can share their capacity as needed. The previous version allowed only capacity sharing, but the latest version also allows data sharing. SK hynix had already shown these solutions at CXL DevCon 2024 earlier this year, but some progress appears to have been made in finalizing the CMM-DDR5 specifications in the run-up to FMS 2024.

Microchip and Micron demonstrate CZ120 CXL memory expansion module

Micron introduced the CZ120 CXL memory expansion module last year, based on Microchip's SMC 2000 series CXL memory controller. At FMS 2024, Micron and Microchip demonstrated the module in a Granite Rapids server.

Microchip also shared additional details about the SMC 2000 controller.

The controller includes handling for DRAM die failures, and Microchip provides diagnostic and debugging tools for analyzing failed modules. ECC support is also present as part of the SMC 2000 series' enterprise-class RAS feature set. The controller's flexibility allows SMC 2000-based CXL memory modules populated with DDR4 to complement the DDR5 main memory in servers that natively support only the latter.

Marvell Announces Structera CXL Product Line

A few days before FMS 2024, Marvell announced a new CXL product line under the Structera brand. During FMS 2024, we had the opportunity to discuss the new line with Marvell and gather additional details.

Unlike other CXL device solutions that focus on memory pooling and expansion, the Structera product line includes a compute-accelerator part in addition to memory-expansion controllers. All of the parts are built on TSMC's 5nm process technology.

The compute-accelerator part, the Structera A 2504 (A for Accelerator), is a PCIe 5.0 x16 CXL 2.0 device with 16 integrated Arm Neoverse V2 (Demeter) cores running at 3.2 GHz. It features four DDR5-6400 channels with support for up to two DIMMs per channel, along with in-line compression and decompression. The integration of high-performance server-class Arm cores means the part scales the available memory bandwidth per core while also scaling compute capabilities.

Applications such as deep-learning recommendation models (DLRM) can take advantage of the compute capabilities available on the CXL device. Scaling up bandwidth availability in this way also brings down workload power consumption. The approach additionally contributes to disaggregation within the server, which allows for better thermal design as a whole.
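Marvell has not published the programming model for the on-device cores, but the shape of the work is easy to see: DLRM inference spends much of its time on memory-bound embedding-table gathers. The sketch below shows that access pattern in plain C; on a Structera A-style device, this is the sort of loop that could run on the integrated Neoverse V2 cores next to the expanded memory instead of pulling every row across the CXL link to the host. All names and sizes here are illustrative, not a vendor API.

```c
/* Illustrative embedding gather: for each lookup index, fetch a row
 * from a large table and accumulate it. The loop is bandwidth-bound,
 * which is why running it near the memory (rather than on the host)
 * pays off. EMBED_DIM and the function name are illustrative. */
#include <stddef.h>
#include <stdint.h>

#define EMBED_DIM 128  /* illustrative embedding width */

/* Sum the embedding rows selected by `indices` into `out`. */
void embedding_gather(const float *table,      /* rows * EMBED_DIM floats */
                      const uint32_t *indices, /* lookup indices          */
                      size_t n_lookups,
                      float *out)              /* EMBED_DIM floats        */
{
    for (size_t d = 0; d < EMBED_DIM; d++)
        out[d] = 0.0f;

    for (size_t i = 0; i < n_lookups; i++) {
        /* Each iteration touches a different, essentially random row,
         * so performance is dominated by memory bandwidth and latency. */
        const float *row = table + (size_t)indices[i] * EMBED_DIM;
        for (size_t d = 0; d < EMBED_DIM; d++)
            out[d] += row[d];
    }
}
```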

The Structera X 2404 (X for eXpander) will be available as a PCIe 5.0 device (single x16 or dual x8) with four DDR4-3200 channels (up to three DIMMs per channel). Features such as in-line (de)compression, encryption/decryption, and hardware-assisted secure boot are also present in the Structera X 2404. Compared to the 100W TDP of the Structera A 2504, Marvell expects this part to consume around 30W. The main goal of this part is to allow hyperscalers to recycle DDR4 DIMMs (up to 6TB per expander) while increasing server memory capacity.

Marvell also has a Structera X 2504 part that supports four channels of DDR5-6400 (two DIMMs per channel, up to 4TB per expander). Other aspects remain the same as in the DDR4 recycling part.

The company highlighted some unique aspects of the Structera product line: in-line compression optimizes available DRAM capacity, and support for three DIMMs per channel on the DDR4 expander maximizes the amount of DRAM per expander compared to competing solutions. The 5nm process keeps power consumption down, and the parts support multi-host access. The integration of Arm Neoverse V2 cores appears to be a first for a CXL accelerator, enabling computation to be delegated to the device to improve overall system performance.

While Marvell has announced the specifications of the Structera parts, sampling appears to be at least a few quarters away. One of the interesting aspects of Marvell's roadmaps and announcements in recent years has been its focus on products tailored to the requirements of high-volume customers. The Structera product line is no different: hyperscalers eager to recycle their DDR4 memory modules are clearly itching to get their hands on the expander parts.

CXL is still in the early, slow stage of its growth, and the hockey-stick segment of the curve is definitely not in the near future. However, as more CXL-enabled host systems get deployed, products like the Structera line of accelerators start to make sense from a server performance perspective.
