
How Will CXL Affect Infrastructure Spending?


Compute Express Link (CXL) is an architectural response to the insatiable demand of big data applications for more compute, memory, bandwidth, and storage, and to CIOs’ perpetual need to improve operational and financial efficiency. CXL is a cache-coherent interconnect for a server’s processors, memory, and accelerators that implements industry-standard protocols to remove the complexity associated with vertical scaling or with offloading processing to GPUs and other accelerators. CXL fabrics use port-based routing to lower latencies and can scale up to 4,096 nodes. Figure 1 shows that CXL is a remote direct memory access (RDMA) variant that can give processors and accelerators access to data, or move data, without CPU core involvement. Embedding CXL functionality into silicon reduces fabric latency and overheads and makes it practical to create global fabric-attached memory (GFAM) nodes, or memory appliances. These GFAM nodes make it easier to justify investing in costly DRAM because, with CXL memory sharing, they achieve higher utilization rates than direct-attached server memory; that is, their cost is amortized over a greater set of applications and use cases. Figure 2 shows the four memory system topologies created by CXL.


Figure 1: CXL Protocols

 


Figure 2: CXL Memory System Topologies

CXL has solved the “Last Mile Problem” by enabling infrastructure architects to disaggregate memory from servers and manage memory as resource pools in a composable infrastructure (CI), and by enabling AIOps to right-size the infrastructure to the problem. This horizontal scaling and resource sharing improves agility and security by eliminating the delays and risks associated with breaking large applications into small segments that fit within on-premises server limitations, or with moving large applications to the cloud. Storing data in CXL memory and GFAM nodes further improves throughput by giving CPUs, GPUs, and other accelerators more bandwidth and access to data without physically moving the data.

From a physical infrastructure perspective, CXL requires servers that support PCIe 5.0 or higher. From an operating perspective, obtaining the full operational and economic benefits of a CXL-interconnected, disaggregated infrastructure will require managing the infrastructure as a mostly automated composable infrastructure (CI).
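For readers who want to check whether a given Linux host already exposes CXL devices, recent kernels with the CXL subsystem enabled publish enumerated devices under /sys/bus/cxl. The quick, read-only check below assumes such a kernel and suitable hardware; the path and its contents are kernel-dependent and offered only as an illustration.

```python
from pathlib import Path

# Recent Linux kernels with the CXL subsystem enabled expose enumerated
# devices (memdevs, ports, decoders) under /sys/bus/cxl. This is a quick,
# read-only check; availability depends on kernel version and hardware.
CXL_SYSFS = Path("/sys/bus/cxl/devices")

if CXL_SYSFS.is_dir():
    devices = sorted(p.name for p in CXL_SYSFS.iterdir())
    print(f"{len(devices)} CXL device object(s) found: {', '.join(devices)}")
else:
    print("No CXL sysfs bus found; kernel support or CXL hardware is missing.")
```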

Operational Benefits

  • Dynamically sizing the infrastructure to the application is faster, safer, and more cost-effective than partitioning large workloads into segments.
  • More memory increases the performance and scale of many workloads; AI applications, streaming, low-latency compute, sorts, infrastructure storage server workloads, and in-memory databases (IMDBs) are obvious high-impact examples.
    o More memory makes sorts faster by increasing the number of items that can be sorted in memory, which in turn reduces the number of passes through the data needed to finish the sort (see the sketch after this list).
    o Dynamically adding cache capacity to storage servers only when they are supporting workloads with high random-read content improves performance without over-configuring the servers.
    o IMDB usage is constrained by cost and capacity considerations, both of which are addressed by CXL-attached memory.
  • Simplify operations by:
    o Facilitating server configuration standardization.
    o Shrinking the number of servers under management. 
  • Using CXL-attached memory for page storage reduces CPU wait times.
  • Offloading the overhead of distributed architectures from CPUs to CXL silicon makes more CPU cycles available for useful work.
  • Providing cloudlike capacities to on-premises infrastructure has the potential to reduce hybrid-cloud infrastructure traffic.
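To make the sort example above concrete, the following back-of-the-envelope sketch estimates how many passes an external merge sort needs as available memory grows. The dataset size, memory sizes, and merge fan-in are illustrative assumptions, not measurements.

```python
import math

def merge_passes(dataset_gb: float, memory_gb: float, fan_in: int = 64) -> int:
    """Estimate how many passes an external merge sort makes over the data.

    Pass 0 produces sorted runs of roughly memory_gb each; every later pass
    merges up to `fan_in` runs at a time. All figures are illustrative only.
    """
    runs = math.ceil(dataset_gb / memory_gb)   # initial sorted runs
    passes = 1                                 # the run-creation pass
    while runs > 1:
        runs = math.ceil(runs / fan_in)        # one merge pass over the data
        passes += 1
    return passes

# Hypothetical 10 TB sort: more memory means fewer passes through the data.
for mem_gb in (64, 512, 16_384):
    print(f"{mem_gb:>6} GB memory -> {merge_passes(10_000, mem_gb)} pass(es)")
```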

Economic Benefits

  • Increase system utilization rates, which avoids or defers costly emergency upgrades and/or purchases.
  • Standardizing servers simplifies acquisitions, asset management, and operations.
  • Reduce lost opportunity costs by coupling big application deployments to infrastructure capabilities rather than specific server configurations.
  • Packaging DRAM and non-volatile memory (NVM) into CXL-attached appliances will:
    o Increase memory utilization rates and lower ownership costs (see the cost sketch after this list)
    o Decrease the overall spend on memory or, for the same spend, add capabilities
    o Put memory appliances on their own service-life schedules, which could be longer, more like those of networking equipment than of servers or storage
    o Provide leverage when negotiating with server and storage vendors
  • Reduce the number of servers under management
  • Reduce the need for expensive four- and eight-socket servers
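As a rough illustration of the utilization argument above, the sketch below compares the cost of per-server, stranded DRAM with a shared CXL memory pool. Every figure (server count, capacities, utilization rates, $/GB) is a hypothetical assumption chosen for illustration, not a benchmark or vendor quote.

```python
def effective_cost_per_gb(price_per_gb: float, utilization: float) -> float:
    """Cost per gigabyte of memory actually used, given average utilization."""
    return price_per_gb / utilization

PRICE_PER_GB = 10.0          # hypothetical DRAM price, $/GB
SERVERS = 100

# Direct-attached: each server is sized for its own peak, so DRAM sits idle.
direct_gb_per_server = 1024
direct_utilization = 0.40    # assumed average utilization of stranded memory

# Pooled: a shared CXL/GFAM appliance is sized closer to aggregate demand.
pooled_gb_total = SERVERS * 512
pooled_utilization = 0.75    # assumed utilization of the shared pool

direct_spend = SERVERS * direct_gb_per_server * PRICE_PER_GB
pooled_spend = pooled_gb_total * PRICE_PER_GB

print(f"Direct-attached: ${direct_spend:,.0f} spend, "
      f"${effective_cost_per_gb(PRICE_PER_GB, direct_utilization):.2f}/GB used")
print(f"CXL memory pool: ${pooled_spend:,.0f} spend, "
      f"${effective_cost_per_gb(PRICE_PER_GB, pooled_utilization):.2f}/GB used")
```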

Infrastructure Implications

Figure 3 shows CXL introducing two new tiers into the memory hierarchy: CXL memory and disaggregated memory. Optane is not included in Figure 3 because Intel withdrew it from active marketing in September 2022, and Micron did so a year earlier. Replacing Optane with CXL memory eliminates DIMM slot contention and complexity, and makes it possible to increase the memory capacity of smaller servers, especially those supporting memory-constrained applications.

Figure 3: The Memory Hierarchy with CXL Memory and Disaggregated Memory

When datasets and storage requirements are too large for server memory but too small for flash arrays, disaggregated memory packaged into memory appliances (a.k.a. GFAM) provides a very fast alternative to all-flash arrays (AFAs). With latencies of 2-4 microseconds versus the 100-200 microsecond minimums typical of AFAs, CXL memory can serve short-term, small storage requirements that previously called for a high-speed NVMe array. Disaggregated memory’s advantage accrues primarily from the lower latency of DRAM versus NAND flash, and from CXL being an RDMA-style protocol. Readers interested in the technical details of CXL should use the links provided in the recommended reading section.
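Using the latency figures quoted above (2-4 microseconds for CXL-attached memory versus 100-200 microseconds for an AFA), a quick calculation shows how much latency alone raises the ceiling on small, synchronous I/O operations. The midpoint values below are assumptions chosen for illustration.

```python
def max_ops_per_queue(latency_us: float) -> float:
    """Upper bound on synchronous (queue depth 1) operations per second
    when each operation costs `latency_us` microseconds end to end."""
    return 1_000_000 / latency_us

# Latency ranges from the article; midpoints used for the comparison.
cxl_latency_us = 3       # 2-4 us for CXL-attached / disaggregated memory
afa_latency_us = 150     # 100-200 us typical of all-flash arrays

print(f"CXL memory : ~{max_ops_per_queue(cxl_latency_us):,.0f} ops/s per queue")
print(f"All-flash  : ~{max_ops_per_queue(afa_latency_us):,.0f} ops/s per queue")
print(f"Latency advantage: ~{afa_latency_us / cxl_latency_us:.0f}x")
```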

CXL Investment Implications

Embedding the benefits of CXL into composable infrastructures will reduce the cost of on-premises IT by increasing infrastructure utilization rates, reducing interconnect overheads, and automating provisioning with all its attendant benefits. As with all changes, incorporating CXL and CI into the physical infrastructure and operations will entail costs and risks. An ROI analysis of CXL should include the following factors:

  • The cost of upgrading to PCIe 5.0 servers and CI operations
  • The cost of a new dedicated CXL switch network
  • The cost of CXL cards, CXL memory, and disaggregated memory
  • Savings from:
    o Shrinking server farms
    o Improvements in staff productivity
    o Elimination of human errors
    o Keeping applications on-premises
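One simple way to organize these factors is a side-by-side tally of one-time costs against annual savings. The sketch below does exactly that with placeholder figures; every number is a hypothetical assumption to be replaced with an organization's own quotes and internal estimates.

```python
# All figures are placeholders; substitute real quotes and internal estimates.
one_time_costs = {
    "PCIe 5.0 server upgrades & CI operations":      400_000,
    "Dedicated CXL switch network":                  150_000,
    "CXL cards, CXL memory, disaggregated memory":   350_000,
}

annual_savings = {
    "Smaller server farm (power, space, licences)":  180_000,
    "Staff productivity improvements":                90_000,
    "Fewer outages from human error":                 60_000,
    "Workloads kept on-premises instead of cloud":    70_000,
}

total_cost = sum(one_time_costs.values())
total_savings = sum(annual_savings.values())

print(f"One-time cost : ${total_cost:,}")
print(f"Annual savings: ${total_savings:,}")
print(f"Simple payback: {total_cost / total_savings:.1f} years")
```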

Bottom Line

The benefits of CXL and CI suggest that organizations should begin testing their high-impact applications on cloud infrastructures that have deployed these technologies, or in advanced technology group sandboxes, to quantify the benefits. Organizations should also begin reviewing their asset management strategies to enable early adoption of these technologies should they prove warranted.

Recommended Reading

 

 

Stanley Zaffos & Valdis Filks
Advisors at Lionfish Tech Advisors, Inc.