How AI is Challenging Data Center Infrastructures

Author:
Aditya Jain, Sr. Director, onsemi

Date
09/20/2024

AI is going to be a game-changer within data centers, requiring computing power and energy levels that are orders of magnitude above anything we have seen so far


Driven by emerging applications including cryptocurrency and artificial intelligence/machine learning (AI / ML), the energy consumed by data centers is very significant and set to grow rapidly to meet user demands. According to a recent report by the International Energy Agency (IEA), data centers consumed 460 TWh in 2022, representing around 2% of all electricity used globally. In the US, where one-third of the world’s data centers are located, consumption is 260 TWh or 6% of all electricity use.

Predicting the future is challenging, as it depends on how many power-hungry GPUs are deployed to cope with the demands of AI technology and, of course, on how much additional air conditioning is needed to maintain the temperature in the data center. The IEA report suggests that, by 2026, data center consumption will grow to at least 650 TWh (an increase of about 40% over 2022), but it could be as high as 1,050 TWh (an increase of 128%).
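The growth percentages quoted above can be checked directly against the 2022 baseline. The short sketch below is illustrative only; the function name is an assumption made for this example, and the inputs are the TWh figures quoted in the text.

```python
def growth_percent(baseline_twh: float, projected_twh: float) -> float:
    """Percentage increase of a projection over a baseline."""
    return (projected_twh - baseline_twh) / baseline_twh * 100.0


BASELINE_2022_TWH = 460.0  # 2022 consumption quoted in the text

low_case = growth_percent(BASELINE_2022_TWH, 650.0)
high_case = growth_percent(BASELINE_2022_TWH, 1050.0)

print(f"Low case:  +{low_case:.0f}%")   # close to the quoted 40%
print(f"High case: +{high_case:.0f}%")  # matches the quoted 128%
```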

Supporting AI Trends in Data Centers

AI is an extremely power-hungry technology, and the data centers that host it need sufficient capacity, in terms of both computing power and energy delivery, to cope with it.

A recent study by RISE Research Institutes of Sweden illustrates how large and rapid this change will be, given the swift uptake of the technology. For example, ChatGPT achieved one million users within five days of its launch in November 2022. It reached 100 million users in two months, a milestone that took TikTok nine months and Instagram two-and-a-half years.

For context, performing a search on Google uses just 0.28 Wh – the same as running a 60 W light bulb for 17 seconds.

By comparison, training GPT-4, with 1.7 trillion parameters and using 13 trillion tokens (word snippets), is a completely different proposition. This required multiple servers containing a total of 25,000 NVIDIA A100 GPUs, with each server consuming around 6.5 kW. OpenAI has stated that training took 100 days, using around 50 GWh of energy, at a cost of USD 100 million.
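As a rough sanity check, the figures above are mutually consistent under one extra assumption that is not stated in the text: eight GPUs per server. The sketch below also checks the light-bulb comparison for a single search. It counts server power only, ignoring cooling and networking overhead.

```python
# Figures quoted in the text
GPUS_TOTAL = 25_000
SERVER_POWER_KW = 6.5
TRAINING_DAYS = 100

# ASSUMPTION (not from the text): eight GPUs per server
GPUS_PER_SERVER = 8

servers = GPUS_TOTAL / GPUS_PER_SERVER
cluster_power_mw = servers * SERVER_POWER_KW / 1000.0
training_hours = TRAINING_DAYS * 24
energy_gwh = cluster_power_mw * training_hours / 1000.0

# A 60 W bulb running for 17 seconds, expressed in watt-hours
bulb_wh = 60 * 17 / 3600

print(f"Servers:         {servers:.0f}")
print(f"Cluster power:   {cluster_power_mw:.1f} MW")
print(f"Training energy: {energy_gwh:.2f} GWh")  # close to the quoted 50 GWh
print(f"Bulb energy:     {bulb_wh:.2f} Wh")      # close to the quoted 0.28 Wh
```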

Data Center 48V Architecture

In the early days of data centers, a centralized power architecture (CPA) was used where the conversion from mains (grid) voltage to 12V (bus voltage) was done centrally. This was then distributed to the servers and converted to logic levels (5 or 3.3V) locally using relatively simple converters.

However, as power requirements grew, the currents on the 12V bus (and the associated losses) became unacceptably high, forcing system engineers to move to a 48V bus arrangement. For the same power, this reduced the current by a factor of four and, because resistive losses scale with the square of the current (P = I²R), reduced the distribution losses by a factor of sixteen. This arrangement became known as a distributed power architecture (DPA).
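The effect of the bus voltage on current and resistive loss can be illustrated with a few lines of arithmetic. This is a minimal sketch: the 1.2 kW load and the 2 mΩ distribution resistance are hypothetical values chosen only to make the ratios visible, not figures from any real system.

```python
def bus_current_a(power_w: float, bus_voltage_v: float) -> float:
    """Current drawn from a DC bus delivering the given power."""
    return power_w / bus_voltage_v


def distribution_loss_w(power_w: float, bus_voltage_v: float,
                        resistance_ohm: float) -> float:
    """Resistive (I^2 * R) loss in the distribution path."""
    current_a = bus_current_a(power_w, bus_voltage_v)
    return current_a ** 2 * resistance_ohm


LOAD_W = 1200.0       # hypothetical server load
PATH_R_OHM = 0.002    # hypothetical distribution resistance (2 mOhm)

i_12 = bus_current_a(LOAD_W, 12.0)
i_48 = bus_current_a(LOAD_W, 48.0)
loss_12 = distribution_loss_w(LOAD_W, 12.0, PATH_R_OHM)
loss_48 = distribution_loss_w(LOAD_W, 48.0, PATH_R_OHM)

print(f"12 V bus: {i_12:.0f} A, {loss_12:.2f} W lost")
print(f"48 V bus: {i_48:.0f} A, {loss_48:.2f} W lost")
print(f"Current ratio: {i_12 / i_48:.0f}x, loss ratio: {loss_12 / loss_48:.0f}x")
```

Whatever load and resistance are chosen, the ratios stay the same: four times less current and sixteen times less resistive loss at 48V.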

At the same time, voltages for processors and some other components were falling, eventually to sub-volt levels, leading to the need for multiple secondary rails. To address this, a two-stage conversion was introduced, with a DC-DC converter (known as an intermediate bus converter, or IBC) converting from 48V to a 12V bus, from which other voltages were derived as needed.


Figure 2: Architecture of a Server Power System

The Need for Power Efficient MOSFETs

Power losses within a data center pose challenges for the operators. The first, and most obvious, is that they are paying for electricity which has no benefit in running the servers. The second is that any wasted energy manifests itself as heat, which then has to be dealt with. With hyperscale AI servers reaching power needs of 120 kW (and sure to increase over time), even a 2.5% loss (97.5% peak efficiency) at 50% loading, or 60 kW, represents 1.5 kW of waste heat per server, equivalent to an electric heater running full time.
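The 1.5 kW figure follows from simple arithmetic on the numbers above. The sketch below shows two readings of the calculation, since the text does not say whether the 60 kW is measured at the converter input or at its output; the function names are assumptions made for this example.

```python
def loss_from_input_kw(input_kw: float, efficiency: float) -> float:
    """Loss when the stated power is drawn at the converter input."""
    return input_kw * (1.0 - efficiency)


def loss_from_output_kw(output_kw: float, efficiency: float) -> float:
    """Loss when the stated power is delivered at the converter output."""
    return output_kw * (1.0 / efficiency - 1.0)


RATED_KW = 120.0       # power level quoted in the text
LOAD_FRACTION = 0.5    # 50% loading
EFFICIENCY = 0.975     # 97.5% peak efficiency

operating_kw = RATED_KW * LOAD_FRACTION

loss_in = loss_from_input_kw(operating_kw, EFFICIENCY)
loss_out = loss_from_output_kw(operating_kw, EFFICIENCY)

print(f"Operating point: {operating_kw:.0f} kW")
print(f"Loss if 60 kW is input power:  {loss_in:.2f} kW")
print(f"Loss if 60 kW is output power: {loss_out:.2f} kW")
```

Either reading gives roughly 1.5 kW of heat to be removed.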

Dealing with the heat can involve thermal mitigation measures within the power conversion system such as heat sinks or fans. These make the power supply larger, taking up space that could be used for more computing power and, in the case of fans, consuming electricity which costs money. As temperatures need to be carefully controlled within data centers, excessive losses will also raise the ambient temperature which means that more air conditioning will be needed. This is a capital expense and operating cost – as well as a consumer of space.

Clearly, performing the conversion from mains (grid) voltage to the voltage needed to power the AI GPUs and other devices as efficiently as possible is of great benefit to a data center operator.

For this reason, much work has been done on power supply topologies over the years, introducing techniques such as totem-pole PFC (TPPFC) to make the front-end PFC stage more efficient. Additionally, synchronous rectification has been introduced, in which diode rectifiers are replaced with MOSFETs to improve efficiency.

Enhancing the topology is only half of the battle. To optimize efficiency, all components must also be as efficient as possible – especially the MOSFETs that are essential to the conversion process.

When MOSFETs are used in switching power conversion, there are two primary forms of loss: conduction loss and switching loss. The conduction loss is due to the resistance between the drain and source (R_DS(ON)) and is present whenever current is flowing. The switching loss is due to a combination of the gate charge (Q_g), output charge (Q_OSS) and reverse-recovery charge (Q_rr), all of which must be replenished on every switching cycle. As the trend is towards higher switching frequencies to reduce the size of magnetic components, this loss becomes more significant because it grows in proportion to the switching frequency.
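The two loss mechanisms can be sketched with common first-order textbook approximations. This is a simplified model under stated assumptions, not onsemi's method: it treats each charge as being supplied once per cycle, ignores voltage-current overlap during the switching transitions, and uses hypothetical device values that are not taken from any datasheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MosfetParams:
    """Hypothetical device parameters (not from any datasheet)."""
    r_ds_on_ohm: float  # on-resistance, R_DS(ON)
    q_g_c: float        # total gate charge, Q_g
    q_oss_c: float      # output charge, Q_OSS
    q_rr_c: float       # reverse-recovery charge, Q_rr


def conduction_loss_w(dev: MosfetParams, i_rms_a: float) -> float:
    """I_rms^2 * R_DS(ON); independent of switching frequency."""
    return i_rms_a ** 2 * dev.r_ds_on_ohm


def switching_loss_w(dev: MosfetParams, v_ds_v: float,
                     v_drive_v: float, f_sw_hz: float) -> float:
    """Energy replenished per cycle, multiplied by switching frequency."""
    gate_j = dev.q_g_c * v_drive_v          # gate-drive energy
    output_j = 0.5 * dev.q_oss_c * v_ds_v   # output-charge energy (approx.)
    recovery_j = dev.q_rr_c * v_ds_v        # reverse-recovery energy (approx.)
    return (gate_j + output_j + recovery_j) * f_sw_hz


example = MosfetParams(r_ds_on_ohm=1e-3, q_g_c=50e-9,
                       q_oss_c=100e-9, q_rr_c=60e-9)

for f_sw_hz in (250e3, 500e3, 1e6):
    p_cond = conduction_loss_w(example, i_rms_a=30.0)
    p_sw = switching_loss_w(example, v_ds_v=48.0, v_drive_v=10.0,
                            f_sw_hz=f_sw_hz)
    print(f"{f_sw_hz / 1e3:6.0f} kHz: conduction {p_cond:.2f} W, "
          f"switching {p_sw:.2f} W")
```

In this model the conduction loss stays fixed while the switching loss doubles each time the frequency doubles, which is why the charge parameters matter more as switching frequencies rise.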

Clearly, the lower the conduction and switching losses in a particular MOSFET, the better the overall conversion efficiency of the power system will be.

PowerTrench T10 MOSFETs

Synchronous rectification is now an essential technique in all high-performance, high-current, low-voltage power conversion applications, especially those found in the servers within data centers. In these applications, several MOSFET parameters, including R_DS(ON), Q_g, Q_OSS and Q_rr, directly affect conversion efficiency, and device manufacturers are seeking ways of reducing them.

onsemi’s PowerTrench T10 MOSFETs achieve ultra-low values of Q_g with a new shielded gate trench design and produce devices with sub-1 mΩ R_DS(ON). With its industry-leading soft recovery body diode, the latest PowerTrench T10 technology not only reduces ringing, overshoot and noise but also reduces Q_rr. This strikes a good balance between on-resistance performance and recovery behavior, while also permitting fast, low-loss switching with a good reverse recovery characteristic.

Overall, the parametric enhancements present within PowerTrench T10 devices deliver enhanced efficiency in low to medium voltage, high current switching power solutions. In general, the switching losses are improved by up to 50% over previous generation devices while conduction losses can see a 30%-40% reduction.

onsemi has introduced 40V and 80V families based on the PowerTrench T10 technology. The NTMFWS1D5N08X (80V, 1.43mΩ, 5mm x 6mm SO8-FL package) and the NTTFSSCH1D3N04XL (40V, 1.3mΩ, 3.3mm x 3.3mm source down dual cool package) provide a best-in-class figure of merit (FOM) for power supply unit (PSU) and intermediate bus converter (IBC) designs in AI data center applications. They achieve the 97.5% PSU efficiency and 98% IBC efficiency required by the Open Rack V3 specification.
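Because the PSU and the IBC are connected in series, their efficiencies multiply. The sketch below combines the two figures quoted above; it assumes both stages run at their quoted efficiency at the same time and leaves out the final point-of-load conversion, which the text does not quantify.

```python
from math import prod


def chain_efficiency(stage_efficiencies: list[float]) -> float:
    """Overall efficiency of conversion stages connected in series."""
    return prod(stage_efficiencies)


PSU_EFFICIENCY = 0.975  # quoted PSU figure
IBC_EFFICIENCY = 0.98   # quoted IBC figure

overall = chain_efficiency([PSU_EFFICIENCY, IBC_EFFICIENCY])
print(f"PSU + IBC efficiency: {overall * 100:.2f}%")
print(f"Loss per kW drawn from the grid: {(1 - overall) * 1000:.0f} W")
```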


Figure 3: Advantages of PowerTrench T10 MOSFETs

Summary

The AI revolution is upon us, and no one can be fully certain what it will mean for the future power delivery needs of data centers. However, it is certain that a new set of challenges has been presented. Real estate scarcity and power grid limitations are making it difficult to find new locations with sufficient capacity. Total critical IT power demand is skyrocketing, driving up electricity costs. To meet these demands, data center owners are not only building new facilities but also pushing existing ones to the limit, aiming for denser megawatt-per-square-foot configurations.

With power levels sure to exceed 100 kW, efficient power conversion will be a key focus, delivering solutions that run cool and reliably while enhancing power density and saving space in cramped modern data centers.

onsemi
