How AI is changing the way we think about memory

SEP 01, 2021

With unprecedented demands on high-bandwidth memory (HBM), Richard Walsh of Samsung Semiconductor Europe considers what the industry needs to do for the next generation of memory technology.

The wonder that is HBM has been on a steady journey over the past few years. Performance, power efficiency and speed have all improved incrementally over time. Until recently, that progress was enough to keep up with the pace of technological change throughout much of the 2010s.

But we’re now at a point where things have got to change. Artificial intelligence-based applications like recommendation engines, image classification and natural language processing are everywhere — in phones, smart speakers, vehicles, wearables and homes. And that’s just AI. Machine learning, virtual reality, next-generation gaming and other intensive applications are here to stay.

All of these applications are now placing unprecedented demands on HBM, which simply cannot continue to improve at the pace these new technologies need. Not only do these applications need to process huge swathes of data, but they also need to do it faster and better, and the improvements need to come quickly. Algorithms demand high access rates to large data capacities.

In traditional systems, memory bandwidth and power consumption are ultimately limiting the performance and capabilities of AI and ML applications. So what does the industry need to do from here to support the next generation of technology?

PIM to the rescue

For more than 30 years, processing-in-memory (PIM) technology has been talked about as a way to overcome technical restrictions like these, and it is now being paired with HBM.

By placing a DRAM-optimised AI engine inside each memory bank (a storage sub-unit), an HBM-PIM adds processing power exactly where data is stored – paving the way for parallel processing while reducing the amount of travelling the data needs to do.
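To make the data-movement argument concrete, here is a toy Python model. It is a sketch only and does not describe any real HBM-PIM interface. It sums values spread across memory banks in two ways and counts the bytes that would have to cross the memory bus in each case. The bank count, the value size and the function names are all assumptions chosen for illustration.

```python
# Toy model: count bytes crossing the memory bus for a sum over banked data.
# All names and sizes are illustrative assumptions, not a real HBM-PIM API.

BYTES_PER_VALUE = 4  # assume 32-bit values and 32-bit partial sums


def host_side_sum(banks):
    """Conventional path: every value travels to the host processor."""
    bytes_moved = sum(len(bank) for bank in banks) * BYTES_PER_VALUE
    total = sum(value for bank in banks for value in bank)
    return total, bytes_moved


def in_bank_sum(banks):
    """PIM-style path: each bank reduces locally and sends one partial sum."""
    partial_sums = [sum(bank) for bank in banks]  # computed where the data lives
    bytes_moved = len(partial_sums) * BYTES_PER_VALUE
    return sum(partial_sums), bytes_moved


# 16 hypothetical banks holding 1,024 values each
banks = [list(range(1024)) for _ in range(16)]
host_total, host_bytes = host_side_sum(banks)
pim_total, pim_bytes = in_bank_sum(banks)
print(host_total == pim_total, host_bytes, pim_bytes)
```

Both paths produce the same total, but in this model the in-bank version sends one partial sum per bank rather than every stored value. Real devices have costs the model ignores, such as command overhead and the limited operations a bank-level unit can perform.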

For IT managers, data centre system architects and GPU architects, this kind of architecture represents a great opportunity. For example, software engineers can write simple commands to leverage HBM-PIM’s programmable computing unit to improve the speed of localised repetitive workloads.

More broadly, HBM-PIM delivers more than double the system performance of traditional HBM while also cutting energy use by over 60%. And as an extra bonus, HBM-PIM doesn’t need any hardware or software tweaks, meaning engineers can seamlessly embed it within existing systems.

Despite these obvious benefits, it has not been an easy architecture for the industry to build. Until recently, the work chip manufacturers needed to put into overcoming the technical challenges was simply too great, and as a result progress has been slow.

Why? The issue with PIM technology is that because of the way it links memory and logic, engineers have always faced a trade-off between storage density in a memory-optimised process and transistor performance in a logic-optimised process.

As a result, the performance and capability of PIM devices have been low relative to the technical difficulty and cost of the integration. The traditional von Neumann architecture has therefore prevailed, using separate processor and memory units to carry out millions of intricate data processing tasks. But frustratingly, this sequential processing approach moves data back and forth constantly, causing bottlenecks when handling ever-increasing volumes of data.
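A rough way to see this bottleneck is a roofline-style estimate: compare the time a processor needs to do the arithmetic with the time the memory link needs to deliver the data. The Python sketch below does this for a matrix-vector multiply, a common building block in AI workloads. The hardware figures are placeholders picked for illustration, not measurements of any real device, and the function name is hypothetical.

```python
# Rough roofline-style estimate: is a kernel limited by compute or by memory traffic?
# The hardware figures below are placeholders, not measurements of any real device.


def kernel_time_seconds(flops, bytes_moved, peak_flops_per_s, bandwidth_bytes_per_s):
    """Return (time, limiting_factor), assuming compute and transfers overlap fully."""
    compute_time = flops / peak_flops_per_s
    memory_time = bytes_moved / bandwidth_bytes_per_s
    if memory_time > compute_time:
        return memory_time, "memory"
    return compute_time, "compute"


# Matrix-vector multiply with an n x n matrix of 2-byte weights:
# roughly 2*n*n operations, with each weight read once from memory.
n = 8192
flops = 2 * n * n
bytes_moved = 2 * n * n

time_s, limit = kernel_time_seconds(
    flops,
    bytes_moved,
    peak_flops_per_s=10e12,       # assumed 10 TFLOP/s processor
    bandwidth_bytes_per_s=500e9,  # assumed 500 GB/s memory link
)
print(limit)
```

With these assumed figures the memory transfer takes longer than the arithmetic, so a faster processor alone would not speed the kernel up. That is the situation the separate processor-and-memory design runs into as data volumes grow.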

But there is good news. The proliferation of artificial intelligence and machine learning applications has reinvigorated investment in and the development of PIM technology – simply because these technologies are here to stay, and memory needs to adapt to accommodate them, rather than the other way around.

And it makes sense. PIM technology is well suited to AI and ML workloads. Optimised kernels reduce data movement by mapping data accesses with a high degree of spatial and temporal locality onto the parallel banks of a high-performance memory device, where they can be processed concurrently. By working in this way, PIM addresses the typical bottleneck of CPU/GPU memory bandwidth, improving the performance and overall capability of AI and ML applications.

Market breakthroughs

Because of the increased investment in and development of PIM technology, in 2021 the market is seeing the first fully programmable HBM-PIM, which combines high-performance parallel data processing and DRAM on the same piece of silicon.

These new HBM-PIMs, based on the JEDEC-standard HBM2 specification and enhanced with PIM architecture, are already proving hugely successful in meeting the demands of AI applications – and as a result, manufacturers are already planning to include PIM technology in future HBM3 products.

On the timeline of semiconductor innovation, it seems the world has finally passed the point where bandwidth is the major limiting factor in AI and ML performance. Now we can really start to see these incredible technologies flourish.

Richard Walsh has been working within the memory division of Samsung Semiconductor Europe for the past 25 years, covering DRAM, NAND and NOR flash, among other technologies. He holds a Bachelor of Engineering degree in Electronics, Computer Hardware and Software from the University of Limerick.