IBM describes analog AI chip that might displace power-hungry GPUs

Power-sipper still in the research stage, but findings are interesting

IBM Research has developed a mixed-signal analog chip for AI inferencing that it claims may be able to match the performance of digital counterparts such as GPUs, while consuming considerably less power.

The chip, which is understood to be a research project at present, is detailed in a paper published last week in Nature Electronics. It uses a combination of phase-change memory and digital circuits to perform matrix–vector multiplications directly on network weights stored on the chip.

This isn’t the first such chip IBM has developed as part of its HERMES project, but the latest incarnation comprises 64 tiles, or compute cores, as opposed to a 34-tile chip it presented at the IEEE VLSI symposium in 2021. It also demonstrates many of the building blocks that will be needed to deliver a viable low-power analog AI inference accelerator chip, IBM claims.

For example, the 64 cores are interconnected via an on-chip communication network, and the chip also implements additional functions necessary for processing convolutional layers.

Deep neural networks (DNNs) have driven many of the recent advances in AI, such as foundation models and generative AI, but in current architectures the memory and processing units are separate.


This means that computational tasks involve constantly shuffling data between the memory and processing units, which slows processing and is a key source of energy inefficiency, according to IBM.

IBM’s chip follows an approach called analog in-memory computing (AIMC), using phase-change memory (PCM) cells both to store the weights as analog values and to perform computations on them.

Each of the 64 cores of the chip contains a PCM crossbar array capable of storing a 256×256 weight matrix and performing an analog matrix–vector multiplication using input activations provided from outside the core.

This means that each core can perform the computations associated with a layer of a DNN model, with the weights encoded as analog conductance values of the PCM devices.
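
To make the idea concrete, here is a minimal numerical sketch, not IBM's implementation, of how a crossbar of this kind turns a stored weight matrix into a matrix–vector multiplication: weights map to conductances, input activations to read voltages, and the products are summed as currents along each column. The noise_std parameter and the direct signed-weight-to-conductance mapping are illustrative assumptions; real PCM arrays typically encode sign using differential device pairs.

```python
# Minimal sketch (not IBM's circuit) of an analog matrix-vector
# multiplication on a crossbar: weights become conductances, inputs become
# read voltages, and Ohm's/Kirchhoff's laws sum the per-column currents.
import numpy as np

rng = np.random.default_rng(0)

def analog_mvm(weights, activations, noise_std=0.02):
    """Simulate y = W @ x on a crossbar with noisy conductance encoding."""
    # Add Gaussian perturbation to mimic PCM programming noise; a real array
    # would use differential pairs (G+ - G-) to represent signed weights.
    g = weights + rng.normal(0.0, noise_std, size=weights.shape)
    currents = g @ activations   # column currents sum in parallel
    return currents              # digitized by on-chip ADCs in the real core

W = rng.standard_normal((256, 256)) / 16   # one core's 256x256 weight matrix
x = rng.standard_normal(256)               # input activations
print(np.allclose(analog_mvm(W, x, 0.0), W @ x))  # noiseless case matches digital
```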

The digital components comprise a row of eight global digital processing units (GDPUs), which provide the additional post-processing needed for networks with convolutional and long short-term memory (LSTM) layers.

The paper highlights how the PCM cells are programmed using digital-to-analog converters that generate programming pulses of varying current amplitude and duration. Once programmed, the core performs matrix–vector multiplications by applying pulse-width-modulated (PWM) read voltage pulses to the PCM array, the output of which is digitized by an array of 256 time-based analog-to-digital converters.
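
The read path can be sketched in a few lines. This is an illustration of the mechanism described above, not the paper's circuit: the bit widths, the uniform quantizer, and the scaling are all assumptions. Inputs are encoded as pulse durations, the array integrates current over each pulse, and a time-based ADC turns the accumulated charge back into a digital activation.

```python
# Illustrative read path: PWM-encoded inputs, charge accumulation per
# column, then time-based ADC quantization. Bit widths are assumed, not
# taken from the paper.
import numpy as np

def pwm_encode(x, bits=8):
    """Quantize activations in [0, 1) to integer pulse durations."""
    return np.clip(np.round(x * (2**bits - 1)), 0, 2**bits - 1)

def read_mvm(conductances, x, in_bits=8, out_bits=8):
    durations = pwm_encode(x, in_bits)    # PWM input stage
    charge = conductances @ durations     # current integrated over each pulse
    scale = (2**out_bits - 1) / np.abs(charge).max()
    return np.round(charge * scale)       # time-based ADC (uniform quantizer here)

G = np.abs(np.random.default_rng(1).standard_normal((256, 256))) / 256
x = np.random.default_rng(2).random(256)
print(read_mvm(G, x)[:4])
```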

This is an oversimplification, of course, as the IBM paper published in Nature Electronics goes into exhaustive detail on how the circuitry within each AIMC core operates to process the weights of a deep learning model.

The paper also demonstrates how the chip achieves near-software-equivalent inference accuracy, measured at 92.81 percent on the CIFAR-10 image dataset.

IBM also claims the measured matrix–vector multiplication throughput per unit area of 400 giga-operations per second per square millimeter (400 GOPS/mm²) is more than 15 times higher than that of previous multicore chips based on resistive memory, while achieving comparable energy efficiency.

IBM does not appear to provide a useful energy efficiency comparison with other AI processing systems such as GPUs, but does mention that during tests, a single input to ResNet-9 was processed in 1.52 μs and consumed 1.51 μJ of energy.
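
Taken together, those two figures imply an average power draw of roughly one watt while the chip is working, a quick back-of-the-envelope calculation worth making explicit:

```python
# Back-of-the-envelope check using the two ResNet-9 figures quoted above.
energy_j = 1.51e-6    # 1.51 uJ consumed per input
latency_s = 1.52e-6   # 1.52 us to process one input

print(f"Implied average power: {energy_j / latency_s:.2f} W")     # ~0.99 W
print(f"Implied throughput: {1 / latency_s:,.0f} inputs/second")  # ~657,895
```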

IBM’s paper claims that with additional digital circuitry to handle layer-to-layer activation transfers and store intermediate activations in local memory, it should be possible to run fully pipelined end-to-end inference workloads on chips such as this.

The authors said that further improvements in weight density would also be required for AIMC accelerators to become strong competitors to existing digital solutions such as GPUs.

The chips used in testing were fabricated using a 14nm process at IBM’s Albany Nanotech Center in New York, and run at a maximum matrix–vector multiplication clock frequency of 1GHz.

IBM isn’t the only company working on analog chips for AI. Last year, another research paper published in Nature described an experimental chip that stored weights in resistive RAM (RRAM). It was estimated that the chip in question would consume less than 2 microwatts of power to run a typical real-time keyword spotting task.

In contrast, the typical compute infrastructure used for AI tasks using GPUs is getting ever more power hungry. It was reported this month that some datacenter operators are now supporting up to 70 kilowatts per rack for infrastructure intended for AI processing, while traditional workloads typically require no more than 10 kilowatts per rack. ®
