IBM takes the bits out of deep learning

IBM Research has revealed plans to build a prototype chip specialized for artificial intelligence (AI). The chip, which IBM has dubbed the Artificial Intelligence Unit (AIU), is the first complete system on a chip from the IBM Research AI Hardware Center.

In a blog about the new chip, IBM researchers wrote: “We’re running out of processing power. AI models are growing exponentially, but the hardware to train these behemoths and run them on servers in the cloud or on edge devices like smartphones and sensors hasn’t evolved as quickly.”

IBM’s plan, based on research from 2019, is to reduce the complexity of the chips used for AI processing. The researchers said that the flexibility and high precision of general-purpose computer processors (CPUs) make these chips well suited to general-purpose software applications, but that same flexibility puts them at a disadvantage when it comes to training and running deep learning models, which require massively parallel AI operations.

IBM is pursuing two approaches with its alternative to conventional CPUs. First, it said it is developing an application-specific integrated circuit (ASIC) that uses significantly fewer binary bits (less precision) than the 32-bit arithmetic used in general-purpose computing. The main task of the ASIC involves matrix and vector multiplication, which IBM says are the primary calculations required in AI.
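For readers wondering why matrix and vector multiplication dominate, the sketch below (plain NumPy, with made-up layer sizes, purely illustrative rather than IBM’s code) shows that a dense neural-network layer boils down to a matrix-vector product, i.e. long chains of multiply-accumulate operations, which is exactly the arithmetic such an ASIC is built to execute in bulk.

```python
import numpy as np

# Hypothetical layer sizes, for illustration only.
inputs = np.random.rand(512).astype(np.float32)          # activation vector
weights = np.random.rand(256, 512).astype(np.float32)    # weight matrix

# A dense (fully connected) layer is, at its core, a matrix-vector
# multiplication: every output element is a long chain of
# multiply-accumulate operations over the input vector.
outputs = weights @ inputs          # shape (256,)

# The same result, written out as explicit multiply-accumulates.
manual = np.zeros(256, dtype=np.float32)
for i in range(256):
    for j in range(512):
        manual[i] += weights[i, j] * inputs[j]

assert np.allclose(outputs, manual, rtol=1e-4)
```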

In a paper published in 2019, IBM researchers presented an approach to simplify the processing required for the floating-point calculations used in deep learning algorithms. These calculations involve multiplying two floating-point numbers and accumulating the results into partial sums.

The researchers said that much of the prior work on reduced-precision deep learning approximates the data in the multiplication part of the calculation, while the accumulation part is left at 32 bits.
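In NumPy terms, that pattern looks roughly like the sketch below (the bit widths and data are made up for illustration): the operands of the multiplication are rounded down to 16-bit floats, but the running sum of partial products is kept in 32-bit.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(4096).astype(np.float32)
b = rng.standard_normal(4096).astype(np.float32)

# Prior reduced-precision work: approximate the *multiplication* inputs
# (here, round them to 16-bit floats) ...
a16 = a.astype(np.float16)
b16 = b.astype(np.float16)

# ... but keep the *accumulation* of partial sums in 32-bit.
acc = np.float32(0.0)
for x, y in zip(a16, b16):
    acc += np.float32(x) * np.float32(y)

reference = np.dot(a, b)   # full 32-bit dot product
print(acc, reference)      # values stay close despite the 16-bit inputs
```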

According to IBM, reducing the precision of the accumulation part of the calculation is not straightforward, because doing so can lead to serious instability in training and degrade model accuracy. In the paper, the researchers propose a theoretical approach to achieving extremely low-precision hardware for deep neural network (DNN) training. This is one of the areas of research IBM drew on when developing the AIU hardware.
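To see why the accumulator is the fragile part, here is a toy numerical sketch (standard floating-point behaviour, not IBM’s method, and not what the paper ultimately proposes): once a 16-bit running sum grows large enough, new 16-bit partial products fall below its resolution and the accumulation silently stalls, the kind of error that destabilizes training.

```python
import numpy as np

# 250,000 small partial products, each equal to 0.0001 (exact sum: 25.0).
terms = np.full(250_000, 1e-4, dtype=np.float32)

acc32 = np.float32(0.0)
acc16 = np.float16(0.0)
for t in terms:
    acc32 += t                                   # 32-bit accumulation
    acc16 = np.float16(acc16 + np.float16(t))    # naive 16-bit accumulation

print(acc32)   # ~25.0, close to the exact sum
print(acc16)   # stalls near 0.25: once the sum exceeds ~0.25, each new
               # 0.0001 term is below half a unit in the last place of the
               # 16-bit accumulator and is rounded away ("swamping")
```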

In the blog post about the AIU, IBM said, “An AI chip doesn’t have to be as ultra-precise as a CPU. We don’t calculate trajectories for a spaceship to land on the moon, or estimate the number of hairs on a cat. We make predictions and decisions that don’t require anywhere near this granular resolution.”

Using this technique, dubbed “approximate computing,” IBM said it can drop from 32-bit floating-point arithmetic to bit formats that hold a quarter as much information. “This simplified format dramatically reduces the effort required to train and run an AI model without sacrificing accuracy,” IBM claimed.
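IBM has not spelled out the exact formats in the blog post, but the flavor of the idea can be shown with a simple linear quantization of 32-bit weights down to 8-bit integers, one common “quarter-size” format (an illustrative NumPy sketch, not IBM’s scheme): storage shrinks by a factor of four while the round-trip error stays small for well-behaved value ranges.

```python
import numpy as np

rng = np.random.default_rng(1)
weights_fp32 = rng.standard_normal(10_000).astype(np.float32)

# Simple symmetric linear quantization to signed 8-bit integers.
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.clip(np.round(weights_fp32 / scale), -127, 127).astype(np.int8)

# De-quantize to compare against the originals.
restored = weights_int8.astype(np.float32) * scale

print(weights_fp32.nbytes, "->", weights_int8.nbytes, "bytes")   # 4x smaller
print("max round-trip error:", np.abs(weights_fp32 - restored).max())
```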

The IBM researchers’ second approach is to design the AIU chip so that its circuits streamline AI workflows by sending data directly from one compute engine to the next.

Specialized processing units designed for AI workloads are nothing new. Companies such as Nvidia and AMD have taken advantage of the specialized cores in their graphics processing units (GPUs) to optimize machine learning. Ultimately, though, the GPU was designed around the mathematics of manipulating graphics, using a highly parallel computing architecture with hundreds, if not thousands, of cores. For example, the Nvidia Titan V supercomputing GPU, introduced in 2017, had 21 billion transistors and 5,120 single-precision CUDA cores.

In theory, an ASIC can be designed to focus primarily on optimizing one type of workload. In the case of IBM, this is the training of deep learning networks for AI applications.

Looking at AI acceleration for training AI models in late 2017, McKinsey estimated that by 2025, ASICs would account for 50% of data center compute workloads and GPUs for 40%, while for inference, ASICs would account for 70% of workloads.

But the line between ASICs and GPUs is blurring. For example, Nvidia’s DGX A100 AI training engine offers Tensor Cores within its GPU architecture.

Describing the AIU, IBM said, “Our full system-on-chip has 32 processing cores and contains 23 billion transistors—roughly the same number as in our z16 chip. The IBM AIU is also designed to be as easy to use as a graphics card. It can be plugged into any computer or server with a PCIe slot.”

IBM has positioned the AIU as “easy to plug in as a GPU card,” indicating it hopes to offer a viable alternative to GPU-based AI accelerators. “By 2029, our goal is to train and run AI models 1,000 times faster than we could three years ago,” the company said.
