Are the highly marketed deep learning and machine learning processors just simple matrix-multiplication accelerators?


SOURCE: BBNTIMES.COM
SEP 01, 2021

Artificial intelligence (AI) accelerators are computer systems designed to enhance artificial intelligence and machine learning applications, including artificial neural networks (ANNs) and machine vision.

Most AI accelerators are just matrix-multiplication accelerators. All the rest is commercial propaganda.

The main aim of this article is to understand the complexity of machine learning (ML) and deep learning (DL) processors and discover the truth about the so-called AI accelerators.

Unlike other computational devices that treat scalars or vectors as primitives, Google’s Tensor Processing Unit (TPU) ASIC treats matrices as primitives. The TPU is designed to perform matrix multiplication at a massive scale.

At its core, you find something inspired by the heart rather than the brain: a “systolic array”, described in 1982 in the paper “Why Systolic Architectures?”

This computational device contains a 256 × 256 grid of 8-bit multiply-add units: a grand total of 65,536 processing elements, capable of 92 trillion operations per second.
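To make the principle concrete, here is a minimal sketch in Python (purely illustrative, not Google’s actual design) of how a systolic array computes a matrix product: operands are pumped through a grid of multiply-accumulate cells in skewed waves, so that each cell fires exactly when its two inputs meet.

```python
import numpy as np

def systolic_matmul(A, B):
    """Toy, cycle-by-cycle model of an output-stationary systolic array.

    Cell (i, j) of the grid accumulates one element of C = A @ B.
    Rows of A stream in from the left and columns of B from the top,
    each skewed by one cycle, so the matching operands meet in
    cell (i, j) exactly when t = i + j + s.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m), dtype=np.int64)
    for t in range(n + m + k - 2):       # clock ticks for the full wavefront
        for i in range(n):
            for j in range(m):
                s = t - i - j            # dot-product index arriving now
                if 0 <= s < k:
                    C[i, j] += A[i, s] * B[s, j]   # one MAC per cell per tick
    return C

A = np.random.randint(0, 8, size=(4, 3))
B = np.random.randint(0, 8, size=(3, 5))
assert np.array_equal(systolic_matmul(A, B), A @ B)
```

The `t = i + j + s` timing condition is the skew that real systolic arrays implement with nothing more than register delays between neighbouring cells, which is what makes the design so dense and power-efficient.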

It uses DDR3 memory at only 30 GB/s. Contrast that with the Nvidia Titan X, whose GDDR5X hits transfer speeds of 480 GB/s.
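A back-of-the-envelope calculation shows why that bandwidth gap matters. Dividing peak throughput by memory bandwidth gives the arithmetic intensity (operations per byte) a workload must sustain to keep the chip compute-bound rather than memory-bound; the figures below are the ones quoted above.

```python
peak_ops = 92e12   # 92 trillion ops/s, the TPU's quoted peak
tpu_bw = 30e9      # 30 GB/s DDR3, per the article
titan_bw = 480e9   # 480 GB/s GDDR5X on the Titan X

# Ops that every byte fetched from DRAM must feed to avoid stalling:
print(peak_ops / tpu_bw)    # ~3067 ops/byte on the TPU
print(peak_ops / titan_bw)  # ~192 ops/byte at Titan X bandwidth
```

Only workloads with as much data reuse as large matrix multiplication come anywhere near such intensities, which is exactly why these chips are, above all, matrix-multiplication accelerators.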

Either way, it has nothing to do with real AI hardware.

Some General Reflections on Processing Units and Narrow AI Coprocessors

A central processor is commonly defined as a digital circuit that performs operations on some external data source, usually memory or some other data stream, typically taking the form of a microprocessor implemented on a single metal–oxide–semiconductor (MOS) integrated circuit chip.

It could be supplemented with a coprocessor, performing floating point arithmetic, graphics, signal processing, string processing, cryptography, or I/O interfacing with peripheral devices. Some application-specific hardware units include video cards for graphics, sound cards, graphics processing units and digital signal processors.

A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, and input/output (I/O) operations specified by the instructions in the program.

  • Microprocessor chips with multiple CPUs are multi-core processors.
  • Array processors or vector processors have multiple processors that operate in parallel, with no unit considered central.
  • Virtual CPUs are an abstraction of dynamically aggregated computational resources.

There are many kinds of processing units, forming a whole taxonomy of processors:

  • Central Processing Unit (CPU): if designed according to the von Neumann architecture, it contains at least a control unit (CU), an arithmetic logic unit (ALU) and processor registers.
  • Graphics Processing Unit (GPU)
  • Sound chips and sound cards
  • Vision Processing Unit (VPU)
  • Tensor Processing Unit (TPU)
  • Neural Processing Unit (NPU)
  • Physics Processing Unit (PPU)
  • Digital Signal Processor (DSP)
  • Image Signal Processor (ISP)
  • Synergistic Processing Element or Unit (SPE or SPU) in the Cell microprocessor
  • Field-Programmable Gate Array (FPGA)
  • Quantum Processing Unit (QPU)

A Graphics Processing Unit (GPU) enables you to run high-definition graphics on your computer. A GPU has hundreds of cores aligned in a particular way to form a single hardware unit, and it exposes thousands of concurrent hardware threads, which are utilized for the data-parallel, computationally intensive portions of an algorithm. Data-parallel algorithms are well suited to such devices because the hardware can be classified as SIMT (Single Instruction, Multiple Threads). For such workloads, GPUs far outperform CPUs in raw GFLOPS.
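As a minimal illustration (assuming PyTorch is installed and a CUDA device is present), the same matrix multiply can be dispatched to thousands of GPU threads with a one-line device change:

```python
import torch

a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

c_cpu = a @ b                           # executed on CPU cores

if torch.cuda.is_available():           # guard: skip if no CUDA device
    c_gpu = a.cuda() @ b.cuda()         # same op across thousands of GPU threads
    print((c_cpu - c_gpu.cpu()).abs().max())  # tiny float rounding difference
```

The point is that nothing in the programming model changes: the device dispatch alone is what moves the work onto thousands of concurrent threads.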

From Fake AI Accelerators ASIC to Real AI Accelerators

The TPU and NPU fall under the narrow/weak AI (ML/DL) accelerator class: specialized hardware accelerators or computer systems designed to accelerate specific AI/ML applications, including artificial neural networks and machine vision.

Big Tech companies such as Google, Amazon, Apple, Facebook, AMD and Samsung are all designing their own AI ASICs.

Typical applications include algorithms for training and inference in computing devices such as self-driving cars, machine vision, NLP, robotics, the internet of things, and other data-intensive or sensor-driven tasks. They are often manycore designs and generally focus on low-precision arithmetic, novel dataflow architectures or in-memory computing capability, with a typical narrow-AI integrated circuit chip containing billions of MOSFETs.

Focused on the training and inference of deep neural networks, TensorFlow is a symbolic math library based on dataflow and differentiable programming.

The latter uses automatic differentiation (AD), also called algorithmic differentiation, computational differentiation, or auto-diff, together with gradient-based optimization, and works by constructing a graph containing the control flow and data structures of the program.
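TensorFlow’s GradientTape is the concrete form of this: it records the operations executed in its scope into a graph, then traverses that graph in reverse to produce gradients. For example:

```python
import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x + 2.0 * x        # forward pass, recorded on the tape
dy_dx = tape.gradient(y, x)    # reverse-mode AD over the recorded graph
print(dy_dx.numpy())           # 8.0, i.e. 2x + 2 at x = 3
```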

Dataflow (or datastream) programming, in turn, is a programming paradigm that models a program as a directed graph of data flowing between operations, thus implementing dataflow principles and architecture.

Things revolve around static or dynamic graphs, which call for suitable programming languages, such as C++, Python, R, or Julia, and ML libraries, such as TensorFlow or PyTorch.
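The static/dynamic split is visible directly in the APIs: PyTorch builds its graph on the fly as Python executes, while TensorFlow’s tf.function traces a Python function once into a static dataflow graph. A small sketch of the latter:

```python
import tensorflow as tf

@tf.function                   # trace the function into a static dataflow graph
def f(x):
    return tf.reduce_sum(x * x)

x = tf.constant([1.0, 2.0, 3.0])
print(f(x).numpy())            # 14.0, run through the compiled graph

# Inspect the operations of the traced graph:
graph = f.get_concrete_function(x).graph
print([op.name for op in graph.get_operations()])
```

PyTorch code with the same semantics would simply run line by line, building and discarding its graph on every forward pass.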

What AI computing is still missing is a Causal Processing Unit, involving symmetrical causal data graphs, with the Causal Engine software simulating real-world phenomena in digital reality.

Such causal processing is highly likely embedded in the human brain, and it is what Real-World AI requires.
