At some point, every programmer has to choose the most effective option in terms of computing power to get the optimal result. And technology, generous as it is, has given us more than one option for processing power – CPU, GPU, TPU, and someday quantum chips. It is now the duty of the programmer to choose the best fit for his/her work.
This article is a bit technical, and it's geared towards giving you a basic understanding of how these work (CPU, GPU, and TPU) so you can make a more informed decision.
Central Processing Unit (CPU)
A CPU is the electronic circuitry within a computer that carries out the instructions of a computer program by performing the basic arithmetic, logic, control, and input/output (I/O) operations specified by those instructions.
A CPU is contained on a single Integrated Circuit (IC) chip. Some computers have a single chip containing two or more CPUs called "cores", hence the name multi-core processor.
Some CPUs are scalar processors, i.e. they execute one instruction at a time, but newer CPUs called array processors or vector processors have multiple processing elements that operate in parallel, with no single unit considered central.
A CPU is composed mainly of:
an Arithmetic Logic Unit (ALU) that performs arithmetic (+, −, ×, ÷) and logic operations (AND, OR, etc.),
Processor Registers that act as a bank of operands to be supplied to the ALU and store the results of ALU operations, and
the Control Unit, a giant switch that controls the flow of instructions, from fetching them from memory to executing them.
Architecture of the CPU
The good old CPU works on an architectural design called the PIPELINE, which means each instruction has to be worked on by one stage before being passed on to the next – a strictly serial approach that proved limiting in later years.
Modern CPUs are built on a newer architectural design called PARALLELISM. Parallelism is the situation whereby a task or instruction runs in parallel with another, within the same or different processes.
Workflow of the CPU
The primary job of a CPU is to execute a sequence of stored instructions called a PROGRAM.
Almost all CPUs follow the instruction cycle – the fetch (retrieving the instruction from program memory), decode (converting the instruction into signals), and execute steps.
After the execute step, the entire process repeats for the next instruction, whose address is held in a processor register (the program counter).
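The cycle above can be sketched as a toy interpreter in Python – the instruction set, registers, and program below are invented purely for illustration, not any real CPU's:

```python
# A toy sketch of the fetch-decode-execute cycle.
# The instruction set, registers, and program are made up for illustration.
program = [
    ("LOAD", 5),   # put 5 in the accumulator
    ("ADD", 3),    # add 3 to the accumulator
    ("STORE", 0),  # write the accumulator to memory[0]
    ("HALT", None),
]

memory = [0]
acc = 0   # accumulator register
pc = 0    # program counter register

while True:
    op, arg = program[pc]   # fetch: retrieve the instruction the program counter points at
    pc += 1                 # the program counter now holds the next instruction's address
    if op == "LOAD":        # decode + execute
        acc = arg
    elif op == "ADD":
        acc += arg
    elif op == "STORE":
        memory[arg] = acc
    elif op == "HALT":
        break

print(memory[0])  # -> 8
```

Each trip around the `while` loop is one instruction cycle: fetch, decode, execute, repeat.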
And that's it on the CPU – moving on to the GPU.
Graphics Processing Unit (GPU)
From the name, it obviously has to do with graphics, right?
It is also an electronic circuit (chip), one that performs the rapid, complex mathematical calculations (vector and matrix computations) and geometric calculations necessary for graphics rendering.
A GPU may be found integrated with a CPU on the same circuit, on a graphics card, or on the motherboard of a personal computer or a server.
Architecture of a GPU
The GPU possesses a parallel processing architecture allowing it to perform multiple calculations at the same time.
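The essence of that parallelism is applying the same small operation to every element of a large array at once. Here is a minimal Python sketch of the pattern – the data and the "kernel" function are made up for illustration; on a real GPU, thousands of cores would each handle one element simultaneously:

```python
# A minimal illustration of the GPU idea: the SAME operation applied
# to every element of a large array. On a GPU each element would be
# processed by its own core at the same time; here we only express
# the pattern, running serially in plain Python.
pixels = list(range(8))   # stand-in for image data

def brighten(p):
    # one tiny "kernel" applied per element (made-up transformation)
    return p * 2 + 1

# Conceptually, each call below runs on its own GPU core in parallel.
result = [brighten(p) for p in pixels]
print(result)  # -> [1, 3, 5, 7, 9, 11, 13, 15]
```

This is why graphics (and matrix math generally) suits GPUs so well: millions of pixels, one identical operation each.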
Workflow of a GPU
A GPU uses special programming to help it analyze and use data; it is an independent processing unit that receives instructions from the CPU and executes them.
At the hardware level, data received by the GPU from the CPU through a dedicated channel is processed based on the instruction set known to the GPU card/graphics card.
An Application Programming Interface (API) – an interface between one piece of software and another, e.g. DirectX, Mantle, or OpenGL – sends the instructions to the GPU, which has its own operating system loaded into video memory from driver files, and the GPU executes the instructions.
Basically, the GPU exists for graphics rendering because of the heavy mathematical computation involved – get it?
Then we have
Tensor Processing Unit (TPU)
The GPU and CPU are both integrated circuits, but only the TPU is an Application-Specific Integrated Circuit (ASIC), which, as the name denotes, is built for one specific application – in this case, neural networks in deep learning.
If you didn't know before, a neural network is a computationally intensive technique used in deep learning. And deep learning is a subset of machine learning, which is a way of building artificial intelligence into systems.
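To see why this workload is so compute-intensive, here is a tiny neural-network layer in Python – every weight, input, and bias below is a made-up number, purely for illustration – showing that the core operation is multiply-and-add over matrices:

```python
# A tiny (made-up) neural-network layer, to show why neural networks
# boil down to matrix math: each output is a multiply-accumulate
# over all inputs, plus a bias. Real networks do this billions of times.
inputs  = [1.0, 2.0]
weights = [[0.5, -1.0],   # one row of weights per output neuron
           [1.5,  0.25]]
biases  = [0.5, -0.5]

outputs = [
    sum(w * x for w, x in zip(row, inputs)) + b
    for row, b in zip(weights, biases)
]
print(outputs)  # -> [-1.0, 1.5]
```

Scale those two neurons up to millions, repeated for every prediction, and you can see why Google built dedicated hardware for it.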
The TPU was developed by Google specifically for neural network prediction, so it has been optimized at the hardware level for this type of task, which helps it run at an effective rate.
Let’s talk more about the TPU, it’s kinda new, right?
Architecture and flow of work in the TPU
The TPU uses the Complex Instruction Set Computer (CISC) design style, as opposed to the Reduced Instruction Set Computer (RISC) style used in CPUs.
The RISC style focuses on defining simple instructions (e.g. fetch, store, add, and multiply).
The CISC design, on the other hand, focuses on defining high-level instructions that run more complex tasks, such as multiplying and adding many times with each instruction.
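The difference can be sketched with a dot product in Python – this is only an analogy, not real machine code; the "instructions" here are made up for illustration:

```python
# An analogy for RISC vs CISC, using a dot product (illustrative only).
a = [1, 2, 3]
b = [4, 5, 6]

# "RISC-like": many simple steps - load, load, multiply, add - one at a time
acc = 0
for i in range(len(a)):
    x = a[i]        # load
    y = b[i]        # load
    p = x * y       # multiply
    acc = acc + p   # add

# "CISC-like": one high-level multiply-and-accumulate operation
acc_cisc = sum(x * y for x, y in zip(a, b))

print(acc, acc_cisc)  # -> 32 32
```

Same result either way; the CISC style just packs the whole multiply-accumulate sequence into a single instruction.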
The TPU has a Matrix Multiplier Unit built as a systolic array (multiple ALUs chained together, reusing results so the processor registers are read only once) containing 256 x 256 = 65,536 ALUs – what!?
Meaning, if a CPU running at 2.10 GHz produces 2,100,000,000 cycles per second, a TPU at the same frequency could perform up to 65,536 x 2,100,000,000 multiply-accumulate operations per second.
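The back-of-the-envelope arithmetic checks out – noting that real chips differ in clock rate, precision, and utilization, so this is an upper bound, not a benchmark:

```python
# Back-of-the-envelope arithmetic for the claim above (an idealized
# upper bound: real hardware varies in clock rate and utilization).
cpu_hz = 2_100_000_000            # 2.10 GHz -> cycles per second
alus = 256 * 256                  # ALUs in the systolic array
print(alus)                       # -> 65536
macs_per_second = alus * cpu_hz   # one multiply-accumulate per ALU per cycle
print(macs_per_second)            # -> 137625600000000
```

That is roughly 137 trillion multiply-accumulates per second at the same clock rate a CPU would spend on one operation per cycle.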
What more could we ask?
And there you have it – the briefest insight into the CPU, GPU, and TPU.
The options are laid before us; it is up to us to choose the best fit for our task. And luckily for us, the cloud is there, with a whole lot of offerings letting you use a CPU, GPU, or TPU on demand.
My advice: choose wisely, but remember –
A CPU can do virtually everything; it may be damn slow, but it works – just like a pocket knife.
Never use a GPU if you are not doing something involving heavy mathematical computations – gaming, machine learning, etc. – because it is a sword, and
Use a TPU if and only if a GPU would not suffice for your neural network predictions – just like a ninja dart, only thrown when needed.
Thank you for reading.