Industry Analysis on the Base Level of AI: Deep-Learning Dedicated Processing Unit

Source: Lv Zhengri, Electronic Information Team, 2018-01-10

It has been almost 60 years since the concept of artificial intelligence (AI) was first proposed. With the rapid growth of computing power, dramatic breakthroughs in basic machine learning algorithms, and the massive accumulation of structured data, the AI industry is booming again in the second decade of the 21st century. In the 13th Five-Year Plan for the Country's Scientific and Technological Progress issued by the State Council, AI stands out as a top priority. According to the Plan, the key points in developing natural human-machine interaction include intelligent perception and cognition, the blending of virtual and physical reality with natural interaction, and natural language understanding (NLU) and intelligent decision-making, with the aim of making breakthroughs in human-like intelligence based on big data analysis and establishing demonstration applications in multiple industries.

Supported by rapid growth in IT fields including computers, the Internet, the Internet of Things (IoT) and robotics, the AI industry has formed a fairly complete internal industry chain. Upstream is the base level of AI technologies, covering computing power, data and basic algorithms; midstream is the technical level, involving basic applied algorithms and application development platforms; and downstream is the application level, which combines general algorithms and hardware for specific industry applications such as healthcare, finance, transportation, security and manufacturing. The rest of this article describes the state of the industry for deep-learning dedicated chips, which belong to the base level of AI.

Current Situation

The market for deep-learning chips is still at an early stage of cultivation and development. These chips are mainly adopted in two categories of products: the first is intelligent terminal hardware, including smartphones, drones and smart cameras; the second is server products equipped with high-performance deep-learning chips. Both categories are among the highest-shipment basic hardware of the Internet and the intelligent age, and shipments will keep increasing rapidly as market demand continues to grow.

Classification

So far, dedicated AI accelerator chips have mainly followed three directions: GPU, FPGA and ASIC.

GPU

GPUs arose to meet the demand for massively parallel vector operations in image computing. With a many-core architecture, a GPU is quite different from a CPU in internal structure. A GPU has a massively parallel architecture consisting of thousands of smaller, more efficient cores designed to handle multiple tasks simultaneously. With small per-core caches and simple ALUs, a GPU is well suited to data-intensive workloads in general-purpose computing. A CPU, by contrast, consists of a few cores optimized for sequential serial processing, which suits computing and control tasks with complex procedures and complex data dependencies.

In 2012, researchers found that the GPU, with its high parallelism, large memory and low demand for control logic, could to some extent meet the requirements of neural network training; coupled with techniques such as pre-training, it could satisfy the needs of deep-learning training and application. This finding greatly accelerated the development of the AI industry and provided a feasible option for hardware acceleration of AI algorithms. For the foreseeable future, the GPU will remain one of the major options for enterprises seeking to accelerate deep-learning training.

FPGA

An FPGA chip integrates a large number of digital gate circuits and memory elements. Users do not run a program on an FPGA directly; instead, they define its internal structure themselves by loading a configuration file. Such flexibility is necessary because neural networks are developing so quickly. An FPGA needs only hundreds of milliseconds to update its logic functions, which supports business evolution and protects the investment. Parallelism is a natural attribute of digital circuits. With both data parallelism and pipeline parallelism, an FPGA is no less capable than a GPU in performance.

The major advantage of the FPGA in deep learning is that it can be reconfigured in the field. With quite mature hardware technology, it stands out in small-scale experiments and applications, allows algorithms to be modified readily, and avoids the cost of a large-scale tape-out. However, the major shortcomings of the FPGA are a very low clock speed and a high cost at large shipment volumes, and adopters may become dependent on the hardware manufacturers. To sum up, the FPGA is one of the most suitable interim solutions among deep-learning accelerator chips, and it will remain an option for institutions with development, experimental and research needs.

ASIC

An ASIC is a dedicated chip customized for a particular use rather than intended for general-purpose use. Its computing power and efficiency can both be tailored to the needs of the algorithms, so compared with general-purpose chips it has the following advantages: small size, low power consumption, high computing performance, high computing efficiency, and a lower unit cost as shipment volume grows. In terms of energy efficiency, ASICs have a clear advantage: for compute-bound algorithms, the higher the data transfer and computing efficiency, the higher the energy efficiency. However, the development risk of an ASIC is quite high, because the period from R&D to market is long and a large market is required to amortize the cost. So far, the leading ASICs in the AI industry include Google's TPU and Cambricon's DianNao family.
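The amortization argument can be made concrete with a little arithmetic. The sketch below is only an illustration: it assumes a simple cost model (a one-time development cost spread over the shipment volume, plus a fixed per-chip cost), and every figure in it is hypothetical rather than taken from any vendor.

```python
import math


def unit_cost(one_time_cost: float, per_chip_cost: float, volume: int) -> float:
    """Average cost per chip: the one-time cost spread over the volume, plus the per-chip cost."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return one_time_cost / volume + per_chip_cost


def break_even_volume(asic_one_time: float, asic_per_chip: float, fpga_per_chip: float) -> int:
    """Smallest volume at which the ASIC's average cost is no higher than the FPGA's.

    Assumes (for illustration) that the FPGA route has no one-time cost at all.
    """
    if fpga_per_chip <= asic_per_chip:
        raise ValueError("the ASIC never breaks even unless its per-chip cost is lower")
    return math.ceil(asic_one_time / (fpga_per_chip - asic_per_chip))


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
ASIC_ONE_TIME = 10_000_000.0  # design, verification and tape-out
ASIC_PER_CHIP = 20.0
FPGA_PER_CHIP = 220.0

volume = break_even_volume(ASIC_ONE_TIME, ASIC_PER_CHIP, FPGA_PER_CHIP)
print("break-even volume:", volume)
print("ASIC unit cost at 1,000 chips:", unit_cost(ASIC_ONE_TIME, ASIC_PER_CHIP, 1_000))
print("ASIC unit cost at break-even:", unit_cost(ASIC_ONE_TIME, ASIC_PER_CHIP, volume))
```

Under these assumed numbers the ASIC is far more expensive per chip at small volumes and only becomes cheaper once shipments pass the break-even point, which is why a large market is needed.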

Competition Barriers

The entry threshold for deep-learning accelerator chips consists of three kinds of barriers: technological barriers, ecological (ecosystem) barriers and financial barriers.

Technological Barriers

Chip design and manufacture has always had a high entry threshold within information technology and sophisticated manufacturing, requiring strong scientific research capacity, long-term technology accumulation, a complete patent portfolio and rich experience in the field. Because it is a technology- and patent-driven industry, the industry leaders tend to take the lion's share.

Ecological Barriers

Even before the current wave of technological change, the industry was already dominated by a few giants. Furthermore, chips, downstream hardware and upstream software are tightly coupled, as is the system as a whole. Once such an ecosystem is established, joining it, let alone breaking it, is costly: a newcomer must pour a great deal of time and money into entering the market and building its own ecosystem.

Financial Barriers

Chip design involves high initial investment, a long maturation period and slow capital recovery. Massive capital is needed from design through manufacture, especially to pay high-level talent. Given the risk of a failed tape-out, as well as the costs of market promotion and ecosystem construction, China's integrated circuit industry, as a latecomer, has to commit to large-scale investment to gain a foothold.

Players

Major Foreign Manufacturers

NVIDIA

GPU-accelerated deep learning has been the foundation of many projects at innovative AI companies and academic institutions. NVIDIA's market share indicates that it will benefit from the rapid development of AI technologies and the AI industry.

As a leading-edge enterprise, NVIDIA spares no effort in pushing the development of GPUs for deep learning. The Tesla P100, based on the 16nm Pascal architecture, is so far one of the deep-learning chips with the strongest training capability on the market. GPUs are now advancing faster than Moore's Law would predict. Compared with its predecessor, Maxwell, Pascal improved performance by a factor of 10 within 2 years, and performance improved by a factor of 65 within 4 years. GPUs are expected to remain one of the mainstream hardware options for deep-learning acceleration in the future.
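To see how far these figures exceed Moore's Law, the growth rates can be annualized. The sketch below takes the 10-fold and 65-fold figures from the text as given and assumes the common reading of Moore's Law as a doubling every two years; that doubling period is an assumption of this example, and other readings (such as 18 months) would shift the baseline somewhat.

```python
MOORE_DOUBLING_YEARS = 2.0  # assumed doubling period for the Moore's Law baseline


def annualized_factor(total_factor: float, years: float) -> float:
    """Constant yearly growth factor that compounds to total_factor over the given years."""
    return total_factor ** (1.0 / years)


def moore_factor(years: float) -> float:
    """Growth expected over the given years if performance doubles every MOORE_DOUBLING_YEARS."""
    return 2.0 ** (years / MOORE_DOUBLING_YEARS)


# (claimed total improvement, years) pairs as quoted in the text.
claims = [(10.0, 2.0), (65.0, 4.0)]

for total, years in claims:
    print(
        f"{total:.0f}x in {years:.0f} years: "
        f"{annualized_factor(total, years):.2f}x per year, "
        f"versus {moore_factor(years):.0f}x in total under the assumed baseline"
    )
```

Under that assumption the baseline predicts 2x over two years and 4x over four, or about 1.4x per year, while the quoted figures correspond to roughly 3.2x and 2.8x per year.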

Intel

Intel has long dominated the chip industry. In recent years, Intel has pursued its AI strategy aggressively, building a complete portfolio spanning CPU, GPU, FPGA, ASIC, ADAS and 5G mobile communications. Together these provide powerful computing and communication capability and an integrated computing and communication solution from perception to decision-making, greatly strengthening Intel's position in intelligent hardware computing and autonomous driving.

Intel has invested heavily in AI-related mergers and acquisitions. It acquired Nervana for $350 million to improve its learning platform; Altera, an FPGA giant, for $16.7 billion; and Mobileye, a maker of intelligent-driving chips, for $15.4 billion. Intel has also acquired Yogitech, an Italian semiconductor manufacturer focusing on chips for robots and driverless cars, and Movidius, a fabless semiconductor company with AI-related products.

Google

In recent years, Google has pursued its "AI First" strategy with determination, making it one of the leaders of the global AI industry. Among the AI companies Google has acquired in the past 3 years, the best known is DeepMind, which has strengthened Alphabet's neural network capabilities and has been applied to various AI-driven projects. In open source, Google held the first TensorFlow Dev Summit in February 2017 and announced the release of TensorFlow 1.0. This version runs faster, offers greater compatibility and provides better support for high-level programming languages. In chip R&D, the first-generation TPU, developed in-house and announced by Google in 2016, has been widely adopted for Google's deep-learning inference, including the machine-learning system RankBrain, image search, Google Translate, speech recognition, Google Street View and AlphaGo. This marks the debut of the ASIC as an AI chip as well as its first large-scale application. Neural network inference does not strictly require high-precision floating-point arithmetic, so the TPU designed by Google no longer focuses on floating-point arithmetic but emphasizes integer arithmetic instead. The published paper shows that the TPU is 13 times faster than the NVIDIA K80 at model inference while consuming only 10.6% of the power. In terms of TOPS/Watt, the TPU is 30 to 80 times better than CPUs and GPUs.
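The point that inference tolerates integer arithmetic can be illustrated with a small quantization example. The sketch below uses a generic symmetric 8-bit scheme written for this article; it is not a description of how the TPU itself quantizes values, and the function names and sample numbers are hypothetical.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers using one shared scale (symmetric linear quantization)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def quantized_dot(a, b):
    """Dot product whose multiply-accumulate work is done entirely on integers."""
    qa, scale_a = quantize(a)
    qb, scale_b = quantize(b)
    integer_sum = sum(x * y for x, y in zip(qa, qb))  # integer-only arithmetic
    return integer_sum * scale_a * scale_b  # a single rescale back to a float


# Hypothetical weights and activations for one neuron.
weights = [0.5, -1.0, 0.25, 0.75]
activations = [1.0, 0.5, -2.0, 0.0]

exact = sum(w * x for w, x in zip(weights, activations))
approx = quantized_dot(weights, activations)
print("floating-point result:", exact)
print("8-bit integer result: ", approx)
```

The two results differ only by a small rounding error, while the inner loop needs nothing more than small integer multipliers and an integer accumulator, which are much cheaper in silicon area and power than floating-point units.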

Major Domestic Manufacturers

Cambricon

Established in March 2016, Cambricon originated from the Intelligent Chip Research Group of the Institute of Computing Technology, Chinese Academy of Sciences. Its intelligent chips are mainly used for hardware acceleration of deep learning. The DianNao family gained widespread recognition in academia as soon as it was proposed. Since 2012, papers published by this team have won Best Paper Awards at top international conferences on computer architecture on several occasions.

As the first AI chip company to tape out successfully and own mature products, Cambricon has two product lines: terminal AI processor IP and high-performance cloud AI chips. The Cambricon-1A processor, launched in 2016, is the first commercial processor designed for deep learning. It can be applied to smartphones, security monitoring, drones, wearable devices, intelligent driving and other terminal devices, and it outperforms all traditional processors in energy efficiency ratio (EER) when running mainstream intelligent algorithms. So far, Cambricon has signed an IP license agreement with Huawei's HiSilicon, whose recently launched Kirin 970 integrates Cambricon's NPU for AI acceleration.

In addition, Cambricon's server accelerator chips are being tested together with cloud computing providers and have gained widespread recognition in the industry. Years of accumulation have allowed Cambricon to lead in both academic research on and commercial application of intelligent processors. It holds hundreds of key patents covering all aspects of deep-learning chips, which form its own patent barrier.

Cambricon has completed its Series A funding round, with SDIC Venture Capital as the lead investor and Alibaba Venture Capital and Lenovo Capital and Incubator Group as co-investors. The investors recognize that Cambricon holds leading key technologies in the domestic intelligent chip market with a high degree of industrialization, and that its team has a strong pioneering spirit, strong technical capability and clear core competitiveness, and is highly praised by industry clients.

DeePhi Tech

DeePhi Tech was established by researchers from universities including Tsinghua University. It relies on the co-optimization of algorithms, software and hardware to achieve rapid product iteration for users. It uses FPGAs to accelerate deep-learning algorithms and applications in hardware, and further tailors algorithms to the FPGA to accelerate deep neural networks, so as to achieve better energy efficiency than GPUs. DeePhi Tech focuses mainly on drones and robots. In collaboration with ZEROTECH, it already has product applications in the commercial drone market.

Horizon Robotics

Horizon Robotics was established by Yu Kai, former director of the Baidu Deep Learning Laboratory. The company is committed to offering embedded AI solutions that are high-performance, low-power, low-cost, comprehensive and open. Its core business focuses on intelligent driving and intelligent living, and Horizon is a provider of deep-learning algorithms and solutions. It has also recently launched two dedicated AI accelerator chips.

Other Realization Paths

Spiking Neural Network Chips

Spiking neural network chips are generally called brain-inspired chips. Examples include IBM's TrueNorth, Tsinghua University's Tianjic and Zhejiang University's Darwin. These chips are inspired by the working mechanism of the brain, in which neurons transmit messages as patterns of pulses: as a neuron receives synaptic inputs, its membrane voltage increases, and when the membrane voltage reaches a threshold V, the neuron emits a pulse. Brain-inspired chips are one potential direction for future AI chips, but a great deal of research is still needed to support their further growth.
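The pulse mechanism described above can be sketched as a simple integrate-and-fire neuron. This is a textbook-style illustration rather than the model used by any of the chips named here: the reset-to-zero rule and the optional leak term are assumptions added for the example, and the input values are hypothetical.

```python
def simulate_neuron(inputs, threshold=1.0, leak=0.0, reset=0.0):
    """Return one 0/1 spike per time step for a single integrate-and-fire neuron.

    inputs    : synaptic input arriving at each time step
    threshold : membrane voltage at which the neuron emits a pulse
    leak      : fraction of the voltage lost each step (assumed; 0.0 means no leak)
    reset     : voltage after a pulse (assumed to be 0.0)
    """
    voltage = reset
    spikes = []
    for synaptic_input in inputs:
        # Integrate: decay the old voltage, then add the new input.
        voltage = voltage * (1.0 - leak) + synaptic_input
        if voltage >= threshold:
            spikes.append(1)  # threshold reached: emit a pulse
            voltage = reset   # and start accumulating again
        else:
            spikes.append(0)
    return spikes


# Hypothetical input train.
inputs = [0.5, 0.5, 0.25, 0.25, 0.25, 0.25, 1.0]
print(simulate_neuron(inputs))            # no leak
print(simulate_neuron(inputs, leak=0.5))  # half of the voltage decays each step
```

Without a leak, small inputs accumulate until they trigger a pulse; with a leak, weak inputs fade before reaching the threshold and only the strong final input produces a pulse. Information is therefore carried by when pulses occur rather than by continuous values.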

Memristor Chips

A memristor is a non-linear resistor with memory. Because this memory function gives memristors a natural similarity to biological neurons, and because they can offer storage and logic simultaneously, they are regarded as an important device for simulating neurons. Memristor chips are one potential direction for future AI chips, but they are still at an early stage and far from realizing true AI.

Reflections on Industry Development Trends

Policies

Governments around the world are showing great enthusiasm for AI at this stage, because progress in AI technologies and new industry applications has eased many difficult problems. From a strategic perspective, AI is very likely to be the direction in which information technology develops, and its progress will dramatically drive improvements in manufacturing, aerospace, military science and national defense, and security authentication. These benefits are enough to prompt governments to pay close attention and to invest substantial money and manpower in the field. Accelerator chips at the base level are the foundation for realizing and applying AI, so governments are expected to favor and protect them through policy and market access.

Trend of Technology Development

The development of dedicated AI chips focuses mainly on customization, process improvement and design refinement. The history of chips shows that when one type of demand surges, workloads formerly handled by general-purpose processors move to more efficient dedicated processors. For example, Bitcoin mining has progressed from CPUs to GPUs to FPGAs and now to ASIC-based chips. Therefore, despite the first-mover advantage of GPUs in the AI industry, chips may evolve towards customization as long as there is enough specific demand in the market. In particular, more dedicated chips for autonomous driving and image processing are likely to emerge in large numbers.

Process improvement has always underpinned gains in chips' energy consumption and performance per watt. As chip manufacturers' fabrication processes improve, so do those used for deep-learning dedicated processors, increasing computing speed and reducing energy consumption. Chip design is a technology- and experience-driven industry. Continued progress in design, technical standards and instruction sets will improve the computing power and efficiency of chips across the board.