I. 3D packaging will become the main process challenge
From: Wisdom things
Recently, Yang Rui, research director at the Industrial Technology Research Institute (ITRI) in Taiwan, China, predicted that TSMC will dominate the chip manufacturing industry for another five years, after which 3D packaging will become the main process challenge.
Over the past decade, computing workloads of all kinds have developed rapidly, while Moore's Law has repeatedly been reported to be approaching its end. Faced with more diverse computing requirements, and in order to cram more functions into the same chip, advanced packaging has become a key innovation path for continuing to optimize chip performance and cost.
TSMC, Intel and Samsung are all accelerating the deployment of 3D packaging technology. In August of this year, all three chip manufacturing giants unveiled new moves, making this battlefield ever more heated.
▲Intel Packaging Technology Roadmap
The advanced packaging plans of the three chip manufacturing giants suggest that, in the coming year, 3D packaging technology will be an important weapon for moving beyond Moore's Law.
First, advanced packaging: cram more functions into one chip.
Previously, chips mostly used 2D planar packaging. With growing demand from heterogeneous computing applications, 3D packaging, which can integrate chips of different sizes, processes and materials, has become the necessary choice for higher performance and flexibility.
Looking at the latest progress: Intel's Lakefield adopts the Foveros 3D packaging technology, TSMC's 3D packaging technology SoIC is to enter mass production in 2021 as originally planned, and Samsung's 3D packaging technology has been applied to 7nm EUV chips.
Why move toward advanced packaging? There are two main reasons: first, processor performance is mostly limited by memory bandwidth; second, the yield and cost of advanced manufacturing processes.
On the one hand, memory bandwidth has grown far more slowly than processor logic performance, giving rise to the "memory wall" problem.
With traditional PCB-level packaging, wiring density and signal rates are hard to improve, so memory bandwidth grows slowly. Advanced packaging offers short wiring distances, more headroom for raising signal rates, and much higher interconnection density, which makes it one of the main ways to tackle the memory wall.
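One common way to picture the memory wall is a roofline-style estimate. This model is not part of the source article, and the peak throughput, bandwidth and arithmetic-intensity numbers below are illustrative assumptions rather than measurements of any product. The sketch only shows why raising memory bandwidth lifts attainable performance when a workload is bandwidth-bound.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, flops_per_byte: float) -> float:
    """Roofline-style bound: the lower of the compute limit and the bandwidth limit."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


# Hypothetical processor and workload (assumed values, for illustration only).
PEAK_GFLOPS = 10_000.0   # assumed peak compute throughput
FLOPS_PER_BYTE = 2.0     # assumed arithmetic intensity of the workload

for label, bandwidth in [("board-level memory", 100.0), ("in-package memory", 1_000.0)]:
    perf = attainable_gflops(PEAK_GFLOPS, bandwidth, FLOPS_PER_BYTE)
    print(f"{label}: {bandwidth:.0f} GB/s -> {perf:.0f} GFLOPS attainable")
```

Under these assumed numbers the processor stays bandwidth-bound in both cases, so a tenfold bandwidth increase gives a tenfold increase in attainable throughput.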
On the other hand, high-performance processor architectures are becoming more complex and transistor counts keep rising, yet advanced semiconductor processes remain expensive and their yield is not satisfactory.
In semiconductor manufacturing, the smaller the chip area, the higher the yield. To reduce the cost of using advanced processes and improve yield, an effective method is to split a large chip into several small chips and then connect them with advanced packaging technology.
In this context, the three chip giants, TSMC, Intel and Samsung, are actively exploring 3D packaging and other advanced packaging technologies.
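The yield argument above can be made concrete with a simple Poisson defect model, in which yield is exp(-area × defect density). The article does not name a yield model, so this choice, the defect density and the die sizes are all assumptions for illustration. The sketch also ignores packaging cost and assembly yield.

```python
import math


def poisson_yield(area_mm2: float, defects_per_mm2: float) -> float:
    """Fraction of dies with zero defects under a Poisson defect model (an assumption)."""
    return math.exp(-area_mm2 * defects_per_mm2)


def silicon_per_good_product(die_area_mm2: float, dies_per_product: int,
                             defects_per_mm2: float) -> float:
    """Wafer area consumed per good product, assuming bad dies are discarded before assembly."""
    return dies_per_product * die_area_mm2 / poisson_yield(die_area_mm2, defects_per_mm2)


D0 = 0.001  # assumed defect density: 0.1 defects per cm^2

monolithic = silicon_per_good_product(600.0, 1, D0)  # one hypothetical 600 mm^2 die
chiplets = silicon_per_good_product(150.0, 4, D0)    # four hypothetical 150 mm^2 chiplets

print(f"monolithic yield: {poisson_yield(600.0, D0):.1%}, silicon per good product: {monolithic:.0f} mm^2")
print(f"chiplet yield:    {poisson_yield(150.0, D0):.1%}, silicon per good product: {chiplets:.0f} mm^2")
```

With these assumed values, the four small dies consume noticeably less wafer area per good product than the single large die, which is the effect the paragraph describes.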
Second, TSMC's 3D packaging combination punch
At the end of August this year, TSMC launched the 3DFabric integration technology platform, aiming to speed up innovation in system-level solutions and shorten time to market.
TSMC 3DFabric can integrate various logic, memory or specialty chips with an SoC, providing smaller chips for high-performance computing, smartphones, IoT edge devices and other applications, and can improve bandwidth, latency and power efficiency by integrating high-density interconnected chips into package modules.
3DFabric consists of TSMC's front-end and back-end packaging technologies. The front-end technology is TSMC SoIC, first announced in 2018, which supports CoW (Chip on Wafer) and WoW (Wafer on Wafer) bonding.
▲ A is the SoC before partitioning; B, C and D are the chiplet partitioning and re-integration schemes supported by the TSMC SoIC service platform.
Using through-silicon via (TSV) technology, TSMC's SoIC achieves a bump-free bonding structure, so that small chips of different sizes, processes and materials can be re-integrated into an SoC-like integrated chip, with a smaller final area and better system performance than the original SoC.
TSMC's back-end technologies include the CoWoS (Chip on Wafer on Substrate) and InFO (Integrated Fan-Out) families, which are already widely used. For example, the Fujitsu A64FX processor in the Japanese supercomputer "Fugaku", which ranks first on this year's global TOP500 list, uses TSMC CoWoS packaging, and Apple's mobile phone chips use TSMC InFO packaging.
In addition, TSMC has several specialized back-end fabs, which are responsible for assembling and testing silicon chips, including 3D stacked chips, and processing them into packaged devices.
One of the great benefits of this is that customers can adopt more mature and lower-cost semiconductor technologies in modules that do not change frequently, such as analog IO, RF, etc., and adopt the most advanced semiconductor technologies in core logic design, which not only saves costs, but also shortens the time to market of new products.
TSMC 3DFabric integrates advanced logic and high-speed memory devices into the package module. For a given bandwidth, the wider interface of HBM lets it run at a lower clock speed, thus reducing power consumption.
At data center scale, the savings from these logic and HBM devices are considerable.
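The trade-off between interface width and clock speed mentioned above follows from simple arithmetic: bandwidth is bus width times per-pin data rate. The sketch below uses the 1024-bit interface of an HBM2 stack at 2 Gb/s per pin. The 64-bit comparison interface is a hypothetical chosen only to show how much faster a narrow bus must signal to match it.

```python
def bandwidth_gbs(bus_width_bits: int, gbit_per_pin: float) -> float:
    """Peak bandwidth in GB/s for a given bus width and per-pin data rate."""
    return bus_width_bits * gbit_per_pin / 8


def pin_rate_needed(target_gbs: float, bus_width_bits: int) -> float:
    """Per-pin data rate in Gb/s needed to reach a target bandwidth on a given bus width."""
    return target_gbs * 8 / bus_width_bits


hbm2_stack = bandwidth_gbs(1024, 2.0)          # one HBM2 stack at 2 Gb/s per pin
narrow_rate = pin_rate_needed(hbm2_stack, 64)  # hypothetical 64-bit interface

print(f"1024-bit interface at 2.0 Gb/s per pin: {hbm2_stack:.0f} GB/s")
print(f"64-bit interface needs {narrow_rate:.0f} Gb/s per pin for the same bandwidth")
```

The wide interface reaches the same bandwidth at one sixteenth of the per-pin signaling rate, which is where the power saving described above comes from.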
Third, Intel uses the "decomposition design" strategy to build a differentiated advantage.
Like TSMC, Intel has already laid out advanced packaging technologies along multiple dimensions.
At Intel Architecture Day on August 13th, 2020, Intel announced a brand-new hybrid bonding technology; test chips using it taped out in the second quarter of 2020.
Compared with the thermocompression bonding used by most packaging technologies, hybrid bonding can reduce the bump pitch to less than 10 microns, providing higher interconnection density, higher bandwidth and lower power.
▲Intel hybrid bonding technology
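The link between bump pitch and interconnection density can be sketched with a little geometry: on a square grid, the number of connections per unit area scales with the inverse square of the pitch. The square-grid layout and the 50-micron comparison pitch are assumptions for illustration. Only the "less than 10 microns" figure comes from the text above.

```python
def bumps_per_mm2(pitch_um: float) -> float:
    """Connections per square millimetre for a square grid with the given pitch in microns."""
    per_mm = 1000.0 / pitch_um
    return per_mm ** 2


coarse = bumps_per_mm2(50.0)  # assumed comparison pitch, not from the article
fine = bumps_per_mm2(10.0)    # upper end of the "less than 10 microns" range

print(f"50 um pitch: {coarse:.0f} connections per mm^2")
print(f"10 um pitch: {fine:.0f} connections per mm^2 ({fine / coarse:.0f}x denser)")
```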
Intel had previously introduced standard packaging, the 2.5D Embedded Multi-Die Interconnect Bridge (EMIB), the 3D Foveros technology, Co-EMIB, which combines EMIB with Foveros, Omni-Directional Interconnect (ODI), and Multi-Die I/O (MDIO). These package interconnection technologies bring greater scalability and flexibility when combined with one another.
According to Song Jiqiang, president of Intel Research Institute, "The development of packaging technology is like building houses. At first we built single-story cottages, then courtyard houses, and finally high-rise buildings. What Foveros 3D achieves is that, when building high-rises, the wiring can carry signals at low power and high speed."
He believes that Intel's advantage in packaging is that it knows earlier how this house will be built in the future; in other words, it can better design future chips.
Facing the trend toward heterogeneous computing, Intel introduced its "decomposition design" strategy, which combines new design methods with advanced packaging to split key architectural components into individual chips that still share a single package.
That is, the original monolithic SoC is "broken into parts": first into a few major blocks such as CPU, GPU and I/O, and then into finer-grained pieces, so that the earlier approach of combining by function gives way to combining by chip IP.
The advantage of this approach is that it improves chip design efficiency, shortens production time, and effectively reduces the number of bugs that complex designs used to cause.
"A design that previously had to be made on one chip can now be converted into multiple chips. In addition, you can use not only Intel's multi-node process technology but also partners' technology," Song Jiqiang explained.
Once integrated, these decomposed chiplets offer high speed, sufficient bandwidth, low power consumption and great flexibility, which will become a major differentiating advantage for Intel.
Fourth, Samsung's first 3D packaging technology can be used in 7nm process.
Besides TSMC and Intel, Samsung is also accelerating the deployment of its 3D packaging technology.
On August 13th, Samsung announced its own 3D packaging technology, "eXtended-Cube", or "X-Cube" for short. It uses TSV interconnects and can be used at 7nm or even 5nm.
According to Samsung, its current X-Cube test chip stacks an SRAM layer on top of the logic layer, separating the SRAM from the logic and freeing up space to stack more memory.
▲X-Cube test chip architecture of Samsung
In addition, TSV technology can greatly shorten the signal distance between bare chips, improve data transmission speed and reduce power consumption.
Samsung said that the 3D packaging technology has achieved a significant leap in speed and efficiency, and will help meet the strict performance requirements of cutting-edge applications such as 5G, AI, AR, VR, HPC, mobile and wearable devices.
Conclusion: Three chip giants storm advanced packaging.
It can be seen that in 2020, the war around 3D packaging technology will continue to escalate, and TSMC, Intel and Samsung, the three advanced chip manufacturers, will step up their efforts to explore a broader space for chip innovation.
Although the core details of these technologies differ, they all head toward the same goal: to keep raising chip density, enable more complex and flexible systems on a chip, and meet customers' increasingly diverse application requirements.
As process scaling approaches its limits and application requirements keep diversifying, chip manufacturers will in future not only have to solve technical challenges such as heat dissipation, but also promote the integration of advanced packaging technologies from different manufacturers.
II. Chip giants in a decisive battle over advanced packaging
From: semiconductor industry observation
André Beaufre, the modern French strategist famous for his book An Introduction to Strategy, once said, "The essence of strategy is prevention rather than cure, and the future and preparation are more important than the present and implementation." The same is true of the semiconductor industry. As the process scaling curve predicted by Moore's Law begins to flatten, packaging chips made on different processes together through multi-chip packaging, and launching products that meet market demand in the shortest time, has become a prominent technology whose importance keeps rising.
These advanced chip packages have also become essential weapons for supercomputers and artificial intelligence. Just look at the dedicated high-performance computing GPUs from nVidia and AMD, Google's second-generation TPU, and countless "artificial intelligence chips": HBM memory is everywhere.
After all, no semiconductor process in the world is good at everything. Moore's Second Law, which observes that the cost of an advanced process fab doubles every four years, also highlights the cruel reality that the unit cost of transistors keeps rising. AMD's processors have gone chiplet since the 7 nm generation, having to divide and conquer: CPU cores on a 7 nm process and the I/O and memory controller on a 12 nm process.
Development of advanced packaging technology in the ascendant
Therefore, both TSMC and Intel are doubling down, with related products springing up one after another, and AMD has written an "X3D package integrating 2.5D and 3D" into its future product plans (although it probably just follows TSMC's existing technology) to achieve memory bandwidth density ten times that of current products.
First, a review of what "2.5D" packaging is. TSMC's CoWoS (Chip-on-Wafer-on-Substrate), with more than 60 actual adoption cases, is the best-known technology in this field; its users include the Fujitsu A64FX, which recently topped the Top500 supercomputer list. Kaby Lake-G, which uses Intel's own EMIB (Embedded Multi-Die Interconnect Bridge) to package a Kaby Lake processor together with an AMD Vega graphics core, was once a hot topic.
Unlike "2D" SiP (System-in-Package), 2.5D packaging inserts a silicon interposer between the SiP substrate and the chips, with through-silicon vias (TSVs) connecting the upper and lower metal layers. This overcomes the difficulty of high-density wiring on a SiP substrate (which resembles a multi-layer printed circuit board), a difficulty that limits the number of chips.
3D packaging, which stacks dies like building blocks, is not hard to understand. TSMC beat Samsung in the battle for iPhone 7 A10 processor orders by relying on InFO (Integrated Fan-Out), which can reduce package thickness by 30%. That ended the awkward situation in which consumers buying an iPhone 6S had to worry about getting the Samsung-made A9 (unfortunately, the author was one of the victims). However, heat dissipation and thermal management in 3D packaging remain formidable challenges for the semiconductor industry.
Intel's corresponding 3D packaging technology is Foveros. The recently announced "hybrid x86 architecture processor", Lakefield, stacks a 10 nm (code P1274) compute die with "one big and four small" cores, a 22 nm (code P1222) system I/O die and PoP (Package-on-Package) memory, with standby power of only 2 mW.
Co-EMIB, announced by Intel in July 2019, connects multiple 3D Foveros packages with 2.5D EMIB, "integrating them into a single chip with more functions". ODI (Omni-Directional Interconnect), an extension of the EMIB concept, bridges the gap between EMIB and Foveros and gives more flexibility for connecting the many bare dies in a package. A bus that connects the bare dies in a package is also an indispensable technology.
In 2017, Intel officially named the "silicon bridge" interface that EMIB uses to connect bare dies the Advanced Interface Bus (AIB) and licensed it publicly, royalty-free. In 2018, it donated AIB to the Defense Advanced Research Projects Agency (DARPA) as a royalty-free die-to-die interconnect standard; MDIO (Multi-Die I/O) follows AIB. TSMC's corresponding technology is LIPINCON (Low-voltage-In-Package-INterCONnect), whose specifications differ from Intel's.
System-on-a-chip for supercomputers is not exclusive to IBM and Fujitsu.
Readers who have long followed ARM-compatible processors and supercomputers will be no strangers to "Fugaku" at Japan's RIKEN, built with the Fujitsu A64FX processor mentioned earlier. Made on TSMC's 7-nanometer process, with four 8GB HBM2 memories in a CoWoS 2.5D package, it is the most representative "dedicated supercomputer SoC" at present, and recalls IBM's BlueGene/L of more than ten years ago.
NEC, whose "Earth Simulator" dominated the rankings for more than two years at the beginning of the 21st century, has the SX-Aurora TSUBASA as the newest member of its SX vector processor family; it too is a supercomputer heart, built on TSMC's 16 nm process with six 8GB HBM2 memories in a 2.5D package. Intel's Xeon Phi series is another well-known representative, with eight 2GB MCDRAM (Multi-Channel DRAM) in a 2.5D package that can be configured as cache, as main memory, or as a mix of both.
Although the Xeon Phi family was discontinued two years ago, ending the "many-core x86" route that began with Larrabee, and Intel decided to start over and build a "traditional GPU" step by step as the foundation for future high-performance computing and artificial intelligence, the importance of heterogeneous multi-chip packaging is still rising. At least Raja Koduri, whom Intel poached from AMD to lead GPU development, has said so himself.
AMD is not absent either, and seems to have the momentum to come from behind. Nor is this a whim: long-term research began before 2010, and after more than ten years it is "very likely" to bear fruit under the name EHP (Exascale Heterogeneous Processor). X3D, which integrates 2.5D and 3D packaging, is the key to realizing EHP.
Exa means 1,000 times Peta, and exascale is the next competitive target for supercomputers in the coming years. For example, the El Capitan supercomputer of the U.S. National Nuclear Security Administration, slated to use AMD Zen 2-generation EPYC processors, has a theoretical peak performance of more than 2 ExaFLOPS.
Since AMD acquired ATI in 2006, the APU, which integrates processor and graphics cores, has struggled to find suitable product specifications and market positioning: either the CPU was not good enough, or the GPU not strong enough, or both fell somewhere in between. Only in the Zen 2 generation was it completely reborn.
Over the years, AMD has been gradually marginalized in the supercomputer market. In this June's Top500 list, only 10 systems used AMD CPUs and one used AMD GPUs, so powerful new weapons are needed to "break through the blockade of Intel and nVidia". EHP, as the "supercomputer APU", has quietly become AMD's new direction.
Starting with ATI of Canada applying in 2010 for a patent on dummy TSVs to improve process uniformity and heat dissipation, AMD has accumulated results such as "Cache Data Consistency of Memory Operations" (2016), "Thermal Management of 3D Die Stack" (2017), "GPU Architecture with Extreme Bandwidth and Scalable Energy Consumption" (2017), "Array of Memory Operations" (2018), "Loop Out Prediction to Improve the Efficiency of Idle Mode" (2018) and "Dynamic Memory Management of Hybrid CPU and GPU" (2018). These confirm that the "server-specific APU" AMD disclosed at its 2015 financial analyst conference, and the "Achieving Exascale Capabilities through Heterogeneous Computing" article published in IEEE Micro in July of that year, were not empty talk. What is more, AMD's current chief, one of the world's highest-paid female CEOs, is known for her pragmatism. According to published information, the general specifications of EHP are as follows, though they are bound to change as technology evolves:
32 CPU cores (eight 4-core CCDs at that time).
Eight GPU dies with 32 CUs each, totaling 256 CUs and 16,384 streaming processors (Vega, then planned as the fifth generation of GCN, which now appears set to move on to CDNA).
Eight 4GB HBM2 memory stacks.
At a 1 GHz clock, theoretical double-precision floating-point performance is 16 TeraFLOPS; a supercomputer with 100,000 such nodes would reach 1.6 ExaFLOPS at an estimated power consumption of 20 MW.
EHP also has external memory outside the chip package, such as NVRAM (Non-Volatile RAM, for example Intel/Micron's 3D XPoint and the still-developing STT-MRAM) and PIM (Processing-In-Memory, with operation circuits built into the memory). The associated dynamic memory management and cache coherence are technical thresholds AMD must overcome. As for the completeness of the software environment, it will be the core factor in whether AMD can catch up with nVidia.
In July 2015, AMD published a feature article in IEEE Micro indicating that 32 CPU cores, 320 GPU CUs at a 1 GHz clock (20,480 streaming processors), 3 TB/s of memory bandwidth and 160 W of power consumption form the configuration with the best energy efficiency. In short, the actual products will certainly differ.
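The EHP figures quoted above can be cross-checked with a few lines of arithmetic. The 64 streaming processors per CU is implied by the article's own numbers (256 CUs for 16,384 streaming processors). Counting a fused multiply-add as two floating-point operations per clock, and running double precision at half the single-precision rate, are assumptions made here because they reproduce the quoted 16 TeraFLOPS. They are not stated in the source.

```python
SP_PER_CU = 64            # implied by the article: 256 CUs -> 16,384 streaming processors
FLOPS_PER_SP_CLOCK = 2    # assumption: one fused multiply-add per clock counts as 2 FLOPs
FP64_RATE = 0.5           # assumption: double precision at half the single-precision rate


def fp64_tflops(compute_units: int, clock_ghz: float) -> float:
    """Theoretical double-precision TFLOPS under the assumptions above."""
    streaming_processors = compute_units * SP_PER_CU
    return streaming_processors * FLOPS_PER_SP_CLOCK * FP64_RATE * clock_ghz / 1000.0


node_tflops = fp64_tflops(256, 1.0)                # the 256-CU specification
system_eflops = node_tflops * 100_000 / 1_000_000  # 100,000 nodes, TFLOPS -> EFLOPS
watts_per_node = 20e6 / 100_000                    # 20 MW spread over 100,000 nodes

micro_tflops = fp64_tflops(320, 1.0)               # the 320-CU IEEE Micro configuration
gflops_per_watt = micro_tflops * 1000.0 / 160.0    # at the quoted 160 W

print(f"256 CUs: {node_tflops:.2f} TFLOPS per node, {system_eflops:.2f} EFLOPS for 100,000 nodes")
print(f"20 MW over 100,000 nodes: {watts_per_node:.0f} W per node")
print(f"320 CUs: {micro_tflops:.2f} TFLOPS, {gflops_per_watt:.0f} GFLOPS per watt at 160 W")
```

Under these assumptions the rounded figures in the text (16 TeraFLOPS per node and 1.6 ExaFLOPS for the system) are mutually consistent.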
Vivid rumors that the technical assets of EHP and X3D will be "extended" to the Zen 3 generation EPYC processor "Milan" (such as 10 CCDs with 80 cores, or HBM2 as an L4 cache) have never stopped.
Bonus feature: nVidia is not sitting idle.
nVidia, whose market value recently surpassed Intel's in one stroke thanks to its "bright future", holds an almost unbreakable dominant position in high-performance computing, artificial intelligence and self-driving. Beyond on-paper hardware specifications, the real foundations supporting nVidia's share price are the CUDA application ecosystem, developed over more than ten years, and GPU virtualization that far surpasses Intel's and AMD's (which makes deploying cloud PCs on AMD GPUs clearly less beneficial than on nVidia's, and the same goes for cloud providers' virtual GPUs; compare the number of clients each can host and the size of the gap is obvious), along with other "less humane" factors.
Returning to multi-chip packaging: even though its high-end GPUs mainly target "training", nVidia's research chips for "inference" are all moving toward "multi-chip package scalability".
But have you considered a more interesting possibility: since nVidia's high-end GPUs are already so large, why not "incidentally" package in a high-performance ARM (or RISC-V) compatible processor, so that the GPU is no longer an "accessory" to Intel and AMD processors but a "self-booting supercomputer SoC"?
In fact, nVidia GPUs already contain several microcontrollers called Falcon (Fast Logic Controller), which assist GPU computing in tasks ranging from video and graphics decoding to security mechanisms, and offload driver work from the CPU, such as scheduling that previously could not be done because Deferred Procedure Calls (DPCs) in the Windows operating system timed out.
In 2016, nVidia developed the first-generation Falcon microcontroller using Rocket, the open-source RISC-V compatible processor from UC Berkeley. In 2017, the second generation was expanded to 64-bit, with new custom instructions added. The RC18 inference chip, made up of 27 chips in one package, is also built around RISC-V cores; it can perform 128 trillion inference operations per second at a power consumption of only 13.5 W.
So what will happen if nVidia moves "more work" onto the RISC-V cores inside the GPU, especially the "lower layers" of the driver that involve large amounts of confidential low-level GPU information, or hides them behind GPU virtualization? This touches on another little-known potential requirement: an official open-source driver.
Implication: the impact of open-source GPU drivers
Issues that stay off the table, or that few people write about, are often far more important than bystanders imagine.
Whether for supercomputers or artificial intelligence (especially autonomous driving), customers of chip manufacturers want, for security reasons, to inspect more or less all of the code, drivers included; this is the main reason open-source GPU drivers matter so much. Yet the driver is a black box hiding a large number of trade secrets. How to meet customers' needs, keep secrets from leaking, and still generously release an "official open-source driver" is a long-term opportunity and challenge already facing nVidia, AMD, and even Intel, which is about to "return to the GPU front".
Technology follows the demands of applications, and this may also be the key to whether AMD's attempt to fight its way back into the high-performance computing market with a "supercomputer APU" succeeds.
Disclaimer: This article is reproduced from "Filter". This article only represents the author's personal views, not those of Sacco Micro and the industry. It is only for reprinting and sharing, and supports the protection of intellectual property rights. Please indicate the original source and author when reprinting. If there is any infringement, please contact us to delete it.