Image Source: HPCWire
A few weeks ago, I wrote about semiconductors and how an unassuming man in his 50s turned the industry on its head, resulting in a geopolitical song and dance that put Taiwan both at the centre and on the sidelines simultaneously.
This post, while not of a similar ‘success-against-odds’ persuasion, is adjacent to that.
It might be remiss to assume that knowledge about semiconductors is common, but most of us are familiar enough with the subject to understand that ‘small is good’ when it comes to chips.
Smaller chips bring obvious size advantages to the devices in which they serve as the ‘building blocks’, and they are more energy efficient. Smaller transistors also mean that more of them can be packed into the same space.
This is why the semiconductor industry was pretty damn motivated to uphold and continue the self-fulfilling prophecy that is Moore’s Law – originally a decade-long forecast in 1965 that the number of components per integrated circuit would double every year. It was later revised to suggest that it would double every 2 years for the following decade. Interestingly, there were no empirical observations to inform Gordon Moore of the revision. He just said so.
(This was when he was the Director of Research & Development at Fairchild Semiconductor Corporation, which he co-founded after leaving Shockley Semiconductor as one of the ‘Traitorous Eight’. This is an interesting rabbit hole on its own – one that takes you to the origins of Silicon Valley, how it gave rise to over 400 companies, including Intel and AMD, and how it led to immense contributions in the space race between the US and the Soviet Union. Do give it a read here.)
Back to Moore’s Law.
You know how you set a hairy, audacious target for yourself and laughingly get around to giving it a shot? You are sincere in your endeavour but indifferent about the outcome, because you know there can only be one, and not of the good sort. A little way into it, somewhere between an honest work ethic and doing it for kicks, you succeed and pleasantly surprise yourself. Your curiosity is piqued. You want to see how much further you can push your luck, if you can call it that.
Well, that’s what an entire industry did for the last four decades to keep up with Moore’s Law. Setting audacious goals has meant good things for humanity from time to time, and this falls squarely in that category.
Here’s a highlight reel in which a few analysts speak about the impact of Moore’s Law in 2015, 50 years after Gordon Moore first introduced it.
I am going to quote Jon Peddie from the video (overlooking his slight condescension towards the end of the clip) about his views on what it’s going to be like in the coming 50 years – ‘faster, smaller, cheaper’.
Except, that was not exactly how it played out –
Large chip manufacturing enterprises, such as Intel, have delayed their rollout of smaller transistors in the past, and have allowed more time to pass between their chip generations. In other words, chip manufacturers are slowing their chip development schedules and rollouts. Industry leaders are also abandoning strategic roadmaps that are linked to Moore’s Law and future projections of more robust computer systems that are estimated to rollout with each passing year.
Source: End of Moore’s Law – What’s next for the future of computing, Brainspire
As keeping up with Moore’s Law became harder with every generation of chips, it began to have a widespread impact on investments and innovation within the industry. The virtuous cycle of the semiconductor industry (Smaller Transistors -> Improved Cost and Performance -> Greater Market Growth -> Higher Investments -> Even Smaller Transistors) started to show cracks.
A low growth rate will not affect our personal devices much. If anything, the rate at which devices become obsolete will slow down, which is not such a bad thing given planned obsolescence and indiscriminate consumption.
The biggest impact will be on how fast supercomputers evolve and the telling effect it will have on humanity’s ability to solve some of our greatest challenges.
[…] the growth rate in supercomputer performance was not only predictable but constant: about 80 percent annually. There were fits and starts from year to year, but in three-year increments, the growth rate stayed firm. In the interval from 2002 to 2013, performance growth multiplied by a staggering 1000x.
“Then, in 2013, that collapsed very rapidly and very instantaneously,” he continued. “Since 2013, we see again exponential growth, but at a rate that is now much lower” – approximately 40 percent per year.
Source: After Moore’s Law: How Will We Know How Much Faster Computers Can Go?, Data Center Knowledge
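To get a feel for what the gap between 80 percent and 40 percent annual growth means, here is a rough back-of-the-envelope sketch in Python. It assumes a constant growth rate compounded once a year and treats 2002 to 2013 as 11 whole years. Both are simplifications of mine, and the function names are made up for illustration.

```python
import math


def growth_multiple(annual_rate: float, years: float) -> float:
    """Total performance multiple after compounding annual_rate for `years`."""
    return (1.0 + annual_rate) ** years


def implied_annual_rate(multiple: float, years: float) -> float:
    """Constant annual growth rate that would yield `multiple` over `years`."""
    return multiple ** (1.0 / years) - 1.0


def years_to_multiple(annual_rate: float, multiple: float) -> float:
    """Years of compounding at annual_rate needed to reach `multiple`."""
    return math.log(multiple) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    span = 2013 - 2002  # assumption: 11 whole years
    print(f"80%/yr over {span} years: {growth_multiple(0.80, span):,.0f}x")
    print(f"40%/yr over {span} years: {growth_multiple(0.40, span):,.0f}x")
    print(f"Rate implied by 1000x in {span} years: "
          f"{implied_annual_rate(1000, span):.0%}")
    print(f"Years to reach 1000x at 40%/yr: "
          f"{years_to_multiple(0.40, 1000):.1f}")
```

Under these assumptions, 80 percent a year compounds to roughly 640x over 11 years, which is in the same ballpark as the quoted 1000x, while 40 percent a year gives only about 40x over the same span. At the slower rate, a 1000x improvement would take a little over 20 years instead of 11.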
Why are supercomputers and their computing power that important to humanity? Well, because they are necessary for complex calculations that have real-world implications, from running virtual nuclear tests to forecasting ever-changing weather patterns, reducing pollution from airplanes, and shaping the very current COVID-19 response strategies. Here is an article that talks about a few ways supercomputers have changed our lives.
So, yes – a drop in the growth of supercomputing speed is a cause for concern, and it has a direct link to the semiconductor industry’s waning ability to keep up with Moore’s Law. Intel dropped the ball with delays in delivering the specially designed chips for Aurora, the first exascale supercomputer in the world to be installed in the US. (There is an interesting sub-plot brewing between the US and China about who gets there first – read here and here.)
A possible solution to this that can dramatically improve computing speeds comes in the form of a contrarian approach taken by certain chip architects who looked elsewhere when a majority of their colleagues worked tirelessly to shrink transistors on a chip.
And this has been in the works for a little over half a century.
Chip architects had long wondered if a single, large-scale computer chip might be more efficient than a collection of smaller ones, in roughly the same way that a city—with its centralized resources and denser blocks—is more efficient than a suburb. The idea was first tried in the nineteen-sixties, when Texas Instruments made a limited run of chips that were a couple of inches across.
Texas Instruments figured out workarounds, but the tech—and the demand—wasn’t there yet.
An engineer named Gene Amdahl had another go at the problem in the nineteen-eighties, founding a company called Trilogy Systems. It became the largest startup that Silicon Valley had ever seen, receiving about a quarter of a billion dollars in investment.
Source: World’s largest computer chip, Illinois News Today
Trilogy Systems introduced redundancies into its chips to address the yield problem that TI had experienced. This improved the yield, but it slowed down the chip. Before there could be any further advancements, Amdahl ran into a different set of problems – personal, professional, and nature-induced. He possibly had the worst luck, and the venture never took off. It crashed and burned magnificently, leading to the coining of the name ‘crater’ for companies that absorb a lot of capital only to implode later, leaving nothing for investors.
The magnificence of the fall of Trilogy Systems ensured there was no sustained interest in larger chips. The pace at which personal computing devices exploded in the 1990s and the 2000s effectively decided where the demand was and, naturally, funding for innovation went in that direction.
This was the dominant narrative until neural networks and deep learning became indispensable to solving complex human problems. In 2015, Andrew Feldman began working on the design of a wafer-scale chip specifically to help ‘train’ deep learning models quicker, and founded Cerebras. This came off the back of a $334 million exit for the server manufacturing company he had cofounded. Once he had a plausible solution, he went out and raised funding. It was also around the same time that companies such as Nvidia started customising their chips for deep learning and artificial intelligence.
The world has figured out that A.I. and A.I. chips are now infrastructure. It is at the heart of enabling the next two decades of fundamental change to mankind.
Source: World’s largest computer chip, Illinois News Today
In 2019, Cerebras launched its first supercomputer, CS-1, with its first-generation wafer-scale chip, specifically meant to reduce ‘training’ time for AI models from months to minutes. It is the largest chip ever manufactured. At about 75% of the size of an A4 sheet, it is 56x the size of the largest GPU on the market. It has over 1.2 trillion transistors, while the most powerful GPUs have fewer than 60 billion. What Cerebras has managed to achieve with its wafer-scale chip is beyond mind-blowing.
Cerebras launched its second-generation wafer-scale chip in its latest supercomputer, CS-2, in April this year and is deploying it this quarter. It has 2.6 trillion transistors.
Here’s a snippet of the orders of magnitude by which it exceeds its closest competition, if you can call it that –
Why is there a need for this monster of a chip?
Short answer – the drag on innovation due to poor performance of existing computing solutions and an ever-increasing demand for high performance computing.
Deep learning has emerged as one of the most important computational workloads of our generation. Its applications are widespread and growing. But deep learning is profoundly computationally intensive. Between 2015 and 2020, the compute used to train the largest models increased by 300,000x. In other words, AI compute demand is doubling every 3.5 months.
Because of this voracious demand, AI is constrained by the availability of compute; not by applications or ideas. Testing a single new hypothesis – training a new model – can take weeks or months and can cost hundreds of thousands of dollars in compute time. This is a significant drag on the pace of innovation, and many important ideas are ignored simply because they take too long to test.
Source: Cerebras Systems: Achieving Industry Best AI Performance Through A Systems Approach, White Paper 03
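The link between ‘300,000x’ and ‘doubling every 3.5 months’ in the white paper is just logarithms, and a quick sketch makes the comparison with Moore’s Law concrete. The snippet below assumes that ‘between 2015 and 2020’ means exactly 60 months and that growth was smooth over the period. Those are my simplifications, not the white paper’s, and the function names are purely illustrative.

```python
import math


def doubling_time_months(growth_factor: float, months: float) -> float:
    """Months per doubling if `growth_factor` was reached over `months`."""
    return months / math.log2(growth_factor)


def growth_from_doubling(months: float, doubling_months: float) -> float:
    """Total growth over `months` when doubling every `doubling_months`."""
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    period = 60  # assumption: 2015 to 2020 counted as 60 months
    print(f"Doubling time for 300,000x in {period} months: "
          f"{doubling_time_months(300_000, period):.1f} months")
    print(f"Growth at a 3.5-month doubling: "
          f"{growth_from_doubling(period, 3.5):,.0f}x")
    print(f"Growth at a 24-month (Moore's Law) doubling: "
          f"{growth_from_doubling(period, 24):.1f}x")
```

Under these assumptions, 300,000x in five years works out to a doubling roughly every 3.3 months, so the two figures in the quote agree to within rounding. Doubling every two years over the same five years yields less than 6x, which is why transistor scaling alone cannot keep up with this kind of demand.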
What Cerebras is doing for supercomputing is phenomenal. It managed to achieve cluster-scale performance without the penalties of building large clusters and the limitations of distributed training.
Here are a couple of examples of how it has been put to work alongside existing supercomputers and GPU clusters, and outperformed them.
The U.S. National Energy Technology Laboratory reported that its CS-1 solved a system of equations more than two hundred times faster than its supercomputer, while using “a fraction” of the power consumption. “To our knowledge, this is the first ever system capable of faster-than real-time simulation of millions of cells in realistic fluid-dynamics models,” the researchers wrote. They concluded that, because of scaling inefficiencies, there could be no version of their supercomputer big enough to beat the CS-1.
Kim Branson, who leads GlaxoSmithKline’s A.I. team, said that the company had used a CS-1 to do many tasks, including analyzing DNA sequences and predicting the outcomes of mutations, as part of a collaboration with Jennifer Doudna, the Berkeley biochemist who shared a Nobel Prize last year for her work on CRISPR. Branson found that, in the DNA-sequencing work, the CS-1 was about eighty times faster than the sixteen-node cluster of G.P.U.s he’d been using.
Source: World’s largest computer chip, Illinois News Today
It is worth mentioning at this point that Cerebras’ approach is only one way to meet the demand. That’s the thing about Artificial Intelligence. It ushered in the opportunity for nimbleness in chip design to meet different computing requirements. It is a significant shift away from the rigidity that came with fixed computing power and the consequent limited application of off-the-shelf chips, including the ones customised for high performance computing.
More than two hundred startups are designing A.I. chips, in a market that by one estimate will approach a hundred billion dollars by 2025. Not all of the chips are meant for data centers; some will be installed in hearing aids, or doorbell cameras, or autonomous cars.
Source: World’s largest computer chip, Illinois News Today
And all this was made possible when we moved beyond the gold standard for semiconductor manufacturing.
Smaller was better when it came to chips and computing power. It is safe to say that is no longer always the case.