Smarter, not harder: Beyond brute force compute

Businesses are getting more out of their existing infrastructure and adding cutting-edge hardware to speed up processes. Soon, some will look beyond binary computing entirely.

Mike Bechtel

United States

Bill Briggs

United States

As technology has become a bigger differentiator for enterprises, businesses have built ever-more computationally complex workloads. Training artificial intelligence models, performing complex simulations, and building digital twins of real-world environments require major computing resources, and these types of advanced workloads are beginning to strain organizations’ existing infrastructure. Typical cloud services still provide more than enough functionality for most business-as-usual operations, but for the cutting-edge use cases that drive competitive advantage, organizations now require highly optimized and specialized computing environments.1

Optimizing code bases for the hardware they run on is likely the first step toward speeding up business applications. An area that’s long been overlooked, this optimization can provide significant performance gains. Beyond that, emerging hardware geared specifically for training AI and other advanced processes is becoming an enterprise mainstay. Graphics processing units (GPUs), AI chips, and, one day, quantum and neuromorphic computers are beginning to define the next era of computing.

Most advances in computing performance have focused on how to get more zeros and ones through a circuit faster. That’s still a fertile field, but, as we’re starting to see, it may not be for much longer. This is leading researchers and tech companies to look for innovative ways to navigate around—rather than through—constraints on computing performance. In the process, they could be laying the groundwork for a new paradigm in enterprise computation in which central processing units (CPUs) work hand in hand with specialized hardware, some of it based in silicon and some of it potentially not.

Now: Past performance not indicative of future returns

The last 50 or so years of computing—and economic—progress have been shaped by Moore’s Law, the idea that the number of transistors on computer chips, and therefore performance, roughly doubles every two years.2
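
Stated as a rough formula (our restatement for illustration, with N_0 the transistor count at some starting point and t measured in years), the observation projects:

```latex
N(t) \approx N_0 \cdot 2^{\,t/2}
```

On that idealized curve, a chip design would pack roughly 32 times as many transistors after a decade.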

However, chipmakers are increasingly running into physical constraints. At a certain point, there are only so many transistors a piece of silicon can hold. Some observers believe Moore’s Law is already no longer valid.3 This is contested, but at the very least, the end of the runway may be coming into view. Chips are getting both more power-hungry and harder to cool, which hampers performance,4 so even as chip manufacturers add more transistors, performance doesn’t necessarily improve.

All this comes at a bad time: Businesses are increasingly moving toward computationally intensive workloads. Industrial automation is ramping up, with many companies developing digital twins of their real-world processes. They’re also increasingly deploying connected devices and the Internet of Things, both of which create huge amounts of data and push up processing requirements. Machine learning, especially generative AI, demands complex algorithms that crunch terabytes of data during training. Each of these endeavors stands to become a major competitive differentiator for enterprises, but it’s not feasible to run them on standard on-premises infrastructure. Cloud services, meanwhile, can help bring much-needed scale but may become cost-prohibitive.5

The slowing pace of CPU performance progress won’t just impact businesses’ bottom lines. In his GTC conference keynote address, NVIDIA CEO Jensen Huang said that every business and government is trying to get to net-zero carbon emissions today, but that doing so will be difficult while demand for traditional computation keeps growing: “Without Moore’s Law, as computing surges, data center power use is skyrocketing.”6

After a certain point, growing your data center or increasing your cloud spend to get better performance stops making economic sense. Traditional cloud services are still the best option for enabling and standardizing back-office processes such as customer relationship management, enterprise resource planning (ERP), enterprise asset management, and human capital management. But running use cases that drive growth, such as AI and smart facilities, in traditional cloud resources could eventually eat entire enterprise IT budgets. New approaches, including specialized high-performance computing, are necessary.7

New: Making hardware and software work smarter, not harder

Just because advances in traditional computing performance may be slowing down doesn’t mean leaders have to pump the brakes on their plans. Emerging approaches that speed up processing could play an important role in driving the business forward.

Simple

When CPU performance increased reliably and predictably every year or two, it wasn’t the end of the world if code was written inefficiently and got a little bloated. Now, however, as performance improvements slow down, it’s more important for engineers to be efficient with their code. It may be possible for enterprises to see substantial performance improvements through leaner code, even while the hardware executing this code stays the same.8
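
This kind of gain doesn’t require exotic tooling. As a minimal, hypothetical sketch (in Python; not drawn from any organization cited here), swapping a linear scan for a hash-based lookup speeds up the same query workload on unchanged hardware:

```python
import timeit

# Same hardware, same data -- only the lookup structure changes.
customer_ids = list(range(50_000))
id_list = customer_ids          # membership test scans the list: O(n)
id_set = set(customer_ids)      # membership test hashes the key: O(1) on average

def count_known_naive(queries):
    return sum(1 for q in queries if q in id_list)

def count_known_lean(queries):
    return sum(1 for q in queries if q in id_set)

queries = list(range(45_000, 55_000))  # half known IDs, half unknown
print("list scan: ", timeit.timeit(lambda: count_known_naive(queries), number=1), "s")
print("set lookup:", timeit.timeit(lambda: count_known_lean(queries), number=1), "s")
```

On typical hardware, the second version runs orders of magnitude faster because each membership check no longer walks the entire list.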

A good time to take on this task is typically during a cloud migration. But directly migrating older code, such as COBOL on a mainframe, can result in bloated and inefficient code.9 Refactoring applications to a more contemporary language such as Java can enable enterprises to take advantage of the modern features of the cloud and help eliminate this problem.

The State of Utah’s Office of Recovery Services recently completed a cloud migration of its primary case management and accounting system. It used an automated refactoring tool to transform its code from COBOL to Java and has since seen performance improvements.

“It’s been much faster for our application,” says Bart Mason, technology lead at the Office of Recovery Services. “We were able to take the functionality that was on the mainframe, convert the code to Java, and today it’s much faster than the mainframe.”10

Situated

Using the right resources for each compute task has helped Belgian retailer Colruyt Group embark on an ambitious innovation journey that involves automating the warehouses where it stores merchandise, using computer vision to track and manage inventory levels, and developing autonomous vehicles that will one day deliver merchandise to customers.

One way to manage the compute workload is to leverage whatever resources are available. Brechtel Dero, division manager at Colruyt Group, says that, thanks to the proliferation of smart devices, the company had plenty of computing resources available.11 However, many of these resources were in operational technologies and weren’t tied to the company’s more traditional digital infrastructure. Developing that connective tissue was initially a challenge, but Dero says Colruyt benefited from a supportive CEO who pushed for innovation. On the technical side, the company operates a flexible ERP environment that allows for integration of data from a variety of sources. This served as the backbone for the integration between information technology and operational technology.

“It’s about closing the gap between IT and OT, because machines are getting much smarter,” Dero says. “If you can have a seamless integration between your IT environment, ERP environment, and machines, and do it so that the loads and compute happen in the right place with the right interactions, we can make the extra step in improving our efficiency.”12

Specialized

Smarter coding and better use of existing compute resources could help enterprises speed up many of their processes, but for a certain class of problems, businesses are increasingly turning to specialized hardware. GPUs have become the go-to resource for training AI models, a technology that is set to drive huge advances in operational efficiency and enterprise innovation.

As the name suggests, GPUs were originally engineered to make graphics run more smoothly. But along the way, developers realized that GPUs’ parallel data-processing properties could streamline AI model training, which involves feeding terabytes of data through algorithms and is one of the most computationally intensive workloads organizations face today. GPUs break problems down into small parts and process them simultaneously; CPUs process data sequentially. When you’re training an AI algorithm on millions of data points, parallel processing is essential.13 Since generative AI has gone mainstream, the ability to train and run models quickly has become a business imperative.
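
As a rough illustration (a minimal sketch assuming the open-source PyTorch library, which the chapter doesn’t prescribe), the core operation in model training, a large matrix multiplication, is dispatched to a GPU when one is available:

```python
import torch

# Pick the GPU if one is visible to the runtime; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A large matrix multiplication stands in for the math at the heart of training.
x = torch.randn(4096, 4096, device=device)
w = torch.randn(4096, 4096, device=device)

y = x @ w  # on a GPU, thousands of cores compute the partial products in parallel
print(f"Computed a {x.shape[0]}x{x.shape[1]} matrix product on {device}")
```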

Large tech and social media companies, as well as leading research, telecom, and marketing companies, are deploying their own GPUs on premises.14 For more typical enterprises, however, using GPUs in the cloud is likely to be the most common approach. Research shows that cloud GPUs can cut AI model training costs by a factor of six and training time by a factor of five compared with training models on traditional cloud CPUs (figure 1).15 Most leading chip manufacturers, including AMD, Intel, and NVIDIA, offer GPU products and services today.
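
To make those ratios concrete, a back-of-the-envelope sketch (the baseline figures below are illustrative placeholders, not benchmark results):

```python
# Hypothetical CPU-based cloud training baseline (placeholder numbers).
cpu_hours = 100          # wall-clock training time on cloud CPUs
cpu_cost_usd = 12_000    # total compute bill for that run

# Applying the roughly 5x speed and 6x cost advantages cited above:
gpu_hours = cpu_hours / 5        # ~20 hours
gpu_cost_usd = cpu_cost_usd / 6  # ~$2,000

print(f"GPU estimate: {gpu_hours:.0f} hours, ${gpu_cost_usd:,.0f}")
```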

However, GPUs aren’t the only specialized hardware for AI workloads. Amazon offers a chip called Inferentia, which it says is designed to run deep learning and generative AI models, including large language models. These chips are built to handle large volumes of data while using less power than traditional processing units.16

Google also is in the AI chip game. It offers a product it calls Tensor Processing Units, or TPUs, which it makes available through the Google Cloud service. These processors fall under the category of application-specific integrated circuits, optimized to handle matrix operations, which underlie most machine learning models.17
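
As an illustrative sketch (assuming Google’s open-source JAX library, one common way to target TPUs on Google Cloud; the article itself doesn’t name a toolkit), the same NumPy-style code runs on whichever backend the runtime exposes:

```python
import jax
import jax.numpy as jnp

# Reports "tpu" on a Cloud TPU VM, otherwise "gpu" or "cpu".
print("Backend:", jax.default_backend())

key = jax.random.PRNGKey(0)
a = jax.random.normal(key, (2048, 2048))
b = jax.random.normal(key, (2048, 2048))

# Matrix multiplication is the operation TPU matrix units are built to accelerate.
c = jnp.dot(a, b)
print("Output shape:", c.shape)
```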

Specialized AI chips are likely to continue to gain prominence in enterprise settings in the coming months as businesses realize the value of generative AI. Increased adoption of AI may strain most organizations’ existing data center infrastructure, and the higher performance of custom chips compared with general-purpose resources could become a major competitive differentiator.

This doesn’t mean enterprises will reap these benefits overnight. Historically, there’s always been a lag between the wide availability of specialized hardware and the development of the standards and ecosystems necessary for using that hardware to its fullest. It could be years before enterprises move at pace to adopt these innovations. They can, however, develop ecosystem partnerships now to prepare for emerging technologies and have the skills ready to take advantage of these innovations as soon as the business case is ripe.


Endnotes

  1. Shankar Chandrasekaran and Tanuj Agarwal, The secret to rapid and insightful AI-GPU-accelerated computing, Deloitte, 2022.

  2. Britannica, “Moore’s law: Computer science,” accessed October 31, 2023.

  3. David Rotman, “We’re not prepared for the end of Moore’s Law,” MIT Technology Review, February 24, 2020.

  4. A16Z podcast, “AI hardware, explained,” podcast, July 27, 2023.

  5. Ranjit Bawa, Brian Campbell, Mike Kavis, and Nicholas Merizzi, Cloud goes vertical, Deloitte Insights, December 7, 2021.

  6. Jensen Huang, “NVIDIA GTC 2024 keynote,” speech, NVIDIA, accessed October 31, 2023.

  7. Christine Ahn, Brandon Cox, Goutham Balliappa, and Tanuj Agarwal, The economics of high-performance computing, Deloitte, 2023.

  8. A16Z podcast, “AI hardware, explained.”

  9. Stephanie Glen, “COBOL programming skills gap thwarts modernization to Java,” TechTarget, August 10, 2022.

  10. Interview with Bart Mason, technology lead, Utah Office of Recovery Services, July 28, 2023.

  11. Interview with Brechtel Dero, division manager, Colruyt Group, August 18, 2023.

  12. Ibid.

  13. Ahn et al., The economics of high-performance computing.

  14. NVIDIA, “NVIDIA Hopper GPUs expand reach as demand for AI grows,” press release, March 21, 2023.

  15. Ahn et al., The economics of high-performance computing.

  16. Amazon Web Services, “AWS Inferentia,” accessed October 31, 2023.

  17. Google Cloud, “Introduction to Cloud TPU,” accessed October 31, 2023.

  18. Cem Dilmegani, “Quantum annealing in 2023: Practical quantum computing,” AIMultiple, December 22, 2022.

  19. Deloitte, “Quantum annealing unleashed: Optimize your business operations,” video webinar, August 3, 2023.

  20. Interview with Katie Pizzolato, director of theory and quantum computational science, IBM Quantum, October 16, 2023.

  21. Victoria Corless and Jan Rieck, “What are neuromorphic computers?,” Advanced Science News, March 13, 2023.

  22. Filipp Akopyan et al., “TrueNorth: Design and tool flow of a 65 mW 1 million neuron programmable neurosynaptic chip,” IBM, October 1, 2023.

  23. Intel Labs, “Neuromorphic computing and engineering, next wave of AI capabilities,” accessed October 31, 2023.

  24. Bert Jan Offrein, “Silicon photonics,” IBM, accessed October 31, 2023; Microsoft, “AIM (Analog Iterative Machine),” accessed October 31, 2023.

Acknowledgments

The authors would like to thank the following members of the office of the chief technology officer without whom this report would not have been possible: Caroline Brown, Ed Burns, Abhijith Ravinutala, Adrian Espinoza, Heidi Morrow, Natalie Haas, Stefanie Heng, Kelly Raskovich, Nathan Bergin, Raquel Buscaino, Lucas Erb, Angela Huang, Sarah Mortier, and Nkechi Nwokorie.

Additionally, the authors would like to acknowledge and thank the extended team and collaborators: Deanna Gorecki, Ben Hebbe, Lauren Moore, Madelyn Scott, and Mikaeli Robinson.

The authors also wish to thank the many subject matter leaders across Deloitte who contributed to the research, the Deloitte Insights team, the Marketing Excellence team, and the Knowledge Services team.

Cover image by: David McLeod