The predictive insights organizations can gain from the oceans of data they generate are currently limited by the availability of experts who can crunch this data. Automation may change that.
As data proliferates in organizations, there is an increasing need to understand its implications through the generation of insights. Insight generation through business intelligence and analytics has been available for almost half a century, but it typically required the help of trained analysts. The insights available to decision-makers within an organization were constrained by the number of analysts, and without easy access to analytics, those decision-makers were often forced to rely on experience and intuition. To make matters worse, difficult-to-use technologies made it challenging for most business people to find and analyze the data they needed to generate insights.
Over the past several decades, multiple technologies have been used to democratize the creation of insights, including interactive statistical packages, spreadsheets, easy-to-use visual analytics tools, and the like. But we don’t think they are enough for today’s complex technology and data environment.
The rapid increase in the amount of data and the power of sophisticated algorithms to analyze it mean that new interventions are required to deliver new levels of insight. Previous democratization technologies were mostly capable of generating descriptive analytics insights about the past. Companies increasingly want to generate predictive models that provide insights about what might happen to their businesses in the future and prescriptive analytics that guide employees and customers to take actions that drive business results. Achieving these goals requires a level of statistical and data science sophistication that is still relatively rare within organizations, and that limits the number of useful insights that a company can produce.
Or at least it used to be a limiting factor. Predictive analytics, which is essentially the same as the more straightforward forms of statistical machine learning, can now be performed largely on an automated basis. Many of the key tasks required for machine learning (including data preparation, “feature engineering” or variable transformation, trying out different algorithm types, creation of program code or APIs for model deployment, and even creation of explanations of what factors are particularly important in a model) can increasingly be done by machines. Automated machine learning software is now available from AI-oriented firms such as Google, established analytics firms such as SAS, and startups such as DataRobot and H2O.ai.
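To make two of those automated tasks concrete, here is a minimal sketch of trying out different algorithm types and then explaining which factors matter. It is an illustration only, not how DataRobot, Google, SAS, or H2O.ai implement their products. The synthetic data, the three toy candidate models, and every function name (make_data, auto_select, permutation_importance, and so on) are assumptions made up for this example, and it uses only the Python standard library.

```python
import random

def make_data(n=200, seed=0):
    """Synthetic binary-outcome data (an assumption for this sketch):
    feature 0 drives the label, feature 1 is pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x0, x1 = rng.gauss(0, 1), rng.gauss(0, 1)
        X.append([x0, x1])
        y.append(1 if x0 + rng.gauss(0, 0.3) > 0 else 0)
    return X, y

def accuracy(model, X, y):
    """Share of rows the model labels correctly."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

# Each candidate "algorithm type" is a fit function that returns a predict function.
def fit_majority(X, y):
    guess = 1 if 2 * sum(y) >= len(y) else 0  # always predict the most common class
    return lambda row: guess

def fit_centroid(X, y):
    centroids = {}
    for c in set(y):  # mean feature vector of each class seen in training
        pts = [row for row, label in zip(X, y) if label == c]
        centroids[c] = [sum(col) / len(pts) for col in zip(*pts)]
    def predict(row):
        dist = {c: sum((a - b) ** 2 for a, b in zip(row, cen))
                for c, cen in centroids.items()}
        return min(dist, key=dist.get)  # nearest class centroid
    return predict

def fit_stump(X, y):
    best = (-1.0, 0, 0.0, 1)  # (training accuracy, feature, threshold, label above)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for above in (0, 1):
                rule = lambda row, j=j, t=t, a=above: a if row[j] > t else 1 - a
                acc = accuracy(rule, X, y)
                if acc > best[0]:
                    best = (acc, j, t, above)
    _, j, t, above = best
    return lambda row: above if row[j] > t else 1 - above

CANDIDATES = {"majority": fit_majority, "centroid": fit_centroid, "stump": fit_stump}

def cv_accuracy(fit, X, y, k=5):
    """Average held-out accuracy over k interleaved folds."""
    scores = []
    for fold in range(k):
        test = set(range(fold, len(X), k))
        X_tr = [row for i, row in enumerate(X) if i not in test]
        y_tr = [label for i, label in enumerate(y) if i not in test]
        model = fit(X_tr, y_tr)
        scores.append(accuracy(model, [X[i] for i in test], [y[i] for i in test]))
    return sum(scores) / k

def auto_select(X, y, candidates):
    """Score every candidate and return a leaderboard, best first."""
    return sorted(((cv_accuracy(fit, X, y), name)
                   for name, fit in candidates.items()), reverse=True)

def permutation_importance(model, X, y, j, seed=0):
    """Accuracy lost when feature j is shuffled; a bigger drop means a more important factor."""
    col = [row[j] for row in X]
    random.Random(seed).shuffle(col)
    X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
    return accuracy(model, X, y) - accuracy(model, X_perm, y)

if __name__ == "__main__":
    X, y = make_data()
    board = auto_select(X, y, CANDIDATES)
    for acc, name in board:
        print(f"{name:10s} cross-validated accuracy = {acc:.3f}")
    best = CANDIDATES[board[0][1]](X, y)  # refit the winner on all the data
    for j in range(2):
        print(f"feature {j}: importance = {permutation_importance(best, X, y, j):.3f}")
```

Commercial tools search far larger families of algorithms and also automate data preparation and deployment, but the loop has the same shape: fit each candidate, score it on held-out data, rank the results, and report which inputs the winning model relies on.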
Automated machine learning (often called AutoML) can certainly enhance the work of professional analysts and data scientists by automating workflow and dramatically increasing the speed with which a variety of overall hypotheses and individual model attributes can be tested. The rise of analytics and big data has led to many new or rediscovered algorithms. Most statistical analyses in the past relied heavily on linear regression analysis. More recently, logistic regression has become much more popular for making predictions of binary outcomes that are frequently used to drive day-to-day business activities. Now, a wide range of algorithms is available to the machine learning modeler. Data and algorithms are expanding rapidly, but human capabilities—even those of quantitative professionals—are not. AutoML is a way to enhance the productivity and effectiveness of even the best-trained analytical professional or data scientist.
At a large US property and casualty insurance company, for example, modeling productivity for data scientists was the primary objective in adopting AutoML. Thus far, notes the head of data science support, “It has been a very helpful throughput tool.” The insurance giant uses AutoML to get a quick reading on the ROI of alternative machine learning projects. “We get some data, turn DataRobot (an AutoML tool from a Boston-based startup) loose on it, and see what the prediction accuracy is for the model. It’s so quick that we can figure out the value of an analysis without taking a lot of time to assess it,” notes the manager. The company can learn what the key parameters of the model are, what algorithm is best-suited to the problem, and what the likely ceiling is on model accuracy. If it seems to be a promising analysis, the company will take it further—typically using nonautomated machine learning tools—and perhaps put it into production.
At Sumitomo Mitsui Card Company (SMCC), the largest credit card company in Japan, AutoML has been applied both to risk modeling and customer insight/marketing applications. In the risk modeling area, some analysts and data scientists were doing machine learning manually, but it could take up to half a year to build and validate a model. The use of AutoML cut that time to hours or a few days. Hiroki Shiraishi, who leads a group providing machine learning infrastructure to SMCC’s business units, notes that the company wanted to accelerate the process of analyzing credit card data, and there were not enough skilled analysts to meet the need. Therefore, increasing modeling productivity was a key objective.
The greatest benefits in expanding insights, however, can come from broadening the population that can perform sophisticated machine learning analyses.1 Data scientists are typically difficult to hire and retain, and can be a limiting factor for insight generation even with greater productivity. In addition, business analysts with only moderate quantitative skills often understand the business and customer needs better than many data scientists. For these reasons, companies are attempting to expand the population of users of machine learning beyond data scientists. While some AutoML tools, such as Google’s Cloud AutoML and H2O.ai’s Driverless AI, are oriented more toward traditional data scientists (that is, individuals with PhDs in statistics and/or computer science), several platforms (such as DataRobot’s AutoML tools) are oriented to both data scientists and quantitatively oriented business analysts.
For example, at 84.51°, a subsidiary of Kroger that performs sophisticated data and analytics work for the grocer, the initial focus for AutoML was improving the productivity of data scientists. But the group has also used the automated tools to expand the number of people who can do machine learning. 84.51° has been growing its data science function to meet demand for modeling and analytics to solve complex business problems. It has been a challenge to find data scientists with the array of skills needed to work with business partners to engineer solutions and to develop and deploy models using current best methods. 84.51° employs tools such as DataRobot to “expand the bench.” Some experienced data scientists were concerned that they were moving to a world in which knowledge of algorithms and methods had no currency (a common issue with AutoML), but the company’s leaders emphasized that the new tools empowered people to get things done more efficiently, and there is now no pushback. 84.51° now regularly hires “insights”-focused data scientists: people who don’t have as much experience with machine learning, but who are skilled at communicating and presenting results, and who have high business acumen. Aided by AutoML, a substantial number of use cases and steps within traditional model development (such as use case identification and exploratory analyses) fit within their capabilities.
There is an even stronger focus on expanding the user base with AutoML at Royal Bank of Canada (RBC). The bank is investing in artificial intelligence and machine learning, and currently employs over 200 data scientists working across the bank. Samer Nusier, the bank’s director of portfolio management and credit strategy, explained that many of the bank’s serious data scientists prefer to develop and tune their models using traditional methods. He, however, is an advocate of the “citizen data scientist” supported by AutoML. He notes that of the three traditional data science skills (math, computer science, and business domain knowledge), the math and computer science work is increasingly being done by tools like AutoML. When business analysts who understand the data and customer behavior create the models, those models can be as useful as the ones created by data scientists. “It gives them superpowers,” he notes. Nusier feels that “purple people” (those who both understand some analytics and are business experts) can be equally valuable if supported by AutoML.
The proliferation of roles that can perform advanced analytics means that companies will need to clarify who does what and establish a governance model that balances capabilities, benefits, and risks. Providing secure access to the volumes of appropriately cleansed and frequently updated data required for analyses is often another initial step. It probably wouldn’t be feasible, at least at the moment, for a business analyst to employ a deep learning neural network model for image or speech recognition. However, for straightforward machine learning models involving regression-oriented tools, there may no longer be any need to employ a data scientist. Automated machine learning tools, which will undoubtedly continue to advance in capability, can make possible the generation of advanced analytical insights at a much faster and broader level than ever before. The ability of an organization to take advantage of the curiosity, talent, and ingenuity at all levels of the company to increase performance is the underlying business driver, and it will be central to establishing and maintaining a competitive advantage going forward.