Without data science, companies can't get full value from data, and there aren't enough data scientists to go around. But automation and training are giving companies access to data science without having to wage a war for talent.
As cognitive and IoT technologies generate ever larger and more varied data sets, companies face the challenge of unlocking the value of that data. And those that are failing to effectively apply data science may be putting themselves at a competitive disadvantage. Data scientist is one of the hottest job titles today, and battles for talent are fierce. Whether the talent shortage is real or overhyped, companies should investigate a mix of new tools, staffing models, and training strategies.
The title “data scientist” generally refers to a professional who typically holds a graduate degree in computer science and combines expertise in mathematics, statistics, and computer programming with business knowledge.8 These specialists tend to handle a variety of tasks critical to enterprise analytics projects, such as collecting, cleansing, and organizing large and varied data sets; designing and testing various algorithms; building and deploying machine learning-based solutions; analyzing data for patterns; and communicating findings to business stakeholders.
Since nearly every major company is actively looking for data science talent, the demand has rapidly outpaced the supply of people with required skills.9 (Based on current demand and supply dynamics, the United States alone is projected to face a shortfall of some 250,000 data scientists by 2024.10) Data science and analytics jobs typically take 45 days to fill, five days longer than the US market average, according to one study.11 The skills gap and longer hiring times can cause project delays and higher costs, hindering enterprises’ data analytics efforts. But a number of recent trends may change how companies acquire and apply data science capabilities, presenting savvy companies with some options for alleviating the talent bottleneck.
Most vendors in the data science and analytics market have made tool simplification a top goal; they are aiming to broaden and accelerate the adoption of data science and analytics capabilities. And an array of training resources is helping professionals with diverse backgrounds gain relevant data science skills. For the foreseeable future, elite data scientists will be in high demand. But five factors are beginning to democratize data science, helping to put this critical capability in the hands of more professionals and potentially alleviating a crippling talent shortage.
Automated machine learning. By some estimates, data scientists spend around 80 percent of their time on repetitive and tedious tasks that can be fully or partially automated.12 These tasks might include data preparation, feature engineering and selection, and algorithm selection and evaluation. Both established vendors13 and startups14 have introduced tools and techniques designed to automate such tasks. Automation makes data scientists more productive and more effective, and organizations can apply it aggressively to stretch the capacity of their scarce, oversubscribed data science talent.
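To make "algorithm selection and evaluation" concrete, here is a minimal sketch of the kind of loop such tools automate: score several candidate models with k-fold cross-validation and keep the one with the lowest error. The three candidates (a mean baseline, one-feature least squares, and 3-nearest-neighbors), the interleaved 5-fold split, and the synthetic data are illustrative assumptions, not any vendor's method; real AutoML products search far larger spaces of models, features, and hyperparameters.

```python
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

# A "fitter" takes training data and returns a prediction function.
Predictor = Callable[[float], float]
Fitter = Callable[[Sequence[float], Sequence[float]], Predictor]


def fit_mean(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Baseline candidate: always predict the training mean."""
    mu = statistics.fmean(ys)
    return lambda x: mu


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Ordinary least squares with one feature: y = a + b * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def make_knn(k: int) -> Fitter:
    """k-nearest-neighbors regression: average the k closest training targets."""
    def fit_knn(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
        pairs = list(zip(xs, ys))

        def predict(x: float) -> float:
            nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
            return statistics.fmean([y for _, y in nearest])

        return predict

    return fit_knn


def cv_mse(fit: Fitter, xs: Sequence[float], ys: Sequence[float], folds: int = 5) -> float:
    """Mean squared error under k-fold cross-validation (interleaved folds)."""
    errors: List[float] = []
    for f in range(folds):
        test = [i for i in range(len(xs)) if i % folds == f]
        train = [i for i in range(len(xs)) if i % folds != f]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.extend((model(xs[i]) - ys[i]) ** 2 for i in test)
    return statistics.fmean(errors)


def auto_select(candidates: Dict[str, Fitter], xs, ys) -> Tuple[str, Dict[str, float]]:
    """Score every candidate and return the name with the lowest CV error."""
    scores = {name: cv_mse(fit, xs, ys) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic data (an assumption): a linear trend plus alternating noise.
xs = [float(i) for i in range(20)]
ys = [2 * x + 1 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]

candidates = {"mean": fit_mean, "linear": fit_linear, "knn3": make_knn(3)}
best, scores = auto_select(candidates, xs, ys)
print(best, {name: round(s, 2) for name, s in scores.items()})
```

On this data the linear model wins, as expected for a linear trend. The point is the design choice: the person supplies data and a list of candidates, and the evaluation and selection happen without manual trial and error.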
App development without coding. Low-code and no-code software development platforms offer graphical user interfaces, drag-and-drop modules, and other user-friendly structures to help both IT and nontechnical staff accelerate AI app development and delivery. For example, using a no-code platform, salespeople can build a machine learning-based tool themselves to provide product recommendations to customers based on cross-sell opportunities. These platforms have the potential to make software development up to 10 times faster than traditional methods.15 Apart from building their own solutions, key technology vendors16 have acquired startups to offer or strengthen their low-code and no-code platforms. One analyst firm has estimated that the market for these platforms is growing at 50 percent annually.17
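For a sense of what a cross-sell recommender built on such a platform might compute behind its drag-and-drop interface, the sketch below counts how often products appear together in past purchases and suggests the most frequent companions. The simple co-occurrence approach, the function names, and the sample baskets are all hypothetical; a given no-code platform may generate a quite different model.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, List


def build_cooccurrence(baskets: List[List[str]]) -> Dict[str, Counter]:
    """Count how often each ordered pair of products shares a basket."""
    co: Dict[str, Counter] = defaultdict(Counter)
    for basket in baskets:
        # dict.fromkeys de-duplicates while keeping first-seen order.
        items = list(dict.fromkeys(basket))
        for a, b in permutations(items, 2):
            co[a][b] += 1
    return co


def recommend(co: Dict[str, Counter], product: str, top_n: int = 3) -> List[str]:
    """Return the products most often bought together with `product`."""
    if product not in co:
        return []  # no purchase history for this product
    return [item for item, _ in co[product].most_common(top_n)]


# Hypothetical purchase history.
baskets = [
    ["laptop", "mouse", "laptop_bag"],
    ["laptop", "mouse"],
    ["laptop", "warranty"],
    ["phone", "case"],
    ["phone", "case", "charger"],
]

co = build_cooccurrence(baskets)
print(recommend(co, "laptop", top_n=1))  # ['mouse']
print(recommend(co, "phone", top_n=1))   # ['case']
print(recommend(co, "tablet"))           # []
```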
Pre-trained AI models. Developing and training machine learning models is a core activity of data scientists. Now, key AI software vendors18 as well as several startups19 have launched pre-trained AI models, effectively packaging machine learning expertise and turning it into products. These solutions can slash the time and effort required for training20 or even start producing specific insights right away.21 Most of the pre-trained models available today address image, video, audio, or text analysis, supporting use cases such as sentiment analysis,22 sales opportunity workflow automation,23 customer service,24 automated equipment inspection,25 and interactive advertising.26 More pre-trained models are likely to come to market in the coming months.
Self-service data analytics. Increasingly, business or nontechnical users have tools at their disposal that can deliver data-based insights without involving analytics specialists, including data scientists. Self-service analytics tools offered by many business intelligence and analytics vendors27 now include features to augment data analytics and discovery. Some automate the process of developing and deploying machine learning models. Features such as natural language query and search, visual data discovery, and natural language generation help users automatically find, visualize, and narrate data findings like correlations, exceptions, clusters, links, and predictions. These capabilities empower business users to perform complex data analysis and get quick access to customized insights without relying on data scientists and analytics teams.
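The "find and narrate" features described above can be illustrated with a minimal sketch of automated insight discovery: scan every pair of columns for strong correlations, scan every column for outliers, and turn each finding into a templated sentence. The thresholds (|r| of at least 0.8, a z-score of at least 2), the simple template-based narration, and the sample figures are assumptions for illustration; commercial tools use more robust statistics and richer language generation.

```python
import math
from itertools import combinations
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def find_insights(table: Dict[str, List[float]],
                  corr_threshold: float = 0.8,
                  z_threshold: float = 2.0) -> List[str]:
    """Check every column pair for strong correlation and every column for
    outliers, then narrate each finding as a templated sentence."""
    insights: List[str] = []
    for a, b in combinations(table, 2):
        r = pearson(table[a], table[b])
        if abs(r) >= corr_threshold:
            direction = "rises" if r > 0 else "falls"
            insights.append(f"{b} {direction} as {a} rises (r = {r:.2f}).")
    for col, values in table.items():
        mu = sum(values) / len(values)
        sd = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
        for i, v in enumerate(values):
            if sd > 0 and abs(v - mu) / sd >= z_threshold:
                insights.append(f"Row {i} of {col} is unusual ({v} vs. mean {mu:.1f}).")
    return insights


# Hypothetical weekly figures, made up for illustration.
table = {
    "ad_spend": [10, 12, 15, 18, 20, 22, 25, 28],
    "revenue": [100, 118, 150, 178, 205, 220, 248, 281],
    "returns": [5, 4, 6, 5, 30, 5, 4, 6],
}

insights = find_insights(table)
for sentence in insights:
    print(sentence)
```

On this sample the sketch reports one correlation (revenue tracking ad spend) and one exception (the spike in returns at row 4), without the user specifying which columns to compare.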
Accelerated learning. Data science and AI-related training courses and boot camps are proliferating.28 These programs are aimed at professionals with basic mathematics and coding backgrounds and can impart foundational data science skills in anywhere from a couple of days to a couple of months, enabling participants to apply those skills to projects quickly.
Many organizations don’t recognize the mix of talent and skills required to be successful when applying data science. Some put great faith in data scientists but fail to reckon with the importance of business and functional expertise to the success of a project. A properly staffed initiative may include design-thinking skills to help conceptualize a solution, functional domain knowledge to help identify high-value use cases and shape the solution, business skills to articulate a compelling business case, data engineering skills to provide access to the right data in the form needed, and, for AI projects, AI skills to drive execution of a variety of AI technologies. Success depends on more than technology talent—it requires the right mix of skills and expertise.29
Eventually, the democratization of data science will enable greater collaboration between business and data science experts in building data-centered solutions. Some companies have already expanded their data science efforts effectively by providing data science automation tools to a mix of professionals, including data scientists, data engineers, statisticians, and business users.30 Others find that breaking the data science role into a collection of more specialized roles with overlapping skills makes it easier to assemble the mix of skills needed to staff projects.
To benefit from the democratization of data science and analytics, enterprises first need to address certain challenges. Because many of the technological advances in this area are recent, enterprises may encounter resistance to using these solutions. Business users may not be ready to trust them, preferring to keep relying on intuition and traditional decision-making processes. Technical experts, for their part, may resist changing their work style and automating tasks they regard as requiring expert craftsmanship.
On the other hand, embracing the democratization of data science may present a different set of challenges. Without proper onboarding and training, users given access to data science automation and self-service tools may fail to derive relevant insights, or may misinterpret or misapply the results in decision-making. Wide adoption of these tools will require governance procedures, which themselves risk becoming bottlenecks. Inadequate data controls and governance practices may lead to information silos, flawed analysis, and a lack of accountability. Companies therefore need to prepare for these challenges before moving forward with data science democratization.
Companies seeking to develop data science capabilities are facing a tight market for talent. To avoid being blocked by a labor shortage, they should consider a multipronged approach, including employing automated tools and pre-trained models, empowering nontechnical users with no-code tools and self-service analytics, and investing in training their own staff in data science by selecting a high-quality, accelerated training option from among the many currently available.
Companies should also explore hybrid staffing models for their data science projects. Rather than overburdening data scientists with all the analytics work, they can assemble teams combining experts such as data engineers, statisticians, and business analysts, and equip them with relevant data science automation and self-service tools. Subject matter experts who can “speak data” to data scientists while “speaking business” to executives can be valuable additions to these teams.31 This approach helps foster a culture of collaboration between data science experts and business users, freeing data scientists to focus on advanced and complex problems while giving business users faster access to actionable insights.
Those enterprises that seek to build armies of data scientists may continue to struggle to hire the desired talent, end up overspending on salaries, and get stuck with excess human capital in coming years. Those that leverage new automation, self-service, and training solutions may be able to mitigate the data scientist shortage without going on a hiring binge.