How to build a data science and machine learning roadmap in 2022

Closing the gap between their organizations’ investments in data science and machine learning (DSML) strategies and business units’ need for results will dominate the priorities of data and analytics leaders in 2022. Despite growing enthusiasm for DSML’s core technologies, getting results from these strategies remains elusive for enterprises.

Market forecasts reflect enterprises’ early optimism for DSML. Global revenue for the artificial intelligence (AI) market, including software, hardware and services, will grow 15.2% to $341.8 billion in 2021 and accelerate to 18.8% growth in 2022, reaching $500 billion by 2024. In addition, 56% of global enterprise executives said their adoption of DSML and AI is growing, up from 50% in 2020, according to McKinsey.

Gartner notes that organizations undertaking DSML initiatives rely on low-cost, open-source and public cloud service provider offerings to build their knowledge and skills and to test use cases. The challenge remains how best to productize models for deployment and management at scale.

DSML is delivering uneven value in enterprises today

Data science teams in financial services, healthcare and manufacturing tell VentureBeat that their enterprises’ DSML strategies are most effective when they anticipate and plan for uneven initial results across business units. Teams also say that productizing models at scale using MLOps is fundamentally different from building mainstream in-house applications with DevOps. They add that the more complex a business unit’s operating model, the steeper the MLOps learning curve. The contribution of DSML to business units varies depending on the availability of reliable data and on how clearly problem statements are defined.

O’Reilly found that enterprise AI will not mature until development and operations groups can engage in practices such as continuous deployment, until results are repeatable (at least in a statistical sense), and until ethics, safety, privacy and security are primary rather than secondary concerns.

Kaggle found that 80.3% of respondents use linear or logistic regression algorithms, followed by decision trees and random forests (74.1%) and gradient boosting machines (59.5%). While enterprises are only scratching the surface of DSML’s potential, adoption has been slowed by many factors that need to improve in 2022.

How and where DSML will improve in 2022

Getting the core components of a DSML platform right improves the accuracy, speed and quality of decision-making. As the latest Gartner Magic Quadrant shows, DSML platform providers are making progress in delivering more flexible, scalable infrastructure with built-in governance, designed to support the needs of multiple personas on a unified, extensible foundation. Enterprises that McKinsey considers “high performers” use more cloud infrastructure than their peers, with 64% of their AI workloads running on public or hybrid cloud, compared to 44% for their peers. In addition, McKinsey noted that this group relies on public cloud infrastructure to access a wider range of AI capabilities and technologies.

More organizations will adopt DSML strategies in 2022. The following are areas where organizations and platform providers can work together to improve outcomes by covering them in their 2022 roadmaps:

  • Adaptive ML shows potential for improving cybersecurity, remote site security, quality management in manufacturing, and the fine-tuning of industrial robotics systems.

Look for adaptive ML to see growing adoption across the spectrum of use cases defined by how quickly their contextual data, conditions and actions change. For example, combining cyber risk and remote site risk assessment in a single adaptive ML model is a use case that utility companies are running in production today. Adaptive ML’s biggest payoff may come in manufacturing, where combining telemetry data from visual IoT sensors with adaptive ML-based applications can quickly identify defective products and pull them off the production line. Sparing customers the hassle of returning defective products can increase customer loyalty while reducing costs. Given that manufacturers are facing severe labor shortages, combining adaptive ML with robotics can help them continue to meet customer demand for products consistently. Adaptive ML also supports autonomous self-driving vehicle systems and collaborative, smart robots that quickly learn through repetition how to complete simple tasks together. DSML platform vendors known for their expertise in this area include Cogitai, Google, Guavus, IBM, Microsoft, SAS, Tazi and others.
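
To make the idea of a model that keeps adjusting to changing conditions concrete, here is a minimal sketch of one simple interpretation: a detector that keeps an exponentially weighted running baseline of sensor readings and flags readings that fall far outside it. The class name, parameters and thresholds are illustrative assumptions, not taken from any vendor's product, and production adaptive ML systems are considerably more sophisticated.

```python
import math


class AdaptiveDefectDetector:
    """Flags readings far from an exponentially weighted running baseline.

    Hypothetical illustration only: names and defaults are assumptions.
    """

    def __init__(self, alpha=0.1, threshold=3.0, warmup=5):
        self.alpha = alpha          # how quickly the baseline adapts
        self.threshold = threshold  # z-score above which a reading is flagged
        self.warmup = warmup        # readings to observe before flagging
        self.mean = 0.0
        self.var = 0.0
        self.count = 0

    def update(self, reading):
        """Score one reading, then fold it into the running baseline."""
        self.count += 1
        if self.count == 1:
            self.mean = reading
            return False
        deviation = reading - self.mean
        std = math.sqrt(self.var)
        is_defect = (
            self.count > self.warmup
            and std > 0
            and abs(deviation) / std > self.threshold
        )
        if not is_defect:
            # Only normal readings move the baseline, so a single bad part
            # does not distort what "normal" looks like.
            self.mean += self.alpha * deviation
            self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        return is_defect
```

Calling update() once per reading lets the baseline follow gradual changes in the process while still catching abrupt outliers. A larger alpha adapts faster but tolerates more noise.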

  • Collaborative workflow support across DSML platforms becomes table stakes for competing in the market.

Data scientists tell VentureBeat that collaborative workflows that are not designed around the nuances of a DSML platform, and that cannot flex to their needs, can cost weeks of model development time. Collaboration tools and workflows need to move beyond simple Q&A forums and provide more efficient cross-model data and code repositories that every collaborator across the enterprise can use securely. There must also be support for data and model visualization and the option to export models. To meet data scientists’ requirements, collaboration needs to include communication and code sharing at each stage of the modeling process, data lineage and model tracking, version control, and model lineage analysis. DSML platform vendors offering collaborative workflow support include Domino, Dataiku, Google, Microsoft, SAS, TIBCO, RapidMiner and others.

  • MLOps will have a breakout year as organizations gain more experience scaling models for faster deployment while tracking business outcomes for better results.

Reducing the cycle time for creating and launching new models is one of the main criteria by which DSML projects are evaluated in enterprises today. Each DSML platform vendor offers its own version of MLOps support. Enterprises considering a DSML strategy need to review how each platform of interest handles model creation, operation, maintenance, model and code reuse, updates and governance. Look for each DSML platform vendor to continue fine-tuning its MLOps support to provide more model scalability and security in 2022. DSML platform vendors will rely on MLOps differentiators that include model cataloging, version control, model maintenance, monitoring, and code and model reuse. The best DSML platforms also ensure that their MLOps workflows have the option to link back to business results, measured for financial decision makers and line-of-business owners using relevant metrics and key performance indicators (KPIs).
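
As a rough illustration of what version control for models and a link back to business KPIs can look like, here is a minimal in-memory sketch. ModelCatalog, ModelVersion and their methods are hypothetical names invented for this example. They do not correspond to any DSML platform's API, and a real MLOps system would also persist artifacts and track lineage.

```python
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    name: str
    version: int
    metrics: dict                                      # technical metrics, e.g. {"auc": 0.91}
    business_kpis: dict = field(default_factory=dict)  # e.g. {"scrap_rate": 0.02}


class ModelCatalog:
    """Hypothetical registry: versions each model and attaches KPIs to a version."""

    def __init__(self):
        self._versions = {}  # model name -> list of ModelVersion, oldest first

    def register(self, name, metrics):
        """Store a new version; version numbers start at 1 per model name."""
        versions = self._versions.setdefault(name, [])
        entry = ModelVersion(name, len(versions) + 1, dict(metrics))
        versions.append(entry)
        return entry

    def get(self, name, version):
        return self._versions[name][version - 1]

    def latest(self, name):
        return self._versions[name][-1]

    def record_kpi(self, name, version, kpi, value):
        """Link a business outcome to the exact model version that produced it."""
        self.get(name, version).business_kpis[kpi] = value
```

Keeping KPIs on the version record, rather than on the model name, makes it possible to compare business outcomes before and after a model update.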

  • Privacy concerns will force every organization that makes sensor-connected products, and the services that support them, to use synthetic data to build, test and refine their models.

Models for the current and next generation of connected devices with embedded sensors that capture biometric data are among the most challenging machine learning models to create today. Startups building AI-based worker safety systems find it necessary to create and fine-tune synthetic data so that they can predict, for example, when, where and how accidents may occur. The Wall Street Journal provides a compelling glimpse into how effective synthetic data is in developing AI and ML models and how widespread its use has become. The article explains how American Express is refining its fraud prediction models using generative adversarial networks, a technique widely used to create synthetic data of randomized fraud patterns. Autonomous vehicle companies including Aurora, Cruise and Waymo also rely on synthetic data to train the perception systems that guide their cars.
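
The basic mechanics of synthetic data can be sketched in a few lines. Note that this is deliberately not a generative adversarial network: it is a much simpler parametric sampler that fits a mean and standard deviation to each numeric column and draws new records from those distributions. The function names are assumptions made for this example. Because columns are sampled independently, correlations between fields are lost, which is one reason teams reach for richer generators such as GANs.

```python
import random
import statistics


def fit_columns(records):
    """Estimate a (mean, standard deviation) pair for each numeric column."""
    columns = list(zip(*records))
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]


def sample_synthetic(params, n, seed=0):
    """Draw n synthetic records; each column is sampled independently."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    return [
        tuple(rng.gauss(mu, sigma) for mu, sigma in params)
        for _ in range(n)
    ]


# Made-up example records: (heart rate, body temperature in Celsius).
real_records = [(72, 36.6), (80, 36.9), (65, 36.4), (90, 37.1)]
params = fit_columns(real_records)
synthetic_records = sample_synthetic(params, n=1000)
```

The synthetic records follow the same per-column statistics as the originals without reproducing any individual's actual measurements, although a simple sampler like this offers no formal privacy guarantee.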

  • DSML platform providers need to scale and automate the entire ML workflow.

Providers have multiple generations of model development tools behind them, and their experience shows in the maturity of the workflows they can support. The goal for 2022 is to integrate zero trust into MLOps workflows while maintaining the flexibility to improve model deployment and management and to customize workflows. AutoML will see greater adoption as enterprises seek to accelerate their ML workflows, with data scientists skilled in its techniques in high demand. Automating the ML workflow will further trim cycle times by enabling more reuse of ML code components and faster model testing and validation, increasing the productivity of data science teams in the process.

  • Transfer learning will see rapid adoption in enterprises that have a DSML strategy deployed in production at scale.

The essence of transfer learning is reusing existing trained machine learning models to jump-start the development of new models. It is especially useful for data science teams working with supervised machine learning algorithms that require labeled data sets to provide accurate analysis. Instead of starting from scratch on a new supervised machine learning model, data scientists can use transfer learning to quickly customize an existing model for a given business goal. In addition, transfer learning is becoming more relevant in process-oriented industries that rely on computer vision because of the scale of labeled data involved. Leading DSML platform providers offering transfer learning include Alteryx, Google, IBM, SAS, TIBCO and others.
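
The reuse described above can be sketched with a toy example. Here a fixed function stands in for a model already trained on another task, and only a small new classification layer (the "head") is trained on a handful of labeled examples for the new task. Everything in this sketch is an assumption for illustration: pretrained_features is a hypothetical stand-in rather than a real pretrained network, and the task, labels and hyperparameters are made up.

```python
import math
import random


def pretrained_features(x):
    """Hypothetical stand-in for a frozen, already-trained feature extractor."""
    a, b = x
    return [a + b, a - b, a * b]


def train_head(examples, epochs=200, lr=0.1, seed=0):
    """Fit only a new logistic-regression head on top of the frozen features."""
    rng = random.Random(seed)
    n_features = len(pretrained_features(examples[0][0]))
    weights = [0.0] * n_features
    bias = 0.0
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            feats = pretrained_features(x)  # the extractor is never updated
            z = bias + sum(w * f for w, f in zip(weights, feats))
            pred = 1.0 / (1.0 + math.exp(-z))
            error = pred - label
            weights = [w - lr * error * f for w, f in zip(weights, feats)]
            bias -= lr * error
    return weights, bias


def predict(head, x):
    """Classify x with the frozen extractor plus the newly trained head."""
    weights, bias = head
    z = bias + sum(w * f for w, f in zip(weights, pretrained_features(x)))
    return 1 if z > 0 else 0


# New task with only four labeled examples: label is 1 when both inputs share a sign.
labeled = [((1, 1), 1), ((-1, -1), 1), ((1, -1), 0), ((-1, 1), 0)]
head = train_head(labeled)
```

The new task is not linearly separable in the raw inputs, but it is in the reused features, so four labeled examples are enough to train the head. That is the practical appeal of transfer learning when labeled data is scarce.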

  • Organizations need to focus first on use cases and metrics, and understand that exceptional model accuracy may not deliver business value.

One of the most common challenges when creating supervised machine learning models, especially when there is an abundance of telemetry data from sensors and endpoints, is the tendency to keep tweaking models for a greater degree of accuracy. Telemetry data from the manufacturing shop floor can be sporadic, and it varies with the cycle count, frequency and speed of a given machine, among many other factors. It’s easy to get absorbed in what real-time telemetry data from the shop floor says about machines, but the primary goal needs to be pulling back to see what the data says about shop floor productivity and its impact on margins.
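
A small worked example shows why chasing accuracy can miss the business goal. The confusion counts and per-error costs below are entirely made up for illustration. They describe two hypothetical defect-detection models evaluated on the same 1,000 parts, 50 of which are defective.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def business_cost(fp, fn, cost_fp, cost_fn):
    """Total cost of mistakes, given a per-error cost for each error type."""
    return fp * cost_fp + fn * cost_fn


# Hypothetical confusion counts on 1,000 inspected parts (50 defective).
model_a = {"tp": 20, "fp": 5, "tn": 945, "fn": 30}   # tuned for accuracy
model_b = {"tp": 45, "fp": 60, "tn": 890, "fn": 5}   # tuned to catch defects

COST_FALSE_ALARM = 10     # assumed cost of re-inspecting a good part
COST_MISSED_DEFECT = 500  # assumed cost of shipping a defective part

for label, m in (("A", model_a), ("B", model_b)):
    cost = business_cost(m["fp"], m["fn"], COST_FALSE_ALARM, COST_MISSED_DEFECT)
    print(f"Model {label}: accuracy={accuracy(**m):.3f}, cost of errors=${cost:,}")
```

Under these assumed costs, model A is more accurate (96.5% versus 93.5%) yet its errors cost far more ($15,050 versus $3,100), because it misses more of the expensive defects. Which model is better depends on the cost structure of the use case, not on accuracy alone.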

DSML strategy should be based on business results

Organizations pursuing DSML strategies need to enter 2022 with a clear roadmap of what they want to accomplish, starting from a business case anchored in measurable customer outcomes. The speed and variety of innovations that DSML platform providers plan to announce over the next twelve months will revolve around five key areas. First is the democratization of ML model creation, which will make model building and fine-tuning available to more business professionals. Second, multi-persona support on DSML platforms will improve over the next twelve months, supporting greater adoption. Third, end-to-end automation of ML workflows will help accelerate MLOps cycles in 2022, driving the fourth area: improved line-of-business reporting linked to model operations. Fifth, enterprises want faster time-to-value from their DSML investments, and DSML platform vendors will need to quantify their value with greater accuracy and real-time insights to retain customers and attract new ones.

