A New Resource Hiding in Plain Sight? Machine Learning in the Data Warehouse

New research from the IT think tank 451 Alliance shows that data warehouse vendors have begun to embed machine learning capabilities directly into their systems.

In an approach we’ll call ‘in-database machine learning,’ the system brings the machine learning to the data instead of the other way around – avoiding the perils that come with moving data around.

Running machine learning on a data warehouse is a logical step. After all, the goal of the data warehouse has always been to support businesses by facilitating data analysis.

If you see the data warehouse as an analytics system, then machine learning is an extension of those analytical capabilities.

The continued growth of analytical systems, which we define as analytic databases and distributed data-processing frameworks, is a key reason machine learning is being paired with data warehouse systems.

As shown below, revenue for the Analytic Data Platforms segment is expected to exceed $32bn by 2022, growing at a combined CAGR of 9.5%; within that segment, analytic databases are expected to grow at a CAGR of 6%.

Figure: Forecasted Revenue for Analytics Data Platforms Market (source: 451 Research)

Because enterprise data warehouses contain significant amounts of data (perhaps several years’ worth), they’re well suited to machine learning. Operating within the data warehouse eliminates the need to analyze only a portion of the data. Plus, this embedded approach means there’s one less system to maintain.

Data warehouses with embedded machine learning have optimized their algorithms to run in massively parallel processing (MPP) environments. These algorithms can often be invoked with SQL commands, which makes them accessible to a much larger user base. Data scientists who don’t specialize in ML-specific programming languages can access the insights buried in these smart data warehouses.
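
To make the idea of invoking an algorithm with SQL concrete, here is a minimal sketch in Python. It is not any vendor's implementation: SQLite stands in for the data warehouse, and the `linear_slope` function, the `sales` table and the sample figures are all hypothetical names and data made up for illustration. The sketch registers a simple least-squares trend fit as a SQL aggregate, so the fit runs where the data lives and the user only writes a query.

```python
import sqlite3


class LinearSlope:
    """SQL aggregate that fits y = a + b*x by least squares and returns b."""

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_y = 0.0
        self.sum_xy = 0.0
        self.sum_xx = 0.0

    def step(self, x, y):
        # Called by the database engine once per row; rows with NULLs are skipped.
        if x is None or y is None:
            return
        self.n += 1
        self.sum_x += x
        self.sum_y += y
        self.sum_xy += x * y
        self.sum_xx += x * x

    def finalize(self):
        # Closed-form least-squares slope; returns NULL when it is undefined.
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if self.n < 2 or denom == 0:
            return None
        return (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom


def fit_slope_in_database(rows):
    """Load hypothetical (month, revenue) rows, then fit the trend with one SQL query."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse connection
    try:
        # Hypothetical function name; real warehouses ship their own ML functions.
        conn.create_aggregate("linear_slope", 2, LinearSlope)
        conn.execute("CREATE TABLE sales (month REAL, revenue REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        (slope,) = conn.execute(
            "SELECT linear_slope(month, revenue) FROM sales"
        ).fetchone()
        return slope
    finally:
        conn.close()


if __name__ == "__main__":
    # Made-up data that follows revenue = 10 + 2.5 * month.
    sample = [(m, 10 + 2.5 * m) for m in range(1, 13)]
    print(fit_slope_in_database(sample))
```

The point of the design is that the analyst never exports the table: the only thing they write is the `SELECT` statement. A production warehouse would run the equivalent computation in parallel across its nodes, which this single-process sketch does not attempt to show.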

What else are data warehouses doing? Some of them are also integrating ML tools that enable model testing, maintenance, validation and deployment. While these additions aren’t mandatory, they minimize the need for end users to turn to other systems for these capabilities.


The 451 Alliance is an invitation-only think tank for IT executives, technologists, and tech-adjacent professionals.