January 2, 2024

What is a Data Stack?

John Wesel
11 Jan 2022
5 min read
A data stack refers to the collection of technologies that together form a company’s data storage, transformation, and analytics capabilities. A properly functioning data stack enables everyone in an organization to quickly access relevant data and make better decisions.

The Traditional Data Stack

Decentralized systems produce data silos

The traditional data stack (seen above) was often implemented to produce reports that supplemented the data from an existing centralized ERP system. As cloud computing and SaaS software have evolved, many companies no longer keep all of their data in a centralized ERP system; instead, they run many specialized systems, each with its own data. As these decentralized systems start to produce useful data, companies want to easily combine it with their centralized ERP data to create new insights.

The Modern Data Stack

A centralized, modular system allows data to stay organized and be accessed by anyone.

The modern data stack consists of several layers that help consolidate data for analysis.

Data Warehouse Layer: The foundational element of the data stack is the modern data warehouse. A modern data warehouse is optimized for storing large amounts of data from multiple sources and is tuned to return data quickly to analytics tools.

Extraction Layer: The next layer is the extraction layer. Previously, IT teams would often try to extract and transform data in a single step. This approach locked business rules (transformations) inside the IT team when they belonged in the hands of data analysts. In the modern model, data is extracted from the source system and stored unaltered, in its raw state, in the data warehouse. Because the data is unchanged from the source system, it is also much easier to audit for wrong or missing data.
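To make the "extract, then store unaltered" idea concrete (this pattern is often called ELT, as opposed to the older ETL), here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for the warehouse, and the records, the `raw_records` table, and the functions `extract_and_load` and `audit_missing` are all hypothetical illustrations, not any specific tool's API. Records are stored as raw JSON exactly as received, so an audit can be run directly against the source data.

```python
import json
import sqlite3

# Hypothetical records, standing in for what a source system (e.g. a SaaS API) returns.
SOURCE_RECORDS = [
    {"order_id": 1, "customer_email": "ana@example.com", "revenue": 120.0, "cost": 80.0},
    {"order_id": 2, "customer_email": "ben@example.com", "revenue": 75.5, "cost": None},
]


def extract_and_load(conn, source, records):
    """Load records into a raw landing table exactly as received (no transformation)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_records ("
        "source TEXT NOT NULL, "
        "payload TEXT NOT NULL, "
        "loaded_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    # Each record is serialized as-is to JSON; business rules are applied later, downstream.
    conn.executemany(
        "INSERT INTO raw_records (source, payload) VALUES (?, ?)",
        [(source, json.dumps(record)) for record in records],
    )
    conn.commit()
    return len(records)


def audit_missing(conn, field):
    """Return raw records whose `field` is missing or null, read back from the unaltered data."""
    rows = conn.execute("SELECT payload FROM raw_records").fetchall()
    payloads = [json.loads(payload) for (payload,) in rows]
    return [p for p in payloads if p.get(field) is None]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory SQLite stands in for the warehouse
    extract_and_load(conn, "shop_api", SOURCE_RECORDS)
    print(audit_missing(conn, "cost"))
```

Because nothing is reshaped on the way in, the audit above can flag the order with a missing cost without guessing whether a transformation step dropped or altered it.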

Transformation Layer: The next layer is the transformation layer, which holds all of the business logic and rules: how data from different sources relates to each other, as well as formulas for calculating metrics such as gross margin, average order value, or customer lifetime value.
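As an illustration of keeping business rules in one place, the sketch below defines two rules as SQL views: one relating orders to CRM customers, and one computing average order value (total revenue divided by order count) and gross margin ((revenue minus cost) divided by revenue) per region. It again assumes in-memory SQLite as the warehouse; the table names `raw_orders` and `raw_crm_customers`, the sample rows, and the helper functions are hypothetical.

```python
import sqlite3


def load_sample_raw_data(conn):
    """Create two hypothetical raw tables from different source systems and fill them."""
    conn.executescript(
        """
        CREATE TABLE raw_orders (order_id INTEGER, customer_id INTEGER, revenue REAL, cost REAL);
        CREATE TABLE raw_crm_customers (customer_id INTEGER, region TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
        [(1, 10, 100.0, 60.0), (2, 10, 50.0, 30.0), (3, 20, 200.0, 150.0)],
    )
    conn.executemany(
        "INSERT INTO raw_crm_customers VALUES (?, ?)",
        [(10, "EU"), (20, "US")],
    )


def build_transformations(conn):
    """Define business rules as SQL views on top of the raw tables."""
    conn.executescript(
        """
        -- Rule 1: how the sources relate (orders join to CRM customers on customer_id).
        CREATE VIEW IF NOT EXISTS orders_enriched AS
        SELECT o.order_id, o.customer_id, c.region, o.revenue, o.cost
        FROM raw_orders AS o
        JOIN raw_crm_customers AS c ON c.customer_id = o.customer_id;

        -- Rule 2: metric formulas, defined once and reused by every report.
        CREATE VIEW IF NOT EXISTS region_metrics AS
        SELECT region,
               COUNT(*) AS order_count,
               SUM(revenue) / COUNT(*) AS average_order_value,
               (SUM(revenue) - SUM(cost)) / SUM(revenue) AS gross_margin
        FROM orders_enriched
        GROUP BY region;
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_sample_raw_data(conn)
    build_transformations(conn)
    for row in conn.execute("SELECT * FROM region_metrics ORDER BY region"):
        print(row)
```

Defining the formulas as views means an analyst can change, say, the gross margin rule in one place, and every downstream report picks it up without touching the extraction step.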

Analytics Layer: This is the final layer. It connects to the transformation layer but not to the raw extraction layer, which allows the business to be presented with a clean, organized, curated set of datasets for analysis. It also allows sensitive information, such as personally identifiable information (PII), to be removed.
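A minimal sketch of that separation, under stated assumptions: in-memory SQLite again plays the warehouse, and `transformed_orders`, `analytics_orders`, and the helper functions are hypothetical names. The curated view reads only from the transformation-layer table and leaves out the PII column. Note that SQLite has no user permissions, so here the view only shows the shape of the curated dataset; in a real warehouse, the analytics tool's account would typically be granted access to the curated views and not to the raw tables.

```python
import sqlite3


def create_sample_transformed_table(conn):
    """Hypothetical transformation-layer table that still contains a PII column."""
    conn.execute(
        "CREATE TABLE transformed_orders "
        "(order_id INTEGER, customer_email TEXT, region TEXT, revenue REAL, gross_margin REAL)"
    )
    conn.executemany(
        "INSERT INTO transformed_orders VALUES (?, ?, ?, ?, ?)",
        [
            (1, "ana@example.com", "EU", 100.0, 0.4),
            (2, "ben@example.com", "US", 200.0, 0.25),
        ],
    )


def build_analytics_layer(conn):
    """Expose a curated, PII-free view built only on the transformation layer."""
    conn.execute(
        """
        CREATE VIEW IF NOT EXISTS analytics_orders AS
        SELECT order_id, region, revenue, gross_margin  -- customer_email deliberately omitted
        FROM transformed_orders
        """
    )


def query_for_analysts(conn, sql):
    """Run a query and return (column names, rows), as a BI tool might display them."""
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_sample_transformed_table(conn)
    build_analytics_layer(conn)
    print(query_for_analysts(conn, "SELECT * FROM analytics_orders"))
```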

The Post-Modern Data Stack

The future is clean, organized data that can actually be used in Machine Learning and AI

The post-modern data stack is the next phase of data stack development, but most companies are at a very early stage with it. Its primary additions are Machine Learning and AI capabilities, which can help companies improve demand forecasting, lead scoring, and fraud detection, among many other possibilities.