The data warehouse pulls data from the organization's operational data sources and disseminates that data to various data marts to which reports then connect. In a traditional data warehouse implementation, the data must also pass through an Extract, Transform and Load (ETL) mechanism as it passes through the data warehouse.
An operational data source refers to the databases that support the ongoing, day-to-day operations of the business. These databases are usually focused on processing transactions like order entry, billing, HR and other activities that require time-sensitive, real time transaction processing.
A data warehouse typically supports and supplies the data for various data marts in the organization. These data marts can serve a particular department’s reporting and analysis needs or contain data related to a specific subject.
Bill Inmon is recognized as the "father of the data warehouse." In 1991, he published his first book on data warehousing, Building the Data Warehouse. It contains one of the most widely used definitions of a data warehouse.
“A Data Warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management decisions.”
- Subject-oriented: Focuses on natural data groups, not applications boundaries.
- Integrated: Provides consistent formats and encodings.
- Time-variant: Data is organized by time and is stored in diverse time slices.
- Non-volatile: No updates are allowed; only load and retrieval operations.
In today’s organizations, a data warehouse generally has come to be a collection or repository of integrated, detailed historical data that supports centralized data storage and strategic decision-making in an organization. Data volumes tend to grow exponentially in data warehouses since data warehouses store all historical data at a very detailed level, whether there is a current need for it or not. This usually requires very high performance data servers to implement. This type of storage is very useful, among other reasons, if a company needs audit information to see a particular point in time, for example, for compliance purposes.
To understand how a data warehouse works, it's important to describe a simple approach to reporting and then show the data warehouse approach in comparison.