What is DataStage? Datastage is an ETL tool which extracts data, transform and load data from...
A Data Warehousing (DW) is process for collecting and managing data from varied sources to provide meaningful business insights. A Data warehouse is typically used to connect and analyze business data from heterogeneous sources. The data warehouse is the core of the BI system which is built for data analysis and reporting.
It is a blend of technologies and components which aids the strategic use of data. It is electronic storage of a large amount of information by a business which is designed for query and analysis instead of transaction processing. It is a process of transforming data into information and making it available to users in a timely manner to make a difference.
In this Data Warehouse (DWH) tutorial, you will learn more about-
The decision support database (Data Warehouse) is maintained separately from the organization's operational database. However, the data warehouse is not a product but an environment. It is an architectural construct of an information system which provides users with current and historical decision support information which is difficult to access or present in the traditional operational data store.
You many know that a 3NF-designed database for an inventory system many have tables related to each other. For example, a report on current inventory information can include more than 12 joined conditions. This can quickly slow down the response time of the query and report. A data warehouse provides a new design which can help to reduce the response time and helps to enhance the performance of queries for reports and analytics.
Data warehouse system is also known by the following name:
The Datawarehouse benefits users to understand and enhance their organization's performance. The need to warehouse data evolved as computer systems became more complex and needed to handle increasing amounts of Information. However, Data Warehousing is a not a new thing.
Here are some key events in evolution of Data Warehouse-
A Data Warehouse works as a central repository where information arrives from one or more data sources. Data flows into a data warehouse from the transactional system and other relational databases.
Data may be:
The data is processed, transformed, and ingested so that users can access the processed data in the Data Warehouse through Business Intelligence tools, SQL clients, and spreadsheets. A data warehouse merges information coming from different sources into one comprehensive database.
By merging all of this information in one place, an organization can analyze its customers more holistically. This helps to ensure that it has considered all the information available. Data warehousing makes data mining possible. Data mining is looking for patterns in the data that may lead to higher sales and profits.
Three main types of Data Warehouses (DWH) are:
1. Enterprise Data Warehouse (EDW):
Enterprise Data Warehouse (EDW) is a centralized warehouse. It provides decision support service across the enterprise. It offers a unified approach for organizing and representing data. It also provide the ability to classify data according to the subject and give access according to those divisions.
2. Operational Data Store:
Operational Data Store, which is also called ODS, are nothing but data store required when neither Data warehouse nor OLTP systems support organizations reporting needs. In ODS, Data warehouse is refreshed in real time. Hence, it is widely preferred for routine activities like storing records of the Employees.
3. Data Mart:
A data mart is a subset of the data warehouse. It specially designed for a particular line of business, such as sales, finance, sales or finance. In an independent data mart, data can collect directly from sources.
Earlier, organizations started relatively simple use of data warehousing. However, over time, more sophisticated use of data warehousing begun.
The following are general stages of use of the data warehouse (DWH):
Offline Operational Database:
In this stage, data is just copied from an operational system to another server. In this way, loading, processing, and reporting of the copied data do not impact the operational system's performance.
Offline Data Warehouse:
Data in the Datawarehouse is regularly updated from the Operational Database. The data in Datawarehouse is mapped and transformed to meet the Datawarehouse objectives.
Real time Data Warehouse:
In this stage, Data warehouses are updated whenever any transaction takes place in operational database. For example, Airline or railway booking system.
Integrated Data Warehouse:
In this stage, Data Warehouses are updated continuously when the operational system performs a transaction. The Datawarehouse then generates transactions which are passed back to the operational system.
Four components of Data Warehouses are:
Load manager: Load manager is also called the front component. It performs with all the operations associated with the extraction and load of data into the warehouse. These operations include transformations to prepare the data for entering into the Data warehouse.
Warehouse Manager: Warehouse manager performs operations associated with the management of the data in the warehouse. It performs operations like analysis of data to ensure consistency, creation of indexes and views, generation of denormalization and aggregations, transformation and merging of source data and archiving and baking-up data.
Query Manager: Query manager is also known as backend component. It performs all the operation operations related to the management of user queries. The operations of this Data warehouse components are direct queries to the appropriate tables for scheduling the execution of queries.
End-user access tools:
This is categorized into five different groups like 1. Data Reporting 2. Query Tools 3. Application development tools 4. EIS tools, 5. OLAP tools and data mining tools.
DWH (Data warehouse) is needed for all types of users like:
Here, are most common sectors where Data warehouse is used:
In the Airline system, it is used for operation purpose like crew assignment, analyses of route profitability, frequent flyer program promotions, etc.
It is widely used in the banking sector to manage the resources available on desk effectively. Few banks also used for the market research, performance analysis of the product and operations.
Healthcare sector also used Data warehouse to strategize and predict outcomes, generate patient's treatment reports, share data with tie-in insurance companies, medical aid services, etc.
In the public sector, data warehouse is used for intelligence gathering. It helps government agencies to maintain and analyze tax records, health policy records, for every individual.
Investment and Insurance sector:
In this sector, the warehouses are primarily used to analyze data patterns, customer trends, and to track market movements.
In retail chains, Data warehouse is widely used for distribution and marketing. It also helps to track items, customer buying pattern, promotions and also used for determining pricing policy.
A data warehouse is used in this sector for product promotions, sales decisions and to make distribution decisions.
This Industry utilizes warehouse services to design as well as estimate their advertising and promotion campaigns where they want to target clients based on their feedback and travel patterns.
The best way to address the business risk associated with a Datawarehouse implementation is to employ a three-prong strategy as below
Here, are key steps in Datawarehouse implementation along with its deliverables.
|1||Need to define project scope||Scope Definition|
|2||Need to determine business needs||Logical Data Model|
|3||Define Operational Datastore requirements||Operational Data Store Model|
|4||Acquire or develop Extraction tools||Extract tools and Software|
|5||Define Data Warehouse Data requirements||Transition Data Model|
|6||Document missing data||To Do Project List|
|7||Maps Operational Data Store to Data Warehouse||D/W Data Integration Map|
|8||Develop Data Warehouse Database design||D/W Database Design|
|9||Extract Data from Operational Data Store||Integrated D/W Data Extracts|
|10||Load Data Warehouse||Initial Data Load|
|11||Maintain Data Warehouse||On-going Data Access and Subsequent Loads|
Advantages of Data Warehouse (DWH):
Disadvantages of Data Warehouse:
There are many Data Warehousing tools are available in the market. Here, are some most prominent one:
MarkLogic is useful data warehousing solution that makes data integration easier and faster using an array of enterprise features. This tool helps to perform very complex search operations. It can query different types of data like documents, relationships, and metadata.
Oracle is the industry-leading database. It offers a wide range of choice of data warehouse solutions for both on-premises and in the cloud. It helps to optimize customer experiences by increasing operational efficiency.
3. Amazon RedShift:
Amazon Redshift is Data warehouse tool. It is a simple and cost-effective tool to analyze all types of data using standard SQL and existing BI tools. It also allows running complex queries against petabytes of structured data, using the technique of query optimization.
Here is a complete list of useful Datawarehouse Tools.
What is DataStage? Datastage is an ETL tool which extracts data, transform and load data from...
What is ETL? ETL is an abbreviation of Extract, Transform and Load. In this process, an ETL tool...
What is Data warehouse? A data warehouse is a technique for collecting and managing data from...
What is Database? A database is a collection of related data which represents some elements of the...
With many Continuous Integration tools available in the market, it is quite a tedious task to...
In this tutorial on the difference between Data lake vs. Data warehouse, we will discuss the key...