Corporate Information Factory (CIF) Resources by Bill Inmon, Inmon Data Systems

Cost Justification In The Data Warehouse

The resources required for a data warehouse are notoriously difficult to justify, for a variety of reasons. One reason is that the benefits of a data warehouse cannot be discerned until the warehouse is built. Since the DSS (decision support system) analyst operates in a discovery mode, there is no real way of knowing what the benefits of the warehouse will be until it is up and running. This is the first reason why constructing a data warehouse requires some amount of faith.

But even after the warehouse is built, identifying and quantifying its benefits is not easy. From the business person's perspective, data warehousing leads to increased market share, increased profits and revenue, and decreased expenses.

While these business benefits are very real and very appetizing to the business person, it is hard to ascertain precisely how much the data warehouse contributes to achieving them. The problem with measuring the business effects of a data warehouse is that many factors besides the warehouse contribute to increased market share and revenue. Trying to measure the effect of the data warehouse alone on those variables is like trying to pick out a single violin from the sound of an orchestra. While any given violin is undoubtedly important to the success of the orchestra, in the final analysis many other instruments also contribute to the music.

Quantifying the benefit of a data warehouse, then, is difficult. A simple, intuitive way to gauge the worth of a data warehouse is to ask a maintenance programmer whether a data warehouse is worthwhile. When the maintenance programmer estimates how much work a warehouse can save, the answer is that a warehouse is unquestionably a good thing to do. Ask an end user who makes use of a data warehouse whether it is worthwhile, and you get the same response. But the best the programmer and the end user can offer is an intuitive feel that a warehouse is a good thing to do. Putting quantified terms on the worth of the warehouse is very difficult.

There is, however, a simple way to illustrate the worth of a warehouse and at least start to quantify how much it is worth. Consider two environments: the classical legacy application environment, and the architected environment, which contains an operational component and a data warehouse component.

The same new report is written from each environment. There is a two-order-of-magnitude difference in cost between the two reports: the report from the legacy environment costs $1000 to build and execute, while the report from the data warehouse environment costs $10. And yet these reports are exactly the same!
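To make the arithmetic concrete, the short Python sketch below takes the two per-report figures from the example ($1000 and $10) and computes how far apart they are, how much is saved over a number of reports, and how many reports it would take to recover a given investment in the warehouse. The investment figure is a hypothetical input, not a number the example provides, and the model deliberately ignores everything except per-report cost.

```python
import math

# Per-report costs (build + execute) from the example in the text.
LEGACY_REPORT_COST = 1000.0
WAREHOUSE_REPORT_COST = 10.0


def orders_of_magnitude(high: float, low: float) -> float:
    """How many powers of ten separate two costs (log10 of their ratio)."""
    return math.log10(high / low)


def savings(num_reports: int) -> float:
    """Cost avoided by producing num_reports reports from the warehouse instead of legacy."""
    return num_reports * (LEGACY_REPORT_COST - WAREHOUSE_REPORT_COST)


def break_even_reports(warehouse_investment: float) -> int:
    """Fewest reports whose combined savings cover a warehouse investment.

    warehouse_investment is a hypothetical input; the example gives no figure for it.
    """
    saving_per_report = LEGACY_REPORT_COST - WAREHOUSE_REPORT_COST
    return math.ceil(warehouse_investment / saving_per_report)


if __name__ == "__main__":
    ratio = orders_of_magnitude(LEGACY_REPORT_COST, WAREHOUSE_REPORT_COST)
    print(f"orders of magnitude apart: {ratio:.1f}")
    print(f"saved over 50 reports: ${savings(50):,.0f}")
    # Hypothetical $1,000,000 warehouse investment, for illustration only.
    print(f"reports to break even: {break_even_reports(1_000_000)}")
```

With these figures the two reports are two orders of magnitude apart, 50 reports save $49,500, and a hypothetical $1,000,000 investment is recovered after 1,011 reports. The point is not the particular numbers but that a per-report difference can be multiplied out into something a business person can weigh.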

Why does the report from the legacy environment cost so much more than the report from the data warehouse environment to build and execute?

There are several good reasons for the discrepancy.

The first cost to consider is that of executing the report. The computer resources for the legacy environment cost more than those for the data warehouse environment. Furthermore, scheduling is much tighter in the legacy environment, since many other activities are going on there. The only significant activities in the data warehouse environment are loading data and executing reports.

In addition, the legacy environment is typically sensitive to online response time, while the data warehouse environment is not. Because of this difference, there is much greater latitude in when and how a report is run in the data warehouse environment.

Another execution difference that affects the cost of running a report is that the legacy environment contains much data that is used for purposes other than reporting. This data "gets in the way" of efficient reporting and causes a report to run much longer in the legacy environment.

Still another important execution difference is that the internal data structures of the legacy environment were never designed to optimize reporting; they were designed to optimize other kinds of processing. The internal data structures of the data warehouse, on the other hand, were designed to optimize reporting. It is no surprise, then, that execution costs less in the data warehouse than in the legacy environment.

Yet another reason for the execution differences is that the technology housing the legacy environment is designed for a wide variety of work and, as such, is not optimized for reporting performance. The technology housing the data warehouse environment, by contrast, is optimized for reporting.

But report execution is not the only cost difference between the two environments. The other major difference is the development work required to build the report in the first place. In this regard there are MANY substantial differences between the legacy environment and the data warehouse environment.

The first is that data in the legacy environment must be integrated before it can be used effectively for reporting, while data in the data warehouse environment is already integrated. A DSS analyst who uses data from the warehouse can use it immediately, with no thought of integration. Not so the DSS analyst who wishes to use data from the legacy environment: that analyst must first integrate the data. And in many cases integration is a gargantuan task.
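As an illustration of what that integration involves, here is a minimal Python sketch, assuming two hypothetical legacy applications (called billing and orders here) that record sales with different key names, different gender codes, and different monetary units. The `integrate` step stands for the work the legacy-environment analyst must do before reporting can begin; the report itself, `total_by_customer`, is the only part the warehouse analyst would need, because warehouse data already arrives in a common form. All record layouts and codes are invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical legacy extracts: two applications record the same kind of fact
# with different key names, gender codes, and monetary units.
BILLING_ROWS = [
    {"cust": "C1", "sex": "m", "amt_cents": 12500},
    {"cust": "C2", "sex": "f", "amt_cents": 3000},
]
ORDER_ROWS = [
    {"customer_id": "C1", "gender": 1, "amount": 40.0},
    {"customer_id": "C2", "gender": 0, "amount": 15.5},
]


@dataclass(frozen=True)
class Sale:
    """One integrated record: a single key name, gender code, and currency unit."""

    customer_id: str
    gender: str  # "M" or "F"
    amount_dollars: float


def from_billing(row: dict) -> Sale:
    # Billing uses lowercase letters for gender and stores amounts in cents.
    return Sale(row["cust"], row["sex"].upper(), row["amt_cents"] / 100)


def from_orders(row: dict) -> Sale:
    # Orders uses 1/0 for gender (assumed here: 1 = male) and stores dollars.
    return Sale(row["customer_id"], "M" if row["gender"] == 1 else "F", row["amount"])


def integrate(billing: list[dict], orders: list[dict]) -> list[Sale]:
    """The up-front work a legacy report needs before any reporting can start."""
    return [from_billing(r) for r in billing] + [from_orders(r) for r in orders]


def total_by_customer(sales: list[Sale]) -> dict[str, float]:
    """The report itself: trivial once the data is integrated."""
    totals: dict[str, float] = defaultdict(float)
    for sale in sales:
        totals[sale.customer_id] += sale.amount_dollars
    return dict(totals)


if __name__ == "__main__":
    print(total_by_customer(integrate(BILLING_ROWS, ORDER_ROWS)))
```

Even in this toy case the conversion functions are longer than the report. With dozens of applications and hundreds of data elements, that ratio is what makes legacy integration a gargantuan task.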

The second reason developing reports in the data warehouse environment is efficient is metadata. Because the data warehouse environment has metadata, the DSS analyst can quickly and efficiently tell what data is available and where it resides in the warehouse, and so spends his/her time building reports, not chasing data down. In the legacy environment there is typically little organized metadata, so the DSS analyst spends an inordinate amount of time tracking data down rather than building reports. It takes a DSS analyst MUCH longer to start actual development in the legacy environment because of the need to research data.

In addition, because the data warehouse environment contains metadata, the DSS analyst can quickly find out whether a report has already been built and, if it has, wastes no time rebuilding it. But in the legacy environment, where there is little or no metadata, the DSS analyst has no real way of knowing whether a report has been built. The DSS analyst in the legacy environment may well be rebuilding a report that already exists, and this waste greatly increases the cost of reporting in the legacy environment.
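A minimal sketch of how metadata lets the analyst avoid both the data hunt and the rebuild, assuming a hypothetical in-memory `MetadataCatalog` that records where each data element lives and which reports already exist. Real warehouse metadata is far richer; here a report is identified simply by the set of data elements it uses, an assumption made to keep the example short. The table names are also invented.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MetadataCatalog:
    """Hypothetical warehouse metadata: where data lives and which reports exist."""

    # data element name -> warehouse table that holds it
    element_locations: dict[str, str] = field(default_factory=dict)
    # set of data elements a report uses -> name of the existing report
    reports: dict[frozenset[str], str] = field(default_factory=dict)

    def locate(self, element: str) -> Optional[str]:
        """Answer 'where is this data?' without hunting through applications."""
        return self.element_locations.get(element)

    def find_report(self, elements: set[str]) -> Optional[str]:
        """Return an existing report built over exactly these elements, if any."""
        return self.reports.get(frozenset(elements))

    def register_report(self, name: str, elements: set[str]) -> None:
        self.reports[frozenset(elements)] = name


def request_report(
    catalog: MetadataCatalog, name: str, elements: set[str]
) -> tuple[str, bool]:
    """Reuse an existing report if one exists; otherwise check the data and register it.

    Returns (report_name, newly_built).
    """
    existing = catalog.find_report(elements)
    if existing is not None:
        return existing, False  # already built: no development work needed
    missing = sorted(e for e in elements if catalog.locate(e) is None)
    if missing:
        raise LookupError(f"no metadata for data elements: {missing}")
    catalog.register_report(name, elements)
    return name, True


if __name__ == "__main__":
    catalog = MetadataCatalog(
        {"region": "dim_region", "revenue": "fact_sales", "customer_id": "dim_customer"}
    )
    print(request_report(catalog, "revenue_by_region", {"region", "revenue"}))
    print(request_report(catalog, "regional_revenue", {"revenue", "region"}))
```

The second request, made under a different name, returns `('revenue_by_region', False)`: the catalog recognizes the report as already built. That lookup is exactly what the legacy analyst, lacking metadata, cannot perform.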

Another reason reporting is easy in the data warehouse environment is that the warehouse holds a wealth of historical data. That historical foundation is exactly what the DSS analyst needs for many kinds of reporting and analysis. In the legacy environment there is a paucity of history. When the DSS analyst needs to build a report from historical data in the legacy environment, he/she has quite a task at hand. In some cases the historical data can be resurrected; in other cases it is beyond recovery. In either case it is a struggle for the DSS analyst in the legacy environment to come to grips with the need for historical data.

Yet another important difference that affects the cost of developing reports is the amount of summary data in the data warehouse versus the legacy environment. In the data warehouse, it is no big trick for the DSS analyst to find and reuse summary data that someone else has created, and reporting on summary data that already exists is very easy. In general, however, there is very little or no summary data in the legacy environment. And since there is also very little metadata in the legacy environment, the DSS analyst could not find summary data even if someone had created it. For these reasons, creating and reporting on summary data in the legacy environment is a challenge for the DSS analyst.

The obstacles to developing a report in the legacy applications environment are such that the report may not be producible at all, whereas building a report is a routine occurrence in the data warehouse.

But the cost of building and executing a report is only one aspect of how reporting differs between the two environments.

Another important difference, apart from the cost of development and execution, is speed of development. A report in the legacy environment usually takes a long time to develop, but a report in the data warehouse can be created very quickly.

And finally, there is the issue of changing the report. Creating a report is by its very nature an act of discovery. The DSS analyst who creates the report often has new ideas once the report becomes reality, so its creation is often merely the first act of the discovery process. The legacy environment is cumbersome to work with in the first place, and once the report is built, it is equally cumbersome to maintain.

The data warehouse environment, by contrast, is ideal for the discovery process: once a report is built in the warehouse, it is easy to maintain. So the initial build of the report is only the start of the cost comparison.