The Web-based eBusiness environment has tremendous potential; it is a remarkably powerful medium for delivery of information. But there is nothing intrinsically powerful about the Web other than its ability to deliver information. In order for the Web-based eBusiness environment to deliver its full potential, it requires an infrastructure in support of its information processing needs. The infrastructure that best supports the Web is called the Corporate Information Factory (CIF), which is centered around a data warehouse.
Figure 1: The basic infrastructure supporting the Web-based eBusiness environment.
The heart of the Corporate Information Factory is the data warehouse, where the corporation's granular, integrated, historical data resides. It serves many functions, the most important of which is making information available cheaply and quickly. Stated differently, without a data warehouse the cost of information goes sky high and the time required to get information is exceedingly long. If the Web-based eBusiness environment is to be successful, it needs information that is cheap to access and immediately available.
How is it that the data warehouse greatly lessens the cost of getting information? And how does the data warehouse greatly accelerate the speed with which information is available? These issues are not immediately obvious when looking at the structure of the Corporate Information Factory.
To explain how the data warehouse accomplishes these important functions, consider a seemingly innocent request for information in a manufacturing environment where there is no data warehouse. A financial analyst wishes to find out what corporate sales were for last quarter. Is this a reasonable request for information? Absolutely, it is. Now, what is required to get that information? Figure 2 shows that many different places have to be consulted in order to get the desired information.
Figure 2
Some of the data is in IMS. Other data is in VSAM. Yet other files are in ADABAS. The key structure of the European file is different from the key structure of the Asian file. The parts data uses different closing dates than the truck data. The body design for cars is called one thing in the cars file and another thing in the parts file. In order to get the information desired, much analysis is required and ten programs are needed to access and integrate the data. It takes six months to deliver the information and costs $250,000.
These numbers are typical for a mid-sized to large corporation. In some cases they are significantly understated.
The real issue is not the cost and time required to access a single unit of data; it is the resources required to access many units of information. Figure 3 illustrates a case in which seven different types of information have been requested.
Figure 3
The costs described for Figure 2 are now multiplied by seven (or by however many units of data are required). While the analyst develops the procedures for getting one unit of information, no thought is given to getting information for other units. Therefore, each time a new piece of information is required, the process described in Figure 2 is repeated. This raises the cost of information dramatically.
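The multiplication can be made concrete with a small cost model. This is a minimal sketch, not a method from the article: it takes the $250,000 and six-month figures from the Figure 2 scenario and assumes, as the text describes, that every new unit of information repeats the full access-and-integration effort. The function names are hypothetical, and the elapsed-time calculation additionally assumes the requests are handled one after another, which the article does not state.

```python
# Per-unit figures from the Figure 2 scenario (no data warehouse).
COST_PER_UNIT_NO_DW = 250_000   # dollars to deliver one unit of information
MONTHS_PER_UNIT_NO_DW = 6       # elapsed months to deliver one unit


def cost_without_warehouse(units: int, cost_per_unit: int = COST_PER_UNIT_NO_DW) -> int:
    """Hypothetical helper: with no shared infrastructure, each unit repeats
    the whole effort, so total cost grows linearly with the number of units."""
    if units < 0:
        raise ValueError("units must be non-negative")
    return units * cost_per_unit


def months_without_warehouse_sequential(units: int,
                                        months_per_unit: int = MONTHS_PER_UNIT_NO_DW) -> int:
    """Hypothetical helper. Assumption: requests are fulfilled one after another."""
    if units < 0:
        raise ValueError("units must be non-negative")
    return units * months_per_unit


if __name__ == "__main__":
    print(cost_without_warehouse(7))               # 1750000
    print(months_without_warehouse_sequential(7))  # 42
```

For the seven requests in Figure 3, the linear model gives $1,750,000, and, under the sequential assumption, 42 months of elapsed time.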
Now suppose the organization had a data warehouse. What is the cost and length of time for fulfilling a request for seven units of information? Figure 4 illustrates this scenario.
Figure 4
Once the data warehouse is built, it is able to serve multiple requests for information. The granular, integrated data that resides there is ideal for being shaped and reshaped: one analyst can look at the data one way, and another analyst can look at the same data in another way. The infrastructure needs to be created only once. To get a unit of data, such as consolidated sales, the financial analyst may spend half an hour, or, if the data is difficult to calculate, a day. Depending on the complexity and on how costs are calculated, accessing the data may cost between $100 and $1,000. Compare these costs to those incurred without a data warehouse, and it becomes obvious why a data warehouse makes data available quickly and cheaply.
Of course, the real difference between having a data warehouse and not having one is how many times the infrastructure required for accessing the data must be built. With a data warehouse you have to build the infrastructure only once. With no data warehouse you have to build at least part of it every time you want new data.
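The "build once" versus "build every time" trade-off can be expressed as a simple break-even calculation. This is a minimal sketch under stated assumptions: the $250,000 per-unit cost without a warehouse and the $100 to $1,000 per-unit cost with one come from the text, but the article gives no figure for building the warehouse, so `build_cost` is a hypothetical parameter, and the $2,000,000 used in the example is purely illustrative.

```python
# Figures from the text; the warehouse build cost is NOT given and is a parameter.
NO_DW_COST_PER_UNIT = 250_000        # dollars per unit without a data warehouse
DW_COST_PER_UNIT_RANGE = (100, 1_000)  # dollars per unit once the warehouse exists


def cumulative_cost_no_dw(units: int) -> int:
    """Without a warehouse, the infrastructure is rebuilt for every unit."""
    return units * NO_DW_COST_PER_UNIT


def cumulative_cost_with_dw(units: int, build_cost: int, per_unit: int) -> int:
    """With a warehouse, the (hypothetical) build cost is paid once,
    then each unit costs only per_unit."""
    return build_cost + units * per_unit


def break_even_units(build_cost: int, per_unit: int) -> int:
    """Smallest number of units at which the warehouse is no more expensive:
    build_cost + n * per_unit <= n * NO_DW_COST_PER_UNIT."""
    saving_per_unit = NO_DW_COST_PER_UNIT - per_unit
    if saving_per_unit <= 0:
        raise ValueError("warehouse per-unit cost must be below the no-warehouse cost")
    if build_cost < 0:
        raise ValueError("build_cost must be non-negative")
    # Integer ceiling division avoids floating-point rounding.
    return -(-build_cost // saving_per_unit)


if __name__ == "__main__":
    hypothetical_build_cost = 2_000_000  # illustrative assumption only
    low, high = DW_COST_PER_UNIT_RANGE
    print(break_even_units(hypothetical_build_cost, high))  # 9
    print(break_even_units(hypothetical_build_cost, low))   # 9
    print(cumulative_cost_with_dw(9, hypothetical_build_cost, high),
          cumulative_cost_no_dw(9))                          # 2009000 2250000
```

The point of the model is its shape rather than the specific numbers: the no-warehouse line rises by $250,000 with every new request, while the warehouse line rises by at most $1,000, so whatever the actual build cost, there is some number of requests beyond which the warehouse is cheaper.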
No organization needs just one piece of data; every organization requires many forms of data, and the need for new forms and structures of data arises anew every day. When it comes to looking at the larger picture, not the cost of a single item of data but the cost of all the data, the data warehouse greatly alleviates the burden placed on the information systems organization.
Figure 5: The difference between having a data warehouse and not having one when it comes to finding multiple types of data.
Looking at Figure 5, it becomes obvious that a data warehouse truly does lower the cost of getting information and greatly accelerates the rate at which data can be found.
Organizations have a habit of not looking at the larger picture. Many organizations focus only on the here and now; they look as far as next Tuesday and not an hour beyond it. So what do short-sighted organizations see? They see a comparison between the data warehouse infrastructure and the need for a single unit of information. Figure 6 shows this comparison.
Figure 6
In Figure 6, the short-term approach of not building a data warehouse appears attractive. The organization thinks only of the immediate horizon, where it is less expensive just to dive in and get data from the applications without building a data warehouse. The corporation has a hundred excuses for not looking to the long term:
The data warehouse is so big
We heard that data warehouses don't really work
All we need is some quick and dirty information
I don't have time to build a data warehouse
If I build a data warehouse and pay for it, one of my neighbors will use the data later on without having to pay for it, and so forth.
As long as a corporation insists on nothing but a short-term focus, it will indeed never build a data warehouse. But the minute the corporation takes a long-term focus, it sees an entirely different picture, as illustrated in Figure 7.
Figure 7
Figure 7 shows that when the long-term needs for information are considered, the data warehouse is far and away less expensive than a series of short-term efforts. The time needed to access information is an intangible whose worth is difficult to measure, but no one would dispute that information today, right now, is much more effective than information six months from now. In fact, six months from now I will have forgotten why I wanted the information in the first place. You simply cannot beat a data warehouse for speed and ease of access to information.
The Web environment, then, is a most promising one. To unlock its potential, information must be freely and cheaply available. The supporting infrastructure of the data warehouse provides that foundation and is at the heart of the effectiveness of the Web environment.