Corporate Information Factory (CIF) Resources by Bill Inmon, Inmon Data Systems

Charge Back in the Data Warehouse Environment

As data warehousing matures as a discipline, the amount of money spent on it becomes significant. With the rise in costs comes an awareness that charge back for DSS data warehouse processing is a necessary and highly beneficial practice. If it does nothing else, charge back raises the end user's awareness of the resources being consumed.

Charge back in the data warehouse environment is based on two basic measurements: requests submitted and total rows of data returned. Other system factors can also be measured, such as CPU time, line transmission time, and response time. But the two basic parameters, requests submitted and total rows returned, suffice to measure the billable activity flowing through the data warehouse environment.

As a rule, the activities occurring inside the data warehouse are gathered and reported by an activity monitor or a data usage tracker. The data usage tracker sits either at the end-user workstation, gathering statistics about query activity, or at the data warehouse server, as the query activity passes into and out of the server. Activity is collected by department, so a department identification must be attached to each request and to each set of returned data as it flows back to the end user.
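A minimal Python sketch of the department-level collection described above, assuming a tracker that is notified once when a request enters the server and once when its result set flows back. The names `UsageTracker`, `DepartmentUsage`, `record_request`, and `record_result` are hypothetical, as are the department codes; real activity monitors expose their own interfaces.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DepartmentUsage:
    """The two billable measurements accumulated for one department."""
    requests: int = 0
    rows_returned: int = 0


class UsageTracker:
    """Hypothetical data usage tracker keyed by department identification."""

    def __init__(self):
        # A department seen for the first time starts with zero requests and rows.
        self.usage = defaultdict(DepartmentUsage)

    def record_request(self, department_id: str) -> None:
        # Called as a request passes into the data warehouse server.
        self.usage[department_id].requests += 1

    def record_result(self, department_id: str, row_count: int) -> None:
        # Called as the result set flows back to the end user.
        self.usage[department_id].rows_returned += row_count


tracker = UsageTracker()
tracker.record_request("FINANCE")
tracker.record_result("FINANCE", 1200)
tracker.record_request("MARKETING")
tracker.record_result("MARKETING", 40)
tracker.record_request("FINANCE")
tracker.record_result("FINANCE", 300)
print(tracker.usage["FINANCE"])  # DepartmentUsage(requests=2, rows_returned=1500)
```

Because the department identification travels with both the request and the returned rows, the two counters for a department can be updated independently and still end up on the same bill.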

Charge back processing must be tailorable across the corporation by means of a charge back algorithm. Factors that the customized charge back algorithm should be able to take into account include:

  • the time of day the request is submitted,
  • the day of the week the request is submitted,
  • the week of the month the request is submitted,
  • the priority of the request,
  • whether the request can be run overnight, and so forth.
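One way to fold factors like these into a tailorable algorithm is to compute a multiplier that is applied to a request's base charge. The Python sketch below is purely illustrative: the peak hours (8:00 to 18:00 on weekdays), the treatment of weeks four and five of the month as a busy month-end period, the `rate_multiplier` function, and every multiplier value are assumptions chosen only to show the mechanics.

```python
from datetime import datetime


def rate_multiplier(submitted: datetime, priority: str, deferrable: bool) -> float:
    """Hypothetical multiplier for a request's base charge.

    All factor values are illustrative assumptions, not recommended rates.
    """
    multiplier = 1.0

    # Day of week and time of day: weekends are discounted,
    # weekday business hours (assumed 8:00-18:00) are peak.
    if submitted.weekday() >= 5:  # Saturday=5, Sunday=6
        multiplier *= 0.5
    elif 8 <= submitted.hour < 18:
        multiplier *= 1.5

    # Week of the month: assume weeks 4 and 5 are a busy month-end period.
    week_of_month = (submitted.day - 1) // 7 + 1
    if week_of_month >= 4:
        multiplier *= 1.25

    # Priority of the request.
    if priority == "high":
        multiplier *= 2.0

    # Requests that may be deferred to an overnight run are discounted.
    if deferrable:
        multiplier *= 0.5

    return multiplier


print(rate_multiplier(datetime(2024, 3, 5, 10), "normal", False))  # Tuesday morning: 1.5
print(rate_multiplier(datetime(2024, 3, 26, 20), "normal", True))  # late-month evening, deferrable: 0.625
```

Keeping each factor as a separate multiplicative term lets an administrator add, drop, or retune one factor without disturbing the others.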

One of the most important features of a tailorable charge back algorithm for the data warehouse DSS environment is "thresholding." Thresholding is the practice of setting one schedule of charges for one level of activity and another schedule of charges for a different level of activity. For example, the data warehouse administrator may specify that the first fifty requests per month are free and that the first ten thousand rows returned are also free. Beyond those thresholds, there may be a charge of $5.00 per request and $0.01 per row of data returned.
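The thresholding example can be sketched in Python as follows. The thresholds and rates come from the example in the text; the function name `monthly_charge_cents` is hypothetical, and holding money in integer cents (to avoid floating-point rounding) is an implementation assumption.

```python
# Thresholds and rates from the example; amounts are held in integer cents.
FREE_REQUESTS = 50        # first fifty requests per month are free
FREE_ROWS = 10_000        # first ten thousand rows returned are free
CENTS_PER_REQUEST = 500   # $5.00 per request beyond the threshold
CENTS_PER_ROW = 1         # $0.01 per row beyond the threshold


def monthly_charge_cents(requests: int, rows_returned: int) -> int:
    """Charge for one department's month of activity under thresholding."""
    # Only activity above each threshold is billable.
    billable_requests = max(0, requests - FREE_REQUESTS)
    billable_rows = max(0, rows_returned - FREE_ROWS)
    return billable_requests * CENTS_PER_REQUEST + billable_rows * CENTS_PER_ROW


print(monthly_charge_cents(40, 8_000))    # 0: both measurements under threshold
print(monthly_charge_cents(60, 12_000))   # 7000 cents, i.e. $70.00
```

A department that submits 60 requests and receives 12,000 rows in a month pays only for the 10 requests and 2,000 rows above the thresholds: 10 × $5.00 + 2,000 × $0.01 = $70.00.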

Thresholding is important in the data warehouse environment because the end user must be allowed to experience the discovery process associated with DSS and the data warehouse. As long as the end user thinks that queries are going to be expensive, the end user may not venture into the data warehouse. But if the first few queries are free, the end user can delve into the data warehouse without fear of spending too much. Once the end user becomes a frequent user, the charges can become regular; by that time, the end user has completed the discovery process. The effect of thresholding, then, is to "prime the pump" for DSS data warehouse usage.

Charge back in the data warehouse environment can be done in terms of real money or "funny money." Even when it is done in funny money, charge back still raises the end user's consciousness of how resources are being spent.

One of the long-term benefits of charge back in the data warehouse environment is that the end user's awareness of resource consumption leads the end user to access data at progressively higher levels of summarization. When most end users first see a data warehouse, they gravitate toward querying at the lowest level of granularity possible, and this low level of granularity very quickly becomes their security blanket. Over time, end users learn that data resides at different levels of summary within the data warehouse, and that for many kinds of processing it is much cheaper and much faster to access and analyze data at higher levels of summary. Charge back is one of the most important factors leading the end user to this conclusion.
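The effect of summarization on the rows-returned measurement can be illustrated with a hypothetical Python example: one year of daily sales for 200 stores at the detail level, compared with the same data summarized by store and month. The data, the $0.01-per-row rate, and the helper `summarize_by_month` are assumptions. In a real warehouse the summary level would already be stored, so the end user's query would return only the summary rows; thresholds are ignored here for simplicity.

```python
from collections import defaultdict
from datetime import date, timedelta

CENTS_PER_ROW = 1  # hypothetical rate: $0.01 per row returned


def summarize_by_month(rows):
    """Roll daily detail rows up to one row per store per month (hypothetical helper)."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["store"], row["day"].month)] += row["sales"]
    return [
        {"store": store, "month": month, "sales": sales}
        for (store, month), sales in sorted(totals.items())
    ]


# Hypothetical detail data: one row per store per day for calendar year 2023.
detail = [
    {"store": store, "day": date(2023, 1, 1) + timedelta(days=offset), "sales": 100}
    for store in range(200)
    for offset in range(365)
]
summary = summarize_by_month(detail)

# Under a rows-returned charge, cost scales with the number of rows delivered.
detail_cost = len(detail) * CENTS_PER_ROW
summary_cost = len(summary) * CENTS_PER_ROW
print(f"detail:  {len(detail)} rows, ${detail_cost / 100:.2f}")
print(f"summary: {len(summary)} rows, ${summary_cost / 100:.2f}")
```

The detail-level query returns 73,000 rows ($730.00 at the assumed rate), while the monthly summary returns 2,400 rows ($24.00) and carries the same total sales, which is the kind of difference charge back makes visible to the end user.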