Corporate Information Factory

Ask Bill (Q & A)

Below is a new collection of our readers' questions, each followed by Bill's answer. If you have a data warehousing or Corporate Information Factory question for Bill Inmon, please contact us. We will do our best to address all questions, but do not guarantee an answer. Those we do answer will be posted in this section, without identifying names or circumstances. It may take several weeks before a question is posted.

 

Q: I have been studying your model for the Corporate Information Factory and the Web and there is one area which I don't see covered. In our business, we find that we want to integrate Web-based transactions with paper-based transactions from legacy systems. We want a single set of reports which consolidates both and we want to generate a consolidated extraction file from both for further back-end processing, such as for making credit card payments to third parties, etc. I would have expected that the Web would feed the ODS that is used by other applications. Why did you choose not to have the Web feed the ODS?

A: It is possible to feed the ODS directly from the Web environment but it is usually not a good idea. Here's why:

  1. The granularity in the ODS is at a much higher level than that found in the Web site.
  2. The ODS typically contains "profile" data. The profile data is an amalgamation of a lot of historical data. Typical of a profile record is:
    • Jane Smith
      • Age - 35
      • Gender - female
      • Professional
      • Likes
        • Art
        • Champagne
        • Caviar
        • Old restored Chevys
      • Lives in the suburbs

    It took a lot of detailed data to make this profile record. To put the Web data directly into the ODS profile record requires that the granularity of the Web data be raised considerably.

  3. The data warehouse holds detailed historical data. The ODS holds capsulized, profile data. If you put the Web data directly into the ODS, there is a good chance that important detailed Web data will never find its way to the data warehouse.
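
To make the granularity point concrete, here is a minimal sketch of raising detailed Web events to a profile-level summary while keeping the detail for the data warehouse. The event layout, the `build_profiles` helper, and the sample values are assumptions made for illustration only; they do not come from any particular product or site.

```python
from collections import Counter, defaultdict

# Hypothetical detailed Web events: (visitor, page category viewed).
web_events = [
    ("jane", "art"),
    ("jane", "art"),
    ("jane", "champagne"),
    ("jane", "art"),
    ("jane", "champagne"),
    ("jane", "cars"),
    ("bob", "cars"),
]

def build_profiles(events, top_n=2):
    """Summarize detailed events into one profile record per visitor."""
    counts = defaultdict(Counter)
    for visitor, category in events:
        counts[visitor][category] += 1
    return {
        visitor: {"likes": [cat for cat, _ in counter.most_common(top_n)]}
        for visitor, counter in counts.items()
    }

# The detail is kept whole for the data warehouse;
# only the summarized profile would be sent on to the ODS.
warehouse_detail = list(web_events)
ods_profiles = build_profiles(web_events)
print(ods_profiles)
```

The profile holds only the summary of what each visitor likes, so every individual event would be lost if the Web data went straight to the ODS without first landing in the warehouse.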

Q: I have been reviewing Bill's articles on the ODS and need some clarification. In several articles, Bill states that: "The ODS allows the user to have... update capabilities." Does this statement mean that a user can update data directly in the ODS (bypassing a transaction system)? If the answer to this is yes, can you please give me an example when this might be warranted? If the answer is no, can you please elaborate as to the context of this statement?

A: Although it is unusual, it is possible to update data directly in an ODS. As an example, years ago CIBC built a risk management data warehouse and system. When Quebec voted on whether to secede from Canada, the results of the vote were fed in real time into the system. In doing so, CIBC was able to check on the currency position of the bank on the fly, literally, as the results were coming in.

I haven't seen many ODS where direct update is done, but it is legitimate.


Q: I just need to know what we should do when our users have complicated or multi-table joins and ask us to create a new table to include all the joined tables' data. We have been getting requests to continually add columns from the Atomic level tables into these new tables (also Atomic level), which hold just a subset of the same fields. It is duplicating the fields across these tables. Any suggestions?

A: There is nothing wrong with duplicating data under the right circumstances. Those circumstances are:

  • The data is used frequently in the form that has been duplicated
  • The volume of data being duplicated is of modest size
  • There is only one place where the data is always copied from. That is to say, there is an ardently enforced system of record.

Under these conditions duplicating data can be a good thing to do.
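
As a rough illustration of the third condition, the sketch below rebuilds a duplicated, pre-joined table only from the atomic tables, so that there is a single place the copy always comes from. The table names, columns, and the `refresh_customer_orders` helper are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical atomic tables that act as the system of record.
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Jane Smith'), (2, 'John Doe');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

def refresh_customer_orders(conn):
    """Rebuild the duplicated table from the atomic tables only."""
    conn.executescript("""
        DROP TABLE IF EXISTS customer_orders;
        CREATE TABLE customer_orders AS
        SELECT o.order_id, c.customer_id, c.name, o.amount
        FROM orders AS o
        JOIN customer AS c ON c.customer_id = o.customer_id;
    """)

refresh_customer_orders(conn)
rows = conn.execute(
    "SELECT order_id, name, amount FROM customer_orders ORDER BY order_id"
).fetchall()
print(rows)
```

Because the copy is always regenerated from the same source and never edited by hand, users get the joined form they ask for without the copy drifting away from the atomic data.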


Q: We'd like your comments on our hypotheses for our thesis topic: "Why are there so few EDWs in Sweden among larger Swedish enterprises?"
  1. There is not enough competition in Sweden
  2. The companies are too short-sighted in their investments
  3. The companies' organizations are too functionally oriented
  4. They are not mature enough in terms of IT
  5. They are not used to creating a common business case across boundaries
  6. The idea and engagement seldom come from the top leaders

A: I can't speak directly for Sweden as I have had the pleasure of being there only once for less than 24 hours. However, I can confirm your suspicions.

Data warehousing has taken off where there is competition. Banks, telephone companies, manufacturers, retailers and more around the world have adopted data warehousing. They have found that data warehousing leads to greater revenue and the ability to attract and hold onto market share. Data warehouses speak to the very essence of competition and make the organization a much healthier one.

Where there is little or no competition, data warehousing has been slow to be adopted, as government circles show. In government, people are rewarded for expanding their budget and for increasing the number of people that work for them. In this environment the data warehouse doesn't add very much.

And many companies are short-sighted. They don't want to make their company more efficient and prosperous. Eventually the marketplace catches up to these companies and it is always painful when it does.

And as you point out, maturity is a big factor. Many IT organizations have what I call a "heads down" approach. They look at the minutiae. I think this stems from choosing IT managers from programmers. By definition, programmers have to dwell in details. Programmers simply don't spend much, if any, time looking at the larger picture. Until you step back, look up instead of down and see the larger picture of what is going on, you never know that you need a data warehouse.


Q: We're building a huge data warehouse and I'm trying to find stats on the largest currently existing data warehouse. Do you have any?

A: I am aware of some large data warehouses that are out there. Some of the largest ones that I am aware of include:

  • AT&T call-level detail, Piscataway
  • US meteorological service
  • Wal-Mart
  • Boeing Computers inventory warehouse
  • US IRS
  • US intelligence community

In general, people find that when they start to build really large warehouses, the bulk of the data goes on non-disk storage and only the actively accessed data goes on high-performance disk storage. The nature of the warehouse changes when you do this, all for the better.


Q: There is a wealth of information available on your site. However, I was unable to find information relating to DW metrics, and metrics that would relate to NPV, IRR, ROI, etc. Could you please help me in this regard?

A: We too noticed that there is not much in the way of metrics for our industry. We are busy building surveys that do exactly that. As soon as we have enough survey information collected to warrant an announcement, we will make that information available to the community.


Q: Are companies still building new data warehouses? Or mainly re-working existing data warehouses? What is the length of time for building a data warehouse or mart? At what cost? Are companies moving towards "pre-packaged" DWs?

A: We too noticed that there is a dearth of "real" information in the marketplace. To that end we are surveying the world. This takes some time, however. Once we have gathered enough information from our surveys, we will make it publicly available.