found id
Just when you think there is nothing new under the sun, out pops a jackrabbit and teaches you a lesson in being short- sighted. Just the other day I learned something new about database design when I thought I had seen it all. After spending a professional lifetime in the database design arena and being a participant in the industry's database debates, I thought there was nothing new left to see. But I was wrong.
In order to appreciate what's new in database design, a review of the past is in order. Figure 1 is more or less representative of the different types of database design that have been popular over the years.
Database design began with Ed Yourdon and Tom DeMarco and structured analysis and design. In the very early days of the computer industry it was thought that database design was a by-product of functionality. Get the process right and the database design will follow. For the systems that were being built in the early days, this "master file" approach worked just fine. Given the chaos of development that preceded structured analysis and design, the technique for database design - "master files" - was a significant improvement. However, once very large systems came on the horizon and once a design beyond anything very large appeared, the master file approach soon proved to be inadequate.
The world was not a finite place with a fixed set of requirements. The minute that the structured analysis approach was extended over anything but a finite set of requirements, the system design yielded very complex and questionable results. In short, the "database as a by-product of function - master file" approach went by the way side.
Looking across the enterprise at an integrated approach to database design were James Martin, Clive Finklestein and others with the ER (entity relationship) approach. The ER techniques yielded a subject-oriented approach to database design. In the subject-oriented approach, data was integrated across the enterprise into subject-oriented databases. The ER approach was an elegant approach that solved many of the shortcomings of the master file approach.
Closely related to the subject-oriented ER approach was the relational approach, fostered by Ted Codd. The relational approach was an academic approach in which data was organized according to the "natural" relationships that occurred among different attributes of data. The relational approach coincided quite nicely with the ER approach of Martin. In some respects, the ER approach was the same as the relational approach except that the ER approach was a top-down approach while the relational approach was a bottom-up approach. There was indeed much affinity between the two approaches and today, for all practical purposes, the two approaches are intellectually merged.
Concurrent with the ER and relational approaches to database design was the design work that was being done for online transaction processing systems. The epitome of the OLTP data base design was the "root only" database. The term "root only" refers to the fact that data is compacted into a single physical structure so that it is very, very efficient to access. Any given unit of data in a root only database can be accessed in a single I/O. The result is very high performance transaction processing. Data is denormalized and compacted into a simple structure so that all the data that is needed is all in the structure when accessed. Root only databases are hyper-efficient to access, but the database design of a root only database is quite different from the database design of anything else. Lots of different kinds of data are held tightly in anticipation of a need for minimal I/O upon access. The result is very good transaction performance. If there is an original proponent of the root only database design approach it would have to be Jim Sheetz
After the world discovered data warehouse and the need for DSS, another form of database design was advanced. That form of design is called "star join" design. Star join design is optimal for the flexible access of data. Star joins entail structures called fact tables and dimensions. Unlike a root only design that is very streamlined and simple in demeanor, a star join design is very ornate, with many different complex facets. On occasion, several stars can be merged together to create what is known as a snowflake. The way that the designer knows to create a star join is by the gathering and assimilation of requirements. The requirements of how data is to be used and accessed shape the star join and determine what is the optimal design and what is not the optimal design. Ralph Kimball is undoubtedly the driving force behind the star join approach to database design.
So each of the approaches to database design have an optimal structure for their intended purpose. Each was designed to fulfill a unique function in an efficient manner. Some design approaches are more efficient at doing one thing or the other and other design approaches are optimal at doing something else. Whether they realize it or not, each design approach represents a trade- off.
By optimizing one aspect of their environment, other aspects are de-optimized. A brief summary of what is optimized and what is de-optimized might look like:
|
Database Design |
Optimizes |
De-optimizes |
|
Master file design |
Functional design |
Iintegration, transaction performance |
|
Integrated, subject-oriented design |
Integration of data, flexibility |
Online transaction processing |
|
Relational design |
Rules for structuring of data, flexibility |
Online transaction processing |
|
Root only design |
Online transaction processing |
Flexibility needed for DSS |
|
Star join design |
Flexibility and accessibility of DSS |
Update and OLTP access |
And now there is something new under the sun. I first saw this approach at a start up company called Kalido on a recent trip to London. In order to frame the context for Kalido, consider yourself to be a person building a data warehouse for a global corporation. There are two very basic problems you face:
Data warehouses contain data over time. This means that over time, change is inherent to your data and data structure. This is true for any data warehouse, global or otherwise, and
Given that a global warehouse represents data from many different countries, change is constant in a global data warehouse.
These two factors are simply a statement of reality. The data warehouse designer for a global data warehouse must address these issues if there is to be any hope of success.
So what is wrong with using a database design that is traditional? Can't you build a global database using a traditional approach to database design? Of course you can, but if you do, change will eat you alive.
Consider this: You have built and populated a global data warehouse using traditional database design techniques. Now you find that the government of Peru has reorganized. There is a new way of taxing oil and the government of Peru issues an edict that classifications of oil be changed. The data that is already in your global data warehouse has to be taken out of the warehouse, reclassified, and reentered in the data warehouse. This is a lot of work and causes your global data warehouse administrators to spend a lot of weekends at the office. At approximately the same time the state of Yucatan in Mexico decides that oil exploration needs to be encouraged. In order to make that encouragement, the state of Yucatan creates an oil depletion allowance. In order to remain consistent, all the data in the global data warehouse that has been entered must be revisited to determine if the oil depletion allowance applies to it. The global data warehouse administrators spend more sleepless nights pulling data out of the global data warehouse, massaging it, and putting it back in.
About a week later OPEC decides that for Europe there is to be a slowdown in production. The legal quotas that have been in place for three years now all need to be changed. Then a new oil tanker comes into existence that can carry oil at a cheaper rate. Oil transport rates need to be changed. Then there is a revolution in Peru and the rules change again. Then oil is discovered in the Red Sea. When will it all end? (The answer is - it won't.)
There is a never ending series of changes around the world, all of which, with classical database design techniques, mean that the database administrators and database designers for a global data warehouse environment have a fully- employed future. In fact, the company that is serious about building and maintaining a global data warehouse has its hands more than full.
So enter an innovative approach to database design taken by Kalido. The database designers at Kalido have taken the approach that classical database design techniques just won't work for a global data warehouse. And they are right. What the database designers at Kalido have done is to optimize the design of data on a different set of parameters. They have recognized that change is the beast to slay for a global data warehouse. So they have done essentially two things. They have:
Separated unstable data from stable data. Data like transaction data that is stable as long as it is recorded properly is separated from data that is unstable, such as reference and organization chart data, and
For the unstable data, Kalido has added a dimension of time variancy. Time variancy means that unstable data does not have to be converted every time there is a change in the world. Instead, with time variancy the change is noted by adding a new definition at a newer moment in time. But going back in time is still accurate and a viable thing to do.
The net result is that a global data warehouse can be built which does not:
Require an eternal maintenance effort by an army of database administrators,
Allows new changes to be accommodated simply and with little fanfare,
Allows older data to remain intact and as valid as it was the day it was first recorded.
In order to achieve this rather remarkable state of affairs, Kalido has had to reinvent database design. The old structures simply do not suffice.
So the next time someone says that the last book has been written on database design, introduce them to the problems of a global data warehouse.