Building the Data Warehouse: Chapter 1 Summary

The data warehouse requires an architecture that begins by looking at the whole and then works down to the particulars. Certainly, details are important throughout the data warehouse. But details are important only when viewed in a broader context.

The Evolution

Decision support systems (DSS)

Magnetic tapes were good for storing large volumes of data cheaply, but their drawback was that they had to be accessed sequentially.

===============================================================

The proliferation of master files and redundant data presented some very insidious problems:

The need to synchronize data upon update

The complexity of maintaining programs

The complexity of developing new programs

The need for extensive amounts of hardware to support all the master files

================================================================

The Advent of DASD

Disk storage was fundamentally different from magnetic tape storage in that data could be accessed directly on a DASD.

===============================================================

The purpose of the database management system (DBMS) was to make it easy for the programmer to store and access data on a DASD. In addition, the DBMS took care of such tasks as storing data on a DASD, indexing data, and so forth.

PC/4GL Technology

Enter the Extract Program

The extract program is the simplest of all programs. It rummages through a file or database, uses some criteria for selecting data, and, on finding qualified data, transports the data to another file or database.
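The behavior of an extract program described above (scan a file, apply selection criteria, move qualifying data elsewhere) can be sketched in a few lines. This is a minimal illustration, assuming an in-memory list stands in for the source file and another list for the target; the `Record` layout and the `extract` function are hypothetical names, not from the source.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Record:
    # Hypothetical record layout, for illustration only.
    account_id: str
    region: str
    balance: float


def extract(source: Iterable[Record],
            criteria: Callable[[Record], bool],
            target: List[Record]) -> int:
    """Scan every record in source, keep those that satisfy criteria,
    and append them to target. Returns the number of records moved.
    The source itself is left untouched."""
    moved = 0
    for record in source:          # rummage through the whole file
        if criteria(record):       # apply the selection criteria
            target.append(record)  # transport the qualified record
            moved += 1
    return moved


operational = [
    Record("A-1", "east", 120.0),
    Record("A-2", "west", 15.5),
    Record("A-3", "east", 980.0),
]
analysis_file: List[Record] = []
count = extract(operational, lambda r: r.region == "east", analysis_file)
print(count, [r.account_id for r in analysis_file])  # 2 ['A-1', 'A-3']
```

Because the analysis file is a separate structure, queries against it no longer compete with the online system for the same data, which is the first reason given below for the extract program's popularity.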

The extract program became very popular for at least two reasons:

Because extract processing can move data out of the way of high performance online processing, there is no conflict in terms of performance when the data needs to be analyzed en masse.

When data is moved out of the operational, transaction-processing domain with an extract program, a shift in control of the data occurs: once end users take control of the data, they own it. For these (and probably a host of other) reasons, extract processing was soon found everywhere.

The Spider Web

The larger and more mature the organization, the worse the problems of the naturally evolving architecture become.

Problems with the Naturally Evolving Architecture

The naturally evolving architecture presents many challenges, such as:

Data credibility:

No time basis of data

The algorithmic differential of data

The levels of extraction

The problem of external data

No common source of data from the beginning

Productivity:

Locate and analyze the data for the report.

Compile the data for the report.

Get programmer/analyst resources to accomplish these two tasks.

Inability to transform data into information:

The systems found in the naturally evolving architecture are simply inadequate for supporting information needs. They lack integration and there is a discrepancy between the time horizon (or parameter of time) needed for analytical processing and the available time horizon that exists in the applications.

A Change in Approach

Primitive data and derived data:

============================================================

Following are some of the differences between the two.

Primitive data is detailed data used to run the day-to-day operations of the company. Derived data has been summarized or otherwise calculated to meet the needs of the management of the company.

Primitive data can be updated. Derived data can be recalculated but cannot be directly updated.

Primitive data is primarily current-value data. Derived data is often historical data.

Primitive data is operated on by repetitive procedures. Derived data is operated on by heuristic, non-repetitive programs and procedures.

Operational data is primitive; DSS data is derived. Primitive data supports the clerical function. Derived data supports the managerial function.

=============================================================

Primitive data and derived data are so different that they do not reside in the same database or even the same environment.
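The distinction between primitive and derived data can be illustrated with a small sketch: detailed sales records are the primitive, updatable data, and a monthly revenue summary is derived data that is only ever recalculated, never edited directly. The `Sale` record and the `monthly_revenue` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Sale:
    # Primitive data: one detailed, updatable operational record.
    order_id: int
    product: str
    amount: float
    sold_on: date


def monthly_revenue(sales: List[Sale]) -> Dict[str, float]:
    """Derived data: a summary recalculated from primitive records.
    There is no way to update it directly; to change it, change the
    detail records and recompute."""
    totals: Dict[str, float] = {}
    for s in sales:
        key = s.sold_on.strftime("%Y-%m")
        totals[key] = totals.get(key, 0.0) + s.amount
    return totals


sales = [
    Sale(1, "widget", 100.0, date(2024, 1, 5)),
    Sale(2, "gadget", 250.0, date(2024, 1, 20)),
    Sale(3, "widget", 75.0, date(2024, 2, 3)),
]
print(monthly_revenue(sales))  # {'2024-01': 350.0, '2024-02': 75.0}

sales[0].amount = 120.0        # the clerical function updates primitive data
print(monthly_revenue(sales))  # {'2024-01': 370.0, '2024-02': 75.0}
```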

The Architected Environment


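The architected environment has four levels of data: operational, data warehouse, departmental (data mart), and individual. These different levels of data are the basis of a larger architecture called the corporate information factory (CIF).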

The operational level of data holds application-oriented primitive data only and primarily serves the high-performance transaction-processing community.

The data warehouse level of data holds integrated, historical primitive data that cannot be updated. In addition, some derived data is found there.

The departmental or data mart level of data contains derived data almost exclusively. It is shaped by end-user requirements into a form specifically suited to the needs of the department.

Although data in the data mart certainly relates to data at the operational level or in the data warehouse, it is fundamentally different from data warehouse data, because data mart data is denormalized, summarized, and shaped by the operating requirements of a single department.

The individual level of data is where much heuristic analysis is done.
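The flow of data through the four levels can be sketched as follows. This is a minimal illustration under simplifying assumptions: plain Python dictionaries and lists stand in for databases, customer balances are the only data, and the names `load_snapshot` and `finance_mart` are hypothetical. The point is the shape of each level: the operational level is updated in place, the warehouse is append-only and time-stamped, the mart is derived for one department, and the individual level runs ad hoc analysis on top.

```python
from datetime import date
from typing import Dict, List, Tuple

# Level 1, operational: current-value, application-oriented primitive data.
operational: Dict[str, float] = {"C-1": 500.0, "C-2": 1200.0}

# Level 2, data warehouse: integrated, historical primitive data.
# Rows are only ever appended, never updated in place.
warehouse: List[Tuple[date, str, float]] = []


def load_snapshot(as_of: date) -> None:
    """Append a time-stamped copy of the operational values to the warehouse."""
    for customer, balance in operational.items():
        warehouse.append((as_of, customer, balance))


# Level 3, data mart: derived data shaped for one department
# (here a hypothetical finance view: total balance per snapshot date).
def finance_mart() -> Dict[date, float]:
    totals: Dict[date, float] = {}
    for as_of, _customer, balance in warehouse:
        totals[as_of] = totals.get(as_of, 0.0) + balance
    return totals


load_snapshot(date(2024, 1, 31))
operational["C-1"] = 650.0  # operational data is updated in place
load_snapshot(date(2024, 2, 29))

# Level 4, individual: ad hoc, heuristic analysis on top of the mart.
mart = finance_mart()
growth = mart[date(2024, 2, 29)] - mart[date(2024, 1, 31)]
print(growth)  # 150.0
```

Note that the operational level only ever knows the current balance of C-1, while the warehouse retains both values, which is what makes the month-over-month comparison at the individual level possible.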

==============================================================

Some people believe the architected environment generates too much redundant data. Though it is not obvious at first glance, this is not the case at all. Instead, it is the spider web environment that generates the gross amounts of data redundancy.

=============================================================

Note that the records in the data warehouse do not overlap. Also note that there is some element of time associated with each record in the data warehouse.
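The two properties noted above, that every warehouse record carries an element of time and that a key's records do not overlap, can be expressed as a check. This is a minimal sketch, assuming each record stores a half-open validity period `[valid_from, valid_to)`; the `AddressRecord` type, the `overlaps` and `address_as_of` functions, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class AddressRecord:
    # Hypothetical warehouse record: each row carries its own time element.
    customer: str
    address: str
    valid_from: date
    valid_to: date  # exclusive end of the period this row describes


def overlaps(rows: List[AddressRecord]) -> bool:
    """True if any two rows for the same customer cover the same moment.
    After sorting by start date, checking neighbours is sufficient."""
    by_customer: Dict[str, List[AddressRecord]] = {}
    for r in rows:
        by_customer.setdefault(r.customer, []).append(r)
    for history in by_customer.values():
        history.sort(key=lambda r: r.valid_from)
        for prev, nxt in zip(history, history[1:]):
            if nxt.valid_from < prev.valid_to:
                return True
    return False


def address_as_of(rows: List[AddressRecord], customer: str,
                  when: date) -> Optional[str]:
    """Because periods never overlap, at most one row matches a given date."""
    for r in rows:
        if r.customer == customer and r.valid_from <= when < r.valid_to:
            return r.address
    return None


history = [
    AddressRecord("C-100", "123 Main St", date(1986, 1, 1), date(1989, 6, 1)),
    AddressRecord("C-100", "456 Oak Ave", date(1989, 6, 1), date(1993, 3, 1)),
]
print(overlaps(history))                                  # False
print(address_as_of(history, "C-100", date(1990, 1, 1)))  # 456 Oak Ave
```

Using half-open periods lets one record end on exactly the date the next begins without counting as an overlap.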

Data Integration in the Architected Environment


Who Is the User?

It is important to peer inside the head of the DSS analyst and view how he or she perceives the use of the data warehouse. The DSS analyst has a mindset of “Give me what I say I want, and then I can tell you what I really want.” In other words, the DSS analyst operates in a mode of discovery. Only on seeing a report or seeing a screen can the DSS analyst begin to explore the possibilities for DSS.

The Development Life Cycle

The operational environment is supported by the classical systems development life cycle (SDLC). The SDLC is often called the "waterfall" development approach because the different activities are specified in advance, and each activity, upon its completion, spills down into the next activity and triggers its start.

The data warehouse, by contrast, is built with a different life cycle, the CLDS (SDLC spelled backward), which starts with data. Once the data is in hand, it is integrated and then tested to see what bias there is to the data, if any. Programs are then written against the data. The results of the programs are analyzed, and finally the requirements of the system are understood. Once the requirements are understood, adjustments are made to the design of the system, and the cycle starts all over again for a different set of data. Because the development life cycle is constantly reset for different types of data, the CLDS approach is usually called a "spiral" development methodology.

Patterns of Hardware Utilization
