Original: What goes around comes around (PDF)
The following are quotes and notes taken from the cited paper above. If in a hurry, it suffices to skim all the "lessons learned". However, the content is entertaining and well worth a read.
IMS
Takeaways
Lesson 1: Physical and logical data independence are highly desirable
Lesson 2: Tree structured data models are very restrictive
Lesson 3: It is a challenge to provide sophisticated logical reorganizations of tree structured data
Lesson 4: A record-at-a-time user interface forces the programmer to do manual query optimization, and this is often hard.
Notes
- Hierarchical data model (tree)
- Record type: collection of named fields with associated datatypes.
- Instance of record type obeys definition.
- Each record type has a key (some subset of named fields).
- Each record type, other than the root, has a unique parent record type.
- Each record has a hierarchical sequence key (concatenate keys of ancestors and key of current record).
- Data manipulation language.
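As a rough illustration of the hierarchical sequence key described above, here is a minimal Python sketch. The `Record` class, the `hierarchical_sequence_key` helper, and the Supplier/Part sample values are all hypothetical names chosen for illustration; this is not how IMS lays out records internally.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Record:
    """One record instance in a tree (illustrative structure, not IMS's format)."""
    record_type: str
    key: str
    parent: Optional["Record"] = None  # None marks a root record


def hierarchical_sequence_key(record: Record) -> Tuple[str, ...]:
    """Concatenate the keys of all ancestors (root first) with the record's own key."""
    keys = []
    node = record
    while node is not None:
        keys.append(node.key)
        node = node.parent
    return tuple(reversed(keys))


# Assumed sample data: a Part record stored under a Supplier root record.
supplier = Record("Supplier", "16")
part = Record("Part", "27", parent=supplier)

print(hierarchical_sequence_key(part))      # ('16', '27')
print(hierarchical_sequence_key(supplier))  # ('16',)
```

Sorting records by this tuple gives a depth-first ordering of the tree, which is what makes the key useful for sequential storage.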
IMS supported four different storage formats for hierarchical data. Basically root records can either be:
- Stored sequentially
- Indexed in a B-tree using the key of the record
- Hashed using the key of the record
Dependent records are found from the root using either
- Physical sequentially
- Various forms of pointers.
The ability of a data base application to continue to run, regardless of what tuning is performed at the physical level, will be called physical data independence.
IMS supports a certain level of logical data independence, because DL/1 is actually defined on a logical data base, not on the actual physical data base that is stored.
CODASYL
Takeaways
Lesson 5: Networks are more flexible than hierarchies but more complex
Lesson 6: Loading and recovering networks is more complex than hierarchies
Notes
- Network data model. Not tree. Graph.
- Record-at-a-time data manipulation language. Enters database at entry point, navigates via sets.
- No physical data independence or logical data independence.
- Trades increased complexity for the possibility of easily representing non-hierarchical data. CODASYL offers poorer logical and physical data independence than IMS.
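The record-at-a-time navigation style noted above can be sketched roughly as follows. The dictionaries stand in for a network database and the field names and sample values are assumptions made up for this example; real CODASYL programs used a DML embedded in a host language, not Python.

```python
# Hypothetical in-memory stand-in for a network database.
# Each supplier owns a "supplies" set whose members are part numbers.
suppliers = {16: {"sname": "General Supply", "supplies": [27, 42]}}
parts = {
    27: {"pname": "Power saw", "pcolor": "red"},
    42: {"pname": "Bolts", "pcolor": "black"},
}


def red_parts_supplied_by(sno):
    """Enter at a supplier, follow its set, and test each member record by hand."""
    result = []
    supplier = suppliers[sno]           # enter the database at an entry point
    for pno in supplier["supplies"]:    # navigate the set, one member at a time
        part = parts[pno]               # fetch a single record
        if part["pcolor"] == "red":     # the programmer applies the predicate
            result.append(part["pname"])
    return result


print(red_parts_supplied_by(16))  # ['Power saw']
```

The access path (start at the supplier, walk its set) is baked into the program. If the sets were reorganized, this function would have to be rewritten, which is the data independence problem the notes describe.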
Relational
Takeaways
Lesson 7: Set-at-a-time languages are good, regardless of the data model, since they offer much improved physical data independence.
Lesson 8: Logical data independence is easier with a simple data model than with a complex one.
Lesson 9: Technical debates are usually settled by the elephants of the marketplace, and often for reasons that have little to do with the technology.
Lesson 10: Query optimizers can beat all but the best record-at-a-time DBMS application programmers.
Notes
- Relational algebra
- SQL won due to:
a) the success of the VAX
b) the non-portability of CODASYL engines
c) the complexity of IMS logical data bases
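To make the set-at-a-time idea from Lessons 7 and 10 concrete, here is a small sketch using Python's built-in `sqlite3` module. The table layout and sample rows are assumptions invented for this example. The point is only that the query states what is wanted and leaves the access path to the optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (sno INTEGER PRIMARY KEY, sname TEXT);
    CREATE TABLE part (pno INTEGER PRIMARY KEY, pname TEXT, pcolor TEXT);
    CREATE TABLE supply (sno INTEGER, pno INTEGER, qty INTEGER);
    INSERT INTO supplier VALUES (16, 'General Supply');
    INSERT INTO part VALUES (27, 'Power saw', 'red'), (42, 'Bolts', 'black');
    INSERT INTO supply VALUES (16, 27, 100), (16, 42, 1000);
""")

# One declarative statement replaces a hand-written navigation loop.
red_parts = conn.execute(
    """
    SELECT p.pname
    FROM part AS p JOIN supply AS s ON s.pno = p.pno
    WHERE s.sno = ? AND p.pcolor = 'red'
    ORDER BY p.pname
    """,
    (16,),
).fetchall()

print(red_parts)  # [('Power saw',)]
```

Adding or dropping an index on `supply` or `part` would change how the query runs but not the query text, which is the physical data independence the lesson refers to.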
Entity-Relationship Model
Takeaways
Lesson 11: Functional dependencies are too difficult for mere mortals to understand.
Notes
- Databases are collections of instances of entities.
- Entities have attributes.
- Entities have relationships with each other.
- The competing relational approach (normalization theory): do database design by constructing an initial collection of tables and then normalizing them.
First, real DBAs immediately asked “How do I get an initial set of tables?” Normalization theory had no answer to this important question. Second, and perhaps more serious, normalization theory was based on the concept of functional dependencies, and real world DBAs could not understand this construct.
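For readers who want a concrete handle on the construct mentioned above: a functional dependency X -> Y holds in a table when any two rows that agree on X also agree on Y. The sketch below checks this on sample data; the `fd_holds` helper and the rows are hypothetical and only illustrate the definition.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # The first row with this lhs value fixes the expected rhs value.
        if seen.setdefault(left, right) != right:
            return False
    return True


# Assumed sample data: one unnormalized table mixing supplier and supply facts.
rows = [
    {"sno": 16, "scity": "Boston", "pno": 27, "qty": 100},
    {"sno": 16, "scity": "Boston", "pno": 42, "qty": 1000},
    {"sno": 24, "scity": "Austin", "pno": 27, "qty": 50},
]

print(fd_holds(rows, ["sno"], ["scity"]))       # True: a supplier has one city
print(fd_holds(rows, ["sno"], ["qty"]))         # False: qty varies per part
print(fd_holds(rows, ["sno", "pno"], ["qty"]))  # True
```

Note that a check like this can only refute a dependency on the data at hand. Whether `sno -> scity` is meant to hold for all future data is a design decision, which is part of why the concept was hard to apply in practice.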
R++
Takeaways
Lesson 12: Unless there is a big performance or functionality advantage, new constructs will go nowhere
Notes
- In the 1980s, database papers looked like this:
- Consider an application, call it X
- Try to implement X on a relational DBMS
- Show why the queries are difficult or why poor performance is observed
- Add a new “feature” to the relational model to correct the problem
- Here are the significant features that came up:
- set-valued attributes
- aggregation (tuple-reference as a data type): pointers instead of foreign keys. weird.
- generalization: i.e. inheritance hierarchies
Semantic Data Model
Notes
- Focus on classes and inheritance.
- They failed because "they were a lot of machinery that was easy to simulate on relational systems".
- Also offered no performance improvement.
OO (Object Oriented)
Takeaways
Lesson 13: Packages will not sell to users unless they are in “major pain”
Lesson 14: Persistent languages will go nowhere without the support of the programming language community.
Notes
To address the engineering market, an implementation of persistent C++ had the following requirements:
- no need for a declarative query language. All one needed was a way to reference large disk-based engineering objects in C++.
- no need for fancy transaction management. This market is largely one-user-at-a-time processing large engineering objects. Rather, some sort of versioning system would be nice.
- The run-time system had to be competitive with conventional C++ when operating on the object. In this market, the performance of an algorithm using persistent C++ had to be competitive with that available from a custom load program and conventional C++.
In our opinion, there are a number of reasons for this market failure.
- Absence of leverage. The OODB vendors presented the customer with the opportunity to avoid writing a load program and an unload program. This is not a major service, and customers were not willing to pay big money for this feature.
- No standards. All of the OODB vendor offerings were incompatible.
- Relink the world. If anything changed, for example a C++ method that operated on persistent data, then all programs which used this method had to be relinked. This was a noticeable management problem.
- No programming language Esperanto. If your enterprise had a single application not written in C++ that needed to access persistent data, then you could not use one of the OODB products.
What is the idea of a persistent programming language?
- "one where the variables in the language could represent disk-based data as well as main memory data and where data base search criteria were also language constructs"
- it requires the compiler for the programming language to be extended with DBMS-oriented functionality
What does this mean?
O2 supported an object-oriented data model, but it was not C++. Also, they embedded a high level declarative language called OQL into a programming language. Hence, they proposed what amounted to a semantic data model with a declarative query language, but marketed it as an OODB.
Object-Relational
Takeaways
Lesson 15: The major benefits of OR are two-fold: putting code in the data base (and thereby blurring the distinction between code and data) and user-defined access methods.
Lesson 16: Widespread adoption of new technology requires standards and/or an elephant pushing hard.
Notes
- Motivated by storing geographic data. Searching for all points within a rectangle is a 2-dimensional search.
- GIS queries are difficult to express in SQL and perform badly on B-Trees.
- Basically the outcome of the whole OR era was better support for UDFs.
- the OR proposal added user defined things to SQL: data types, operators, functions, access methods.
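As a small illustration of the "user-defined functions in SQL" idea above, the sketch below registers a Python function with SQLite through the standard `sqlite3` module's `create_function`. The `in_rectangle` function, the `place` table, and its rows are assumptions made up for this example. It covers only the user-defined function half of the OR proposal; SQLite still scans every row here, so it does not show a user-defined access method.

```python
import sqlite3


def in_rectangle(x, y, x_min, y_min, x_max, y_max):
    """Return 1 if point (x, y) lies inside the axis-aligned rectangle, else 0."""
    return int(x_min <= x <= x_max and y_min <= y <= y_max)


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under the same name, taking 6 arguments.
conn.create_function("in_rectangle", 6, in_rectangle)

conn.execute("CREATE TABLE place (name TEXT, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO place VALUES (?, ?, ?)",
    [("a", 1.0, 1.0), ("b", 5.0, 5.0), ("c", 2.0, 3.0)],
)

inside = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM place WHERE in_rectangle(x, y, 0, 0, 3, 3) ORDER BY name"
    )
]
print(inside)  # ['a', 'c']
```

The query reads naturally once the predicate is a function, but making it fast would need a two-dimensional index, which is where user-defined access methods come in.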
Semi Structured Data
Schema Last
- Schema last means no schema is required in advance; instead, each data instance is self-describing.
- semantic heterogeneity: information on a common object does not conform to a common representation; difficult for query processing as there is no structure on which to base indexing decisions.
- designed for semi-structured data
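A minimal sketch of what self-describing records and semantic heterogeneity look like in practice, using JSON purely as a convenient stand-in. The two records and their attribute names are invented for illustration.

```python
import json

# Two self-describing records about the same person, from different sources.
# Each record carries its own attribute names; no schema was declared up front.
raw = [
    '{"name": "Joe Jones", "salary": 4000, "currency": "USD"}',
    '{"employee_name": "Jones, Joe", "wages": 3000}',
]
records = [json.loads(line) for line in raw]

# The "schema" can only be discovered by inspecting the instances.
attribute_sets = [sorted(record) for record in records]
print(attribute_sets)
# [['currency', 'name', 'salary'], ['employee_name', 'wages']]

# Semantic heterogeneity: the records share no attribute names at all,
# so there is no common field on which to build an index or a join.
common = set(records[0]) & set(records[1])
print(common)  # set()
```

Reconciling `salary` with `wages`, or the two name formats, is a question of meaning that no data format answers by itself.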
XML Data Model
- XML records can be hierarchical, as in IMS
- XML records can have “links” (references to) other records, as in CODASYL, Gem and SDM
- XML records can have set-based attributes, as in SDM
- XML records can inherit from other records in several ways, as in SDM
- union types: an attribute in a record can be of one of a set of possible types
- set-at-a-time query language