Advanced Data Modeling

The Extended Entity-Relationship (EER) model is a high-level conceptual model that extends the E-R model with additional kinds of abstraction and expresses constraints more clearly. All of the concepts of the E-R model are included in the EER model, along with additional concepts that cover more semantic information. These additional concepts include generalization/specialization, union, inheritance, and subclass/superclass.
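As a rough illustration of the subclass/superclass and inheritance concepts, the sketch below models a superclass and one specialization as Python classes. The names Person, Student, and their attributes are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Superclass: attributes shared by every specialization (hypothetical)."""
    person_id: int
    name: str


@dataclass
class Student(Person):
    """Subclass (specialization): inherits person_id and name, adds major."""
    major: str


# A Student carries the inherited attributes plus its own.
ada = Student(person_id=1, name="Ada", major="Mathematics")
```

Every Student is also a Person, which mirrors the way a subclass entity in an EER diagram inherits the attributes and relationships of its superclass.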

The extended entity-relationship (EER) model is a language for defining the structure (and functionality) of database or information systems. Structures are developed inductively. Basic attributes are assigned to base data types. Complex attributes are constructed by applying constructors, such as tuple, list or set constructors, to attributes that have already been constructed. Entity types conceptualize the structure of real-world things through attributes. Cluster types generalize types or combine them into singleton types. Relationship types associate types that have already been constructed into an association type. The types may be restricted by integrity constraints and by specifying how objects of the corresponding type are identified. Typical integrity constraints of the extended entity-relationship model are participation, lookup and general cardinality constraints. Entity, cluster and relationship classes contain a finite set of objects defined on these types. The types of an EER schema are typically depicted by an EER diagram.
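The inductive construction described above can be sketched in Python. This is only an illustration under stated assumptions: the class names, the example schema (Student, Professor, Course, Enrolls, Grades), and the helper order() are all hypothetical, and order() assumes the common convention that entity types have order 0 and a relationship type has order one higher than its highest-order component.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BaseAttr:
    name: str
    data_type: str  # base data type, e.g. "string" or "int"


@dataclass(frozen=True)
class TupleAttr:
    name: str
    components: Tuple[object, ...]  # already constructed attributes


@dataclass(frozen=True)
class SetAttr:
    name: str
    element: object  # an already constructed attribute


@dataclass(frozen=True)
class EntityType:
    name: str
    attributes: Tuple[object, ...]
    key: Tuple[str, ...]  # names of the identifying attributes


@dataclass(frozen=True)
class ClusterType:
    name: str
    alternatives: Tuple[object, ...]  # already constructed types


@dataclass(frozen=True)
class RelationshipType:
    name: str
    components: Tuple[object, ...]  # already constructed types
    attributes: Tuple[object, ...] = ()


def order(t) -> int:
    """Order of a type (assumed convention, see the lead-in)."""
    if isinstance(t, EntityType):
        return 0
    if isinstance(t, ClusterType):
        return max(order(a) for a in t.alternatives)
    return 1 + max(order(c) for c in t.components)


# Hypothetical schema built bottom-up, each step using earlier results.
name = TupleAttr("name", (BaseAttr("first", "string"), BaseAttr("last", "string")))
phones = SetAttr("phones", BaseAttr("phone", "string"))
student = EntityType("Student", (BaseAttr("student_id", "int"), name, phones), ("student_id",))
professor = EntityType("Professor", (BaseAttr("staff_id", "int"), name), ("staff_id",))
course = EntityType("Course", (BaseAttr("code", "string"),), ("code",))
member = ClusterType("Member", (student, professor))
enrolls = RelationshipType("Enrolls", (student, course))
grades = RelationshipType("Grades", (professor, enrolls), (BaseAttr("grade", "string"),))
```

Here Grades has a relationship type (Enrolls) as one of its components, so it is an example of a relationship type of order higher than 1.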



Graphical representation for the extended ER model

We can represent a saturated set of entity, cluster and relationship types by ER diagrams. One possible kind of diagram is displayed in Figure 1. Entity types are represented graphically by rectangles. Attribute types are associated with the corresponding type. Attributes identifying a type are underlined. Relationship vertices are represented graphically by diamonds. Clusters are represented by diamonds labelled with ⊕, or simply as a common input point, labelled ⊕, to a diamond. This style of drawing diagrams is one of many variants that have been considered in the literature. The main difference between the representations is the style of drawing unary types.

Tools often do not support cluster types or relationship types of order higher than 1. In this case, those types can be objectified, i.e. represented by a new (abstract) entity type that is associated through binary relationship types with the components of the original type. Identification of objects of the new type is then either inherited from the component types or provided through a new (surrogate) attribute. The first option results in the introduction of so-called weak types. The direct translation of these weak types to object-relational models must be combined with the introduction of rather complex constraint sets. Typically, this complexity can be avoided if the abstract entity type is mapped together with the new relationship types to a singleton object-relational type. This singleton type is also the result of a direct mapping of the original higher-order relationship type.
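The objectification step can be sketched as a small function. This is a minimal illustration that takes the surrogate-attribute option; the function name objectify, the dictionary layout, and the example relationship Supplies(Supplier, Part, Project) are assumptions made for this example only.

```python
from typing import Dict, List, Tuple


def objectify(rel_name: str, components: List[str]) -> Tuple[Dict, List[Dict]]:
    """Replace an n-ary relationship type by an abstract entity type plus
    one binary relationship type per original component."""
    # New abstract entity type, identified by a surrogate attribute.
    entity = {"name": rel_name, "key": [f"{rel_name.lower()}_id"]}
    # One binary relationship type linking the new entity to each component.
    binaries = [
        {"name": f"{rel_name}_{component}", "components": [rel_name, component]}
        for component in components
    ]
    return entity, binaries


# Hypothetical ternary relationship type, objectified.
entity, binaries = objectify("Supplies", ["Supplier", "Part", "Project"])
```

Choosing the other option, inheriting identification from the components, would instead make the new entity type a weak type whose key is composed of the component keys.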




The main application area for extended ER models is the conceptualization of database applications. Database schemata can be translated to relational, XML or other schemata based on transformation profiles that incorporate properties of the target systems.


What is Entity Clustering?

Clustering, in the context of databases, refers to the ability of several servers or instances to connect to a single database. An instance is the collection of memory and processes that interacts with a database, which is the set of physical files that actually store data.

Clustering offers two major advantages, especially in high-volume database environments:

  • Fault tolerance: Because there is more than one server or instance for users to connect to, clustering offers an alternative in the event of an individual server failure.
  • Load balancing: The clustering feature is usually set up to allow users to be automatically allocated to the server with the least load.
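The two advantages can be illustrated together with a small routing sketch: skip instances that are down (fault tolerance) and pick the least-loaded one among the rest (load balancing). The function pick_instance, the node names, and the load figures are hypothetical; real cluster software uses its own mechanisms.

```python
from typing import Dict, Optional


def pick_instance(load: Dict[str, int], healthy: Dict[str, bool]) -> Optional[str]:
    """Return the healthy instance with the fewest active connections,
    or None if no instance is available."""
    candidates = [name for name in load if healthy.get(name, False)]
    if not candidates:
        return None
    return min(candidates, key=lambda name: load[name])


# Hypothetical cluster state: active connections per instance.
load = {"node-a": 12, "node-b": 3, "node-c": 7}
all_up = {"node-a": True, "node-b": True, "node-c": True}
b_down = {"node-a": True, "node-b": False, "node-c": True}

chosen_normal = pick_instance(load, all_up)    # least-loaded instance
chosen_failover = pick_instance(load, b_down)  # node-b is skipped
```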

Clustering takes different forms, depending on how data is stored and how resources are allocated. The first type is known as the shared-nothing architecture. In this clustering mode, each node/server is fully independent, so there is no single point of contention. An example is a company that runs multiple data centers for a single website: with many servers across the globe, no single server is a “master.” Shared-nothing designs are closely associated with “database sharding.”
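One common way to spread data across independent nodes is to hash a key and use the result to choose a node. The sketch below shows that idea only; the function shard_for and the key format are hypothetical, and production systems often use more elaborate schemes.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index in the range 0..num_shards-1.

    A cryptographic hash is used so that the mapping is stable across
    processes (Python's built-in hash() is randomized for strings).
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# The same key always lands on the same shard.
first_lookup = shard_for("customer-42", 4)
second_lookup = shard_for("customer-42", 4)
```

Because each key maps to exactly one node, no node needs to coordinate with the others to answer a request for that key.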

Contrast this with the shared-disk architecture, in which all data is stored centrally and then accessed via instances running on different servers or nodes.

The distinction between the two types has become blurred recently with the introduction of grid computing and distributed caching. In this setup, data is still centrally managed but controlled by a powerful “virtual server” that is made up of many servers working together as one.

In data modeling, by contrast, entity clustering is a hierarchical decomposition of a macro-level view of the data model into finer and finer views, eventually resulting in the full, detailed data model. The first of the following figures shows the completed data model with dashed lines drawn around possible entity clusters, and the second shows the final result of transforming this into an EER diagram of only entity clusters and relationships.

Entity clusters are formed:

  1. By abstracting a supertype and its subtypes
  2. By combining directly related entity types and their relationships (SELLING_UNIT, ITEM, MATERIAL, MANUFACTURING)
  3. By combining a strong entity and its associated weak entity types
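The transformation from a detailed model to a cluster-level view can be sketched as follows. Relationships inside a cluster disappear, and relationships that cross cluster boundaries become relationships between clusters. The function cluster_view and the entity names (CUSTOMER, ORDER, ORDER_LINE, PRODUCT, ORDER_CLUSTER) are hypothetical and are not taken from the figures.

```python
from typing import Dict, List, Set, Tuple


def cluster_view(
    relationships: List[Tuple[str, str]],
    cluster_of: Dict[str, str],
) -> Set[Tuple[str, str]]:
    """Collapse entity-level relationships into cluster-level ones."""
    view: Set[Tuple[str, str]] = set()
    for left, right in relationships:
        # Entities not assigned to a cluster stand for themselves.
        a = cluster_of.get(left, left)
        b = cluster_of.get(right, right)
        if a != b:  # relationships internal to a cluster are hidden
            view.add(tuple(sorted((a, b))))
    return view


# Hypothetical detailed model: ORDER_LINE is a weak entity of ORDER.
relationships = [
    ("CUSTOMER", "ORDER"),
    ("ORDER", "ORDER_LINE"),
    ("ORDER_LINE", "PRODUCT"),
]
# Rule 3: combine the strong entity with its weak entity type.
cluster_of = {"ORDER": "ORDER_CLUSTER", "ORDER_LINE": "ORDER_CLUSTER"}

macro_view = cluster_view(relationships, cluster_of)
```

Applying the function again to the cluster-level result, with a coarser assignment, would give the next level of the hierarchy.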