Data governance, Part II

ENTERPRISE information management (EIM) is about the administration of data. Adrienne Tannenbaum said it best: "(EIM) is a function, typically dedicated to an organisation in IT, for maintaining, cataloging, and standardising corporate data." This is done with the help of data stewards under the umbrella of a data strategy, and by establishing data-related standards, policies, and procedures. Michael Brackett's definition of EIM goes a step further by suggesting that "(EIM) activities should be integrated with business planning." This is an important point because EIM activities require active business participation.

EIM has its origins in data administration (DA), which is a formal discipline for managing data as a business asset. Note that the DA function is not the same thing as the technical database administration (DBA) function. DA was formalised in 1980 after a major shift had occurred in the approach to system analysis and design. Thanks to Dr Peter Chen's entity-relationship modelling technique, and the subsequent introduction of relational database management systems (RDBMS), data analysts started to separate the data from the processes and to "catalogue" the data in an enterprise data model while capturing business metadata in data dictionaries (known today as metadata repositories).

Over the decades, data administration changed its name to data resource management (DRM) and information resource management (IRM). Today it is known under the name of enterprise information management (EIM).

Single version of the truth

The ultimate goal of data governance is to achieve the "single version of the truth," which means to have a reliable inventory of unique and unambiguous data elements where each data element has a unique name (no synonyms or homonyms), is well defined (unambiguous), and contains data values that conform to an approved data domain. Achieving the single version of the truth is the main responsibility of an EIM group. Working with the data stewards from the business units, the group uses two powerful techniques: enterprise data modelling (using normalisation rules) and data administration (DA) principles.

Enterprise data modelling

The greatest benefits of an enterprise data model (EDM) are gained from building the 360-degree view of a business. The difficulty in building this view is that the current data chaos in most organisations is so immense that it may take significant time and effort to rationalise the existing redundant data into an integrated, non-redundant EDM. Therefore, it is neither possible nor desirable to construct the EDM all at once. Instead, the EDM evolves over time (one project at a time) and may never be completed. It does not need to be completed because the objective of this process is not to produce a finished data model but to discover and resolve data discrepancies resulting from different views of the same data among different business units.

Data integration

When building an enterprise data model, you integrate (gather, rationalise, and standardise) subsets of the corporate data that is stored redundantly in many different databases. Many people confuse data integration with data consolidation. Consolidating data simply means gathering data elements that identify or describe the same business entity, like customer data or product data, from multiple source files or source databases and storing them in one table or in a set of dependent tables. Integrating data goes far beyond that. In addition to consolidating data, integration enforces data uniqueness, the building block of the "single version of the truth" that enables you to reuse the same data without the need to duplicate it and without the additional development and maintenance costs of managing the duplications.

Data integration requires several actions during enterprise data modelling:

• Examine the definition, the semantic intent, and the domain values of each logical entity to find potential duplicates of business entities that would otherwise not be discovered because the entities are known under different names in the systems (for example, Customer and Client);

• Ensure that each entity instance has one and only one unique identifier (primary key), which, in turn, is never reassigned to a new entity instance even after the old instance has expired and been deleted from the database;

• Use the six normalisation rules to put "one fact in one place", that is, one attribute in one, and only one, owning entity. This means that an attribute can be assigned to only one entity as either an identifier of that entity or as a descriptive attribute of that and no other entity. This modelling activity ensures that each attribute is captured once and only once, and that it remains unique within the data universe of the organisation, hence the "single version of the truth"; and

• Capture the business actions (or business transactions) that connect the business entities in the real world. This is the last and most important activity of data integration. These business actions are shown as data relationships among the entities. It is paramount to capture them from a logical business perspective (not from a reporting pattern or database access perspective) because these relationships are the basis for all potential access patterns, known and unknown, now and in the future.
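
The first action in the list above, finding entities that hide under different names, can be partly supported by tooling. The following Python sketch is only an illustration: the entity descriptions, the attribute names, and the 0.5 threshold are all assumptions, and a flagged pair is merely a candidate for the data stewards to review, not a confirmed duplicate.

```python
from itertools import combinations

# Hypothetical entity descriptions gathered from different source systems.
# Attribute names are assumed to be standardised already.
entities = {
    "Customer": {"name", "birth date", "postal address", "credit limit"},
    "Client": {"name", "birth date", "postal address", "account manager"},
    "Product": {"name", "unit price", "category"},
}

def jaccard(a, b):
    """Share of attributes that two entities have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b)

def duplicate_candidates(entities, threshold=0.5):
    """Return entity pairs whose attribute overlap meets the threshold."""
    pairs = []
    for (name_a, attrs_a), (name_b, attrs_b) in combinations(sorted(entities.items()), 2):
        if jaccard(attrs_a, attrs_b) >= threshold:
            pairs.append((name_a, name_b))
    return pairs

candidates = duplicate_candidates(entities)
print(candidates)
```

Attribute overlap is a crude signal. Definitions and semantic intent still have to be compared by people who know the business.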

Normalisation

Normalisation is the most effective technique applied during enterprise data modelling. It ensures that each attribute remains unique within the data universe of the organisation.

The six normalisation rules (1NF, 2NF, 3NF, BCNF, 4NF, 5NF) are used to put "one fact in one place." This means that every attribute must be unique (it must have one and only one semantic meaning), and it can be assigned (or placed) into one and only one entity as either an identifier (primary key) of that entity or as a descriptive attribute of that - and no other - entity. Hence, the term "single version of the truth." While much of the same data must often be stored redundantly in multiple files and databases for performance reasons, the EDM and the business metadata should contain only a single version of each unique entity and attribute.
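
The "one fact in one place" outcome can be checked mechanically once a model is written down. The sketch below does not perform normalisation; it only reports attributes that are claimed by more than one entity. The model contents and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical logical model: entity name -> attributes it claims to own.
model = {
    "Customer": ["Customer Identifier", "Customer Last Name", "Customer Birth Date"],
    "Account": ["Account Identifier", "Account Open Date", "Customer Last Name"],
}

def attributes_with_multiple_owners(model):
    """Return {attribute: [owning entities]} for attributes owned more than once."""
    owners = defaultdict(list)
    for entity, attributes in model.items():
        for attribute in attributes:
            owners[attribute].append(entity)
    return {attr: ents for attr, ents in owners.items() if len(ents) > 1}

violations = attributes_with_multiple_owners(model)
print(violations)
```

Here "Customer Last Name" is reported because both Customer and Account claim it; in the logical model it would stay in Customer only.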

Data administration principles

An EDM is not merely a pictorial representation of an organisation's data assets. Its ultimate value comes from applying stringent DA principles during the enterprise data modelling process.

For example, there are formal rules for writing data definitions, for creating data names, and for defining valid data content (data domain). The EIM group, with input from the data stewards, defines the formal rules and applies them during enterprise data modelling.

Data definitions

A definition should be short, precise, and meaningful (a short paragraph). It must thoroughly describe the data element name and, optionally, it may contain an example. Michael Brackett's book The Data Warehouse Chaos offers examples of a poor data definition and a better data definition for the attribute "Well Depth Feet."

The definition "The depth of the well in feet" is very poor because it is not clear how the depth is measured. A much better definition is "The total depth of the well in feet from the surface of the surrounding ground to the deepest point dug or drilled regardless of the depth of the well casing."

Data names

Using "favourite" data names or blindly copying informal names from existing systems is not an acceptable standard. Instead, a data name is derived from its definition. Therefore, an attribute is first fully defined before it is named. There are a number of data naming conventions, the most popular being the "prime - qualifier - class word" convention. It prescribes that every attribute (data element) must have one prime word, one or more qualifiers (qualifiers can apply to both prime and class words), and end in one class word. Class words are predetermined by the EIM group and are documented on a published list (e.g., date, text, name, code, number, identifier, amount, and count).

Furthermore, every attribute must be fully qualified in order to avoid homonyms and to avoid limitations on naming future attributes, and it must be fully spelled out. An example of a standardised attribute name is: Chequeing Account Monthly Average Balance. The main component (prime word) is "Account" which is further qualified by the word "Chequeing" to indicate the type of account. The class word indicating the type of data value contained in this attribute is "Balance" which is further qualified by the words "Monthly" and "Average" to indicate the type of balance.
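
Parts of this naming convention lend themselves to automated checking. The Python sketch below is a minimal illustration under stated assumptions: the class word list repeats the examples given above and adds "balance" because the sample name ends in it, and the list of rejected abbreviations is invented for the example. The check cannot tell which word is the prime word; that still requires a person.

```python
# Class words from the example list above, plus "balance" (assumed to be on
# the published list because the sample attribute name ends in it).
CLASS_WORDS = {"date", "text", "name", "code", "number", "identifier",
               "amount", "count", "balance"}

# Hypothetical abbreviations that a "fully spelled out" rule would reject.
BANNED_ABBREVIATIONS = {"acct", "amt", "avg", "bal", "cust", "mth", "nbr", "no"}

def check_attribute_name(name):
    """Return a list of naming problems; an empty list means the name passes."""
    words = name.lower().split()
    problems = []
    if len(words) < 3:
        problems.append("needs a prime word, at least one qualifier, and a class word")
    if not words or words[-1] not in CLASS_WORDS:
        problems.append("must end in an approved class word")
    abbreviated = [w for w in words if w in BANNED_ABBREVIATIONS]
    if abbreviated:
        problems.append("abbreviations not allowed: " + ", ".join(abbreviated))
    return problems

print(check_attribute_name("Chequeing Account Monthly Average Balance"))
print(check_attribute_name("Acct Avg Bal"))
```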

Data domains

All attributes must be atomic, which means they cannot be further decomposed. For example, the attribute "Customer Name" is not atomic because it can be decomposed into "Customer First Name," "Customer Initial," and "Customer Last Name." Every attribute must also have a predefined data domain, which refers to data values that are allowed in accordance with the data name (specifically the class word), its data definition (semantic meaning), its business rules, and its data quality rules. Data domains can be expressed as a list of values, a range of values, a set of characters, or pattern masks.
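
The four ways of expressing a domain can each be written as a simple test. In the sketch below, the attribute names, the permitted values, and the five-digit postal code mask are all hypothetical; they only show one possible encoding of each form.

```python
import re

# One hypothetical attribute per form of domain.
DOMAINS = {
    # List of values
    "Account Status Code": lambda v: v in {"A", "C", "S"},
    # Range of values
    "Order Item Count": lambda v: 1 <= v <= 999,
    # Set of characters: letters, space, hyphen, apostrophe; must not be empty
    "Customer Last Name": lambda v: v != "" and all(c.isalpha() or c in " -'" for c in v),
    # Pattern mask: exactly five digits
    "Customer Postal Code": lambda v: re.fullmatch(r"\d{5}", v) is not None,
}

def conforms(attribute, value):
    """Return True if the value lies within the attribute's declared domain."""
    return DOMAINS[attribute](value)

print(conforms("Account Status Code", "A"))
print(conforms("Customer Postal Code", "5045A"))
```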

Data quality rules

One of the most important benefits from enterprise data modelling is the conscious and purposeful application of data quality rules. Data quality rules apply to entities, data relationships, and attributes.

Entities

The identity rules apply to the primary keys, which are called unique identifiers in enterprise data modelling terminology. The reference rules apply to the foreign keys, which are the physical implementations of data relationships on an enterprise data model. The inheritance rules apply to supertype/subtype structures on an enterprise data model. The cardinal rules apply to the cardinality as well as to the optionality notations on the enterprise data model.
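
Two of these rule types can be illustrated on sample data. The sketch below assumes that an identity rule means "every identifier is present and unique" and that a reference rule means "every foreign key points at an existing parent"; the article does not spell the rules out, so both readings, along with the record layouts, are assumptions.

```python
# Hypothetical rows; field names are invented for the example.
customers = [
    {"customer_identifier": 1},
    {"customer_identifier": 2},
    {"customer_identifier": 2},   # duplicate identifier
]
accounts = [
    {"account_identifier": 10, "customer_identifier": 1},
    {"account_identifier": 11, "customer_identifier": 7},   # no such customer
]

def identity_violations(rows, key):
    """Return identifier values that are missing or occur more than once."""
    seen, bad = set(), []
    for row in rows:
        value = row.get(key)
        if value is None or value in seen:
            bad.append(value)
        seen.add(value)
    return bad

def reference_violations(child_rows, foreign_key, parent_rows, parent_key):
    """Return foreign key values that match no parent identifier."""
    parent_keys = {row[parent_key] for row in parent_rows}
    return [row[foreign_key] for row in child_rows if row[foreign_key] not in parent_keys]

print(identity_violations(customers, "customer_identifier"))
print(reference_violations(accounts, "customer_identifier", customers, "customer_identifier"))
```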

Data relationships

The relationship dependency rules apply only to optional data relationships. There are three dependency rules that dictate whether an optional data relationship must be instantiated. The relationship state dependency rule applies to data relationships between two entities where the state (status) of one entity determines whether or not a data relationship to another entity should be instantiated. The relationship mutual dependency rule mandates that if one data relationship exists between two entities, then another data relationship must also exist. The relationship mutual exclusivity rule mandates that if one data relationship exists between two entities then another data relationship cannot exist.
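
The three relationship dependency rules can be sketched as checks on a single record. Everything specific in the example is assumed: an Order entity whose optional relationships to Shipment, Carrier, and Pickup Location are represented by identifier fields, and the three business rules named in the comments.

```python
# Hypothetical order record: each *_identifier field stands for an optional
# relationship from Order to another entity; None means "not instantiated".
def relationship_rule_violations(order):
    """Return the names of the relationship dependency rules the order breaks."""
    violations = []
    has_shipment = order["shipment_identifier"] is not None
    has_carrier = order["carrier_identifier"] is not None
    has_pickup = order["pickup_location_identifier"] is not None

    # State dependency: assumed rule that only a shipped order has a shipment.
    if (order["status"] == "shipped") != has_shipment:
        violations.append("state dependency")
    # Mutual dependency: assumed rule that a shipment implies a carrier.
    if has_shipment and not has_carrier:
        violations.append("mutual dependency")
    # Mutual exclusivity: assumed rule that an order is shipped or picked up, never both.
    if has_shipment and has_pickup:
        violations.append("mutual exclusivity")
    return violations

good_order = {"status": "shipped", "shipment_identifier": 1,
              "carrier_identifier": 2, "pickup_location_identifier": None}
bad_order = {"status": "open", "shipment_identifier": 1,
             "carrier_identifier": None, "pickup_location_identifier": 3}

print(relationship_rule_violations(good_order))
print(relationship_rule_violations(bad_order))
```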

Attributes

The attribute domain rules apply to the content (domain) of the attributes. The attribute dependency rules apply to domains of dependent attributes. There are four dependency rules that dictate what the content of an attribute should be. The attribute state dependency rule applies to two or more dependent attributes where the state (status) of one determines the values of the others. The attribute mutual dependency rules come in two flavours: Derived and constrained.

The attribute mutual dependency derived rule applies to two or more dependent attributes where the value of one is determined by a calculation that uses the domains of the others. The attribute mutual dependency constrained rule applies to two or more dependent attributes where the value of one is determined by a business rule and/or by the value of other attributes. The attribute mutual exclusivity rule mandates that if a valid value exists for one attribute then another attribute cannot contain any value (must be Null).
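
The four attribute dependency rules can be illustrated in the same way. The loan record, its field names, and the specific business rules in the sketch below are hypothetical; amounts are kept as whole numbers to avoid rounding issues in the example.

```python
from datetime import date

def attribute_rule_violations(loan):
    """Return the names of the attribute dependency rules the loan record breaks."""
    violations = []
    # State dependency: assumed rule that an approval date exists only for approved loans.
    if (loan["status"] == "approved") != (loan["approval_date"] is not None):
        violations.append("state dependency")
    # Mutual dependency, derived: total is calculated from principal and fee.
    if loan["total_amount"] != loan["principal_amount"] + loan["fee_amount"]:
        violations.append("mutual dependency (derived)")
    # Mutual dependency, constrained: approval cannot precede application.
    if loan["approval_date"] is not None and loan["approval_date"] < loan["application_date"]:
        violations.append("mutual dependency (constrained)")
    # Mutual exclusivity: a borrower is a person or a company, so only one date may be set.
    if loan["person_birth_date"] is not None and loan["company_registration_date"] is not None:
        violations.append("mutual exclusivity")
    return violations

good_loan = {"status": "approved", "application_date": date(2024, 1, 5),
             "approval_date": date(2024, 1, 20), "principal_amount": 900,
             "fee_amount": 50, "total_amount": 950,
             "person_birth_date": date(1980, 3, 1), "company_registration_date": None}
bad_loan = {"status": "pending", "application_date": date(2024, 2, 1),
            "approval_date": date(2024, 1, 1), "principal_amount": 900,
            "fee_amount": 50, "total_amount": 1000,
            "person_birth_date": date(1980, 3, 1),
            "company_registration_date": date(2010, 6, 1)}

print(attribute_rule_violations(good_loan))
print(attribute_rule_violations(bad_loan))
```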

Conclusion

A data governance programme has to be administered by a specialised group called Enterprise Information Management (EIM), whose charter is to create a "single version of the truth," which means to standardise and integrate your company's data assets. The two main techniques used to achieve this goal are enterprise data modelling and data administration principles. If you are interested in learning more about Enterprise Information Management, attend my Data Governance seminar at the Grand Millennium Hotel, Kuala Lumpur, on Nov 18. Contact Behlul Shaljani at (03) 7880-9894 or send e-mail to behlul@kbase.com for more details.

(Larissa T. Moss is founder and president of Method Focus Inc. She has over 30 years of IT experience, with over 20 years in Data Warehousing and Business Intelligence.)
