Data Dictionary Requirements

Overview

A data dictionary, or metadata repository, as defined in the IBM Dictionary of Computing, is a
"centralized repository of information about data such as meaning, relationships to other data, origin,
usage, and format.”

The primary goal of our data dictionary is to allow users to better understand the data in the data
warehouse.

The Data Dictionary should be:

  • Accessible – For both reading and updating.
  • Reliable – The information in the data dictionary must be kept up to date and managed by the appropriate data stewards.
  • Robust – The richer the information, the more useful. Must also be searchable.
  • Central – One and only one data dictionary.

The main requirements in the remainder of this document intend to describe:

  • What information we will store about the data elements in the data dictionary.
  • How the information will be stored.
  • How it will be accessed.
  • How it will be updated.
  • Other considerations.

The process of building out information in the data dictionary will last for as long as the data warehouse is in existence. It is expected that as clients use the system, especially Subject Matter Experts/Data Stewards, they will update the Data Dictionary as needed with new and useful information.

Requirement 1: Required Columns

# - Column Name - Business Definition

1.1: Subject Area - The OBIEE subject area

1.2: Presentation Folder Name - The top level folder in a subject area within OBIEE

1.3: Presentation Subfolder Name - The subfolder (if any) within a presentation folder

1.4: Presentation Column Name - The data element name

1.5: Business Definition - The definition of the data element supplied by the subject matter experts from various offices around campus.

1.6: Additional Notes - Field to allow for entry of a variety of useful notes about the data element. This field can and should be used for:

  1. Usage Notes.
  2. Things to watch out for. Example: 'Do not confuse Student Campus Group with Course Campus Group.'
  3. Operational Definition. Example: 'Total Instructional Fee = Instructional Fee + OOS Surcharge + Comprehensive Fee.'

1.7: DW Table Name - Name of the Data Warehouse table that stores the data.

1.8: DW Column Name - Name of the Data Warehouse column that stores the data.

1.9: Data Type - The oracle data type. Number, Varchar2, Date, etc…

1.10: Source System - The system from which the data originated. Most often, this will be Banner, but other sources of data are possible.

1.11: Source Table - The table that held the data in the source system.

1.12: Source Column - The column that held the data in the source system.

1.13: Cube Section - For conversion purposes only. If the subject area existed in Microsoft Analysis Services Cubes then this information could be useful to determine the folder structure within an OBIEE Subject Area.

1.14: Cube Section Sub-Group - For conversion purposes only. If the subject area existed in Microsoft Analysis Services Cubes then this information could be useful to determine the folder structure within an OBIEE Subject Area.

1.15: Cube Field - For conversion purposes only. If the subject area existed in Microsoft Analysis Services Cubes then this information could be useful to determine the folder structure within an OBIEE Subject Area.

1.16: Last Modified Date - System Maintained field. The Date the information was last updated.

1.17: Last Modified By - System Maintained field. UniqueID of the person that last updated the information.

While this is the list of columns that will be available to store data, all columns will not necessarily be displayed on all reports.

Note: Some of the columns listed do not currently contain any data. The ‘Additional Notes’ field is a good example. This field is intended to be updated by clients over time as the system is used.

Requirement 2: Data Organization

As the IA project expands, there will be instances where data elements will share a common name but will be defined differently based on their role in the particular Subject Area. As such, we need to allow for a different definition for the same data element in different Subject Areas.

To accomplish this, one row will be stored per data element per subject area. This will allow for a different definition per subject area. However, initially, there is no requirement for the definition to be different. This allows data elements and their definitions to be entered once and then duplicated for each subject area.

Consider the following example: Account Code

Base definition: The account code will represent an asset, liability, control account, fund balance, revenue, or expenditure transaction.

Bursar Course Revenue Definition: Debit Account Code – Account debited by the bursar transaction.

Payroll Definition: Account Code from which the payroll was paid.

Human Capital Job Definition: Account Code from which the job is budgeted.

Human Capital Position Definition: Account Code from which the position is budgeted.

Having one row per data element per subject area allows for the client to search the data dictionary for ‘Account Code’ and see all the subject areas where it is being used.

One significant drawback of this method is the potential for needing to make a change in multiple places if the data definition changes. One way to get around this drawback would be to put a base definition in the ‘Data Definition’ field and keep that the same for each subject area. Then in the ’Additional Notes’ field specify how it is used specific to the subject area.

Requirement 3: Must be accessible from the web

Primary access to the data dictionary will be from within OBIEE itself. Several reports will be created to access the data dictionary:

Data Dictionary Dashboard Report

This report will contain only the data elements on each dashboard. A link to this report should be included somewhere on the dashboard.

One of these types of reports should be created for each set of dashboards created. This will allow the user to get a list of data definitions for relevant data elements.

Link to this report in IAPROD for the VP Dashboards: Dashboard Data Dictionary

Data Dictionary Report

This report will be the main report used for general purpose querying of the data dictionary. It will include all data elements in the data dictionary.A prompt will be added to the top of the report to allow for searching. Link to this report in IATEST Data Dictionary (The link above will be updated in the near future after the 4.2 release is promoted to IATest and then IAProd.)

A prompt will be added to the top of the report to allow for searching.

Link to this report in IATEST Data Dictionary (This link above will be updated in the near future after the 4.2 release is promoted to IATest and then IAProd.)

APEX Data Dictionary Application

This application was created primarily to satisfy the requirement for making updates, however, it can be used to view the data dictionary as well. This application allows for searching.

The link to the APEX application in IAPROD is: APEX Data Dictionary App

Requirement 4: Authorized clients must be able to update

Appendix 1: (Other Implementation Options)

Appendix 2: (OBIEE Hover Text)

Appendix 3: (Best Practices/Guidelines for Business Definitions)

Appendix 4: (Columns Considered but Not Included)