The following glossary defines some of the common terminology used with data
warehousing generally and with the Highlander products specifically.
Atomic Level
The atomic level is the finest grain of aggregation summarized by a
dimensional data model. When applied to dimensions, the atomic level refers
to the discrete values the dimension may assume. When applied to a dataset,
the atomic level is the cell created by the intersection of all dimensions
at the atomic level.
The atomic level is the lowest level of detail normally stored in a
multi-dimensional database.
Data Mart
Usually sponsored at the department level and developed with a specific
issue or subject in mind, a Data Mart is a data warehouse with a focused
objective. The scope of data in a data mart is limited to data supporting
the target objective. In this sense, the data mart is more structured than
a general-purpose data warehouse. Data marts are now generally recognized
as the most cost-effective approach to data warehousing.
Data Warehouse
According to William Inmon, widely considered the father of the modern data
warehouse, a Data Warehouse is a "Subject-Oriented, Integrated,
Time-Variant, Nonvolatile collection of data in support of decision making".
Data Warehouses tend to have these distinguishing features: they (1) use a
subject-oriented dimensional data model; (2) contain publishable data from
potentially multiple sources; and (3) contain integrated reporting tools.
Dataset
A Highlander Dataset - also known as a hypercube - is a summary measure
arrayed by a collection of dimensions. The most common summary measure is
the count of records falling in a cell, based on the dimensions defining the
cell. Multi-dimensional databases typically contain many datasets.
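As a minimal sketch of the idea, and not a description of how Highlander
stores datasets, the count measure can be modeled in Python as a mapping
from a tuple of dimension values to the number of records falling in that
cell. The record fields and the name build_count_dataset are hypothetical.

```python
from collections import Counter

# Hypothetical entity records; the field names are illustrative only.
records = [
    {"color": "Red", "size": "Small"},
    {"color": "Red", "size": "Large"},
    {"color": "Blue", "size": "Small"},
    {"color": "Red", "size": "Small"},
]

dimensions = ("color", "size")


def build_count_dataset(records, dimensions):
    """Count the records falling in each cell.

    A cell is identified by a tuple holding one value per dimension.
    """
    return Counter(tuple(rec[d] for d in dimensions) for rec in records)


dataset = build_count_dataset(records, dimensions)
```

Cells that received no records are simply absent from the mapping, and
looking one up in a Counter yields a count of zero.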
Dataview
A Highlander Dataview is a data structure containing multiple datasets,
rules for making dimension queries, and data reduction algorithms. A
dataview can provide a transparent view to a single dataset or can combine
datasets in various ways such as by forming ratios. User-defined
calculations using summarized data are implemented by dataviews. End users
interact with dataviews directly, not with datasets.
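One way to picture a dataview that forms a ratio is a function combining
two datasets cell by cell. The sketch below is an assumption for
illustration only: ratio_dataview and the two sample datasets are
hypothetical, and skipping cells with a zero denominator is a design choice
made here, not documented Highlander behavior.

```python
def ratio_dataview(numerator, denominator):
    """Combine two datasets cell by cell into a ratio.

    Cells whose denominator is zero are skipped rather than reported.
    Cells missing from the numerator are treated as zero.
    """
    return {
        cell: numerator.get(cell, 0) / denominator[cell]
        for cell in denominator
        if denominator[cell] != 0
    }


# Hypothetical datasets sharing a single "color" dimension.
events = {("Red",): 3, ("Blue",): 1}
population = {("Red",): 12, ("Blue",): 4, ("Green",): 0}

rates = ratio_dataview(events, population)
```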
Dataview Tree
A Dataview Tree is a hierarchical structure of nodes representing dataviews
and navigation junctions. This structure visually portrays the subject
matter context of the Highlander databases and is an integral part of the
database. Dataview nodes are "opened" to run the dataview itself.
Navigation nodes organize the subject matter hierarchically. Users quickly
grasp the scope and nature of the database's information content through
this structure.
Dimension
A Highlander Dimension is an independent variable assuming a finite number
of discrete possibilities. Dimensions may be categorical in nature, or
numeric in nature, ranging over a continuous spectrum. In the latter case,
the range of possible dimension values is partitioned into intervals, and
each value is categorized by the interval containing it. Virtual dimensions
are computed from a set of independent variables according to a user-defined
expression, then categorized by an interval partition.
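The interval partition can be sketched in Python with a sorted list of cut
points and a binary search. The dimension names, cut points, labels, and
the length_of_stay expression are all hypothetical, and the convention that
each interval includes its lower bound is an assumption.

```python
from bisect import bisect_right


def categorize(value, boundaries, labels):
    """Return the label of the interval containing value.

    boundaries holds the ascending interior cut points, and labels has one
    more entry than boundaries. Each interval includes its lower bound.
    """
    return labels[bisect_right(boundaries, value)]


# Hypothetical numeric dimension "age": [0, 18), [18, 40), [40, 65), [65, ...)
AGE_BOUNDS = [18, 40, 65]
AGE_LABELS = ["0-17", "18-39", "40-64", "65+"]

# Hypothetical virtual dimension: a user-defined expression over two fields,
# categorized afterwards by its own interval partition.
STAY_BOUNDS = [1, 7]
STAY_LABELS = ["same day", "1-6 days", "7+ days"]


def length_of_stay(record):
    return record["discharge_day"] - record["admit_day"]


record = {"age": 40, "admit_day": 10, "discharge_day": 13}
age_category = categorize(record["age"], AGE_BOUNDS, AGE_LABELS)
stay_category = categorize(length_of_stay(record), STAY_BOUNDS, STAY_LABELS)
```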
Dimension Query
A Dimension Query is a selection criterion for a dimension of a dimensional
data model. The query might select all atomic dimension values ('ALL') or
summarize all values ('TOTAL'), essentially ignoring the dimension.
Individual values are selected by naming the value (Dimension Color = 'Red'
'Blue'), or are summarized by inserting a plus sign between their values
(Dimension Color = 'Red' + 'Blue'). Queries for all dimensions specify a
collection of cells for extraction from the database. Cells are summarized
by summing their contents, or by applying a summary metric that depends on
the dataset's definition.
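The query forms described above can be sketched in Python as follows. This
is an illustration under stated assumptions, not the Highlander query
engine: the dataset is a plain dictionary keyed by tuples of dimension
values, the measure is an additive count, a list of names selects
individual values, and a tuple inside that list stands in for values joined
by a plus sign. The names label_for and run_query are hypothetical.

```python
from collections import defaultdict

ALL, TOTAL = "ALL", "TOTAL"


def label_for(value, query):
    """Return the output label for one dimension value, or None if the
    query does not select it."""
    if query == ALL:
        return value
    if query == TOTAL:
        return TOTAL
    for selection in query:
        if isinstance(selection, tuple):
            # ('Red', 'Blue') models 'Red' + 'Blue': one summarized label.
            if value in selection:
                return " + ".join(selection)
        elif value == selection:
            return value
    return None


def run_query(dataset, queries):
    """Apply one query per dimension and sum the selected cells."""
    result = defaultdict(int)
    for cell, measure in dataset.items():
        labels = [label_for(v, q) for v, q in zip(cell, queries)]
        if None not in labels:
            result[tuple(labels)] += measure
    return dict(result)


# Hypothetical dataset with dimensions (color, size).
dataset = {
    ("Red", "Small"): 2,
    ("Red", "Large"): 1,
    ("Blue", "Small"): 1,
    ("Green", "Large"): 5,
}

separate = run_query(dataset, [["Red", "Blue"], TOTAL])
combined = run_query(dataset, [[("Red", "Blue")], TOTAL])
```

Here separate keeps Red and Blue as distinct result cells, while combined
collapses them into a single summarized cell. A dataset whose measure is
not additive would need its own summary metric in place of the sum.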
Dimensional Data Model
The Dimensional Data Model represents data as values residing in cells
defined by the intersection of a set of independent variables - dimensions -
each assuming a discrete set of values. A dimensional data model is often
referred to as a hypercube or the Cartesian product of dimensions forming an
n-dimensional structure of cells. For example, two dimensions form a table,
the classic spreadsheet. Cells are defined at each row x column
intersection. Three dimensions form a cube of cells based on the face, row,
and column dimensions. Specifying a value along the face dimension defines
a table in the row x column plane, and further specification of rows defines
lines in the column dimension. The best metrics of a dimensional data
model's size are the number of cells (the product of the dimension lengths along all
dimensions) and the density (the proportion of cells having data).
Dimensional data models are the intuitive models most users have of their
data.
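The two size metrics are simple to compute. The sketch below assumes the
dimension lengths and the number of populated cells are already known; the
function name model_size and the sample figures are hypothetical.

```python
from math import prod


def model_size(dimension_lengths, populated_cells):
    """Return (number of cells, density) for a dimensional data model.

    The number of cells is the product of the dimension lengths, and the
    density is the proportion of those cells that hold data.
    """
    total_cells = prod(dimension_lengths)
    return total_cells, populated_cells / total_cells


# Three dimensions of lengths 3, 4, and 5, with 12 cells holding data.
cells, density = model_size([3, 4, 5], 12)
```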
Drill Down
Drill Down refers to the process of disaggregating summary results along
some dimension. Disaggregation might be to an intermediate level or to an
atomic level, the finest level recorded for the dimension. Drilling
"through the floor" to the entity level refers to finding the entities that
were summarized in a cell at the atomic level.
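Drilling down can be pictured as repeating a summary while keeping one more
dimension each time. The following Python sketch assumes an additive
measure and a dataset stored as a dictionary keyed by tuples of dimension
values. The name summarize and the sample data are hypothetical.

```python
from collections import defaultdict


def summarize(dataset, keep):
    """Sum cell values, keeping only the dimensions at the given positions.

    Dimensions that are not kept are totaled away.
    """
    result = defaultdict(int)
    for cell, measure in dataset.items():
        result[tuple(cell[i] for i in keep)] += measure
    return dict(result)


# Hypothetical dataset with dimensions (region, color).
dataset = {
    ("East", "Red"): 2,
    ("East", "Blue"): 1,
    ("West", "Red"): 4,
}

grand_total = summarize(dataset, keep=[])    # fully summarized
by_region = summarize(dataset, keep=[0])     # drill down along region
atomic = summarize(dataset, keep=[0, 1])     # drill down to the atomic level
```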
Enterprise Data Warehouse
An Enterprise Data Warehouse is a data warehouse containing all publishable
quality data of a permanent nature collected by an organization. This
inevitably includes historic data from multiple data sources. Operational
transaction data is usually excluded due to its volatile nature. Enterprise
data warehouses are valuable resources, are costly to construct, and require
a long time to evolve.
Entity Record
Entity records are tabular records structured as fields within records
within relational tables or flat files. Entity records, sometimes called
transaction records or just transactions, are the source level subject
matter detail summarized by a dimensional data model. Entity records are
not retained in a multi-dimensional database; however, many data warehouses
contain methods for identifying and retrieving the summarized entities.
Multi-Dimensional Database
A Multi-Dimensional Database is a database that stores data natively in
multi-dimensional format. Database access is done by specifying a
query for each dimension. Results of an access are summarized
cell values over a subset of cells. The entity source records are
themselves gone and only dimensions and summary measures remain.
Dimensional data models can be implemented in relational databases; however,
multi-dimensional databases are more efficient and perform better.
Currently, multi-dimensional databases are not standardized; they contain
the necessary query and reporting tools.
On-Line Analytical Processing (OLAP)
On-Line Analytical Processing - OLAP - is the ability to conduct data
analysis within the context of the database. Data analysis may include
predefined descriptive statistics, user-defined expressions executed against
the data, or customized models driven by the data.
Reach Back
Reach Back is the extraction of one or more entity records summarized during
the creation of a multi-dimensional database. Entity records are not
normally retained in a multi-dimensional database, so reach back operations
require recourse to the source entity data used to create the database.
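A reach back can be sketched as a filter over the source entity data: the
dimension values are recomputed for each source record and compared with
the cell of interest. This is a minimal illustration that assumes the
source records are still available as a list of dictionaries. The name
reach_back and the sample records are hypothetical.

```python
def reach_back(source_records, dimensions, cell):
    """Return the source entity records that were summarized into cell.

    cell is a tuple holding one value per dimension, in the same order as
    dimensions.
    """
    return [
        rec
        for rec in source_records
        if tuple(rec[d] for d in dimensions) == cell
    ]


# Hypothetical source entity records; "id" is not a dimension.
source_records = [
    {"id": 1, "color": "Red", "size": "Small"},
    {"id": 2, "color": "Red", "size": "Large"},
    {"id": 3, "color": "Red", "size": "Small"},
]

matches = reach_back(source_records, ("color", "size"), ("Red", "Small"))
```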