Information requires shape and context to be useful. In order to be of value for an organization, data must be modeled.
Without context, data are merely random numbers or other bits of information. To gain context, the source and the potential uses of data must be understood. Also, the relationships between relevant data sets must be discerned and analyzed.
Data modeling is how developers place into context the large amount of information that is generated by any human endeavor – especially endeavors that are too complex to monitor and measure without a computer.
A data model often resembles a flow chart – a series of boxes containing sets of information and connected by lines that represent the relationships between the different data sets. Data modeling became a must when computers began to be integrated into human activity in the 20th century.
A data model is a map that helps a developer understand how data sets are acquired, how they are stored, how they are related, how they are analyzed and how they are to be used.
Data models have evolved over the years to incorporate changing technology and the need for organizations to “scale” their operations to serve large populations. Any business professional should know the basics of data modeling, and this is especially true for those who choose to work in the field of data analytics.
Data modeling is a complex exercise, with a language all its own. Anyone who wants to make a profession of data analysis will need to become fluent in that language.
Here are just a few of the key definitions every data professional needs to know:
Aggregation – In terms of a DBMS (database management system), this is a feature of the entity-relationship model that treats a relationship, together with the entities it connects, as a single higher-level entity that can take part in other relationships. In an entity-relationship diagram, this is commonly indicated by a dashed box drawn around the aggregation.
Analytical model – a data model that supports analytical functions using known mathematical formulae; often uses the star schema method of modeling.
Associations – the fundamental building block of a data model, showing the relationship between two pieces of data or two entities; an example is the relationship between a customer and a transaction.
Attributes – tags or characterizations that are used to distinguish elements of data; for example, attributes of a customer might be name and/or age.
Bachman diagram – a data model used primarily in computer software design; it shows network relationships through depictions of how data is stored by the system.
Conceptual data model – a model designed to depict the subject of the data without representing the internal details of physical data storage; it is, as the name suggests, a representation of a related collection of concepts.
Constraints – the rules that restrict the values data may take and the relationships allowed among two or more pieces of data; in a diagram, the properties that bind pieces of data are depicted by graphic elements such as solid or broken lines.
Data architecture – the structure or structures of data used by a company or organization to represent its applications or transactions; the architecture depicts how data is processed, stored and used within the organization’s network.
Data flow model – a design that illustrates how data moves through a system; this model depicts external data sources, processes, storage methods and data flow pathways.
DBMS – Database Management System; software used to allow end users and others to interact with applications in order to procure, define, update and otherwise administer a database.
Data index – a structured device within a database used to locate data quickly and efficiently.
Data integrity – the consistency and accuracy of the information contained within and depicted by a data model.
Data mart – a subset of data within the data warehouse (see entry below) that is oriented to a specific data set and used to provide efficient access, updating and other use of the data for stakeholders.
Data structure diagram – a drawing, digitally produced image or three-dimensional physical model that depicts the entities of a database and their relationships, as well as the constraints between them.
Data vault – a hybrid data model developed by business intelligence professional Dan Linstedt to improve tracking of historical data in a flexible, scalable, consistent manner.
Data warehouse – a central repository for the large amount of information gathered by an organization from many sources; this information often changes over time and must be analyzed and processed for use in decision-making; also referred to as the enterprise data warehouse.
Document model – a data model that enables storing and retrieval of document-oriented information, as opposed to data that is modeled based on relationships between and among different data sets.
Entity-relationship model – a model that depicts relationships between real-world entities, such as customers, in a conceptual context; items or people that are represented in data form are shown in relationship to one another, without regard to the physical structure of the database.
Flat model – the earliest and least complex data model; it depicts all data sets in a chart with columns and rows.
Hierarchical model – a model that depicts relationships between data sets in the form of a “tree,” with each parent group at a higher level and its child groups descending from it; each child belongs to exactly one parent.
Network model – a model that depicts relationships among data sets as a graph in which a record can be linked to more than one parent, unlike the hierarchical model; sets depicted might include customers, vendors, products and transactions.
Object-oriented model – a collection of related objects depicted graphically; types of objects depicted can include images and hyperlinks to websites.
Object-relational model – a hybrid model that incorporates principles of the relational model (see entry below), as well as the object-oriented model (see entry above).
Relational model – this is the most common data model, with data organized into tables of columns and rows; each row holds data about a specific instance of the related entity, and each column holds one attribute; typically queried using Structured Query Language (see entry for SQL below).
Semantic data model – a data model that depicts the relationships between stored data sets and real-world objects or information.
Star schema model – a model that depicts measurements and other facts related to “dimensions,” or specific data sets, as a series of connected boxes shaped loosely like a star; in general, one central fact table holds the measurements and links to the surrounding dimension tables that describe them.
SQL – Structured Query Language, or the means of “communication” with relational databases; programmers use SQL to update data or retrieve data from a database.
Transactional model – a data model that depicts the information generated by interactions among different data sets; for example, a sale made to a customer.
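Several of the terms above (attributes, associations, constraints, data index, relational model and SQL) can be seen working together in a small sketch. The example below uses Python's built-in sqlite3 module; the table names, column names and sample rows are hypothetical and chosen only to mirror the customer and sale examples used in the definitions. It is one possible illustration, not a recommended schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory relational database with two related tables.

    All table names, column names and rows are hypothetical examples.
    """
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign-key constraints when this is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        -- Entity: customer, with the attributes name and age.
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            age         INTEGER CHECK (age >= 0)      -- a simple constraint
        );
        -- Entity: sale. The foreign key records the association
        -- between a sale and the customer it was made to.
        CREATE TABLE sale (
            sale_id     INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
            amount      REAL NOT NULL
        );
        -- Data index: lets the database find a customer's sales quickly.
        CREATE INDEX idx_sale_customer ON sale(customer_id);
        """
    )
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [(1, "Ada", 36), (2, "Grace", 45)],
    )
    conn.executemany(
        "INSERT INTO sale VALUES (?, ?, ?)",
        [(10, 1, 20.0), (11, 1, 5.5), (12, 2, 12.0)],
    )
    conn.commit()
    return conn


def total_per_customer(conn):
    """Use SQL to follow the customer-sale association and sum each customer's sales."""
    return conn.execute(
        """
        SELECT c.name, SUM(s.amount)
        FROM customer AS c
        JOIN sale AS s ON s.customer_id = c.customer_id
        GROUP BY c.customer_id, c.name
        ORDER BY c.customer_id
        """
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(total_per_customer(conn))
```

With foreign keys enabled, an attempt to insert a sale for a customer that does not exist raises sqlite3.IntegrityError, which is a small-scale example of the database protecting data integrity.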