Andrew Jones

0817804 J01209

 

CO7107 Data Warehousing and Multi-dimensional Databases

Assessment


 

Table of Contents

Table of Figures. 2

Introduction. 3

Section 1. 3

How does it work. 4

Section 2. 7

Section 3. 9

Outline evolution of databases and discuss how they have tried to address problems. 9

Conclusion. 12

 


 

Table of Figures

Figure 1 DBMS as an interface. 4

Figure 2 Physical and Logical Databases (Microsoft Corporation, 2015). 4

Figure 3 Composition of a DBMS. 5

Figure 4 Gartner Magic Quadrant of DBMS providers. 5

Figure 5 five DBMS models. 6

Figure 6 DBMS Rankings by popularity (DB-engines, 2015). 6

Figure 7 DBMS Model by popularity (DB-engines, 2015). 7

Figure 8 Commercial Industrial Sectors. 8

Figure 9 consistency in reporting. 9

Figure 10 the mathematics of ER Model 10


 

“What is our business? And what business are we in?” Peter Drucker.

Introduction

 

In this report, the nature of a DBMS (Database Management System) will be considered. In addition, how it works will be investigated and an exploration of the market will be undertaken to ascertain what is considered to be leading in the field. An organisation’s needs for data, how this may have changed over the years and how DBMS has evolved to cope with these changes will be considered. PICK OUT KEY POINTS RE. THESE FOR CONCLUSION

 

 

 

 

Section 1

A database, (db), is “A structured set of data held in computer storage and typically accessed or manipulated by means of specialized software.” (database, n., 2015)

This specialized software introduces the term database management system (DBMS) which is used to describe the application suite which administers and manages the database. Within the database, software tools are principally employed for the: storing, indexing, modification, retrieval, transitioning, and querying of data within the system.

Gartner defines a DBMS as a complete software system used to define, create, manage, update and query a database, by which we mean an organized collection of data that may be structured in multiple formats and stored in some form of storage medium.

The DBMS market is defined by relational and nonrelational database management products that are suitable for a broad range of enterprise-level transactional applications. (Computer Weekly, 2015)

How does it work

The DBMS is an interface between the database and the entities that need and require access to the database.

Figure 1 DBMS as an interface

Advantages of the DBMS

Disadvantages of a DBMS

·         Data sharing

·         Increased costs

·         Data security

·         Management complexity

·         Data integration

·         Maintaining currency

·         Referential integrity

·         Frequent upgrade/replacement cycles

·         Minimized data inconsistency

 

·         Improved data access

 

·         Improved decision making

 

·         Increased end-user productivity

 

Table 1 Characteristics of a DBMS the advantages and disadvantages (My Reading Room, 2012)

A DBMS is composed of three principal elements, these are:

1.       Physical database management

Transparent to the user is the storage and retrieval of the logical database over multiple physical files in the storage medium.

Figure 2 Physical and Logical Databases (Microsoft Corporation, 2015)

2.       Database engine

The Database Engine is the principal service for storing, processing, and securing data; it is this package element that is responsible for the: create, read, update and delete (CRUD) functionality of the DBMS.

3.      

Figure 3 Composition of a DBMS

Database schema

The db schema is a logical data storage structure. The schema is composed of a number of schema objections, known as a collection. Some examples of schema objects are:

·         tables

·         views

·         indexes

·         database links

·         procedures

·         functions

·         packages

The DBMS manages a single logical presentation of the data safeguarding concurrency, data integrity, security and backup.

Of the DBMS on the market, those considered to be leading products are Oracle, Microsoft, IBM and SAP. These organisations are so market dominant that there is little left for rival companies other than those involved in very specific market niches, such as Neo Technology with their graph product. Hybrid capabilities extend the functionality of the RDBMS to more use cases, making a stronger case for their ubiquity. (Mullins, Key criteria for deciding if a relational DBMS meets your IT needs, 2015)Relational databases, or RDBMSes, became the norm in IT more than 30 years ago as low-cost servers became powerful enough to make them widely practical and relatively affordable.” (Mullins, Evaluating the different types of DBMS products , 2015)

Figure 4 Gartner Magic Quadrant of DBMS providers

Figure 5 five DBMS models

There are broadly five DBMS models to consider excluding RDBMS. These include key-value DBMS models which utilize “a key-value pair (KVP) which is a set of two linked data items: a key, which is a unique identifier for some item of data, and the value, which is either the data that is identified or a pointer to the location of that data”. (Rouse, 2008). Another model is the document model, in which every record in viewed as a document. (MongoDB, 2015). The column model stores data in columns instead of rows. (Rouse, 2010). The graph model applies the mathematical practice of graph theorem. This model significantly differs from other models in that it allows for ACID compliance, in common with RDBMS. A: atomicity. C: consistency. I: isolation. D: durability. This model is a cornerstone of transaction processing. Compliance with the model ensures that transactions with the database are processed dependably, in that they are successfully committed or completely rolled back.

Figure 6 DBMS Rankings by popularity (DB-engines, 2015)

Figure 7 DBMS Model by popularity (DB-engines, 2015)

The other models employ a BASE methodology. Basically Available, Soft state, Eventual consistency. This BASE model does not provide the guarantees in data transactions that ACID offers, but reflects the complexities of the data managed by these DBMS.

The final form of DBMS is an emergent form called the in memory model. These databases’ data is “stored in main memory to facilitate faster response times”. (Rouse, in-memory database, 2012)

Section 2

Business run on data: the amount of a sale, the cost of a purchase, and the wages of staff: but businesses survive on cash flow and strategic planning; typically one of the first strategic Information Technology and Information Systems decision an organisation makes is its accounting system. Its first serious venture into Business Intelligence (BI). The business can now start to gather, report and plan consistently on the organisations Key Performance Indicators (KPI).

Businesses do not start big. They start small and grow.

Figure 8 Commercial Industrial Sectors

For a small business information needs are initially awfully basic and can be addressed predominantly utilising lists, but as the business grows, and the models for management start to formalise, there becomes a requirement for the enhanced management of data and information sources. During this business growth, the business starts to develop formal reporting structures for its operational data, tactical information, together with its strategic management information. Sources of this data and information include:

·         accounting information

o   sales order processing

o   purchase order processing

o   profit and loss accounts

o   balance sheet

·         Inventory management

·         Production data

Over time the organisation will have generated empirical data to work with, consequently it can begin to forecast, complete empirical comparisons and plan for the future needs. If the business itself continues to grow, it may require other functions to perform its operational and strategic activities. Such functions are:

·         Payroll

·         Marketing

·         HRM

·         CAD/CAM

Also there will be an extension of IT into other areas of business. Unless planned for, local and workgroup developed databases and technology infrastructure will prove difficult and costly to integrate into an overall strategy. There may also be data model alignment issues as a result of mergers or acquisitions.

As the business grows into the large phase, data security issues will rise to prominence. The changes in data value are now very real and the threats to them incredibly serious. Breaches in security are costly to the organisation.

In this large phase of business development the physicality of data requirements will also change. There will be a lot more computers, tablets, phones, scanners, copiers, printers and cameras to manage. In addition there will be the large scale server, mainframe and networking installations required to service the demands, needs and wants of the business. Fundamentally it comprises a lot of infrastructure.

Business Intelligence (BI), is an analytical data analysis process aimed at enhancing business performance by facilitating key decision makers with the ability to collect, store, retrieve and analyze data to make better-informed decisions. It provides for a consistency, which may otherwise be lacking, in reporting and analysis. As the maxim goes: knowledge is power: but the transitioning from data, through information to result in knowledge is time consuming.

With this in mind there is and should only be “one version of the truth”, a single set of data that provides the source for all analysis, planning and decision making.

Data warehousing and BI have become almost interchangeable terms, as definitions vary for what is a very broad term. It is not just a decision support system, nor is it a methodology, it does not require a data warehouse, and it is not the tool set.

Figure 9 consistency in reporting

Data quality is important in the creation of the data warehouse, as the adage goes: garbage in garbage out.

BI is/can be about monitoring change. As a business organization it is beneficial to be able to react to change. In addition the business may want to cause change. Small businesses can survive and prosper on the instincts and intuition of the proprietor; decisions are not necessarily made from factual evidence. As the business grows, layers of management are introduced and “buy in” from stakeholders is required. The business formalizes and professionalizes.

Section 3

Outline evolution of databases and discuss how they have tried to address problems

Change happens at a rate; the rate of this change, n, will affect an organisations’ ability to deal with the results, positive or negative of the change. The faster the rate of n the more agile the organisation needs to be in order to facilitate a successful outcome to the change. The less agile the organisation the more the change will meet resistance. This resistance to change can become inherent as the management structure of the organisation grows.

Flat file

In databases a flat file refers to data files that contain records with no structured relationships. (Webopedia, 2015)

In SQL for Dummies, an introduction to Structured Query Language, Allen G. Taylor notes that the advantage of a flat file is that it takes up less space than a structured file. However, it does require the application to have knowledge of how the data is organized within the file.

Almost pervasively thought of as:

A flat file database is a database that stores data in a plain text file. Each line of the text file holds one record, with fields separated by delimiters, such as commas or tabs. (TechTerms.com, 2015)

Hierarchical database model

A hierarchical database consists of a collection of records that are connected to each other through links. Each record is a collection of fields (attributes), each of which contains only one data value. A link is an association between precisely two records.

Two influential database systems that rely on the hierarchical model are IBM’s Information Management System (IMS) [IBM 1978a, McGee 1977] and MRI’s System 2000 [MRI 1974, 1979]. The first IMS version was developed in the late 1960s by IBM and by North American Aviation (Rockwell International) for the Apollo moon-landing program. (Silberschatz, Korth & Sudarshan, 2010)

Network model

A network database consists of a collection of records connected to one another through links. Each record is a collection of fields (attributes), each of which contains only one data value. A link is an association between precisely two records. (Silberschatz, Korth & Sudarshan, 2010)

Relational model

The relational model is cantered on this idea: the organization of data into collections of two-dimensional tables called “relations.” (Ullman & Aho, 1994)

Figure 10 the mathematics of ER Model

Entity–relationship model (ER)

ER Model is based on Strong Mathematical Foundations. The ER model is based on:

1.       Set Theory,

2.       Mathematical Relations,

3.       Modern Algebra,

4.       Logic, and

5.       Lattice Theory.

In the relational model the mathematical relation construct is used to express the “structure of data values,” while in the ER model the same construct is used to express the “structure of entities.”

ER Model has Explicit Linkage between Entities. The linkage between entities is explicit in the ER model while in the relational model is implicit. In addition, the cardinality information is explicit in the ER model, and some of the cardinality information is not captured in the relational model. (Chen, 2015)

Object model

An object-oriented database management system (OODBMS), sometimes shortened to ODBMS for object database management system), is a database management system (DBMS) that supports the modelling and creation of data as objects. This includes some kind of support for classes of objects and the inheritance of class properties and methods by subclasses and their objects. There is currently no widely agreed-upon standard for what constitutes an OODBMS. (Rouse & Kumar, object-oriented database management system (OODBMS or ODBMS), 2005)

Document model

Also known as JSON store or database

Document Stores provide more functionality: the system recognizes the structure of the objects stored. Objects (or documents) may have a variable number of named attributes of various types (integers, strings, and possibly nested objects), objects can grouped into collections, and the system provides a simple query mechanism to search collections for objects with particular attribute values. (Cattell, 2010)

Graph model

Formally, a graph is just a collection of vertices and edges—or, in less intimidating language, a set of nodes and the relationships that connect them. Graphs represent entities as nodes and the ways in which those entities relate to the world as relationships. This general-purpose, expressive structure allows us to model all kinds of scenarios. (Robinson, Webber & Eifrem, 2013)

Key value

Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or "key"), together with its value. (MongoDB, 2015)

Key-value Stores provide a distributed index for object storage, where the objects are typically not interpreted by the system: they are stored and handed back to the application as BLOBs. However, these systems usually provide object replication for recovery, partitioning of the data over many machines, and rudimentary object persistence. (Cattell, 2010)

Columnar

Are optimized for queries over large datasets, and store columns of data together, instead of rows. (MongoDB, 2015)

Don’t forget to comment on how data requirements haven’t changed significantly however as the organisation grows there are additional functionalities that generate data that need to be included in the information system TRY TO FIND REF.

 

Conclusion

On the horizon are a number of other db related ideas: CloudDB, DBaaP, DBaaS, DBaaA

In conclusion, the market for DBMS is dominated by four major organisations: Oracle, Microsoft, IBM and SAP. All of these organisations have successfully read the market requirements for data management systems and users of other systems have been profiled and collectively marginalised into exceptionally narrow market niches.