Databases are inescapable today. Most people, whether they are aware of it or not, are using databases in their lives daily—some are using it every hour, every minute of the day. Modern computing relies heavily on databases and we live in an era of “Big Data” and “Cloud” computing more than ever. It’s clear that understanding databases is quite important to those of us in the digital industry.

With at all that in mind, I’m reviewing a Stanford/Coursera course, “Introduction to Databases,” by Jennifer Widom. I thought I’d summarize some notes as I go along.

1. Introduction

Database Management Systems (DBMS)

Key feature requirements of database design:

Massive and scalable — should be able to handle huge volumes of data — terrabytes or more, sometimes per day.
Persistent - the data must persist independent of the program and its relationship to memory
Safety — to ensure that the data remains safe from any point of failure, such as hardware, software, power, users or hackers
Multi-user — must be able to control for concurrency of users manipulating data
Convenient — a database requires (a) physical data independence (the structure of the database should not affect the program usability) and often (b) the queryig of a database utilizes high-level query languages that are declarative.
Efficient — it must be able to handle thousands/queries per second. In other words, performance, performance, performance
Reliable — it must be able to work 99.9999% of the time.

A fairly convincing feature list for strong database systems design.

However, not all data-intensive applications and programs will use…databases. Instead, you can file-based data storage, including things like Excel files. Hadoop, for example, is a processing framework for running operations to run on files.

Key concepts of DBMS designs

Data models — a description of how the data is structured. The general form of the data stored in the DB. The most common one is the relational data model. Another model is the XML document, not a set of records, but as a hierarchical structure of data values. Or a graph data model, in terms of nodes and edges.
Schema vs. data — Kind of like types and variables in programming languages. A type-token relationship structure.
Data definition language (DDL) — Schemas are set up with a DDL for a database
Data manipulation or query language (DML) — A data manipulation language queries and modifies the database

Key people involved

DBMS implementer — person that builds the database system
Database designer — person who establishes the schema for a database for any given application
Database application developer — an interface between the user and the database. Many programs can interface with a database.
Database administrator — maintains and keeps the DB running smoothly. Require tuning parameters to keep optimum performance. Highly valued personnel.

The focus of the course is primarily with database designers and application designers.

2. The Relational Model (RM)

The relational model is more than 35 years old and is the foundation of most DBMS, including commercial DBMS. It’s an incredibly simple model — which is one of its strengths.

A relational model also relies on high-level query languages that are clear, expressive and declarative. Their implementations tend to be efficient as well.

Basic constructs of the RM

A database is a set of named relations (or typically called tables)
Each relation (or table set) has its own set of named atributes (or columns)
The actual data is stored in what is called tuples (or the rows)
Each attribute has has a type (or domain) — e.g., an attribute could be a type of string, an integer, float, binary, enumerated domain, etc. Types can be atomic or structured (more complex types of values in data attributes).
Schema — the structural description of relations (for example a relational table of Students with set attributes, usually set ahead of time)
Instance — actual contents at a given point that fill into a schema, numbering thousands as it grows
Null values — unknown or undefined values. Can be problematic if null values are queried.
Key — an attribute whose value is unique in every tuple/row. Non-unique attribute values are not keys. An attribute of a college may not be unique (there may be colleges, for example, with the same name). A numeric index may, however, be very unique.

Example of creating RM relations and tables in SQL:

Create Table "Student"(ID, name, GPA, photo)
Create Table "College"(name string, state char(2),enrollment integer)

The RM thus allows for a high-level declarative query language.

Querying a relational database

The following are some basic steps in creating and querying a relational database:

Design a schema (attributes/columns) and create it using DDL (see above)
“Bulk load” the initial data to fill in the tuples/rows
Finally, execute queries on the DB and retrieve results. Or update the data itself.

Relational databases allow for high-level ad-hoc queris. In all relational query languages, when you ask a query over a set of relations, you get a whole new query object result. This feature of query languages is called a closure of the object. Another query can be called on this returned object — this is what is known as compositionality.

Two Basic Relational Query languages

Relational algebra — a formal language
SQL — an actual, implementation of a relational algebra

These are two basic forms of how the sematics of a query language is defined.

Notes & Scribbles

Introduction to Databases

Filed under