As your organization’s data competency grows, so too does the quantity of data it needs to store. If you currently store all of your data on Excel spreadsheets, Google Sheets, or other similar storage tool, you may be noticing that a weakness of such files is that it doesn’t scale; more specifically, it feels clumsy and wasteful of computer storage space to have to maintain a massive and ever-growing number of individual, disconnected, locally-stored Excel files.
That’s where databases come in. A database is an organized, centralized data storage repository, into and out of which vast quantities of data can be inserted and retrieved. Users typically insert data into, and retrieve data from, databases using a Database Management System, or DBMS. The benefits of using a database for your data storage over Excel sheets or other flat file is that:
- databases only require data to be stored in one location
- single-copy data storage means that data only needs to be edited/updated once when incorrect values are discovered (imagine discovering an incorrect data point or calculation that appears on 100 different Excel files — you’d have to go in and edit every single file!)
- databases provide a single source of truth, no matter how many users are entering data into, and retrieving data from, the database.
Different Types of Databases
There are many different types of databases, each with its own strengths and weaknesses, but we’ll focus on the two types that are by far the most popular for the purposes of this article.
Relational databases store data in a rigidly-defined collection of tables with rows and columns, much like a spreadsheet with many tabs. This collection of tables is known as a schema, and the tables themselves are known as relations, thus the name “relational” database. The tables are interconnected using a series of what we call keys, which are columns that multiple tables have in common. For example, a set of tables may each have a key column representing a unique customer ID#, where each customer ID# corresponds to the same specific customer across tables. Relational databases also usually have constraints programmed into them, which ensure strong data integrity across the database.
We typically interact with relational databases’ DBMS using a programming language called SQL, shorthand for Structured Query Language; SQL is a fairly intuitive language that allows for non-highly-technical users to insert and retrieve data into and from the database using a straightforward programmatic syntax. Many different relational database frameworks make use of SQL, including MySQL and Postgres ―both open-source database frameworks― in addition to payware options such as Microsoft SQL Server, DB2, and Oracle Database.
Non-relational databases, also known as NoSQL databases, store data in a loosely-defined collection of tables which, unlike in relational databases, may or may not be interconnected in a larger schema. As a result, non-relational databases tend to perform much better for storing many different types of data than relational databases; this is because relational databases tend to have very strict data type constraints programmed in, and also have a much less flexible structure than non-relational databases.
There are several ways to store data in a non-relational database, but two stand out from the crowd.
The first way is through a Key-Value Mechanism, which stores one or more data points according to a lookup value known as a key (if you’re familiar with the VLOOKUP formula in Excel, that is not a bad analogy). Amazon’s DynamoDB is probably the best-known key-value non-relational database framework.
The second way to store data in a non-relational database is through a Document Store, which is similar to a Key-Value Mechanism except that it stores “documents” according to a document name, which is the key in a Document Store; the “document” which is being looked up can be a data point or dataset of any format, which makes this mechanism especially helpful for storing unstructured data. MongoDB is the best-known document store database framework out there today.
Databases and Your Organization
Are you fed up with mind-numbing data entry and editing across a huge cohort of disconnected spreadsheets? Worried about the amount of data storage your organization is using up? Struggling to coordinate your data collection, storage, and consistency among many users? Databases are the perfect solution to all of these issues. We at Boxplot Analytics are strong advocates for database-oriented data storage solutions, and we’d be happy to have one of our experts go over your options with you over Zoom. Not sure whether relational or non-relational databases are right for your organization? We’d be happy to go over that as well.