by boxplot Mar 8, 2018 10:48 am
What is SQL?
Currently, if you Google this question, you’ll get a whole slew of technical articles that aren’t very helpful for understanding just what SQL is and when people use it. We’ll break that down in this blog post. This is a high level overview – if you want to understand how to actually write queries, check out some of my other blog posts.
SQL stands for “Structured Query Language.” Some people spell it out when referring to it (“S-Q-L”) and some people say something that sounds like “sequel.” Both are acceptable. Notice the word “language” in there. The word “SQL” just stands for the language itself. To use SQL, you need to pick a product first. Here are some popular SQL products:
As you can see, each product comes with its own version of SQL language and its own program for typing in SQL queries and running them. Most of the programs are free to download and use. Storing data on the server is what costs money, and usually your company’s IT department will handle all that. They’ll set up a server, put the data onto it, and then give you a username and password to connect to that server.
Even though each of these programs use their own SQL language, the good news is that SQL is standardized – that means that the major concepts won’t change no matter which program you are using. The bad news is that there will be small (and annoying) syntax changes taht you’ll need to be aware of if you switch between programs. For example, concatenation in Microsoft SQL and MySQL does the same thing – puts strings together. But in MySQL, you’d type
SELECT CONCAT('MS','SQL','Tips') whereas in Microsoft SQL Server you’d type
SELECT ('MS' + 'SQL' + 'Tips').
By the way, most of these programs look almost exactly the same. Here’s what pgAdmin looks like:
The other programs have an almost identical layout. You’ll type your queries in the box at the top of the screen, you’ll see what servers/databases you are connected to on the left side of the screen, and you’ll see a preview of the results of your query in the box towards the bottom of the screen.
If you’re a data analyst, or someone who works with data in their job, you will be using SQL to extract data from a server. Companies, even small ones, have massive amounts of data associated with their business. They have data about their employees. Data about who visits their website. Data from social media platforms about who their fans are or where the fans are clicking. Data about sales and customers… you get the picture! Very quickly, a business owner will realize that s/he cannot store all of this information in Excel. The data will simply become too large. So, the solution is to store the data in a server.
A server is a machine whose entire purpose is to store data. Picture a CPU (like for a desktop computer) but with no mouse, no keyboard, no monitor. It’s literally a physical box that just sits, turned on, all day and stores your data. This box could be located physically in your company’s office, but that’s unlikely. Most companies pay for space on Google or Amazon’s servers, which are stored in giant warehouses across the country. To help organize your data, you can have multiple “databases” on that server, and then within each database, you can have multiple “tables.” Graphically, the hierarchy looks something like this:
You use a SQL program (like one of the ones listed above) to connect to this server (and thus also the databases and tables) and then write SQL language to tell the server exactly what data you need for whatever project or question you’re working on.
Real World Sequence
Okay, let’s put it all together in practice. In the real world, the scenario might look something like this:
Your boss asks you a question.
It might be something as simple as “how many customers do we currently have?” (which would be a single number as an answer) or something as complicated as “tell me everything you can find out about our Facebook presence” (which would be a series of dashboards based on your company’s Facebook data).
Get the data you need to answer the question.
You open up your SQL Program (Microsoft SQL Server, Postgres, MySQL, whichever) and write a query that will return exactly the data you need to answer your boss’s question. No more no less. Then you export it (usually as a .csv file).
Analyze the data.
You open up that export from the previous step and use an analysis program like Microsoft Excel, Tableau, or Python to get the answer to your boss’s question. So for example, if you exported as a .csv file, you might open up that .csv file in Microsoft Excel and do some formulas, charts, and PivotTables to get results.
Pass off the results and repeat.
Send the results to your boss – this could mean sending an email with a number in it or sending an entire Excel file with multiple dashboards in it. Or making a PowerPoint and presenting it. Whatever format your company wants is the format in which you’ll deliver the results. Then, you’ll repeat all of these steps for the next question!
While it’s nice to understand that there is a server, as the analyst you don’t usually need to get too caught up in the inner workings of servers and databases. Analysts are typically not responsive for creating databases, changing databases, getting data into databases, or organizing databases. You just need to know how to run SQL queries to extract the data needed to answer questions for your organization.
Need help applying these concepts to your data? Ask your department or organization to subscribe to our DataChat program.
Continue to make data-driven decisions.
Sign up for our email guides that contains relevant tips, software tricks, and news from the data world.
*We never spam you or sell your information.