Fundamentals of SQL
Introduction
I'm currently learning SQL from DataQuest and other resources. In this blog post, I'll write about what I've learned about the following essential clauses: SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, and LIMIT. These clauses provide the foundation for constructing powerful and efficient queries. Later I'll write about more advanced techniques like joins, window functions and common table expressions.
SELECT
The SELECT clause is the starting point of any SQL query. It allows you to choose the columns you want to retrieve from your database table. You can either specify each column individually or use an asterisk (*) to select all columns in the table. Selecting only the necessary columns can significantly improve query performance, especially when working with large datasets.
SELECT first_name, last_name, email
FROM customers;
SELECT *
FROM customers;
FROM
The FROM clause is used to specify the table or tables that a query will retrieve data from. It is an essential part of a SELECT statement, which is used to retrieve data from a database. The FROM clause is followed by the name of the table or tables that contain the data that you want to retrieve. In most cases, you will only need to specify a single table, but you can also retrieve data from multiple tables using JOIN clauses, but I won't go into detail about that here.
SELECT first_name, last_name, address
FROM employees;
WHERE
The WHERE clause enables you to filter data based on specific conditions. This is particularly useful when you want to analyze a subset of your data or retrieve records that meet specific criteria. Using the WHERE clause effectively can help reduce the amount of data you need to process and improve query performance.
SELECT first_name, last_name, age
FROM customers
WHERE age >= 18;
GROUP BY
The GROUP BY clause allows you to group rows that have the same values in specified columns. This is especially useful when you want to perform aggregate functions like COUNT, SUM, AVG, MIN, or MAX on your data. GROUP BY is an essential tool for summarizing and analyzing large datasets.
SELECT department_id, COUNT(employee_id) as num_employees
FROM employees
GROUP BY department_id;
HAVING
The HAVING clause is used in combination with GROUP BY to filter the results of an aggregate function. While the WHERE clause filters rows before they are grouped, the HAVING clause filters groups after they have been aggregated. This enables you to focus on specific groups that meet your criteria.
SELECT department_id, COUNT(employee_id) as num_employees
FROM employees
GROUP BY department_id
HAVING num_employees > 10;
ORDER BY
The ORDER BY clause allows you to sort the result set based on one or more columns in ascending (ASC) or descending (DESC) order. This is particularly helpful when you want to display data in a specific order for reporting or further analysis. The default sorting is ASC, so you don't need to type it out if you want to sort from smallest to largest.
SELECT first_name, last_name, salary
FROM employees
ORDER BY salary DESC;
LIMIT
The LIMIT clause enables you to restrict the number of rows returned by a query. This can be useful when you want to view a sample of your data or display only the top results. Limiting the number of rows returned can also improve query performance.
SELECT first_name, last_name, salary
FROM employees
ORDER BY salary DESC
LIMIT 10;
Order of clauses
There's a recommended order of clauses when you're writing SQL queries. The order is generally flexible and can vary depending on the requirements of the query. However, there are some best practices for organizing the clauses in a logical and readable manner. Usually, you'll want to order your query like this:
SELECT > FROM > WHERE > GROUP BY > HAVING > ORDER BY > LIMIT
However, the order of clauses is not the same as the order of execution.
Order of execution
The order of execution refers to the order in which the different parts of the query are processed by the database engine:
FROM > WHERE > GROUP BY > HAVING > SELECT > ORDER BY > LIMIT
The database engine first evaluates the FROM
clause to determine the tables to be used in the query, then applies any join conditions specified in the JOIN
clauses, and so on.