Exploring a Pre-existing Database with SQL¶
To get familiar with a pre-existing database using raw SQL queries, you need to explore the database schema, tables, relationships, and the data.
Table of Contents¶
- Connect to the Database
- Explore the Schema
- View Data Samples
- Check how the Table is Indexed
- Analyze table sizes and usage
- Document Findings
- Misc Queries
Connect to the Database¶
Whatever database you're using has a CLI interface that can accept SQL queries.
psql -U username -d database_name # PostgreSQL
mariadb -h 192.168.4.66 -u username database_name # mariadb
Explore the Schema¶
List all Tables¶
-- PostgreSQL
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public';
-- MySQL
SHOW TABLES;
-- SQLITE
.tables
Check table columns and data types¶
-- PostgreSQL
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'the_table_name';
-- MySQL
DESCRIBE table_name;
-- SQLite
PRAGMA table_info(table_name);
Check Relationships and Foreign Keys¶
-- PostgreSQL
SELECT
tc.table_name,
kcu.column_name,
ccu.table_name AS foreign_table_name,
ccu.column_name AS foreign_column_name
FROM
information_schema.table_contracts AS tc
JOIN information_schema.key_column_usage AS kcu
ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage AS ccu
ON ccu.constraint_name = tc.constraint_name
WHERE constraint_type = 'FOREIGN KEY';
It queries the
information_schema
tables, which are a builtin set of tables that
describe the structure of a database.The query: * Gets the name of the table (
tc.table_name
) that has a foreign key.* Gets the column in that table that serves as the foreign key(
kcu.column_name
)
* Gets the table that the foreign key points to (ccu.table_name AS foreign_table_name
)
* Identifies the specific column in the referenced table (ccu.column_name AS foreign_column_name
)
The JOIN/AS/ON
statements here:
* JOIN
: Combines rows from two tables based on a related column.
* table_constraints
joins with key_column_usage
based on tc.constraint_name
= kcu.constraint_name
* This makes sure that for each foreign key constraint, we get both the table and
column it originates from (kcu
) and the table/column it points to (ccu
).
* AS
: Creates an alias for columns or tables to mmake them easier to reference in
the query.
* ccu.table_name AS foreign_table_name
: Lets you refere to ccu.table_name
as foreign_table_name
.
If you're using MySQL, you might need to query the INFORMATION_SCHEMA
database in
a similar way.
View Data Samples¶
View the first few rows of a table:
SELECT * FROM table_name LIMIT 10;
Get the row count:
SELECT COUNT(*) FROM table_name;
Check how the Table is Indexed¶
Get a list of the indexes:
-- PostgreSQL
SELECT
tablename,
indexname,
indexdef
FROM
pg_indexes
WHERE
schemaname = 'public';
-- MySQL
SHOW INDEX FROM table_name;
-- SQLite
PRAGMA index_list(table_name);
Check how Tables are Linked by Joining Them¶
With the foreign keys, check how things are tied to each other.
- Look for naming conventions.
- e.g., fields like id
, user_id
, order_id
, etc.
- Use JOIN
queries to confirm relationships.
SELECT *
FROM orders o
JOIN users u ON o.user_id = u.id
LIMIT 10;
JOIN
: Combines the rows from two tables based on the specified condition (ON
).*
orders
and users
are combined where o.user_id = u.id
.* It matches the row in the
orders
table where the user_id
column corresponds
to the id
column in the users
table.*
orders o
: Creates an alias to be referenced inside the query.*
users u
does the same.
So if the orders
table contains this data:
| id
| user_id
| amount
|------|------------|---------
| 1
| 101
| 50.00
| 2
| 102
| 75.00
And the users
table contains:
| id
| name
|-------|--------
| 101
| Bob
| 102
| Alice
Then that query will combine those into this:
| id
(orders
) | user_id
| amount
| id
(users
) | name
|-----------------|------------|----------|----------------|-------
| 1
| 101
| 50.00
| 101
| Bob
| 2
| 102
| 75.00
| 102
| Alice
Analyze table sizes and usage¶
-- PostgreSQL
SELECT
relname AS table_name,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM
pg_catalog.pg_statio_user_tables
ORDER BY
pg_total_relation_size(relid) DESC;
relname
: Builtin for PostgreSQL.
* It's a column in the pg_statio_user_tables
and refers to the name of the table
(or relation, in PostgreSQL terminology).pg_size_pretty()
: Converts raw bute sizes into human-readable format.pg_total_relation_size()
: Calculates the total size of a table, including
indexes, TOAST data, etc.*
pg_statio_user_tables
: A PostgreSQL system view that provides stats for
user-defined tables.
The MySQL equivalent:
-- MySQL
SELECT
table_name AS Table
round(((data_length + index_length) / 1024 / 1024), 2) AS Size_MB
FROM information_schema.TABLES
WHERE table_schema = "database_name"
ORDER BY Size_MB DESC;
ROUND()
: Builtin function to round a number to a specific number of decimal
places.information_schema.TABLES
: Contains metadata about all tables in the database.DESC
: Order results in descending order (largest first).
Document Findings¶
Keep notes of the database to reference.
Keep a list of tables, their purposes and relationships, and note improtant columns -
primary keys, foreign keys, and indexed columns.
Misc Queries¶
Use "exploratory queries" to learn how the data behaves.
-- Aggregate data
SELECT column_name, COUNT(*)
FROM table_name
GROUP BY column_name;
-- Analyze date ranges, averages, patterns
SELECT AVG(column_name), MIN(column_name), MAX(column_name)
FROM table_name;