Designing and Creating a Relational Database for Car Sales.

Cendikia Ishmatuka
11 min readFeb 23, 2024

--

Source: www.google.com

This is the final project for PACMANN's online course on SQL and Relational Databases class. In this article, I will share what I have learned about designing and creating a relational database using PostgreSQL and pgAdmin4.

The tutorial's code and related resources are available on my GitHub repository.

A relational database is a type of database that stores related data and provides access to it, where each row in the table has its unique ID called key [1]. Relational database management systems (RDBMS), such as Oracle, Microsoft SQL Server, MySQL, PostgreSQL, allow us to perform essential database functions, including create, read, update, and delete, using SQL [3]. SQL, also known as Structured Query Language, is currently the most widely used language for relational database management systems (RDBMS) [2,3]. This language is used to manage data stored in the relational database in a tabular form [3].

Background.

In the final project for PACMANN’s SQL & Relational Database class, the students are tasked with designing a relational database to track car sales. The database must include the following features:

  1. Every seller on this database can offer more than one second-hand car.
  2. Before selling their cars, the seller must complete their data such as name, phone contact, and address (the city they live in).
  3. All sellers will offer their products through an advertisement, which will be shown on a website.
  4. The advertisement will contain a title, detailed product information, and the seller’s phone contact.
  5. The advertisement must include the brand, model, body type, transmission type, car production year, and price.
  6. Buyers can search for products based on the seller’s location, car brand, or body type.
  7. If the buyer is interested in purchasing a car, they can bid on the price if the seller allows it.
  8. Purchase transaction is done outside the application, so it is not in the project’s scope.

So, in this article, we will design and create a database for buying and selling used cars to store car data and buyer offers for cars sold by sellers that can facilitate user transactions on the website.

Designing The Database.

Step 1. Create the Table Structures.

After identifying the mission statement, the first step is determining the objects required for the database. We need to store the following objects: the information of the sellers and buyers (including name, phone number, and address), the car’s information (including brand, model, body type, transmission type, year of production, and price), and the bids made by the buyers. So, the table structures will be shown below.

As shown above, each table has its column. For instance, the ‘sellers’ table consists of 4 columns: ‘seller_id’, ‘name’, ‘phone’, and ‘city_id’. You may have noticed that each table column has a name and data type. This is because it helps to organize and understand the information in the table. Here is the explanation for each data type used [3].

  1. int: integer.
  2. varchar(N): string or text with maximum length N.
  3. date: a date with the format ‘YYYY-MM-DD’.
  4. decimal(M, N): float number with a maximum of M total digits and N decimal digits.
  5. boolean: 0 or 1 (False or True).

Now, let’s identify the primary and foreign keys for each table. The primary key (PK) is a column that contains unique values for each row in a table. On the other hand, a foreign key (FK) is used to connect a table to another one. It contains the primary key of the related table [3].

Step 2. Determine Table Relationship.

Table relationship refers to the association between two tables in a database. It defines how the data in one table relates to the data in another. One of the most essential keys in designing a good database is correctly determining the relationship between the tables that have been created.

There are three types of table relationships:

  1. One-to-one (1:1): one record in the first table has precisely one relation in the second table, and vice versa.
  2. One-to-many (1:N): one record in the first table has more than one relation in the second table, but one record in the second table has only one relation in the first table.
  3. Many-to-many (N:N): one record in the first table has more than one relation in the second table, and vice versa.

The table relationship of our database is shown below.

Now we know that the ‘cities’ table has a one-to-many relationship with table ‘sellers’ and ‘buyers’, the ‘buyers’ and ‘product_onsales’ tables have a one-to-many relationship with table ‘bids’, the table ‘sellers’ and ‘products’ have a one-to-many relationship with table ‘product_onsales’.

Step 3. Determine the Business Rule.

The business rules are company rules that must be reflected in the database. In other words, we must add constraints for each field in the database. These are the business rules of our car sales database.

Step 4. Implementing a Relational Database.

Now, we are ready to create our database in PostgreSQL using pgAdmin4. To begin, open pgAdmin4 and create a new database.

To name our database, enter 'carsales-db' and click the save button. Then, to create the tables, we will use this Data Definition Language (DDL) syntax, as shown below.

DROP TABLE IF EXISTS bids;
DROP TABLE IF EXISTS product_onsales;
DROP TABLE IF EXISTS sellers;
DROP TABLE IF EXISTS buyers;
DROP TABLE IF EXISTS products;
DROP TABLE IF EXISTS cities;


-- Create City table.
CREATE TABLE cities(
city_id SERIAL PRIMARY KEY,
name VARCHAR(50) UNIQUE NOT NULL,
longitude DECIMAL(10,7) NOT NULL,
latitude DECIMAL(10,7) NOT NULL
);

-- Create Seller table.
CREATE TABLE sellers(
seller_id SERIAL PRIMARY KEY,
name VARCHAR(50) UNIQUE NOT NULL,
phone VARCHAR(15) NOT NULL,
city_id INT,
CONSTRAINT fk_city
FOREIGN KEY(city_id)
REFERENCES cities(city_id)
);

-- Create Buyer table.
CREATE TABLE buyers(
buyer_id SERIAL PRIMARY KEY,
name VARCHAR(50) UNIQUE NOT NULL,
phone VARCHAR(15),
city_id INT,
CONSTRAINT fk_city
FOREIGN KEY(city_id)
REFERENCES cities(city_id)
);

-- Create Products table.
CREATE TABLE products(
product_id SERIAL PRIMARY KEY,
merk VARCHAR(50) NOT NULL,
model VARCHAR(50) NOT NULL,
body_type VARCHAR(50) NOT NULL,
transmission_type VARCHAR(50) NOT NULL,
year INT NOT NULL
);

-- Create Product on Sales table.
CREATE TABLE product_onsales(
sales_id SERIAL PRIMARY KEY,
seller_id INT NOT NULL,
product_id INT NOT NULL,
price INT NOT NULL,
is_negotiable BOOLEAN NOT NULL,
date_post DATE NOT NULL,
CONSTRAINT fk_seller
FOREIGN KEY(seller_id)
REFERENCES sellers(seller_id),
CONSTRAINT fk_product
FOREIGN KEY(product_id)
REFERENCES products(product_id)
);

-- Create Bids table.
CREATE TABLE bids(
bid_id SERIAL PRIMARY KEY,
sales_id INT NOT NULL,
buyer_id INT NOT NULL,
date_bid DATE NOT NULL,
bid_price INT NOT NULL,
CONSTRAINT fk_sale
FOREIGN KEY(sales_id)
REFERENCES product_onsales(sales_id),
CONSTRAINT fk_buyer
FOREIGN KEY(buyer_id)
REFERENCES buyers(buyer_id)
)

You might see there are a lot of phrases ‘NOT NULL’ in defining our columns. This means that the columns in a table must have a value and cannot be left empty, indicating that NULL values are not allowed as specified in the business rule. Although the primary key column cannot have any null values, we don’t need to set it explicitly using ‘NOT NULL’ [3]. We can use the query tool available in pgAdmin4 to execute this DDL syntax.

Then, our table will be available in the database.

Afterwards, we can create an Entity Relationship Diagram (ERD) to represent the database design.

And here is the ERD.

Populating the Database.

The next task that needs to be completed is to populate the database that has been created. This time, we will generate dummy data using Python and save the dummy data in CSV format. Utilizing Python for this task allows us to customize the dummy data according to our specific requirements, ensuring a comprehensive evaluation of the database’s capabilities. The Python script for generating this dummy data can be seen here.

Then, the next step is to import our generated dummy data into the table. To import a table, right-click on it and select the “import/export data” option.

Then, navigate to where we saved the CSV file and click OK. Make sure that we import the correct file for each table.

Due to the relationship between the tables, there is a specific order in which we need to import the dummy dataset. We must import the ‘cities’ and ‘products’ tables first, as they don’t have any foreign keys. Then, import data for the ‘sellers’ and ‘buyers’ table, followed by the ‘product_onsales’ table, and lastly the ‘bids’ table.

Query Examples.

Once we have finished designing, creating, and populating our database, we can start interacting with the stored data using SQL queries. These queries enable us to retrieve, analyze, and manipulate the data in a way that suits our specific requirements. To showcase SQL's extensive capabilities and flexibility in terms of data manipulation and retrieval, we have prepared a set of query examples.

Transactional Queries.

Example 1. Look for cars with a production year above 2015.

SELECT 
sales_id,
product_id,
model,
year,
price
FROM products
JOIN product_onsales USING(product_id)
WHERE year >= 2015
ORDER BY 3,4;

Output:

Example 2. Add new bid data.

The table before added with new bid data:

INSERT INTO bids(bid_id, sales_id, buyer_id, date_bid, bid_price)
VALUES(216, 46, 75, '2023-03-01', 119100000);

After adding new bid data:

Example 3. Find all the cars that Henry Best sells.

SELECT 
sales_id,
merk,
model,
year,
price,
date_post
FROM product_onsales
JOIN sellers USING(seller_id)
JOIN products USING(product_id)
WHERE sellers.name = 'Henry Best'
ORDER BY date_post;

Output:

Example 4. Find the most affordable car by searching for “Alya” as the keyword.

SELECT 
product_id,
merk,
model,
year,
price
FROM product_onsales
JOIN products USING (product_id)
WHERE model LIKE '%Alya'
ORDER BY price;

Output:

Example 5. Find the closest sold car based on city_id (Samarinda => city_id = 6472)

-- Create function first to calculate the distance
CREATE OR REPLACE FUNCTION calculate_distance(
lat1 DOUBLE PRECISION,
lon1 DOUBLE PRECISION,
lat2 DOUBLE PRECISION,
lon2 DOUBLE PRECISION
) RETURNS DOUBLE PRECISION AS $$
DECLARE
distance DOUBLE PRECISION;
BEGIN
distance := SQRT((lat2 - lat1)^2 + (lon2 - lon1)^2);
RETURN distance;
END;
$$ LANGUAGE plpgsql;

-- Get the longitude and latitude of Kota Samarinda
WITH city_location AS (
SELECT
latitude,
longitude
FROM cities
WHERE city_id = 6472
),
nearest_city AS (
SELECT
city_id,
name,
calculate_distance(
(SELECT latitude FROM city_location),
(SELECT longitude FROM city_location),
latitude,
longitude
) AS distance
FROM cities
WHERE city_id <> 6472
ORDER BY 3
LIMIT 1
)
SELECT
sales_id,
product_id,
merk,
model,
year,
price,
(SELECT distance FROM nearest_city)
FROM product_onsales
JOIN sellers USING (seller_id)
JOIN products USING (product_id)
WHERE city_id = (SELECT city_id FROM nearest_city);

Output:

Analytical Queries.

Example 1. Sort all car models based on their popularity, which the number of bids received should determine.

WITH product_counts AS (
SELECT
model,
COUNT(product_id) AS count_product
FROM product_onsales
JOIN products USING (product_id)
GROUP BY model
),
bid_counts AS (
SELECT
model,
COUNT(product_id) AS count_bid
FROM bids
JOIN product_onsales USING (sales_id)
JOIN products USING (product_id)
GROUP BY model
)
SELECT
pc.model,
pc.count_product,
bc.count_bid
FROM product_counts pc
JOIN bid_counts bc USING (model)
ORDER BY pc.count_product DESC;

Output:

Example 2. Compare the car’s price with the average price per city.

SELECT 
cities.name,
merk,
model,
year,
price,
CAST(AVG(price) OVER(PARTITION BY cities.name)AS INT) AS avg_car_city
FROM product_onsales
JOIN products USING (product_id)
JOIN sellers USING (seller_id)
JOIN cities USING (city_id);

Output:

Example 3. Comparison of the date the user made a bid with the next bid and the bid price given.

WITH bid_info AS (
SELECT
model,
buyer_id,
date_bid AS first_bid_date,
LEAD(date_bid) OVER (PARTITION BY buyer_id ORDER BY date_bid) AS next_bid_date,
bid_price AS first_bid_price,
LEAD(bid_price) OVER (PARTITION BY buyer_id ORDER BY date_bid) AS next_bid_price
FROM bids
JOIN product_onsales USING (sales_id)
JOIN products USING (product_id)
)
SELECT
model,
buyer_id,
first_bid_date,
next_bid_date,
first_bid_price,
next_bid_price
FROM bid_info
WHERE next_bid_date IS NOT NULL
ORDER BY 1;

Output:

Example 4. Comparing the percentage difference between the average car price by model and the average bid price offered by customers in the last 6 months.

WITH avg_bid_last_6month AS (
SELECT
model,
CAST(AVG(bid_price) AS INT) AS avg_bid_6month
FROM bids
JOIN product_onsales USING (sales_id)
JOIN products USING (product_id)
WHERE EXTRACT('month' FROM date_bid) BETWEEN 1 AND 6
GROUP BY 1
),
avg_price AS(
SELECT
model,
CAST(AVG(price) AS INT) AS avg_price
FROM product_onsales
JOIN products USING (product_id)
GROUP BY 1
)
SELECT
model,
avg_price,
avg_bid_6month,
(avg_price - avg_bid_6month) AS difference,
CAST((avg_price - avg_bid_6month)::FLOAT/avg_price * 100 AS DECIMAL(10,2)) AS difference_percent
FROM avg_price
JOIN avg_bid_last_6month USING (model)
ORDER BY 1;

Output:

Example 5. Create a window function of the average bid price of a car brand and model for the last 6 months.

WITH window_function AS (
SELECT
model,
bid_price,
DENSE_RANK() OVER(PARTITION BY model ORDER BY EXTRACT('month' FROM date_bid)) AS ranks
FROM bids
JOIN product_onsales USING (sales_id)
JOIN products USING (product_id)
WHERE model = 'Daihatsu Alya'
)
SELECT
model,
(SELECT AVG(bid_price)
FROM window_function
WHERE ranks BETWEEN 1 AND 6) AS m_min_6,
(SELECT AVG(bid_price)
FROM window_function
WHERE ranks BETWEEN 1 AND 5) AS m_min_5,
(SELECT AVG(bid_price)
FROM window_function
WHERE ranks BETWEEN 1 AND 4) AS m_min_4,
(SELECT AVG(bid_price)
FROM window_function
WHERE ranks BETWEEN 1 AND 3) AS m_min_3,
(SELECT AVG(bid_price)
FROM window_function
WHERE ranks BETWEEN 1 AND 2) AS m_min_2,
(SELECT AVG(bid_price)
FROM window_function
WHERE ranks = 1) AS m_min_1
FROM window_function
GROUP BY 1;

Output:

Conclusion.

To wrap this up, we’ve gone from building a database to populating it with dummy data, and now we’re at the point where we can pull out whatever information we need with SQL. The examples I showed are just the start. Playing with SQL and databases lets us find out all sorts of things, making our work a lot easier and helping us make intelligent choices based on what the data tells us. Hopefully, this has given you a taste of what’s possible and gets you excited about diving into more database and SQL stuff.

References.

[1] Oracle. (n.d.). What is a Relational Database (RDBMS)? Retrieved February 23, 2024, from https://www.oracle.com/database/what-is-a-relational-database/.

[2] Silva, Yasin & Almeida, Isadora & Queiroz, Michell. (2016). SQL: From Traditional Databases to Big Data. 413–418. 10.1145/2839509.2844560.

[3] Soner Yıldırım, “Practical SQL: Designing and Creating a Relational Database”, Towards Data Science, Nov. 27, 2020. https://towardsdatascience.com/practical-sql-designing-and-creating-a-relational-database-de31c40b853f (accessed Feb. 12, 2024).

--

--

Cendikia Ishmatuka

Electrical Engineer UGM | Intersted in Data Analysis and Data Science