Weather Project 03 - Database Schema Design

Our project involved a relational database, which is used to manage registered user information, record user sessions, maintain a city list, cache city weather, and store user favorites. PostgreSQL was selected as a common and reliable option.

DB schema design

Designing a database schema is not a challenging task for me because I have designed many production-level schemas and reviewed various database designs. To achieve our goal, the schema design followed these steps:

Define entities, functions, and properties
Draw an ER relationship diagram
Design the database schema

Step 1: Define entities, functions, and properties

City: Defines the supported cities. Fields include city ID, city name, country, etc.
User: Stores registered user information. Fields include user ID, password hash, avatar ID, email, favorite cities, user sessions, etc.
Weather: Stores weather data queried from third-party APIs. Fields include city ID, report date, temperature, pressure, etc.

Step2: Draw ER diagram

In this step, the main goal is to identify relationships between entities (1:1, 1:N, N:N) and to split entity concepts into appropriate tables.

Step3: Design DB schema

Here is the final version.

Some findings and thoughts

1. About foreign keys

Based on traditional practices, it is recommended to add foreign keys in tables to prevent key mismatch issues. However, I didn’t add any foreign keys because they were not used in my previous experience, as far as I can remember. So I asked myself why this was the case, and got the following answers from AI, which I think make sense.

Impact on Database Performance
Foreign key constraints require additional checks during insert, update, or delete operations. Under high concurrency, this can burden the database and reduce performance.
Example: Deleting a row with foreign key constraints may trigger checks on related tables, causing locks or long transactions.

Incompatibility with Distributed Architecture
In distributed systems (e.g., MySQL sharding or partitioned databases), data resides on different physical nodes, making cross-node foreign key constraints impractical.
Example: User and order data stored in separate databases cannot enforce foreign key relationships.

Application-Level Data Integrity
Modern applications often handle data integrity at the application layer (e.g., Java, Go, Node.js) instead of relying on database-enforced foreign keys.
Example: The application checks for related data before performing insert or update operations.

Flexibility for Migration and Scaling
Large-scale systems frequently require schema changes, such as data migration, sharding, or read/write separation. Foreign keys limit flexibility and increase maintenance costs.
Example: Foreign keys must often be removed before performing database migrations.

Avoiding Cascade Delete Risks
Foreign keys can use CASCADE DELETE, which may unintentionally delete large amounts of related data if misused.
Example: Internet applications typically use soft deletes (e.g., an is_deleted flag) to prevent accidental data loss.

2. About the UserSession table

In the initial design, I planned to store user session information in a separate table to work with the frontend. But when I wrote the Django code, I realized that Django has a built-in table called django_session. So manually creating another session table is unnecessary. Django uses the session_data field in the django_session table and automatically serializes and deserializes Python dictionaries internally through SessionStore.

3. About indexes

Except for the auto-incremented fields, I didn’t add any extra indexes. If this project needs to go live and serve millions of users, I would need to analyze indexes to improve query performance. However, the overall impact would also need to be evaluated.

4. About two cache tables

To be honest, using a relational database for caching is not an ideal solution, because database query performance is not the best for caching scenarios. If I could involve other components, I would use Redis (in-memory) as the first caching layer and use the database as a second cache layer, mainly for data persistence.

James Cen's blog