medgai ...: Data Partitioning in System Design

Data Partitioning Techniques: Making Databases Scale Better

As applications grow and data explodes, databases can become bottlenecks. Queries slow down, servers get overloaded, and scaling becomes a nightmare. That’s where data partitioning comes in - a smart way to split your data into manageable chunks so your system stays fast, efficient, and scalable.

Below are the most common partitioning techniques, each with simple examples and its benefits:

1. Horizontal Partitioning (Sharding)

Description: Splitting data across multiple tables or databases based on rows.

Example 1: You run a global app with millions of users. Instead of storing all user data in one giant table, you split it by region - Asia, Europe, and America. Each shard handles users from its region, reducing load and speeding up queries.

Example 2 (Layman): Think of a library with millions of books. Instead of keeping them all in one giant hall, you split them into different buildings by genre - fiction, science, history. Each building handles its own visitors, so no single hall gets overcrowded.

Sample query/technique:

```sql

-- Shard by region
CREATE TABLE users_asia (id INT, name TEXT);
CREATE TABLE users_europe (id INT, name TEXT);
-- Application logic decides where to insert:
INSERT INTO users_asia VALUES (1, 'Amit');
INSERT INTO users_europe VALUES (2, 'John');

```
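The SQL above leaves the routing to "application logic". Here is a minimal Python sketch of that logic. It assumes each region's shard is a separate database, simulated here with one in-memory SQLite database per region. The names `REGIONS`, `shard_for`, `insert_user` and `get_user` are hypothetical helpers for illustration, not part of any framework.

```python
import sqlite3

# Hypothetical setup: one in-memory SQLite database per region stands in
# for a separate shard server.
REGIONS = ("asia", "europe", "america")
shards = {region: sqlite3.connect(":memory:") for region in REGIONS}
for conn in shards.values():
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")


def shard_for(region: str) -> sqlite3.Connection:
    """Pick the shard that owns a region; unknown regions are rejected."""
    try:
        return shards[region.lower()]
    except KeyError:
        raise ValueError(f"no shard for region {region!r}") from None


def insert_user(region: str, user_id: int, name: str) -> None:
    conn = shard_for(region)
    with conn:  # the connection context manager commits on success
        conn.execute("INSERT INTO users VALUES (?, ?)", (user_id, name))


def get_user(region: str, user_id: int):
    # Only the owning shard is queried; the other shards never see this request.
    return shard_for(region).execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    ).fetchone()


insert_user("Asia", 1, "Amit")
insert_user("Europe", 2, "John")
print(get_user("asia", 1))    # (1, 'Amit')
print(get_user("europe", 1))  # None: user 1 lives on the Asia shard
```

The router needs the region to find a user, so in practice the region (or a shard ID) has to travel with every request or be looked up from a small directory.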

Benefits:

Distributes traffic across servers
Improves performance and scalability
Enables regional failover and isolation

Real‑world use case:

Facebook and Twitter shard user data across multiple servers to handle billions of profiles and posts. Each shard stores a subset of users, often based on user ID ranges, ensuring queries don’t overload a single database.
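Here is one way a shard lookup by user ID range could work. This is a minimal sketch, not how Facebook or Twitter actually implement it. The bounds and shard names are hypothetical, and each shard owns a contiguous, half-open range of IDs.

```python
import bisect

# Hypothetical shard map: UPPER_BOUNDS[i] is the exclusive upper bound of shard i.
UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]
SHARD_NAMES = ["shard-0", "shard-1", "shard-2"]


def shard_for_user(user_id: int) -> str:
    """Return the shard whose ID range contains user_id."""
    if user_id < 0:
        raise ValueError("user IDs are assumed to be non-negative")
    # bisect_right finds the first bound strictly greater than user_id.
    index = bisect.bisect_right(UPPER_BOUNDS, user_id)
    if index == len(UPPER_BOUNDS):
        raise ValueError(f"user {user_id} is beyond the last shard's range")
    return SHARD_NAMES[index]


print(shard_for_user(42))         # shard-0
print(shard_for_user(1_000_000))  # shard-1 (bounds are exclusive)
print(shard_for_user(2_999_999))  # shard-2
```

Range maps like this are easy to extend: you append a bound for a new shard. The catch is that new sign-ups with the highest IDs all land on the newest shard, which can turn it into a hotspot. That trade-off is one motivation for hash partitioning (section 5).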

Please see the Database Sharding topic for the different types and uses of DB sharding.

2. Vertical Partitioning

Description: Splitting data across multiple tables or databases based on columns.

Example 1: Your user table has profile info (name, email) and activity logs (last login, clicks). Profile data is accessed frequently, while logs are bulky and rarely needed. So you split them into two tables — one for profiles, one for logs.

Example 2 (Layman): Imagine a hospital record. Doctors need quick access to patient details (name, age, allergies), but lab technicians need detailed test results. Splitting the record into two files makes it faster for each group to get what they need.

```sql

-- Profile info
CREATE TABLE user_profile (id INT, name TEXT, email TEXT);
-- Activity logs
CREATE TABLE user_activity (id INT, last_login TIMESTAMP);
-- Join when needed
SELECT p.name, a.last_login
FROM user_profile p
JOIN user_activity a ON p.id = a.id;

```
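Here is a runnable sketch of the same split, using Python's built-in `sqlite3` module. The `clicks` column, the `save_user` helper and the sample values are hypothetical. The point is that one logical user record is written into two narrow tables that share the same `id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hot, frequently read columns live in one table...
conn.execute("CREATE TABLE user_profile (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
# ...bulky, rarely read columns live in another table keyed by the same id.
conn.execute(
    "CREATE TABLE user_activity (id INTEGER PRIMARY KEY, last_login TEXT, clicks INTEGER)"
)


def save_user(user: dict) -> None:
    """Split one logical user record into its hot and cold column groups."""
    with conn:  # both inserts commit together or not at all
        conn.execute(
            "INSERT INTO user_profile (id, name, email) VALUES (?, ?, ?)",
            (user["id"], user["name"], user["email"]),
        )
        conn.execute(
            "INSERT INTO user_activity (id, last_login, clicks) VALUES (?, ?, ?)",
            (user["id"], user["last_login"], user["clicks"]),
        )


save_user({"id": 1, "name": "Amit", "email": "amit@example.com",
           "last_login": "2024-05-01 10:00:00", "clicks": 42})

# The common path reads only the narrow profile table.
print(conn.execute("SELECT name, email FROM user_profile WHERE id = 1").fetchone())
# Activity data is joined back in only when it is actually needed.
print(conn.execute(
    "SELECT p.name, a.last_login FROM user_profile p "
    "JOIN user_activity a ON p.id = a.id WHERE p.id = 1"
).fetchone())
```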

Benefits:

Speeds up queries by isolating hot data
Reduces I/O and memory usage
Makes the schema easier to manage

Real‑world use case:

LinkedIn separates frequently accessed profile data (name, headline, connections) from less frequently accessed data like activity logs or analytics.

3. Range Partitioning

Description: Dividing data into partitions based on a range of values.

Example 1: Storing sales data for multiple years. Instead of one massive table, you create partitions for each year - 2021, 2022, 2023. When someone queries 2022 sales, the database skips other years entirely.

Example 2 (Layman): Think of a filing cabinet where folders are arranged by year. If you want 2022 invoices, you go straight to the 2022 folder instead of flipping through everything.

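```sql

-- MySQL syntax: RANGE COLUMNS compares DATE values directly
CREATE TABLE sales (
id INT, amount DECIMAL(10, 2), sale_date DATE
)
PARTITION BY RANGE COLUMNS (sale_date) (
PARTITION p2021 VALUES LESS THAN ('2022-01-01'),
PARTITION p2022 VALUES LESS THAN ('2023-01-01'),
PARTITION p2023 VALUES LESS THAN ('2024-01-01')
);
-- Rows dated 2024 or later are rejected until a new partition is added

```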
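Here is a sketch of what the database does with those partitions. Each row is routed by its `sale_date`, and a date-range query skips ("prunes") partitions that cannot contain matching rows. This is a hypothetical in-memory model that mirrors the example's yearly partitions (2021 to 2023), not a real storage engine.

```python
import bisect
from datetime import date
from decimal import Decimal

# Hypothetical in-memory model of RANGE partitioning on sale_date.
# BOUNDS play the role of the "VALUES LESS THAN" clauses (exclusive upper bounds);
# the first partition has no lower bound, as in SQL range partitioning.
BOUNDS = [date(2022, 1, 1), date(2023, 1, 1), date(2024, 1, 1)]
NAMES = ["p2021", "p2022", "p2023"]
partitions = {name: [] for name in NAMES}


def partition_for(sale_date: date) -> str:
    """Route a row to the first partition whose upper bound exceeds sale_date."""
    i = bisect.bisect_right(BOUNDS, sale_date)
    if i == len(BOUNDS):
        raise ValueError(f"no partition for {sale_date}")
    return NAMES[i]


def insert_sale(sale_id: int, amount: Decimal, sale_date: date) -> None:
    partitions[partition_for(sale_date)].append((sale_id, amount, sale_date))


def sales_between(start: date, end: date):
    """Return (rows with start <= sale_date < end, names of partitions actually scanned)."""
    rows, scanned = [], []
    lower = date.min
    for name, upper in zip(NAMES, BOUNDS):
        # Partition pruning: skip any partition whose [lower, upper) range misses [start, end).
        if lower < end and upper > start:
            scanned.append(name)
            rows.extend(r for r in partitions[name] if start <= r[2] < end)
        lower = upper
    return rows, scanned


insert_sale(1, Decimal("100.00"), date(2021, 6, 1))
insert_sale(2, Decimal("250.50"), date(2022, 3, 15))
insert_sale(3, Decimal("80.00"), date(2023, 11, 30))

rows, scanned = sales_between(date(2022, 1, 1), date(2023, 1, 1))
print(scanned)               # ['p2022']
print([r[0] for r in rows])  # [2]
```

Archiving works on the same principle: dropping all of 2021 means discarding one partition rather than deleting rows one by one.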

Benefits:

Optimizes range-based queries
Makes archiving and purging easier
Improves indexing and scan speed

Real‑world use case:

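Data warehouses commonly partition sales and transaction data by date ranges (e.g., monthly or yearly). Amazon Redshift gets a similar effect in two ways: date-based sort keys let it skip blocks outside the queried range, and Redshift Spectrum supports partitioned external tables.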

4. List Partitioning

Description: Partitioning data based on predefined lists of values.

Example 1: Splitting orders by status. Your orders table has a status column with the values pending, shipped, and delivered, and you create a separate partition for each status. When you need all "shipped" orders, the database goes straight to that partition.

Example 2 (Layman): Picture a warehouse with separate sections for "pending orders", "shipped orders", and "delivered orders". Workers go directly to the right section instead of searching everywhere.

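```sql

-- MySQL syntax: LIST COLUMNS accepts string columns
-- (a TEXT column cannot be used as a partitioning key)
CREATE TABLE orders (
id INT, status VARCHAR(20)
)
PARTITION BY LIST COLUMNS (status) (
PARTITION p_pending VALUES IN ('pending'),
PARTITION p_shipped VALUES IN ('shipped'),
PARTITION p_delivered VALUES IN ('delivered')
);

```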
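Here is a minimal model of list partitioning, assuming the three statuses from the example. Each partition owns an explicit set of values, and rows whose value appears in no list are rejected. MySQL behaves the same way for a LIST-partitioned table. The names `LIST_PARTITIONS` and `partition_for_status` are hypothetical.

```python
# Hypothetical model of LIST partitioning: each partition owns an explicit set of values.
LIST_PARTITIONS = {
    "p_pending": {"pending"},
    "p_shipped": {"shipped"},
    "p_delivered": {"delivered"},
}

# Invert the lists once so routing a row is a single dictionary lookup.
VALUE_TO_PARTITION = {
    value: name for name, values in LIST_PARTITIONS.items() for value in values
}


def partition_for_status(status: str) -> str:
    """Return the partition whose list contains this status value."""
    if status not in VALUE_TO_PARTITION:
        # Reject values that appear in no partition's list.
        raise ValueError(f"no partition for status {status!r}")
    return VALUE_TO_PARTITION[status]


# Route some sample orders (id, status) into their partitions.
orders = [(1, "pending"), (2, "shipped"), (3, "shipped"), (4, "delivered")]
partitions = {name: [] for name in LIST_PARTITIONS}
for order_id, status in orders:
    partitions[partition_for_status(status)].append(order_id)

# "All shipped orders" reads exactly one partition.
print(partitions["p_shipped"])  # [2, 3]
```

Because each partition holds a set, grouping related values is a small change. For example, a hypothetical "out_for_delivery" status could be added to `p_shipped`.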

Benefits:

Simplifies data access for categorical queries
Improves performance for status-based filtering
Makes reporting and analytics cleaner

Real‑world use case:

E‑commerce platforms like Flipkart or Amazon partition orders by status (pending, shipped, delivered) to simplify order management and reporting.

5. Hash Partitioning

Description: Distributing data across partitions using a hash function.

Example 1: You hash each user ID to assign the user to one of 10 partitions. Because the hash spreads IDs evenly, no single partition gets overloaded, even if many users come from the same region or have similar profiles.

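Example 2 (Layman): Imagine distributing students into classrooms by rolling dice. The dice (hash function) spread students evenly, so no single room is overcrowded. Unlike real dice, a hash function always gives the same student the same result, so you always know which room to find them in.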

```sql

CREATE TABLE users (
id INT, name TEXT
)
PARTITION BY HASH (id)
PARTITIONS 4; -- evenly spread across 4 partitions

```
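Here is a sketch of the "one of 10 partitions" idea from Example 1. For `PARTITION BY HASH (id)`, MySQL simply takes the integer expression modulo the number of partitions. This sketch instead runs the ID through a stable hash first, so patterned IDs (say, all multiples of 10) still spread out. MD5 is an arbitrary illustrative choice, and `partition_for_user` is a hypothetical helper.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 10  # matches the "one of 10 partitions" example


def partition_for_user(user_id: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a user ID to a partition with a stable hash followed by a modulus."""
    # MD5 is used only because it is deterministic and built in; Python's
    # hash() on strings is randomized per process, so it is unsuitable here.
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Assign 10,000 sequential user IDs and count how many land in each partition.
counts = Counter(partition_for_user(uid) for uid in range(1, 10_001))
print(sorted(counts.items()))  # (partition, row count) pairs
```

One trade-off: with a plain modulus, changing `NUM_PARTITIONS` sends most keys to a different partition. This is one reason systems like Cassandra place hashed keys on a token ring instead.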

Benefits:

Balances load automatically
Avoids hotspots
Great for unpredictable or uniform data

Real‑world use case:

MongoDB (with a hashed shard key) and Cassandra (whose default partitioner hashes each row's partition key) use hash partitioning to distribute documents or rows evenly across nodes. For example, user IDs are hashed to balance load across servers.

6. Composite Partitioning

Description: Combining two or more partitioning methods.

Example 1: You partition sales data by year (range) and, within each year, hash by region. This lets you run fast year-based reports while still balancing load across regions.

Example 2 (Layman): Think of a supermarket: first, items are grouped by category (fruits, dairy, snacks). Within fruits, they’re further divided by freshness date. This double-layer organization makes it easy to find what you want.

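```sql

-- MySQL syntax: range partitions by year, each split into 4 hash subpartitions.
-- KEY is MySQL's hash partitioning for non-integer columns such as region.
CREATE TABLE sales (
id INT, region VARCHAR(20), sale_date DATE
)
PARTITION BY RANGE (YEAR(sale_date))
SUBPARTITION BY KEY (region)
SUBPARTITIONS 4 (
PARTITION p2021 VALUES LESS THAN (2022),
PARTITION p2022 VALUES LESS THAN (2023)
);

```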
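Here is a sketch of the two-level routing behind composite partitioning. The year picks the range partition, then a stable hash of the region picks one of 4 subpartitions inside it. This hypothetical model assumes the 2021 and 2022 partitions from the example. MD5 and the helper names `locate` and `subpartitions_for_year` are illustrative choices, not a database API.

```python
import hashlib
from datetime import date

# Hypothetical model: RANGE partitions by year, each split into HASH subpartitions by region.
YEAR_PARTITIONS = {2021: "p2021", 2022: "p2022"}  # each year is its own range
SUBPARTITIONS = 4


def stable_hash(text: str) -> int:
    """Deterministic hash (MD5, for illustration); Python's hash() on str varies per process."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:8], "big")


def locate(sale_date: date, region: str) -> tuple:
    """Return (range partition, hash subpartition index) for one sales row."""
    partition = YEAR_PARTITIONS.get(sale_date.year)
    if partition is None:
        raise ValueError(f"no partition for year {sale_date.year}")
    return partition, stable_hash(region) % SUBPARTITIONS


def subpartitions_for_year(year: int) -> list:
    """A year-based report prunes to one range partition but reads all of its subpartitions."""
    partition = YEAR_PARTITIONS[year]
    return [(partition, i) for i in range(SUBPARTITIONS)]


print(locate(date(2022, 7, 4), "Europe"))  # p2022, plus a subpartition index in 0..3
print(subpartitions_for_year(2022))
```

A yearly report touches one range partition, while writes for different regions within that year are spread across its four subpartitions.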

Benefits:

Offers flexibility for complex data models
Optimizes both performance and scalability
Ideal for large enterprise systems

Real‑world use case:

Oracle Database supports composite partitioning, often used by large banks and telecom companies. For example, telecom call records are partitioned by date (range) and then hashed by customer ID within each date.

Conclusion

Choosing the right partitioning strategy depends on your data model and query patterns.

Horizontal and hash partitioning help with scalability.
Vertical and list partitioning simplify management.
Range and composite partitioning shine in analytical workloads.

By applying these techniques thoughtfully, you can design databases that scale gracefully and deliver faster, more reliable performance.


Monday, January 26, 2026
