Monday, January 26, 2026

Data Partitioning in System Design

Data Partitioning Techniques: Making Databases Scale Better

As applications grow and data explodes, databases can become bottlenecks. Queries slow down, servers get overloaded, and scaling becomes a nightmare. That’s where data partitioning comes in - a smart way to split your data into manageable chunks so your system stays fast, efficient, and scalable.

Here are the most common partitioning techniques, with simple examples and their benefits:

1. Horizontal Partitioning (Sharding)


Description: Splitting data across multiple tables or databases based on rows.
Example 1: You run a global app with millions of users. Instead of storing all user data in one giant table, you split it by region - Asia, Europe, and America. Each shard handles users from its region, reducing load and speeding up queries.

Example 2 (Layman): Think of a library with millions of books. Instead of keeping them all in one giant hall, you split them into different buildings by genre - fiction, science, history. Each building handles its own visitors, so no single hall gets overcrowded.

Sample query/technique:

```sql

-- Shard by region

CREATE TABLE users_asia (id INT, name TEXT);

CREATE TABLE users_europe (id INT, name TEXT);

-- Application logic decides where to insert:

INSERT INTO users_asia VALUES (1, 'Amit');

INSERT INTO users_europe VALUES (2, 'John'); 

```
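The "application logic decides where to insert" step could look roughly like the sketch below. It is a minimal illustration, not a production router: the `RegionShardRouter` class, its method names, and the region codes are assumptions made for this example, while the table names mirror the SQL above.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical router: maps a user's region to the shard table that stores it.
final class RegionShardRouter {
    // Table names mirror the SQL above; the region codes are assumed for illustration.
    private static final Map<String, String> TABLE_BY_REGION = Map.of(
            "ASIA", "users_asia",
            "EUROPE", "users_europe");

    // Returns the shard table for a region, failing loudly for unmapped regions.
    static String tableFor(String region) {
        String table = TABLE_BY_REGION.get(region.toUpperCase(Locale.ROOT));
        if (table == null) {
            throw new IllegalArgumentException("No shard configured for region: " + region);
        }
        return table;
    }

    // Builds a parameterized INSERT for the chosen shard; values are bound separately.
    static String insertSql(String region) {
        return "INSERT INTO " + tableFor(region) + " (id, name) VALUES (?, ?)";
    }
}
```

Keeping the routing in one place means a new region only needs a new map entry (and a new table), rather than changes scattered across the codebase.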

Benefits:
  • Distributes traffic across servers
  • Improves performance and scalability
  • Enables regional failover and isolation
Real‑world use case:  

Facebook and Twitter shard user data across multiple servers to handle billions of profiles and posts. Each shard stores a subset of users, often based on user ID ranges, ensuring queries don’t overload a single database. 


Please see the Database Sharding post for the different types and uses of database sharding.

2. Vertical Partitioning


Description: Splitting data across multiple tables or databases based on columns.
Example 1:  Your user table has profile info (name, email) and activity logs (last login, clicks). Profile data is accessed frequently, while logs are bulky and rarely needed. So you split them into two tables — one for profiles, one for logs.

Example 2 (Layman): Imagine a hospital record. Doctors need quick access to patient details (name, age, allergies), but lab technicians need detailed test results. Splitting the record into two files makes it faster for each group to get what they need.

```sql

-- Profile info

CREATE TABLE user_profile (id INT, name TEXT, email TEXT);

-- Activity logs

CREATE TABLE user_activity (id INT, last_login TIMESTAMP);

-- Join when needed

SELECT p.name, a.last_login

FROM user_profile p

JOIN user_activity a ON p.id = a.id;

```

Benefits:
  • Speeds up queries by isolating hot data
  • Reduces I/O and memory usage
  • Makes schema easier to manage
Real‑world use case:  
LinkedIn separates frequently accessed profile data (name, headline, connections) from less frequently accessed data like activity logs or analytics. 


3. Range Partitioning

Description: Dividing data into partitions based on a range of values.
Example 1: Storing sales data for multiple years. Instead of one massive table, you create partitions for each year - 2021, 2022, 2023. When someone queries 2022 sales, the database skips other years entirely.

Example 2 (Layman): Think of a filing cabinet where folders are arranged by year. If you want 2022 invoices, you go straight to the 2022 folder instead of flipping through everything.

```sql

CREATE TABLE sales (

    id INT, amount DECIMAL, sale_date DATE

)

PARTITION BY RANGE (sale_date) (

    PARTITION p2021 VALUES LESS THAN ('2022-01-01'),

    PARTITION p2022 VALUES LESS THAN ('2023-01-01')

);

``` 
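To see why the database can skip other years, here is a small sketch of how a date might be mapped to a range partition. It assumes the same two partitions as the SQL above and treats each bound as exclusive, like `VALUES LESS THAN`. The `RangePartitionLookup` class is hypothetical; a real database does this internally.

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical lookup that mimics range-partition selection for a sale date.
final class RangePartitionLookup {
    // Exclusive upper bound -> partition name, matching VALUES LESS THAN in the SQL above.
    private static final TreeMap<LocalDate, String> PARTITIONS = new TreeMap<>();
    static {
        PARTITIONS.put(LocalDate.of(2022, 1, 1), "p2021");
        PARTITIONS.put(LocalDate.of(2023, 1, 1), "p2022");
    }

    static String partitionFor(LocalDate saleDate) {
        // higherEntry returns the smallest bound strictly greater than the date.
        Map.Entry<LocalDate, String> entry = PARTITIONS.higherEntry(saleDate);
        if (entry == null) {
            throw new IllegalArgumentException("No partition covers " + saleDate);
        }
        return entry.getValue();
    }
}
```

A query filtered on a 2022 date only needs to touch `p2022`, which is the pruning effect described in the example.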

Benefits:

  • Optimizes range-based queries
  • Makes archiving and purging easier
  • Improves indexing and scan speed
Real‑world use case:  
Amazon Redshift and other data warehouses partition sales and transaction data by date ranges (e.g., monthly or yearly).


4. List Partitioning

Description: Partitioning data based on predefined lists of values.
Example 1: Your orders table has three statuses: pending, shipped, and delivered. You create a separate partition for each status, so when you need all "shipped" orders, the database goes straight to that partition.

Example 2 (Layman): Picture a warehouse with separate sections for "pending orders", "shipped orders", and "delivered orders". Workers go directly to the right section instead of searching everywhere.

 ```sql

CREATE TABLE orders (

    id INT, status TEXT

)

PARTITION BY LIST (status) (

    PARTITION p_pending VALUES ('pending'),

    PARTITION p_shipped VALUES ('shipped'),

    PARTITION p_delivered VALUES ('delivered')

);

```

Benefits:
  • Simplifies data access for categorical queries
  • Improves performance for status-based filtering
  • Makes reporting and analytics cleaner
Real‑world use case:  
E‑commerce platforms like Flipkart or Amazon partition orders by status (pending, shipped, delivered) to simplify order management and reporting. 


5. Hash Partitioning

Description: Distributing data across partitions using a hash function.
Example 1: You hash each user ID to assign the user to one of 10 partitions, which spreads data evenly. This ensures no single partition gets overloaded, even if users are from the same region or have similar profiles.

Example 2 (Layman): Imagine distributing students into classrooms by rolling dice. The dice (hash function) ensures students are spread evenly, so no single room is overcrowded.

```sql
CREATE TABLE users (
    id INT, name TEXT
)
PARTITION BY HASH (id)
PARTITIONS 4; -- evenly spread across 4 partitions
```
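The idea behind hash partitioning can be sketched in a few lines of Java. This is only an illustration: each database uses its own internal hash function, and the `HashPartitioner` class and its method are names assumed for this example.

```java
// Hypothetical helper showing how a hash of the key picks a partition.
final class HashPartitioner {
    // Maps a user ID to a partition index in the range [0, partitionCount).
    static int partitionFor(long userId, int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(Long.hashCode(userId), partitionCount);
    }
}
```

The same ID always lands in the same partition, so lookups by ID know exactly where to go, while consecutive IDs are spread across all partitions.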
Benefits:
  • Balances load automatically
  • Avoids hotspots
  • Works well when access patterns are unpredictable and keys have no natural range or category
Real‑world use case:  
MongoDB and Cassandra use hash partitioning to distribute documents or rows evenly across nodes. For example, user IDs are hashed to balance load across servers.


6. Composite Partitioning

Description: Combining two or more partitioning methods.
Example 1: Partition sales data by year (range), and within each year, you hash by region. This lets you run fast year-based reports and still balance load across regions.

Example 2 (Layman): Think of a supermarket: first, items are grouped by category (fruits, dairy, snacks). Within fruits, they’re further divided by freshness date. This double-layer organization makes it easy to find what you want.

```sql

CREATE TABLE sales (

    id INT, region TEXT, sale_date DATE

)

PARTITION BY RANGE (sale_date)

SUBPARTITION BY HASH (region)

SUBPARTITIONS 4 (

    PARTITION p2021 VALUES LESS THAN ('2022-01-01'),

    PARTITION p2022 VALUES LESS THAN ('2023-01-01')

);

```
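As a rough mental model, composite partitioning applies two rules in sequence: first the range rule, then the hash rule inside the chosen range. The sketch below assumes yearly range partitions and four hash subpartitions. The `CompositePartitioner` class and the `pYYYY_spN` naming are made up for illustration and do not reflect any database's internal naming.

```java
import java.time.LocalDate;

// Hypothetical two-level partition selection: range by year, then hash by region.
final class CompositePartitioner {
    static String partitionFor(LocalDate saleDate, String region, int subpartitions) {
        if (subpartitions <= 0) {
            throw new IllegalArgumentException("subpartitions must be positive");
        }
        // Level 1 (range): one partition per calendar year.
        String rangePartition = "p" + saleDate.getYear();
        // Level 2 (hash): spread regions across subpartitions within that year.
        int sub = Math.floorMod(region.hashCode(), subpartitions);
        return rangePartition + "_sp" + sub;
    }
}
```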

Benefits:
  • Offers flexibility for complex data models
  • Optimizes both performance and scalability
  • Ideal for large enterprise systems
Real‑world use case:  
Oracle Database supports composite partitioning, often used by large banks and telecom companies. For example, telecom call records are partitioned by date (range) and then hashed by customer ID within each date.


Conclusion

Choosing the right partitioning strategy depends on your data model and query patterns.
  • Horizontal and hash partitioning help with scalability.
  • Vertical and list partitioning simplify management.
  • Range and composite partitioning shine in analytical workloads.
By applying these techniques thoughtfully, you can design databases that scale gracefully and deliver faster, more reliable performance.

Thursday, January 22, 2026

Design Patterns

In software engineering, design patterns are tried‑and‑tested solutions to recurring problems in system design. They provide a shared vocabulary for developers and help create flexible, maintainable, and scalable applications. Broadly, design patterns fall into three categories: Creational, Structural, and Behavioral.

In brief, design patterns are reusable solutions to common software problems.

Creational Patterns

Creational patterns deal with object creation mechanisms, making systems more independent of how their objects are created.

        1. Singleton 

    • Ensures a class has only one instance and provides a global point of access to it.
    • Example: Logger, configuration settings.
```
// Ensures only one instance of a class exists.
// Useful for shared resources like loggers or configuration.
class Singleton {
    private static Singleton instance;
    private Singleton() {}
    // synchronized so that concurrent first calls cannot create two instances
    public static synchronized Singleton getInstance() {
        if (instance == null) instance = new Singleton();
        return instance;
    }
}
```

        2. Factory Method  

    • Defines an interface for creating objects but lets subclasses decide which class to instantiate.
    • Example: Document creation in a word processor.
```
// Defines an interface for creating objects, letting subclasses decide which type.
// Useful when you want flexible object creation.
// Note: for brevity this sketch uses a single static factory method; in the classic
// Factory Method form, subclasses override the creation method.
interface Document { void open(); }
class WordDoc implements Document { public void open(){ System.out.println("Word opened"); } }
class PdfDoc implements Document { public void open(){ System.out.println("PDF opened"); } }

class DocumentFactory {
    static Document create(String type) {
        // Any type other than "word" falls back to PDF in this simplified example.
        return "word".equals(type) ? new WordDoc() : new PdfDoc();
    }
}
```

        3. Abstract Factory  

    • Provides an interface for creating families of related or dependent objects without specifying their concrete classes.
    • Example: GUI toolkit that can create buttons and checkboxes.
```
// Provides an interface for creating families of related objects.
// Example: GUI toolkit that can create platform-specific buttons.
interface Button { void render(); }
class WinButton implements Button { public void render(){ System.out.println("Windows Button"); } }
class MacButton implements Button { public void render(){ System.out.println("Mac Button"); } }

interface GUIFactory { Button createButton(); }
class WinFactory implements GUIFactory { public Button createButton(){ return new WinButton(); } }
class MacFactory implements GUIFactory { public Button createButton(){ return new MacButton(); } }
```

        4. Builder  

    • Separates the construction of a complex object from its representation, allowing the same process to create different representations.
    • Example: Building a house with different layouts.
 ```
// Separates construction of a complex object from its representation.
// Useful for creating objects step by step.
class House {
    String walls, roof;
    public String toString(){ return walls + " & " + roof; }
}
class HouseBuilder {
    House house = new House();
    HouseBuilder buildWalls(){ house.walls="Brick Walls"; return this; }
    HouseBuilder buildRoof(){ house.roof="Tile Roof"; return this; }
    House build(){ return house; }
}
```

        5. Prototype  

    • Creates new objects by copying an existing prototype instance.
    • Example: Cloning objects.
```
// Creates new objects by cloning an existing prototype.
// Useful when object creation is expensive.
class Prototype implements Cloneable {
    String field;
    Prototype(String f){ field=f; }
    @Override
    public Prototype clone() {
        try { return (Prototype) super.clone(); }
        catch (CloneNotSupportedException e) { throw new AssertionError(e); } // unreachable: class is Cloneable
    }
}
```

Structural Patterns

Structural patterns focus on how classes and objects are composed to form larger structures.

        1. Adapter
    • Allows incompatible interfaces to work together by wrapping an existing class with a new interface.
    • Example: Connecting a legacy system to a modern application.
```
// Allows incompatible interfaces to work together.
// Wraps an old system with a new interface.
class OldSystem { void oldRequest(){ System.out.println("Old system"); } }
interface NewSystem { void request(); }
class Adapter implements NewSystem {
    OldSystem old = new OldSystem();
    public void request(){ old.oldRequest(); }
}
```
        2. Decorator  
    • Dynamically adds responsibilities to an object without modifying its structure.
    • Example: Adding scrollbars to a window.
```
// Dynamically adds responsibilities to an object.
// Example: adding scrollbars to a window.
interface Window { void draw(); }
class SimpleWindow implements Window { public void draw(){ System.out.println("Window"); } }
class ScrollDecorator implements Window {
    Window w; ScrollDecorator(Window w){ this.w=w; }
    public void draw(){ w.draw(); System.out.println(" + Scrollbars"); }
}
```
        3. Facade  
    • Provides a simplified interface to a complex subsystem.
    • Example: An API that simplifies the usage of a complex library.
```
// Provides a simplified interface to a complex subsystem.
// Example: hiding multiple steps behind one simple method.
class ComplexLib { void step1(){} void step2(){} }
class Facade {
    ComplexLib lib = new ComplexLib();
    void simple(){ lib.step1(); lib.step2(); System.out.println("Simplified"); }
}
```
        4. Proxy  
    • Provides a surrogate or placeholder for another object to control access to it.
    • Example: Lazy initialization, access control.
```
// Provides a placeholder to control access to another object.
// Useful for lazy loading or access control.
interface Service { void run(); }
class RealService implements Service { public void run(){ System.out.println("Running"); } }
class ProxyService implements Service {
    RealService real = new RealService();
    public void run(){ System.out.println("Proxy check"); real.run(); }
}
```

Behavioral Patterns

Behavioral patterns are concerned with algorithms and the assignment of responsibilities between objects.

        1. Observer  
    • Defines a one‑to‑many dependency so that when one object changes state, all dependents are notified automatically.
    • Example: Event handling systems.

```
// Defines a one-to-many dependency so observers are notified of changes.
// Example: event handling systems.
import java.util.*;
interface Observer { void update(String msg); }
class Subject {
    List<Observer> obs = new ArrayList<>();
    void add(Observer o){ obs.add(o); }
    void notifyAllObs(){ for(Observer o:obs) o.update("Event happened"); }
}
```
        2. Strategy  
    • Defines a family of algorithms, encapsulates each one, and makes them interchangeable.
    • Example: Swappable sorting algorithms.
```
// Defines a family of algorithms and makes them interchangeable.
// Example: switching between sorting strategies.
interface Strategy { void execute(); }
class QuickSort implements Strategy { public void execute(){ System.out.println("QuickSort"); } }
class MergeSort implements Strategy { public void execute(){ System.out.println("MergeSort"); } }
class Context {
    Strategy s; Context(Strategy s){ this.s=s; }
    void run(){ s.execute(); }
}
```

        3. Command  
    • Encapsulates a request as an object, allowing for parameterization, queuing, and logging.
    • Example: Undo functionality in applications.
```
// Encapsulates a request as an object.
// Useful for undo/redo functionality.
interface Command { void execute(); }
class PrintCommand implements Command { public void execute(){ System.out.println("Print"); } }
class Invoker { Command c; Invoker(Command c){ this.c=c; } void run(){ c.execute(); } }
```
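Since undo is the classic use case for Command, here is one way it could be sketched. The `UndoableCommand`, `AppendCommand`, and `CommandHistory` names are assumptions for this example. The idea is that each command knows how to reverse itself, and a history stack remembers the order of execution.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A command that can also reverse its own effect.
interface UndoableCommand {
    void execute();
    void undo();
}

// Appends text to a buffer; undo removes exactly what was appended.
final class AppendCommand implements UndoableCommand {
    private final StringBuilder target;
    private final String text;

    AppendCommand(StringBuilder target, String text) {
        this.target = target;
        this.text = text;
    }

    public void execute() { target.append(text); }

    public void undo() { target.setLength(target.length() - text.length()); }
}

// Invoker that records executed commands so the latest one can be undone first.
final class CommandHistory {
    private final Deque<UndoableCommand> done = new ArrayDeque<>();

    void run(UndoableCommand command) {
        command.execute();
        done.push(command);
    }

    // Returns false when there is nothing left to undo.
    boolean undoLast() {
        if (done.isEmpty()) return false;
        done.pop().undo();
        return true;
    }
}
```

Because the invoker only talks to the `UndoableCommand` interface, new kinds of undoable actions can be added without changing `CommandHistory`.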
        4. Iterator  
    • Provides a way to access elements of a collection sequentially without exposing its internal representation.
    • Example: Iterating over a list of objects.
```
// Provides a way to access elements sequentially without exposing representation.
// Example: iterating over a collection.
import java.util.*;
class Demo {
    public static void main(String[] args){
        List<String> list = Arrays.asList("A","B","C");
        for(String s : list) System.out.println(s);
    }
}
```
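The for-each loop above relies on the iterator that `List` already provides. To show the pattern itself, the following sketch implements `Iterable` for a custom type. The `Countdown` class is a made-up example, and the caller never sees how the values are produced.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical collection-like type that yields start, start-1, ..., 1.
final class Countdown implements Iterable<Integer> {
    private final int start;

    Countdown(int start) { this.start = start; }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int remaining = start; // internal state hidden from callers

            @Override
            public boolean hasNext() { return remaining > 0; }

            @Override
            public Integer next() {
                if (!hasNext()) throw new NoSuchElementException();
                return remaining--;
            }
        };
    }
}
```

Usage is the same as with any built-in collection: `for (int n : new Countdown(3)) { ... }`.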
        5. State  
    • Allows an object to alter its behavior when its internal state changes, appearing to change its class.
    • Example: State machines in game development.
```
// Allows an object to change behavior when its internal state changes.
// Example: state machines in games.
interface State { void handle(); }
class Happy implements State { public void handle(){ System.out.println("Happy"); } }
class Sad implements State { public void handle(){ System.out.println("Sad"); } }
class Context {
    State state;
    void set(State s){ state=s; }
    void request(){ state.handle(); }
}
```

Conclusion

Design patterns are not silver bullets, but they provide proven blueprints for solving common design challenges. By mastering these patterns, developers can write cleaner, more maintainable code and communicate solutions more effectively with their peers.

Tuesday, January 20, 2026

Load Balancers in System Design

When building scalable and reliable systems, one of the most critical components is the load balancer. Its job is simple yet powerful: distribute incoming network traffic across multiple servers so that no single server becomes overwhelmed. This ensures smooth performance, high availability, and a better user experience.

🔑 Key Functions of a Load Balancer

  • Traffic Distribution: Spreads requests evenly across servers.

  • Redundancy & Reliability: Keeps services available even if one server fails.

  • Scalability: Makes it easy to add or remove servers as demand changes.

  • Health Monitoring: Continuously checks server status to avoid routing traffic to unhealthy nodes.

  • Session Persistence: Ensures a user’s session stays on the same server when needed.

  • SSL Termination: Offloads encryption/decryption tasks from backend servers.


🛠️ Types of Load Balancers

  • Hardware Load Balancers 
    Physical devices dedicated to balancing traffic. Often used in enterprise setups.

  •  Software Load Balancers  
    Applications running on standard hardware. Flexible and cost‑effective.

  • Cloud‑Based Load Balancers
    Services offered by cloud providers (AWS ELB, Azure Load Balancer, GCP Load Balancing) that scale automatically and integrate seamlessly with cloud infrastructure.


⚙️ Common Load Balancing Algorithms

  • Round Robin: Sequentially distributes requests across servers.

  • Least Connections: Routes traffic to the server with the fewest active connections.

  • IP Hash: Uses the client’s IP address to determine which server handles the request.

  • Weighted Round Robin: Assigns more requests to servers with higher capacity.
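Three of these algorithms are small enough to sketch directly. The classes below are illustrative only: servers are plain strings, the connection counts are passed in by the caller, and all class and method names are assumptions for this example rather than any real load balancer's API.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Round Robin: hand out servers in order, wrapping around at the end.
final class RoundRobinBalancer {
    private final List<String> servers;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinBalancer(List<String> servers) {
        if (servers.isEmpty()) throw new IllegalArgumentException("No servers");
        this.servers = List.copyOf(servers);
    }

    String next() {
        // floorMod keeps the index valid even after the counter overflows.
        return servers.get(Math.floorMod(counter.getAndIncrement(), servers.size()));
    }
}

// Least Connections: pick the server with the fewest active connections.
final class LeastConnectionsBalancer {
    static String pick(Map<String, Integer> activeConnections) {
        return activeConnections.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow(() -> new IllegalStateException("No servers"));
    }
}

// IP Hash: the same client IP always maps to the same server.
final class IpHashBalancer {
    static String pick(String clientIp, List<String> servers) {
        return servers.get(Math.floorMod(clientIp.hashCode(), servers.size()));
    }
}
```

Note that IP Hash reshuffles many clients when the server list changes, which is one reason real systems often use consistent hashing instead.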


🌐 How Requests Flow Through a Load Balancer

  • Client Request: A user sends a request to the application’s public IP or domain.

  • DNS Resolution: The domain name system translates the domain into the load balancer’s IP.

  • Traffic Reception: The load balancer receives the request.

  • Health Check: It determines which backend servers are healthy and available, typically from health checks that run continuously in the background.

  • Algorithm Selection: Based on the chosen algorithm (e.g., round robin), it decides where to send the request.

  • Request Forwarding: The request is passed to the selected backend server.

  • Server Processing: The server handles the request and generates a response.

  • Response Return: The server sends the response back to the load balancer.

  • Client Response: Finally, the load balancer forwards the response to the client.
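The health check and selection steps in this flow can be combined in a small sketch: skip servers currently marked unhealthy and pick the next healthy one in round-robin order. The `HealthAwareBalancer` class and its methods are hypothetical. In a real load balancer, the health marks would be updated by background probes rather than by hand.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical balancer that routes only to servers not marked unhealthy.
final class HealthAwareBalancer {
    private final List<String> servers;
    private final Set<String> unhealthy = ConcurrentHashMap.newKeySet();
    private int cursor = 0;

    HealthAwareBalancer(List<String> servers) {
        this.servers = List.copyOf(servers);
    }

    // In practice these would be called by a periodic health probe.
    void markUnhealthy(String server) { unhealthy.add(server); }
    void markHealthy(String server) { unhealthy.remove(server); }

    // Returns the next healthy server in round-robin order, or empty if none is healthy.
    synchronized Optional<String> route() {
        for (int attempts = 0; attempts < servers.size(); attempts++) {
            String candidate = servers.get(cursor);
            cursor = (cursor + 1) % servers.size();
            if (!unhealthy.contains(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

When `route()` returns empty, a real load balancer would typically answer the client with an error such as HTTP 503 instead of forwarding the request.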

Figure: How a load balancer distributes client requests across backend servers for reliability and scalability. Client requests are routed through DNS to the load balancer, which forwards traffic to healthy backend servers using algorithms like round robin or least connections, then returns responses to the client.

NOTE: When sending the response back, the load balancer does not use DNS, since the client's IP address is already known.

🚀 Why Load Balancers Matter

Without load balancers, systems risk downtime, bottlenecks, and poor user experience. By intelligently distributing traffic, they provide the foundation for scalable, resilient, and high‑performance applications.

✅ Pros of Using Load Balancers

  • Improved Reliability: If one server fails, traffic is rerouted to healthy servers.

  • Scalability: Easily add or remove servers to handle changing traffic loads.

  • Optimized Performance: Distributes requests to avoid bottlenecks and reduce latency.

  • Security Features: SSL termination and protection against DDoS attacks.

  • Session Persistence: Maintains user sessions across requests when needed.

⚠️ Cons of Using Load Balancers

  • Added Complexity: Requires configuration, monitoring, and maintenance.

  • Cost: Hardware load balancers and cloud services can be expensive.

  • Single Point of Failure: If deployed without redundancy, the load balancer itself can become a single point of failure or a bottleneck.

  • Latency Overhead: Adds a small delay due to routing and health checks.


🧠 Conclusion

Load balancers are a cornerstone of modern system design. They ensure that applications remain resilient, scalable, and performant under varying loads. Whether you're deploying a small web app or architecting a global-scale platform, understanding how load balancers work — and choosing the right type and algorithm — is essential for building robust infrastructure.

Database Sharding

Database Sharding: Scaling Your Data the Smart Way

As applications grow, databases often struggle to handle massive amounts of data efficiently. Database sharding is a powerful architecture pattern that solves this problem by splitting large datasets into smaller, more manageable chunks called shards. These shards are distributed across multiple machines or database nodes, improving scalability and performance.

Why Do We Need Sharding?

  • Handles big data workloads
  • Improves query performance
  • Enables horizontal scaling
  • Provides fault isolation

Types of Database Sharding

        1. Key-Based Sharding

               Data is distributed using a hash function.  

               Example: `application_id % 3` → three shards.

        2. Range-Based Sharding

            Data is split by ranges of a column.  

            Example: Names A–P → shard 1, Q–Z → shard 2.

        3. Vertical Sharding

            Data is divided by feature or column groups.  

            Example: On Twitter, user profiles, followers, and tweets are stored in separate shards.

        4. Directory-Based Sharding

            A lookup table maps records to shards.  

            Example: A directory table stores shard IDs for flexible routing.
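The key-based, range-based, and directory-based strategies can each be reduced to a small routing function. The sketch below follows the examples given above (`application_id % 3`, names A–P and Q–Z, and a lookup table). The `ShardResolvers` and `ShardDirectory` classes are names assumed for illustration, and the in-memory map stands in for a real directory table.

```java
import java.util.HashMap;
import java.util.Map;

final class ShardResolvers {
    // Key-based: application_id % 3 gives a shard index of 0, 1, or 2.
    static int keyBased(long applicationId) {
        return (int) Math.floorMod(applicationId, 3L);
    }

    // Range-based: names starting with A–P go to shard 1, Q–Z to shard 2.
    static int rangeBased(String name) {
        char first = Character.toUpperCase(name.charAt(0));
        if (first >= 'A' && first <= 'P') return 1;
        if (first >= 'Q' && first <= 'Z') return 2;
        throw new IllegalArgumentException("No shard for name: " + name);
    }
}

// Directory-based: an explicit lookup table decides where each key lives.
final class ShardDirectory {
    private final Map<String, Integer> shardByKey = new HashMap<>();

    // Assigning again moves the key, which is what makes this strategy flexible.
    void assign(String key, int shardId) { shardByKey.put(key, shardId); }

    int lookup(String key) {
        Integer shard = shardByKey.get(key);
        if (shard == null) throw new IllegalStateException("Unknown key: " + key);
        return shard;
    }
}
```

The trade-off shows up directly in the code: the first two strategies need no extra storage but are hard to rebalance, while the directory can move any key at the cost of an extra lookup.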


Advantages

- ✅ Scalability for large datasets  

- ✅ Faster queries due to smaller shard sizes  

- ✅ Fault isolation across shards  


Challenges

- ❌ Complex shard management  

- ❌ Rebalancing data when shards fill up  

- ❌ Cross-shard queries can be slow  


Conclusion

Database sharding isn’t a one-size-fits-all solution, but for applications handling billions of records, it’s often the key to scaling efficiently. By choosing the right sharding strategy, you can build a scalable, distributed database system that grows with your application.

Friday, January 16, 2026

Things to know for System Design

What is System Design?

System design is the process of defining the architecture, components, and
interfaces of a system to meet specific requirements. Here's a brief breakdown:
  1. Identify Requirements: Determine what the system needs to do.
  2. High-Level Design: Outline the system's major components and their interactions.
  3. Detailed Design: Specify the internal workings of each component.
  4. Implementation: Write the code and integrate the components.
  5. Testing: Verify that the system meets all requirements and functions correctly.
  6. Maintenance: Update and improve the system over time.
Functional vs Non-Functional Requirements

Functional: The core features and behaviors the system must offer.
Non-Functional: Quality constraints on the application, such as portability, maintainability, reliability, and security.

What are the components of System Design?
  1. Architecture: The overall structure of the system, including how components interact and the flow of data.
  2. Components: The individual parts of the system, such as modules, services, or microservices.
  3. Interfaces: The points of interaction between different components or with external systems.
  4. Data Storage: How and where data is stored, including databases, data warehouses, and data lakes.
  5. APIs (Application Programming Interfaces): Interfaces that allow different components or systems to communicate with each other.
  6. Security: Measures to protect the system and data from unauthorized access and vulnerabilities.
  7. Scalability: The system’s ability to handle increased load and expand as needed.
  8. Performance: Ensuring the system operates efficiently and meets performance requirements.
  9. Fault Tolerance and Reliability: The system’s ability to continue functioning correctly even when parts fail.
System Design Life Cycle | SDLC (Design) 

This is a practical way to approach any design problem:
  1. Requirements
  2. Estimation and Constraints
  3. HLD
  4. LLD
  5. Data Model Design
  6. API Design
  7. Identify and Resolve Bottlenecks
Structured Analysis and Structured Design

Structured Analysis is a technique used to understand and define the
requirements of a system. It involves creating models that represent the system's
functions, data, and control flow. The primary goal is to convert system
requirements into a blueprint for development.
[Data Flow Diagrams (DFDs), Entity-Relationship Diagrams (ERDs), State-Transition Diagrams, Process Specifications]

Structured Design follows Structured Analysis and focuses on creating a
blueprint for constructing the system. It uses the results of the analysis phase to
design the system architecture and components.
[Modularity, Top-Down Design, Structure Charts, Coupling and Cohesion]

Please see other posts for system design key concepts.

Saturday, January 8, 2022

Event Handling in Spring

By default, Spring publishes events synchronously on the publishing thread: when an event is published, the publisher is blocked until all listeners have finished handling it, and only then does the flow continue. [Synchronous]

Four things necessary for an event:

  • Source: publishes the event in Spring through an ApplicationEventPublisher object
  • Event: must extend ApplicationEvent (from Spring Framework 4.2 onward, any object can be published as an event)
  • Listener: a bean registered in the IoC container
  • Handler: the handler method must be annotated with @EventListener


Creating, publishing and handling custom events in Spring

Create an event class, `CustomEvent`, by extending `ApplicationEvent`. This class must define a constructor that passes the event source to the `ApplicationEvent` constructor.

import org.springframework.context.ApplicationEvent;
public class CustomEvent extends ApplicationEvent {
    public CustomEvent(Object source) {
        super(source);
    }
    public String toString() {
        return "My Custom Event";
    }
}

Once your event class is defined, you can publish it from any class, say `CustomEventPublisher`, which implements `ApplicationEventPublisherAware`. You also need to declare this class as a bean in the XML configuration file so that the container can identify it as an event publisher through the ApplicationEventPublisherAware interface.

import org.springframework.context.ApplicationEventPublisher;
import org.springframework.context.ApplicationEventPublisherAware;
public class CustomEventPublisher implements ApplicationEventPublisherAware {
    private ApplicationEventPublisher publisher;
    public void setApplicationEventPublisher (ApplicationEventPublisher publisher) {
        this.publisher = publisher;
    }
    public void publish() {
        CustomEvent ce = new CustomEvent(this);
        publisher.publishEvent(ce);
    }
}

A published event can be handled in a class, say `CustomEventHandler`, which implements the `ApplicationListener` interface and its `onApplicationEvent` method for the custom event.

import org.springframework.context.ApplicationListener;
public class CustomEventHandler implements ApplicationListener<CustomEvent> {
    public void onApplicationEvent(CustomEvent event) {
        System.out.println(event.toString());
    }
}

A main file/controller file to trigger events:

import org.springframework.context.ConfigurableApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;
public class MainApp {
    public static void main(String[] args) {
        ConfigurableApplicationContext context = new ClassPathXmlApplicationContext("Beans.xml");
        CustomEventPublisher cvp = (CustomEventPublisher) context.getBean("customEventPublisher");
        cvp.publish();
        context.close(); // release the container's resources
    }
}

Beans.xml for DI:

<?xml version = "1.0" encoding = "UTF-8"?>
<beans xmlns = "http://www.springframework.org/schema/beans"
             xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation = "http://www.springframework.org/schema/beans
             http://www.springframework.org/schema/beans/spring-beans-3.0.xsd">
    <bean id = "customEventHandler" class = "com.CustomEventHandler"/>
    <bean id = "customEventPublisher" class = "com.CustomEventPublisher"/>
</beans>

In Spring Boot, the same architecture is followed under the hood, driven by annotations.

Wednesday, October 13, 2021

JMS: Deep dive in MQ systems and Roadmap to Kafka

JMS: Java Message Service


JMS is an example of a messaging system based on the asynchronous design pattern, which microservices use to interact with each other by sending data and events. For an introduction, please refer here.
JMS allows applications to create, send, receive, and read messages in a shared environment.

JMS: Programming Model


The basic components needed to develop a client-based producer and consumer system are:
  • ConnectionFactory: Use the Java Naming and Directory Interface (JNDI) to find a ConnectionFactory object, or instantiate a ConnectionFactory object directly and set its attributes.
    Based on delivery model, client has separate instance for connection factory to create a connection to a provider:
    • Point to point :  QueueConnectionFactory
    • Publish/subscribe: TopicConnectionFactory
         The following snippet of code demonstrates how to use JNDI to find a connection factory object:

Context ctx = new InitialContext();
ConnectionFactory cf1 = (ConnectionFactory) ctx.lookup("jms/QueueConnectionFactory");
ConnectionFactory cf2 = (ConnectionFactory) ctx.lookup("jms/TopicConnectionFactory");
        
        Alternatively, you can directly instantiate a connection factory as follows:

ConnectionFactory connFactory = new com.sun.messaging.ConnectionFactory();
QueueConnectionFactory queueConnFactory = new com.sun.messaging.QueueConnectionFactory();
TopicConnectionFactory topicConnFactory = new com.sun.messaging.TopicConnectionFactory();

  • Connection: Use the ConnectionFactory object to create a Connection object. This can be done as follows:
Connection connection = connFactory.createConnection();

        Note that you must close all connections you have created using the Connection.close() method.

  • Session: Use the Connection object to create one or more Session objects, which provide transactional context with which to group a set of sends and receives into an atomic unit of work. A session can be created as follows:
Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

        The createSession() method takes two arguments:
    • the first (false in this case) means that the session is not transacted,
    • the second means that the session will automatically acknowledge messages when they have been received successfully.

  • Destination: A destination object is used by the client to specify the source of messages it consumes and the target of messages it produces.
    • In point-to-point messaging, destinations are known as queues.
    • In publish/subscribe messaging, destinations are known as topics.

    Use JNDI to find Destination object(s), or instantiate one directly and configure it by setting its attributes. The following snippet of code demonstrates how to perform a JNDI lookup of a queue named jms/SomeQueue:

    Destination dest = (Queue) ctx.lookup("jms/SomeQueue");

    Or, you can directly instantiate and configure a destination object:

    Queue q = new com.sun.messaging.Queue("world");

  • MessageProducer: Use a Session and a Destination object to create the MessageProducer objects you need, which are used for sending messages to a destination.
    Note that you can create a MessageProducer object without specifying a Destination object, but in that case a Destination object must be specified for each message produced.

    MessageProducer producer = session.createProducer(dest); // dest is a Queue or a Topic

    Once a producer has been created, it can be used to send messages as follows: producer.send(message);
  • MessageConsumer: JMS messages can be consumed in two ways:
    • Synchronously: A client (receiver or subscriber) explicitly fetches a message from the destination using the receive() method.
      • Use a Session object and a Destination object to create any needed MessageConsumer objects that are used for receiving messages.
      • Once the consumer has been created, it can be used to receive messages. Message delivery, however, doesn't begin until you start the connection created earlier by calling its start() method:

        MessageConsumer consumer = session.createConsumer(dest); // dest is a Queue or a Topic
        connection.start();
        Message msg = consumer.receive(1000); // waits up to 1000 ms; returns null if no message arrives in time

    • Asynchronously: A client can register a message listener (similar to an event listener) with a consumer.
      • Instantiate a MessageListener object and register it with a MessageConsumer object.
      • A MessageListener object acts as an asynchronous event handler for messages. The MessageListener interface contains one method, onMessage(), which you implement to receive and process the messages.

        MessageListener listener = new MyListener();
        consumer.setMessageListener(listener);

      • In order to avoid missing messages, the start() method should be called on the connection after the listener has been registered. When message delivery begins, the JMS provider automatically invokes the message listener's onMessage() whenever a message is delivered.
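
The two consumption styles above can be illustrated without a broker. The following is only a rough sketch using the Java standard library: ToyQueue, receiveAsync, and ConsumeModesSketch are hypothetical names made up for this illustration and are not part of the JMS API. The blocking receive with a timeout plays the role of MessageConsumer.receive(), and the background dispatcher thread plays the role of the provider calling onMessage().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class ConsumeModesSketch {

    // Hypothetical in-memory stand-in for a queue destination; not the javax.jms API.
    static final class ToyQueue {
        private final BlockingQueue<String> messages = new LinkedBlockingQueue<>();

        void send(String text) {
            messages.add(text);
        }

        // Synchronous style: block until a message arrives or the timeout expires (then null).
        String receive(long timeoutMillis) {
            try {
                return messages.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }

        // Asynchronous style: a background thread hands each message to the listener.
        Thread setListener(Consumer<String> onMessage) {
            Thread dispatcher = new Thread(() -> {
                try {
                    while (true) {
                        onMessage.accept(messages.take());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // stop dispatching
                }
            });
            dispatcher.setDaemon(true);
            dispatcher.start();
            return dispatcher;
        }
    }

    // Registers a listener, waits until it has seen `expected` messages, then stops it.
    static List<String> receiveAsync(ToyQueue queue, int expected) {
        List<String> received = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch done = new CountDownLatch(expected);
        Thread dispatcher = queue.setListener(text -> {
            received.add(text);
            done.countDown();
        });
        try {
            done.await(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        dispatcher.interrupt();
        return new ArrayList<>(received);
    }

    public static void main(String[] args) {
        ToyQueue queue = new ToyQueue();
        queue.send("first");
        System.out.println("sync:  " + queue.receive(1000));

        queue.send("second");
        queue.send("third");
        System.out.println("async: " + receiveAsync(queue, 2));
    }
}
```

In the synchronous case the caller decides when to fetch and how long to wait. In the asynchronous case the caller only supplies a callback and the dispatcher decides when it runs, which is why a real listener should be registered before delivery is started.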

Implementation

  • Download and install Java System Message Queue. Once installed, the \bin directory contains a utility (imqsvcadmin) to install and uninstall the broker as a Windows service. It also contains the broker executable (imqbrokerd).
  • To test installation:
    • Run the broker. Go to the bin directory of your installation and run the following command: > imqbrokerd -tty
      The -tty option causes all logged messages to be displayed to the console in addition to the log file.
    • The broker should start and display a few messages before displaying: imqbroker@hostname:7676 ready
  • Test the broker by running the following command in a separate window:
    > imqcmd query bkr -u admin -p admin
    It should display all the details including host, messages count and cluster information.
Sample code (point-to-point):

import javax.jms.*;

public class TestJms {
   public static void main(String argv[]) throws Exception {
      // The producer and consumer need to get a connection factory and use it to set up a connection and a session
      QueueConnectionFactory connFactory = new com.sun.messaging.QueueConnectionFactory();
      QueueConnection conn = connFactory.createQueueConnection();
      // The session is not transacted, and it uses automatic message acknowledgement
      QueueSession session = conn.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
      Queue q = new com.sun.messaging.Queue("world"); // queue name; destination names should not contain spaces
     
      QueueSender sender = session.createSender(q); // Sender
      TextMessage msg = session.createTextMessage(); // Text message
      msg.setText("Hello nik !");
      System.out.println("Sending the message: "+msg.getText());
      sender.send(msg);
      
      QueueReceiver receiver = session.createReceiver(q); // Receiver
      conn.start();
      Message m = receiver.receive();
      if(m instanceof TextMessage) {
         TextMessage txt = (TextMessage) m;
         System.out.println("Message Received: "+txt.getText());
      }
      session.close();
      conn.close();
   }
}

  • Compile TestJms.java:

    > javac -classpath /lib/jms.jar{;|:}/lib/imq.jar TestJms.java

    Note: The choice of path separator character, {;|:}, is platform dependent:
    • ':' on UNIX/Linux, and
    • ';' on Windows
  • Run: Assuming that imqbrokerd is still running, run TestJms (the current directory is added to the classpath so that TestJms.class can be found):

    > java -cp .{;|:}/lib/jms.jar{;|:}/lib/imq.jar TestJms

  • Sample Output:
    Sending the message: Hello nik !
    Message Received: Hello nik !

Reliable Messaging

JMS defines two delivery modes:
  • Persistent messages: Guaranteed to be successfully consumed once and only once. Messages are not lost.
  • Non-persistent messages: Guaranteed to be delivered at most once. Occasional message loss is considered acceptable.
This, however, is all about performance trade-offs. The more reliable the delivery of messages, the more bandwidth and overhead required to achieve that reliability. Performance can be maximized by producing non-persistent messages, or you can maximize the reliability by producing persistent messages.
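
To make the trade-off concrete, here is a minimal sketch of what "persistent" means from the broker's side. It is a toy model built on the Java standard library, not a JMS provider: ToyBroker, DeliveryModeSketch, and the journal file are hypothetical names invented for this illustration. In real JMS code the mode is chosen on the producer, for example with MessageProducer.setDeliveryMode(DeliveryMode.PERSISTENT) or DeliveryMode.NON_PERSISTENT.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

class DeliveryModeSketch {

    // Hypothetical toy broker: persistent messages go to a journal file, others stay in memory only.
    static final class ToyBroker {
        private final Path journal;
        private final List<String> pending = new ArrayList<>();

        // On start-up, reload whatever an earlier broker instance wrote to the journal.
        ToyBroker(Path journal) {
            this.journal = journal;
            try {
                if (Files.exists(journal)) {
                    pending.addAll(Files.readAllLines(journal, StandardCharsets.UTF_8));
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        void send(String text, boolean persistent) {
            if (persistent) {
                // Extra I/O on every send is the price of surviving a restart.
                // (A real provider would also force the write to stable storage.)
                try {
                    Files.write(journal, (text + "\n").getBytes(StandardCharsets.UTF_8),
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
            pending.add(text);
        }

        List<String> pending() {
            return new ArrayList<>(pending);
        }
    }

    // Sends one message of each kind, simulates a broker restart, and returns what is still pending.
    static List<String> survivorsAfterRestart() {
        try {
            Path journal = Files.createTempFile("toy-broker", ".log");
            try {
                ToyBroker before = new ToyBroker(journal);
                before.send("order-1", true);    // persistent
                before.send("heartbeat", false); // non-persistent
                ToyBroker after = new ToyBroker(journal); // "restart": in-memory state is gone
                return after.pending();
            } finally {
                Files.deleteIfExists(journal);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("pending after restart: " + survivorsAfterRestart());
    }
}
```

Only the persistent message is still pending after the simulated restart. The non-persistent one cost no disk write, which is where the performance gain comes from.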

Message-Driven Beans

JMS is a mandatory API and service in the J2EE platform. A good example is the message-driven bean, one of a family of EJBs specified in EJB 2.0/2.1. The other two EJB types are session beans and entity beans, which can only be called synchronously.

A JMS Message-Driven Bean (MDB) is a JMS message consumer that implements the JMS MessageListener interface. The onMessage() method is invoked when a message is received by the MDB container. Note that you do not invoke remote methods on MDBs (as you do with other enterprise beans), and as a result there are no home or remote interfaces associated with them. It is also worth noting that with J2EE 1.4, MDBs are not limited to JMS; rather, a multiplicity of MDB interfaces can be declared and consumed by application components implementing those interfaces.

Limitations of JMS:

  • JMS is Java-based. In multi-tiered applications built from microservices, where multiple languages and frameworks are in use, this can become a hindrance.
  • JMS specifies the API but not the message format on the wire. Clients only share the same API, so interoperability between different providers or with non-Java clients is not guaranteed.

An alternate solution to these problems is using Kafka.

JMS and Kafka are both wildly popular solutions for messaging. Although JMS has been around longer, it is still a very popular choice for certain use cases. Before deciding which one is better, it is best to do your homework and study your business requirements and your capabilities.

A friendly comparison between JMS and Kafka follows:

| Parameter | JMS | Kafka |
| --- | --- | --- |
| Order of Messages | There is no general guarantee that messages will be received in order; ordering is only preserved for messages sent by a single session to a given destination. | Messages are received in the order in which they were sent to the partition. |
| Filter | The JMS API provides message selectors that let consumers specify which messages they are interested in, based on specific criteria. The filtering is performed by the JMS provider before messages reach the consumer. | There is no concept of a filter at the broker level, so consumers cannot specify selection criteria when fetching messages. Filtering can happen only at the consumer level. |
| Persistence of Messages | It provides either in-memory or disk-based storage of messages. | It stores messages for a specified period, whether or not they have been picked up by a consumer. |
| Push vs. Pull of Messages | The provider pushes JMS messages to queues and topics. | The consumers pull messages from the broker. |
| Load Balancing | Load balancing can be designed by implementing some clustering mechanism. Once the producer sends the messages, the load is distributed across the cluster. | Load balancing happens automatically. The Kafka nodes publish metadata that indicates which servers in the cluster are up and where the partition leaders are, so the client can send messages to the appropriate partition. |
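
The Kafka column of the ordering, filter, and push/pull rows can be sketched in a few lines. This is a toy model on the Java standard library, not the Kafka client API: ToyTopic, eventsFor, and PartitionOrderSketch are hypothetical names, and String.hashCode() is only a stand-in for whatever partitioner a real producer uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class PartitionOrderSketch {

    // Hypothetical toy model of a partitioned topic; not the Kafka client API.
    static final class ToyTopic {
        private final List<List<String>> partitions = new ArrayList<>();

        ToyTopic(int partitionCount) {
            for (int i = 0; i < partitionCount; i++) {
                partitions.add(new ArrayList<>());
            }
        }

        // Stand-in for a partitioner: the same key always maps to the same partition.
        int partitionFor(String key) {
            return Math.floorMod(key.hashCode(), partitions.size());
        }

        // Records are appended to the end of the chosen partition's log.
        void send(String key, String value) {
            partitions.get(partitionFor(key)).add(key + ":" + value);
        }

        // Pull model: the consumer asks for everything from a given offset onward.
        List<String> poll(int partition, int offset) {
            List<String> log = partitions.get(partition);
            int from = Math.min(Math.max(offset, 0), log.size());
            return new ArrayList<>(log.subList(from, log.size()));
        }
    }

    // Consumer-side filtering: pull the whole partition, then keep only one key's records.
    static List<String> eventsFor(ToyTopic topic, String key) {
        return topic.poll(topic.partitionFor(key), 0).stream()
                .filter(record -> record.startsWith(key + ":"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ToyTopic topic = new ToyTopic(3);
        topic.send("user-1", "login");
        topic.send("user-2", "login");
        topic.send("user-1", "click");
        topic.send("user-1", "logout");
        System.out.println(eventsFor(topic, "user-1"));
    }
}
```

Because every record with the same key lands in the same append-only log, the consumer sees that key's events in send order, and any selection by criteria happens in consumer code after the pull rather than on the broker.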

For more information, please visit JMS vs Kafka.
