Mastering System Design [03]: Advanced Concepts and Architectures in System Design

Bhavyansh @ DiversePixel
7 min read · Sep 7, 2024


As systems become more complex, it’s essential to understand advanced concepts and architectures that enable scalability, resilience, and real-time capabilities. This article explores topics such as messaging patterns, real-time communication, and strategies for handling concurrency and distributed coordination.

1. Polling vs. Streaming

Polling: Polling is a technique where a client repeatedly requests data from a server at regular intervals. This approach is simple but can lead to inefficiencies and increased load on the server due to repeated requests, especially when there are no new updates.

Streaming: Streaming allows a server to push data to the client in real-time as soon as it’s available. This reduces the latency of data delivery and minimizes unnecessary network traffic.

Use Cases and Decision-Making Criteria:

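  • Polling is useful when real-time updates are not critical and updates are infrequent enough to tolerate a long interval. It also fits when simplicity matters more than network efficiency: plain HTTP requests and no long-lived connections.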
  • Streaming is preferred for applications requiring real-time data delivery, such as live sports updates, financial tickers, or collaborative tools.
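To make the trade-off concrete, here is a small Python simulation. The function names and numbers are hypothetical. It counts how many requests a poller sends and how late each update arrives, and compares that with an idealized stream that pushes every update the instant it occurs:

```python
from typing import List, Tuple

def simulate_polling(update_times: List[int], interval: int, duration: int) -> Tuple[int, int, List[int]]:
    """Client polls every `interval` seconds until `duration`.

    Returns (requests sent, empty responses, delivery delay of each update).
    """
    requests, empty, delays = 0, 0, []
    pending = sorted(update_times)
    t = interval
    while t <= duration:
        requests += 1
        # Everything that happened since the previous poll arrives in this response.
        delivered = [u for u in pending if u <= t]
        pending = [u for u in pending if u > t]
        if not delivered:
            empty += 1  # a wasted round trip: nothing new on the server
        delays.extend(t - u for u in delivered)
        t += interval
    return requests, empty, delays

def simulate_streaming(update_times: List[int]) -> Tuple[int, List[int]]:
    """Idealized stream: one pushed message per update, sent the moment it happens."""
    return len(update_times), [0 for _ in update_times]

requests, empty, delays = simulate_polling([12, 13, 47], interval=5, duration=60)
messages, stream_delays = simulate_streaming([12, 13, 47])
print(requests, empty, delays)   # 12 10 [3, 2, 3]
print(messages, stream_delays)   # 3 [0, 0, 0]
```

With a 5-second interval, 10 of the 12 polls come back empty, and updates still arrive 2 to 3 seconds late. Shortening the interval lowers the delay but raises the request count, which is the core tension between the two approaches.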

Purpose Solved: Choosing between polling and streaming affects how timely data is delivered and the overall efficiency of the network communication in an application.

2. Pub/Sub Messaging

How Pub/Sub Messaging Works: Publish/Subscribe (Pub/Sub) is a messaging pattern where publishers send messages to a channel without knowing who the subscribers are, and subscribers listen to channels to receive messages of interest. This decouples the message producers and consumers, allowing for scalable and flexible communication.
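The decoupling is visible even in a minimal in-process broker, sketched below in Python. `Broker`, `subscribe` and `publish` are hypothetical names for illustration. This is not Kafka's client API, and a real broker adds persistence, partitioning and network delivery:

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str], None]

class Broker:
    """Toy in-memory pub/sub broker: topics map to lists of subscriber callbacks."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        # The publisher names only the topic; it never learns who the subscribers are.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)  # every subscriber receives its own copy
        return len(handlers)

broker = Broker()
audit_log: List[str] = []
notifications: List[str] = []
broker.subscribe("user.signup", audit_log.append)
broker.subscribe("user.signup", notifications.append)
print(broker.publish("user.signup", "alice"))  # 2
print(audit_log, notifications)                # ['alice'] ['alice']
```

Adding a third consumer requires only another `subscribe` call; the publishing code does not change.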

Real-World Example: Apache Kafka:

  • Apache Kafka is a distributed streaming platform that uses the pub/sub model to handle high-throughput, fault-tolerant messaging. It allows applications to process and react to streams of records in real-time.

Use Cases:

  • Event-driven architectures
  • Logging and monitoring systems
  • Real-time analytics

Purpose Solved: Pub/Sub messaging supports scalable and decoupled communication, enabling real-time data processing and distribution across distributed systems.

3. WebSockets

What are WebSockets?: WebSockets provide a full-duplex communication channel over a single, long-lived TCP connection. Unlike HTTP, which is request-response based, WebSockets allow for two-way communication, enabling servers to push updates to clients in real-time.
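A WebSocket connection begins as an ordinary HTTP/1.1 request carrying `Upgrade: websocket` and a `Sec-WebSocket-Key` header. The server answers `101 Switching Protocols`, and from then on the same TCP connection carries framed messages in both directions.

The sketch below computes the accept value as defined in RFC 6455 and encodes one small server-to-client text frame. The helper names are hypothetical. It covers only the simplest case: no masking (which client-to-server frames require), no fragmentation, and payloads under 126 bytes.

```python
import base64
import hashlib

# Fixed GUID from RFC 6455, appended to the client's key before hashing.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def accept_key(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept header value for a client's key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

def upgrade_response(sec_websocket_key: str) -> str:
    """Build the server's 101 response that switches the HTTP connection to WebSocket."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key(sec_websocket_key)}\r\n"
        "\r\n"
    )

def text_frame(payload: str) -> bytes:
    """Encode a short, unmasked server-to-client text frame."""
    data = payload.encode("utf-8")
    if len(data) >= 126:
        raise ValueError("this sketch only handles payloads shorter than 126 bytes")
    return bytes([0x81, len(data)]) + data  # 0x81 = FIN bit set + text opcode (0x1)

print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
print(text_frame("Hi"))                        # b'\x81\x02Hi'
```

After the handshake, each message costs only a few bytes of framing instead of a full set of HTTP headers, which is where the overhead savings come from.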

When to Use WebSockets: WebSockets are ideal for applications requiring low-latency, real-time communication, such as chat applications, online gaming, or collaborative editing tools.

Purpose Solved: WebSockets enhance real-time interactivity and reduce the overhead associated with repeatedly opening and closing connections, as seen with traditional HTTP requests.

4. MapReduce

Concept of MapReduce: MapReduce is a programming model for processing large datasets in parallel across a distributed cluster. The model consists of two main functions:

  • Map: Processes and transforms input data into key-value pairs.
  • Reduce: Aggregates the mapped data, combining values associated with the same key.

How MapReduce Works:

  • Data Splitting: The input data is divided into chunks and distributed across multiple nodes.
  • Mapping: Each node processes its chunk of data independently, emitting intermediate key-value pairs.
  • Shuffling: Intermediate data is redistributed based on keys to the appropriate reducer nodes.
  • Reducing: Reducer nodes aggregate the data by key, producing the final output.
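The classic word-count job maps directly onto these steps. This single-process Python sketch only mimics the data flow, and its function names are hypothetical. In Hadoop, each phase would run on many nodes, and the shuffle would move data over the network:

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple

def map_phase(chunk: str) -> List[Tuple[str, int]]:
    # Map: emit (word, 1) for every word in this node's chunk.
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group intermediate values by key so each key reaches one reducer.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: aggregate all values that share a key.
    return key, sum(values)

def word_count(chunks: List[str]) -> Dict[str, int]:
    # Each chunk stands in for one split of the input assigned to a mapper.
    mapped = chain.from_iterable(map_phase(c) for c in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

print(word_count(["the cat sat", "The dog sat"]))
# {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```

Because `map_phase` depends only on its own chunk and `reduce_phase` only on one key's values, both phases can be spread across machines independently. That independence is what makes the model scale.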

Real-World Example: Apache Hadoop:

  • Apache Hadoop uses the MapReduce model to process large datasets across a distributed computing cluster, making it suitable for big data analytics.

Purpose Solved: MapReduce allows efficient parallel processing of vast amounts of data, enabling scalability and fault tolerance in big data applications.

5. Scale Cube and Microservices

Understanding the Scale Cube: The Scale Cube is a model that describes three dimensions of scalability for systems:

  1. X-axis (Horizontal Duplication): Scaling by running multiple instances of the same service.
  2. Y-axis (Functional Decomposition): Splitting different functionalities into separate services, often leading to a microservices architecture.
  3. Z-axis (Data Partitioning): Sharding data to distribute load across different nodes.
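Two of the axes can be sketched in a few lines of Python. `RoundRobin` spreads requests over identical instances (X-axis), and `shard_for` routes each key to one partition (Z-axis). Both are hypothetical helpers, and modulo partitioning is the simplest scheme rather than the only one:

```python
import hashlib
from itertools import cycle
from typing import Iterable, List

class RoundRobin:
    """X-axis: hand requests to identical service instances in turn."""

    def __init__(self, instances: Iterable[str]) -> None:
        self._next = cycle(list(instances))

    def pick(self) -> str:
        return next(self._next)

def shard_for(key: str, shards: List[str]) -> str:
    """Z-axis: route a key to one data partition by hashing it (modulo partitioning)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]

lb = RoundRobin(["orders-1", "orders-2", "orders-3"])
print([lb.pick() for _ in range(4)])  # ['orders-1', 'orders-2', 'orders-3', 'orders-1']
print(shard_for("user:42", ["db-0", "db-1", "db-2"]))  # same shard on every call
```

The sketch uses `hashlib` instead of Python's built-in `hash()`, because `hash()` is randomized per process for strings and would send the same key to different shards after a restart. Plain modulo partitioning also moves most keys when the shard count changes; consistent hashing is the usual remedy.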

Transitioning to Microservices: Microservices architecture involves breaking down a monolithic application into smaller, independently deployable services. Each microservice is responsible for a specific business function and can be developed, deployed, and scaled independently.

Purpose Solved: The Scale Cube and microservices provide a framework for designing scalable, modular systems that can be developed and deployed independently, improving agility and fault isolation.

6. Heartbeat/HTTP Keep-Alive

Heartbeat Mechanism: A heartbeat is a periodic signal sent between two devices or systems to indicate active status or connectivity. It’s commonly used in distributed systems to detect failures and maintain system health.

HTTP Keep-Alive: HTTP Keep-Alive maintains an open connection between a client and server, allowing multiple requests to be sent over a single connection, reducing the overhead of establishing new connections.

Importance and Implementation:

  • Heartbeats are used in cluster management and failover mechanisms to detect node availability.
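  • HTTP Keep-Alive improves network efficiency by reusing one connection across requests. This avoids a new TCP handshake (and, for HTTPS, a new TLS handshake) per request, which reduces latency and saves server resources.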

Purpose Solved: Heartbeat mechanisms ensure system resilience by monitoring component health, while HTTP Keep-Alive enhances communication efficiency.
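A basic timeout-based failure detector is one way to act on heartbeats. Each node reports in periodically, and any node silent for longer than a timeout is suspected down. The Python sketch below uses hypothetical class and node names. It takes an explicit `now` parameter so the clock can be controlled in examples:

```python
import time
from typing import Dict, List, Optional

class HeartbeatMonitor:
    """Timeout-based failure detector: a node is suspected down if it has been
    silent for longer than `timeout` seconds."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self._last_seen: Dict[str, float] = {}

    def record_heartbeat(self, node: str, now: Optional[float] = None) -> None:
        self._last_seen[node] = time.monotonic() if now is None else now

    def suspected_down(self, now: Optional[float] = None) -> List[str]:
        now = time.monotonic() if now is None else now
        return sorted(node for node, seen in self._last_seen.items()
                      if now - seen > self.timeout)

monitor = HeartbeatMonitor(timeout=3.0)
monitor.record_heartbeat("node-a", now=0.0)
monitor.record_heartbeat("node-b", now=0.0)
monitor.record_heartbeat("node-a", now=2.0)   # node-b stops sending
print(monitor.suspected_down(now=4.0))        # ['node-b']
```

The timeout is usually set to several heartbeat intervals so that a single delayed packet does not trigger a failover. A suspected node may merely be slow or cut off by a network partition, so a missed heartbeat signals possible failure rather than proof of a crash.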

7. Bloom Filters and Hash Tables

What are Bloom Filters?: A Bloom filter is a probabilistic data structure used to test whether an element is a member of a set. It allows fast, memory-efficient membership checks. It may produce false positives but never false negatives: if the filter reports an element as absent, the element is definitely absent.

Hash Tables: Hash tables store data in key-value pairs, allowing for efficient data retrieval. They use a hash function to map keys to buckets, so lookups, insertions, and deletions take constant time on average. Performance degrades when many keys collide in the same bucket.

Use Cases in System Design:

  • Bloom Filters: Useful for applications like caching, database query optimization, or spam detection, where false positives are tolerable, and memory efficiency is crucial.
  • Hash Tables: Widely used in databases, caching, and indexing due to their fast lookup times.
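A Bloom filter is a bit array plus k hash functions. Adding an element sets k bits, and a lookup answers "maybe present" only if all k of that element's bits are set. The Python sketch below derives the k positions from one SHA-256 digest via double hashing, a common technique. The class name and the sizes are illustrative assumptions:

```python
import hashlib
from typing import Iterator

class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # packed bit array, all zeros

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: position_i = (h1 + i * h2) mod m, from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step so positions differ
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means probably present.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

bf = BloomFilter(num_bits=1024, num_hashes=3)
for user in ["alice", "bob"]:
    bf.add(user)
print(bf.might_contain("alice"))  # True: added items are always reported
```

For n inserted items, m bits and k hash functions, the false-positive rate is approximately (1 - e^(-kn/m))^k. That approximation is how m and k are usually chosen for a target error rate. The 128-byte filter here stores no keys at all, which is where the memory savings over a hash table come from.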

Purpose Solved: Hash tables provide fast key-value storage and retrieval. Bloom filters answer membership queries in a fraction of the memory, for example to skip a disk or database lookup for keys that are definitely absent.

8. Distributed Lock

What is a Distributed Lock?: A distributed lock is a mechanism that ensures that only one process or thread can access a shared resource at a time in a distributed system, preventing conflicts and ensuring data consistency.

Implementation Strategies:

  • Database-based Locks: Using a database table to manage lock status.
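  • Redis-based Locks: Using Redis as a distributed in-memory store to manage locks, typically with the atomic SET key value NX PX <ttl> command (set if not exists, with an expiry). The older SETNX command sets no expiry on its own, so a crashed holder could keep the lock forever.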
  • ZooKeeper-based Locks: Using Apache ZooKeeper to coordinate locks in distributed systems.

Use Cases and Challenges:

  • Use Cases: Distributed transactions, leader election, ensuring consistency in distributed systems.
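  • Challenges:
    • Handling failures: a lock needs an expiry or lease so that a crashed holder cannot block others forever.
    • Coping with holders that pause past their lease: fencing tokens let the protected resource reject stale writers.
    • Ensuring liveness and avoiding deadlocks.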

Purpose Solved: Distributed locks provide a way to synchronize access to shared resources in distributed environments, ensuring consistency and preventing race conditions.
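Handling failures usually comes down to two details. A lock must expire if its holder crashes, and a holder that pauses past its expiry must not be allowed to keep writing.

The Python sketch below simulates both in memory. `LeaseLockStore` mimics the semantics of Redis's `SET key value NX PX <ttl>`, and `FencedResource` rejects writes that carry an older fencing token. Both classes and all names are hypothetical stand-ins, not a Redis client:

```python
from typing import Dict, Optional, Tuple

class LeaseLockStore:
    """In-memory stand-in for a shared lock store (hypothetical; not a Redis client).

    acquire() mirrors `SET key value NX PX ttl`: it succeeds only if the key is
    absent or its lease has expired. Each success also returns a fencing token
    that increases monotonically.
    """

    def __init__(self) -> None:
        self._locks: Dict[str, Tuple[str, float]] = {}  # key -> (owner, expires_at)
        self._last_token = 0

    def acquire(self, key: str, owner: str, ttl: float, now: float) -> Optional[int]:
        held = self._locks.get(key)
        if held is not None and held[1] > now:
            return None  # another holder's lease is still valid
        self._locks[key] = (owner, now + ttl)
        self._last_token += 1
        return self._last_token

    def release(self, key: str, owner: str) -> bool:
        # Delete only if we still own the lock (in Redis this check-and-delete
        # is commonly done atomically in a Lua script).
        held = self._locks.get(key)
        if held is not None and held[0] == owner:
            del self._locks[key]
            return True
        return False

class FencedResource:
    """Protected resource that rejects writes carrying an older fencing token."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.value: Optional[str] = None

    def write(self, token: int, value: str) -> bool:
        if token < self.highest_token:
            return False  # stale holder whose lease already expired
        self.highest_token = token
        self.value = value
        return True

locks, db = LeaseLockStore(), FencedResource()
token_a = locks.acquire("order-7", "worker-a", ttl=10, now=0)   # 1
print(locks.acquire("order-7", "worker-b", ttl=10, now=5))       # None: A still holds it
token_b = locks.acquire("order-7", "worker-b", ttl=10, now=11)  # 2: A's lease expired
print(db.write(token_b, "written by B"))  # True
print(db.write(token_a, "written by A"))  # False: stale token rejected
print(locks.release("order-7", "worker-a"), locks.release("order-7", "worker-b"))  # False True
```

In this scenario, worker A pauses past its 10-second lease, and worker B legitimately takes over. When A wakes up and tries to write, the resource rejects it because A's token is older than one it has already seen. Without the token check, both workers would have written.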

9. Concurrency

Managing Concurrency: Concurrency means structuring a system so that multiple tasks make progress in overlapping time periods, running in parallel when multiple cores or machines are available. The goal is higher throughput and responsiveness. Effective concurrency management ensures that these tasks do not interfere with each other, avoiding race conditions and deadlocks.

Avoiding Race Conditions: A race condition occurs when multiple processes or threads access shared data concurrently, at least one of them writes, and the result depends on the timing of their operations. The outcome is unpredictable.

Ensuring Thread Safety: Thread safety ensures that shared data is accessed in a way that prevents data corruption and inconsistent states, typically through synchronization mechanisms like locks, semaphores, and atomic operations.
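The classic race is a read-modify-write on a shared counter. The statement `value += 1` reads, adds and writes back as separate steps, so two threads can read the same old value and one update is lost. Here is a Python sketch with a hypothetical `Counter` class, contrasting the unprotected and lock-protected versions:

```python
import threading
from typing import Callable

class Counter:
    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment_unsafe(self) -> None:
        # Read, add, write back: two threads can read the same old value,
        # and one of the two increments is lost.
        self.value += 1

    def increment_safe(self) -> None:
        with self._lock:  # only one thread at a time executes this critical section
            self.value += 1

def run_concurrently(task: Callable[[], None], threads: int = 8, iterations: int = 10_000) -> None:
    def worker() -> None:
        for _ in range(iterations):
            task()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()

safe = Counter()
run_concurrently(safe.increment_safe)
print(safe.value)  # 80000, every time

unsafe = Counter()
run_concurrently(unsafe.increment_unsafe)
print(unsafe.value)  # may be less than 80000, depending on thread scheduling
```

How often the unlocked version loses updates depends on the interpreter and on scheduling. CPython's global interpreter lock does not make `+=` atomic. This kind of bug can therefore pass tests many times before it shows up.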

Purpose Solved: Concurrency management is crucial for maximizing system throughput and ensuring safe access to shared resources in multi-threaded or distributed environments.

10. Latency Metrics and Tracing

Monitoring Latency: Latency refers to the time taken for a request to travel from the client to the server and back. Monitoring latency helps identify bottlenecks and optimize system performance.

Tracing: Distributed tracing involves tracking a request’s path through different components of a system to understand its flow and identify latency sources.

Tools and Techniques:

  • Zipkin and Jaeger are popular tools for distributed tracing in microservices architectures.
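  • Track tail-latency percentiles such as P99 rather than averages alone. P99 is the time within which 99% of requests complete; the slowest 1% take longer. Averages hide exactly those slow requests, which users notice.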

Purpose Solved: Monitoring latency and using tracing tools helps optimize system performance by identifying and resolving bottlenecks, ensuring a smooth user experience.
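A quick calculation shows why percentiles reveal what averages hide. The Python helper below uses the nearest-rank definition of a percentile. This is one of several conventions; monitoring systems often interpolate or work from histogram buckets. It is applied to a made-up sample of 100 request latencies:

```python
import math
from statistics import mean
from typing import Sequence

def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least p% of
    all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]

# Made-up latencies in ms: 98 fast requests and two slow ones.
latencies_ms = [20.0] * 98 + [250.0, 900.0]
print(round(mean(latencies_ms), 1))    # 31.1
print(percentile(latencies_ms, 50))    # 20.0
print(percentile(latencies_ms, 99))    # 250.0
print(max(latencies_ms))               # 900.0
```

The mean of about 31 ms suggests every request is reasonably fast. P99 shows that the slowest requests take an order of magnitude longer.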

11. Messaging Queues

Deep Dive into Messaging Queues: Messaging queues allow asynchronous communication between different components of a system, enabling decoupled, scalable architectures.

Types of Messaging Patterns:

  • Pub/Sub (Publish/Subscribe): Enables broadcast messaging to multiple consumers.
  • Point-to-Point: Ensures that a message is consumed by only one receiver.
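Point-to-point delivery with competing consumers can be sketched with Python's thread-safe `queue.Queue`. Several workers pull from one queue, and each message is handed to exactly one of them. In pub/sub, by contrast, every subscriber gets its own copy. The function and worker names are hypothetical. A broker such as RabbitMQ adds persistence, acknowledgements and redelivery on top of this idea:

```python
import queue
import threading
from typing import Dict, List

def run_point_to_point(messages: List[str], workers: int = 3) -> Dict[str, List[str]]:
    """Each message is taken off the queue by exactly one of several competing consumers."""
    q: queue.Queue = queue.Queue()
    received: Dict[str, List[str]] = {f"worker-{i}": [] for i in range(workers)}

    def consume(name: str) -> None:
        while True:
            msg = q.get()           # blocks until a message is available
            if msg is None:         # sentinel: no more work for this consumer
                q.task_done()
                break
            received[name].append(msg)
            q.task_done()

    threads = [threading.Thread(target=consume, args=(name,)) for name in received]
    for t in threads:
        t.start()
    for m in messages:
        q.put(m)
    for _ in threads:
        q.put(None)                 # one sentinel per consumer, queued after all messages
    for t in threads:
        t.join()
    return received

received = run_point_to_point([f"order-{i}" for i in range(30)])
for worker, msgs in received.items():
    print(worker, len(msgs))  # the split varies from run to run; the total is always 30
```

Adding workers increases throughput without any change on the producer side, which is the load-balancing benefit described above.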

Real-World Example: RabbitMQ:

  • RabbitMQ is a widely used message broker supporting both pub/sub and point-to-point messaging, providing reliable, scalable message delivery.

Purpose Solved: Messaging queues enable asynchronous communication, improving system scalability and reliability by decoupling components and balancing load.

12. P2P Networks

What are Peer-to-Peer (P2P) Networks?: P2P networks distribute workload among peers, eliminating the need for a central server. Each peer can act as both a client and a server, sharing resources directly with other peers.

Use Cases and Challenges:

  • Use Cases: File sharing, distributed computing, blockchain networks.
  • Challenges: Ensuring security, managing peer discovery (tracker, gossip protocol), and handling dynamic network changes.
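Gossip protocols spread information without a central server: each informed peer periodically forwards an update to a few randomly chosen peers. The Python simulation below counts how many rounds a push-style gossip needs to reach every peer. The names are hypothetical, and the random seed is fixed for repeatability:

```python
import random
from typing import List, Set

def gossip_rounds(peers: List[str], seed_peer: str, fanout: int = 2, rng_seed: int = 7) -> int:
    """Push gossip: each round, every informed peer forwards the update to
    `fanout` randomly chosen peers. Returns rounds until all peers are informed."""
    rng = random.Random(rng_seed)  # fixed seed so the simulation is repeatable
    informed: Set[str] = {seed_peer}
    rounds = 0
    while len(informed) < len(peers):
        rounds += 1
        newly_informed: Set[str] = set()
        for _ in informed:
            newly_informed.update(rng.sample(peers, fanout))
        informed |= newly_informed
    return rounds

peers = [f"peer-{i}" for i in range(50)]
print(gossip_rounds(peers, seed_peer="peer-0"))
```

Each informed peer contacts at most `fanout` others per round, so the informed set can grow by at most a factor of `fanout + 1` per round. Reaching 50 peers with a fanout of 2 therefore takes at least 4 rounds. In practice, the number of rounds grows roughly logarithmically with the number of peers, which is why gossip scales to large, changing networks.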

Purpose Solved: P2P networks enable decentralized architectures, improving fault tolerance and resource utilization by distributing load among multiple peers.

13. Practical Example: Designing a Microservices Architecture with Real-Time Communication

Scenario: Design a microservices architecture for a social networking platform that requires real-time messaging, efficient data processing, and strong consistency across distributed components.

Design Considerations:

1. Microservices Structure:

  • Split different functionalities (e.g., user management, messaging, notifications, content delivery) into separate microservices to enhance scalability and maintainability.

2. Real-Time Communication with WebSockets:

  • Use WebSockets for the messaging service to enable real-time, bidirectional communication between users.

3. Pub/Sub Messaging System:

  • Implement a pub/sub system (e.g., Apache Kafka) to handle events such as message delivery, user status updates, and notifications.

4. Distributed Locks for Consistency:

  • Use Redis or ZooKeeper for distributed locks to ensure consistency when multiple services need to coordinate access to shared resources.

5. Concurrency Management:

  • Ensure thread safety and avoid race conditions in services handling concurrent requests, such as messaging and notifications.

6. Latency Metrics and Tracing:

  • Implement distributed tracing (e.g., using Jaeger) to monitor the flow of requests and measure latency across services, optimizing performance and identifying bottlenecks.

Outcome: By combining microservices with real-time communication, pub/sub messaging, and distributed locks, the platform can scale services independently and keep message latency low. It can also coordinate access to shared resources where consistency matters, providing a robust user experience.

Conclusion

Mastering these advanced concepts and architectures equips system designers with the tools to build scalable, resilient, and high-performance systems. This article concludes our exploration of fundamental and advanced topics in system design, preparing you for practical application in real-world scenarios.
