Understanding Caching, Load Balancing, and Content Delivery Networks (CDNs) in Web Applications


Caching

Examples of places where the same data is shown very frequently to end users:

| Context (Application) | Data shown very frequently |
| --- | --- |
| YouTube | Frequently accessed metadata on the Trending page, such as video titles, thumbnails, tags, and duration. |
| Amazon | Details of top products, including their descriptions, names, prices, images, and category information. |
| Zerodha | Financial reports and analyses from the last week or month. |
| Swiggy | Frequently browsed menus and data for high-rated restaurants. |

Cache - Problem Statement:

In dynamic, user-centric applications like YouTube, Amazon, Zerodha, and Swiggy, certain data points are accessed frequently by users. Every time such a request is made, the data has to be fetched from the main database, which can be time-consuming and put unnecessary load on the server.

Cache - Solution:

Caching is an efficient way to solve this problem. By storing frequently accessed data in a cache (a temporary, fast storage layer), data retrieval times can be greatly reduced and server load decreases, improving overall system performance.

Which Scenarios Should I Cache?

Caching is most effective when we apply the Locality of Reference principle, which states that the same values, or related data, are frequently accessed. It's about predicting which data will be accessed regularly based on usage patterns and caching that for future use. For instance, on a YouTube Trending page or Amazon's Top Products page, the data is relatively static and accessed frequently by users - an ideal scenario for caching.

Cache vs Database

| Aspect | Cache (In-memory storage) | Database (Persistent storage) |
| --- | --- | --- |
| Data Retention | Volatile - data is lost when the process is terminated or the system is powered off | Non-volatile - data is maintained even after the system is powered off |
| Access Speed | High - data is stored in RAM, which has lower latency and faster access times | Lower - data is stored on disk, which has higher latency and slower access times compared to RAM |
| Data Size | Suitable for small to medium datasets - limited by the size of the available memory | Suitable for larger datasets - limited only by the size of the available disk space |
| Use Case | Used for data that requires fast access and is accessed frequently (e.g., session data, frequently accessed subsets of larger datasets) | Used for storing the main copy of the data, including data that isn't accessed frequently |
| Consistency | Typically, eventual consistency is acceptable as cache data can be repopulated from the database if needed | Strong consistency is required as this is the authoritative source of data |
| Cost | Higher - memory is more expensive per byte than disk storage | Lower - disk storage is cheaper per byte than memory |
| Data Structure | Typically simple key-value pairs for fast lookup | Supports complex data structures, relationships, and queries |
| Failure Impact | Low - the cache can be regenerated from the database | High - loss of data from the database typically means permanent data loss unless backups are available |

Consider a sample request that flows from the client through the web server and application server to the database; caching can be applied at every level along this path, as summarized below.


| Component | Caching Explanation |
| --- | --- |
| Client | Stores resources like HTML, CSS, JavaScript, and images to reduce the number of requests to the server. |
| Web Server | Caches responses from the application server, especially responses that are expensive to generate and don't change often. |
| Application Server | Caches frequently accessed data, results of complex calculations, or even full HTML pages to reduce the load on the database and improve response time. |
| Database | Caches frequently accessed data and query results to speed up read operations. |

Problem Statement: Your server receives 1,000 distinct requests, but your cache can only hold 100 entries. As new requests keep coming in, you need a mechanism to decide which entries should be kept in the cache and which should be evicted to make room for new ones.

Solution: The solution lies in implementing a cache eviction policy. Cache eviction policies are strategies that decide which items should be removed from the cache when it is full, making space for new items.

Basic terminologies

| Term | Definition |
| --- | --- |
| Cache | A component that stores data so future requests for that data can be served faster |
| Cache Hit | An event that occurs when the requested data is found in the cache |
| Cache Miss | An event that occurs when the requested data is not found in the cache |
| Cache Eviction | The process of removing entries from the cache based on certain policies (like LRU, LFU, etc.) |
| Cache-Aside Pattern | A pattern where the application code is responsible for loading data into the cache from the data source when the requested data is not found in the cache |
| Cache Stampede | A type of cascading failure that can occur when intense bursts of traffic trigger an abnormal number of cache misses at the same time |
| Distributed Cache | A cache that spreads its entries across multiple nodes in a network, improving scalability and performance |
| Global Cache | A type of distributed cache where the complete dataset is stored in each node |
| Cache Coherence | The consistency of shared resource data in distributed cache systems |
| Time to Live (TTL) | A mechanism that determines the lifespan of data in a cache |
| Write-Through | A caching strategy where data is written to the cache and the corresponding database simultaneously |
| Write-Back / Write-Behind | A caching strategy where data is written to the cache first and then to the corresponding database at certain intervals or under certain conditions |
| Cache Warming | The process of loading frequently accessed data into a cache before it is requested |

Cache Eviction Policies

Cache eviction policies are the rules that determine which entries should be removed from the cache when it is full and new entries need to be added. The objective of these policies is to optimize the use of cache by efficiently deciding which entries are least likely to be used in the future. Here are some common cache eviction policies:

| Policy | Definition |
| --- | --- |
| LRU (Least Recently Used) | Discards the least recently used items first. |
| LFU (Least Frequently Used) | Discards the least frequently used items first. |
| FIFO (First In, First Out) | Discards the oldest data first. |
| LIFO (Last In, First Out) | Discards the newest data first. |
| Random Replacement (RR) | Randomly selects a candidate item and discards it to make space when necessary. |

The choice of eviction policy generally depends on the data's access patterns.
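As an illustration, here is a minimal LRU cache sketch in Python built on an ordered dictionary. The class name, capacity, and keys are hypothetical and not tied to any particular caching library.

```python
from collections import OrderedDict

class LRUCache:
    """A minimal LRU cache: evicts the least recently used key when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store = OrderedDict()  # preserves usage order: oldest first

    def get(self, key):
        if key not in self.store:
            return None                  # cache miss
        self.store.move_to_end(key)      # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used entry

# Example: a cache of capacity 2
cache = LRUCache(2)
cache.put("video:1", "Trending video metadata")
cache.put("video:2", "Another video")
cache.get("video:1")               # touch video:1 so it becomes most recent
cache.put("video:3", "New video")  # evicts video:2, the least recently used
```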

Caching Patterns

Caching patterns are strategies for managing cached data. They can greatly improve performance and also help with cache invalidation: when data in the database is constantly being updated, the cache must be updated to reflect those changes as well, otherwise the application will serve outdated (stale) data. These techniques therefore also help maintain cache consistency.

Cache-Aside Pattern (Lazy Loading):

Cache-aside is one of the commonly used caching strategies, where cache and database are independent, and it is the responsibility of the application code to manage cache and database to maintain data consistency. The application interacts with the cache and database, and the cache doesn’t interact with the database at all. So cache is “kept aside” as a scalable in-memory data store.


Read operation in cache-aside pattern


Write operation in cache-aside pattern

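The read and write flows above can be sketched as follows. This is a minimal illustration assuming generic `cache` and `db` clients with `get`/`set`/`delete` and query/update methods; the method names, key format, and TTL are assumptions, not a specific library's API.

```python
CACHE_TTL_SECONDS = 300  # assumed expiry so stale entries age out

def read_product(product_id, cache, db):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"product:{product_id}"
    value = cache.get(key)
    if value is not None:
        return value                      # cache hit
    value = db.query_product(product_id)  # cache miss: go to the database
    cache.set(key, value, ttl=CACHE_TTL_SECONDS)  # populate the cache for next time
    return value

def update_product(product_id, new_data, cache, db):
    """Cache-aside write: update the database, then invalidate the cached copy."""
    db.update_product(product_id, new_data)
    cache.delete(f"product:{product_id}")  # the next read repopulates the cache
```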

Write Through Cache

Data is written to the cache and then to the database, and only after both writes complete is the response returned to the client.


| Advantages | Disadvantages |
| --- | --- |
| Data consistency: Ensures strong data consistency between the cache and the database because every write to the cache results in a write to the database. | Latency: Every write operation has the latency cost of writing to both the cache and the database, which can slow down the application. |
| Simplicity: Simpler to implement since there's no synchronization required between cache and database. | Resource Usage: Increased resource usage as every write operation is performed twice. |
| Reliability: Provides higher reliability as data is written to persistent storage immediately. | Performance Impact: Depending on the frequency of write operations, it may slow down overall system performance due to the time taken to write to the database in addition to the cache. |
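A minimal write-through sketch, reusing the same illustrative `cache` and `db` clients from the cache-aside example (not a specific library's API):

```python
def write_through(key, value, cache, db):
    """Write-through: update the cache and the database synchronously,
    then acknowledge the write back to the caller."""
    cache.set(key, value)   # write to the fast in-memory layer
    db.save(key, value)     # write to the durable store before responding
    return "OK"
```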

Write Back Cache

In the Write-Back (or Write-Behind) caching strategy, when a client makes a write request, the request goes to the cache first. The cache marks the corresponding data as "dirty", updates its content, and responds to the client with a successful write acknowledgment. The strategy operates asynchronously: the cache later syncs the updated data back to the database without delaying the client's write request. Because the database is updated later than the cache, there is a possibility of data loss if the cache fails before the write-back completes.

| Advantages | Disadvantages |
| --- | --- |
| Speed: Write operations are faster as data is written to the faster cache. | Data Consistency: There can be a delay in reflecting the changed data in the database, potentially leading to inconsistency. |
| Reduced Database Load: The immediate write load on the database is lessened due to asynchronous writes. | Data Loss: There's a risk of data loss in case of system failure before the cache has had a chance to write back the data to the database. |
| | Complexity: Managing when and how data is written back to the database can add extra complexity to the system. |
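A minimal write-back sketch that marks entries as dirty and flushes them later. In a real system the flush would run on a background schedule with locking and failure handling, all of which is omitted here; the class and method names are illustrative.

```python
class WriteBackCache:
    """Writes go to memory immediately; dirty entries are flushed to the DB later."""

    def __init__(self, db):
        self.db = db
        self.store = {}
        self.dirty = set()    # keys written to the cache but not yet persisted

    def write(self, key, value):
        self.store[key] = value
        self.dirty.add(key)   # acknowledge immediately; the DB write is deferred
        return "OK"

    def flush(self):
        """Called periodically (e.g., by a background job) to sync dirty entries."""
        for key in list(self.dirty):
            self.db.save(key, self.store[key])
            self.dirty.discard(key)
```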

Write Around Cache

Data is written directly to the database without updating the cache. When a subsequent read request arrives for the same data, it isn't in the cache, so it is fetched from the database, stored in the cache, and served from the cache for future requests.

| Advantages | Disadvantages |
| --- | --- |
| Reduced I/O to the cache | Initial read performance hit |
| Prevents cache pollution | Higher latency for write operations |
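A minimal write-around sketch, again with the illustrative `cache` and `db` clients used earlier; reads then follow the cache-aside path shown before.

```python
def write_around(key, value, cache, db):
    """Write-around: write only to the database; the cache is not populated."""
    db.save(key, value)
    cache.delete(key)   # optionally drop any stale cached copy
    return "OK"
```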

What are Distributed Caches

Distributed caching is the practice of using multiple caching servers spread across a network. Unlike a traditional cache, which is usually limited to the memory of a single machine, a distributed cache can scale beyond a single machine's memory limits by linking multiple machines together (a distributed cluster).

In distributed caching, each caching server maintains a portion of the cached data, and requests for data are directed to the appropriate server based on a hashing algorithm or some distribution strategy.

  • Distributed caching is useful in environments with heavy data access because the distributed architecture can incrementally add more machines to the cluster, increasing cache capacity as the data grows.
  • Distributed caching provides several benefits: faster response times, scalability, increased availability, and better fault tolerance. It is used in large-scale applications and cloud-based architectures.

There is an issue when multiple servers each maintain their own cache: they can hold different copies of the same data, which leads to inconsistency.

Solution: Global Cache

Global Cache:

In a distributed system, a global cache can be seen as a distributed cache where each node shares the same view of the cached data. For example, consider an e-commerce website like Amazon which is served by multiple servers around the world. When a user makes a request, it can hit any of these servers. Let's say the user is searching for a particular book and this request goes to Server A. Server A fetches the book's data from the database and stores it in its cache before returning the data to the user.

Now, if another request for the same book comes and hits Server B, if we are using a global cache, Server B would already have the book's data in its cache (even though the original request went to Server A). This is because the cache updates are propagated to all nodes in a global cache system.

This approach ensures data consistency but increases the complexity due to the need to maintain cache consistency across multiple nodes. Also, it can lead to higher network traffic and latency because every cache write needs to be propagated to all nodes.

Alternatively, we can use a non-global (local) distributed cache.

Non-global (Local) Distributed Cache:

In a non-global distributed cache, each node maintains its own cache. Using the same Amazon example, if a request for a book's data goes to Server A, it fetches the data from the database, stores it in its cache, and returns it to the user. However, in this case, if another request for the same book's data comes but hits Server B, Server B wouldn't have the book's data in its cache because the cached data isn't shared between nodes.

This approach is less complex and can provide better performance due to reduced network traffic. However, it can lead to data inconsistency, as different nodes might have different data in their caches.

How to handle distributed caches?

Strategies to manage distributed caches include:

  1. Consistent Hashing: This method ensures that when a node is added or removed, only a minimal number of keys needs to be relocated, keeping the system stable (see the sketch after this list).
  2. Data Replication: Keeping multiple copies of the same data in different nodes can increase resiliency and data availability.
  3. Cache Invalidation: Using techniques like write-through or write-behind caching helps keep the cache and the database in sync.
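A minimal consistent-hashing sketch in Python. Node names, the number of virtual replicas, and key formats below are illustrative assumptions, not a specific library's implementation.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """A minimal consistent-hash ring: each key maps to the nearest node clockwise,
    so adding or removing a node remaps only a small fraction of keys."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas   # virtual nodes per physical node, for even spread
        self.ring = {}             # hash position -> node name
        self.sorted_hashes = []
        for node in nodes:
            self.add_node(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str):
        for i in range(self.replicas):
            h = self._hash(f"{node}#{i}")
            self.ring[h] = node
            bisect.insort(self.sorted_hashes, h)

    def remove_node(self, node: str):
        for i in range(self.replicas):
            h = self._hash(f"{node}#{i}")
            del self.ring[h]
            self.sorted_hashes.remove(h)

    def get_node(self, key: str) -> str:
        if not self.sorted_hashes:
            raise ValueError("ring is empty")
        h = self._hash(key)
        idx = bisect.bisect(self.sorted_hashes, h) % len(self.sorted_hashes)
        return self.ring[self.sorted_hashes[idx]]

# Example: route cache keys to one of three cache servers
ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
print(ring.get_node("user:42"))  # the same key always maps to the same node
```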

Redis

Redis is an in-memory data structure store, used as a distributed, in-memory key–value database, cache and message broker, with optional durability.

Redis can function as both a standalone (normal) cache and a distributed cache.

  1. Standalone (Normal) Cache: In a simple deployment, Redis can be used as a local cache for an application. The application directly communicates with a single Redis node for storing and retrieving cache data.
  2. Distributed Cache: In a distributed cache setup, cache data is spread across multiple nodes. This increases the total amount of data that can be stored in the cache and improves cache performance by increasing the total processing power of the cache layer. Redis can be used in this manner by setting up multiple Redis instances and using client-side sharding or by using Redis Cluster.
  3. Redis Cluster: Redis Cluster is Redis's answer to high scalability requirements and is essentially a form of distributed caching. It automatically divides your dataset among multiple nodes. This not only allows you to store larger datasets but also improves performance by increasing the total computational power. Redis Cluster also provides automatic failover and therefore high availability.
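For example, a cache-aside read against a standalone Redis instance might look like the sketch below, assuming the redis-py client and a Redis server on localhost; the key name, TTL, and `fetch_from_db` callback are illustrative.

```python
import json
import redis  # the redis-py client

r = redis.Redis(host="localhost", port=6379, db=0)

def get_trending_videos(fetch_from_db):
    """Cache-aside read backed by Redis with a TTL."""
    cached = r.get("trending:videos")
    if cached is not None:
        return json.loads(cached)                          # cache hit
    videos = fetch_from_db()                               # cache miss: query the database
    r.set("trending:videos", json.dumps(videos), ex=300)   # expire after 5 minutes
    return videos
```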

Redis Features

  1. Data structures: Redis supports a number of in-memory, high-performance data structures. Each structure is associated with a set of server-side commands that let you interact with the data.
  2. In-Memory Store: All data in Redis is stored in memory, which enables high-speed read and write operations. This is ideal for use cases that require real-time data processing.
  3. Persistence: Even though Redis is an in-memory database, it provides two mechanisms for persisting data onto a disk:

| Persistence Method | Description |
| --- | --- |
| RDB (snapshotting) | Saves the state of the Redis dataset at specific points in time in a binary file. This is useful for backups. |
| AOF (Append Only File) | Logs every write operation received by the server, which can then be replayed at startup to reconstruct the original dataset. |

  4. Replication: Redis uses master-slave replication. It allows slave Redis servers to be exact copies of their master servers. All write operations performed on the master are propagated to the connected slaves.
  5. Transactions: Redis groups commands together and executes them as a single transaction (using the MULTI and EXEC commands), providing a certain level of atomicity.
  6. Pub/Sub Capabilities: Redis Pub/Sub implements a messaging system where senders (publishers) send messages and receivers (subscribers) receive them. Messages are sent to channels named by the clients.
  7. Scripting: Redis supports Lua scripting, which allows complex procedures to be executed on the server itself, without network overhead.
  8. Scalability: Redis offers several options for scaling data:

| Scaling Option | Description |
| --- | --- |
| Vertical Scaling | Involves increasing the capacity of a single server, such as adding more memory or computational power. |
| Horizontal Scaling | Involves dividing the dataset among multiple servers. This can be done using Redis's built-in partitioning or through client-side sharding. |

By default, Redis behaves like a write-back cache, but other write mechanisms can be configured as well.

Redis vs Memcached

| Feature | Redis | Memcached |
| --- | --- | --- |
| Data Types | Supports more complex data types like hashes, lists, sets, sorted sets, bit arrays, HyperLogLogs, and geospatial indexes, in addition to simple key-value pairs. | Supports only simple string-based key-value pairs. |
| Persistence | Can be configured to write its data to disk, making it suitable for tasks that require durability and fault tolerance. | Does not support data persistence. When a Memcached server is restarted, all data is lost. |
| Replication | Supports master-slave replication. This allows it to be used in a distributed system and provides a level of fault tolerance. | Does not support replication. |
| Memory Usage | Tends to be more memory-efficient due to advanced eviction policies (like LRU, LFU, etc.) and memory management. | Might consume more memory for metadata, especially for smaller objects. |
| Atomic Operations | Provides more advanced operations like transactions and Lua scripting. | Provides basic atomic operations like increment/decrement. |
| Thread Support | Uses a single-threaded architecture, but starting from version 6.0 it offers multithreaded I/O. | Supports multithreading natively, potentially providing better performance on multi-core machines. |
| Use Case | More suitable when you need a feature-rich solution or have complex data types. Its persistence and replication make it suited for more than just caching. | A great fit when you need a simple, quick, and straightforward caching solution without the need for persistence or complex data structures. |

Proxy

A proxy is an intermediate component that acts on behalf of another party, typically the client or the server.

Forward Proxy vs Reverse Proxy


| Type | Description | Usage | Example |
| --- | --- | --- | --- |
| Forward Proxy | Acts on behalf of the client. Provides anonymity, content filtering, access control, and caching. | Used by clients to request resources from different servers while maintaining their anonymity. | Individuals accessing geo-blocked content; schools or offices limiting access to certain websites and caching frequently visited sites to save bandwidth. |
| Reverse Proxy | Acts on behalf of the server. Provides security, load balancing, SSL termination, and caching. | Used by servers to distribute incoming client requests to appropriate servers. | Large websites distributing traffic among several servers for load balancing; e-commerce sites using reverse proxies for SSL termination to offload SSL encryption/decryption. |

Load Balancer

| Concept | Definition | Example |
| --- | --- | --- |
| Load Balancer | A device that distributes network or application traffic across a number of servers to increase capacity and reliability. | An e-commerce website using a load balancer to handle high traffic during holiday seasons. |
| Service Routing | Method by which requests are directed to appropriate services based on set rules in a microservices architecture. | A router directing 'Payment' requests to the Payment Service and 'Inventory' requests to the Inventory Service in an e-commerce application. |
| Service Discovery | A component of distributed systems where services are automatically discovered as they come online or go offline. | Netflix's Eureka keeps track of all the microservices and their instances. |
| Service Directory | A database of available service instances, often used interchangeably with service registry. | A Service Directory maintaining a list of services like User Service, Payment Service, and so on, in a microservice architecture. |

Load balancing strategies

| Load Balancing Strategy | Short Description | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Round Robin | Distributes requests in a circular manner across all servers. | Simple and fair queue, works well with equal distribution. | Ignores server load and response times. Not ideal for servers with different processing capabilities. |
| Least Connections | Routes requests to the server with the fewest active connections. | Handles uneven traffic effectively. | Inefficient if server performance varies significantly. |
| Weighted Round Robin | Assigns a weight to each server and distributes requests accordingly. | Fair and can account for different server capacities. | Assigning appropriate weights can be challenging. |
| Weighted Least Connections | Combines the benefits of least connections and weighting. | Takes into account both the number of connections and servers' capabilities. | Complex to implement and maintain. |
| IP Hash | Assigns each user to a specific server based on their IP address. | Great for applications requiring session persistence. | Poor distribution if the client base is small or unevenly distributed. |
| Geographical | Directs requests based on the geographic location of the user or server. | Reduces latency and complies with data residency requirements. | Requires geolocation capabilities. May lead to uneven load distribution. |
| Content-Based | Routes requests based on the content of the request. | Allows for targeted distribution of traffic. | Complex to implement, requires understanding of request content. |
| Least Response Time | Sends requests to the server with the quickest response time. | Minimizes latency. | Requires constant monitoring, might not consider server load. |

Load balancers are generally grouped into two categories: Layer 4 and Layer 7. Layer 4 load balancers act upon data found in network and transport layer protocols (IP, TCP, FTP, UDP). Layer 7 load balancers distribute requests based upon data found in application layer protocols such as HTTP.
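As an illustration of the first two strategies in the table above, here is a minimal round-robin and weighted round-robin sketch; the server names and weights are hypothetical.

```python
import itertools

class RoundRobinBalancer:
    """Cycles through servers in order, one request per server."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def next_server(self):
        return next(self._cycle)

class WeightedRoundRobinBalancer:
    """Servers with a higher weight receive proportionally more requests."""

    def __init__(self, weighted_servers):
        # Expand each server according to its weight, e.g. {"big-box": 3, "small-box": 1}
        expanded = [s for server, weight in weighted_servers.items()
                    for s in [server] * weight]
        self._cycle = itertools.cycle(expanded)

    def next_server(self):
        return next(self._cycle)

# Example usage with hypothetical backend names
rr = RoundRobinBalancer(["app-1", "app-2", "app-3"])
wrr = WeightedRoundRobinBalancer({"big-box": 3, "small-box": 1})
print([rr.next_server() for _ in range(4)])   # app-1, app-2, app-3, app-1
print([wrr.next_server() for _ in range(4)])  # big-box appears 3x as often
```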

Content Delivery Network

A Content Delivery Network (CDN) is a geographically distributed network of servers that work together to provide fast delivery of internet content. It allows for the quick transfer of assets needed for loading internet content, including HTML pages, JavaScript files, stylesheets, images, and videos.

A CDN minimizes the distance between the visitor and the website's server, hence reducing the latency and increasing the site rendering speed. This is done through Points of Presence (PoP), which are physical data centers or cache servers located around the world that help deliver content more efficiently.

For example, if you're a YouTube user in India looking for trending videos, a CDN would make sure you get the content that is popular in India, as stored in the nearest PoP, rather than having to fetch that content from a server in the US, thus reducing latency.

Here's a table explaining key features of a CDN:

| Feature | Description |
| --- | --- |
| PoP (Points of Presence) | Physical data centers or cache servers located around the world that deliver content to users from the nearest location. |
| Content Caching and Replication | CDNs store cached versions of your website's content in multiple geographical locations (PoPs), reducing the load on the origin server and improving user experience by delivering content faster. |
| Load Balancing | CDNs distribute network traffic across multiple servers to avoid network congestion and improve user experience. |
| Failure and Redundancy | CDNs provide backup servers in case the primary ones fail. This redundancy improves reliability and uptime. |
| Security | CDNs protect websites against malicious attacks and data breaches. They can provide DDoS protection, security certificates, and other security features. |

Please note that the specific content cached in different geographical regions will depend on the content popularity and access patterns in those regions. This is managed by complex algorithms used by the CDN providers.

Resources:

System Design - Distributed Cache

https://www.youtube.com/watch?v=iuqZvajTOyA

Quiz

Question: Which communication pattern is best suited for message queues in a distributed system?
- Synchronous communication
- Asynchronous communication
- Direct peer-to-peer communication
- Centralized client-server communication
Answer: Asynchronous communication

Question: What is load balancing in a system design context?
- The process of distributing data across multiple caches for redundancy.
- The practice of distributing network traffic or requests across multiple servers.
- The process of securing data by encrypting it during transmission.
- The practice of storing and managing data on multiple servers.
Answer: The practice of distributing network traffic or requests across multiple servers.

Question: Which load balancing algorithm allows clients to select a server based on a key or hash of the client's IP address?
- Round-robin
- Weighted Round-robin
- Least Connections
- IP Hash
Answer: IP Hash

Question: What is a "cache hit" in the context of caching?
- When new data is stored in the cache for the first time.
- When the cache is full and cannot store any more data.
- When requested data is found in the cache, avoiding the need to fetch it from the original source.
- When the cache is cleared to free up memory for new data.
Answer: When requested data is found in the cache, avoiding the need to fetch it from the original source.

Question: What is "write-through caching"?
- Data is written to the cache only and not to the original data source.
- Data is written to both the cache and the original data source simultaneously.
- Data is written to the original data source only and not to the cache.
- Data is written to the cache first and then periodically synchronized with the original data source.
Answer: Data is written to both the cache and the original data source simultaneously.

Question: What is cache invalidation?
- The process of encrypting cache data for security purposes.
- The process of removing cached data that has expired or become invalid.
- The process of moving cache data to a different location for redundancy.
- The process of compressing cache data to save storage space.
Answer: The process of removing cached data that has expired or become invalid.

Question: How does a forward proxy work?
- It intercepts and forwards all incoming requests from clients to the appropriate servers.
- It intercepts and forwards all outgoing requests from servers to the appropriate clients.
- It acts as an intermediary between clients and servers, forwarding client requests to external servers on their behalf.
- It acts as an intermediary between clients and servers, forwarding server responses to clients on their behalf.
Answer: It acts as an intermediary between clients and servers, forwarding client requests to external servers on their behalf.

Question: How does a CDN improve the performance of a website or application?
- By compressing all outgoing data for faster transmission.
- By encrypting all incoming data to ensure security.
- By storing and serving content from geographically distributed servers.
- By providing direct access to the origin server for faster response times.
Answer: By storing and serving content from geographically distributed servers.

Question: Which load balancing method allows the load balancer to consider the server's processing capacity when distributing traffic?
- Round-robin
- Least Connections
- IP Hash
- Weighted Round-robin
Answer: Weighted Round-robin

Question: What is the main objective of cache eviction strategies?
- To ensure that all data in the cache is regularly refreshed.
- To reduce the overall size of the cache to conserve memory.
- To minimize cache misses and maximize cache hit rates.
- To encrypt all cached data for security purposes.
Answer: To minimize cache misses and maximize cache hit rates.