Memory Cache versus Query with Index in Spring Boot

When working with data, there are two primary approaches: caching and querying. The choice between them depends on the requirements of the data access scenario. This article explores finding a user by name, in a situation where the data rarely changes and the method is executed frequently.

Cache

Caching is a data storage technique that utilizes an intermediate layer, typically with faster access speeds, to minimize latency and enhance application performance.

Cacheable annotation

This annotation instructs Spring Boot to store the result of a method's first successful execution in a Cache. For subsequent calls with the same input parameters, the Cached data is retrieved and returned instead of executing the method again.

@Cacheable("users")
public List<User> getAllFromCache() {
    return userRepository.findAll();
}

public User getByNameFromCache(String name) {
    // Call through the proxied bean so the @Cacheable interception applies
    return userServiceCache
            .getAllFromCache()
            .stream()
            .filter(user -> user.getName().equals(name))
            .findFirst()
            .orElseThrow(() -> new RuntimeException("USER_NOT_FOUND"));
}
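Under the hood, @Cacheable follows a cache-aside pattern: check the cache, and on a miss, execute the method and store its result. A minimal plain-Java sketch of that behavior (the repository call is simulated here; names are illustrative, not Spring's internals) could look like:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheAsideSketch {
    // Simulates the "users" cache that Spring would manage.
    static final Map<String, List<String>> cache = new ConcurrentHashMap<>();
    // Counts invocations to make the caching effect observable.
    static int repositoryCalls = 0;

    // Stand-in for userRepository.findAll().
    static List<String> findAllFromDatabase() {
        repositoryCalls++;
        return List.of("alice", "bob");
    }

    // Equivalent of @Cacheable("users"): compute once, then serve from the map.
    static List<String> getAllFromCache() {
        return cache.computeIfAbsent("users", key -> findAllFromDatabase());
    }

    public static void main(String[] args) {
        getAllFromCache();
        getAllFromCache();
        // The repository was hit only once; the second call was a cache hit.
        System.out.println(repositoryCalls); // prints 1
    }
}
```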

Spring Boot manages cache providers through the CacheManager interface. For a straightforward implementation, this article uses the ConcurrentMapCache provider.

@Bean
public CacheManager cacheManager() {
    return new ConcurrentMapCacheManager("users");
}
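Note that Spring's caching abstraction only takes effect once caching is enabled on a configuration class. A minimal configuration sketch (the class name is illustrative) might be:

```java
import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.concurrent.ConcurrentMapCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableCaching // without this, @Cacheable annotations are ignored
public class CacheConfig {

    @Bean
    public CacheManager cacheManager() {
        // A simple in-heap cache named "users"
        return new ConcurrentMapCacheManager("users");
    }
}
```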

ConcurrentMapCache is backed by a ConcurrentHashMap stored in the Heap, which is inherently volatile. This means that upon code deployment or server restart, the Cached data is lost, leading to increased latency for subsequent user retrievals until the Cache is repopulated. Fortunately, this issue can be effectively addressed by reloading the Cache upon Spring Boot application startup using the EventListener annotation.

EventListener annotation

@EventListener
public void onApplicationReady(ApplicationReadyEvent event) {
    // Warm the cache so the first user request does not pay the loading cost
    userServiceCache.getAllFromCache();
}

Garbage Collector

Storing large Tables in HashMap structures on the Heap pushes Heap usage toward its limit, increasing Garbage Collector pressure: collections run more frequently and must scan a large set of long-lived Objects even when little memory can be reclaimed. This makes the Caching approach poorly suited for latency-sensitive, real-time applications.

Furthermore, Caching data in Heap memory for extended periods can behave like a Memory Leak: some Objects may never be accessed again but continue to exist because the HashMap still holds references to them.
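One common mitigation is to bound the cache so old entries are evicted instead of living on the Heap forever. A minimal sketch using the JDK's LinkedHashMap in access order as an LRU cache (the size limit here is arbitrary; production code would more likely use a provider with size- and time-based eviction, such as Caffeine) could be:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCacheSketch {
    // Returns an LRU map capped at maxEntries; the least-recently-accessed
    // entry is evicted when the cap is exceeded.
    static <K, V> Map<K, V> lruCache(int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, String> cache = lruCache(2);
        cache.put("alice", "u1");
        cache.put("bob", "u2");
        cache.get("alice");       // touch "alice" so "bob" becomes the eldest
        cache.put("carol", "u3"); // exceeds the cap, evicting "bob"
        System.out.println(cache.keySet()); // prints [alice, carol]
    }
}
```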

Query with Index

For the scenario described in this post, creating a B-Tree Index on the name column of the User Table is a minimal approach.
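With JPA, the Index can be declared directly on the entity so it is created together with the schema (assuming schema generation is enabled; otherwise a plain CREATE INDEX statement achieves the same, and Postgres builds a B-Tree index by default). A sketch, with entity fields assumed, might look like:

```java
import jakarta.persistence.Column;
import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.Id;
import jakarta.persistence.Index;
import jakarta.persistence.Table;

@Entity
@Table(name = "users",
       indexes = @Index(name = "idx_users_name", columnList = "name"))
public class User {

    @Id
    @GeneratedValue
    private Long id;

    @Column(nullable = false)
    private String name;

    public String getName() {
        return name;
    }
}
```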

@Repository
public interface UserRepository extends JpaRepository<User, Long> {
    Optional<User> findByName(String name);
}

public User getByNameFromIndex(String name) {
    return userRepository
            .findByName(name)
            .orElseThrow(() -> new RuntimeException("USER_NOT_FOUND"));
}

In contrast to Caching, which involves storing a complete copy of the data in Memory, querying with an Index requires only a small amount of additional storage space for the Index structure itself.

Records    Index size
10000      752 kB
20000      1416 kB

Network

The primary drawback of this approach is latency, which arises from the network communication between the Database and the Application. This latency is caused by various factors, including the physical distance between the servers, the overhead of protocol headers, and security mechanisms.

Benchmark

  • CPU: Intel i5-1135G7

  • OS: Ubuntu 22.04

  • Network: Localhost

  • Tool: bombardier

  • Number of connections: 100

  • Number of requests: 100000

  • Number of records: 10000

  • Database: Postgres

Found record

The first benchmark scenario is to find an existing user with the input name.

Cache

          Average Latency    Throughput
Case 1    6.75 ms            3.36 MB/s
Case 2    5.51 ms            4.46 MB/s
Case 3    7.34 ms            3.27 MB/s

Query

          Average Latency    Throughput
Case 1    9.84 ms            2.31 MB/s
Case 2    9.16 ms            2.69 MB/s
Case 3    8.91 ms            2.70 MB/s

Not found record

The second benchmark scenario is when a user with the input name does not exist.

Cache

          Average Latency    Throughput
Case 1    185.38 ms          189.04 KB/s
Case 2    185.76 ms          195.98 KB/s
Case 3    185.52 ms          215.15 KB/s

Query

          Average Latency    Throughput
Case 1    183.22 ms          191.24 KB/s
Case 2    190.16 ms          191.45 KB/s
Case 3    190.58 ms          209.46 KB/s