Memory Cache versus Query with Index in Spring Boot
When working with data, there are two primary approaches: caching and querying. The choice between them depends on the specific requirements of the data access scenario. This article explores the scenario of finding a user by name, where the data rarely changes and the method is executed frequently.
Cache
Caching is a data storage technique that utilizes an intermediate layer, typically with faster access speeds, to minimize latency and enhance application performance.
Cacheable annotation
This annotation instructs Spring Boot to store the result of a method's first successful execution in a cache. For subsequent calls with the same input parameters, the cached data is returned instead of executing the method again.
@Cacheable("users")
public List<User> getAllFromCache() {
    return userRepository.findAll();
}

public User getByNameFromCache(String name) {
    return userServiceCache
            .getAllFromCache()
            .stream()
            .filter(user -> user.getName().equals(name))
            .findFirst()
            .orElseThrow(() -> new RuntimeException("USER_NOT_FOUND"));
}
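Note that getByNameFromCache still scans the entire cached list on every call, so lookup cost grows linearly with the number of users. As a hedged sketch (the UserNameIndex class and the simplified User record below are illustrative, not part of the article's code), the cached data could instead be keyed by name for constant-time lookup:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class UserNameIndex {
    // Minimal stand-in for the article's User entity.
    record User(long id, String name) {}

    private final Map<String, User> byName;

    UserNameIndex(List<User> users) {
        // Build the name -> user map once, when the cache is loaded.
        // Assumes names are unique; toMap throws on duplicate keys.
        this.byName = users.stream()
                .collect(Collectors.toMap(User::name, Function.identity()));
    }

    User getByName(String name) {
        User user = byName.get(name); // O(1) instead of an O(n) scan
        if (user == null) {
            throw new RuntimeException("USER_NOT_FOUND");
        }
        return user;
    }

    public static void main(String[] args) {
        UserNameIndex index = new UserNameIndex(List.of(
                new User(1, "alice"),
                new User(2, "bob")));
        System.out.println(index.getByName("bob").id()); // prints 2
    }
}
```

The trade-off is a one-time map-building cost and extra heap usage for the map itself, in exchange for lookups that no longer depend on the table size.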
Spring Boot manages cache providers through the CacheManager interface. For a straightforward implementation, this article uses the ConcurrentMapCache provider.
@Bean
public CacheManager cacheManager() {
    return new ConcurrentMapCacheManager("users");
}
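For the @Cacheable annotation to take effect, annotation-driven caching must also be enabled on a configuration class. A minimal sketch showing where the bean above could live (the class name CacheConfig is illustrative):

```java
import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.concurrent.ConcurrentMapCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableCaching // activates Spring's annotation-driven cache support
public class CacheConfig {

    @Bean
    public CacheManager cacheManager() {
        return new ConcurrentMapCacheManager("users");
    }
}
```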
ConcurrentMapCache stores its entries in a ConcurrentHashMap, which lives on the heap and is therefore volatile. This means that upon a code deployment or server restart, the cached data is lost, leading to increased latency for subsequent user retrievals. Fortunately, this issue can be effectively addressed by reloading the cache at Spring Boot application startup using the EventListener annotation.
EventListener annotation
@EventListener
public void onApplicationReady(ApplicationReadyEvent event) {
    userServiceCache.getAllFromCache();
}
Garbage Collector
Storing large tables in HashMap structures on the heap can push heap usage toward its limit, triggering the Garbage Collector to run frequently even when few objects are actually collectible. This caching approach can therefore lead to more frequent Garbage Collector runs, which is not suitable for real-time applications.
Furthermore, caching data on the heap for extended periods can be considered a memory leak: some objects may never be accessed again but continue to exist because the HashMap still holds references to them.
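These heap-pressure concerns can be mitigated by bounding the cache in size and lifetime. As a hedged sketch, assuming the Caffeine library is on the classpath (it is not used elsewhere in this article), a bounded CacheManager could replace the ConcurrentMapCacheManager shown earlier:

```java
import java.time.Duration;

import com.github.benmanes.caffeine.cache.Caffeine;
import org.springframework.cache.CacheManager;
import org.springframework.cache.caffeine.CaffeineCacheManager;
import org.springframework.context.annotation.Bean;

// Sketch: a bounded cache evicts old entries instead of holding
// them on the heap indefinitely, limiting GC pressure.
@Bean
public CacheManager cacheManager() {
    CaffeineCacheManager cacheManager = new CaffeineCacheManager("users");
    cacheManager.setCaffeine(Caffeine.newBuilder()
            .maximumSize(10_000)                        // cap the entry count
            .expireAfterWrite(Duration.ofMinutes(10))); // drop stale entries
    return cacheManager;
}
```

The size cap and expiry values here are arbitrary examples; they would need tuning against real access patterns.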
Query with Index
For the scenario described in this post, creating a B-Tree index on the name column of the User table is a minimal approach.
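The index can be declared directly on the entity so that Hibernate's schema generation creates it. A sketch, assuming the entity maps to a users table with a name column (the table, column, and index names are assumptions, not taken from the article):

```java
import jakarta.persistence.Column;
import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.Id;
import jakarta.persistence.Index;
import jakarta.persistence.Table;

@Entity
// Ask schema generation to create an index on name; Postgres
// creates B-Tree indexes by default.
@Table(name = "users", indexes = @Index(name = "idx_users_name", columnList = "name"))
public class User {

    @Id
    @GeneratedValue
    private Long id;

    @Column(name = "name")
    private String name;

    // getters and setters omitted for brevity
}
```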
@Repository
public interface UserRepository extends JpaRepository<User, Long> {
    Optional<User> findByName(String name);
}

public User getByNameFromIndex(String name) {
    return userRepository
            .findByName(name)
            .orElseThrow(() -> new RuntimeException("USER_NOT_FOUND"));
}
In contrast to caching, which stores a complete copy of the data in memory, querying with an index requires only a small amount of additional storage for the index structure itself.
Records | Index size |
10000 | 752 kB |
20000 | 1416 kB |
Network
The primary drawback of this approach is latency, which arises from the network communication between the database and the application. This latency is caused by various factors, including the physical distance between the servers, the overhead of protocol headers, and security mechanisms such as TLS handshakes.
Benchmark
CPU: Intel Core i5-1135G7
OS: Ubuntu 22.04
Network: localhost
Tool: bombardier
Number of connections: 100
Number of requests: 100000
Number of records: 10000
Database: PostgreSQL
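The setup above maps roughly onto the following bombardier invocation (the endpoint URL is an assumption for illustration; the article does not specify it):

```shell
# -c: number of concurrent connections, -n: total number of requests
bombardier -c 100 -n 100000 "http://localhost:8080/users?name=alice"
```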
Found record
The first benchmark scenario is to find an existing user with the input name.
Cache
Case | Average Latency | Throughput |
Case 1 | 6.75 ms | 3.36 MB/s |
Case 2 | 5.51 ms | 4.46 MB/s |
Case 3 | 7.34 ms | 3.27 MB/s |
Query
Case | Average Latency | Throughput |
Case 1 | 9.84 ms | 2.31 MB/s |
Case 2 | 9.16 ms | 2.69 MB/s |
Case 3 | 8.91 ms | 2.70 MB/s |
Not found record
The second benchmark scenario is searching for a user whose name does not exist.
Cache
Case | Average Latency | Throughput |
Case 1 | 185.38 ms | 189.04 KB/s |
Case 2 | 185.76 ms | 195.98 KB/s |
Case 3 | 185.52 ms | 215.15 KB/s |
Query
Case | Average Latency | Throughput |
Case 1 | 183.22 ms | 191.24 KB/s |
Case 2 | 190.16 ms | 191.45 KB/s |
Case 3 | 190.58 ms | 209.46 KB/s |