Sizing your Connection Pool

Introduction

The ideas presented in this post started to take shape while I was reading an email (sent to the play-framework mailing list) from the author of BoneCP, deprecating it in favour of HikariCP. From there, I read a really good article explaining a heuristic for sizing connection pools, and another one with a thorough critique of different pools.

The formula below is provided by the PostgreSQL project as a heuristic; it gives a starting point around which to try different pool settings:

connections = ((core_count * 2) + effective_spindle_count)
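
For example, on a hypothetical server with 4 cores and a single spinning disk, this gives (4 × 2) + 1 = 9 connections as the starting point to tune around.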

This made me think about the difference that connection pool choices can make, so I decided to run some deliberate experiments on how different implementations and settings affect the latency of a typical data access layer.

The Problem

Imagine you are developing a great web platform, with a classic architecture: a front-facing web interface that calls REST services on the backend, which in turn hit the database and reply to the user with the requested content.

Once your platform becomes successful, the application server gets overloaded, and so does the database.

At this point, if you didn’t plan accordingly, the answer to the question:

“Will my users stay, or will they leave because they experience the website as sloooow?”

won’t be a pleasant one.

Actually, you should have planned for it, and this is the goal of this article:
a simple way to set things up and start doing capacity planning, focusing on one of the lowest layers of the architecture: the database connection pool. Most web platforms are deployed on clusters, but I’ll start simple, try to understand this small piece of the puzzle, and build things up from here.

The Connection Pool

Once the website is hit by 10k users, you could easily see around 20k transactions per second on the database.

  • How do you manage database connections?
  • Do you open one connection for each client request?

Of course not: opening and closing database connections is expensive, which is why several implementations exist that provide exactly this, a cache of existing connections that can be reused so that clients are served faster.

For these tests, I considered the following connection pools:

There are also several others, for example the one provided in Tomcat.
Anyway, I don’t want to advise one pool over another: you should always define your use case, measure performance, tune, and then pick the one that best suits you. By the way, the query used in the tests is just a simple SELECT * FROM table WHERE id = 1;
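
To make this concrete, here is a minimal sketch of what configuring such a pool looks like with HikariCP; the class name, JDBC URL and credentials below are placeholders for illustration, not the settings used in the actual benchmark:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolSetup {

    // Minimal HikariCP configuration; URL and credentials are placeholders.
    public static HikariDataSource newDataSource(int maximumPoolSize) {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://localhost:5432/testdb");
        config.setUsername("test");
        config.setPassword("test");
        // The knob explored in the benchmarks below.
        config.setMaximumPoolSize(maximumPoolSize);
        // getConnection() on the returned DataSource borrows a pooled connection;
        // closing that connection returns it to the pool instead of closing the socket.
        return new HikariDataSource(config);
    }
}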

Show Me The Numbers!

I started to develop a small infrastructure to test the latency, but then I did some research and found out that JMH is a much more advanced tool that suits the purpose very well. I started from scratch using the Maven archetype, and then added a simple data access layer implemented with MyBatis.
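
The data access layer itself is not the focus here, but roughly, a MyBatis mapper for that query might look like the sketch below; the interface name is made up for illustration, the real code is in the repository:

import java.util.Map;

import org.apache.ibatis.annotations.Select;

// Hypothetical mapper interface; the actual data access layer lives in the repository.
public interface SimpleMapper {

    // MyBatis binds #{id} to the method argument and returns the row as a column/value map.
    @Select("SELECT * FROM table WHERE id = #{id}")
    Map<String, Object> getById(int id);
}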

If you are really serious about measuring latency, you should watch Gil Tene’s excellent talk on How NOT to Measure Latency, where he discusses coordinated omission and its impact on latency measurement. The numbers shown in this experiment mostly suffer from this problem, and maybe one day I’ll experiment with his HdrHistogram.
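
For what it’s worth, recording latencies with HdrHistogram takes only a few lines; a minimal sketch (not part of these benchmarks, values are illustrative) could look like this:

import java.util.concurrent.TimeUnit;

import org.HdrHistogram.Histogram;

public class LatencyHistogramSketch {

    public static void main(String[] args) {
        // Track latencies from 1 microsecond up to 1 minute, with 3 significant digits.
        Histogram histogram = new Histogram(TimeUnit.MINUTES.toMicros(1), 3);

        // In a real test, every measured request latency would be recorded here.
        histogram.recordValue(1508); // e.g. 1.508 ms expressed in microseconds

        // Print the percentile distribution, scaling microseconds down to milliseconds.
        histogram.outputPercentileDistribution(System.out, 1000.0);
    }
}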

I decided to focus the tests on one specific aspect of the pools: the latency of a single request.
You can find the code to run these tests on GitHub, where pull requests are welcome to fix or improve things.
The test configures 2 threads to hit the db, with 2 scenarios: fixed or increasing pool size.
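
To give an idea of the shape of the test, here is a trimmed-down sketch of how such a JMH benchmark can be declared. The class name is hypothetical (the real one is fm.mox.mybatis.jmh.HikariCPBenchmark, which wires in the MyBatis layer), it reuses the PoolSetup helper sketched above, and the fixed-versus-increasing distinction presumably maps to the pool's minimum/initial size settings; the repository has the exact configuration:

import com.zaxxer.hikari.HikariDataSource;
import org.openjdk.jmh.annotations.*;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.SampleTime)          // sample per-operation latency
@OutputTimeUnit(TimeUnit.MILLISECONDS)   // report ms/op, as in the output below
@Threads(2)                              // two threads hitting the database
@State(Scope.Benchmark)
public class PoolSizeBenchmark {

    // One JMH run per pool size value.
    @Param({"1", "5", "100"})
    public int maximumPoolSize;

    public HikariDataSource dataSource;

    @Setup
    public void setup() {
        dataSource = PoolSetup.newDataSource(maximumPoolSize);
    }

    @TearDown
    public void tearDown() {
        dataSource.close();
    }

    @Benchmark
    public int testGetById() throws Exception {
        // Borrow a pooled connection and run the test query ("table" is the placeholder name used above).
        try (Connection connection = dataSource.getConnection();
             PreparedStatement statement =
                     connection.prepareStatement("SELECT * FROM table WHERE id = ?")) {
            statement.setInt(1, 1);
            try (ResultSet resultSet = statement.executeQuery()) {
                resultSet.next();
                return resultSet.getInt(1);
            }
        }
    }
}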

The Results

I report here just the results for HikariCP; if you are curious about other pool implementations, just check out the code and run it yourself.

Fixed pool size

# Benchmark: fm.mox.mybatis.jmh.HikariCPBenchmark.testGetById
# Parameters: (maximumPoolSize = 1)
Result: 1.508 ±(99.9%) 0.119 ms/op [Average]
Statistics: (min, avg, max) = (0.573, 1.508, 22.413), stdev = 1.231
Confidence interval (99.9%): [1.389, 1.627]
Samples, N = 1164
mean = 1.508 ±(99.9%) 0.119 ms/op
min = 0.573 ms/op
p( 0.0000) = 0.573 ms/op
p(50.0000) = 1.237 ms/op
p(90.0000) = 2.060 ms/op
p(95.0000) = 2.758 ms/op
p(99.0000) = 6.838 ms/op
p(99.9000) = 21.900 ms/op
p(99.9900) = 22.413 ms/op
p(99.9990) = 22.413 ms/op
p(99.9999) = 22.413 ms/op
max = 22.413 ms/op

# Benchmark: fm.mox.mybatis.jmh.HikariCPBenchmark.testGetById
# Parameters: (maximumPoolSize = 5)
Result: 0.919 ±(99.9%) 0.087 ms/op [Average]
Statistics: (min, avg, max) = (0.489, 0.919, 21.692), stdev = 1.142
Confidence interval (99.9%): [0.831, 1.006]
Samples, N = 1861
mean = 0.919 ±(99.9%) 0.087 ms/op
min = 0.489 ms/op
p( 0.0000) = 0.489 ms/op
p(50.0000) = 0.669 ms/op
p(90.0000) = 1.198 ms/op
p(95.0000) = 1.836 ms/op
p(99.0000) = 5.637 ms/op
p(99.9000) = 20.902 ms/op
p(99.9900) = 21.692 ms/op
p(99.9990) = 21.692 ms/op
p(99.9999) = 21.692 ms/op
max = 21.692 ms/op

# Benchmark: fm.mox.mybatis.jmh.HikariCPBenchmark.testGetById
# Parameters: (maximumPoolSize = 100)
Result: 1.290 ±(99.9%) 0.135 ms/op [Average]
Statistics: (min, avg, max) = (0.553, 1.290, 33.456), stdev = 1.535
Confidence interval (99.9%): [1.155, 1.425]
Samples, N = 1412
mean = 1.290 ±(99.9%) 0.135 ms/op
min = 0.553 ms/op
p( 0.0000) = 0.553 ms/op
p(50.0000) = 0.880 ms/op
p(90.0000) = 2.324 ms/op
p(95.0000) = 3.387 ms/op
p(99.0000) = 5.200 ms/op
p(99.9000) = 33.388 ms/op
p(99.9900) = 33.456 ms/op
p(99.9990) = 33.456 ms/op
p(99.9999) = 33.456 ms/op
max = 33.456 ms/op

Increasing pool size


# Benchmark: fm.mox.mybatis.jmh.HikariCPBenchmark.testGetById
# Parameters: (maximumPoolSize = 1)
Result: 1.129 ±(99.9%) 0.053 ms/op [Average]
Statistics: (min, avg, max) = (0.507, 1.129, 15.450), stdev = 0.642
Confidence interval (99.9%): [1.076, 1.182]
Samples, N = 1592
mean = 1.129 ±(99.9%) 0.053 ms/op
min = 0.507 ms/op
p( 0.0000) = 0.507 ms/op
p(50.0000) = 1.022 ms/op
p(90.0000) = 1.290 ms/op
p(95.0000) = 1.486 ms/op
p(99.0000) = 3.264 ms/op
p(99.9000) = 12.905 ms/op
p(99.9900) = 15.450 ms/op
p(99.9990) = 15.450 ms/op
p(99.9999) = 15.450 ms/op
max = 15.450 ms/op

# Benchmark: fm.mox.mybatis.jmh.HikariCPBenchmark.testGetById
# Parameters: (maximumPoolSize = 5)
Result: 0.884 ±(99.9%) 0.061 ms/op [Average]
Statistics: (min, avg, max) = (0.482, 0.884, 12.026), stdev = 0.833
Confidence interval (99.9%): [0.823, 0.946]
Samples, N = 2003
mean = 0.884 ±(99.9%) 0.061 ms/op
min = 0.482 ms/op
p( 0.0000) = 0.482 ms/op
p(50.0000) = 0.678 ms/op
p(90.0000) = 1.174 ms/op
p(95.0000) = 1.910 ms/op
p(99.0000) = 5.175 ms/op
p(99.9000) = 11.836 ms/op
p(99.9900) = 12.026 ms/op
p(99.9990) = 12.026 ms/op
p(99.9999) = 12.026 ms/op
max = 12.026 ms/op

# Benchmark: fm.mox.mybatis.jmh.HikariCPBenchmark.testGetById
# Parameters: (maximumPoolSize = 100)
Result: 0.593 ±(99.9%) 0.023 ms/op [Average]
Statistics: (min, avg, max) = (0.384, 0.593, 8.765), stdev = 0.386
Confidence interval (99.9%): [0.571, 0.616]
Samples, N = 3089
mean = 0.593 ±(99.9%) 0.023 ms/op
min = 0.384 ms/op
p( 0.0000) = 0.384 ms/op
p(50.0000) = 0.542 ms/op
p(90.0000) = 0.696 ms/op
p(95.0000) = 0.797 ms/op
p(99.0000) = 1.834 ms/op
p(99.9000) = 7.297 ms/op
p(99.9900) = 8.765 ms/op
p(99.9990) = 8.765 ms/op
p(99.9999) = 8.765 ms/op
max = 8.765 ms/op

Comments

The tests were run on my MacBook Pro i7 with 2 cores; by tweaking the JMH configuration it is easy to change some parameters and run them on a 16-core box.
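
For instance, raising the number of benchmark threads only takes a different Runner configuration; the launcher class below is illustrative:

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class BenchmarkLauncher {

    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .include("HikariCPBenchmark")  // regex matching the benchmark class to run
                .threads(16)                   // e.g. to load a 16-core box
                .forks(1)
                .build();
        new Runner(options).run();
    }
}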

From these numbers we can see (as also discussed in this article) that the pool size affects latency, and that the impact is more significant on the fixed-size pool than on the growing pool: with only 2 threads issuing requests, the growing pool rarely needs to expand much beyond its initial size.

Conclusions

This post was meant to show how to lay out a methodology for performance tuning rather than to declare which connection pool implementation is best. I wrote it to satisfy my curiosity and learn a thing or two about connection pool sizing and performance measurement.

If you are serious about performance, always measure your assumptions and be aware of pitfalls such as coordinated omission that you may be incurring. I hope it satisfied your curiosity as well and provided some food for thought.
