BIND 9.20 Brings Streamlined Core, Some New Features

Today, ISC is proud to release BIND 9.20.0, our newest stable branch.

This new release of BIND is available on our downloads page.

This branch will be supported for four years, through the first quarter of 2028, per our current software release plan. After several months of experience in production, we plan to declare this version an ESV (Extended Support Version), likely around the end of 2024 or in early 2025.

Application Infrastructure Improvements in BIND 9.20.0

  • The application core (the infrastructure that holds everything together) has been rewritten to use libuv asynchronous event loops exclusively. In BIND 9.16, we introduced a new networking manager using libuv as an asynchronous event handler on top of the existing application infrastructure. In BIND 9.20, the transition to libuv asynchronous loops is complete and BIND 9 is powered by libuv from the ground up. This simplifies and streamlines the internal infrastructure and allows us to keep the data processing pinned to threads and reduce context switching, which improves overall resource consumption.
  • At the same time, we are using the specialised threadpools provided by libuv to offload long-duration tasks and, instead of quantising the work on our own, we rely on the operating system scheduler to provide fair scheduling between the networking and offloaded threads. This simplifies the code that powers Response Policy Zones, Catalog Zones, Zone Transfers, DNSSEC Validation, and a couple of other long-running tasks, and it improves latency when long-running tasks are mixed with normal DNS queries (see the sketch after this list).
  • A new database backend, called QP trie, has been added to BIND 9 and made the default cache and zone database, replacing the venerable RBTDB (Red-Black Tree Database). The QP trie database uses the Userspace RCU (Read-Copy-Update) library, liburcu, which is now mandatory to compile and run BIND 9. Using Userspace RCU will allow us to remove POSIX locking as a synchronisation mechanism and replace it with Quiescent-State-Based Reclamation (QSBR) for memory reclamation. Much work remains to be done, but in the future you should expect BIND 9 to be more scalable on systems with many CPUs.
  • The DNS name compression algorithm used in BIND 9 has been revised: it now compresses more thoroughly, so responses containing names with many labels may be encoded more compactly than before.
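
The pattern described in the list above, a networking event loop that stays responsive while long-duration work is handed to libuv's worker threads, can be illustrated with a minimal libuv sketch. This is not BIND 9 source code: the offloaded function below is a hypothetical stand-in for work such as a zone transfer or DNSSEC validation.

    /* Minimal sketch of the event-loop + offload pattern described above.
     * Illustrative only, not BIND 9 source code. Build with: cc sketch.c -luv */
    #include <stdio.h>
    #include <uv.h>

    static void heavy_work(uv_work_t *req) {
        /* Runs on a libuv threadpool thread; the event loop is not blocked. */
        puts("offloaded task running on a worker thread");
    }

    static void after_work(uv_work_t *req, int status) {
        /* Runs back on the event-loop thread once the worker has finished. */
        puts("offloaded task finished; result handled on the loop thread");
    }

    int main(void) {
        uv_loop_t *loop = uv_default_loop();
        uv_work_t req;

        /* Queue long-running work to libuv's threadpool instead of
         * blocking the networking loop. */
        uv_queue_work(loop, &req, heavy_work, after_work);

        /* The loop keeps servicing events while the worker runs. */
        return uv_run(loop, UV_RUN_DEFAULT);
    }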

Improvements in DNSSEC Support

  • DNSSEC Policy is now the only option for managing signed zones. The auto-dnssec option has been removed.
  • Support was added for the DNSSEC multi-signer model 2 (IETF RFC 8901) when using inline-signing.
  • PKCS#11 Support has been restored by utilising the new OpenSSL 3.0.0 Engine API.
  • HSM support was added to dnssec-policy. Keys can now be configured with a key-store that allows users to set the directory where key files are stored and to set a PKCS#11 URI string. The latter requires OpenSSL 3 and a valid PKCS#11 provider to be configured for OpenSSL (see the sketch below).
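
The two bullet points above translate into configuration roughly as follows. This is a minimal sketch only: the key-store name, directory, zone name, and PKCS#11 URI are placeholders, and the authoritative syntax is described in the BIND 9.20 ARM.

    // Hypothetical key store backed by an HSM via a PKCS#11 URI.
    key-store "hsm" {
        directory "/var/lib/bind/keys";
        pkcs11-uri "pkcs11:token=example-token;object=example-label";
    };

    // Policy whose keys are generated in and used from that key store.
    dnssec-policy "hsm-signed" {
        keys {
            ksk key-store "hsm" lifetime unlimited algorithm ecdsap256sha256;
            zsk key-store "hsm" lifetime P90D algorithm ecdsap256sha256;
        };
    };

    zone "example.com" {
        type primary;
        file "example.com.db";
        dnssec-policy "hsm-signed";
    };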

Feature Updates

  • Catalog Zones schema version 2 (as described in the “DNS Catalog Zones” IETF draft version 5 document) is now supported by BIND 9.
  • More Extended DNS Errors are now supported.
  • The DNS over TCP and DNS over TLS implementations have been refactored to use a unified transport. This in turn allowed us to add the new PROXYv2 transport.
  • PROXYv2 support is available for all DNS transports currently supported by BIND.
  • Support for User Statically Defined Tracing (USDT) probes has been added. These probes enable fine-grained application tracing using the perf command and introduce no overhead when they are not enabled. We plan to add more user probes in future releases (a brief sketch of how a USDT probe is defined follows this list).
  • The statistics channel now includes information about incoming zone transfers in progress.
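
As background on the USDT mechanism mentioned above (a hypothetical sketch, not BIND's actual probe names or arguments): a probe is declared in the source with the SystemTap SDT macros and compiles down to a single nop instruction until a tracing tool attaches to it.

    /* Hypothetical sketch of a USDT probe; not BIND 9 source code.
     * Requires the SystemTap SDT header (e.g. the systemtap-sdt-dev package). */
    #include <sys/sdt.h>

    static void handle_query(unsigned int qid) {
        /* Provider "demo", probe "query_received"; the probe site is a
         * single nop instruction until a tracing tool attaches to it. */
        DTRACE_PROBE1(demo, query_received, qid);
    }

    int main(void) {
        handle_query(42);
        return 0;
    }

Tools such as perf can then discover and attach to these probe points in the installed binary, without recompiling or restarting it.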

Resolver Performance Testing Results

For a description of the resolver performance test bed used to create the charts below, please see this earlier blog. The most impressive change is significantly reduced memory usage combined with better latency. The improvements we see in BIND 9.20 compared to BIND 9.18 are a continuation of the improvements we saw from BIND 9.16 to BIND 9.18: BIND 9.20 performs better than BIND 9.18, and far better than BIND 9.16.

The charts in this section show aggregate results from three repetitions of the test on each version. The solid line shows the average of the three runs, while the colored background shows the range between the minimum and maximum for each version: the wider the colored background, the more variable that metric is across runs.

Response latency - How quickly does the resolver respond?

The most useful, but also the most complex, metric is response latency, which directly affects user experience. Unfortunately, DNS latency is wildly non-linear: most answers arrive within a sub-millisecond range for cache hits; latency increases to a range of tens to hundreds of milliseconds for normal cache misses; and it reaches its maximum, in the range of seconds, for cache misses which force communication with very slow or broken authoritative servers.

This inherent nonlinearity also implies that the simplest tools from descriptive statistics do not provide informative results.

To deal with this complexity, the fine people from PowerDNS developed a logarithmic percentile histogram which visualizes response latency. It allows us to see things such as:

  • 95 % of queries were answered within 1 ms (cache hits)
  • 99 % of queries were answered within 100 ms (typical cache misses)
  • 99.5 % of queries were answered within 1000 ms (problematic cache misses)
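
Such thresholds can be derived from raw measurements by sorting the observed latencies and reading off values at the desired quantiles. The following is a simplified sketch only, not the actual test harness, and the sample values are made up.

    /* Simplified sketch: computing latency percentiles from raw samples. */
    #include <stdio.h>
    #include <stdlib.h>

    static int cmp_double(const void *a, const void *b) {
        double x = *(const double *)a, y = *(const double *)b;
        return (x > y) - (x < y);
    }

    static double percentile(const double *sorted, size_t n, double pct) {
        /* Nearest-rank percentile on an already sorted array. */
        size_t idx = (size_t)(pct / 100.0 * (double)n);
        if (idx >= n)
            idx = n - 1;
        return sorted[idx];
    }

    int main(void) {
        /* Hypothetical response latencies in milliseconds. */
        double latency_ms[] = { 0.3, 0.4, 0.5, 0.6, 35.0, 80.0, 250.0, 1900.0 };
        size_t n = sizeof(latency_ms) / sizeof(latency_ms[0]);

        qsort(latency_ms, n, sizeof(latency_ms[0]), cmp_double);

        printf("95th percentile:   %.1f ms\n", percentile(latency_ms, n, 95.0));
        printf("99th percentile:   %.1f ms\n", percentile(latency_ms, n, 99.0));
        printf("99.5th percentile: %.1f ms\n", percentile(latency_ms, n, 99.5));
        return 0;
    }

Plotting these values across the whole percentile range on logarithmic axes keeps the rare, slow tail visible alongside the fast majority of responses, which is what the charts below do.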

On these charts, lines closer to the bottom-left corner show lower latency, which is a better result. Flat lines along the top, at the 2000 ms mark, indicate client timeouts.

UDP Traffic

For the UDP performance tests, we concentrate traffic captured on 15 real resolvers onto a single test box, pushing BIND to its limits. The first chart shows latency in the first minute of the test, i.e. with a “cold cache”. This is when the resolver is under the highest stress.

UDP Response latency

This chart shows the cold start of BIND:

Chart showing response latency, test time 0 - 60 seconds, UDP traffic, showing BIND 9.20 has significantly lower latency than BIND 9.18.28 and BIND 9.16 EOL version, which had lots of client timeouts.

Version 9.20.0 is not able to answer roughly 3 % of queries within the 2000 ms client timeout, which is a consequence of pushing the resolver to its limits. Under the same conditions, BIND 9.18.28 was not able to answer 15 % of queries, and the end-of-life BIND 9.16 could not answer roughly 25 % of queries. In other words, this chart shows a massive improvement in the cold-cache efficiency of BIND 9.20.0.

After the first minute, the cache is already populated with records and becomes “hot”. This changes latency for clients significantly:

Chart showing response latency, test times 60 - 120 s, UDP traffic, showing BIND 9.20 has about the same latency as BIND 9.18.28 over this test period.

In this particular scenario, hot cache latency has not changed significantly between BIND 9.18.28 and BIND 9.20.0, while end-of-life BIND 9.16 struggled to keep up with the load. The wide colored background around the blue line shows large instability in latency across three repeated tests.

UDP CPU utilization

Let’s have a look at the CPU load during the first two minutes of the test. We monitor the time the BIND processes spend on the CPU, as reported by the Linux kernel Control Group version 2 metric usage_usec, and then normalize the value so that 100 % utilization equals one fully utilized CPU. Our test machine has 16 cores, so its theoretical maximum is 1600 %. CPU usage is a cumulative metric, and we plot a new data point every 0.1 seconds.
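
In other words, for each 0.1-second window the plotted value is the increase in usage_usec divided by the elapsed wall-clock microseconds, times 100. Below is a minimal sketch of that calculation, assuming the standard cgroup v2 cpu.stat file; the cgroup path and sampling interval are illustrative, not the actual test harness.

    /* Sketch of the CPU-usage normalization described above. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static long long read_usage_usec(const char *path) {
        FILE *f = fopen(path, "r");
        long long value = -1;
        char key[64];

        if (f == NULL)
            return -1;
        /* cpu.stat contains lines such as "usage_usec 123456". */
        while (fscanf(f, "%63s %lld", key, &value) == 2) {
            if (strcmp(key, "usage_usec") == 0)
                break;
            value = -1;
        }
        fclose(f);
        return value;
    }

    int main(void) {
        const char *path = "/sys/fs/cgroup/bind-test/cpu.stat"; /* illustrative */
        const long long interval_usec = 100000;                 /* 0.1 s */

        long long before = read_usage_usec(path);
        usleep(interval_usec);
        long long after = read_usage_usec(path);

        if (before >= 0 && after >= 0) {
            /* 100 % == one fully utilized CPU; 16 cores => up to 1600 %. */
            double cpu_pct = 100.0 * (double)(after - before) / (double)interval_usec;
            printf("CPU utilization: %.1f %%\n", cpu_pct);
        }
        return 0;
    }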

Chart showing CPU load, test times 0 - 120 s, for UDP traffic. Showing higher usage in the first 11 seconds for BIND 9.20.0 and slightly lower CPU usage in the remainder of the test, compared to BIND 9.18.28.

Here we can see a higher CPU load in the first 11 seconds of the test for BIND 9.20.0, and generally slightly lower CPU usage in the remainder of the test, compared to BIND 9.18.28. Effectively, this shows better parallelization of work in BIND 9.20.0, which is how it achieves the significantly improved response latency in the cold-cache scenario.

UDP Memory usage

Similarly to CPU usage, we use the Linux kernel Control Group version 2 metric memory.current to monitor BIND 9’s memory consumption. It is documented as “the total amount of memory currently being used” and thus includes memory used by the kernel itself to support the named process, as well as network buffers used by BIND. Resolution of the resource monitoring data is 0.1 seconds, but the memory consumption metric is a point-in-time value, so hypothetical memory usage spikes shorter than 0.1 second would not show on our plots.

Chart showing memory consumption, test times 0 - 120 s, for UDP traffic. Showing three times smaller memory usage in the first 11 seconds for BIND 9.20.0 and slightly lower usage in the remainder of the test, compared to BIND 9.18.28. Memory usage for BIND 9.20.0 and 9.18.28 are slowly converging at the right hand side of the test.

In the first 11 seconds of the test, when CPU load is highest, memory consumption is only one-third of the usage we saw for versions 9.18.28 and 9.16. This indicates much lower overhead when handling cache-miss traffic. The very narrow colored background around the BIND 9.20.0 line also shows that memory consumption is more predictable than it used to be. Another insight revealed by this chart is that the old BIND 9.16 allocator effectively did not return memory to the operating system at all, and that BIND 9.18 still had significant room for improvement in this area.

Over time, the memory usage curves of versions 9.20.0 and 9.18.28 slowly converge at the right-hand side of the chart, which indicates that the per-record cache overhead in 9.20.0 is somewhat larger than it was in 9.18.28.

TCP Traffic

For the TCP performance test, we concentrate traffic captured on five real resolvers onto a single test box and force all clients to use TCP to reach the resolver. Individual clients keep their TCP connection open for up to 10 seconds after their last query.

TCP Response latency

The following chart shows latency in the first minute of the test, i.e. with a “cold cache”. During this period the resolver is under the highest stress, as it must perform full resolution for most queries and, at the same time, accept more TCP connections from clients than during steady operation.

Chart showing response latency, test time 0 - 60 seconds, TCP traffic, showing BIND 9.20 has significantly lower latency than BIND 9.18.28. BIND 9.16 EOL version timed out almost all queries.

BIND 9.18.28 is able to handle the initial load, but 0.5 % of queries still time out; with version 9.20.0, fewer than 0.2 % of queries time out. BIND 9.16, clearly, was hopelessly timing out.

After the first minute the cache is already populated with records and becomes “hot”, and returning clients also have a chance to reuse existing TCP connections for subsequent queries. This changes latency for clients significantly:

Chart showing response latency, test time 60 - 120 seconds, TCP traffic, showing BIND 9.20 has significantly lower latency than BIND 9.18.28. BIND 9.16 EOL version timed out almost all queries.

We can see that version 9.20.0 improved latency for roughly 25 % of queries compared to version 9.18.28.

TCP CPU utilization

Chart showing CPU load, test times 0 - 120 s, for TCP traffic. Showing higher usage in the first 3 seconds for BIND 9.20.0 and slightly lower CPU usage in the remainder of the test, compared to BIND 9.18.28. BIND 9.16 CPU usage is very high and unstable.

The CPU load in the first three seconds of the test is higher for BIND 9.20.0, showing better parallelization of work when lots of TCP connections need to be accepted at the beginning. For the remainder of the test, version 9.20.0 generally has slightly lower CPU usage compared to BIND 9.18.28. And again, BIND 9.16 shows that it is not up to the task: its CPU usage is very high and unstable throughout the whole test.

TCP Memory usage

Chart showing memory consumption, test times 0 - 120 s, for TCP traffic. Showing six times smaller memory usage for BIND 9.20.0 compared to BIND 9.18.28.

For TCP-only traffic we can see a massive improvement in memory consumption. Version 9.20.0 consumes roughly one-sixth of the memory used by version 9.18.28 while handling the same traffic, and it still provides better latency and consumes less CPU time.

DNS-over-TLS Traffic

For the DoT performance test, we concentrate traffic captured on two real resolvers onto a single test box and force all clients to use DNS-over-TLS to reach the resolver. Individual clients keep their TLS connection open for up to 10 seconds after their last query. This time we skip BIND 9.16 because it does not support DoT at all.

DoT Response latency

The following chart shows latency in the first minute of the test, when the cache is empty or “cold”. During this period the resolver is under highest stress as it must generate queries to resolve most names and at the same time accept more DoT connections from clients than during steady-state operation. The TLS handshake required for DoT is an expensive operation.

Chart showing response latency, test time 0 - 60 seconds, DoT traffic, showing BIND 9.20 has significantly lower latency than BIND 9.18.28.

After the first minute the cache is already populated with records and becomes “hot”. Returning clients have a chance to reuse existing TLS connections for subsequent queries.

Chart showing response latency, test time 60 - 120 seconds, DoT traffic, showing BIND 9.20 has significantly lower latency than BIND 9.18.28.

These two charts show that BIND 9.20.0 provides better latency for roughly 50 % of the queries sent over DoT.

DoT CPU utilization

Chart showing CPU load, test times 0 - 120 s, for DoT traffic. Showing higher usage in the first 3 seconds for BIND 9.20.0 and slightly lower CPU usage in the remainder of the test, compared to BIND 9.18.28.

Except for the better parallelization at the beginning, the CPU load of the two versions is essentially the same.

DoT Memory usage

Chart showing memory consumption, test times 0 - 120 s, for DoT traffic. Showing three times smaller memory usage for BIND 9.20.0 compared to BIND 9.18.28.

For DoT-only traffic we can again see a massive improvement in memory consumption. Version 9.20.0 consumes roughly one-third of the memory used by version 9.18.28 while handling the same traffic, while also providing better latency.

ISC-provided packages for BIND 9

There has been some confusion about where to find the ISC-provided packages for BIND 9.20. We made a change for the start of the new 9.20 branch, prompted by a discussion on the bind-users mailing list. The goal was to enable everyone to update easily from the package repository, without causing an unexpected upgrade from 9.18 to 9.20. ISC provides packages for Fedora, Ubuntu, and Debian, labelled as bind, bind-esv (for the extended support version), or bind-dev (for the development version). Since the bind and bind-esv repos already contained BIND 9.18, putting 9.20 in the bind repo would have meant that people who intended to get only a maintenance update would have been upgraded a full major version. This seemed like an unexpected and therefore undesirable outcome.

The new BIND 9.20.0 version is available now in the repos labelled bind-dev. After a release or two, we will move BIND 9.20.x to the bind repo, when we have a new 9.21.x version to post in the bind-dev repository.
