Intro
At Skroutz we operate a wide variety of services comprising the ecosystem behind Skroutz.gr, a comparison shopping engine which evolved to an e-commerce marketplace. We run these services on our own infrastructure, bare metal servers and virtual machines. All hosts are running Debian GNU/Linux, which on July 6th 2019 had its latest stable release, called Buster. Buster came with lots of changes in included packages, as expected in a major release.
We started experimenting with dist-upgraded Buster hosts a couple of months before the official release, as soon as Buster got in “freeze” state. This strategy would give us a taste of what to expect with the new software versions and how to get better prepared to smoothly upgrade the operating system underneath our services with minimum disruption.
The problem
The issue we’re going to discuss in this post manifests pretty simply: after dist-upgrading a virtual machine to Buster and rebooting it, it took a couple of minutes before we could actually regain access via ssh. Virtual machine reboots are part of routine maintenance work to keep our services up-to-date and secure. When orchestrating such works across a fleet of hundred hosts, we certainly would like to avoid spending minutes before verifying that each host did come back up and healthy.
Investigation
It’s widely known that virtual machines do not enjoy the privilege of high quality randomness as the physical hosts do, since a virtual machine’s devices are emulated by design, thus do not feature unpredictable behavior, a useful ingredient for randomness 1 2 3.
Various references, e.g. Debian bug reports 4 5, suggested that this
behavior was to be attributed to OpenSSL and how it gathers entropy via the
getrandom()
system call. But all these online references were not descriptive
enough or conclusive, so we opted for digging deeper and understand the issue.
Kernel ring buffer displays important information coming from the kernelspace and it’s the first place we looked at. Consider this snippet from a Buster VM that just booted:
Three important points stand out:
-
before anything else it’s the kernel entry point which requests randomness with
get_random_bytes()
kernel function. We will explain its behavior and usage below. -
systemd (userspace) is also requesting randomness while bringing up system’s services
-
crng init
(crng stands for cryptographic random number generator) takes almost 2 minutes since boot
kernel’s get_random_bytes()
get_random_bytes()
is an in-kernel interface to provide random bytes. In our
case, it is called from kernel’s entry point 6 if CONFIG_STACKPROTECTOR
is set, which is true for kernels packaged in Debian. That message is printed
if CONFIG_WARN_ALL_UNSEEDED_RANDOM
is not set (again true for Debian) to
inform us that we don’t have a fully seeded CRNG. In case you’re curious, these
numbers are required for GCC’s “stack-protector” feature. When a function gets
called, a random number is placed on the stack, just before the return address.
This number is called “canary” and is validated by the kernel after returning.
If an attacker performs a stack-based buffer overflow, the canary value will be
overwritten. The kernel will detect this attack and throw a kernel panic 7.
A quick look into the kernel codebase shows us that it is unlikely that the boot process will actually block here, rather we have a clear indication that kernel’s CRNG is not properly initialized and we’ll see how that affects userspace processes that depend on that.
systemd ssh.service
Following lines in dmesg show that systemd has started as well and it actually reads bytes from urandom, albeit uninitialized.
systemd allows us to print a tree of the time-critical chain of systemd units (including services) as well as the time spend for each one to be started. This is done via:
It’s clear that ssh service takes somewhat longer than usual to get up. Its journal reads:
It seems that ssh.service gets stuck in its ExecStartPre
command:
sshd -t
just checks the validity of configuration files and sanity of keys.
So, why is it blocking? To get an insight on why ExecStartPre
times out, we
decided to enrich it like this:
We basically wrap the sshd
invocation with strace
and instruct it to keep
aggregate time statistics about each system call made by the executable. Our
intention is to identify the system call sshd is spending most of its time at
before finally get killed by systemd.
After rebooting the VM we got our sshd strace logfiles:
This is the output of the first attempt (which gets killed by systemd):
It’s self-evident that sshd spends the whole time trying to acquire randomness
via getrandom()
system call.
The second systemd attempt to get sshd up actually succeeds with the strace log reading:
Notice that the second attempt succeeds (12:49:10) exactly at the same time
getrandom()
returns a result, which coincides exactly with the timestamp the
kernel’s entropy pool gets initialized:
Quick sidenote: We were curious about why sshd is calling getrandom()
even if
its just validating its configuration. A quick look at sshd’s source code,
shows that it seeds its RNG during startup, even if its just validating its
configuration:
seed_rng()
is invoking RAND_status()
, an OpenSSL library function which,
finally, executes getrandom()
.
Changes for getrandom()
system call
So we’ve identified that ssh.service
blocks waiting for getrandom()
syscall.
Then our focus shifted to understanding why/when getrandom()
blocks and how is
that related with the kernel’s CRNG.
First, it’s clear that whether getrandom()
will read from /dev/urandom
or
/dev/random
and whether will it block or not is controlled by the relevant
flags: GRND_RANDOM
and GRND_NONBLOCK
(check getrandom(2)
for more). A
quick search showed that neither OpenSSH nor OpenSSL (which OpenSSH relies on
for cryptography) do not set any of these flags, meaning getrandom()
will
have its default behavior: will block until the kernel’s CRNG is ready.
If these flags are not set then either the system call or the CRNG did change in the meantime. And this meant digging into kernel source code and git history… :D Debian Stretch features kernels from the 4.9.x linux-stable tree while Debian Buster features kernels from the 4.19.x series.
Pondering over the output of git log -p v4.9..v4.19 -- drivers/char/random.c
is really an enjoyful activity but we’ll spare you the time and directly point
you to commit
43838a23a05fbd13e47
by Theodore Ts’o. This commit is entitled random: fix crng_ready() test
and
was introduced in linux 4.17 as a response to multiple
security
issues
reported by Google’s Project Zero. It basically changes the crng_ready()
function to be more strict about when linux’s CRNG is safe for cryptographic
use cases:
But how does this commit affect getrandom()
syscall? The following block is
getrandom’s definition from linux v4.9.144 (just a kernel version in a Stretch
host), ie before random: fix crng_ready() test
was applied.
Upon early boot, getrandom()
would treat crng_init == 1
good enough and
would just return urandom_read
, so would not block. This was was not
considered “secure” enough. After applying random: fix crng_ready() test
getrandom()
behavior changed: it would block (unless called with
GRND_NONBLOCK
) until CRNG was really cryptographically ready, i.e.
crng_init == 2
.
Resolution
As soon as we pinpointed the reason ssh (and other userspace software) could
block early on boot when calling getrandom()
we urged to evaluate possible
solutions. Our goal is to assist the virtual machine to get “good enough”
entropy early on when booting. Providing QEMU guests with quality entropy is
not a novel issue, rather it’s a recurring one when one needs to operate a
cryptographically intensive application within a virtual machine.
We discarded the option of running a userspace daemon, such as HAVEGED inside every VM. Currently, as far we are concerned, there are no practical attacks against HAVEGED, but it has received a lot of criticism for low-quality entropy, state leaking, etc 8 9. Also, from a infrastructure perspective, we aim to provide anything that’s necessary to VMs, without having to perform modifications inside the guests. Users should be able to use our virtualization infrastructure without having to modify images, due to an “unwanted” side-effect on the host’s kernel.
Instead, we’d prefer a cleaner approach and turned our attention to VirtIO RNG
10. VirtIO RNG is a paravirtualized device for QEMU, that exposes a hardware
RNG inside the guest. Enabling it for QEMU instances basically allows physical
hosts to inject randomness in virtual guests by exposing a special purpose
device, /dev/hwrng
. VirtIO RNG is configurable and can be wired up on the
host to retrieve entropy from various sources, such as /dev/{,u}random
or
even a hardware RNG. The downside of this solution for us was that it was not
immediately available in our virtual machines cluster manager, Ganeti. Such a
missing feature can also be seen as a contribution opportunity though! So Nikos
got down to implement what was missing for the KVM hypervisor in Ganeti 11.
In the meantime another possible solution emerged: RDRAND
. This is a x86 CPU
instruction, available on modern Intel (Ivy Bridge and later) and AMD
processors, that returns random numbers as supplied by the hardware’s
cryptographically secure pseudorandom number generator 12. In other words one
may trust the physical CPU to fetch “cryptographically secure” numbers. Using
RDRAND
is possible under certain conditions, which we luckily met:
-
physical host’s CPUs has to support this instruction. In our case, all bare metal servers consisting our Ganeti cluster did actually feature modern enough Intel CPUs.
-
Linux kernel has to use randomness provided by the CPU. Indeed, this functionality has been made available in Linux v4.19 by Theodore Ts’o and has been enabled in Debian since
debian/4.19.20-1~9
.
Apart from RDRAND
, new Intel x86 CPUs expose yet another instruction, called
RDSEED
. RDSEED
returns numbers of “seed-grade entropy”, the output
of a true RNG that should be used by software seeding a pseudo-RNG. This would
provide even better quality of entropy to our hosts, together with a possible
speed gain. Unfortunately, not all hosts in our fleet support this instruction,
so we dismissed the idea.
Finally, we were able to expose RDRAND
CPU flag to all our guests by simply
modifying ganeti cluster’s KVM cpu_type
hypervisor parameter like so:
This allowed Buster guests to properly initialize their kernel CRNG and all
subsequent calls to getrandom()
did not block.
Trusting the CPU to provide “cryptographically secure” random numbers may raise
some concerns, given that hardware vendors have been found to compromise their
products’ security/integrity when pressured or instructed by high-power,
high-influence institutions. ^_^ This is even highlighted by Theodore Ts’o in
the respective aforementioned commit. Our decision to use the RDRAND
instruction and trust the CPU was preceded by weighing various related
parameters: we already trust the CPU for all the things, being the dominant,
followed by the fact that Debian has enabled that by default.
Conclusion
-
RDRAND
andRDSEED
helps the kernel quickly initialize its CRNG, inducing non-blocking calls togetrandom()
, thus no lag during boot.RDRAND
provides an acceptable seed for randomness, not necessarily a high quality entropy flow. This should be acceptable for most applications/cases where a pseudo-random generator likeurandom
is sufficient. -
VirtIO RNG also solves the CRNG early boot starvation issue.
-
VirtIO RNG is the way to go when the guest machines needs high-quality (and probably high volume of) entropy.
-
VirtIO RNG support was not available for Ganeti at the time of our investigation, but we did work for such a feature. We thereby judged
RDRAND
was an acceptable short-term solution and went for it.
If you have any questions, ideas, thoughts or considerations, feel free to leave a comment below.
Links
-
https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Publikationen/Studien/ZufallinVMS/Randomness-in-VMs.pdf ↩
-
https://blogs.gentoo.org/marecki/2018/01/23/randomness-in-virtual-machines/ ↩
-
https://elixir.bootlin.com/linux/v4.9.144/source/drivers/char/random.c#L52 ↩
-
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=910504 ↩
-
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=912087 ↩
-
https://0xax.gitbooks.io/linux-insides/content/Initialization/linux-initialization-4.html ↩
-
https://lwn.net/Articles/584225/ ↩
-
http://www.diva-portal.org/smash/get/diva2:1141835/FULLTEXT01.pdf ↩
-
https://lwn.net/Articles/525459/ ↩
-
https://wiki.qemu.org/Features/VirtIORNG ↩
-
https://github.com/nkorb/ganeti/commits/feature/virtio-rng ↩
-
https://software.intel.com/en-us/blogs/2012/11/17/the-difference-between-rdrand-and-rdseed ↩