Quite a while ago we posted our architecture in bullet points. Of course a lot has changed since 2011: our traffic has increased five-fold, making us one of the most popular sites in Greece, and we have expanded to two more countries. Our development team now numbers 30 developers, compared to 7 back in 2011. Naturally, our systems team and our infrastructure have grown accordingly and in diverse ways. This post tries to give a broad idea of what our infrastructure looks like today and how we manage it, setting the ground for a series of more specific follow-up posts.
So, let’s start with the human factor first. What began with a part-time system administrator ten years ago has evolved into today’s 7-member systems team, responsible for keeping skroutz.gr up and running around the clock. Site reliability aside, our team’s responsibilities include tools development, performance monitoring, optimization and infrastructure scaling. Needless to say, we also provide resources and support to our developers.
Two of the team members have been with Skroutz in one way or another since its early beginnings, even before the company was founded ten years ago. A third one served as the lone sysadmin for 3 demanding years, until a 3-person team was put together in 2012. Since then the team has grown steadily and now includes a dedicated system tools developer and an office IT administrator.
Tinkerers by nature, we are all Free Software enthusiasts and try to contribute as much as possible back to the community (trivia: the team includes a Debian Developer and a Debian Maintainer). Despite the company being a Ruby shop, we mostly use Python for our team’s needs.
Our production server farm occupies two full-height racks, hosted at the MedNautilus IDC facility in Athens, Greece. It comprises 48 physical servers in total: mostly 1U machines, plus 3 mini-blade chassis holding 8 low-spec blade servers each. A third rack is coming, hosted at another location, which will serve as our hot-backup infrastructure.
We use bare metal for our most stressed machines; this includes our MariaDB servers, skroutz.gr application servers, MongoDB servers and caches. The rest (including our Elasticsearch cluster, alve.com application servers and OpenStack Swift nodes) run as KVM instances on our 24-node Ganeti cluster.
We use HAProxy for SSL termination, load balancing and request routing. HAProxy’s high performance and stability, coupled with detailed statistics and configurable logging make it a perfect fit for this role.
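To illustrate how these three roles fit in one place, here is a minimal haproxy.cfg sketch. It is an assumption-laden example, not our actual configuration: the hostnames, ports, certificate path and the `/api` ACL are all hypothetical.

```
# Hypothetical sketch: SSL termination, request routing and load balancing
frontend www
    bind :80
    bind :443 ssl crt /etc/haproxy/example.pem   # SSL terminated here
    acl is_api path_beg /api                     # illustrative routing rule
    use_backend api if is_api
    default_backend app

backend app
    balance roundrobin
    option httpchk GET /health                   # hypothetical health endpoint
    server app1 10.0.0.11:8080 check
    server app2 10.0.0.12:8080 check
```

The per-frontend and per-server statistics HAProxy exposes (via its stats socket or page) are what make this layer easy to observe in practice.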
Back in 2011 our main Rails app ran on Passenger. Nowadays it runs on Unicorn sitting behind an nginx instance on every application server. Unicorn’s graceful restart capability greatly helps our continuous deployment strategy and its out-of-band GC implementation keeps our HTTP responses fast and our clients happy. skroutz.gr itself uses 5 bare-metal application servers, while alve.com runs on two virtual machines.
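The graceful-restart behavior mentioned above is driven from the Unicorn configuration file, which is plain Ruby. The following unicorn.rb sketch shows the common idiom for zero-downtime reloads; paths and the worker count are illustrative, not our production values.

```ruby
# Hypothetical unicorn.rb sketch; paths and counts are illustrative
worker_processes 8
listen "/var/run/unicorn.sock", backlog: 64
pid "/var/run/unicorn.pid"
preload_app true
timeout 30

before_fork do |server, worker|
  # On a graceful restart (SIGUSR2), a new master starts alongside the old
  # one; once the new workers fork, ask the old master to quit gracefully.
  old_pid = "#{server.config[:pid]}.oldbin"
  if File.exist?(old_pid) && server.pid != old_pid
    begin
      Process.kill(:QUIT, File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
      # old master already gone
    end
  end
end
```

With this in place, sending SIGUSR2 to the running master swaps in new code without dropping in-flight requests, which is what makes continuous deployment painless.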
We use MariaDB as our main RDBMS with a simple master/slave setup leveraging MariaDB’s semi-synchronous replication. Failover and slave promotion are performed using MHA with minimal downtime. Our combined on-disk database size is roughly 60 GB.
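For reference, enabling semi-synchronous replication comes down to a few server variables. This is a generic sketch, not our tuning; on MariaDB versions of that era the feature also had to be loaded as a plugin first.

```
# Hypothetical my.cnf fragment for semi-synchronous replication
# (on older MariaDB the plugin must be loaded first, e.g.
#  INSTALL SONAME 'semisync_master' / 'semisync_slave')
[mysqld]
# On the master:
rpl_semi_sync_master_enabled = 1
rpl_semi_sync_master_timeout = 1000   # ms to wait for a slave ACK
                                      # before falling back to async
# On each slave:
rpl_semi_sync_slave_enabled = 1
```

The timeout is the key trade-off: the master blocks commits until at least one slave acknowledges, bounding data loss on failover without the full latency cost of synchronous replication.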
There are a couple of additional read-only slaves: one for obtaining backups and another one used to create cheap read/write database replicas for our staging environment using LVM thin snapshots.
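The staging replicas work because LVM thin snapshots are writable and essentially free to create. A sketch of the idea, with entirely hypothetical volume group and LV names (a thin pool must already exist):

```shell
# Hypothetical sketch: a writable thin snapshot of the database LV
# for a staging replica (names are illustrative; requires root)
lvcreate --snapshot --name staging-db vg0/mariadb-data

# Thin snapshots are skipped on activation by default; -K overrides that
lvchange -ay -K vg0/staging-db

mount /dev/vg0/staging-db /srv/staging-db
```

Since the snapshot only stores blocks that diverge from the origin, dozens of staging databases can share the disk footprint of a single copy.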
We use MongoDB to store all kinds of interesting data: recommended products, category click-through rates, billing information, historical product prices and trends etc. Our MongoDB dataset is far bigger than our relational data, weighing in at 400 GB and constantly growing. The MongoDB replica set currently includes 3 hardware nodes.
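Bootstrapping such a replica set is a one-off operation from the mongo shell. The fragment below is a generic sketch with hypothetical hostnames, not our topology.

```javascript
// Hypothetical sketch: initiating a 3-node replica set
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo1:27017" },
    { _id: 1, host: "mongo2:27017" },
    { _id: 2, host: "mongo3:27017" }
  ]
})
```

With three voting members, the set survives the loss of any single node while still being able to elect a primary.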
Our search function is backed by a 6-node Elasticsearch cluster running on virtual machines. Elasticsearch has been the most important improvement in our stack since 2011: it is fast, scales easily and our developers love it. The application rarely hits the database anymore, which makes us even happier. There are also a couple of smaller special-purpose Elasticsearch clusters storing user profile data and sharing product information across the three (country) flavors of our application.
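The easy scaling largely comes down to sharding: an index split into as many primary shards as there are nodes spreads both data and query load across the whole cluster. A hypothetical example of creating such an index on a 6-node cluster (index name and settings are illustrative):

```shell
# Hypothetical sketch: one primary shard per node, one replica of each,
# so every node holds data and any single node can fail safely
curl -XPUT 'http://localhost:9200/products' -d '{
  "settings": {
    "number_of_shards": 6,
    "number_of_replicas": 1
  }
}'
```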
In 2011 we were using a handful of OpenVZ containers on a single host, mostly for internal services. Today we run a 24-node Ganeti cluster with more than 160 KVM instances, hosting production, staging and internal services. Ganeti offers a great degree of flexibility and some unique features that make it stand out for production usage: its integrated support for DRBD allows the creation of highly-available virtual machines where application-level redundancy is not an option; live migrations allow us to do zero-downtime hardware maintenance and even helped us move datacenters with minimal downtime - twice. Instance placement constraints also help maximize availability, by ensuring that our Elasticsearch nodes reside on separate physical machines for example. We love Ganeti so much that we co-organized and hosted the first Ganeti Users & Developers meeting.
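The DRBD and live-migration features translate into very short commands in day-to-day use. The following is a hedged sketch with made-up node and instance names, not a recipe from our runbooks:

```shell
# Hypothetical sketch: a DRBD-backed instance with an explicit
# primary:secondary node pair (names are illustrative)
gnt-instance add -t drbd -o debootstrap+default \
  -n node3.example.com:node7.example.com \
  -s 20G -B memory=4096,vcpus=2 \
  es1.example.com

# Zero-downtime hardware maintenance: live-migrate the instance
# to its DRBD secondary, then service the now-empty primary node
gnt-instance migrate es1.example.com
```

Because DRBD keeps the disk synchronously replicated on the secondary, the migration moves only the running memory state and the instance never goes down.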
Apart from KVM, we also use containers (via LXC) for more ephemeral needs, like spawning one-off services for the unit test harness.
The operating system
We run Debian on all of our infrastructure, from routers and servers to laptops and dashboard Raspberry Pis. Our core infrastructure runs Debian Wheezy, while new machines are mostly set up with Debian Jessie. We chose Debian for its quality, security support and software availability, and we have been happy with it every single day.
We host our own internal repositories (using reprepro) for backports, locally modified packages and software not already provided in Debian proper. We try to package as much as possible of what we use and encourage team members to contribute to Debian while at work.
Automation is the key to maintaining control of the infrastructure, and Debian helps a lot by providing packages with sane defaults, respecting local changes and providing knobs (like alternatives and APT pinning) to tweak the whole system in a safe and predictable way. All servers are managed using Puppet, with a central puppet master providing the configuration. We keep our Puppet tree in a git repository and use Phabricator’s Differential to review changes. A buildbot instance is integrated with arcanist to automatically run unit tests and test catalog compilation for every Puppet changeset under review.
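To give a flavor of what such a reviewed changeset looks like, here is a small, entirely hypothetical Puppet class following the common package/file/service pattern (class name, paths and module layout are illustrative, not taken from our tree):

```puppet
# Hypothetical sketch: a minimal profile class managing a load balancer node
class profile::haproxy {
  package { 'haproxy':
    ensure => installed,
  }

  file { '/etc/haproxy/haproxy.cfg':
    ensure  => file,
    source  => 'puppet:///modules/profile/haproxy.cfg',
    require => Package['haproxy'],
    notify  => Service['haproxy'],  # reload on config changes
  }

  service { 'haproxy':
    ensure => running,
    enable => true,
  }
}
```

Catalog compilation tests catch a large class of errors (typos, missing dependencies, broken references) before a change ever reaches a real machine.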
Back in 2011 our servers were hosted on a shared LAN. Today we run our own Autonomous System and manage the whole network stack, from switches to BGP peerings with upstream providers.
A pair of low-power 1U servers running Debian, each equipped with 6 GbE NICs, 4 CPU cores and 8 GB RAM, serves as our active-backup router cluster. These machines handle the BGP peerings with our two upstream providers and also provide stateful firewalling at the edge of our infrastructure. We use BIRD as a BGP and OSPF speaker. High availability is provided by Keepalived and conntrackd, making sure our internet connectivity remains uninterrupted in the event of a failure. A pmacct-based, BGP-enabled NetFlow collector helps maintain good network visibility and monitor traffic patterns closely.
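A BGP session in BIRD is pleasantly terse to configure. The fragment below is a generic sketch: the router ID, ASNs and neighbor address are documentation placeholders, not our real peering details.

```
# Hypothetical bird.conf fragment: one upstream BGP session
router id 192.0.2.1;

protocol bgp upstream1 {
    local as 64512;                 # illustrative private ASN
    neighbor 198.51.100.1 as 64513; # illustrative upstream
    import all;                     # accept routes from the upstream
    export where source = RTS_STATIC; # announce only our own prefixes
}
```

Keeping the export filter tight (announcing only one's own statically defined prefixes) is what prevents a router like this from accidentally becoming a transit path between its upstreams.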
Our headquarters is connected with the production site via a 100 Mbps Metro Ethernet line, letting us act as our own ISP. The whole network (including our HQ) is dual-stack, with IPv6 considered production-quality.