At Skroutz, debugging issues in production as soon as possible is crucial to minimizing downtime and ensuring a smooth experience for our users.
Traditionally, investigating incidents or bugs in HTTP requests involved reproducing the issue in the development environment by adding breakpoints (e.g. binding.pry).
However, this method can be time-consuming and challenging, especially when dealing with:
- Sub-sampled or Anonymized Data – Our development database snapshot contains sub-sampled & anonymized data.
- Time-sensitive Bugs – Today’s data doesn’t exist in our development snapshot.
- Missing Datastores – Our development snapshot lacks certain datastores (like some huge ones for logs of user actions).
- Service Accessibility – Various internal/external services are often inaccessible from development by default and settings/code need to be modified to access them.
- Environment Slowness – Slow bundle install, long rails server boot times, asset compilation, and rails server reload delays can slow the process in the development environment.
To make debugging faster, especially for incidents where downtime or broken site functionality is critical, we now have the capability to set breakpoints directly in production HTTP requests.
Setting Breakpoints in Production
With this new approach, we can place breakpoints at any filepath:lineno within the production environment – no need to reproduce bugs locally.
First, we set breakpoints in a production Rails console:
bp 'app/controllers/skus_controller.rb:210'
bp 'app/controllers/skus_controller.rb:313'
Replaying Production HTTP Requests
Then we append the new developer-only ?replay=true request parameter to the problematic URL in our web browser:
skroutz.gr/s/25272119/Apple-iPhone-12-5G-4GB-128GB-Prasino.html?replay=true
Instead of receiving the usual HTML response, we get a unique identifier for the request:
37268f1231947a3b8d4c57bdb264c3a007c2bb26abf2c47b0a0f35427dea4078
Finally, we replay the request in the production rails console:
app.replay('37268f1231947a3b8d4c57bdb264c3a007c2bb26abf2c47b0a0f35427dea4078')
Execution halts at the previously specified breakpoints. From there, we can step through the code with the usual debugger commands (step, next, continue, etc.) and inspect state along the production code path.
Managing Breakpoints
We list existing breakpoints:
bp
We remove a specific breakpoint (by index):
rm_bp 0
We remove all breakpoints:
rm_bp
Breakpoints in Production Rails Console Explained
This functionality leverages the ruby/debug gem, whose breakpoint management was previously only accessible from debugger mode (via binding.break). We created a custom interface that calls ruby/debug internal methods, enabling breakpoint management from the main level of the Rails console.
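To make the shape of such an interface concrete, here is a minimal sketch of a bp / rm_bp style registry. It is not our implementation and does not touch ruby/debug internals: the BreakpointRegistry class is hypothetical, it uses Ruby's built-in TracePoint instead, and it only records hits rather than pausing execution.

```ruby
# Hypothetical sketch: a tiny breakpoint registry driven by TracePoint.
# It does not use ruby/debug; it records hits instead of pausing execution.
class BreakpointRegistry
  Breakpoint = Struct.new(:path, :lineno) do
    def to_s
      "#{path}:#{lineno}"
    end
  end

  attr_reader :hits

  def initialize
    @breakpoints = []
    @hits = []
    # :line events fire before each new line of Ruby code is executed.
    @trace = TracePoint.new(:line) do |tp|
      hit = @breakpoints.find do |b|
        tp.lineno == b.lineno && tp.path.end_with?(b.path)
      end
      @hits << hit.to_s if hit
    end
  end

  # With no argument, list breakpoints; with 'filepath:lineno', add one.
  def bp(location = nil)
    return @breakpoints.map(&:to_s) if location.nil?

    path, _, lineno = location.rpartition(':')
    if path.empty? || lineno !~ /\A\d+\z/
      raise ArgumentError, "expected 'filepath:lineno', got #{location.inspect}"
    end

    @breakpoints << Breakpoint.new(path, Integer(lineno, 10))
    @trace.enable unless @trace.enabled?
    bp
  end

  # With an index, remove that breakpoint; with no argument, remove all.
  def rm_bp(index = nil)
    index.nil? ? @breakpoints.clear : @breakpoints.delete_at(index)
    @trace.disable if @breakpoints.empty? && @trace.enabled?
    bp
  end
end
```

Line tracing is expensive, so the sketch enables the TracePoint only while at least one breakpoint is registered. A real debugger integration would pause and open a session at the hit instead of appending to a list.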
HTTP Request Replay Explained
By default, the Rails console provides app.get() and app.post() for running HTTP requests inline. However, these methods are inconvenient for personalized pages that require the developer's HTTP headers & cookies.
Previously, the headers & cookies had to be passed manually as arguments to these methods, which is slow and discouraging.
Now, with a custom Rack middleware that is activated when ?replay=true is present:
- HTTP Headers & Cookies are automatically persisted, encrypted, in our Redis for a short period of time.
- HTTP Requests are quickly replayed as they happened in the browser, using the persisted headers & cookies, by running app.replay(<request_id>) from a Rails console.
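The capture-and-replay flow above can be sketched roughly as follows. This is an illustration under stated assumptions, not our middleware: ExpiringStore is a hypothetical in-memory stand-in for Redis, ReplayCapture and the replay helper are made-up names, the TTL is an assumed value, and encryption and the developer-only check are omitted.

```ruby
require 'securerandom'
require 'uri'

# Hypothetical stand-in for Redis: an in-memory hash with expiry times.
class ExpiringStore
  def initialize
    @data = {}
  end

  def write(key, value, ttl:)
    @data[key] = [value, Time.now + ttl]
  end

  def read(key)
    value, expires_at = @data[key]
    value if expires_at && Time.now < expires_at
  end
end

# Rack-style middleware: when ?replay=true is present, persist the request's
# headers (cookies travel in HTTP_COOKIE) and respond with an identifier.
class ReplayCapture
  TTL_SECONDS = 300 # assumed value; the post only says "a short period"
  KEPT_KEYS = %w[REQUEST_METHOD PATH_INFO].freeze

  def initialize(app, store)
    @app = app
    @store = store
  end

  def call(env)
    pairs = URI.decode_www_form(env['QUERY_STRING'].to_s)
    return @app.call(env) unless pairs.include?(%w[replay true])

    id = SecureRandom.hex(32) # 64 hex characters
    snapshot = env.select { |k, _| k.start_with?('HTTP_') || KEPT_KEYS.include?(k) }
    # Drop the replay flag so the replayed request takes the normal path.
    snapshot['QUERY_STRING'] = URI.encode_www_form(pairs - [%w[replay true]])
    # A real implementation would encrypt the snapshot before storing it
    # and verify that the caller is a developer.
    @store.write(id, snapshot, ttl: TTL_SECONDS)
    [200, { 'content-type' => 'text/plain' }, [id]]
  end
end

# Replays a captured request against a Rack app using the stored snapshot.
def replay(app, store, id)
  snapshot = store.read(id) || raise(KeyError, "unknown or expired request id: #{id}")
  app.call(snapshot)
end
```

In this sketch, replay calls the inner application directly, so the replayed request carries the same headers and cookies the browser sent, minus the replay flag.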
Key Benefits
- Lightning-fast debugging sessions: No need to spend a lot of time & mental capacity to reproduce a bug in the development environment.
- Real production state: Inspecting exactly what caused the bug in the live environment.
- Improved efficiency: Debugging complex issues that depend on external services or real-time data is now easier & faster than ever.
Many thanks to @bill-kolokithas & @iridakos for contributing!