Overview

A general overview of the Hyperfoil tool

There are plenty of web benchmark tools around, so it might be hard to pick one, and investing time into a tool that is not well established is risky. Hyperfoil was not created for the pure joy of coding but as a solution to a set of problems that no single existing tool could solve all at once.

Free software

Free software allows you to take your benchmark and publish it for everyone to verify. With proprietary licenses that wouldn’t be so easy.

Hyperfoil is licensed under Apache License 2.0.

Distribution

Generating load from a single node stops scaling at a certain point, and then you need to orchestrate the benchmark across a cluster of nodes. Simple command-line tools usually ignore this completely (you’re supposed to start them, then gather and merge the data in bash scripts). Other tools use an open-core model where the clustering part is a paid/proprietary option. There are also frameworks that have clustering built in.

Hyperfoil uses a leader-follower model with the Vert.x event bus as the clustering middleware. While running from a single VM is possible (and quite easy), the design is distributed by default.

Accuracy

The point of a web benchmark is usually to find out what happens when your system is accessed by thousands of concurrent users - each doing a page load every few seconds. However, many traditional load drivers simplify the scenario to a few dozen virtual users (VUs) that execute one request after another, or with very short delays in between - this is referred to as the Closed System Model (as the set of VUs is finite). This discrepancy leads to a problem known as coordinated omission: latency results are significantly skewed and pathological conditions (queues overflowing…) are never triggered.

Hyperfoil embraces the Open System Model by default: virtual users are completely independent of each other until Hyperfoil runs out of resources, and when that happens it is recorded in the results. Hyperfoil runs a state machine for each VU and all requests are executed asynchronously.
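
To illustrate the difference, here is a minimal sketch of an open-model phase: new virtual users start at a fixed rate regardless of whether earlier ones have finished. The host, endpoint and numbers are hypothetical placeholders, not taken from a real benchmark.

```yaml
name: open-model-example
http:
  host: http://localhost:8080   # hypothetical system under test
phases:
- steadyLoad:
    constantRate:               # open model: users start at a fixed rate
      usersPerSec: 1000         # new users keep arriving even if responses lag
      duration: 60s
      scenario:
      - main:
        - httpRequest:
            GET: /
```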

Versatility

While you can design your benchmark to just hit a single endpoint (URL) at the desired rate, this is likely not what real users would be doing. Randomizing some parts of the query or looping through a list of endpoints might be better, but the resulting scenario might still be too simplistic.

Hyperfoil introduces a DSL expressed in YAML with which you can sketch out the scenario in fine detail, including timing, concurrent requests, processing of responses and so forth. We’re not trying to invent a new programming language, though, so if the DSL gets too complex you can write the logic in Java or any other JVM language.
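
As a taste of the DSL, here is a minimal sketch of a benchmark definition; the name, host and path are hypothetical, so see the quickstarts for complete, verified examples:

```yaml
name: my-first-benchmark
http:
  host: http://example.com      # hypothetical target
phases:
- example:
    atOnce:                     # closed-model phase: start a fixed set of users
      users: 1
      scenario:
      - example:
        - httpRequest:
            GET: /index.html
```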


If you’re eager to try out Hyperfoil, go to the first quickstart. Otherwise, let’s have a deeper look into the terms and concepts.

1 - Concepts

Hyperfoil key terms and concepts

This document explains some core terms used throughout the documentation.

Controller and agents

While it is possible to run benchmarks directly from the CLI, Hyperfoil is by nature a distributed tool with a leader-follower architecture. The controller has the leader role; it is a Vert.x-based server with a REST API. When a benchmark is started, the controller deploys agents (according to the benchmark definition), pushes the benchmark definition to these agents and orchestrates the benchmark phases. Agents execute the benchmark, periodically sending statistics to the controller. This way the controller can combine and evaluate statistics from all agents on the fly. When the benchmark is completed, all agents terminate.

All communication between the controller and agents happens over the Vert.x event bus - therefore it is independent of the deployment type.
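
Agents are declared in the benchmark definition itself. A hedged sketch of such a declaration might look like the following; the hostnames are hypothetical and the exact set of properties depends on the deployment type (e.g. SSH vs. Kubernetes), so check the deployment documentation for your version:

```yaml
agents:
  agent-one:
    host: loadgen1.example.com   # hypothetical host the controller reaches over SSH
    port: 22
  agent-two:
    host: loadgen2.example.com
    port: 22
```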

Phases

Conceptually, the benchmark consists of several phases. Phases can run independently of each other; each simulates a certain load executed by a group of users (e.g. visitors vs. admins). Within one phase all users execute the same scenario (e.g. logging into the system, selling all their stock and then logging off).

Phases are also used for scaling the load during the benchmark; when looking for maximum throughput you schedule several iterations of a given phase, gradually increasing the number of users that run the scenario.
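
For example, a phase can be iterated with an increasing rate. A hedged sketch follows; the maxIterations property and the base/increment form of usersPerSec reflect our reading of the phase documentation and may differ in your version:

```yaml
phases:
- rampUp:
    constantRate:
      usersPerSec:
        base: 10          # rate in the first iteration
        increment: 10     # added in each subsequent iteration
      maxIterations: 5    # runs as rampUp/000 through rampUp/004
      duration: 30s
      scenario:
      - main:
        - httpRequest:
            GET: /        # hypothetical endpoint
```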

A phase can be in one of these states:

  • not running (scheduled)
  • running
  • finished: users won’t start new scenarios but we’ll let already-started users complete the scenario
  • terminated: all users are done, all stats are collected and no further requests will be made

The state of each phase on every agent is managed by the controller; this is also the finest-grained unit of work it understands (the controller has no information about the state of each user).

Sessions

The state of each user’s scenario is saved in a session; sometimes we speak about (re)starting sessions instead of starting new users. Hyperfoil tries to keep allocations during the benchmark as low as possible and therefore pre-allocates all memory for scenario execution ahead of time. This is why all resources the benchmark uses are capped - Hyperfoil needs to know the sizes of the pools.

It is also necessary to know how many sessions should be preallocated - the maximum concurrency of the system. If this threshold is exceeded, Hyperfoil allocates further sessions as needed, but this is not the optimal mode of operation. It means that either you’ve underestimated the resources needed, or you’ve put a load on the system that it can no longer handle: requests are not being completed and scenarios do not finish - which means that session objects cannot be recycled for reuse by the next user.

Scenario

A scenario consists of one or more sequences that are composed of steps. Steps are similar to statements in a programming language and sequences are the equivalent of blocks of code.

While most of the time the scenario will consist of sequential operations, as the user is not multi-tasking, the browser (or other system you’re simulating) actually executes some operations in parallel - e.g. during a page load it fetches images concurrently. Therefore at any time the session contains one or more active sequence instances; when all sequence instances are done, the session has finished and can be recycled for a new user. Most of the time the scenario will start with only one active instance and, as it progresses, it might create instances of other sequences (e.g. after evaluating a condition it creates a sequence instance according to the branching logic).
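
A hedged sketch of a scenario with more than one sequence is shown below; the endpoints and credentials are hypothetical, and the nextSequence step name reflects our reading of the steps reference:

```yaml
scenario:
  initialSequences:               # active when the session starts
  - login:
    - httpRequest:
        POST: /login              # hypothetical endpoint
        body:
          text: user=alice&password=secret
    - nextSequence: browse        # activate an instance of another sequence
  sequences:                      # defined, but only activated on demand
  - browse:
    - httpRequest:
        GET: /inventory
```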

2 - FAQ

Frequently Asked Questions

Why is Hyperfoil written in Java?

People are often concerned about JVM performance or predictability. While nowadays the JVM is very good in terms of throughput, dealing with jitter can be challenging. We are Java engineers, though, and we believe that these issues can be mitigated with the right design. That’s why we try to be very careful on the execution hot path.

We could achieve even better properties with C/C++, but development efficiency would suffer. We could be successful in Go, but we’re not as intimate with its internals. Other languages and frameworks would pose their own challenges. So far, the choice of Java has not proven to be a limiting factor.

Why are you inventing your own YAML-based DSL instead of using JavaScript/Lua/…?

While some people might be more comfortable describing their complex scenarios in a familiar language, running a script execution engine would have an impact on performance and could put us out of control. We are not trying to invent a new programming language embedded in YAML. Instead, we propose a set of components common to many scenarios that can be recombined as it suits you. If you ever feel that the YAML is becoming cumbersome, try moving your complex benchmark logic to Java code and using it that way instead.

I just want to know what is the maximum throughput!

Maximum throughput is a single number, which makes comparison very easy: was my code change for better or worse? However, finding the sweet spot is not as simple as throwing in a few hundred concurrent threads running one request after another and taking the readings. With too high a concurrency you can get worse results due to contention and longer queues, so you need to try different concurrency levels anyway.

There’s nothing wrong with this type of test as long as you know what you’re doing and are aware that the response latencies might be far off from reality. It’s actually a very good test when you’re looking only for regressions - and Hyperfoil supports that, too.

Hyperfoil is so hard to set up, I’ll just use …

Some tools can be run from the shell, with everything set just through options and arguments. That is quite handy for a quick test - and if the tool is sufficient for the job, use it. Or you can try running the same test through Hyperfoil - e.g. for wrk/wrk2 we offer a facade (CLI command wrk or bin/wrk.sh) that creates a benchmark with the same behaviour, but you also get all the detailed results as from any other run - see the migration guide. Once your requirements outgrow what’s possible in these simple tools, you can embrace the full power of benchmark composition.

What does that ‘Exceeded session limit’ error mean?

With open-model phase types (constantRate, increasingRate, decreasingRate) the concurrency should not be limited. However, as Hyperfoil tries not to allocate any memory during the benchmark, we need to reserve space ahead of time for all sessions that could run concurrently - we call this the session limit. By default this limit is equal to the number of users per second (assuming that the scenario won’t take more than 1 second).

When you get the ‘Exceeded session limit’ error, it means that some of the requests took a long time (or you have delays as part of the scenario) and Hyperfoil ran out of its session pool. In that case you can change the limit using the maxSessions property on the phase, setting it to the expected maximum concurrency. E.g. if you expect that the scenario will take 3 seconds and you’re running at usersPerSec: 100, you should set maxSessions: 300 (or rather more, to give it a buffer for unexpected jitter).
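
In the benchmark YAML, that arithmetic (100 users/s × ~3 s scenario, plus a buffer) translates to something like this sketch; the phase name, endpoint and durations are hypothetical:

```yaml
phases:
- checkout:
    constantRate:
      usersPerSec: 100
      duration: 120s
      maxSessions: 400    # 100 users/s * ~3 s scenario, rounded up for jitter
      scenario:
      - buy:
        - httpRequest:
            POST: /checkout   # hypothetical endpoint taking ~3 s to complete
```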

If increasing the limit doesn’t help, it usually means that the load on the tested system is too high and the responses are not arriving as fast as you fire the requests. In that case you should lower the load.