Hyperfoil

Fetch embedded resources

When browsing a website it’s not only the main page the webserver needs to serve - usually there are static resources such as images, stylesheets, scripts and more. Hyperfoil can automatically download these.

Hyperfoil implements a non-allocating HTML parser, with a pre-baked onEmbeddedResource handler: This selects HTML tags with appropriate attributes that represent an URL:

The other part of the solution is executing a GET request at given URL. Modern browsers don’t download the resources one-by-one, nor do they fetch them all at once - usually these download around 6-8 resources in parallel. In Hyperfoil this is implemented by adding the URL to a queue. The queue delivers each URL to a new instance of a sequence (automatically generated) that fires the HTTP request, and limits the number of these sequences.

Since Hyperfoil pre-allocates all data structures ahead, you need to declare a limit on number of resources that can be fetched. If the queue cannot store more than maxResources URLs it simply discards the other URLs, emitting a warning to the log.

The parser signals the queue when there’s no more data on the producing side, and when all of them are consumed the queue can fire some extra action. Note that completion of the original request does not wait until all the resources are fetched.

Below is an example of a configuration fetching embedded resources:

- fetching:
  - httpRequest:
    GET: /foobar/index.html
    handler:
      body:
        parseHtml:
          onEmbeddedResource:
            fetchResource:

              maxResources: 16
              metric:
              # Drop the query part
              - ([^?]*)(\?.*)? -> $1
              onCompletion:
                set: allFetched <- true

For details please see the reference.

Next: Automatic follow of redirects