Varnish cache tags with a big number of entities  #3168

@bastnic

On a biiig API Platform website with lots and lots of nested entities, we need Varnish to be fully working.

I talked with @alanpoulain last night and we agreed that my setup is not a usual one, but the experience is worth sharing as it can help some people.
I have already fixed all these issues on my side, but it would be cool if we could fix them in the official release.

There are multiple problems:

  • too many entities means too many IRIs; IRIs are quite long, so with more than one hundred of them it crashes because the Cache-Tags header is too long. More on that later.
  • too many updated entities means too many IRIs to clean in VarnishPurger; it crashes and does not clean anything.
  • we also had "Generate iri for child related resources" #2905, but it's fixed now.

Too many cache tags to add on the response

On the biggest response I found, the Cache-Tags header weighs 100k chars. It's quite long.
I tried multiple approaches:

  • increase (a lot) the Varnish header limits (hi http_resp_hdr_len, http_resp_size): doesn't scale as much as I want
  • chunk the headers, as allowed by HTTP (hi http_max_hdr): it seems to work at first, but in fact when the purge happens, it only uses the cache tags of the first header line. Maybe a bug on my side
  • on a collection, just strip all the tags of the items of that collection, since when we do an operation on a resource, the IRI of the collection is also included. It reduces a LOT!
  • BUT, it only removes the IRIs of the resource itself. My entities are quite nested, so I also have as many IRIs as there are different subresources. So many that I can almost say each of those sets is a collection itself.

Brace yourself, awful code:

// in AddTagsListener
        $possibleCollections = [];

        // first, count the IRIs sharing the same prefix to check whether collections emerge
        // (the last path segment must contain at least two hyphens, and the prefix must not be empty)
        foreach ($resources as $resource) {
            if (preg_match('#^(.+)/[^/]*-[^/]*-[^/]*$#', $resource, $matches)) {
                $possibleCollections[$matches[1]] = ($possibleCollections[$matches[1]] ?? 0) + 1;
            }
        }

        // keep only the real collections (more than XX IRIs of the same type)
        $possibleCollections = array_filter($possibleCollections, function ($count) {
            return $count > self::MAX_IRIS_TO_BE_CONSIDERED_AS_COLLECTION; // magic number
        });

        // then remove the corresponding item IRIs
        // (the trailing "/" avoids matching another collection that merely shares the same prefix)
        $resources = array_filter($resources, function ($item) use ($possibleCollections) {
            foreach ($possibleCollections as $collection => $count) {
                if (strpos($item, $collection.'/') === 0) {
                    return false;
                }
            }

            return true;
        });

        // add the collection IRIs instead
        foreach ($possibleCollections as $collection => $count) {
            $resources[$collection] = $collection;
        }

With this patch, I end up with only 4-5 collections on some big lists, and I'm OK with that.

One issue I may have is that I'm using ORMBehaviors\Translatable: when I update only a translation and nothing on the entity, the entity itself is not updated (only the translation is), and the translation is not an API Platform resource. In PurgeHttpCacheListener, the main entity is seen as a "relationTag", so the collection IRI is not added to the list of IRIs to purge. I will fix that with another Doctrine listener that updates the main entity's "updated at" field, so that the entity is seen as updated.

Too many cache tags to purge at once

When I insert a lot of entities at once, the purge cannot work as the regexp is waaaaaaaaay too long.

Multiple approaches here too:

  • chunk the IRIs, and make XX BAN calls
  • if there are more than XX (magic number) IRIs, consider it a big flush and wipe everything. That's the approach I currently use (mainly because I'm lazy)

The first approach can add some safety and fix #1856.
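
To make the first approach more concrete, here is a rough sketch of how the chunking could look. It is only an illustration: chunkBanRegexes() is a hypothetical helper (not part of API Platform), the regex shape is an assumption about what a tag-matching ban regex could look like, and the maximum length would have to be tuned to the Varnish configuration.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: splits IRIs into several ban regexes, each at most
 * $maxLength bytes long, so that every regex fits in one request header.
 * A single IRI longer than $maxLength still gets its own (too long) regex.
 *
 * @param string[] $iris
 *
 * @return string[] one regex per BAN call
 */
function chunkBanRegexes(array $iris, int $maxLength): array
{
    // Assumed regex shape: "(iri1|iri2|...)($|\,)".
    $wrapperLength = strlen('()($|\,)');

    $regexes = [];
    $current = [];
    $currentLength = 0; // length of the parts joined by "|"

    foreach ($iris as $iri) {
        $part = preg_quote($iri);
        $newLength = $currentLength + strlen($part) + ($current ? 1 : 0);

        // Flush the current chunk if adding this IRI would exceed the limit.
        if ($current && $newLength + $wrapperLength > $maxLength) {
            $regexes[] = sprintf('(%s)($|\,)', implode('|', $current));
            $current = [];
            $newLength = strlen($part);
        }

        $current[] = $part;
        $currentLength = $newLength;
    }

    if ($current) {
        $regexes[] = sprintf('(%s)($|\,)', implode('|', $current));
    }

    return $regexes;
}
```

Each returned regex would then go into its own BAN request. The sending part is left out on purpose, since it depends on the HTTP client and the header name used by the purger.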

WDYT?

cc @teohhanhui
