
Conversation

Contributor

@p12tic p12tic commented Oct 4, 2018

On cppreference-doc we process around 4000 comparatively large pages with premailer, and this takes a while with the current implementation. After looking at profiler results and doing some experimental changes, it turns out that simply caching the results of more functions and increasing the size of the caches speeds premailer up by around 2.5 times.

This PR implements the above-mentioned changes in such a way that the new behavior must be enabled explicitly at runtime; the old behavior remains the default, so existing users are almost entirely unaffected.

The function_cache decorator is modified to accept the size of the cache as an argument to the wrapper, so that it can be changed at runtime. The size of the caches can now be controlled via an additional parameter to the Premailer.__init__ method.

Also, function_cache is modified not to drop the contents of the cache when it overflows, which is the canonical behavior of caches. A small cache can still serve the majority of requests, and since it already reached its maximum size without causing an out-of-memory condition, staying at the maximum size instead of dropping to zero is unlikely to cause one later. If users are concerned about out-of-memory conditions arising later, during other tasks, we should provide a method to clear the caches explicitly.

Furthermore, function_cache is modified to convert list arguments to tuples so that they can be hashed; _HashedSeq can be retired this way.
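Roughly, the resulting decorator works like this simplified sketch (illustrative only; the actual code in this PR differs in some details, e.g. in how the size is passed through at runtime):

import functools

def function_cache(expected_max_entries=1000):
    def decorator(func):
        cache = {}

        @functools.wraps(func)
        def inner(*args, **kwargs):
            # Lists are unhashable, so convert them to tuples before
            # building the cache key (this is what retires _HashedSeq).
            hashable_args = tuple(
                tuple(a) if isinstance(a, list) else a for a in args
            )
            key = (hashable_args, tuple(sorted(kwargs.items())))
            if key in cache:
                return cache[key]
            result = func(*args, **kwargs)
            # On overflow, keep the existing entries instead of clearing:
            # they already fit in memory and keep serving most lookups.
            if expected_max_entries is None or len(cache) < expected_max_entries:
                cache[key] = result
            return result

        return inner

    return decorator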

Caching has been added to the top functions I saw in the profiling results. The cache itself has very low overhead, so in the worst case we only increase memory usage without hurting performance much.

Finally, Python 2.6 and 3.3 have been removed from .travis.yml, as the upstream testing tools no longer support these Python versions.

Please let me know what you think about this. Thanks for your time!


coveralls commented Oct 4, 2018

Coverage Status

Coverage remained the same at 100.0% when pulling 21436d4 on p12tic:more-caching into ebfd310 on peterbe:master.

Owner

@peterbe peterbe left a comment


There are so many good things in this PR and I don't want to entirely block your progress, but there is one great option we should consider, and that's Python 3's functools.lru_cache. Yes, it means we need a backport for Python 2.

I'm not sure it would work for us right away unless we do some tricks, like you did, with converting lists to tuples. But functools.lru_cache is wicked smart and highly optimized.

Another very attractive option is to depend on CacheTools. It supports Python 2 and has an LFU implementation too, if that's applicable to us.
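For the list-to-tuple trick, something like this untested sketch might be enough (functools32 would be the Python 2 backport):

import functools

def tupled_lru_cache(maxsize=1000):
    # Untested sketch: wrap functools.lru_cache so list arguments are
    # converted to tuples before the cached call. Note that the wrapped
    # function then receives tuples instead of lists.
    def decorator(func):
        cached_func = functools.lru_cache(maxsize=maxsize)(func)

        @functools.wraps(func)
        def inner(*args, **kwargs):
            args = tuple(tuple(a) if isinstance(a, list) else a for a in args)
            return cached_func(*args, **kwargs)

        return inner

    return decorator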



-def function_cache():
+def function_cache(expected_max_entries=1000):
Owner


Now is not the time, but the correct name ought to be "function_memoize" or something like that.

cached.cache.clear()

if max_cache_entries is None or len(cache) < max_cache_entries:
    cache[hashed] = result
Owner


It's a bit strange that the caching just stubbornly refuses to add more entries. I've already forgotten the justification you gave in the PR description. Can you please add a code comment here to explain this reasoning?
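Something along these lines, maybe (wording is just a suggestion):

# Once full, the cache deliberately stops accepting new entries rather
# than clearing itself: the existing entries already fit in memory and
# keep serving the bulk of repeated lookups, so staying at the maximum
# size is safer than dropping everything.
if max_cache_entries is None or len(cache) < max_cache_entries:
    cache[hashed] = result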

cache_css_parsing=True,
cache_css_parsing_size=1000,
cache_css_output=True,
cache_css_output_size=1000,
Owner


I don't think these need to be here at all. They're operational, not functional. They clutter up the code and obscure the options.

The likelihood of someone preferring to set it to only 500 or 10_000 is extremely low, and I think there are much better ways to do that. Like this:

import functools
import os

def function_cache():
    ...snip...
    # sentinel indicating a cache miss
    sentinel = object()

    max_cache_entries = int(os.environ.get('PREMAILER_MAX_CACHE_ENTRIES', 1000))

    @functools.wraps(func)
    def inner(*args, **kwargs):
        ...
    return inner

That will make the code a lot simpler, and it'll give all the power to your use case too.
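For example (hypothetical usage; since the variable would be read when the decorator runs, i.e. at import time, it has to be set first):

import os

# Hypothetical usage: bump the cache size before premailer is imported,
# because PREMAILER_MAX_CACHE_ENTRIES would be read at decoration time.
os.environ['PREMAILER_MAX_CACHE_ENTRIES'] = '10000'

from premailer import Premailer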

cache_css_output_size: Specifies the size for various CSS output
caches. If set to None, the size of the caches will not be limited.
'''
Owner


AWESOME!!

cache_css_parsing_size: Specifies the size for various CSS parsing
caches. If set to None, the size of the caches will not be limited.
cache_css_output: Specifies whether to cache the CSS output results.
Owner


Unlike the cache_*_size options, this one is actually not operational. Let's keep this. Setting this will potentially have an effect on the generated HTML output.

Owner

peterbe commented Oct 4, 2018

Actually, the more I think about it, the more I think we should consider CacheTools. It's just so convenient, and we wouldn't need any heavy lifting to deal with the differences between Python 2 and 3.

I still think the max cache size should be controlled from a documented environment variable though.
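Sketching it out (the function name here is made up; cachetools also has an LFUCache if we'd rather evict by frequency than recency):

import os

from cachetools import LRUCache, cached

# Cache size from the documented environment variable; the variable
# name and default here are placeholders.
MAX_ENTRIES = int(os.environ.get('PREMAILER_MAX_CACHE_ENTRIES', 1000))

@cached(cache=LRUCache(maxsize=MAX_ENTRIES))
def parse_css(css_body):
    # Stand-in for premailer's real CSS-parsing helpers.
    return css_body.strip()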

Owner

peterbe commented Oct 8, 2018

By the way, see #203. Watch out for merge conflicts if you rebase this.

@peterbe peterbe mentioned this pull request Oct 9, 2018
Owner

peterbe commented Oct 9, 2018

cachetools is awesome. I think we should continue this effort over in #206.

Owner

peterbe commented Nov 26, 2018

#206 landed.

@peterbe peterbe closed this Nov 26, 2018