Add caching to more functions #198
Conversation
There are so many good things in this PR and I don't want to entirely block your progress, but there is one great option we should consider and that's Python 3's functools.lru_cache. Yes, it means we need a backport for Python 2.
I'm not sure it would work for us right away, unless we do some tricks, like you did, with converting lists to tuples. But functools.lru_cache is wicked smart and highly optimized.
Another very attractive option is to depend on CacheTools. It supports Python 2 and has an LFU implementation too, if that's applicable to us.
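For illustration, the list-to-tuple trick combined with lru_cache could be as small as this Python 3 sketch (the helper and function names are hypothetical; Python 2 would need a backport such as functools32):

```python
import functools

def freeze_lists(func):
    # Hypothetical helper: turn list arguments into tuples so that
    # functools.lru_cache only ever sees hashable arguments.
    @functools.wraps(func)
    def inner(*args):
        return func(*(tuple(a) if isinstance(a, list) else a for a in args))
    return inner

@freeze_lists
@functools.lru_cache(maxsize=1000)
def parse_style_rules(css_body, index):
    ...  # the cached function now receives tuples, never lists
```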
```python
def function_cache(expected_max_entries=1000):
def function_cache():
```
Now is not the time but the correct name ought to be "function_memoize" or something like that.
```python
cached.cache.clear()

if max_cache_entries is None or len(cache) < max_cache_entries:
    cache[hashed] = result
```
It's a bit strange that the caching just decides to stubbornly refuse to add more entries. I've already forgotten what you said in the justification in the PR description. Can you please add a code comment here to explain the reasoning?
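For example, a comment along these lines (wording only a suggestion, based on the justification restated in the PR description below):

```python
# Deliberately stop adding entries once the cache is full rather than
# evicting or clearing: a full cache has already proven it fits in
# memory, and its existing entries keep serving the majority of lookups.
if max_cache_entries is None or len(cache) < max_cache_entries:
    cache[hashed] = result
```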
```python
cache_css_parsing=True,
cache_css_parsing_size=1000,
cache_css_output=True,
cache_css_output_size=1000,
```
I don't think these need to be here at all. They're operational, not functional. They clutter up the code and obscure the options.
The likelihood of someone preferring to set it to only 500 or 10_000 is extremely low, and I think there are much better ways to do that. Like this:
```python
import os

def function_cache():
    ...snip...
    # indicator of a cache miss
    sentinel = object()
    max_cache_entries = int(os.environ.get('PREMAILER_MAX_CACHE_ENTRIES', 1000))

    @functools.wraps(func)
    def inner(*args, **kwargs):
        ...
    return inner
```

That will make the code a lot simpler. It'll give all the power to your use case too.
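Assuming that variable name, a consumer like cppreference-doc could then raise the cap without touching premailer at all, e.g.:

```python
# in the consuming application, before premailer is imported
import os
os.environ.setdefault('PREMAILER_MAX_CACHE_ENTRIES', '10000')
```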
```python
cache_css_output_size: Specifies the size for various CSS output
caches. If set to None, the size of the caches will not be limited.
'''
```
AWESOME!!
```python
cache_css_parsing_size: Specifies the size for various CSS parsing
caches. If set to None, the size of the caches will not be limited.
cache_css_output: Specifies whether to cache the CSS output results.
```
Unlike the cache_*_size options, this one is actually not operational. Let's keep this. Setting this will potentially have an effect on the generated HTML output.
Actually, the more I think about it, we should consider CacheTools. It's just so convenient, and we don't need to do any heavy lifting to deal with the differences between Python 2 and 3. I still think the max cache size should be controlled from a documented environment variable, though.
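A sketch of what that could look like (the decorated function is hypothetical; the environment variable follows the earlier suggestion):

```python
import os
from cachetools import LRUCache, cached

MAX_CACHE_ENTRIES = int(os.environ.get('PREMAILER_MAX_CACHE_ENTRIES', 1000))

@cached(cache=LRUCache(maxsize=MAX_CACHE_ENTRIES))
def parse_style_rules(css_body):
    ...
```

cachetools also ships an LFUCache with the same interface, matching the LFU option mentioned above.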
By the way, see #203

cachetools is awesome. I think we should continue this effort over in #206

#206 landed.
On cppreference-doc we are processing around 4000 comparatively large pages with premailer, which takes a while with the current implementation. After looking into profiler results and making some experimental changes, it turns out that simply caching the results of more functions and increasing the size of the caches speeds up premailer by around 2.5 times.
This PR implements the above-mentioned changes in such a way that the new behavior must be enabled explicitly at runtime; the old behavior remains the default, so existing users are almost entirely unaffected.
The `function_cache` decorator is modified to accept the size of the cache as an argument to the wrapper, so that it can be changed at runtime. The size of the caches may now be controlled via additional parameters to the `Premailer.__init__` method.

Also, `function_cache` is modified not to drop the contents of the cache when it overflows; this is the canonical behavior of caches. A small cache may still be very useful for handling the majority of requests, and since it already reached its maximum size without inducing an out-of-memory condition, staying at the maximum size instead of being reset to zero will likely not induce an out-of-memory condition later either. If users are concerned about an out-of-memory condition arising later, while doing other tasks, we should provide a method to clear the caches explicitly.

Furthermore, `function_cache` is modified to convert `list` arguments to `tuple`s so that they can be hashed; `_HashedSeq` can be retired this way.

Caching has been added to the top functions that I saw in the profiling results. The cache itself has very low overhead, so in the worst case we will only increase memory usage without reducing performance too much.
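A condensed sketch of the behavior described above (the decorator name and signature follow the PR; the body is illustrative, not the exact implementation):

```python
import functools

def function_cache(expected_max_entries=1000):
    def decorator(func):
        cache = {}

        @functools.wraps(func)
        def inner(*args, **kwargs):
            # Convert list arguments to tuples so the key is hashable;
            # this is what allows _HashedSeq to be retired.
            args = tuple(tuple(a) if isinstance(a, list) else a for a in args)
            key = (args, tuple(sorted(kwargs.items())))
            if key in cache:
                return cache[key]
            result = func(*args, **kwargs)
            # When full, keep serving the entries we have instead of
            # dropping everything: the cache has already proven it fits
            # in memory at this size.
            if expected_max_entries is None or len(cache) < expected_max_entries:
                cache[key] = result
            return result

        inner.cache = cache  # exposed so callers can clear it explicitly
        return inner
    return decorator
```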
Finally, Python 2.6 and 3.3 have been removed from .travis.yml, as the upstream testing tools no longer support these Python versions.
Please let me know what you think about this. Thanks for your time!