Caching

When you code a dynamic application, you will soon face its trade-off: it is dynamic.

Each time a user makes a request, your server performs all sorts of computations – database queries, template rendering and so on – to build the final response. For most web applications this is not a big deal, but when your application grows large and heavily visited, you will want to limit the overhead on your machines.

That's where caching comes in.

The main idea behind caching is simple: we store the result of an expensive computation somewhere, so we can avoid repeating the computation when possible. But, frankly speaking, designing a good caching scheme is mainly a PITA, since it involves many complex decisions about what to store, where to store it, and so on.

So how can weppy help you with this? It provides some tools out of the box that let you focus your development energy on what to cache rather than on how to cache it.

Configuring cache

The caching system in weppy consists of a single class named Cache. Consequently, the first step in configuring caching in your application is to create an instance of this class:

from weppy.cache import Cache

cache = Cache()

By default, weppy stores your cached content in the RAM of your machine, but you can also use the disk or Redis as your storage system. Let's see these three handlers in detail.

RAM cache

As we just saw, this is the default cache mechanism of weppy. Initializing a Cache instance without arguments is the same as using the RamCache handler:

from weppy.cache import Cache, RamCache

cache = Cache(ram=RamCache())

The RamCache also accepts some parameters you might take advantage of:

| parameter | default value | description |
| --- | --- | --- |
| prefix | | allows to specify a common prefix for caching keys |
| threshold | 500 | sets a maximum number of objects stored in the cache |
| default_expire | 300 | sets a default expiration (in seconds) for stored objects |
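For instance, using illustrative values, you might configure the handler to hold more objects with a longer default expiration:

from weppy.cache import Cache, RamCache

# a minimal sketch: up to 1000 objects, expiring after 10 minutes by default
cache = Cache(ram=RamCache(prefix='main:', threshold=1000, default_expire=600))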

Note on multi-processing: when you store data in the RAM cache, you are actually using the memory of the Python process. If you're running your web application with multiple processes/workers, every process will have its own cache, and the data you store won't be available to the others.
If you need a cache shared between processes, use the disk or Redis handlers.

Disk cache

The disk cache is slower than the RAM and Redis ones, but if you need to cache large amounts of data, it fits the role perfectly. Here is how to use it:

from weppy.cache import Cache, DiskCache

cache = Cache(disk=DiskCache())

The DiskCache class accepts some parameters too:

| parameter | default value | description |
| --- | --- | --- |
| cache_dir | 'cache' | allows to specify the directory in which data will be stored |
| threshold | 500 | sets a maximum number of objects stored in the cache |
| default_expire | 300 | sets a default expiration (in seconds) for stored objects |
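For example, to keep cached data in a directory of your choice (an illustrative sketch):

from weppy.cache import Cache, DiskCache

# store cached objects under 'app_cache' instead of the default 'cache' directory
cache = Cache(disk=DiskCache(cache_dir='app_cache', threshold=1000))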

Redis Cache

Redis is quite a good system for caching: it is really fast – really – and if you're running your application with several workers, your data will be shared between processes. To use it, just initialize the Cache class with the RedisCache handler:

from weppy.cache import Cache, RedisCache

cache = Cache(redis=RedisCache(host='localhost', port=6379))

As we saw with the other handlers, the RedisCache class accepts some parameters too:

| parameter | default value | description |
| --- | --- | --- |
| host | 'localhost' | the host of the Redis backend |
| port | 6379 | the port of the Redis backend |
| db | 0 | the database number to use on the Redis backend |
| prefix | 'cache:' | allows to specify a common prefix for caching keys |
| default_expire | 300 | sets a default expiration (in seconds) for stored objects |
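For example, to point the handler at a specific Redis database with a custom key prefix (illustrative values):

from weppy.cache import Cache, RedisCache

# use database 2 on a local Redis and prefix every key with 'myapp:'
cache = Cache(redis=RedisCache(host='localhost', port=6379, db=2, prefix='myapp:'))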

Using multiple systems together

As you have probably guessed, you can use multiple caching systems together. Let's say you want to use the three systems we just described. You can do it simply:

from weppy.cache import Cache, RamCache, DiskCache, RedisCache

cache = Cache(
    ram=RamCache(),
    disk=DiskCache(),
    redis=RedisCache()
)

You can also tell weppy which handler should be used when none is specified, thanks to the default parameter:

cache = Cache(m=RamCache(), r=RedisCache(), default='r')
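With this configuration, plain calls on the cache instance go to the Redis handler, while each handler stays reachable under the name you registered it with – a sketch, assuming the handlers are exposed as attributes named after their keywords (m and r here are just illustrative names; the next section shows the same pattern with the standard names):

def my_f():
    return "expensive result"

# plain call: stored with the default 'r' (Redis) handler
value = cache('my_key', my_f, 300)
# explicit call on the RAM handler registered as 'm'
value = cache.m('my_key', my_f, 300)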

Basic usage

The quickest way to use the cache is to apply it to a simple action, such as a database select or a computation. Let's say, for example, that you have a blog and a function that exposes the last ten posts:

@app.route("/last")
def last():
    rows = Post.all().select(orderby=~Post.date, limitby=(0, 10))
    return dict(posts=rows)

Now, since the performance bottleneck here is the database call, you can limit the overhead by caching the select result for 30 seconds, decreasing the number of calls to your database:

@app.route("/last")
def last():
    def _get():
        return Post.all().select(orderby=~Post.date, limitby=(0, 10))
    return dict(posts=cache('last_posts', _get, 30))

Here's how it works: you encapsulate the action you want to cache in a function, then call your cache instance with a key, the function, and the number of seconds the result should be stored. weppy takes care of the rest.
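The same pattern works for any expensive operation, not just database selects. A minimal sketch:

def _heavy_sum():
    # simulate an expensive computation
    return sum(i * i for i in range(1000000))

# computed at most once every 60 seconds
result = cache('heavy_sum', _heavy_sum, 60)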

– OK, dude. What if I have multiple handlers? Where does weppy store the result?
– you can choose that

As we saw before, by default weppy stores your cached content using the handler chosen as default. But you can also pick the handler to store data with:

cache = Cache(
    ram=RamCache(),
    disk=DiskCache(),
    redis=RedisCache(),
    default='ram'
)
# uses the default handler ('ram')
v_ram = cache('my_key', my_f, my_time)
# explicitly pick a handler
v_ram = cache.ram('my_key', my_f, my_time)
v_disk = cache.disk('my_key', my_f, my_time)
v_redis = cache.redis('my_key', my_f, my_time)

Decorating methods

New in version 1.2

weppy's cache can also be used as a decorator. For example, we can rewrite the above example as follows:

@cache(duration=30)
def last_posts():
    return Post.all().select(orderby=~Post.date, limitby=(0, 10))

@app.route("/last")
def last():
    return dict(posts=last_posts())

and the result would be the same. The notation, in case you want to specify the handler to use, is the same:

# use redis handler
@cache.redis()
# use ram handler
@cache.ram()

When using the decorator notation, weppy uses the arguments you pass to the decorated method to build the cache key, so different arguments produce different cached results. This means that if we decorate a method that accepts arguments:

@cache()
def cached_method(a, b, c='foo', d='bar'):
    # some expensive code producing a result
    ...

then weppy will cache separate contents for cached_method(1, 2, c='a') and cached_method(1, 3, c='b').
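For instance, with a hypothetical cached_sum method, every distinct set of arguments gets its own cache entry:

@cache(duration=60)
def cached_sum(a, b):
    return a + b

cached_sum(1, 2)  # computed and stored
cached_sum(1, 2)  # served from the cache
cached_sum(2, 3)  # different arguments: computed and stored separately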

Caching routes

New in version 1.2

Sometimes you may need to cache an entire response from your application. weppy provides the Cache.response decorator for that. Let's rewrite the example above: this time, instead of caching just the database select, we will cache the entire page weppy produces from our route:

@app.route("/last")
@cache.response()
def last():
    posts = Post.all().select(orderby=~Post.date, limitby=(0, 10))
    return dict(posts=posts)

The main difference from the examples above is that, when cached content is available, nothing inside your route and template code gets executed; instead, weppy returns the final response body and its headers straight from the cache.

Note: this means that nothing contained in the pipe, on_pipe_success and on_pipe_failure methods of the pipes in your route's pipeline will be executed either. If you need code to run even on cached routes, use the open and close methods of the pipes.

Mind that weppy caches contents only for GET and HEAD requests that return a 200 response code. This is intended to avoid unwanted caching behavior in your application.

The Cache.response method also accepts some parameters you might want to use:

| parameter | default value | description |
| --- | --- | --- |
| duration | 'default' | the duration (in seconds) the cached content should be considered valid |
| query_params | True | tells weppy to consider the request's query parameters to generate different cached contents |
| language | True | tells weppy to consider the client's language to generate different cached contents |
| hostname | False | tells weppy to consider the request's hostname to generate different cached contents |
| headers | [] | an additional list of headers weppy should use to generate different cached contents |
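For example, to cache a route for 30 seconds and serve the same content regardless of query parameters (a sketch built on the parameters listed above):

@app.route("/last")
@cache.response(duration=30, query_params=False)
def last():
    posts = Post.all().select(orderby=~Post.date, limitby=(0, 10))
    return dict(posts=posts)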

Caching entire modules

In some cases, you might need to cache all the routes contained in an application module. In order to achieve this, you can use the cache parameter when you define your module:

mod = app.module(__name__, 'mymodule', cache=cache.response())
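Every route defined on mod is then cached as if it were decorated with cache.response directly. For instance, with a hypothetical route:

@mod.route("/about")
def about():
    # served from the cache like any route decorated with cache.response
    return dict(title="About us")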

Low-level cache API

Changed in version 1.2

As we saw in the sections above, the common usage of the cache is to call the Cache instance with a callable object that produces the contents to cache when they are not already available.

In all the cases where you need to operate on the cache differently, you can use the methods exposed by the Cache instance and its handlers. Let's see them in detail.

Get contents

Every time you need to access contents from cache, you can use the get method:

value = cache.get('key')

If no contents are available, this method will return None.
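The same method is also available on the single handlers, in case you want to read from a specific one (assuming a configuration with a redis handler, as in the examples above):

# read from the default handler
value = cache.get('key')
# read explicitly from the Redis handler
value = cache.redis.get('key')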

Set contents

When you need to manually set contents in cache, you can use the set method:

cache.set('key', 'value', duration=300)

Note: if you want to store the result of a callable object, you should invoke it yourself.

You can implement a manual check-and-set policy using get and set methods:

value = cache.get('key')
if value is None:
    # not in cache (or expired): compute and store the value
    value = 'somevalue'
    cache.set('key', value, duration=300)

Get or set

The last example can be written in a compact way using the get_or_set method:

value = cache.get_or_set('key', 'somevalue', duration=300)

Note: as we saw for the set method, if you want to store the result of a callable object, you should invoke it yourself.
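For instance, rewriting the select from the Basic usage section with get_or_set, the function has to be invoked when passed:

def _get_last_posts():
    return Post.all().select(orderby=~Post.date, limitby=(0, 10))

# note the parentheses: the value, not the callable, is stored;
# Python evaluates the argument eagerly, so _get_last_posts() runs on every call
posts = cache.get_or_set('last_posts', _get_last_posts(), duration=30)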

Clearing contents

Whenever you need to manually delete contents from cache, you can use the clear method:

cache.clear('key')

And if you need to clear the entire cache you can invoke the clear method without arguments.
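For example, to empty the whole cache:

cache.clear()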

Note: on Redis, a key containing * is treated as a pattern, meaning that all existing keys matching that pattern will be cleared. So calling cache.clear('user*') will delete the contents of all keys starting with user.