Tastypie caching kinda sucks
I'm using tastypie on Google's App Engine, which prevents me from using Varnish, as suggested in the tastypie caching docs.
Tastypie's caching mechanism is based on Django's caching mechanism, which is mostly designed for hosting web pages rather than APIs. Django uses a single cache timeout for both memcache and HTTP caching of views, so the two are tied together when using Django's cache middleware.
As an example of the tastypie/Django interaction:
Django's caching middleware looks for the Cache-Control: max-age=X header to decide how long to cache a view. Tastypie's NoCache() class doesn't specify a max-age, so the default for the Django installation is used. This means that the serialized form of the query will be cached by Django in memcache, as well as by the browser and proxies via HTTP caching.
As a default setting for APIs, this kinda sucks because you can't see the latest updates until the cache expires.
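To make that interaction concrete, here is a simplified model of the timeout decision described above. It is a sketch, not Django's actual code: `resolve_timeout` and `DEFAULT_TIMEOUT` are hypothetical names, and the rule that max-age=0 skips caching is an assumption about how the middleware behaves.

```python
import re

DEFAULT_TIMEOUT = 600  # hypothetical site-wide default, in seconds


def resolve_timeout(cache_control, default=DEFAULT_TIMEOUT):
    """Return the number of seconds to cache for, or None to skip caching."""
    match = re.search(r"\bmax-age=(\d+)", cache_control or "")
    if match is None:
        return default      # no max-age given: the site-wide default applies
    max_age = int(match.group(1))
    return max_age or None  # assumed rule: max-age=0 means "do not cache"


print(resolve_timeout("no-cache"))             # 600
print(resolve_timeout("no-cache, max-age=0"))  # None
print(resolve_timeout("max-age=30"))           # 30
```

Under this model, a response that only says "no-cache" still picks up the default timeout, which is the surprising part.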
Tastypie's SimpleCache() is also based on cache timeouts, and there's no functionality for invalidating caches.
Tastypie has some support for etags. This is beneficial for saving bandwidth by sending 304 Not Modified responses. However, it still requires fetching from the database and serializing the results to compare against the etag hash, so you still pay the full server-side processing time and latency on every request.
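A minimal sketch of why an etag check alone doesn't save any server work, assuming the etag is a hash of the serialized body. The names `make_etag`, `respond`, and `load_colors` are made up for illustration and are not tastypie APIs.

```python
import hashlib
import json


def make_etag(body):
    """Hash the serialized response body into a quoted etag value."""
    return '"%s"' % hashlib.sha256(body.encode("utf-8")).hexdigest()


def respond(if_none_match, load_from_db):
    """Return (status, etag, body); the body is empty on a 304."""
    # The fetch and serialization run on every request, because the
    # hash can only be computed from the serialized body.
    body = json.dumps(load_from_db(), sort_keys=True)
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, etag, ""
    return 200, etag, body


db_hits = []  # records every simulated database read


def load_colors():
    db_hits.append(1)
    return [{"id": 1, "name": "red"}]


status, etag, body = respond(None, load_colors)
status2, _, body2 = respond(etag, load_colors)
print(status, status2, len(db_hits))  # 200 304 2
```

The second request gets a 304 and sends no body, but the database was still read twice.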
In an API world, it seems to me that the preferred behavior would be:
1. Show the latest data as soon as possible. This generally means minimizing HTTP caching.
2. Serve requests from memcache if possible to minimize latency and database load. This generally means maximizing the time responses live in memcache.
3. As a corollary to #1 and #2, aggressively invalidate items in memcache when resources change. This allows setting a huge timeout in memcache while still showing recent results.
4. Use etags to minimize bandwidth, but combine them with #2 to minimize hits to the database.
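The four points can be sketched together in a few lines. This is a toy model under stated assumptions, not tastypie code: a plain dict stands in for memcache, and `handle_get`, `invalidate`, and `fetch_from_db` are hypothetical names.

```python
import hashlib
import json

store = {}     # stands in for memcache
db_reads = []  # records every simulated database read


def fetch_from_db(path):
    db_reads.append(path)
    return {"path": path, "revision": len(db_reads)}


def handle_get(path, if_none_match=None):
    """Return (status, headers, body) for a GET on `path`."""
    entry = store.get(path)
    if entry is None:
        body = json.dumps(fetch_from_db(path), sort_keys=True)
        etag = '"%s"' % hashlib.sha256(body.encode("utf-8")).hexdigest()
        entry = store[path] = (etag, body)  # point 2: kept with no expiry
    etag, body = entry
    # Point 1: tell browsers and proxies not to reuse the response.
    headers = {"Cache-Control": "no-cache, max-age=0", "ETag": etag}
    if if_none_match == etag:  # point 4: compared without a database read
        return 304, headers, ""
    return 200, headers, body


def invalidate(path):
    """Point 3: drop the entry as soon as the resource changes."""
    store.pop(path, None)


url = "/api/v1/carcolors/?id__in=[1,2,3]"
status, headers, body = handle_get(url)
status_again, _, _ = handle_get(url, headers["ETag"])
invalidate(url)
status_after, headers_after, _ = handle_get(url, headers["ETag"])
print(status, status_again, status_after, len(db_reads))  # 200 304 200 2
```

The 304 is answered entirely from the store, and the database is read again only after the explicit invalidation.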
Invalidation is probably the most difficult and crucial step. Tastypie doesn't cache queries because it's too hard to predict when resources need to be invalidated (but then SimpleCache doesn't do any invalidating at all, even for resource fetches).
I've found that with my mobile/web e-commerce app, the types of queries that are run are very predictable because I control the front end.
For example, the same id__in lookup could be written in a practically unlimited number of ways, each of which is a different URL and therefore a different cache entry:
/api/v1/carcolors/?id__in=[1,2,3]
/api/v1/carcolors/?id__in=[2,1,3]
/api/v1/carcolors/?id__in=3,2,1
However, because I control the front end, I know that I'd only use the first form.
Also, since I have a finite number of cars, the number of color sets that I'd query is rather limited.
So when I update a carcolor resource, I know there is a limited number of id__in queries that MUST be updated, and I can manually invalidate those.
In this case, I must manually write the invalidate code for each resource class, but it's generally fairly limited, and if the boilerplate tools are available for invalidation, it's pretty easy.
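As a rough sketch of that idea, the set of list URLs to invalidate can be computed from the color sets the front end is known to request. The helper names and the `car_colors` data below are hypothetical, and the URL format assumes the front end always sends the bracketed form in a fixed order.

```python
def list_url(ids):
    """Build the one URL form the front end is assumed to send."""
    return "/api/v1/carcolors/?id__in=[%s]" % ",".join(str(i) for i in ids)


def urls_to_invalidate(color_id, color_sets):
    """Return every cached list URL whose id set contains `color_id`."""
    urls = [list_url([color_id])]
    for ids in color_sets:
        url = list_url(ids)
        if color_id in ids and url not in urls:
            urls.append(url)
    return urls


# Hypothetical data: the color ids each car offers, in front-end order.
car_colors = {"sedan": [1, 2, 3], "coupe": [2, 4], "van": [1, 2, 3]}
targets = urls_to_invalidate(2, car_colors.values())
for url in targets:
    print(url)
```

Two cars sharing the same color set produce the same URL, so it is only invalidated once.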
Here's an example Resource class.
    class SimpleNoHTTPCache(SimpleCache):
        """
        Tastypie's SimpleCache, with HTTP caching disabled and invalidate exposed
        """
        def cache_control(self):
            return {
                'no_cache': True,
                'max_age': 0,
            }

        def invalidate(self, key):
            return cache.incr_version(key)


    class OptionResource(CachedEtagJsonResource):
        class Meta(EatPublicMetaBase):
            queryset = Option.objects.filter(deleted=False)
            filtering = {
                'id': ['in'],
                'choices': ['in'],
            }
            excludes = ['deleted', 'undelete']
            cache = SimpleNoHTTPCache(timeout=settings.THIRTY_DAYS, varies=[])

        def get_list(self, request, **kwargs):
            if len(request.GET) == 1 and request.GET.get("id__in"):
                # This represents the cached version
                return super(OptionResource, self)._get_list(request, **kwargs)
            else:
                # Grandparent is uncached
                return JsonModelResource.get_list(self, request, **kwargs)

        def invalidate(self, **kwargs):
            optionid = kwargs.get("optionid")
            menuitems = kwargs.get("menuitems")
            invalidated_strings = {}
            if optionid:
                # Invalidate cache.
                tmpstr = "/api/v1/option/?id__in=[" + str(optionid) + "]"
                self._invalidate_cache('serializedlist', **{"id__in": [unicode("[%s]" % optionid)]})
                self._invalidate_cache('serializedlist', **{"id__in": [unicode(optionid)]})
                self._invalidate_cache('serializeddetails', **{"pk": optionid})
                invalidate_tasty_cache(tmpstr, 'etag')
                invalidated_strings[tmpstr] = True
                if not menuitems:
                    menuitems = MenuItem.objects.filter(options=optionid, deleted=False)
            for menuitem in menuitems:
                tmpstr = "/api/v1/option/?id__in=[" + ",".join([str(i) for i in menuitem.options]) + "]"
                if tmpstr not in invalidated_strings:
                    self._invalidate_cache('serializedlist', **{"id__in": [u"[%s]" % ",".join([str(i) for i in menuitem.options])]})
                    invalidate_tasty_cache(tmpstr, 'etag')
                    invalidated_strings[tmpstr] = True
- CachedEtagJsonResource is my base class that generally handles all the caching of serialized data and etags. It's just slightly modified from the basic tastypie ModelResource class. It derives from JsonModelResource, which is a standard tastypie ModelResource class with the format forced to json.
- Each resource class needs to define get_list() to specify which queries should be cached. Other queries pass through and are uncached.
- SimpleNoHTTPCache() is similar to tastypie's SimpleCache(), but with HTTP caching disabled and an invalidate() method exposed.
- My API isn't fully RESTful, and my resources are modified from other calls, so I expose an invalidate() call on the resource. If you update your resources through the POST/PUT/PATCH handlers, you'd need some logic there to call invalidate().
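For the last point, a minimal sketch of calling invalidate() from a write handler might look like this. `CachedResource` is a toy stand-in, not a tastypie class: dicts play the roles of the database and memcache, and `put` is a hypothetical handler name.

```python
class CachedResource:
    """Toy resource: a dict-backed store plus a cache of serialized rows."""

    def __init__(self):
        self.rows = {}   # stands in for the database
        self.cache = {}  # stands in for memcache

    def get(self, pk):
        if pk not in self.cache:
            self.cache[pk] = repr(self.rows[pk])  # "serialize" on a miss
        return self.cache[pk]

    def invalidate(self, pk):
        self.cache.pop(pk, None)

    def put(self, pk, data):
        """Write handler: save first, then drop the stale cache entry."""
        self.rows[pk] = data
        self.invalidate(pk)


options = CachedResource()
options.put(7, {"name": "extra cheese"})
first = options.get(7)
options.put(7, {"name": "no cheese"})
second = options.get(7)
print(second)  # {'name': 'no cheese'}
```

Saving before invalidating matters here: invalidating first would let a concurrent read repopulate the cache with the old row.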
If you've read all the way here, I'd love feedback.