Monday, May 29, 2017

Re: Should I store offline calculation results in the cache?

Hello,

thanks to everyone who replied. Here are some conclusions of mine:

Today's file-based cache code seems to suffer from the same problems it had 7 years ago. Every time you call .set() on the cache, it asks the OS for a list of files just to count them (for the purpose of culling). This is slow. The culling strategy is to delete a random sample of cache entries. So Russell's comment seems valid today, at least with respect to culling. Of Django's included cache backends, apparently only memcached is suitable for a large cache in production. Redis could be a good idea for adding persistence, but it is non-standard (not included with Django).
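
To make the culling behaviour described above concrete, here is a toy model using only the standard library. It is a simplified sketch and not Django's actual code: the class `ToyFileCache`, the helper `count_missing` and the `.cache` file suffix are all made up for illustration. The defaults of 300 entries and a cull frequency of 3 mirror Django's documented defaults for `MAX_ENTRIES` and `CULL_FREQUENCY`.

```python
import os
import random
import tempfile


class ToyFileCache:
    """Toy stand-in for a file-based cache; NOT Django's real implementation."""

    def __init__(self, directory, max_entries=300, cull_frequency=3):
        self.directory = directory
        self.max_entries = max_entries
        self.cull_frequency = cull_frequency

    def _path(self, key):
        return os.path.join(self.directory, key + ".cache")

    def _list_files(self):
        # A full directory listing, needed only to count the entries.
        return [
            os.path.join(self.directory, name)
            for name in os.listdir(self.directory)
            if name.endswith(".cache")
        ]

    def _cull(self):
        files = self._list_files()
        if len(files) < self.max_entries:
            return
        # Delete a random sample, regardless of age or importance.
        for path in random.sample(files, len(files) // self.cull_frequency):
            os.remove(path)

    def set(self, key, value):
        self._cull()  # runs on every single write
        with open(self._path(key), "w") as f:
            f.write(value)

    def get(self, key, default=None):
        try:
            with open(self._path(key)) as f:
                return f.read()
        except FileNotFoundError:
            return default


def count_missing(n_items, max_entries):
    """Store n_items distinct entries, then count how many have vanished."""
    with tempfile.TemporaryDirectory() as tmp:
        cache = ToyFileCache(tmp, max_entries=max_entries)
        for i in range(n_items):
            cache.set(f"item{i}", "result")
        return sum(cache.get(f"item{i}") is None for i in range(n_items))
```

Under these assumptions, `count_missing(400, 300)` reports lost entries, while `count_missing(400, 1000)` reports none, which matches the symptom of items silently disappearing once the cache grows past its limit.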

Redis is anyway not appropriate for my use case because I don't need the speed, so storing the information in RAM, which has a larger cost than the filesystem, is suboptimal.

The fact that a cache knows how to get the information if it doesn't have it is an interesting observation that I hadn't thought about, but it appears to be true for most uses of "cache" that I can think of (it doesn't apply to write caches). Therefore I'm using the cache for a different purpose than the one for which it was designed, which can create all sorts of problems (such as a new administrator, or even an old one, not knowing or forgetting that they can't just delete the cache). However, I will take the risk and continue using it for a while: for these two small projects, implementing a more complicated solution, or adding another component and thus raising the bar for whoever replaces me, isn't worth it.

Antonis Christofides  http://djangodeployment.com

On 2017-05-27 12:25, Antonis Christofides wrote:

Hello all,

I have an application that calculates and tells you whether a specific crop at a specific piece of land needs to be irrigated, and how much. The calculation lasts for a few seconds, so I'm doing it offline with Celery. Every two hours new meteorological data comes in and all the pieces of land are recalculated.

The question is where to store the results of the calculation. I thought that since they are re-creatable, the cache would be the appropriate place. However, there is a difference with the more common use of the cache: they are re-creatable, but they are also necessary. You can't just go and delete any item in the cache. This will cripple the website, which expects to find the calculation results in the cache. Viewing something on the site will never trigger a recalculation (and if I make it trigger, it will be a safety procedure for edge cases and not the normal way of doing things). The results must also survive reboots, so I chose the file-based cache.
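
The "safety procedure for edge cases" mentioned above could look roughly like the following sketch. Everything in it is hypothetical: `DictCache` is an in-memory stand-in that only mimics the `get()`/`set()` calls of a cache backend, and `get_result`, `recalculate` and the `irrigation:` key prefix are names invented for illustration. The normal path is still that the offline job fills the cache; the recalculation only runs if an entry has gone missing.

```python
_MISSING = object()  # sentinel, so that a stored None is not mistaken for a miss


class DictCache:
    """In-memory stand-in exposing get()/set() like a cache backend."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


def get_result(cache, field_id, recalculate):
    """Return the stored result, recalculating only if it is missing."""
    key = f"irrigation:{field_id}"
    result = cache.get(key, _MISSING)
    if result is _MISSING:
        # Edge case only: the entry was culled or deleted.
        result = recalculate(field_id)
        cache.set(key, result)
    return result
```

Since the calculation lasts a few seconds, a view that hits this fallback would be slow for that one request, which is why it is only a safety net and not the normal way of doing things.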

I didn't know about culling, so when the pieces of land grew to 100, and the items in the cache to 400 (4 items need to be stored for each piece of land), I spent a few hours trying to find out what the heck is going on. I solved the problem by tweaking the culling parameters. However all this has raised a few issues:

  1. The filesystem cache can't grow too much because of issue 11260, which is marked wontfix. According to Russell Keith-Magee,
"the filesystem cache is intended as an easy way to test caching, not as a serious caching strategy. The default cache size and the cull strategy implemented by the file cache should make that obvious. If you need a cache capable of holding 100000 items, I strongly recommend you look at memcache. If you insist on using the filesystem as a cache, it isn't hard to subclass and extend the existing cache."
If these comments are correct, then the documentation needs some fixing, because not only does it not say that the filesystem cache is not for serious use, but it implies the opposite:
"Without a really compelling reason, ... you should stick to the cache backends included with Django. They've been well-tested and are easy to use."
Is Russell not entirely correct perhaps, or is the documentation? Or am I missing something?
  2. In the end, is it a bad idea to use the cache for this particular case? I also have a similar use case in an unrelated app: a page that needs about a minute to render. Although I've implemented a quick-and-dirty solution of increasing the web server's timeout and caching the page, I guess the correct way would be to produce that page offline with Celery or so. Where would I store such a page if not in the cache?
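
For reference, the culling parameters mentioned before the list are set in the `OPTIONS` of the cache configuration. The fragment below is only a sketch: the `LOCATION` path and the value of `MAX_ENTRIES` are assumptions chosen for illustration, not the values actually used in this project.

```python
# settings.py fragment (illustrative values only)
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.filebased.FileBasedCache",
        "LOCATION": "/var/tmp/django_cache",  # hypothetical path
        "OPTIONS": {
            # Culling starts once this many entries exist (Django's default is 300).
            "MAX_ENTRIES": 10000,
            # On culling, 1/CULL_FREQUENCY of the entries are removed (default 3).
            "CULL_FREQUENCY": 3,
        },
    }
}
```

Raising `MAX_ENTRIES` well above the number of stored items avoids culling altogether, but it does not remove the cost of listing the cache directory on every write.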
--   Antonis Christofides  http://djangodeployment.com
