Thursday, November 3, 2011

Re: Caching at model class level

On Thu, Nov 3, 2011 at 2:22 PM, Thomas Guettler <hv@tbz-pariv.de> wrote:
> Hi,
>
> I am trying to reduce the number of DB queries in my app.
>
> There is a model which almost never changes. It is like a "type of ticket"
> in a trouble ticket system.
>
> On one page there are seven SQL queries (SELECT .... FROM ticket_type where id=123), which of course always return
> the same result.
>
> I want to cache the objects:
>
> t1=TicketType.objects.get(id=123)
> t2=TicketType.objects.get(id=123)
>
> t1 and t2 should be the identical Python object, not just objects containing the same data.
>
> Has anyone done this before?
>
> Here is my first version:
>
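> import threading
> from django.db import models
>
> class ThreadLocalQueryset(models.query.QuerySet):
>    _threadlocal=threading.local() # The cache is at class-level. It survives a request.
>    def get(self, **kwargs):
>        # threading.local attributes are per thread, so each thread creates its own dict on first use.
>        cache=getattr(self._threadlocal, 'cache', None)
>        if cache is None:
>            cache=self._threadlocal.cache=dict()
>        kwargs_tuple=tuple(sorted(kwargs.items())) # Key does not depend on argument order.
>        obj=cache.get(kwargs_tuple)
>        if obj is not None:
>            return obj
>        obj=models.query.QuerySet.get(self, **kwargs)
>        cache[kwargs_tuple]=obj
>        return obj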
>
> class ThreadLocalManager(models.Manager):
>    use_for_related_fields = True
>    def get_query_set(self):
>        return ThreadLocalQueryset(self.model, using=self._db)
>
> class TicketType(models.Model):
>    objects=ThreadLocalManager()
>
> If there were many TicketTypes, the interpreter would use more and more memory, but there are only a few.
>
> Feedback welcome,
>  Thomas Güttler
>

Hey

I wouldn't use this approach: threadlocals are the wrong place to
cache. The main problem is that the cache is per thread, which means
you must have n caches for n threads, each holding its own copy of the
same objects. This is an inefficient use of memory; it would be better
to cache at a shared level, e.g. memcached, redis, etc.
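A small pure-Python sketch of the per-thread point, runnable without Django. query_ticket_type() is a hypothetical stand-in for the database query: every thread misses its own cache once, so n threads run the same query n times and each keeps its own copy of the result.

```python
import threading

# Hypothetical stand-in for "SELECT .... FROM ticket_type where id=123".
db_hits = []
hits_lock = threading.Lock()

def query_ticket_type(pk):
    with hits_lock:
        db_hits.append(pk)  # record every trip to the "database"
    return {"id": pk, "name": "Hardware"}

_local = threading.local()

def get_ticket_type(pk):
    # Each thread lazily creates its own dict; other threads never see it.
    cache = getattr(_local, "cache", None)
    if cache is None:
        cache = _local.cache = {}
    if pk not in cache:
        cache[pk] = query_ticket_type(pk)
    return cache[pk]

N_THREADS = 4
barrier = threading.Barrier(N_THREADS)

def worker():
    get_ticket_type(123)   # first lookup in this thread: cache miss
    barrier.wait()         # keep all threads alive at the same time
    get_ticket_type(123)   # second lookup: served from this thread's cache

threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(db_hits))  # 4: one query (and one cached copy) per thread
```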

Secondly, this caching is very naïve. Entries are never evicted or
invalidated, so any updates to the underlying table will not be
reflected in the cached versions. Different threads could then cache
different results for the same query, and you will end up with
heisenbugs: bugs whose behaviour changes the closer you examine them.
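The staleness problem can be reproduced without Django. In this sketch the table dict and get_name() are hypothetical stand-ins that mimic a per-thread cache with no eviction: the main thread caches a row, the row is then updated, and a thread started afterwards sees the new value while the main thread keeps returning the old one.

```python
import threading

# Hypothetical stand-in for the ticket_type table: primary key -> name.
table = {123: "Hardware"}

_local = threading.local()

def get_name(pk):
    """Per-thread cache whose entries are never evicted."""
    cache = getattr(_local, "cache", None)
    if cache is None:
        cache = _local.cache = {}
    if pk not in cache:
        cache[pk] = table[pk]  # "query" the table only on a miss
    return cache[pk]

before = get_name(123)   # main thread caches "Hardware"
table[123] = "Network"   # someone edits the row

seen_by_new_thread = []
t = threading.Thread(target=lambda: seen_by_new_thread.append(get_name(123)))
t.start()
t.join()

after = get_name(123)    # main thread still returns its stale entry
print(before, after, seen_by_new_thread[0])  # Hardware Hardware Network
```

Which answer a request gets depends on which thread happens to serve it, which is exactly what makes such bugs hard to pin down.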

You can fix all these issues by using a real cache engine like
memcached and setting appropriate timeouts. Because every thread reads
from the same store, items retrieved from the cache will be consistent,
even amongst different threads. You can use the query from the
queryset as the cache key.
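A rough sketch of the shared-cache approach, assuming a cache object with Django's low-level get(key, default) / set(key, value, timeout) interface. query_cache_key(), cached_query() and DictCache are hypothetical helpers, not part of Django. The SQL text is hashed because memcached keys are limited to 250 bytes and may not contain whitespace or control characters; in Django, one way to obtain that text would be str(queryset.query). DictCache only exists so the example runs without a memcached server.

```python
import hashlib

_MISSING = object()  # sentinel so a cached None is not mistaken for a miss

def query_cache_key(sql, prefix="ticket_type"):
    """Turn a SQL string into a short, whitespace-free cache key."""
    digest = hashlib.sha1(sql.encode("utf-8")).hexdigest()
    return "%s:%s" % (prefix, digest)

def cached_query(cache, sql, run_query, timeout=300):
    """Return the result for `sql`, consulting the shared cache first.

    `cache` needs get(key, default) and set(key, value, timeout);
    `run_query` performs the real database hit on a miss.
    """
    key = query_cache_key(sql)
    value = cache.get(key, _MISSING)
    if value is _MISSING:
        value = run_query()
        cache.set(key, value, timeout)  # entry expires after `timeout` seconds
    return value

class DictCache:
    """Hypothetical in-memory stand-in for a memcached-backed cache (ignores timeouts)."""
    def __init__(self):
        self.data = {}
    def get(self, key, default=None):
        return self.data.get(key, default)
    def set(self, key, value, timeout=None):
        self.data[key] = value

calls = []
def fetch_ticket_type():
    calls.append(1)  # counts real "database" hits
    return {"id": 123, "name": "Hardware"}

shared = DictCache()
sql = "SELECT id, name FROM ticket_type WHERE id = 123"  # illustrative SQL text
t1 = cached_query(shared, sql, fetch_ticket_type)
t2 = cached_query(shared, sql, fetch_ticket_type)  # served from the cache
```

One consequence for the original question: Django's cache backends pickle stored values, so with a real backend two lookups return equal but distinct objects (t1 == t2, but not t1 is t2). Newer Django versions also provide cache.get_or_set(), which wraps the same get-then-set pattern.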

Cheers

Tom

--
You received this message because you are subscribed to the Google Groups "Django users" group.
