I am stumped. I am trying to process a lot of data (2 million records and
up). Once I have a QuerySet, I immediately feed it to a
queryset_iterator that fetches results in chunks of 1000 rows each. I
use MySQL, and the DB server is on another machine (so I don't think it
is MySQL caching).
I kick off 4 sub-processes using multiprocessing.Process, but even if
I keep it to 1, I eventually run out of memory.
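For context, the workers are subclasses of multiprocessing.Process whose
run() method is shown further down; this is a minimal sketch of how they
get started (the MessageWorker name here is just illustrative, not my
actual class):

import multiprocessing

class MessageWorker(multiprocessing.Process):
    # Illustrative stand-in for my actual worker; the real run() is below.
    def run(self):
        pass

if __name__ == '__main__':
    workers = [MessageWorker() for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()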
RAM usage just seems to increase steadily. When I finish processing a
result set, I assign a new result set to the same variable, and I would
expect gc to get my memory back.
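To put a rough number on "steadily increases", this is the kind of check
I run between chunks (a minimal sketch; the log_rss helper is only for
illustration and is not part of my actual job):

import resource

def log_rss(label):
    # ru_maxrss is reported in kilobytes on Linux and only ever grows,
    # which matches what I see from chunk to chunk.
    rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print('%s: max RSS %d KB' % (label, rss_kb))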
Any ideas?
def run(self):
    .....
    # Build the queryset for today's messages and iterate it in chunks.
    messages = queryset_iterator(MessageDAO.get_all_messages_for_date(now))
    self.process_messages(job, messages)
    # Drop the reference so the old result set can be collected.
    messages = None
    gc.collect()
    ...
def queryset_iterator(queryset, chunksize=1000):
    # Walk the queryset in primary-key order, chunksize rows at a time,
    # so the whole result set is never held in memory at once.
    pk = 0
    last_pk = queryset.order_by('-pk')[0].pk
    queryset = queryset.order_by('pk')
    while pk < last_pk:
        for row in queryset.filter(pk__gt=pk)[:chunksize]:
            pk = row.pk
            yield row
        gc.collect()