I am stumped. I am trying to process a lot of data (2 million records and
up). Once I have a QuerySet, I immediately feed it to a
queryset_iterator that fetches results in chunks of 1000 rows each. I
use MySQL, and the DB server is on another machine (so I don't think it
is MySQL caching).
I kick off 4 sub-processes using multiprocessing.Process, but even if
I keep it to 1, I eventually run out of memory.
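For context, the workers are subclasses of multiprocessing.Process whose
run() method is shown further down; this is a minimal sketch of how they
get started (the MessageWorker name here is just illustrative, not my
actual class):

import multiprocessing

class MessageWorker(multiprocessing.Process):
    # Illustrative stand-in for my actual worker; the real run() is below.
    def run(self):
        pass

if __name__ == '__main__':
    workers = [MessageWorker() for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()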
RAM usage just seems to increase steadily. When I finish processing a
result set, I assign a new result set to the same variable, and I would
expect gc to get my memory back.
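To put a rough number on "steadily increases", this is the kind of check
I run between chunks (a minimal sketch; the log_rss helper is only for
illustration and is not part of my actual job):

import resource

def log_rss(label):
    # ru_maxrss is reported in kilobytes on Linux and only ever grows,
    # which matches what I see from chunk to chunk.
    rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print('%s: max RSS %d KB' % (label, rss_kb))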
Any ideas?
def run(self):
    .....
    # Build the queryset for today's messages and iterate it in chunks.
    messages = queryset_iterator(MessageDAO.get_all_messages_for_date(now))
    self.process_messages(job, messages)
    # Drop the reference so the old result set can be collected.
    messages = None
    gc.collect()
    ...
def queryset_iterator(queryset, chunksize=1000):
    # Walk the queryset in primary-key order, chunksize rows at a time,
    # so the whole result set is never held in memory at once.
    pk = 0
    last_pk = queryset.order_by('-pk')[0].pk
    queryset = queryset.order_by('pk')
    while pk < last_pk:
        for row in queryset.filter(pk__gt=pk)[:chunksize]:
            pk = row.pk
            yield row
        gc.collect()