On May 28, 5:42 pm, pc <pie...@musmato.com> wrote:
> I am stumped. I am trying to process a lot of data (2 million records and
> up) and once I have a QuerySet, I immediately feed it to a
> queryset_iterator that fetches results in chunks of 1000 rows each. I
> use MySQL and the DB server is on another machine (so I don't think it
> is MySQL caching).
>
> I kick off 4 sub-processes using multiprocessing.Process, but even if
> I keep it to 1, eventually, I will run out of memory.
>
> RAM usage just seems to increase steadily. When I finish processing a
> resultset, I assign a new resultset to the variable, and I
> would expect gc to get my memory back.
>
> Any ideas?
>
> def run(self):
>     .....
>
>     messages = queryset_iterator(MessageDAO.get_all_messages_for_date(now))
>     self.process_messages(job, messages)
>     messages = None
>     gc.collect()
>     ...
>
> def queryset_iterator(queryset, chunksize=1000):
>     pk = 0
>     last_pk = queryset.order_by('-pk')[0].pk
>     queryset = queryset.order_by('pk')
>     while pk < last_pk:
>         for row in queryset.filter(pk__gt=pk)[:chunksize]:
>             pk = row.pk
>             yield row
>         gc.collect()
Are you sure the leak is not in process_messages?
I tested something similar on PostgreSQL, and the queryset_iterator
doesn't seem to leak memory:
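import gc

from django.db import connection


def queryset_iterator(queryset, chunksize=100):
    pk = 0
    last_pk = queryset.order_by('-pk')[0].pk
    queryset = queryset.order_by('pk')
    while pk < last_pk:
        # Number of queries logged so far (only populated when DEBUG = True)
        print(len(connection.queries))
        # Fetch the next chunk of rows with a pk above the last one seen
        for row in queryset.filter(pk__gt=pk)[:chunksize]:
            pk = row.pk
            yield row
        gc.collect()


for i in queryset_iterator(TestModel.objects.all()):
    print(memory())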
where memory() is from http://stackoverflow.com/questions/938733/python-total-memory-used
The result seems stable and doesn't indicate any memory leak.
TestModel contains 100000 objects.
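For anyone who wants to repeat this kind of check without a Django project, here is a minimal sketch of the same pk-based chunking using only the standard-library sqlite3 and tracemalloc modules. The message table and the names chunked_rows, build_demo_db and measure are made up for illustration and are not part of the code above. Note that tracemalloc only counts Python-level allocations, so it is not the same figure that memory() reports for the whole process.

```python
import sqlite3
import tracemalloc


def chunked_rows(conn, chunksize=1000):
    """Yield (id, body) rows in primary-key order, one query per chunk."""
    pk = 0
    while True:
        # Only ask for rows past the last pk seen, so each query stays small.
        chunk = conn.execute(
            "SELECT id, body FROM message WHERE id > ? ORDER BY id LIMIT ?",
            (pk, chunksize),
        ).fetchall()
        if not chunk:
            break
        for row in chunk:
            yield row
        pk = chunk[-1][0]


def build_demo_db(n_rows):
    """Create an in-memory table with n_rows dummy messages (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO message (body) VALUES (?)",
        (("x" * 100,) for _ in range(n_rows)),
    )
    conn.commit()
    return conn


def measure(n_rows=20000, chunksize=1000):
    """Iterate over all rows and sample traced memory once per chunk."""
    conn = build_demo_db(n_rows)
    tracemalloc.start()
    count = 0
    samples = []
    for _pk, _body in chunked_rows(conn, chunksize):
        count += 1
        if count % chunksize == 0:
            current, _peak = tracemalloc.get_traced_memory()
            samples.append(current)
    tracemalloc.stop()
    conn.close()
    return count, samples


if __name__ == "__main__":
    count, samples = measure()
    print("rows processed:", count)
    for sample in samples:
        print("traced bytes:", sample)
```

If the per-chunk samples stay roughly flat here but the real job still grows, that would point at the processing step rather than the chunked iteration itself.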
- Anssi
--
You received this message because you are subscribed to the Google Groups "Django users" group.