Thursday, December 1, 2011

Re: Bulk import of data

Would interfacing with SQL via C or C++ be faster for parsing and loading
data in bulk? I have files that are only a few MB of text, but they
can take hours to load because of the amount of parsing I do and the
number of database entries each item in a file creates.

On Mon, Nov 28, 2011 at 3:28 AM, Anler Hernandez Peral
<anler86@gmail.com> wrote:
> Hi, this is probably not your case, but in case it is, here is my story:
> Writing a script to import CSV files is the best solution as long as there
> are only a few of them, but in my case the problem was that I needed to
> import nearly 40 VERY BIG CSV files, each one mapping to a database table,
> and I needed to do it quickly. I thought the best way was to use MySQL's
> "load data in local..." functionality, since it works very fast and I could
> write just one function to import all the files. The problem was that my CSV
> files were pretty big and my database server was eating large amounts of
> memory and crashing my site, so I ended up slicing each file into smaller
> chunks.
> Again, this is a very specific need, but in case you find yourself in such a
> situation, here's my base code, which you can extend ;)
>
> https://gist.github.com/1dc28cd496d52ad67b29
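
The slicing step described above can be sketched roughly as follows. This does not reproduce the linked gist; it is a simpler variant that yields batches of rows in memory instead of writing smaller files. It assumes Python 3, and the function name `iter_chunks`, the `chunk_size` parameter, and the comma delimiter are all illustrative assumptions.

```python
import csv
from itertools import islice


def iter_chunks(path, chunk_size, delimiter=','):
    """Yield lists of at most chunk_size rows from a delimited text file."""
    # Hypothetical helper: only one chunk is held in memory at a time.
    with open(path, newline='') as f:
        reader = csv.reader(f, delimiter=delimiter)
        while True:
            # islice pulls the next chunk_size rows without reading the rest.
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                break
            yield chunk
```

Each yielded chunk could then be handed to whatever loading step is in use, so that no single load has to deal with the whole file at once.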
> --
> anler
>
>
> On Sun, Nov 27, 2011 at 7:56 PM, Andre Terra <andreterra@gmail.com> wrote:
>>
>> This should be run asynchronously (e.g. with Celery) when importing large
>> files.
>> If you have a lot of categories/subcategories, you will need to bulk
>> insert them instead of looping through the data and just using
>> get_or_create. A single, long transaction will definitely bring great
>> improvements to speed.
>> One tool is DSE, which I've mentioned before.
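
The idea of one bulk insert inside a single transaction can be illustrated outside Django with the standard-library `sqlite3` module. This is only a sketch under stated assumptions: the `category` table, its `name` column, and the function `bulk_insert_categories` are hypothetical, and SQLite stands in for whatever database is actually used.

```python
import sqlite3


def bulk_insert_categories(conn, names):
    """Insert many category names in one transaction; return the row count."""
    # Deduplicate in Python first, preserving the original order.
    unique = list(dict.fromkeys(names))
    # Using the connection as a context manager commits once on success
    # and rolls back if an exception is raised.
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO category (name) VALUES (?)",
            [(name,) for name in unique],
        )
    return conn.execute("SELECT COUNT(*) FROM category").fetchone()[0]


# Hypothetical schema, created in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
total = bulk_insert_categories(conn, ["Tools", "Garden", "Tools"])
```

Compared with calling get_or_create once per row, this issues one batched statement and one commit, which is where most of the speedup is expected to come from.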
>> Good luck!
>>
>> Cheers,
>> AT
>>
>> On Sat, Nov 26, 2011 at 8:44 PM, Petr Přikryl <prikryl@atlas.cz> wrote:
>>>
>>> >>> import csv
>>> >>> data = csv.reader(open('/path/to/csv', 'r'), delimiter=';')
>>> >>> for row in data:
>>> >>>     # get_or_create returns an (object, created) tuple
>>> >>>     category, _ = Category.objects.get_or_create(name=row[0])
>>> >>>     sub_category, _ = SubCategory.objects.get_or_create(
>>> >>>         name=row[1], defaults={'parent_category': category})
>>> >>>     product, _ = Product.objects.get_or_create(
>>> >>>         name=row[2], defaults={'sub_category': sub_category})
>>>
>>> There are a few potential problems with the csv module as used here.
>>>
>>> Firstly, the file should be opened in binary mode.  In Unix-based
>>> systems, binary mode is technically similar to text mode.
>>> However, you may one day run into problems when you move
>>> the code to another environment (Windows).
>>>
>>> Secondly, the opened file should always be closed -- especially
>>> when building a (web) application that may run for a long time.
>>> You can do it like this:
>>>
>>> ...
>>> f = open('/path/to/csv', 'rb')
>>> data = csv.reader(f, delimiter=';')
>>> for ...
>>> ...
>>> f.close()
>>>
>>> Or you can use the new Python construct "with".
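
A minimal sketch of the "with" variant mentioned above, assuming Python 3. There the csv module expects a text-mode file opened with newline='' rather than the 'rb' mode used in the Python 2 example; the function name `read_rows` is hypothetical.

```python
import csv


def read_rows(path, delimiter=';'):
    """Return all rows of a delimited text file as lists of strings."""
    # The with block closes the file even if parsing raises an exception,
    # so no explicit f.close() is needed.
    with open(path, newline='') as f:
        return list(csv.reader(f, delimiter=delimiter))
```

Returning a list keeps the example short; for large files it would be better to process the rows inside the with block instead of collecting them all.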
>>>
>>> P.
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "Django users" group.
>>> To post to this group, send email to django-users@googlegroups.com.
>>> To unsubscribe from this group, send email to
>>> django-users+unsubscribe@googlegroups.com.
>>> For more options, visit this group at
>>> http://groups.google.com/group/django-users?hl=en.
>>>

--
Nathan McCorkle
Rochester Institute of Technology
College of Science, Biotechnology/Bioinformatics

