If it's only a few MB, I see little reason to go as far as writing it in C, unless you are performing the same import tens of thousands of times and the overhead in Python adds up so much that it causes problems.
But, quite frankly, you'll max out MySQL INSERT performance before you max out Python's performance lol - as long as you don't use the ORM for inserts :)
Cal
-- On Fri, Dec 2, 2011 at 5:21 AM, Nathan McCorkle <nmz787@gmail.com> wrote:
Would interfacing with SQL via C or C++ be faster for parsing and loading
data in bulk? I have files that are only a few MB worth of text, but they
can take hours to load due to the amount of parsing I do and the
number of database entries each item in a file creates.
On Mon, Nov 28, 2011 at 3:28 AM, Anler Hernandez Peral
<anler86@gmail.com> wrote:
> Hi, this is probably not your case, but in case it is, here is my story:
> Creating a script to import CSV files is the best solution as long as they
> are few, but in my case the problem was that I needed to import nearly 40
> VERY BIG CSV files, each one mapping to a database table, and I needed to do it
> quickly. I thought that the best way was to use MySQL's "LOAD DATA LOCAL
> INFILE ..." functionality, since it works very fast and I could write just one
> function to import all the files. The problem was that my CSV files were
> pretty big, and my database server was eating large amounts of memory and
> crashing my site, so I ended up slicing each file into smaller chunks.
> Again, this is a very specific need, but in case you find yourself in such a
> situation, here's my base code, which you can extend ;)
>
> https://gist.github.com/1dc28cd496d52ad67b29
> --
> anler
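The file-slicing step described above might look roughly like the sketch below. It is not the code from the linked gist: the helper name `split_csv`, the `chunk_NNNN.csv` naming scheme, and the `;` delimiter are assumptions made for illustration, and it uses Python 3's text-mode csv handling.

```python
import csv
import itertools
import os


def split_csv(path, rows_per_chunk, out_dir, delimiter=";"):
    """Split the CSV at `path` into files of at most `rows_per_chunk` rows.

    Hypothetical helper: returns the chunk file paths in order.
    """
    chunk_paths = []
    with open(path, newline="") as src:
        reader = csv.reader(src, delimiter=delimiter)
        while True:
            # Pull at most `rows_per_chunk` rows; an empty batch means EOF.
            batch = list(itertools.islice(reader, rows_per_chunk))
            if not batch:
                break
            chunk_path = os.path.join(
                out_dir, "chunk_%04d.csv" % len(chunk_paths))
            with open(chunk_path, "w", newline="") as out:
                csv.writer(out, delimiter=delimiter).writerows(batch)
            chunk_paths.append(chunk_path)
    return chunk_paths
```

Each chunk can then be loaded on its own, so the database server only has to handle one small file at a time.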
>
>
> On Sun, Nov 27, 2011 at 7:56 PM, Andre Terra <andreterra@gmail.com> wrote:
>>
>> This should be run asynchronously (e.g. with Celery) when importing large
>> files.
>> If you have a lot of categories/subcategories, you will need to bulk
>> insert them instead of looping through the data and just using
>> get_or_create. A single, long transaction will definitely bring great
>> improvements to speed.
>> One tool for this is DSE, which I've mentioned before.
>> Good luck!
>>
>> Cheers,
>> AT
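To illustrate the advice above (bulk inserts inside one transaction instead of a get_or_create call per row), here is a minimal sketch. It uses the standard-library `sqlite3` module as a stand-in for MySQL so that it runs anywhere. The two-table schema and the `bulk_import` helper are assumptions for illustration, not Django or DSE code. `INSERT OR IGNORE` is SQLite syntax; MySQL spells it `INSERT IGNORE`.

```python
import sqlite3

# Hypothetical, simplified schema: categories and products only.
SCHEMA = """
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT,
                      category_id INTEGER REFERENCES category(id));
"""


def bulk_import(conn, rows):
    """Insert (category_name, product_name) pairs in a single transaction."""
    with conn:  # commits once on success, rolls back on error
        # Insert each distinct category once; skip names already present.
        names = sorted({category for category, _ in rows})
        conn.executemany(
            "INSERT OR IGNORE INTO category (name) VALUES (?)",
            [(name,) for name in names],
        )
        # Fetch all ids in one query and keep them in a dict,
        # instead of one lookup per imported row.
        ids = dict(conn.execute("SELECT name, id FROM category"))
        conn.executemany(
            "INSERT INTO product (name, category_id) VALUES (?, ?)",
            [(product, ids[category]) for category, product in rows],
        )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
bulk_import(conn, [("fruit", "apple"), ("fruit", "pear"), ("tools", "saw")])
```

The number of statements sent to the database no longer grows with the number of get_or_create lookups, which is where the per-row approach tends to lose time.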
>>
>> On Sat, Nov 26, 2011 at 8:44 PM, Petr Přikryl <prikryl@atlas.cz> wrote:
>>>
>>> >>> import csv
>>> >>> data = csv.reader(open('/path/to/csv', 'r'), delimiter=';')
>>> >>> for row in data:
>>> >>>     # get_or_create returns an (object, created) tuple
>>> >>>     category, _ = Category.objects.get_or_create(name=row[0])
>>> >>>     sub_category, _ = SubCategory.objects.get_or_create(
>>> >>>         name=row[1], defaults={'parent_category': category})
>>> >>>     product, _ = Product.objects.get_or_create(
>>> >>>         name=row[2], defaults={'sub_category': sub_category})
>>>
>>> There are a few potential problems with the csv module as used here.
>>>
>>> Firstly, the file should be opened in binary mode. In Unix-based
>>> systems, binary mode is technically similar to text mode.
>>> However, you may one day run into problems when you move
>>> the code to another environment (Windows).
>>>
>>> Secondly, the opened file should always be closed -- especially
>>> when building a (web) application that may run for a long time.
>>> You can do it like this:
>>>
>>> ...
>>> f = open('/path/to/csv', 'rb')
>>> data = csv.reader(f, delimiter=';')
>>> for ...
>>> ...
>>> f.close()
>>>
>>> Or you can use the new Python construct "with".
>>>
>>> P.
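A minimal sketch of the "with" variant mentioned above, assuming Python 3. The helper name `read_rows` is hypothetical. Note that on Python 3 the csv module expects a text-mode file opened with `newline=""`, whereas the binary mode (`'rb'`) recommended above applies to Python 2.

```python
import csv


def read_rows(path, delimiter=";"):
    """Return all rows of the CSV file at `path` as lists of strings."""
    # The file is closed when the with block exits, even if parsing raises.
    with open(path, newline="") as f:
        return list(csv.reader(f, delimiter=delimiter))
```

For very large files, iterating over the reader inside the with block avoids holding every row in memory at once.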
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "Django users" group.
>>> To post to this group, send email to django-users@googlegroups.com.
>>> To unsubscribe from this group, send email to
>>> django-users+unsubscribe@googlegroups.com.
>>> For more options, visit this group at
>>> http://groups.google.com/group/django-users?hl=en.
>>>
>>
>
--
Nathan McCorkle
Rochester Institute of Technology
College of Science, Biotechnology/Bioinformatics