Thursday, June 23, 2011

Ann: DSE v.3.0.0 Beta #1

For the impatient:

http://pypi.python.org/pypi/dse/3.0.0.Beta-1
Source at https://bitbucket.org/weholt/dse2/src
Modified BSD license.

New in the 3.x series of DSE are the bulk_update method, a more
intuitive syntax and general code clean-up.
NB! The new syntax is not backwards compatible, so existing code using
DSE must be updated to work.

New syntax:

    with Person.delayed as d:
        d.insert({'name': 'Thomas', 'age': 36, 'sex': 'M'})
        d.update({'id': 1, 'name': 'John'})
        d.delete(10)  # Deletes record with id 10

I hope the syntax is more intuitive and easy to read. Comments wanted.

Bulk update: it takes a dictionary of values to update and requires a
value for the primary key/id of the record, but uses the Django ORM's
own update method instead of plain SQL to reduce the number of
statements to execute. This is helpful when your fields can only take
a limited set of values, like EXIF data from photos or metadata from
mp3s.

An example::

    with Photo.delayed as d:
        d.update({'id': 1, 'camera_model': 'Nikon', 'fnumber': 2.8, 'iso_speed': 200})
        d.update({'id': 2, 'camera_model': 'Nikon', 'fnumber': 11, 'iso_speed': 400})
        d.update({'id': 3, 'camera_model': 'Nikon', 'fnumber': 2.8, 'iso_speed': 400})
        d.update({'id': 4, 'camera_model': 'Canon', 'fnumber': 3.5, 'iso_speed': 200})
        d.update({'id': 5, 'camera_model': 'Canon', 'fnumber': 11, 'iso_speed': 800})
        d.update({'id': 6, 'camera_model': 'Pentax', 'fnumber': 11, 'iso_speed': 800})
        d.update({'id': 7, 'camera_model': 'Sony', 'fnumber': 3.5, 'iso_speed': 1600})
        # and then some thousand more lines like that

Internally DSE will construct a structure like this::

    bulk_updates = {
        'camera_model': {
            'Nikon': [1, 2, 3],
            'Canon': [4, 5],
            'Pentax': [6],
            'Sony': [7],
        },
        'fnumber': {
            2.8: [1, 3],
            11: [2, 5, 6],
            3.5: [4, 7],
        },
        'iso_speed': {
            200: [1, 4],
            400: [2, 3],
            800: [5, 6],
            1600: [7],
        },
    }
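
How such a structure might be built from the individual update
dictionaries can be sketched in plain Python. This is only an
illustration and not DSE's actual code: the function name
group_updates and its pk argument are made up for the example, and it
assumes every value is hashable so it can be used as a dictionary key.

```python
from collections import defaultdict


def group_updates(rows, pk="id"):
    """Group primary keys by (field, value). Hypothetical helper, not part of DSE."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        row_id = row[pk]  # every row must carry its primary key
        for field, value in row.items():
            if field != pk:
                # rows sharing the same value for a field end up in one list
                grouped[field][value].append(row_id)
    # convert to plain dicts so the result is easy to inspect and compare
    return {field: dict(values) for field, values in grouped.items()}


rows = [
    {"id": 1, "camera_model": "Nikon", "fnumber": 2.8, "iso_speed": 200},
    {"id": 2, "camera_model": "Nikon", "fnumber": 11, "iso_speed": 400},
    {"id": 3, "camera_model": "Nikon", "fnumber": 2.8, "iso_speed": 400},
    {"id": 4, "camera_model": "Canon", "fnumber": 3.5, "iso_speed": 200},
]
bulk_updates = group_updates(rows)
```

Each inner list then holds the ids that can share a single update
call for that field and value.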

And then execute those statements using::

    # pk = the primary key field for the model, in most cases id
    # (.items() works on both Python 2 and Python 3)
    for field, values in bulk_updates.items():
        for value, ids in values.items():
            # one UPDATE per distinct (field, value) pair
            model.objects.filter(**{"%s__in" % pk: ids}).update(**{field: value})

For huge datasets where the fields take only a limited set of values,
this has a big impact on performance. So whether to use update or
bulk_update depends on the data you want to process. For instance,
importing a contact list where most of the fields have almost unique
values would benefit from the update method, but importing data from
photos, ID3 tags from your music collection etc. would process much
faster using bulk_update.
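
To get a rough feel for that trade-off before an import, one could
count how many statements each approach would need. The sketch below
is an assumption-laden estimate, not a benchmark: count_statements is
a hypothetical helper, it assumes a plain update costs one statement
per row (DSE may batch its SQL differently), and the sample data is
invented.

```python
def count_statements(rows, pk="id"):
    """Return (per_row, grouped) statement counts. Hypothetical helper, not part of DSE."""
    per_row = len(rows)  # assumption: one UPDATE per row
    pairs = set()
    for row in rows:
        for field, value in row.items():
            if field != pk:
                pairs.add((field, value))  # one UPDATE per distinct pair
    return per_row, len(pairs)


# invented sample data: few distinct values versus almost unique values
photos = [
    {"id": i,
     "camera_model": ["Nikon", "Canon", "Sony"][i % 3],
     "iso_speed": [200, 400][i % 2]}
    for i in range(1, 1001)
]
contacts = [
    {"id": i, "name": "Person %d" % i, "email": "p%d@example.com" % i}
    for i in range(1, 1001)
]

photo_counts = count_statements(photos)
contact_counts = count_statements(contacts)
```

When the grouped count is far below the row count, as for the photos
here, the grouped approach is the likely winner; when it is above it,
as for the contacts, a per-row update is probably the better choice.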

Thanks to Cal Leeming [Simplicity Media Ltd] for inspiration on this one :-)

--
Mvh/Best regards,
Thomas Weholt
http://www.weholt.org
