Thursday, March 24, 2016

UnicodeEncodeError: 'ascii' codec can't encode characters in position ... when handling serialized POST data

I have been puzzled for some time now by a UnicodeEncodeError exception that I get when non-ASCII (Unicode) characters are posted to the server.

After testing several options I can work around the exception, but it does not feel right, so I thought I would post the issue and the solution that seems to work for me here.

On this website a form is posted to the server through an AJAX call. Basically the form is serialized using jQuery ($(".form").serialize();) and then sent to the server as one parameter. Since the whole form is posted as one parameter, the standard decoding on .GET/.POST does not help me: it returns the form data still urlencoded. So I need to decode it myself, which I do as follows:

from urllib import unquote
from django.http import QueryDict
# Decode the urlencoded form carried in the single "f" parameter.
return QueryDict(unquote(request.POST["f"]))
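
For comparison, here is a minimal standard-library sketch of the same step, assuming Python 3 rather than the Python 2 / Django setup above. The function name parse_serialized_form and the sample payload are made up for illustration. urllib.parse.parse_qs percent-decodes keys and values itself (as UTF-8 by default), so the serialized string is passed in without a prior unquote():

```python
from urllib.parse import parse_qs


def parse_serialized_form(serialized):
    """Parse a jQuery-style serialized form string into a dict of lists.

    Hypothetical helper for illustration; parse_qs does the percent-decoding.
    """
    return parse_qs(serialized, keep_blank_values=True)


# Made-up payload containing the field value from the post.
fields = parse_serialized_form("country=Kroati%C3%AB&guests=4")
print(fields["country"])
print(fields["guests"])
```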

Now the constructor of QueryDict raises an exception, but only if there are non-ASCII characters in the form. As an example, one field can be "Kroati%C3%AB":

  File "/env/python/local/lib/python2.7/site-packages/django/http/request.py", line 357, in __init__
    value = value.decode(encoding)
  File "/env/python/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)

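I found that the query string passed to QueryDict is actually a unicode string (u'....') whose content is UTF-8 encoded. I think that is where the problem lies: UTF-8 encoded data belongs in an 8-bit string (str(...)) instead of a unicode(...) object. Calling .decode() on a unicode object makes Python 2 first encode it implicitly with the 'ascii' codec, which is why the error is a UnicodeEncodeError rather than a decode error.

I can avoid this exception by casting the input to str().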

Simplified in a shell session:

>>> unquote(u'Kroati%C3%AB').decode('utf-8')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/pbor/djangoVillaSearch/site/env/python/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 6-7: ordinal not in range(128)
>>> unquote('Kroati%C3%AB').decode('utf-8')
u'Kroati\xeb'
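
The two results above can be reproduced outside Python 2. The following is a sketch in Python 3 (not the environment of the session), where urllib.parse.unquote_to_bytes stands in for the byte-string result and all variable names are illustrative. It shows that the percent-escapes are UTF-8 bytes, that reading each byte as one code point (which is what the "position 6-7" in the traceback suggests happened for the unicode input) leaves two non-ASCII characters at those positions, and that such a string can be repaired by round-tripping through Latin-1:

```python
from urllib.parse import unquote_to_bytes

# Percent-decode to raw bytes: %C3%AB are the two UTF-8 bytes of one character.
raw = unquote_to_bytes("Kroati%C3%AB")

# Decoding those bytes as UTF-8 gives the intended text.
decoded = raw.decode("utf-8")

# Treating each byte as one code point (Latin-1) gives a text string that
# still carries UTF-8 byte values as separate characters.
mis_decoded = raw.decode("latin-1")
non_ascii_positions = [i for i, ch in enumerate(mis_decoded) if ord(ch) > 127]

# Such a string can be repaired by mapping it back to bytes first.
repaired = mis_decoded.encode("latin-1").decode("utf-8")

print(repr(raw))
print(non_ascii_positions)
print(repaired == decoded)
```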

Does this make sense?

Paul

