Thursday, September 21, 2017

Re: 'utf-8' codec can't decode byte 0x93 in position 31 even though I use read().decode('utf-8') when I read an UploadedFile I get from a file selector in a view in Django

Hi,

In addition to what James said, it is unusual to write f.read().decode('utf-8'). Usually the result of f.read() is already decoded (unless you opened the file as a binary file, which you would normally not do for a file containing text). It depends on how you opened the file, and the mechanics are different in Python 2 and Python 3.
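
A minimal Python 3 sketch of that difference, using a throwaway temporary file (the file name and contents are made up for illustration). In text mode read() returns str, in binary mode it returns bytes that still need decoding. As far as I know, Django's UploadedFile.read() behaves like the binary case and returns bytes, which would explain why the view in the original question decodes explicitly.

```python
import os
import tempfile

# Write a small UTF-8 file so the example is self-contained.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write("ônibus;じてんしゃ\n".encode("utf-8"))
    path = tmp.name

# Text mode: Python decodes for us, so read() returns str.
with open(path, encoding="utf-8") as f:
    text = f.read()

# Binary mode: read() returns raw bytes, which must be decoded explicitly.
with open(path, "rb") as f:
    raw = f.read()

os.remove(path)

print(type(text).__name__)          # str
print(type(raw).__name__)           # bytes
print(raw.decode("utf-8") == text)  # True
```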

BTW, I've been writing a series of posts on encodings, starting from https://djangodeployment.com/2017/06/19/encodings-part-1/. I'm interested in feedback.

Regards,

Antonis

Antonis Christofides  http://djangodeployment.com

On 2017-09-22 02:07, fábio andrews rocha marques wrote:
I have a template called cadastrarnovomaterial.html, a page with a text field and a file selection button. The user should select a .csv file with the file selector. It looks like this:
<meta charset="utf-8"/>
<h1>Cadastrar um novo material</h1>
<b>'Cadastre um novo material que pode ser usado para criar provas'</b>
<form action="{% url 'terminarcadastronovomaterial' %}" method="post" enctype="multipart/form-data" class="form-horizontal" accept-charset="UTF-8">
{% csrf_token %} 
<label for="nomematerial">Nome: </label>
{% if nomematerialcadastrar %}
<input id="nomematerial" type="text" name="nomematerial" value="{{ nomematerialcadastrar }}">
{% else %}
<input id="nomematerial" type="text" name="nomematerial" value=""/>
{% endif %}
<div class="form-group">
    <label for="arquivocsv" class="col-md-3 col-sm-3 col-xs-12 control-label">Arquivo csv: </label>
    <div class="col-md-8">
        <input type="file" name="arquivocsv" id="arquivocsv" required="True" class="form-control">
    </div>                    
</div>
<input type="submit" value="Cadastrar novo material"/>
</form>

Then I have the view associated with the URL in the action of the form above, 'terminarcadastronovomaterial'. It first checks that the uploaded file really is a .csv, then reads it and goes through it line by line:
def terminarcadastronovomaterial(request):
    if request.method == 'POST':
        nomematerial = request.POST['nomematerial']
        arquivo = request.FILES.get('arquivocsv')
        if arquivo.name.endswith('.csv'):
            file_data = arquivo.read().decode('utf-8')
            lines = file_data.split("\n")
            for line in lines:
                print("nova linha")
                fields = line.split(";")
                for field in fields:
                    if ',' in field:
                        distratores = field.split(",")
                        print("distratores")
                        for distrator in distratores:
                            print(distrator)
                        print("fim distratores")
                    else:
                        print(field)

This code works well when the .csv contains only plain ASCII characters. The problem is with non-ASCII characters. My .csv contains a table, and the table has characters like "ônibus" and じてんしゃ. When I run my Django app and try to read the csv, Django returns the message 'utf-8' codec can't decode byte 0x93 in position 31 (which is where "ônibus" is in the csv).
I've searched at length for how to read a UTF-8 csv in Python, but everything I find opens a file from a path rather than a file that comes from a file selector (which is my case in Django). How do I solve my problem? Is file.read().decode('utf-8') correct? Should I use something else?
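
A hedged sketch of what the error message suggests: byte 0x93 can never start a UTF-8 sequence, so the uploaded file was probably saved in some other encoding. The actual encoding of the file is not known from this thread. The example below assumes cp850 purely for illustration, because "ô" happens to be byte 0x93 there, and decode_upload is a hypothetical helper, not part of Django.

```python
def decode_upload(raw, encodings=("utf-8-sig", "cp1252")):
    """Return (text, encoding) for the first candidate that decodes raw cleanly.

    The candidate list is an assumption; adjust it to the encodings
    your users' spreadsheet software is likely to produce.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


# Reproduce the reported failure with bytes that are not UTF-8.
raw = "ônibus".encode("cp850")  # b'\x93nibus'
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0x93 in position 0: invalid start byte

# Falling back to the assumed legacy encoding recovers the text.
text, used = decode_upload(raw, encodings=("utf-8-sig", "cp850"))
print(text, used)  # ônibus cp850
```

In the view, the call would be something like decode_upload(arquivo.read()) in place of arquivo.read().decode('utf-8'). A legacy single-byte encoding cannot represent じてんしゃ, so if the file must hold both Portuguese and Japanese text, the more reliable fix is to re-save the .csv as UTF-8 in the program that produced it.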

