Monday, September 25, 2017

Re: Best option to check and/or convert encoding for csv files

On 26/09/2017 1:53 AM, Fellipe Henrique wrote:
> Thanks Mike,
>
> But I think my problem is deeper...
>
> I use pyexcel to try to open a CSV file, to work with that on my
> software...
>
> I receive this error every time: 'utf-8' codec can't decode byte 0xa1
> in position 14: invalid start byte
> My code is below:

Fellipe

My heartfelt sympathy. Surely there is a robot somewhere which can fix
coding and decoding so us poor humans don't have to think about it. That
is the perfect use-case for AI in Python's standard library. Anyone
listening? Has anyone done it already?

Recently someone here posted
https://djangodeployment.com/2017/06/19/encodings-part-1/ and in
particular http://www.i18nqa.com/debug/utf8-debug.html

I think the right approach is to recognise that the problem is real
and you have to cope with it. I have never properly dealt with
it myself. Just writing this makes me realise it. I will have to use
those debugging charts and gradually build a set of functions to
"magically" recover from decode exceptions. But not until later next month.
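
For what it is worth, one of those recovery functions might look roughly like the sketch below. It is only an illustration: the function name and the list of candidate encodings are assumptions, not anything from pyexcel or Django. The byte 0xa1 from the error above is not a valid UTF-8 start byte, but it is a perfectly good character in Windows-1252, which is what Excel on Windows often writes.

```python
def decode_with_fallback(raw, encodings=("utf-8-sig", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes raw.

    The candidate list is a guess (hypothetical defaults): UTF-8 first,
    then the Windows code page Excel commonly uses, then latin-1.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            # This candidate cannot represent the bytes; try the next one.
            continue
    # Only reached if no candidate worked (latin-1 accepts every byte,
    # so that requires a custom list). Undecodable bytes become U+FFFD.
    return raw.decode("utf-8", errors="replace"), "utf-8 (with replacements)"
```

Note that a wrong encoding can still decode without raising and quietly give the wrong characters, so the result is worth eyeballing against those debugging charts.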

I haven't used the csv library you mention. All my conversions so far
have been small-scale LibreOffice save-as-csv which have let me use my
editor's replace function to get rid of problem chars and repair csv
files prior to importing data. I know this is the wimp's way out but I
have never had time to do it properly.
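
The manual clean-up could probably be scripted instead. The following is a minimal sketch, assuming the file really was saved as Windows-1252 (that is a guess which has to match how the file was exported). The helper names are hypothetical, and it uses only the standard library rather than pyexcel.

```python
import csv
import io


def csv_bytes_to_utf8(raw, source_encoding="cp1252"):
    """Re-encode CSV bytes from source_encoding to UTF-8.

    source_encoding is an assumption; it must match how the file was saved.
    """
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


def read_rows(utf8_bytes):
    """Parse UTF-8 CSV bytes into a list of rows with the stdlib csv module."""
    # newline="" leaves line endings alone so the csv module handles them.
    stream = io.StringIO(utf8_bytes.decode("utf-8"), newline="")
    return list(csv.reader(stream))
```

The converted bytes could then be handed to whatever does the import, in place of the original file content.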

I'm really sorry I can't help

Mike


>
> @property
> def sheet(self):
>     """Returns the file content in a format based on pyexcel api."""
>     if not self.file:
>         return None
>     file_extension = self.file.name.split('.')[-1]
>     file_type = mimetypes.types_map['.' + file_extension]
>     if file_type not in settings.ALOWED_DATA_FILE_CONTENT_TYPES:
>         return None
>     if self.transpose_columns:
>         data_file = self._get_transposed_file()
>         file_extension = data_file.name.split('.')[-1]
>     else:
>         data_file = self.file
>
>     self._sheet = pyexcel.get_sheet(file_type=file_extension, file_content=data_file.read(), name_columns_by_row=0)
>     data_file.seek(0)  # it is necessary to return file cursor after read
>     return self._sheet.to_dict()
>
>
> It works fine with pyexcel version 0.4.4, but I started to have other
> issues with the old version, which made me update to the new
> version... here is the error:
>
>
> Inline image 1
>
> Have you seen this error before?
>
> I have spent more than 2 weeks trying to solve this, and nothing.. :(
>
>
> Thanks a lot
>
> Regards!
>
>
>
> T.·.F.·.A.·.     S+F
> *Fellipe Henrique P. Soares*
>
> e-mail: > echo "lkrrovknFmsgor4ius" | perl -pe \
> 's/(.)/chr(ord($1)-2*3)/ge'
> /Fedora Ambassador: https://fedoraproject.org/wiki/User:Fellipeh/
> /Blog: http://www.fellipeh.eti.br/
> /GitHub: https://github.com/fellipeh/
> /Twitter: @fh_bash/
>
> On Sun, Sep 24, 2017 at 5:02 AM, Mike Dewhirst <miked@dewhirst.com.au> wrote:
>
> On 22/09/2017 10:32 PM, Fellipe Henrique wrote:
>
> Hello guys,
>
> So, I have several csv files to open using pyexcel... but I
> have started to have issues with CSV files saved from Excel with
> other encodings...
>
> Is there any option to verify the encoding of a file, or to change
> the encoding?
>
>
> I use LibreOffice which provides an option to set one of any
> number of encodings including utf-8 when saving xlsx Excel files
> as csv
>
>
>
> regards
>
>
> --
> You received this message because you are subscribed to the
> Google Groups "Django users" group.
> To unsubscribe from this group and stop receiving emails from
> it, send an email to django-users+unsubscribe@googlegroups.com.
> To post to this group, send email to django-users@googlegroups.com.
> Visit this group at https://groups.google.com/group/django-users.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-users/CAF1jwZFrM%2Bze3LQ6bXkhXOAA_LmSrbXYLFYzhF63KHNweAB_jw%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.
>
>
>

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users+unsubscribe@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/a797c520-eda8-26a4-1fd1-ece51c03dba5%40dewhirst.com.au.
For more options, visit https://groups.google.com/d/optout.
