Tuesday, November 27, 2012

Re: How to get the source code of an url?

You're not parsing XML. It's HTML, and it's not well formed: for example, your title and author tags have closing tags that don't match (</nomclient> and </nomposte>). The markup needs to be valid XHTML before an XML parser can handle it. You might want to try a tool built for messy HTML instead, like Scrapy or Beautiful Soup.
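To illustrate the point, here is a minimal sketch in Python 3 using only the standard library. It assumes the URL returns the markup quoted below, with the elided rows left out. The standard `html.parser` module is used as a simpler stand-in for Scrapy or Beautiful Soup, not as a description of how those libraries work. The names `SAMPLE`, `strict_parse_ok`, `BookIdCollector`, and `book_ids` are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

# Markup from the question, including its mismatched closing tags.
# The elided rows ("...") are left out; this string is an assumption
# about what the URL actually returns.
SAMPLE = """<html><head><style type="text/css"></style></head>
<body>
<bibliothque>
<book>
<id>747</id>
<title>L'alchimiste</nomclient>
<author>Paulo Cohelo </nomposte>
</book>
</bibliothque>
</body>"""


def strict_parse_ok(markup):
    """Return True if minidom accepts the markup as well-formed XML."""
    try:
        parseString(markup)
        return True
    except ExpatError:
        return False


class BookIdCollector(HTMLParser):
    """Hypothetical helper: collect the text of each <id> inside a <book>."""

    def __init__(self):
        super().__init__()
        self.ids = []
        self._in_book = False
        self._in_id = False

    def handle_starttag(self, tag, attrs):
        if tag == "book":
            self._in_book = True
        elif tag == "id" and self._in_book:
            self._in_id = True

    def handle_endtag(self, tag):
        if tag == "book":
            self._in_book = False
        elif tag == "id":
            self._in_id = False

    def handle_data(self, data):
        # Only keep text that appears between <id> and </id>.
        if self._in_id:
            self.ids.append(data.strip())


def book_ids(markup):
    """Feed the markup to the lenient parser and return the collected ids."""
    collector = BookIdCollector()
    collector.feed(markup)
    collector.close()
    return collector.ids
```

With this sample, `strict_parse_ok(SAMPLE)` is false because of the mismatched tags, while `book_ids(SAMPLE)` still finds the id. Note that `title` is a special element in HTML, so an HTML parser may treat everything after the unclosed `<title>` as plain text. Fixing the closing tags on the server side is still the more reliable fix.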

On Tuesday, November 27, 2012 3:32:16 AM UTC-8, wbc wrote:

I'm trying to parse XML from a URL with minidom. The URL returns my XML data.

This is my code:

    url = "http://myurl.com/wsname.asp"
    datasource = urllib2.urlopen(url)
    dom = parse(datasource)
    handleElements(dom)

my handleElements function to parse xml:

    def handleElements(dom):
        Elements = dom.getElementsByTagName("book")
        for item in Elements:
            getText(item.getElementsByTagName("id")[0].childNodes)
            ....

My xml:

    <html><head><style type="text/css"></style></head>
    <body>
    <bibliothque>
      <book>
        <id>747</id>
        <title>L'alchimiste</nomclient>
        <author>Paulo Cohelo </nomposte>
      </book>
      ...
    </bibliothque>
    </body>

I get no error, but no result!

My handleElements() function works fine: when I copy the same data from my URL, put it in a string, and use parseString instead of parse, everything works and I get my results.

But when I try to open the URL, Elements is empty and the loop never even starts.


It seems that I need to get the source code of the URL (not its content), like view-source in Chrome. How can I do that?

Thanks

