Saturday, June 30, 2012

Re: Use regular expression to retrieve all image tags from a given content

You can try the following two suggestions:

1. Remove the "^" from the pattern so that it is just r"<img", and call search() or findall() instead of match(). match() only looks at the start of the string, and the image tag is probably not the first thing in it.
2. Print the value of "content" to check that the "<img" pattern actually exists in it. The match is case sensitive, so <IMG will not be matched unless you compile the pattern with re.IGNORECASE.
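
As a rough illustration of both suggestions, here is a minimal sketch in Python 3 syntax. The sample string stands in for i.content[0].value and is made up, as are the names IMG_TAG, at_start and img_tags. The pattern is only an approximation of an img tag, not a full HTML grammar.

```python
import re

# Hypothetical sample input; in the original post this comes from i.content[0].value.
content = '<p>Intro</p><IMG src="a.png"><p>text</p><img src="b.jpg" alt="b">'

# No "^" anchor, and re.IGNORECASE so that <IMG ...> is matched as well.
IMG_TAG = re.compile(r'<img\b[^>]*>', re.IGNORECASE)

# match() only tests the very start of the string, so it finds nothing here.
at_start = IMG_TAG.match(content)

# findall() scans the whole string and returns every non-overlapping match.
img_tags = IMG_TAG.findall(content)

print(at_start)
print(img_tags)
```

If only the first tag is needed, IMG_TAG.search(content) returns a match object for it, or None when there is no img tag at all.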

On a side note, you should not be using regular expressions if you are doing anything more complex than what you are doing right now.
HTML is not a regular language, so you will be better off using an XML parser (like lxml or ElementTree) or an HTML parser (BeautifulSoup).
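
To give an idea of the parser approach without installing anything, here is a small sketch using html.parser from the Python 3 standard library. It is a stand-in for the parsers named above, not one of them. The class name ImgCollector and the sample string are made up for the example.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collects the attributes of every img tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names, so <IMG SRC=...> is covered.
        # Self-closing tags such as <img ... /> are routed here by default too.
        if tag == "img":
            self.images.append(dict(attrs))


# Hypothetical sample input; in the original post this comes from i.content[0].value.
content = '<p>Intro</p><IMG SRC="a.png"><p>text</p><img src="b.jpg" alt="b"/>'

collector = ImgCollector()
collector.feed(content)
collector.close()

print(collector.images)
```

Because each tag comes back as a dictionary of attributes, getting at src or alt does not need any further pattern matching.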

-Sandeep

On Saturday, June 30, 2012 6:07:13 PM UTC+5:30, mo.mughrabi wrote:
Hello, 

I'm really a noob with regular expressions. I tried to do this on my own, but I couldn't understand from the manuals how to approach it. I'm trying to find all img tags in a given content. I wrote the code below, but it's returning None:

    content = i.content[0].value
    prog = re.compile(r'^<img')
    result = prog.match(content)
    print result

any suggestions?


