Closed
I'm trying to parse an Azerbaijani forum:
from grab import Grab

url = "http://www.disput.az/index.php?app=forums&module=forums&controller=topic&id=1051606"
g = Grab()
g.go(url)
# Select the paragraphs inside each forum post
messages = g.doc.pyquery('[data-role="commentContent"]>p')
for message in messages:
    print(message.text_content())
This code prints many strange characters (encoding is broken).
Setting Grab charset/document_charset has no effect.
I've found this fix:
for message in messages:
    # Re-encode the garbled text as ISO-8859-1, then decode the bytes as UTF-8
    print(message.text_content().encode("iso-8859-1").decode("utf-8"))
But this workaround feels like a hack.
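The fact that this workaround helps suggests the page's UTF-8 bytes are being decoded as ISO-8859-1 somewhere along the way. Here is a minimal sketch of that effect, with no network access or Grab involved. The helper name `fix_mojibake` and the sample string are my own assumptions for illustration, not part of Grab:

```python
def fix_mojibake(text):
    """Undo UTF-8 text that was mistakenly decoded as ISO-8859-1.

    Hypothetical helper for illustration; it is not part of Grab.
    """
    try:
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not garbled in this particular way; leave it alone.
        return text


# Illustrative sample string (an assumption, not taken from the forum page).
original = "Azərbaycan dili"

# Simulate the breakage: correct UTF-8 bytes decoded with the wrong charset.
garbled = original.encode("utf-8").decode("iso-8859-1")

# Reversing the wrong decode step recovers the text.
repaired = fix_mojibake(garbled)
print(repaired == original)
```

The round trip is lossless because ISO-8859-1 maps every byte value to exactly one code point, so re-encoding gives back the original bytes. It only repairs the symptom, though. It does not explain why the wrong charset is picked in the first place.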
If I try to do the same thing with requests, everything is OK:
import requests

r = requests.get(url)
print(r.text)
This prints clean Azerbaijani text (HTML).
Is there a proper solution for this problem?
Info:
Ubuntu 16.04
Grab 0.6.38 (current)
Python 3.5.2