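Solving a problem and learning something along the way is the happiest part of the work:
Between yesterday and today, I suddenly noticed that my strings contained a lot of HTML tags and wanted to strip them out. After searching around, I read that BeautifulSoup makes this easy, but when I tried it I found that once a tag is removed, the text after it is glued directly onto the text before it. For example, "I am a<p>good</p>guy!" became "I am agoodguy!". I was stumped for a long while, then finally discovered that there is a parameter that controls this, hahaha. The solution is here: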
https://stackoverflow.com/questions/31140143/how-to-add-space-around-removed-tags-in-beautifulsoup Many thanks!!
get_text() in BeautifulSoup 4 has an optional argument called separator. You can use it as follows:
from bs4 import BeautifulSoup

soup = BeautifulSoup(html)
text = soup.get_text(separator=' ')
The result looks like this:
>>> s = BeautifulSoup("I am a<p>good</p>guy!")
>>> s.get_text()
u'I am agoodguy!'
>>> s.get_text(separator=" ")
u'I am a good guy!'
>>>
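
For reuse, the same idea can be wrapped in a small helper. This is only a minimal sketch: the name strip_tags is made up for illustration, and it assumes the built-in "html.parser" backend. It also passes strip=True, which get_text() accepts to trim whitespace around each text fragment and drop empty ones, so you do not end up with doubled spaces.

```python
from bs4 import BeautifulSoup


def strip_tags(html, separator=" "):
    """Return the text of `html`, joining text fragments with `separator`.

    `strip_tags` is a hypothetical helper name, not part of BeautifulSoup.
    """
    # "html.parser" is the parser bundled with Python; it is an assumption here,
    # any parser supported by BeautifulSoup could be used instead.
    soup = BeautifulSoup(html, "html.parser")
    # strip=True trims each fragment and skips fragments that become empty.
    return soup.get_text(separator=separator, strip=True)


print(strip_tags("I am a<p>good</p>guy!"))            # I am a good guy!
print(strip_tags("<div> Hello <b>world</b> </div>"))  # Hello world
```

Passing separator="" reproduces the original glued-together behaviour, so the default of a single space is the only thing the helper really changes.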