Beautiful Soup

**Beautiful Soup**
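Original author	Leonard Richardson
Stable release	4.9.0[1] (5 April 2020)
Repository	code.launchpad.net/beautifulsoup/
Written in	Python
Type	HTML parsing library, web scraping
License	Python Software Foundation License (Beautiful Soup 3 and earlier); MIT License (Beautiful Soup 4 and later)[1]
Website	www.crummy.com/software/BeautifulSoup/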

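Beautiful Soup is a Python package for parsing HTML and XML documents, including documents with malformed markup such as unclosed tags (such documents are often called tag soup). It builds a parse tree for the page being parsed, from which data can be extracted, which is useful for web scraping.[1]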
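The following is a minimal sketch of the behavior described above, not an excerpt from the library's documentation. The markup string and the variable names are made up for illustration, and the parser is assumed to be Python's bundled `html.parser`; other parsers may repair the same input differently.

```python
from bs4 import BeautifulSoup

# Hypothetical "tag soup": the <p> and <b> tags are never closed.
broken_html = '<p>See <a href="/docs">the docs</a> and <b>this unclosed bold text'

# Build the parse tree. The parser choice is an assumption of this sketch.
soup = BeautifulSoup(broken_html, "html.parser")

# The tree can be navigated even though the input was malformed.
link_target = soup.a["href"]      # attribute lookup on the first <a> tag
bold_text = soup.b.get_text()     # text inside the unclosed <b> tag
parent_name = soup.b.parent.name  # <b> sits inside the <p> element

# Serializing the tree writes out the closing tags missing from the input.
repaired = str(soup)

print(link_target)
print(bold_text)
print(parent_name)
print(repaired)
```

Keeping the markup in a string makes the sketch runnable without network access; the same calls apply to a document read from a file or an HTTP response.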

The package can be used with Python 2.7 (or later) and Python 3.

Example code

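# Python 3
# Extract the link target of every anchor in an HTML document
from urllib.request import urlopen

from bs4 import BeautifulSoup

# urlopen returns a file-like response that BeautifulSoup can read directly
webpage = urlopen('https://en.wikipedia.org/wiki/Main_Page')
soup = BeautifulSoup(webpage, 'html.parser')
# Print each anchor's href, falling back to '/' when the attribute is missing
for anchor in soup.find_all('a'):
    print(anchor.get('href', '/'))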

See also

Comparison of HTML parsers

References

[1] crummy.com. [18 April 2012]. (Archived from the original on 2017-02-03). Beautiful Soup is licensed under the same terms as Python itself

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
