Beautiful Soup

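Beautiful Soup is a Python package for parsing HTML and XML documents, including documents with malformed markup such as unclosed tags (such documents are often called tag soup). It builds a parse tree for the parsed page that can be used to extract data from it, which is useful for web scraping.[1]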
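As a minimal sketch of the behaviour described above, the following snippet parses a small fragment of tag soup and reads data back out of the resulting tree. The input string and variable names are hypothetical and chosen only for illustration; the parser used is `html.parser` from the Python standard library, and other parsers may repair malformed markup differently.

```python
from bs4 import BeautifulSoup

# Hypothetical tag-soup input: the <p> and <b> elements are never closed.
broken_html = "<div><p>Hello, <b>world</div>"

# Build the parse tree with the standard-library HTML parser.
soup = BeautifulSoup(broken_html, "html.parser")

# Navigate the tree and extract text from individual elements.
bold_text = soup.find("b").get_text()
paragraph_text = soup.find("p").get_text()

# Serialising the tree writes out closing tags for the unclosed elements.
repaired = str(soup)

print(bold_text)
print(paragraph_text)
print(repaired)
```

The unclosed `<b>` and `<p>` elements are closed implicitly when the parser reaches `</div>`, so both can still be found and queried as ordinary nodes of the tree.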

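Beautiful Soup
Original author: Leonard Richardson
Stable release: 4.9.0[1] (5 April 2020)
Written in: Python
Type: HTML parsing library, web scraping
License: Python Software Foundation License (Beautiful Soup 3 and earlier); MIT License (Beautiful Soup 4 and later)[1]
Website: www.crummy.com/software/BeautifulSoup/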

The package is available for Python 2.7 (or later) and Python 3.

Example code

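# Python 3
# Extract the anchors (links) from an HTML document
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Download the page and parse it with the standard-library HTML parser
webpage = urlopen('https://en.wikipedia.org/wiki/Main_Page')
soup = BeautifulSoup(webpage, 'html.parser')

# Print each link target, falling back to '/' when an anchor has no href
for anchor in soup.find_all('a'):
    print(anchor.get('href', '/'))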
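
The same anchor extraction can be tried without network access by parsing an in-memory string. The following is a sketch under stated assumptions: the helper name `extract_hrefs` and the sample markup are hypothetical and not part of Beautiful Soup; only `BeautifulSoup`, `find_all` and `get` come from the library.

```python
from bs4 import BeautifulSoup


def extract_hrefs(html, default="/"):
    """Return the href of every <a> tag, using `default` when href is missing.

    Hypothetical helper written for this example only.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [anchor.get("href", default) for anchor in soup.find_all("a")]


# Hypothetical sample document: the last anchor has no href attribute.
sample_html = """
<p><a href="/wiki/Python">Python</a>
<a href="https://www.crummy.com/software/BeautifulSoup/">Beautiful Soup</a>
<a name="top">no href here</a></p>
"""

hrefs = extract_hrefs(sample_html)
print(hrefs)
# ['/wiki/Python', 'https://www.crummy.com/software/BeautifulSoup/', '/']
```

Returning a list rather than printing inside the loop makes the result easier to filter or test; the third entry shows the fallback value being used for an anchor that has no `href`.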

References

  1. [18 April 2012]. (Archived from the original on 2017-02-03). "Beautiful Soup is licensed under the same terms as Python itself"