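AutoScraper quickly and intelligently fetches data from a given website. The data can be page text, URLs, or other HTML elements.
It saves you from parsing the page by hand and writing extraction rules yourself.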
GitHub:
https://github.com/alirezamika/autoscraper
Installation:
git clone https://github.com/alirezamika/autoscraper.git
cd autoscraper
python setup.py install
Example:
from autoscraper import AutoScraper
url = 'https://stackoverflow.com/questions/2081586/web-scraping-with-python'
# We can add one or multiple candidates here.
# You can also put urls here to retrieve urls.
wanted_list = ["How to call an external command?"]
scraper = AutoScraper()
result = scraper.build(url, wanted_list)
print(result)
Example output:
[
'How do I merge two dictionaries in a single expression in Python (taking union of dictionaries)?',
'How to call an external command?',
'What are metaclasses in Python?',
'Does Python have a ternary conditional operator?',
'How do you remove duplicates from a list whilst preserving order?',
'Convert bytes to a string',
'How to get line count of a large file cheaply in Python?',
"Does Python have a string 'contains' substring method?",
'Why is “1000000000000000 in range(1000000000000001)” so fast in Python 3?'
]
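Note that the output contains titles that were never listed in wanted_list: the scraper generalizes from the one sample to elements that sit in a similar place on the page. The toy sketch below illustrates that idea with only the standard library. It is not AutoScraper's actual implementation; the names PathTextParser, learn_rules, apply_rules and the SAMPLE_HTML page are all hypothetical, made up for this illustration.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag; they must not stay on the path stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class PathTextParser(HTMLParser):
    """Collect a (tag path, text) pair for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open tags, each stored as "tag.class"
        self.items = []  # list of (path tuple, stripped text)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        css_class = dict(attrs).get("class") or ""
        self.stack.append(f"{tag}.{css_class}")

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag (tolerates unclosed tags).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].split(".", 1)[0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.items.append((tuple(self.stack), text))


def extract(html):
    """Return all (path, text) pairs found in the HTML string."""
    parser = PathTextParser()
    parser.feed(html)
    parser.close()
    return parser.items


def learn_rules(html, wanted_list):
    """Remember the tag paths at which any of the wanted texts appears."""
    return {path for path, text in extract(html) if text in wanted_list}


def apply_rules(html, rules):
    """Return every text whose tag path matches one of the learned paths."""
    return [text for path, text in extract(html) if path in rules]


# Hypothetical page, loosely modelled on a "related questions" sidebar.
SAMPLE_HTML = """
<html><body>
<h1 class="title">Web scraping with Python</h1>
<div class="related">
  <a class="question" href="/q/1">How to call an external command?</a>
  <a class="question" href="/q/2">What are metaclasses in Python?</a>
  <a class="question" href="/q/3">Convert bytes to a string</a>
</div>
<div class="footer"><a class="nav" href="/about">About</a></div>
</body></html>
"""

rules = learn_rules(SAMPLE_HTML, ["How to call an external command?"])
result = apply_rules(SAMPLE_HTML, rules)
print(result)
```

Here a single sample title is enough to pick up all three question links, while the page heading and the footer link are skipped because their tag paths differ. The real library is more elaborate than this, so treat the sketch only as intuition for why one example can return a whole list.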