AutoScraper: a smart web scraper that saves you from parsing pages and writing rules by hand

zsxwz 20/09.13 08:45

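AutoScraper quickly and intelligently fetches data from a website you specify. The data can be page text, URLs, or any other HTML elements.

You give it a sample of what you want, and it saves you from parsing the page and writing extraction rules by hand.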
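
To make the "no hand-written rules" idea concrete, here is a toy sketch of learning by example using only the standard library. It is not AutoScraper's actual implementation. The names `TextCollector`, `learn_rules`, `apply_rules`, and the sample pages are all made up for illustration. The assumption is that a "rule" is simply the chain of (tag, class) pairs leading to the text you gave as an example.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextCollector(HTMLParser):
    """Collect (signature, text) pairs. The signature is the chain of
    (tag, class) pairs from the document root down to the text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements
        self.items = []   # (signature, text) for every non-blank text node

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs).get("class")))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.items.append((tuple(self.stack), text))


def collect(html):
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.items


def learn_rules(html, wanted_list):
    """Return the signatures of every text node that matches a wanted example."""
    return {sig for sig, text in collect(html) if text in wanted_list}


def apply_rules(html, rules):
    """Return all texts whose signature matches one of the learned rules."""
    return [text for sig, text in collect(html) if sig in rules]


# Hypothetical sample pages, not real site markup.
train_page = """
<div class="related">
  <a class="question" href="/q/1">How to call an external command?</a>
  <a class="question" href="/q/2">What are metaclasses in Python?</a>
</div>
<p class="footer">About this site</p>
"""
other_page = """
<div class="related">
  <a class="question" href="/q/3">Convert bytes to a string</a>
</div>
<p class="footer">Contact</p>
"""

rules = learn_rules(train_page, ["How to call an external command?"])
print(apply_rules(train_page, rules))
# ['How to call an external command?', 'What are metaclasses in Python?']
print(apply_rules(other_page, rules))
# ['Convert bytes to a string']
```

One example title is enough to pull in its sibling with the same structure, and the same rule carries over to a second page with the same layout. The real library is presumably more robust than this, but the workflow is the same: give examples, get back similar elements.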

GitHub:
https://github.com/alirezamika/autoscraper

Installation:

git clone https://github.com/alirezamika/autoscraper.git

cd autoscraper

python setup.py install

Example:

from autoscraper import AutoScraper
url = 'https://stackoverflow.com/questions/2081586/web-scraping-with-python'

# We can add one or multiple candidates here.
# You can also put urls here to retrieve urls.

wanted_list = ["How to call an external command?"]
scraper = AutoScraper()
result = scraper.build(url, wanted_list)
print(result)

Example output:

[
    'How do I merge two dictionaries in a single expression in Python (taking union of dictionaries)?', 
    'How to call an external command?', 
    'What are metaclasses in Python?', 
    'Does Python have a ternary conditional operator?', 
    'How do you remove duplicates from a list whilst preserving order?', 
    'Convert bytes to a string', 
    'How to get line count of a large file cheaply in Python?', 
    "Does Python have a string 'contains' substring method?", 
    'Why is “1000000000000000 in range(1000000000000001)” so fast in Python 3?'
]
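
Once the scraper has been built, the learned rules can be reused on other pages with the same layout, and saved for later runs. The sketch below uses `get_result_similar`, `save`, and `load` from the project README. The wrapper function `learn_and_reuse` and the default file name `autoscraper-rules` are my own placeholders, and newer or older versions of the library may behave differently.

```python
def learn_and_reuse(train_url, wanted_list, target_url,
                    model_path="autoscraper-rules"):
    """Learn rules on one page, save them, then apply them to another page.

    model_path is a placeholder file name chosen for this sketch.
    """
    # Imported here so the function can be defined before the package is installed.
    from autoscraper import AutoScraper

    scraper = AutoScraper()
    scraper.build(train_url, wanted_list)   # learn rules from the example page
    scraper.save(model_path)                # write the learned rules to a file

    fresh = AutoScraper()
    fresh.load(model_path)                  # reload the rules, e.g. in a later run
    return fresh.get_result_similar(target_url)


# Usage (needs network access and the autoscraper package):
# titles = learn_and_reuse(
#     "https://stackoverflow.com/questions/2081586/web-scraping-with-python",
#     ["How to call an external command?"],
#     "https://stackoverflow.com/questions/606191/convert-bytes-to-a-string",
# )
# print(titles)
```

What comes back depends on the live page at the time you run it, so treat the call above as a starting point and check the result yourself.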
