Scrapy
https://github.com/scrapy/scrapy
Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
pyspider (no longer maintained)
https://github.com/binux/pyspider
A powerful spider (web crawler) system in Python.
Playwright (multi-language)
https://github.com/microsoft/playwright-python
Playwright is a Python library to automate Chromium, Firefox and WebKit browsers with a single API. Playwright delivers automation that is evergreen, capable, reliable and fast.
Selenium (multi-language: Java, Python, C#, Ruby, JavaScript, Kotlin)
https://github.com/seleniumhq/selenium
https://www.selenium.dev/
Selenium is an umbrella project encapsulating a variety of tools and libraries enabling web browser automation. Selenium specifically provides an infrastructure for the W3C WebDriver specification, a platform- and language-neutral coding interface compatible with all major web browsers.
feapder
https://github.com/Boris-code/feapder
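feapder is an easy-to-use yet powerful Python crawler framework with four built-in spider types (AirSpider, Spider, TaskSpider, and BatchSpider) for different scenarios.
It supports resumable crawling, monitoring and alerting, browser rendering, and deduplication of very large datasets.
A full-featured crawler management system, feaplat, provides convenient deployment and scheduling for it.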