Hey all,

This is Jan, the founder of Apify (https://apify.com/), a full-stack web scraping platform. After the success of Crawlee for JavaScript (https://github.com/apify/crawlee/) and the demand from the Python community, we're launching Crawlee for Python today!

The main features are:

- A unified programming interface for both HTTP (HTTPX with BeautifulSoup) & headless browser crawling (Playwright)
- Automatic parallel crawling based on available system resources
- Written in Python with type hints for enhanced developer experience
- Automatic retries on errors or when you're getting blocked
- Integrated proxy rotation and session management
- Configurable request routing - direct URLs to the appropriate handlers
- Persistent queue for URLs to crawl
- Pluggable storage for both tabular data and files

For details, you can read the announcement blog post: https://crawlee.dev/blog/launching-crawlee-python

Our team and I will be happy to answer any questions you might have here.
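To make the "request routing" and "automatic retries" bullets more concrete, here is a minimal standard-library sketch of the general idea. It is not Crawlee's API or implementation: `Router`, `Request`, `crawl`, `demo`, and the `DETAIL` / `BROKEN` labels are hypothetical names invented for illustration, the queue is a plain in-memory deque rather than a persistent one, and no network requests are made.

```python
import asyncio
from collections import deque
from dataclasses import dataclass
from typing import Awaitable, Callable, Deque, Dict, List, Optional, Tuple


@dataclass
class Request:
    """A URL to process, an optional routing label, and a retry counter."""
    url: str
    label: Optional[str] = None
    retries: int = 0


Handler = Callable[[Request], Awaitable[None]]


class Router:
    """Toy label-based router (illustrative only, not Crawlee's router)."""

    def __init__(self) -> None:
        self._handlers: Dict[Optional[str], Handler] = {}

    def handler(self, label: Optional[str] = None) -> Callable[[Handler], Handler]:
        # Decorator: register `fn` for `label`; label None is the default route.
        def register(fn: Handler) -> Handler:
            self._handlers[label] = fn
            return fn
        return register

    def resolve(self, request: Request) -> Handler:
        # Prefer the handler for the request's label, else the default handler.
        found = self._handlers.get(request.label) or self._handlers.get(None)
        if found is None:
            raise LookupError(f"no handler for label {request.label!r}")
        return found


async def crawl(router: Router, start: List[Request], max_retries: int = 2) -> List[str]:
    """Process requests in FIFO order; re-queue failures up to max_retries."""
    queue: Deque[Request] = deque(start)
    failed: List[str] = []
    while queue:
        request = queue.popleft()
        try:
            await router.resolve(request)(request)
        except Exception:
            if request.retries < max_retries:
                request.retries += 1
                queue.append(request)  # retry later, behind the pending requests
            else:
                failed.append(request.url)  # give up after max_retries retries
    return failed


async def demo() -> Tuple[List[Tuple[str, str]], List[str]]:
    router = Router()
    seen: List[Tuple[str, str]] = []
    attempts = {"detail": 0}

    @router.handler()  # default route
    async def default(request: Request) -> None:
        seen.append(("default", request.url))

    @router.handler("DETAIL")
    async def detail(request: Request) -> None:
        attempts["detail"] += 1
        if attempts["detail"] == 1:
            raise RuntimeError("simulated transient failure")
        seen.append(("detail", request.url))

    @router.handler("BROKEN")
    async def broken(request: Request) -> None:
        raise RuntimeError("simulated permanent failure")

    failed = await crawl(router, [
        Request("https://example.com/"),
        Request("https://example.com/item/1", label="DETAIL"),
        Request("https://example.com/bad", label="BROKEN"),
    ])
    return seen, failed


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the `DETAIL` request fails once and succeeds on its retry, while the `BROKEN` request exhausts its retries and ends up in the failed list. A real crawler would also need concurrency limits, sessions, and proxy handling, which are left out here.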
Users discuss Crawlee's usability, comparing it to Scrapy, Selenium, and other scraping tools. They praise its easy setup, performance, and configurability, but note documentation issues and a lack of examples. Some ask about specific features such as web scraping opt-out, 2FA handling, and anti-blocking. Others question the ethics of scraping and whether the tool respects host settings. There's interest in Python support and requests for code snippets in the documentation. The tool is noted to be free, open-source, and Python-compatible, and some users plan to try it in their projects.
Users criticized the product for inadequate documentation, particularly missing test case snippets and unclear feature descriptions. Commenters also criticized the coding style and a lack of unique features. They raised ethical concerns about web scraping opt-out protocols and bot detection avoidance. Technical issues mentioned include insufficient type hint coverage, customization difficulties, and inadequate handling of proxies, caching, and 2FA. Comparisons with Scrapy and with the existing JavaScript version of Crawlee suggest confusion about the product's unique value proposition, and there were requests for better automation and more examples, especially for dynamic sites and non-HTML content.