Scrapy
| | |
|---|---|
| Developer(s) | Scrapinghub, Ltd. |
| Initial release | 26 June 2008 |
| Stable release | 1.5.1 / 12 July 2018[1] |
| Repository | |
| Written in | Python |
| Operating system | Windows, macOS, Linux |
| Type | Web crawler |
| License | BSD License |
| Website | scrapy |
Scrapy (/ˈskreɪpi/ SKRAY-pee)[2] is a free and open-source web-crawling framework written in Python. Originally designed for web scraping, it can also be used to extract data using APIs or as a general-purpose web crawler.[3] It is currently maintained by Scrapinghub Ltd., a web-scraping development and services company.
Scrapy's project architecture is built around "spiders": self-contained crawlers that are given a set of instructions. Following the spirit of other "don't repeat yourself" (DRY) frameworks, such as Django,[4] Scrapy makes it easier to build and scale large crawling projects by allowing developers to reuse their code. Scrapy also provides a web-crawling shell, which developers can use to test their assumptions about a site's behavior.[5]
Some well-known companies and products using Scrapy are Lyst,[6] CareerBuilder,[7] Parse.ly,[8] Sayone Technologies,[9] Sciences Po Medialab,[10] and Data.gov.uk's World Government Data site.[11]
History
Scrapy was born at Mydeco, a London-based web-aggregation and e-commerce company, where it was developed and maintained by employees of Mydeco and Insophia (a web-consulting company based in Montevideo, Uruguay). The first public release was in August 2008 under the BSD license, and the milestone 1.0 release followed in June 2015.[12] In 2011, Scrapinghub became the new official maintainer.[13][14]
References
1. "Release notes — Scrapy documentation". doc.scrapy.org. Retrieved 2018-08-13.
2. "How do you pronounce 'Scrapy'?".
3. "Scrapy at a glance".
4. "Frequently Asked Questions". Retrieved 28 July 2015.
5. "Scrapy shell". Retrieved 28 July 2015.
6. Bell, Eddie; Heusser, Jonathan. "Scalable Scraping Using Machine Learning". Retrieved 28 July 2015.
7. "Companies using Scrapy". Scrapy website.
8. Montalenti, Andrew. "Web Crawling & Metadata Extraction in Python".
9. "Scrapy Companies". Scrapy website.
10. "Hyphe v0.0.0: the first release of our new webcrawler is out!".
11. Ben Firshman [@bfirsh] (21 January 2010). "World Govt Data site uses Django, Solr, Haystack, Scrapy and other exciting buzzwords bit.ly/5jU3La #opendata #datastore" (Tweet) – via Twitter.
12. Medina, Julia (19 June 2015). "Scrapy 1.0 official release out!". scrapy-users (Mailing list).
13. Hoffman, Pablo (2013). "List of the primary authors & contributors". Retrieved 18 November 2013.
14. "Interview Scraping Hub".