SpaCy

spaCy
Original author(s)	Matthew Honnibal
Developer(s)	Explosion AI, various
Initial release	February 2015[1]
Stable release	2.0 / 7 November 2017
Repository	github.com/explosion/spaCy
Written in	Python, Cython
Operating system	Linux, Windows, macOS
Platform	cross-platform
Type	Natural language processing
License	MIT
Website	spacy.io

spaCy (/speɪˈsiː/ spay-SEE) is an open-source software library for advanced natural language processing (NLP), written in the programming languages Python and Cython.^[2]^[3] The library is published under the MIT license. As of version 2.0, it offers statistical neural network models for English, German, Spanish, Portuguese, French, Italian and Dutch, a multi-language model for named entity recognition (NER), and tokenization for various other languages.^[4]

Unlike NLTK, which is widely used for teaching and research, spaCy focuses on providing software for production use.^[5]^[6] As of version 1.0, spaCy also supports deep learning workflows^[7] that allow connecting statistical models trained by popular machine learning libraries such as TensorFlow, Keras, Scikit-learn or PyTorch.^[8] spaCy's machine learning library, Thinc, is also available as a separate open-source Python library.^[9] Version 2.0 was released on November 7, 2017.^[10] It features convolutional neural network models for part-of-speech tagging, dependency parsing and named entity recognition, as well as API improvements for training and updating models and for constructing custom processing pipelines.

Main features

Non-destructive tokenization
Named entity recognition
"Alpha tokenization" support for over 25 languages^[11]
Statistical models for 8 languages^[12]
Pre-trained word vectors
Part-of-speech tagging
Labelled dependency parsing
Syntax-driven sentence segmentation
Text classification
Built-in visualizers for syntax and named entities
Deep learning integration
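
"Non-destructive" tokenization means that splitting text into tokens discards nothing, so the original string can always be rebuilt from the tokens. The following is a minimal sketch of that idea using only the Python standard library. It is not spaCy's implementation: the Token type, the tokenize and detokenize functions, and the simple regular-expression splitting rule are all assumptions made for illustration.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    """Hypothetical token record used only in this sketch."""
    text: str        # the token's characters, exactly as they appear in the input
    whitespace: str  # whitespace that followed the token in the input (may be empty)
    idx: int         # character offset of the token in the input


def tokenize(text: str) -> List[Token]:
    """Split text into word and punctuation tokens without discarding anything."""
    tokens = []
    # \w+      matches a run of word characters,
    # [^\w\s]  matches a single punctuation mark,
    # ^\s+     keeps whitespace at the very start of the text as its own token.
    # The trailing (\s*) captures following whitespace so it is stored, not lost.
    for match in re.finditer(r"(\w+|[^\w\s]|^\s+)(\s*)", text):
        tokens.append(Token(match.group(1), match.group(2), match.start(1)))
    return tokens


def detokenize(tokens: List[Token]) -> str:
    """Rebuild the original string from the tokens and their trailing whitespace."""
    return "".join(tok.text + tok.whitespace for tok in tokens)


text = "Hello,  world! spaCy  is fast."
tokens = tokenize(text)
print([tok.text for tok in tokens])

# The round trip reproduces the input exactly, including the double spaces.
assert detokenize(tokens) == text
```

Because every token records its character offset, annotations such as entity labels can be mapped back onto the original text.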

Extensions and visualizers

Dependency parse tree visualization generated with the displaCy visualizer

spaCy comes with several extensions and visualizers that are available as free, open-source libraries:

Thinc: A machine learning library optimized for CPU usage and deep learning with text input.
sense2vec: A library for computing word similarities, based on Word2vec and the sense2vec method.^[13]
displaCy: An open-source dependency parse tree visualizer built with JavaScript, CSS and SVG.
displaCy^ENT: An open-source named entity visualizer built with JavaScript and CSS.
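
Word similarity over word vectors is commonly measured with cosine similarity, the cosine of the angle between two vectors. The sketch below illustrates that calculation with the Python standard library only. It does not use spaCy or sense2vec, and the three-dimensional vectors and the function names are invented for the example; real pre-trained vectors have far more dimensions and are learned from large corpora.

```python
import math
from typing import Dict, List

# Toy vectors made up for this example (an assumption, not real model output).
VECTORS: Dict[str, List[float]] = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def similarity(word_a: str, word_b: str) -> float:
    """Look up both words in the toy table and compare their vectors."""
    return cosine_similarity(VECTORS[word_a], VECTORS[word_b])


print(round(similarity("cat", "dog"), 3))
print(round(similarity("cat", "car"), 3))
```

With these toy values, "cat" scores closer to "dog" than to "car", which is the kind of ranking a similarity library is meant to produce.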

References

[1] "Introducing spaCy". explosion.ai. Retrieved 2016-12-18.
[2] Choi et al. (2015). It Depends: Dependency Parser Comparison Using A Web-based Evaluation Tool.
[3] "Google's new artificial intelligence can't understand these sentences. Can you?". Washington Post. Retrieved 2016-12-18.
[4] "Models & Languages | spaCy Usage Documentation". spacy.io. Retrieved 2017-11-08.
[5] "Facts & Figures - spaCy". spacy.io. Retrieved 2017-11-08.
[6] Bird, Steven; Klein, Ewan; Loper, Edward; Baldridge, Jason (2008). "Multidisciplinary instruction with the Natural Language Toolkit" (PDF). Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics, ACL.
[7] "explosion/spaCy". GitHub. Retrieved 2016-12-18.
[8] "Facts & Figures | spaCy Usage Documentation". spacy.io. Retrieved 2017-11-08.
[9] "explosion/thinc". GitHub. Retrieved 2016-12-30.
[10] "spaCy: 💫 Industrial-strength Natural Language Processing (NLP) with Python and Cython". Explosion AI. 2017-11-08. Retrieved 2017-11-08.
[11] "Models & Languages - spaCy". spacy.io. Retrieved 2017-11-08.
[12] "Models & Languages | spaCy Usage Documentation". spacy.io. Retrieved 2017-11-08.
[13] Trask et al. (2015). sense2vec - A Fast and Accurate Method for Word Sense Disambiguation In Neural Word Embeddings.

External links

Official website
spaCy source code on GitHub
Official blog by the creators
spaCy author Matthew Honnibal on the origin of the name

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
