Taxonomy (general)

Taxonomy is the practice and science of categorisation based on discrete sets. The word is also used as a count noun: a taxonomy, or taxonomic scheme, is a particular categorisation. The word derives from the Greek τάξις, taxis ('order', 'arrangement') and νόμος, nomos ('law' or 'science'). Originally, taxonomy referred only to the categorisation of organisms, or to a particular categorisation of organisms. In a wider, more general sense, it may refer to a categorisation of things or concepts, as well as to the principles underlying such a categorisation. Taxonomy differs from meronomy, which deals with the categorisation of parts of a whole.

Many taxonomies have a hierarchical structure, but this is not a requirement. Taxonomy uses taxonomic units, known as "taxa" (singular "taxon").

Applications

Wikipedia categories illustrate a taxonomy,[1] and a full taxonomy of Wikipedia categories can be extracted by automatic means.[2] It was shown in 2009 that a manually constructed taxonomy, such as that of a computational lexicon like WordNet, can be used to improve and restructure the Wikipedia category taxonomy.[3]

In a broader sense, taxonomy also applies to relationship schemes other than parent-child hierarchies, such as network structures. Taxonomies may then include a single child with multiple parents; for example, "Car" might appear under both "Vehicle" and "Steel Mechanisms". To some, however, this merely means that "car" is a part of several different taxonomies.[4] A taxonomy might also simply be an organization of kinds of things into groups, or an alphabetical list; here, however, the term vocabulary is more appropriate. In current usage within knowledge management, taxonomies are considered narrower than ontologies, since ontologies apply a larger variety of relation types.[5]
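The multi-parent case can be sketched as a directed acyclic graph rather than a tree. The following is a minimal illustration in Python, not a standard representation: the `parents` mapping, the helper names and the extra root category "Thing" are all assumptions introduced here.

```python
# Hypothetical polyhierarchy: each category maps to the set of its direct parents.
# "Thing" is an invented common root, added only to make the example complete.
parents = {
    "Car": {"Vehicle", "Steel Mechanisms"},
    "Vehicle": {"Thing"},
    "Steel Mechanisms": {"Thing"},
    "Thing": set(),
}


def ancestors(category, parents):
    """Return every category reachable by following parent links upward."""
    seen = set()
    stack = list(parents.get(category, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def multi_parent_categories(parents):
    """List the categories that have more than one direct parent."""
    return sorted(c for c, ps in parents.items() if len(ps) > 1)


print(sorted(ancestors("Car", parents)))   # ['Steel Mechanisms', 'Thing', 'Vehicle']
print(multi_parent_categories(parents))    # ['Car']
```

Under the alternative reading mentioned above, the same data could instead be stored as two separate trees, one rooted at "Vehicle" and one at "Steel Mechanisms", with "Car" appearing in each.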

Mathematically, a hierarchical taxonomy is a tree structure of classifications for a given set of objects. It is also called a containment hierarchy. At the top of this structure is a single classification, the root node, that applies to all objects. Nodes below this root are more specific classifications that apply to subsets of the total set of classified objects. Reasoning proceeds from the general to the more specific.
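A tree of this kind might be represented as follows. This is only a sketch under stated assumptions: the `Node` class and the `path_to` helper are hypothetical names, and the animal categories are chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One classification in a hierarchical taxonomy (hypothetical structure)."""
    name: str
    children: List["Node"] = field(default_factory=list)

    def add(self, name: str) -> "Node":
        """Attach a more specific classification below this one and return it."""
        child = Node(name)
        self.children.append(child)
        return child


def path_to(root: Node, target: str, trail=()) -> Optional[List[str]]:
    """Return the classifications from the root down to `target`, or None."""
    trail = trail + (root.name,)
    if root.name == target:
        return list(trail)
    for child in root.children:
        found = path_to(child, target, trail)
        if found is not None:
            return found
    return None


# The root applies to every object; each level below it is more specific.
root = Node("animal")
mammal = root.add("mammal")
mammal.add("dog")
mammal.add("cat")
root.add("bird")

print(path_to(root, "dog"))  # ['animal', 'mammal', 'dog']
```

The path returned by `path_to` reads from the general to the specific, which mirrors the direction of reasoning described above.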

By contrast, legal terminology employs an open-ended contextual taxonomy, that is, a taxonomy holding only with respect to a specific context. In scenarios taken from the legal domain, a formal account of the open texture of legal terms is modeled, which suggests varying notions of the "core" and "penumbra" of the meanings of a concept. Reasoning proceeds from the specific to the more general.[6]

History

Anthropologists have observed that taxonomies are generally embedded in local cultural and social systems, and serve various social functions. Perhaps the most well-known and influential study of folk taxonomies is Émile Durkheim's The Elementary Forms of Religious Life. A more recent treatment of folk taxonomies (including the results of several decades of empirical research), together with a discussion of their relation to scientific taxonomy, can be found in Scott Atran's Cognitive Foundations of Natural History. Folk taxonomies of organisms have been found in large part to agree with scientific classification, at least for the larger and more obvious species, which means that folk taxonomies are not based purely on utilitarian characteristics.[7]

In the seventeenth century the German mathematician and philosopher Gottfried Leibniz sought to develop an alphabet of human thought. In this he followed the thirteenth-century Majorcan philosopher Ramon Llull, whose Ars generalis ultima was a system for procedurally generating concepts by combining a fixed set of ideas. Leibniz intended his characteristica universalis to be an "algebra" capable of expressing all conceptual thought. The concept of creating such a "universal language" was frequently examined in the 17th century, notably also by the English philosopher John Wilkins in his work An Essay towards a Real Character and a Philosophical Language (1668), from which the classification scheme in Roget's Thesaurus ultimately derives.

Use of taxonomies in various disciplines

Taxonomies in software engineering

Vegas et al.[8] make a case for advancing knowledge in the field of software engineering through the use of taxonomies. Similarly, Ore et al.[9] provide a systematic methodology for building taxonomies on software engineering topics.

Software testing taxonomies

Several taxonomies have been proposed in software testing research to classify techniques, tools, concepts and artifacts. The following are some example taxonomies:

  1. A taxonomy of model-based testing techniques[10]
  2. A taxonomy of static-code analysis tools[11]

Engström et al.[12] suggest and evaluate the use of a taxonomy to bridge the communication between researchers and practitioners engaged in the area of software testing. They have also developed a web-based tool[13] to facilitate and encourage the use of the taxonomy. The tool and its source code are available for public use.[14]

Taxonomies in research publishing

Citing inadequacies in current practices for listing authors of papers in medical research journals, Drummond Rennie and co-authors called, in a 1997 article in JAMA (the Journal of the American Medical Association), for

a radical conceptual and systematic change, to reflect the realities of multiple authorship and to buttress accountability. We propose dropping the outmoded notion of author in favor of the more useful and realistic one of contributor.[15]:152

Since 2012, several major academic and scientific publishing bodies have mounted Project CRediT to develop a controlled vocabulary of contributor roles.[16] Known as CRediT (Contributor Roles Taxonomy), this is an example of a flat, non-hierarchical taxonomy; however, it does include an optional, broad classification of the degree of contribution: lead, equal or supporting. Amy Brand and co-authors summarise their intended outcome as:

Identifying specific contributions to published research will lead to appropriate credit, fewer author disputes, and fewer disincentives to collaboration and the sharing of data and code.[15]:151

As of mid-2018, this taxonomy apparently restricts its scope to research outputs, specifically journal articles; however, it does rather unusually "hope to … support identification of peer reviewers".[16] (As such, it has not yet defined terms for such roles as editor or author of a chapter in a book of research results.) Version 1, established by the first Working Group in the (northern) autumn of 2014, identifies 14 specific contributor roles using the following defined terms:

  • Conceptualization
  • Methodology
  • Software
  • Validation
  • Formal Analysis
  • Investigation
  • Resources
  • Data curation
  • Writing – Original Draft
  • Writing – Review & Editing
  • Visualization
  • Supervision
  • Project Administration
  • Funding acquisition
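The flat structure of this vocabulary, together with the optional degree of contribution, can be sketched as two enumerations. This is an illustrative model only: the class names `Role` and `Degree`, the function `contribution_statement` and the output format are assumptions made here and are not part of CRediT itself.

```python
from enum import Enum


class Role(Enum):
    """The 14 contributor roles listed above (flat: no role is a parent of another)."""
    CONCEPTUALIZATION = "Conceptualization"
    METHODOLOGY = "Methodology"
    SOFTWARE = "Software"
    VALIDATION = "Validation"
    FORMAL_ANALYSIS = "Formal Analysis"
    INVESTIGATION = "Investigation"
    RESOURCES = "Resources"
    DATA_CURATION = "Data curation"
    WRITING_ORIGINAL_DRAFT = "Writing – Original Draft"
    WRITING_REVIEW_EDITING = "Writing – Review & Editing"
    VISUALIZATION = "Visualization"
    SUPERVISION = "Supervision"
    PROJECT_ADMINISTRATION = "Project Administration"
    FUNDING_ACQUISITION = "Funding acquisition"


class Degree(Enum):
    """Optional, broad classification of the degree of contribution."""
    LEAD = "lead"
    EQUAL = "equal"
    SUPPORTING = "supporting"


def contribution_statement(contributions):
    """Format {name: [(Role, Degree or None), ...]} as one line per contributor.

    The output format is hypothetical, chosen only for this example.
    """
    lines = []
    for name, roles in contributions.items():
        parts = [
            role.value if degree is None else f"{role.value} ({degree.value})"
            for role, degree in roles
        ]
        lines.append(f"{name}: {', '.join(parts)}")
    return "\n".join(lines)


example = {"A. Author": [(Role.SOFTWARE, Degree.LEAD), (Role.VALIDATION, None)]}
print(contribution_statement(example))  # A. Author: Software (lead), Validation
```

Using an enumeration rather than free text reflects the idea of a controlled vocabulary: a role outside the defined terms cannot be recorded by accident.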

Reception has been mixed, with several major publishers and journals planning to have implemented CRediT by the end of 2018, whilst almost as many are not persuaded of the need for it or of the value of using it. For example,

The National Academy of Sciences has created a TACS (Transparency in Author Contributions in Science) webpage to list the journals that commit to setting authorship standards, defining responsibilities for corresponding authors, requiring ORCID iDs, and adopting the CRediT taxonomy.[17]

The same webpage has a table listing 21 journals (or families of journals), of which:

  • 5 have, or by end 2018 will have, implemented CRediT,
  • 6 require an author contribution statement and suggest using CRediT,
  • 8 don't use CRediT, of which 3 give reasons for not doing so, and
  • 2 are uninformative.

The taxonomy is an open standard conforming to the OpenStand principles,[18] and is published under a Creative Commons licence.[16]

Taxonomy for the web

Websites with a well-designed taxonomy or hierarchy are easily understood by users, because users can develop a mental model of the site structure.[19]

Guidelines for writing taxonomy for the web

  • Mutually exclusive categories can be beneficial. If a category appears in several places, this is called cross-listing, and the resulting structure is polyhierarchical. The hierarchy will lose its value if cross-listing appears too often. Cross-listing often arises when working with ambiguous categories that fit in more than one place.[19]
  • Having a balance between breadth and depth in the taxonomy is beneficial. Too much breadth will overload users by giving them too many choices. At the same time, a structure that is too deep, with more than two or three levels to click through, will frustrate users, who might give up.[19]
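The breadth and depth guideline above can be checked mechanically once a site hierarchy is written down. The sketch below assumes the hierarchy is stored as nested dictionaries. The function names and the example site are hypothetical. The depth limit of three follows the "two or three levels" mentioned above, whereas the breadth limit of seven is an arbitrary placeholder, since no figure for breadth is given here.

```python
def depth(tree):
    """Number of levels below this point (an empty dict has depth 0)."""
    return 0 if not tree else 1 + max(depth(sub) for sub in tree.values())


def max_breadth(tree):
    """Largest number of direct children found under any single node."""
    if not tree:
        return 0
    return max(len(tree), max(max_breadth(sub) for sub in tree.values()))


def check_balance(tree, max_depth=3, max_options=7):
    """Return warnings; an empty list means neither limit was exceeded.

    max_options=7 is an assumed placeholder, not a value from the guideline.
    """
    warnings = []
    d, b = depth(tree), max_breadth(tree)
    if d > max_depth:
        warnings.append(f"depth {d} exceeds {max_depth} levels")
    if b > max_options:
        warnings.append(f"breadth {b} exceeds {max_options} options")
    return warnings


# Hypothetical site: keys are category labels, values are their subcategories.
site = {
    "Products": {"Laptops": {}, "Phones": {"Android": {}, "iOS": {}}},
    "Support": {"Contact": {}},
    "About": {},
}

print(depth(site), max_breadth(site))    # 3 3
print(check_balance(site))               # []
print(check_balance(site, max_depth=2))  # ['depth 3 exceeds 2 levels']
```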

Is-a and has-a relationships, and hyponymy

Two of the predominant types of relationships in knowledge-representation systems are predication and the universally quantified conditional. Predication relationships express the notion that an individual entity is an example of a certain type (for example, John is a bachelor), while universally quantified conditionals express the notion that a type is a subtype of another type (for example, A dog is a mammal, which means the same as All dogs are mammals).[20]

Taxonomies are often represented as is-a hierarchies in which each level is more specific than (in mathematical language, "a subset of") the level above it. For example, a basic biology taxonomy would have concepts such as mammal, which is a subset of animal, and dogs and cats, which are subsets of mammal. This kind of taxonomy is called an is-a model because the specific objects are considered instances of a concept. For example, Fido is an instance of the concept dog and Fluffy is a cat.[21]

In linguistics, is-a relations are called hyponymy. Words that describe categories are called hypernyms and words that are examples of categories are hyponyms. In the simple biology example, dog is a hypernym and Fido is one of its hyponyms. A word can be both a hyponym and a hypernym. For example, dog is a hyponym of mammal and also a hypernym of Fido.
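The two relationship types described in this section correspond loosely to two built-in checks in Python: `isinstance` for predication (an individual is an example of a type) and `issubclass` for the universally quantified conditional (a type is a subtype of another type). Mapping a taxonomy onto class inheritance is only an analogy adopted for this sketch, and the `hypernyms` helper is a hypothetical name.

```python
class Animal:
    pass


class Mammal(Animal):
    pass


class Dog(Mammal):
    pass


class Cat(Mammal):
    pass


fido = Dog()    # predication: Fido is a dog
fluffy = Cat()  # predication: Fluffy is a cat


def hypernyms(cls):
    """Names of the more general categories above `cls`, nearest first."""
    return [c.__name__ for c in cls.__mro__[1:] if c is not object]


print(isinstance(fido, Dog))     # True: an individual is an example of a type
print(issubclass(Dog, Mammal))   # True: all dogs are mammals
print(isinstance(fido, Animal))  # True: the is-a relation carries up the hierarchy
print(isinstance(fluffy, Dog))   # False
print(hypernyms(Dog))            # ['Mammal', 'Animal']
```

The list returned by `hypernyms(Dog)` shows that a category can be a hyponym of the categories above it while remaining a hypernym of whatever lies below it.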

Notes

  1. Zirn, Cäcilia, Vivi Nastase and Michael Strube. 2008. "Distinguishing Between Instances and Classes in the Wikipedia Taxonomy" (video lecture). 5th Annual European Semantic Web Conference (ESWC 2008).
  2. S. Ponzetto and M. Strube. 2007. "Deriving a large scale taxonomy from Wikipedia". Proc. of the 22nd Conference on the Advancement of Artificial Intelligence, Vancouver, B.C., Canada, pp. 1440-1445.
  3. S. Ponzetto, R. Navigli. 2009. "Large-Scale Taxonomy Mapping for Restructuring and Integrating Wikipedia". Proc. of the 21st International Joint Conference on Artificial Intelligence (IJCAI 2009), Pasadena, California, pp. 2083-2088.
  4. Jackson, Joab. "Taxonomy's not just design, it's an art," Government Computer News (Washington, D.C.). September 2, 2004.
  5. Suryanto, Hendra and Paul Compton. "Learning classification taxonomies from a classification knowledge based system." University of Karlsruhe; "Defining 'Taxonomy'," Straits Knowledge website.
  6. Grossi, Davide, Frank Dignum and John-Jules Charles Meyer. (2005). "Contextual Taxonomies" in Computational Logic in Multi-Agent Systems, pp. 33-51.
  7. Kenneth Boulding; Elias Khalil (2002). Evolution, Order and Complexity. Routledge. ISBN 9780203013151. p. 9
  8. Vegas, S. (2009). "Maturing software engineering knowledge through classifications: A case study on unit testing techniques". IEEE Transactions on Software Engineering. 35 (4): 551–565. CiteSeerX 10.1.1.221.7589. doi:10.1109/TSE.2009.13.
  9. Ore, S. (2014). "Critical success factors taxonomy for software process deployment". Software Quality Journal. 22 (1): 21–48. doi:10.1007/s11219-012-9190-y.
  10. Utting, Mark (2012). "A taxonomy of model-based testing approaches". Software Testing, Verification & Reliability. 22 (5): 297–312. doi:10.1002/stvr.456.
  11. Novak, Jernej. "Taxonomy of static code analysis tools". Proceedings of the 33rd International Convention MIPRO: 418–422.
  12. Engström, Emelie (2016). "SERP-test: a taxonomy for supporting industry–academia communication". Software Quality Journal. 25 (4): 1269–1305. doi:10.1007/s11219-016-9322-x.
  13. "SERP-connect".
  14. Engström, Emelie. "SERP-connect backend".
  15. Brand, Amy; Allen, Liz; Altman, Micah; Hlava, Marjorie; Scott, Jo (1 April 2015). "Beyond authorship: attribution, contribution, collaboration, and credit". Learned Publishing. 28 (2): 151–155. doi:10.1087/20150211.
  16. "CRediT". CASRAI. 2 May 2018. Archived from the original on 12 June 2018. Retrieved 13 June 2018.
  17. "Transparency in Author Contributions in Science (TACS)". National Academy of Sciences. 2018. Retrieved 13 June 2018.
  18. "OpenStand". OpenStand. Retrieved 13 June 2018.
  19. Morville, Peter; Rosenfeld, Louis (2007). Information Architecture for the World Wide Web (3rd ed.). Sebastopol, CA: O'Reilly. ISBN 9780596527341. OCLC 86110226.
  20. Brachman, Ronald J. (October 1983). "What IS-A is and isn't. An Analysis of Taxonomic Links in Semantic Networks". IEEE Computer. 16 (10).
  21. Brachman, Ronald (October 1983). "What IS-A is and isn't. An Analysis of Taxonomic Links in Semantic Networks". IEEE Computer. 16 (10): 30–36. doi:10.1109/MC.1983.1654194.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.