Legal framework of textual data processing for Machine Translation and Language Technology research and development activities

Case #16 Lexicon Distribution (based on other datasets) I

Case description
Actor: Researcher-Resource Compiler & Provider
Intended use: Distribute a lexicon I have created on the basis of web-crawled data (lemma selection, lemma frequency).
Conditions: I have little or no information on the legal status of the web-crawled texts, since I only used the dataset to extract specific information from it.
Question: Can I distribute it under the licence of my choice? Or do I check all the sources again and ask for permission, which can be extremely time-consuming? Or do I refrain from distributing it?

Suggested legal solution
Legal position: See Cases #8–10.
Suggested course of action: Key suggestions:
(a) the closer my work is to the original, the greater the risk;
(b) if the original can be substituted by what I have produced, then I should not distribute my work;
(c) if the original cannot be recognised or reconstructed, the risk is low;
(d) ideally, I should not use it for commercial purposes;
(e) if the risk is high, I should include a notice-and-take-down notice.
Type of Terms and Conditions: Multiple. NC (non-commercial) element suggested.
Legal basis: Copyright Law. Emphasis on limitations and exceptions.

Case #17 Lexicon Distribution (based on other datasets) II

Case description
Actor: Researcher-Resource Compiler & Provider
Intended use: Distribute a lexicon I have created that includes web-crawled data used in the form of examples.
Conditions: I have little or no information on the legal status of the web-crawled texts, since I only used the dataset to extract specific information from it.
Question: Can I distribute it under the licence of my choice? Or do I check all the sources again and ask for permission, which can be extremely time-consuming? Or do I refrain from distributing it?

Suggested legal solution
Legal position: See Case #16.
Suggested course of action: Same as Case #16. The inclusion of my own or cleared content further reduces the risk of infringement.
Type of Terms and Conditions: Multiple. NC (non-commercial) element suggested.
Legal basis: Copyright Law. Emphasis on limitations and exceptions.

Case #18 Lexicon Distribution (based on other datasets) III

Case description
Actor: Researcher-Resource Compiler & Provider
Intended use: Distribute a lexicon I have created which includes definitions (some paraphrased, some verbatim) from one or more other lexica.
Conditions: The licence of the original lexica is not always clear (e.g. nothing is stated in the Dictionary of modern Greek, while the web site for the Triantafylides dictionary includes the following terms of use: http://www.greek-language.gr/greekLang/terms/index.html).
Question: Can I distribute it under the licence of my choice? Or do I ask the sources for permission? Or simply state the sources (attribution-like)? Or refrain from distributing it?

Suggested legal solution
Legal position: Including definitions from another lexicon will most probably constitute extraction of substantial parts of another database, even if the definitions are few. The reason is that, for them to be worth extracting, they must be significant, and they are thus by definition substantial. For these, permission should be sought. Evidently, the fewer the definitions, the lower the risk. If the definitions are paraphrased, there is no other way to define the term, and the structure of the lexicon is not copied, then there should be no problem, since facts and ideas are not protected by copyright. A prohibition of paraphrasing in the terms and conditions of a web site is valid only to the extent that the paraphrase copies original elements in the expression of the original definition. Otherwise, ideas are not protected.
Suggested course of action: Paraphrase where there is no other way to say something. Avoid copying any original elements. Avoid excessive verbatim copying.
Type of Terms and Conditions: Multiple.
Legal basis: Copyright Law. Emphasis on limitations and exceptions.
This article is issued from Wikibooks. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.