Catvar 2.0
The Categorial Variation Database (English)

Catvar2.1 (c) 2003 Copyright University of Maryland. All Rights Reserved. Licensed under the Open Software License version 1.1

What is a Catvar?

A Categorial-Variation Database (or Catvar) is a database of clusters of uninflected words (lexemes) and their categorial (i.e., part-of-speech) variants. For example, the words hunger(V), hunger(N), hungry(AJ) and hungriness(N) are different English variants of some underlying concept describing the state of being hungry. Another example is the developing cluster: (develop(V), developer(N), developed(AJ), developing(N), developing(AJ), development(N)).

The database was developed for English using a combination of resources and algorithms, including the LCS Verb and Preposition Databases (Dorr 2001), the Brown Corpus section of the Penn Treebank (Marcus et al. 1994), an English morphological analysis lexicon developed for PC-Kimmo (ENGLEX) (Antworth 1990), WordNet 1.6 (Fellbaum 1998), an English Verb-Noun list extracted from Nomlex (Macleod et al. 1998), a similar list extracted from LDOCE (Procter 1978) and the Porter stemmer (Porter 1980).

In its first release, the database contained 28,305 clusters for 46,037 words. In this second release, there are 63,146 clusters and 109,807 words.

Access Catvar2.0 online
Download Catvar

Catvar2.1 is now on GitHub (Jan 2, 2019).

Credits
How to cite CATVAR
References
Contacts
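
To make the cluster idea from "What is a Catvar?" concrete, here is a minimal Python sketch that holds the two example clusters in memory and looks up the categorial variants of a word. The representation (lists of word/part-of-speech pairs) and the names CLUSTERS, build_index and variants are assumptions made for illustration only; they are not Catvar's actual file format or API.

```python
from collections import defaultdict

# Hypothetical in-memory representation, NOT Catvar's real file format.
# Each cluster is a list of (lexeme, part-of-speech) pairs, using the two
# example clusters quoted in the description.
CLUSTERS = [
    [("hunger", "V"), ("hunger", "N"), ("hungry", "AJ"), ("hungriness", "N")],
    [("develop", "V"), ("developer", "N"), ("developed", "AJ"),
     ("developing", "N"), ("developing", "AJ"), ("development", "N")],
]


def build_index(clusters):
    """Map each lexeme to the set of cluster positions that contain it."""
    index = defaultdict(set)
    for i, cluster in enumerate(clusters):
        for word, _pos in cluster:
            index[word].add(i)
    return index


def variants(word, clusters, index, pos=None):
    """Return every member of the clusters containing `word`.

    If `pos` is given, keep only members with that part-of-speech tag.
    The queried word itself is included when it matches the filter.
    """
    found = []
    for i in sorted(index.get(word, ())):
        for other, other_pos in clusters[i]:
            if pos is None or other_pos == pos:
                found.append((other, other_pos))
    return found


INDEX = build_index(CLUSTERS)

if __name__ == "__main__":
    # Noun variants related to the adjective "hungry".
    print(variants("hungry", CLUSTERS, INDEX, pos="N"))
```

Indexing by lexeme rather than by (lexeme, part-of-speech) pair means a query such as "developing" returns the whole cluster regardless of which category the query word was used in; a stricter lookup would key the index on the pair instead.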