Department of Languages, Linguistics and Area Studies
International Corpus Linguistics Research Unit (ICLRU)
Rationale for an International Corpus Linguistics Research Unit (ICLRU)
A legacy of the structural tradition in linguistics is the widespread acceptance of the premise that language structure is independent of language usage. This premise is codified in a variety of theoretical distinctions, from Saussure’s (1916) langue and parole to Chomsky’s (1965) competence and performance. This approach suggests that the introspective study of language structure can lead us to sound conclusions concerning the basic cognitive mechanisms that make human language possible.
In other fields, however, it is widely held that cognitive representations are highly affected by experience. In humans and non-humans alike, detailed tracking of probabilities leads to behaviour which promotes survival (Kelly and Martin 1994). Even within linguistics, certain usage-based effects are widely accepted: unmarked members of categories are more frequent than marked members (Greenberg 1966); irregular morphological formations with high frequency are less likely to regularise; regular patterns have a wider range of applicability; and high-frequency phrases undergo special reduction.
The relatively recent rise of Corpus Linguistics both reflects and facilitates the study of such probabilistic frequency effects. These effects are of interest not only to those seeking to understand basic cognitive mechanisms but also to scholars investigating sociolinguistic phenomena such as language variation and change, and to those working in psycholinguistic or applied linguistic domains relating to language acquisition. Sociolinguistics as a discipline demonstrates the way in which variation in usage can lead to change in structures. The mechanisms of language change - grammaticalisation, pragmaticalisation, contact-induced change and changes in languages through (bilingual) language acquisition - constitute a particular research interest in the Unit. Complemented by experimental methods in which data are elicited in highly focused ways, the corpus approach lends itself to research studies in a number of linguistic fields.
Corpus Linguistics, as both a science and an approach, has been championed in the English-speaking world. The British National Corpus, an electronic corpus of some 100 million words of written and spoken text containing a balanced sample of genres from everyday life in Britain, was established in 1993. There are a number of Centres of Corpus Linguistics both in Britain and abroad which focus on English (UCREL, the Unit for Computer Research on the English Language, at the University of Lancaster; the University of Nottingham’s CANCODE Corpus; the University of Birmingham’s COBUILD Project; and CECL, the Centre for English Corpus Linguistics, at the University of Louvain-la-Neuve). Although developments are rapidly under way, particularly for Spanish, Portuguese and German and less so for French, there is currently no centre which takes continental European languages, or indeed other languages such as Arabic, Chinese or Somali, as its primary focus of interest. The European Corpus Initiative (ECI) recognised this gap, noting: “It is generally agreed that there are not enough corpora in languages other than English”. It has created a large and very useful corpus of written material, but that corpus regrettably contains very few samples of spoken language.
In the International Corpus Linguistics Research Unit (ICLRU) at UWE, Bristol, our researchers have spent a number of years collecting and exploiting corpora of spoken French, Basque, Dutch, German and Turkish, and have published articles in fields as diverse as language variation, contact and change, cross-cultural politeness, bilingual language acquisition and measures of lexical diversity in L2 learners. The Bristol Corpus of Spoken French has been made available on the Internet for other scholars to consult, and we are establishing spoken learner corpora where the L2 is English, French, German or Spanish. A further highly promising avenue is the use of parallel corpora in translation studies: we have begun to compile such corpora in specific areas and to investigate the usefulness of computer tools in this field. Though the specific linguistic focus of our studies may be highly diverse, the underlying philosophy and techniques share a common purpose: the empirical investigation and quantification of language in use.
Kate Beeching
July, 2007

