Wednesday, December 23, 2015

What is a corpus? What is Corpus Linguistics?

A corpus is a collection of texts (spoken or written) produced by proficient speakers of a language in real communicative situations and made available in machine-readable form so that it can be analyzed linguistically for different purposes. The Brown Corpus, compiled in the early 1960s, is generally regarded as the first modern machine-readable corpus. It consisted of about one million words of written text made available in electronic format.


In the image, you can see Dr. Mark Davies of Brigham Young University in Provo, Utah, USA, the creator of the Corpus of Contemporary American English (COCA), which is a free, native, written, static, general corpus that consists of 450 million words (190,000 texts) dating from 1990 to 2012. He has built other corpora (the plural of corpus) as well, such as the Wikipedia Corpus, the Corpus of Historical American English (COHA), the TIME Magazine Corpus, and the Corpus of American Soap Operas.


Now that we know what a corpus is, we should ask what Corpus Linguistics (CL) is. Corpus linguistics is the study of field-collected, authentic language produced by native speakers. Unlike sociolinguistics or psycholinguistics, which are separate paradigms within linguistics, corpus linguistics is “a methodological basis for pursuing linguistic research” (Leech 1992: 105). It's a way of doing linguistics.

What makes corpus linguistics unique is its ability to deal with the everyday, authentic language of native speakers. It follows a descriptive approach to language that accepts all dialects of a language as equal and rule-governed. Thus, you can search for any word or part of a word to see how it's used by native speakers in real communicative situations.

If you are interested in spoken (recorded) corpora, you can visit the Santa Barbara Corpus of Spoken American English at the link below to listen to real people speaking in authentic situations.

Corpus linguistics is especially popular in genre analysis, where linguists analyze a genre such as acknowledgments, abstracts, or introductions so that learners can get to know the conventions of writing in these genres. However, corpus linguistics is also used in designing reference books like dictionaries and language teaching materials. All of this can be done using a concordancer (an application that allows you to run various types of searches on a set of preloaded texts or on texts you load as your own corpus). You can get a free concordancer from Laurence Anthony's website.
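
To give a rough idea of what a concordancer does behind the scenes, here is a minimal sketch in Python of a keyword-in-context (KWIC) search. It is only an illustration, not how any particular concordancer is implemented: the function name `concordance`, its parameters, and the tiny `sample` text are all made up for this example.

```python
import re


def concordance(text, query, width=30, whole_word=True):
    """Return keyword-in-context (KWIC) lines for every match of `query`.

    Hypothetical helper for illustration only. `width` is the number of
    characters of context shown on each side of the match.
    """
    pattern = re.escape(query)
    if whole_word:
        # Match the query only as a complete word, not inside longer words.
        pattern = r"\b" + pattern + r"\b"
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


# A made-up mini "corpus" just to try the function out.
sample = (
    "A corpus is a collection of texts. Linguists search a corpus "
    "to see how a word is used. Several corpora are freely available."
)

for line in concordance(sample, "corpus", width=20):
    print(line)
```

Setting `whole_word=False` lets you search for part of a word, so a query like "corp" would find both "corpus" and "corpora". A real concordancer adds much more on top of this (sorting, frequency lists, collocations), but the basic idea of lining up every hit with its surrounding context is the same.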



References:

Leech, Geoffrey (1992) Corpora and Theories of Linguistic Performance. In Svartvik (1992). 105–22.
