Corpus Linguistics: Readings in a Widening Discipline

A&C Black, Oct 1, 2005 - Language Arts & Disciplines - 542 pages
Corpus linguistics seeks to provide a comprehensive sampling of real-life usage in a given language, and to use these empirical data to test hypotheses about language. Modern corpus linguistics began fifty years ago, but the subject has seen explosive growth since the early 1990s. These days corpora are being used to advance virtually every aspect of language study, from computer processing techniques such as machine translation to literary stylistics, social aspects of language use, and improved language-teaching methods.

Because corpus linguistics has grown fast from small beginnings, newcomers to the field often find it hard to get their bearings. Important papers can be difficult to track down. This volume reprints forty-two articles on corpus linguistics by an international selection of authors, which comprehensively illustrate the directions in which the subject is developing. It includes articles that are already recognized as classics, and others which deserve to become so, supplemented with editorial introductions relating the individual contributions to the field as a whole.

This collection of readings will be useful to students of corpus linguistics at both undergraduate and postgraduate level, as well as academics researching this fascinating area of linguistics.
 

What people are saying

LibraryThing Review

User Review  - Jewsbury - LibraryThing

Perhaps in 30 years' time, we shall take it for granted that software will outperform humans when it comes to translating, précising, editing or simplifying texts. If this is the case, then it will be ...

Contents

1 Introduction, p. 1
2 From The Structure of English (1952), p. 9
3 A standard corpus of edited present-day American English (1965), p. 27
4 On the distribution of noun-phrase types in English clause-structure (1971), p. 35
5 Predicting text segmentation into tone units (1986), p. 49
6 Typicality and meaning potentials (1986), p. 58
7 Historical drift in three English genres (1987), p. 67
8 Corpus creation (1987), p. 78
9 Cleft and pseudo-cleft constructions in English spoken and written discourse (1987), p. 85
10 What is wrong with adding one? (1989), p. 95
11 A statistical approach to machine translation (1990), p. 103
an analysis of a dialect continuum (1991), p. 113
13 Using corpus data in the Swedish Academy grammar (1991), p. 122
14 On the history of that/zero as object clause links in English (1991), p. 137
15 Encoding the British National Corpus (1992), p. 149
16 Computer corpora: what do they tell us about culture? (1992), p. 160
17 Representativeness in corpus design (1992), p. 174
Principles, Methods and Examples (1993), p. 198
19 Structural ambiguity and lexical relations (1993), p. 212
20 Irony in the text or insincerity in the writer? The diagnostic potential of semantic prosodies (1993), p. 229
the Penn Treebank (1993), p. 242
22 Automatically extracting collocations from corpora for language learning (1994), p. 258
23 Developing and evaluating a probabilistic LR parser of part-of-speech and punctuation labels (1995), p. 267
24 Why a Fiji corpus? (1996), p. 276
25 Treebank grammars (1996), p. 285
26 English corpus linguistics and the foreign-language teaching syllabus (1996), p. 293
an overview (1996), p. 304
A comparison of the verbal disputes between adolescent females in two corpora (1996), p. 326
the kappa statistic (1996), p. 335
30 Linguistic and interactional features of Internet Relay Chat (1996), p. 340
New evaluation methods for word-sense disambiguation (1997), p. 353
32 Qualification and certainty in L1 and L2 students' writing (1997), p. 371
33 Analysing and predicting patterns of DAMSL utterance tags (1998), p. 387
34 Assessing claims about language use with corpus data: swearing and abuse (1998), p. 396
35 The syntax of disfluency in spontaneous spoken language (1998), p. 404
36 The use of large text corpora for evaluating text-to-speech systems (1998), p. 421
how much of the underlying syntactic structure can be tagged automatically? (1999), p. 427
38 Reflections of a dendrographer (1999), p. 434
39 A generic approach to software support for linguistic annotation using XML (2000), p. 449
40 Europe's ignored languages (2001), p. 460
41 Semi-automatic tagging of intonation in French spoken corpora (2001), p. 462
42 Web as corpus (2001), p. 471
43 Intonational variation in the British Isles (2002), p. 474
Bibliography, p. 483
URL List, p. 509
Index, p. 511



About the author (2005)

Geoffrey Sampson is a former Professor of Natural Language Computing at the School of Informatics, University of Sussex. He is now a Research Fellow at the University of South Africa.

Diana McCarthy is a Royal Society Dorothy Hodgkin Fellow in the Department of Informatics at the University of Sussex.
