Workshop 3C: 3:45-5:30

Using statistical tools to explain linguistic variation
A state-of-the-art workshop for NWAV 38

Sali A. Tagliamonte
University of Toronto

Gregory Guy
Daniel Ezra Johnson
Harald Baayen
Bob Bayley
Katie Drager & Jen Hay
John Paolillo

There is growing controversy about the optimum tool for handling linguistic variation. The Variable Rule program in its several incarnations (Cedergren & Sankoff, 1974; Rand & Sankoff, 1990; Sankoff, Tagliamonte & Smith, 2005) has recently come under attack, in part for its aged pedigree but, more critically, for its (supposedly) inferior ability to handle the multiplex (socio)linguistic data that are the standard substance of variation studies. A number of other statistical tools have been proposed, most notably R (Baayen, forthcoming; R Development Core Team, 2007) and its derivative Rbrul (Johnson, 2009). Seasoned researchers and students alike are under increasing pressure to adopt the newer tools and techniques. However, most users know neither the differences among these statistical packages nor enough of the background to make informed decisions about how to use them most effectively.

The proposed workshop will consist of presentations by leading proponents of a range of statistical tools and methods. What are the similarities and differences among them? What kinds of data can they handle? What are their advantages and disadvantages? What strategies does each tool make available for dealing with common data problems, e.g. interaction, independence, and fixed vs. mixed effects? What assumptions do they require of the analyst? A core feature of the workshop will be ample time for discussion and interaction among panelists and participants. Our aim is to provide a state-of-the-art forum that will enable researchers to make informed and confident decisions about the statistical tools they employ for explaining linguistic variation.

Selected References:

Baayen, H. (forthcoming). Analyzing linguistic data: A practical introduction to statistics. Cambridge: Cambridge University Press.

Cedergren, H. J. & Sankoff, D. (1974). Variable rules: Performance as a statistical reflection of competence. Language 50(2): 333-355.

Johnson, D. E. (2009). Getting off the GoldVarb standard: Introducing Rbrul for mixed-effects variable rule analysis. Language and Linguistics Compass 3(1): 359-383.

R Development Core Team (2007). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.

Rand, D. & Sankoff, D. (1990). GoldVarb: A variable rule application for the Macintosh. Montreal, Canada: Centre de recherches mathématiques, Université de Montréal.

Sankoff, D., Tagliamonte, S. A. & Smith, E. (2005). Goldvarb X. Department of Linguistics, University of Toronto, Toronto, Canada.


I. Introduction

II. Goldvarb: Still the Gold Standard
Gregory Guy, New York University

III. Mixed models and why sociolinguists should use them
Daniel Ezra Johnson, University of York, England

IV. Modeling count data with R: random forests and mixed models
Harald Baayen, University of Alberta

V. Comparing SPSS and Goldvarb: The case of subject personal pronouns
Bob Bayley, University of California, Davis

VI. Exploiting Random Intercepts: Two case studies in socio-phonetics
Katie Drager & Jen Hay, University of Canterbury & University of Hawaiʻi at Mānoa

VII. Model vs. Software in Variation Analysis
John Paolillo, Indiana University

VIII. Discussion/Questions and answers