Lab 2: Acquisition of Auxiliaries

Posted Tuesday October 14th, due Tuesday October 21st.

Objectives of the lab:

This assignment carries double credit.

The CHILDES Database

For this lab, you will be using data from the CHILDES (Child Language Data Exchange System) database, a collection of more than 250MB of transcripts of adult-child interactions, covering many children learning many different languages.

The transcripts are stored in a consistent format, which makes it very easy to run powerful computer searches for regularly occurring words or phrases. The database has been very widely used in language acquisition research over the past half dozen years, and has led to many interesting discoveries about what young children know about their language. In general, children turn out to know things rather earlier than people had suspected.

Searching the Database

In each of the places where the database is installed, you will also find a copy of the program BBEdit Lite 4.0. This is a text editor whose search capabilities are both extremely powerful and user-friendly, including the ability to search multiple files at once and to search for complex patterns.

I have written a fairly detailed set of instructions for using BBEdit to search CHILDES: these instructions are available from my homepage. If you have further questions about searching with BBEdit, you should either consult the file "BBEdit Lite Quickstart" in the BBEdit folder, or send me email.

I strongly recommend that you work through the example searches in the CHILDES instructions to familiarize yourself with the capabilities of BBEdit Lite before attempting the assignment.

Finding the materials

You can do this lab in the Computer Room in the Linguistics Department, at #42 East Delaware Ave. You can sign out a key for this room from Jane Creswell in the Linguistics Dept. Office at #46 East Delaware Ave. Alternatively, you can download the necessary materials from the Linguistics Department web server, and work on the lab in the comfort of your own home.

Download:

Both of these are folders which are stored as a single file archive using the utility "MacTar". To extract the folder from the tar archive, download MacTar.

The database files are all text files, so they are in principle usable on any computer platform. Unfortunately, BBEdit Lite is only available for Macs, so you can save a lot of time by doing the lab on a Mac. If, however, you have access to another powerful text-searching tool, then I can arrange to get you the database files.
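For those working without BBEdit, a multi-file search can be approximated in a few lines of Python. The sketch below is only an illustration, not part of the lab materials. It assumes that each child utterance starts on a line beginning with the speaker code *CHI: (check your transcripts, since the code may differ between versions of the database), and the function name search_child_lines is made up for this example. It looks at one line at a time, so an utterance that continues onto a second line will only be matched on its first line.

```python
import re
from pathlib import Path


def search_child_lines(folder, pattern, speaker="*CHI:"):
    """Return (file name, line number, line) for each child line matching pattern.

    Assumption: child utterances start with the speaker code given in `speaker`.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    # Walk every file under the folder, e.g. all sessions for one child.
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if line.startswith(speaker) and regex.search(line):
                hits.append((path.name, number, line))
    return hits


if __name__ == "__main__":
    # Example: every child line containing "is" or "are" as a whole word.
    for name, number, line in search_child_lines("DATA/ENG/BROWN/ADAM", r"\b(is|are)\b"):
        print(name, number, line)
```

The search returns line numbers so that you can go back to the transcript and read each hit in context, which you will need to do anyway to decide whether the auxiliary was required.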

Your task...

The lab focuses on English-speaking children's use of auxiliary verbs like be, have, can, must or do. You will be looking first at the form of these auxiliary verbs in children's speech, and then at the position of auxiliary verbs, to see how well young children master the rules for their use.

You should choose one child from among the many English-learning children in CHILDES. I highly recommend the child known as Adam (a pseudonym), whose data can be found in the folder DATA/ENG/BROWN/ADAM (i.e. the Adam folder inside the Brown folder, inside the English folder etc.). But if you have a good reason to look elsewhere, then feel free. [A useful property of Adam's speech is that he asks lots of questions, a rich source of data on the use of auxiliaries.]

Information on the ages of the children at each of the recording sessions can be found in the file readme.txt in each folder (in the newer version of the database, the age information for the Brown corpus is in the file 00brown.doc).

1. The first part of the lab is concerned with the form of auxiliary verbs:

1a. For at least two time periods (one before age 2;6 and one around age 3), how often are the auxiliary verbs be and do used, and how often are they omitted in contexts where they are required? Give both raw numbers and percentages. Do you observe change over time (i.e. differences between your two time periods)?
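As a worked illustration of "raw numbers and percentages", the percentage of auxiliaries supplied is the number used divided by the number of obligatory contexts (used plus omitted). The short Python sketch below shows one way to report this. The function names are hypothetical and the counts in the example are invented purely to show the arithmetic; they are not findings about any child.

```python
def supplied_rate(supplied, omitted):
    """Percentage of obligatory contexts in which the auxiliary was supplied."""
    total = supplied + omitted
    if total == 0:
        return None  # no obligatory contexts, so no percentage can be given
    return 100.0 * supplied / total


def report(period, supplied, omitted):
    """Format raw numbers and the percentage for one time period."""
    rate = supplied_rate(supplied, omitted)
    if rate is None:
        return f"{period}: no obligatory contexts found"
    return f"{period}: {supplied}/{supplied + omitted} supplied ({rate:.1f}%)"


if __name__ == "__main__":
    # Invented counts, for illustration only.
    print(report("before 2;6", 12, 28))
    print(report("around 3;0", 45, 15))
```

Reporting the raw counts alongside the percentage matters because a percentage based on a handful of utterances is much less trustworthy than one based on fifty.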

1b. Based on the data you extracted to answer (1a), and focusing on just those occasions where the child does overtly use 'be' or 'do', how often is the auxiliary that is used the correct one for the context? For example, when the form are is used, does it occur with an appropriate subject (e.g. you, they) or with an inappropriate subject (e.g. she)? Again give both raw numbers and percentages in your answer. Do you observe change over time using this measure?
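If you record each overt auxiliary together with its subject, the agreement check in (1b) can be partly automated. The Python sketch below is one possible approach, assuming you have already extracted (subject, auxiliary) pairs by hand or by search. The table covers only pronoun and demonstrative subjects; anything else, such as a full noun phrase like "the doggie", is returned as undecided and must be judged by hand. The names AGREEMENT, check_agreement and tally are made up for this example.

```python
# Which subjects each form of 'be' and 'do' agrees with (pronoun subjects only).
AGREEMENT = {
    "am": {"i"},
    "is": {"he", "she", "it", "this", "that"},
    "are": {"you", "we", "they", "these", "those"},
    "was": {"i", "he", "she", "it", "this", "that"},
    "were": {"you", "we", "they", "these", "those"},
    "do": {"i", "you", "we", "they", "these", "those"},
    "does": {"he", "she", "it", "this", "that"},
}
KNOWN_SUBJECTS = set().union(*AGREEMENT.values())


def check_agreement(subject, auxiliary):
    """Return True or False for agreement, or None if the pair is not covered."""
    subject, auxiliary = subject.lower(), auxiliary.lower()
    if auxiliary not in AGREEMENT or subject not in KNOWN_SUBJECTS:
        return None  # leave for hand coding
    return subject in AGREEMENT[auxiliary]


def tally(pairs):
    """Count (correct, incorrect, undecided) over (subject, auxiliary) pairs."""
    results = [check_agreement(s, a) for s, a in pairs]
    return results.count(True), results.count(False), results.count(None)


if __name__ == "__main__":
    print(tally([("they", "are"), ("she", "are"), ("John", "is")]))
```

The undecided count tells you how many utterances still need to be read in context before you compute your percentages.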

In either case, if you notice any striking differences between different auxiliaries in your results, then mention this. If you do not find any such differences, then mention that too.

Tip: some contexts where auxiliary verbs should be found include:

Note: for the purposes of this assignment a 'time-period' may consist of just one recording session, or more, depending on whether there is enough data in a single session to draw reasonably confident conclusions about the child's proficiency. In general, using 2-3 adjacent recording sessions will help you gauge whether the child is performing fairly consistently from one session to the next.

In any case, once you have figured out what to search for, it is quite easy to run your search on an entire directory of files, so the job of picking out a time period should be easy.

What counts as 'enough data' for a given time-period? This depends on what you're trying to show. If after just 20-30 relevant utterances you find a quite consistent pattern emerging, then this may be enough. If, on the other hand, you suspect that there is a more subtle pattern, possibly involving contrasts in how well two different forms are used, then you may need a somewhat larger data sample (50+ relevant utterances). Of course, you are constrained by how often relevant utterances appear in the child's spontaneous speech.

Note: beware of things the child says which are mere copies of what an adult has just said. If a child produces a complex sentence by imitation, would you want to claim that this is part of his/her 'spontaneous speech'?
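One rough way to spot candidate imitations is to compare each child utterance with the adult utterance that came just before it. The Python sketch below flags a child utterance when every word in it also appears in the most recent adult utterance. This is only a crude heuristic offered as an illustration: it assumes the transcript has already been reduced to a list of (speaker, utterance) pairs, it treats any speaker other than the child as an adult, and the names words and flag_imitations are invented here. Flagged utterances still need to be read and judged by you.

```python
import re


def words(utterance):
    """Lower-case word tokens of an utterance, ignoring punctuation."""
    return re.findall(r"[a-z']+", utterance.lower())


def flag_imitations(turns, child="CHI"):
    """Return indexes of child turns whose words all occur in the preceding adult turn."""
    flagged = []
    previous_adult = None
    for index, (speaker, utterance) in enumerate(turns):
        if speaker == child:
            child_words = words(utterance)
            if (previous_adult is not None and child_words
                    and set(child_words) <= set(previous_adult)):
                flagged.append(index)
        else:
            # Remember the latest adult utterance for comparison.
            previous_adult = words(utterance)
    return flagged


if __name__ == "__main__":
    turns = [
        ("MOT", "what is the doggie doing ?"),
        ("CHI", "doggie doing"),
        ("CHI", "I want cookie"),
    ]
    print(flag_imitations(turns))
```

The check ignores word order, so it will also flag partial repetitions that rearrange the adult's words; whether to count those as imitation is a decision you should state in your write-up.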

2. The second part of the lab is concerned with the position of auxiliary verbs:

In declarative sentences and in questions in which the subject of the sentence is being questioned, the auxiliary verb appears after the subject in English, if there is any auxiliary verb at all, e.g.

John is leaving.
Who has left?

However, in other kinds of questions, either yes-no questions or questions in which something other than the subject is being questioned, the subject and the auxiliary are inverted, so that the auxiliary precedes the subject. This is known as "subject-auxiliary inversion". In such questions an auxiliary verb is obligatory, e.g.

Is John leaving?
Does Bill need help?
Who did Liz leave this time?
Why has Harry left the room?
Where could Helen have gotten to?

This part of the lab asks basically the same questions as in (1), but about the position rather than the form of auxiliaries.

2a. For roughly the same two time periods that you looked at in (1), how often is subject-auxiliary inversion involving be or do found in obligatory contexts (i.e. yes-no questions and non-subject questions) in the child's speech? Give your answer both as raw numbers and as percentages. Do you observe any change over time?

Note: given that questions are generally rarer than declarative utterances in children's speech, you may need to look at a larger time-period here than in question (1) in order to find enough information.

2b. Based on the data you extracted to answer (2a), and focusing on just those occasions where the child does overtly use an auxiliary, how often is subject-auxiliary inversion correctly applied, i.e. how often do you see "What is John doing?" compared to "What John is doing?" Again give both raw numbers and percentages in your answer. Do you observe change over time using this measure?
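A first pass at sorting wh-questions into inverted and non-inverted can also be sketched in Python. The version below looks only at the first three words of an utterance: a wh-word followed by auxiliary then subject counts as inverted, and a wh-word followed by subject then auxiliary counts as uninverted. It is a simplification under stated assumptions: it handles only forms of be and do, only one-word subjects drawn from a list you supply (pronouns by default), and it returns "other" for everything else, including subject questions such as "Who is leaving?" and questions with no auxiliary. The names used are hypothetical.

```python
import re

WH_WORDS = ("what", "where", "why", "how", "who", "when")
AUXILIARIES = ("is", "are", "am", "was", "were", "do", "does", "did")
PRONOUNS = ("i", "you", "he", "she", "it", "we", "they")


def classify_wh_question(utterance, subjects=PRONOUNS):
    """Classify a wh-question as 'inverted', 'uninverted' or 'other'."""
    tokens = re.findall(r"[a-z']+", utterance.lower())
    if len(tokens) < 3 or tokens[0] not in WH_WORDS:
        return "other"
    second, third = tokens[1], tokens[2]
    if second in AUXILIARIES and third in subjects:
        return "inverted"      # e.g. "What is John doing?"
    if second in subjects and third in AUXILIARIES:
        return "uninverted"    # e.g. "What John is doing?"
    return "other"             # needs hand coding


if __name__ == "__main__":
    names = PRONOUNS + ("john",)  # add names that occur in your transcripts
    for question in ["What is John doing?", "What John is doing?", "Who is leaving?"]:
        print(question, "->", classify_wh_question(question, subjects=names))
```

Utterances classified as "other" are not errors by the child; they are simply cases the sketch cannot decide, and you should inspect them yourself.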

3. Based on what you have discovered in answering questions (1-2), how good would you say that 2-3 year olds' knowledge of the rules concerning English auxiliaries is?

Note:

And remember, help is available if you send me email or call me (831-6809), or ask a classmate.