User Guide - Overview

2. Overview of ILDA

What type of data is included in ILDA? ILDA houses primary source material, which means that it houses copies of original historical documentation (both written and audio). The original copies of primary source materials are often housed at universities in collections, and may also be held by private institutions, tribes, and individuals. ILDA provides a digital space to increase accessibility to these materials, and to enable analysis and comparison across collections.

2.1. How Data is Organized in ILDA

Within ILDA, data is organized into documents. Each document represents a discrete collection (e.g. a notebook or set of notebooks from a speaker, a linguist’s field notes, a tape series). Within documents, data is organized into phrases. Each phrase entry represents a token in the target language, along with the analysis that ILDA users would like to accompany that word or phrase.

2.1.1. Documents

On ILDA, a Document is a user-defined collection of materials. It is up to you as the administrator of the archive to decide the scope of each document you create. For example, if working with the Harrington Collection, you may choose to combine all the reels from the National Anthropological Archives that feature your language into a single document on ILDA. Alternatively, you might choose to make each reel a document within ILDA. When defining documents, take time to build criteria that your team can use to determine how material should be divided into documents on ILDA. Ideally, your documents will be broad collections, which will keep the search function from becoming unwieldy. Documents can be uploaded, edited, deleted, and even combined, so your team will have the opportunity to refine your own process in deciding how to divide materials into documents.

Within a document, each phrase is identified by three numbers: page, line, and phrase. Together, these numbers are the coordinates that locate each token within a document in ILDA.
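One way to picture these coordinates is as a (page, line, phrase) triple that sorts in reading order. The sketch below is illustrative only: the class name and field names are hypothetical and do not describe how ILDA stores data internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Coordinate:
    """Hypothetical model of a phrase's position within one document."""
    page: int
    line: int
    phrase: int


# Three phrases entered out of order.
coords = [Coordinate(2, 1, 1), Coordinate(1, 3, 2), Coordinate(1, 3, 1)]

# Sorting compares page first, then line, then phrase number,
# which restores the order in which the phrases appear in the source.
in_reading_order = sorted(coords)
for c in in_reading_order:
    print(f"page {c.page}, line {c.line}, phrase {c.phrase}")
```

Treating the three numbers as one ordered unit is what makes it possible to step through a document from beginning to end.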

When working with data from existing archival collections, it is best practice to contact archivists at those institutions and request permission to re-publish copies of those materials. Potential obstacles include publishing restrictions and requests from tribal governments to redact or restrict sensitive information. This is another factor to consider when identifying documents for inclusion in ILDA. Because ILDA is a web-based tool, ensuring that materials hosted on ILDA are appropriate to share with a broad audience is essential. At this time ILDA does not have the capability to restrict access to materials hosted on the platform.

2.1.2. Phrase

The most basic unit within ILDA is an individual phrase. Each phrase corresponds to a token found in a primary source document. Within each phrase entry are several fields which allow ILDA users to add details including transcriptions, translations, glosses, metadata (e.g. speaker, dialect), and morpheme analysis. Phrases are tagged according to the page, line and phrase in which they appear in the primary source document (more detail in the following sections).

2.1.3. Layout within Phrase Entries

Phrases feature three ribbons near the top of the page (these are marked in the image below by red numbers 1, 2, and 3). The first ribbon shows the coordinates (document, page, line, phrase number), offers administrative functions, and provides access to the digital surrogate. The second offers a summary of the transcription and gloss for a given token. The third provides a space for additional metadata (more on this in Section 8 Administrative Functions, subheading i. Defining Fields). Below those three ribbons are the main components of a phrase entry: five tabs which can be clicked on for an expanded view. Within the expanded view, you can see notes and additional analysis.

Fig. 2.1.3 - Basic Layout

Original Target Language can display a typed representation of the original text from the source document. For written materials, this field can display a digital reproduction of how the word or phrase was written by a given linguist (i.e. in the same orthography they used). For audio entries, this field is meant for your transcription of the token. When you transcribe an audio file, you will decide what orthography to use when filling out this field.

Original Gloss displays how the language item was analyzed, generally by the linguist who elicited the data. For example, if a linguist asked “how do you say snake?,” this field would feature the word “snake.” This is a common place for confusion in the data. For example, a linguist may ask “how do you say sleep?” and a speaker may hear “how do you say sweep?” To correct for mistakes like this one, to replace unclear, antiquated, or offensive language, or to handle a token that was originally elicited in a language other than English, you may choose to update the English Translation field (found within Original Target Language). English Translation holds your interpretation of the meaning of a given phrase after your own analysis, cross-referencing, work with language experts, and consideration of the Original Gloss.

Modern Speech Form displays the word/phrase as it would be written using the modern orthography for your target language. If your community has not developed a standard orthography for your language, you will not immediately fill this field out. If your community has more than one orthography used to represent the language, you will have to settle on which orthography will be represented on ILDA. There is room in other fields to include alternative spellings/orthographies (see the Modern Spelling field). When filling out Modern Speech Form, you are taking a step towards developing a standard, or type-level, representation of a word or phrase.

This field reflects an extra editing step by the project team. For audio clips, this field is filled out after a clip has been compared to other phrases and after you have settled on a spelling that you believe accurately represents how the word is pronounced. For written materials, this field is filled out after you transliterate the phrase from a linguist’s original orthography into the modern orthography.

It is important to note here that the linguists and ethnographers who recorded these archival materials, like all of us, did not do perfect work. It is certain their work contains transcription errors. When transliterating their work, you will have to make decisions about how to represent the data they collected using your community’s orthography. To do this, you will need to compare different tokens of the same word, develop a sense of what sounds and sound combinations a linguist may have struggled to perceive, and cross-reference how different linguists have transcribed the same words. You may also benefit from using a concordance orthography, as well as comparison with audio materials. We suggest implementing a policy to fill this field out only when you have a high degree of confidence about how a word is pronounced. That said, digital archiving is an ongoing process, and you will inevitably make changes as spelling conventions change and as your analysis grows.

Original Gloss and Original Target Language are fields which aim to capture as accurately as possible the original source material, while English Translation and Modern Speech Form seek to bring together analysis to capture as accurately as possible the pronunciation and meaning of a given phrase. This means that Original Gloss and Original Target Language function mainly to preserve the original archival materials, while English Translation and Modern Speech Form function to provide high quality data for use in language revitalization activities following your analysis.
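The division of labor among these four fields can be sketched as a simple record, using the "sleep"/"sweep" mix-up from the Original Gloss description. This is a hypothetical illustration: the attribute names mirror the field labels in this guide, but the class itself and the placeholder string are assumptions, not ILDA's internal data format.

```python
from dataclasses import dataclass


@dataclass
class PhraseEntry:
    """Hypothetical record for the four main text fields of a phrase entry."""
    # Preservation fields: copied from the source as faithfully as possible.
    original_target_language: str
    original_gloss: str
    # Analysis fields: filled in by the project team after review.
    english_translation: str = ""
    modern_speech_form: str = ""


# The placeholder string stands in for real language data.
entry = PhraseEntry(
    original_target_language="<form as written by the linguist>",
    original_gloss="sleep",
)

# After cross-referencing, the team decides the speaker heard "sweep".
# The original gloss stays untouched; only the analysis field changes.
entry.english_translation = "sweep"

print(entry.original_gloss)       # sleep
print(entry.english_translation)  # sweep
```

Modern Speech Form is left empty here, in line with the suggestion to fill it out only once the team is confident about pronunciation.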

In addition to the three fields above, there are two more: the Cognates field, which is used to keep track of similar terms in related languages, and the Semantics, Syntax, Speaker and Dialect field, which records what type of token the phrase entry is, the speaker’s name, and what dialect group(s) the speaker belongs to.

Clicking on any of the five bolded terms above opens a drop-down menu with more detailed information. These five fields are the main places where information is kept in ILDA. In Section 2.2, we use snapshots to show you how to navigate each of the five fields.

2.1.4. Digital Surrogates

Included in the top ribbon of each phrase entry is a plug-in that displays a digital surrogate of the original source material. This simply means that an image or sound clip which shows the token in its original context is viewable from within ILDA. This allows researchers to easily see or hear each token as it exists within the source document.

Accessing Digital Surrogates

For written documents, you can look at a digital scan of the original document simply by clicking the eye icon, which is located on the right side of the first ribbon in the phrase entry.

This will open a window that you can drag and that allows you to zoom in and out of the digital surrogate. Please note that once this window is open, there may be two scroll bars on the right side of your web browser, which can complicate navigating the web page. If you are using macOS and are having trouble with the image viewer, refer to the Troubleshooting section for a solution.

For audio materials, there is no eye icon. Instead, there is an audio plug-in that plays a recording of the speaker saying the phrase (refer to Layout of Entries section of this guide for more detail).

Navigating using Page, Line and Phrase Number

Each phrase entry has its own coordinate in the archive, which is marked by 1) the name of the document that the phrase entry is found within and 2) the page, line and phrase number. This establishes the order in which phrases appear within documents, from beginning to end, and mirrors the original layout of the primary source documents. In addition to navigating by using the search function, you can also “flip through” documents in the order in which they are laid out. To do this from a phrase entry, simply click prev or next to move to the previous or next phrase. The coordinate for your current phrase entry is in the top ribbon of the phrase entry.

We use slightly different conventions for written versus audio materials. For written materials, the page, line and phrase number are meant to mirror the way that tokens appear on the original source document page. They are a re-creation of how you would number the pages if you were holding physical copies of the original document. For audio sources, the original source material is not laid out on a page. Therefore, we have a slightly different convention for numbering audio “documents” in ILDA. For audio, we fill “1” for all page and line numbers, and instead rely only on the phrase number to navigate. This means that within an audio file, phrases are numbered sequentially from the beginning to the end of the document. So, the first phrase in the audio file would be page: 1, line: 1, phrase: 1, while the fifteenth would be page: 1, line: 1, phrase: 15. By using this convention in your own ILDA database, you will be able to easily navigate audio files.

You might ask: why do audio “documents” still have page and line numbers? This is because ILDA software does not distinguish between written and audio materials; the program requires a page and line number to order entries relative to each other in the database.
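The audio numbering convention can be sketched in a few lines. The function name below is hypothetical and the code is only an illustration of the convention described above, not part of ILDA.

```python
def audio_coordinates(phrase_count):
    """Return (page, line, phrase) triples for an audio document.

    Page and line are always 1; only the phrase number increases.
    """
    return [(1, 1, n) for n in range(1, phrase_count + 1)]


coords = audio_coordinates(15)
print(coords[0])   # (1, 1, 1)
print(coords[14])  # (1, 1, 15)
```

Because page and line never change, sorting these triples orders audio phrases by phrase number alone, which matches their sequence in the recording.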

You can also use Go to… to hop to any coordinate in the archive. Simply push the button, type in the page, line and phrase, and select the document, then click GO.

2.2. Description of Fields

Now that we have covered the basic organization and layout of ILDA, we will cover more in-depth descriptions of each field within a phrase. As described in Section 2.1, fields are divided into three ribbons and five tabs within ILDA.

Fig. 2.2.1. ILDA Ribbon

In the first ribbon, you will see the document name, the page, line and phrase coordinate, a tag identifying whether it is audio or written, and six icons. Written documents will feature page, line, and phrase numbers; audio documents will display only phrase numbers.

Eye icon: This icon opens an image with a digital surrogate of the original primary source material.

Expand icon: This icon opens the expanded view for all five tabs described below (Original Target Language, etc.).

Save icon (visible when logged in): This icon saves any changes that are typed into fields within the phrase.

Manage icon (visible when logged in): Clicking the Manage icon opens a dropdown menu with several options.

Fig. 2.2.2 - ILDA Entry Manage

Split: This menu option allows you to split a phrase in two; it will create a blank entry with a sequential coordinate following the existing phrase.

Media: This option allows you to add or change media files associated with phrase entries.

Delete: This option allows you to delete a phrase entry.

Open Document: This option opens the Digital Surrogate image in a separate window.

Left and right arrows: These two buttons take you to the previous or next phrase according to the page, line and phrase number within each document. If you press one of these buttons on the first or last page of a document, ILDA will automatically take you into the preceding or following document.
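The arrow behavior, including the jump between documents, can be modeled roughly as follows. This is a minimal sketch under stated assumptions: the function name, the list-of-lists representation, the behavior at the very start and end of the archive, and the choice to land on the nearest phrase of the neighboring document are all hypothetical and are not taken from ILDA itself. It also assumes no document is empty.

```python
def step(archive, doc_index, phrase_index, direction):
    """Move one phrase forward (+1) or backward (-1).

    `archive` is a list of documents, each a non-empty list of phrase
    coordinates already sorted in reading order. Returns the new
    (doc_index, phrase_index), or the unchanged position at either
    end of the archive.
    """
    new_phrase = phrase_index + direction
    if 0 <= new_phrase < len(archive[doc_index]):
        return doc_index, new_phrase
    new_doc = doc_index + direction
    if 0 <= new_doc < len(archive):
        # Enter the neighboring document at its nearest end.
        landing = 0 if direction > 0 else len(archive[new_doc]) - 1
        return new_doc, landing
    return doc_index, phrase_index


archive = [
    [(1, 1, 1), (1, 1, 2)],             # first document, two phrases
    [(1, 1, 1), (1, 2, 1), (2, 1, 1)],  # second document, three phrases
]
print(step(archive, 0, 1, +1))  # (1, 0)
print(step(archive, 1, 0, -1))  # (0, 1)
```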

The second ribbon summarizes the Original Target Language, Original Gloss, and Modern Speech Form.

The third ribbon features two user-defined fields (this means that administrators can change the names of these fields). In the example image above, these fields have been defined as “Reel/Notebook” and “Text Category”. These fields are meant to provide a space to tag important information that does not fit into Semantics, Syntax, or the two ribbons above. For example, the Nuu-wee-ya' language team has used one field to link phrases back to specific archival notebooks and reels, with the intent of helping to re-orient community language workers who have relied on notebook and reel numbers to cite their work. They used the other to denote when a phrase is part of a traditional story or personal narrative, as entries from texts/stories are likely to have context-dependent translations and may feature unique grammar.

The expanded view of each “tab” is shown below, with a description of all fields contained within each tab.

Original Target Language

Fig. 2.2.3 - Original Language

Original Target Language: this field houses a typed representation of the original text from the source document.

For written materials, this will often be in a specialized orthography such as the International Phonetic Alphabet (IPA) or Americanist Phonetic Notation.

Modern Spelling: This field allows users to transliterate into their modern orthography without filling out Modern Speech Form. Modern Spelling can be used to assist researchers in transliterating phrases that they have a low degree of confidence about or as a place to represent historical grammatical constructions in the language’s modern orthography.

English Translation: After analyzing and refining the Original Gloss, users will fill out an updated translation within the English Translation field.

Comment: Any comments by the researcher that deal with the Original Target Language data may be noted here. This field can also be used to represent notes from the original source document written in English.

Original Gloss

Fig. 2.2.4 - Original Gloss

Original Gloss: For all entries that feature a gloss, record it here.

English Translation of the Gloss: This field is only used when the Original Gloss is in a language other than English.

Comments: Any comments that deal with the Original Gloss may be noted here.

Modern Speech Form

Fig. 2.2.5 - Modern Speech Form

Modern Speech Form: This field will hold a transliteration of the Original Target Language into whatever modern orthography your research team decides to use for the language.

Comments: Any comments about the transliteration or changes between Original Target Language and Modern Speech Form may be noted here.

Add Stems: Clicking this button will open a box that allows you to add a stem and its associated morphemes for a given phrase entry. A stem includes derivational morphemes which change the part of speech of a given root, for example to make a verb out of a noun. However, it does not capture inflectional morphology, such as that associated with tense or aspect in verbs.

Fig. 2.2.6 - Add Stems

Name: In this field, write the head word in your target language.

Translation: In this field, translate the head word into English.

Part of Speech: In this field, attach a part of speech. Note that the parts of speech available here must first be defined on the Parts of Speech page, which can be found in the Admin menu.

Cognates

Fig. 2.2.7 - Cognates

Comments: Any comments that have to do with the cognates for this phrase may be noted here.

Add Cognate: Clicking this button will open a box with four fields.

Fig. 2.2.8 - Add Cognate

Form: This field is for the term/lexeme in the cognate language.

Source: This field tracks the source material from which the cognate is derived.

Language: In this field, write out the language of the cognate, e.g. Oneida.

Translation: This field provides a translation for the cognate term.

Semantics, Syntax, Speaker and Dialect

Fig. 2.2.9 - Semantics, Syntax, Speaker, and Dialect

Semantics: Any semantic categories that you want to attach to a phrase may be listed here. Semantic tags are useful when searching by topic (e.g. Animals, Body Parts). Be thoughtful when deciding on semantic tags; avoid making duplicate or overlapping categories, which may confuse search results.

Syntax: If you want to tag a phrase with a part of speech, e.g. “verb” or “exclamation,” you can do so in this field. As with the Semantics field, be thoughtful when deciding on the tags for this field.

Speaker: This field tracks the language speaker who provided the phrase.

Dialect: List here any relevant dialects you want to associate with the speaker and phrase.

Section 3 - Preparing Written Material for ILDA