
Multimedia Corpora of Mexican Sign Language (MSL) with Syntactic Functions.

Name: Corpora of Mexican Sign Language (MSL)
Link: http://cienciadedatosupiita.com/content/bd_creation.sql
Title: Multimedia Corpora of Mexican Sign Language (MSL) with Syntactic Functions.
Presented by: Pichardo-Lagunas, O.; Martinez-Seis, B.
Language: Mexican Sign Language
Language code: sgn-MX
Category: resource
Status: available
Type: corpora
Year: 2018

Sign languages have no oral or written form, which makes them difficult to document. In a sign language, a sign consists not only of hand movements but also of facial expressions, body movement, intensity, and other elements. Mexican Sign Language (MSL) is distinct from and independent of the Spanish spoken in Mexico; it has its own vocabulary and grammar.

One way to document sign languages is with drawings, but drawings make it difficult to show and interpret the movement of a sign, its intensity, and other non-manual elements. For this reason, we built corpora that include each word in Spanish, its definition in Spanish, videos, and images. Besides the words and their signs, the corpora include collocations, synonyms, and the syntactic function of each word according to the Royal Spanish Academy (RAE, from its Spanish name, Real Academia Española), according to Freeling, and according to a manual verification.

The corpus was built with the support of the community of the Deaf Culture House of the Cuauhtemoc Delegation in Mexico City. This institution is responsible for providing basic education to adults with different types of disabilities, with special attention to the deaf community in the capital city.

The corpora are presented as a multimedia database in SQL format, with ten tables and references to videos and images. The general table holds the word, lemma, definition in Spanish, syntactic function according to Freeling, and references to videos and images. Linked to it are the tables of synonyms in Spanish, synonyms in MSL, and collocations in Spanish, along with a link to the tables that store the study of the syntactic function of the words.
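As a rough illustration of how a general table linked to a synonym table could be queried, the sketch below builds a tiny in-memory SQLite database. All table names, column names, file paths, and row values here are hypothetical placeholders; they are not taken from the published bd_creation.sql file, whose actual schema may differ.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# NOT the names used in the published bd_creation.sql file.
SCHEMA = """
CREATE TABLE word (
    id INTEGER PRIMARY KEY,
    word TEXT NOT NULL,
    lemma TEXT NOT NULL,
    definition TEXT,
    freeling_function TEXT,
    video_ref TEXT,
    image_ref TEXT
);
CREATE TABLE synonym_es (
    word_id INTEGER NOT NULL REFERENCES word(id),
    synonym TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database holding one placeholder entry."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Placeholder row: the definition, paths, and synonym are invented.
    conn.execute(
        "INSERT INTO word VALUES (?, ?, ?, ?, ?, ?, ?)",
        (1, "frío", "frío", "placeholder definition", "adjective",
         "videos/placeholder.png", "images/placeholder.jpg"),
    )
    conn.execute(
        "INSERT INTO synonym_es VALUES (?, ?)", (1, "placeholder_synonym")
    )
    return conn


def lookup(conn, word):
    """Return the general-table fields for a word, one row per Spanish synonym."""
    cur = conn.execute(
        "SELECT w.word, w.lemma, w.freeling_function, w.video_ref, s.synonym "
        "FROM word AS w LEFT JOIN synonym_es AS s ON s.word_id = w.id "
        "WHERE w.word = ?",
        (word,),
    )
    return cur.fetchall()
```

Calling `lookup(build_demo_db(), "frío")` returns the single placeholder row joined with its placeholder synonym; a word that is absent from the table yields an empty list.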

There are 1,505 Spanish words related to 1,019 videos of signs, which were saved as sprites. All the signs in the videos are performed by the same person. Each word has a definition. In the RAE, each word has one or more meanings depending on the context in which it is used; the definition in the corpora includes only the meanings related to the word and its representation in MSL. Each word is also linked to two images that represent its meaning. The images were retrieved from Google and Bing. The lemma of each word is also included in the corpora.

The synonyms in Spanish comprise 890 validated words; that is, specialists verified that a word, its synonyms, and its sign represent the same thing. The synonyms in MSL are words that are not synonyms in Spanish but are represented by the same sign in MSL. For example, the words motor (engine) and fábrica (factory) refer to different things but are expressed by the same sign.
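The idea of synonyms in MSL can be sketched as grouping Spanish words by the sign they share. In the minimal example below, the sign identifiers and the function name are hypothetical, invented only for illustration; only the motor/fábrica pairing comes from the description above.

```python
from collections import defaultdict

# Hypothetical (word, sign_id) pairs. The sign identifiers are invented;
# only the fact that "motor" and "fábrica" share a sign comes from the text.
WORD_TO_SIGN = [
    ("motor", "sign_001"),
    ("fábrica", "sign_001"),
    ("frío", "sign_002"),
]


def msl_synonym_groups(pairs):
    """Group words by sign and keep only signs shared by two or more words."""
    by_sign = defaultdict(list)
    for word, sign in pairs:
        by_sign[sign].append(word)
    return {sign: words for sign, words in by_sign.items() if len(words) > 1}
```

With the sample pairs, `msl_synonym_groups(WORD_TO_SIGN)` keeps only the group for the shared sign, containing motor and fábrica.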

The collocations included in the corpora are Spanish expressions of two or three words that are represented by a single sign in MSL. We identified 37 collocations among the expressions and words that we studied. For example, the object máquina de coser (sewing machine) is represented by only one sign. Conversely, there are Spanish words that are represented by more than one sign. For example, the word niñas (girls) is expressed by three consecutive signs. The corpora do not include cases of this kind.
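One way an application might use such collocations is to merge multi-word expressions into single units before looking up signs. The sketch below uses a greedy longest-match over a token list; this matching strategy, the function name, and the sample sentence are assumptions made for illustration and are not described in the corpus documentation. Only máquina de coser comes from the text.

```python
# Hypothetical collocation set; only "máquina de coser" comes from the text.
COLLOCATIONS = {("máquina", "de", "coser")}


def merge_collocations(tokens, collocations=COLLOCATIONS, max_len=3):
    """Greedily merge two- or three-word collocations into single units."""
    units = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first (three words, then two).
        for n in range(max_len, 1, -1):
            candidate = tuple(tokens[i:i + n])
            if len(candidate) == n and candidate in collocations:
                units.append(" ".join(candidate))
                i += n
                break
        else:
            # No collocation starts here; keep the single token.
            units.append(tokens[i])
            i += 1
    return units
```

For instance, the token list `["la", "máquina", "de", "coser", "nueva"]` would be reduced to three units, with máquina de coser treated as one item that maps to one sign.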

The syntactic functions of the words in the corpora are the result of a manual evaluation, verified by specialists, of the syntactic functions detected by Freeling and given by the RAE for each word. For example, the word frío (cold) and its corresponding sign can be used as a noun, an adjective, or a verb (frío is also a conjugated form of the Spanish verb freír, to fry).

The syntactic functions in the corpora include: 775 nouns, 55 pronouns (divided into 25 personal, 1 demonstrative, 2 possessive, 6 indefinite, 1 interrogative, and 2 comparative pronouns), 251 adjectives (divided into 240 qualifying, 1 demonstrative, 2 possessive, 1 interrogative, 3 indefinite, and 2 comparative adjectives), 171 verbs, 73 adverbs (21 quantity, 19 time, 18 place, and 40 other adverbs), 11 prepositions, 1 article, 40 determiners, 4 conjunctions, 9 dates, 43 numerals, and 17 interjections.

This corpus is not only a dictionary between Spanish and Mexican Sign Language; it is also a verified compilation of the syntactic functions of the words, which could be used in natural language processing applications, in machine translation systems, or in learning the sign language. The corpus is available at http://cienciadedatosupiita.com/content/bd_creation.sql. If you need all the images and videos, contact us. If you use the corpus, please cite this document.