Revistes Catalanes amb Accés Obert (RACO)

The Bourciez Project: Geolinguistic Processing of a Dialect Corpus from 1895

Gotzon Aurrekoetxea, Charles Videgain


This contribution shows how texts gathered in the 19th century by E. Bourciez can be exploited geolinguistically with a series of automatic tools. The manuscript was first transcribed in a word processor, and the “Simple Concordance Program” was used to generate a word list for the 150 Basque texts in the collection. All of the words were then entered into a database and lemmatised with a purpose-built program. Lemmatisation is required before the data can be used geolinguistically, since it groups the attested forms under common headwords so that localities can be compared. Finally, the data were analysed dialectometrically with the VDM program conceived by Hans Goebl.
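The steps described above (word list, lemmatisation, dialectometric comparison) can be illustrated with a minimal Python sketch. It is not the authors' ad hoc program, nor the Simple Concordance Program or VDM: the locality names, sample texts, lemma table and all function names are invented for illustration, and the similarity used is a simple identity percentage (the share of shared lemmas for which two localities attest the same forms), assumed here as a stand-in for the measures a dialectometric tool would compute.

```python
import re
from collections import Counter
from itertools import combinations

# Toy data: locality names, texts and the lemma table are all invented.
TEXTS = {
    "LocA": "etxea handia da",
    "LocB": "etxia handia da",
    "LocC": "etxia haundia da",
}
LEMMAS = {
    "etxea": "ETXE", "etxia": "ETXE",
    "handia": "HANDI", "haundia": "HANDI",
    "da": "IZAN",
}

def word_list(text):
    """Frequency list of lowercased word forms (letters only)."""
    return Counter(re.findall(r"[^\W\d_]+", text.lower()))

def variants_by_lemma(text, lemmas):
    """Map each lemma to the set of forms attested for it in one text.
    Forms missing from the lemma table are skipped."""
    out = {}
    for form in word_list(text):
        lemma = lemmas.get(form)
        if lemma is not None:
            out.setdefault(lemma, set()).add(form)
    return out

def identity_value(a, b):
    """Percentage of lemmas attested in both localities whose forms
    are identical; None when the localities share no lemma."""
    shared = a.keys() & b.keys()
    if not shared:
        return None
    same = sum(1 for lemma in shared if a[lemma] == b[lemma])
    return 100.0 * same / len(shared)

def similarity_matrix(texts, lemmas):
    """Identity value for every unordered pair of localities."""
    table = {loc: variants_by_lemma(t, lemmas) for loc, t in texts.items()}
    return {
        (j, k): identity_value(table[j], table[k])
        for j, k in combinations(sorted(table), 2)
    }

if __name__ == "__main__":
    for pair, value in similarity_matrix(TEXTS, LEMMAS).items():
        print(pair, None if value is None else round(value, 1))
```

Only lemmas attested in both localities enter the comparison, so a locality with a short text is not penalised for missing data. Without the lemma table, "etxea" and "etxia" would count as unrelated words rather than as two variants of one item, which is why lemmatisation has to precede the comparison.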
