A database of Welsh tweets is being used to identify the characteristics of an evolving language.


Twitter keeps millions of people in touch, whether they are sharing their politics with followers or updating their mates on the trivia of everyday life. These tweets are in Welsh: ‘loaaaads o gwaith i neud a di’r laptop ’cau gwithio!’, ‘dio cau dod on!! Mar bwtwm di tori.’ Roughly translated, they read: ‘loads of work to do and the laptop won’t work’ and ‘it won’t come on!! The button’s broke.’

How do you capture changes as they take place in the language we use in everyday life – from buzz words such as ‘sweet’ to tags such as ‘innit’? One answer is to look at tweets. Because they don’t follow the conventions of written language, tweets provide an authentic snapshot of the spoken language. By analysing the content of the 140-character messages, linguists can get to grips with the dynamics of the language played out in real time.

Welsh is spoken by 562,000 people in Wales; 8% of the country’s children learn it at home as their first language and 22% are educated in Welsh.

Like all living languages, Welsh is constantly changing and new varieties are emerging. When Dr David Willis from Cambridge’s Department of Theoretical and Applied Linguistics set out to research the shifts taking place in Welsh, he used a database of Welsh tweets to identify aspects of the language that were changing, and then drew on that information to devise the questionnaires for oral interviews.

He explained: “When your intention is to capture everyday usage, one of the greatest challenges is to develop questions that don’t lead the respondent towards a particular answer but give you answers that provide the material you need.”

“If I want to find out whether a particular construction is emerging, and where the people who use it come from, I would normally have to conduct a time-consuming pilot study, but with Twitter I can get a rough and ready answer in 30 minutes as people tweet much as they speak,” he said. “My focus is on the syntax of language – the structure or grammar of sentences – and my long-term aim is to produce a syntactic atlas of Welsh dialects that will add to our understanding of current usage of the language and the multi-stranded influences on it. To do this relies on gathering spoken material from different sectors of the Welsh-speaking population to make comparisons across time and space.”

In the late 17th century, the antiquarian Edward Lhuyd conducted an investigation into the dialects of Wales. By the 19th century, Welsh was attracting the attention of European historical linguists such as Johann Kaspar Zeuss. Later, scholars all over Europe, realising that local dialects were receding in the face of industrialisation, sought to record variations in language. Large dialect atlases were undertaken in Germany and France, and speech archives were begun, such as the one that laid the foundations for the National History Museum at St Fagans near Cardiff.

In the 1960s attention moved away from rural areas to the cities, where most people by then lived, and researchers started to look at sentence structure, an area of language that presents particular challenges for investigators. Willis’s interest in syntax stemmed from his study of a wide range of minority languages, including Breton, which is, like Welsh, a Celtic language. To build the fullest possible picture of syntactic changes in Welsh as it’s spoken today, he decided to take an inclusive approach and set out to investigate the day-to-day speech patterns of a broad range of speakers, aged 18–80.

British Academy funding for a year-long study has enabled Willis and assistant researchers to interview around 160 people across Wales. Willis is beginning his analysis with North Wales, where the language is thriving and a significant number of children use Welsh as their home language. The study included both those who had acquired Welsh at home and those who had acquired it at school.

The spoken questionnaire asked interviewees to repeat in their own words sentences that were presented to them in deliberately ‘odd’ Welsh that mixed different dialects, inviting the interviewee to rephrase the awkwardly phrased sentence to sound more ‘natural’. An example in English might be ‘we’ve not to be there yet, don’t we?’ which a British speaker might be expected to rephrase as ‘we haven’t got to be there yet, have we?’

The data from these interviews are a treasure trove of information, shedding light on how and why the structure of language shifts over time. They also give the researcher a valuable database not just for the present study but for future research.

Changes identified so far involve the use of pronouns and multiple negatives. An analysis of usage of the Welsh words for ‘anyone’, ‘someone’ and ‘no-one’ reveals that there are differences between those who learnt Welsh in the home (who are more likely to say the equivalent of ‘did someone come to the meeting?’ and ‘I didn’t see no-one’) and those who learnt it at school (who are more likely to say ‘did anyone come to the meeting?’ and ‘I didn’t see anyone’).

One example of multiple negatives reveals a shift in meaning of the Welsh word for refuse, ‘cau’. “We knew that people in the north used the word ‘cau’ to mean ‘won’t’, saying the equivalent of ‘the door refuses to open’ for ‘the door won’t open’. Negative concord – such as saying ‘I haven’t not seen no-one’ for ‘I haven’t seen anyone’ – is a strong feature of Welsh. We’ve now identified two groups in the north: one that still says ‘the door refuses to open’ and the other that have begun to say ‘the door doesn’t refuse to open’. The next step is to work out when and how this change occurred.”

In tracking shifts in the language, GIS mapping is used to plot where interviewees were brought up and enables researchers to look at the geographical spread of particular aspects of syntax, making comparisons between age groups, gender and mode of acquisition.

The research has revealed that, while Welsh does not vary much by social class, there are interesting differences between the variety of Welsh spoken by those who learn it as their first language in the home and that spoken by those who are first exposed to it in nursery or primary school.

“Those who acquire Welsh once they reach school are more likely to use English sentence constructions, which are perfectly good Welsh but differ significantly from the constructions used by those who acquired Welsh at home. For example, they tend to prefer standard focus particles – words that correspond to a strong stress in English sentences like ‘I know YOU’ll be on time’ – over the ones from their local dialect,” said Willis.

With around 22% of children in Wales educated in Welsh at school, and all other children learning it as a second language, data on this aspect of language acquisition may prove valuable in developing Welsh teaching policy – for example, in determining which forms to teach second-language learners or in promoting both dialect and standard written Welsh in schools.

