Linguistics, technology and protecting your data
By A. Asohan January 7, 2015
- Linguistics has many similarities with tech, need for logical and analytical minds
- Tech companies always on the lookout for such talent, say two experts
IF you were asked to picture what a linguist does for a living, the first thing that would probably come to mind is being a translator or interpreter – those people with headsets speaking quietly at UN meetings, for example.
“That’s the most common misperception of what linguists do,” says Ekaterina Pshehotskaya, technology development director at Data Loss (or Leakage) Prevention (DLP) specialist InfoWatch, which was spun off from Russian security company Kaspersky Lab in 2012.
Linguistics is the scientific study of language, and while acknowledging that translating is a viable career choice, she argues that the field is wider and more interesting than that.
“Linguistics isn’t just about grammar rules or translations. Language enters into almost every area of human activity, and that is why the application of linguistic analysis can be extremely broad – linguists can teach, they can write language systems, and more,” she says.
She points to famous linguistics experts who have applied their discipline elsewhere and become iconic in other fields: philosopher, cognitive scientist and political activist Noam Chomsky; Alexander Graham Bell, who is credited with inventing the telephone; and of course, philologist J.R.R. Tolkien, whose fascination with languages led him to create his own and write The Lord of the Rings.
She also points to Edward Sapir, the anthropologist-linguist who developed the discipline in its early days, and who in his Language: An Introduction to the Study of Speech, wrote, “Language is the most massive and inclusive art we know, a mountainous and anonymous work of unconscious generations.”
“Language is everywhere,” adds Pshehotskaya.
The formal sub-fields of linguistics include phonetics, phonology, morphology, syntax, semantics, pragmatics, discourse analysis, stylistics, semiotics and so on, while areas of research include comparative linguistics, sociolinguistics, neurolinguistics and others.
Pshehotskaya and her colleague Tamara Sokolova, lead linguist at InfoWatch, were in Malaysia late last year for the International Conference on Cyber-Crime Investigation and Cyber Security, and spoke at a special workshop at the Asia Pacific University in Kuala Lumpur.
“Our work at InfoWatch is in applied linguistics,” says Pshehotskaya.
In today’s world, where technology is part of our daily lives, the field has been thrown wide open, especially with increasing interest in the inter-related fields of natural language processing (NLP), machine learning and languages, and computational linguistics.
The intersection between linguistics and technology is perhaps no surprise, given that both rely on rules-based modelling. Pshehotskaya and Sokolova are part of a five-member team at InfoWatch that works with the developers there.
Interest in languages
Both Pshehotskaya and Sokolova are professional linguists who specialise in NLP, and have written articles on theoretical and applied linguistics, computational intelligence and data protection. They are Moscow State University graduates, the former with a PhD and the latter with a Master’s.
“I was always interested in the structure of language, but it was only during my university days that I got interested in NLP and decided to go into technology, ending up in DLP,” says Sokolova.
So what skills must a linguist possess? Perhaps it’s no surprise that many of them coincide with the skills you would expect from a technologist.
“Linguists need to combine logic, language and communication skills,” says Sokolova.
“You need numeracy, because you always have to be precise in linguistics; you must have lots of logical thinking and analytical skills for data analysis.
“You will do lots of interesting things, but there are also lots of routine things that you must be able to do. For example, if you have to learn a new language, you would have to go to some exotic place where there are folks who speak a language that has not been described yet. You will have to speak to them and collect a lot of data.
“You will listen to them and learn about their traditions – and that’s great, that’s fantastic. But then you return home with all this data you’ve collected from these people, and you have to sit down and process all this data, even if it’s for your own sake.
“You have to analyse each word, and sometimes even the phonetics behind each word. It requires a lot of time and precision to do that. And a lot of routine, but then it becomes interesting again as you process and understand that data.
“Routine is inevitable, but it will lead to breakthroughs,” she adds.
As for communication and presentation skills, Sokolova says that even linguists who work alone will have to present their results to others, whether through their papers or at conferences, and make sure their findings are understood.
“If you work for some commercial organisation, it becomes even more essential because you will have to be a team player,” she says.
“For example, at InfoWatch, we work in the IT field. We have to communicate with developers because we will have to develop algorithms and they have to write the code, and then we piece it all together.
“I have to understand them, and I have to be able to explain to them what we need that particular algorithm to achieve.
“You need to be able to analyse and understand complex systems, and explain the concepts to other people,” she adds.
Would programming skills be needed if you’re a linguist working with a technology company such as InfoWatch?
“You don’t need to know a programming language, but it helps,” says Sokolova. “When you work in IT, you will need to pick up some additional skills.
“Both of us [she and Pshehotskaya] did learn a bit of programming at university, but of course we had to pick up more at InfoWatch,” she adds.
Linguistics in IT
InfoWatch products deal with protecting data and preventing leakage. According to the company’s Global Data Leakage Report, the number of leakage incidents is growing by about 20% each year, leading to billions of dollars in losses to companies and organisations.
Data can be leaked intentionally – your insider with a bone to pick, for instance – or even inadvertently because of how the data is classified and stored in an organisation.
Pshehotskaya and Sokolova work primarily with the team that develops the company’s flagship product Traffic Monitor, which protects against the leakage of data from not just an internal source (employees) but also external (partners, suppliers, etc.), and not just intentional but also accidental data loss from carelessness or mistakes.
The linguistics part comes in how data is classified, says Sokolova, and in developing language support for InfoWatch products.
“We participate in data classification or linguistic analysis – when we implement a system for a company, we develop particular topics of classified information for each unit within the organisation, or develop the perimeters for how data is stored,” she says.
“We also develop language support for the systems we implement – for example, if the company is in Russia, then we include support for the Russian language. It’s the same with a company in Saudi Arabia, where we will have to develop support for the Arabic language.
“We have support for a lot of languages at InfoWatch,” she adds.
The linguists at InfoWatch also analyse semantic fields and develop lexicons or lists of key words for how data is identified and classified, and also have touch-points with customers.
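To make the idea of a keyword lexicon concrete, here is a minimal sketch of how lists of key words could be used to tag a piece of text with a category. It is an illustration only and not InfoWatch's actual method: the category names, the key words, the `classify` function and the `min_hits` threshold are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical lexicons (illustrative only): category name -> set of key words.
LEXICONS = {
    "finance": {"invoice", "salary", "budget", "payment"},
    "legal": {"contract", "lawsuit", "nda", "clause"},
}


def classify(text, lexicons=LEXICONS, min_hits=2):
    """Return the category whose lexicon matches the most words, or None."""
    # Lowercase the text and split it into alphabetic words.
    words = re.findall(r"[a-z]+", text.lower())
    hits = Counter()
    for category, keywords in lexicons.items():
        # Count every word occurrence that appears in this category's lexicon.
        hits[category] = sum(1 for w in words if w in keywords)
    category, count = hits.most_common(1)[0]
    # Require a minimum number of matches before assigning a category.
    return category if count >= min_hits else None


print(classify("Please pay the invoice from the salary budget"))
print(classify("Lunch at noon?"))
```

A real system would also have to handle word forms, multiple languages and context, which is where the linguistic work described in this article comes in.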
“We consult with clients to collect their requirements, understanding their needs and converting them into clear business priorities,” says Sokolova.
“We create and tune new technologies for filtering confidential content – we work on new technologies using different linguistic algorithms, test them and then fine-tune them,” she adds.
Their linguistics background gives the two insight into cluster analysis, which helps identify patterns in data and in how it is being moved about in an organisation.
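As a rough illustration of cluster analysis on text, the sketch below groups short documents by word overlap (Jaccard similarity) in a single greedy pass. This is a simplified stand-in chosen for brevity, not the technique InfoWatch uses; the function names, the sample documents and the `threshold` value are assumptions made for this example.

```python
import re


def tokens(text):
    """Return the set of lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity: shared words divided by all distinct words."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(docs, threshold=0.3):
    """Greedy single-pass clustering.

    Each document joins the first cluster whose seed document is similar
    enough; otherwise it starts a new cluster. Returns lists of indices.
    """
    clusters = []  # list of (seed_tokens, [document indices])
    for i, doc in enumerate(docs):
        t = tokens(doc)
        for seed, members in clusters:
            if jaccard(seed, t) >= threshold:
                members.append(i)
                break
        else:
            # No existing cluster was similar enough: start a new one.
            clusters.append((t, [i]))
    return [members for _, members in clusters]


docs = [
    "quarterly budget report for finance",
    "finance budget report draft",
    "team lunch on friday",
    "friday team lunch menu",
]
print(cluster(docs))
```

With these sample documents, the two budget reports end up in one group and the two lunch messages in another.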
Linguistics can also play a part in forensics when investigating a cybercrime, says Pshehotskaya.
“It can be used to corroborate and examine communications between users, using linguistic markers to decipher encrypted data.
“There are opportunities for linguists in many technology companies around the world,” she says. “We at InfoWatch are always on the lookout for such talent.”