i keep wanting to find out what exactly a computational linguist does, whether in relation to AI or not. pls share if you feel like it
a computational linguist essentially works with natural language processing (NLP) models. one responsibility is figuring out how to abstract grammar, syntax, and pragmatics (which we usually just lump together as "syntactic units") into numerical representations and then implement them, so that a computer can process language. basically, you create mathematical formulas for language that can also be turned back into language a human understands. in early NLP work, you usually built dictionary databases of a language's words and then applied linguistic rules to them, like each word's class (e.g. verb, noun, pronoun) and the word order of a source language. down the line, when computing got cheaper, these syntactic units and rules got an additional feature: probability (e.g. based on language data statistics, which is the most likely meaning for this word-unit?). computational linguists then had to figure out how to translate all of that into code and formulas so that (1) the computer can "understand" human language and (2) it can output high-fidelity language (aka correct grammar) that a human can understand, instead of just 0s and 1s.
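to make the "dictionary + rules + probability" idea a bit more concrete, here's a minimal python sketch. everything in it is made up for illustration: the tiny tagged word list, the tag names, and the helper names `build_lexicon` and `most_likely_tag` are all my own assumptions, not how any real system does it. it just counts how often each word shows up with each word class and picks the most frequent one.

```python
from collections import Counter, defaultdict

# hypothetical toy "corpus" of (word, word class) pairs, invented for this example
TAGGED = [
    ("the", "DET"), ("dog", "NOUN"), ("can", "AUX"), ("run", "VERB"),
    ("a", "DET"), ("can", "NOUN"), ("of", "ADP"), ("soup", "NOUN"),
    ("she", "PRON"), ("can", "AUX"), ("sing", "VERB"),
]

def build_lexicon(tagged):
    """the 'dictionary database': word -> how often each word class was seen."""
    lexicon = defaultdict(Counter)
    for word, tag in tagged:
        lexicon[word][tag] += 1
    return lexicon

def most_likely_tag(lexicon, word):
    """the 'probability feature': return the most frequent class and its share."""
    counts = lexicon.get(word)
    if not counts:
        return None, 0.0  # word is not in the dictionary at all
    tag, n = counts.most_common(1)[0]
    return tag, n / sum(counts.values())

lexicon = build_lexicon(TAGGED)
print(most_likely_tag(lexicon, "can"))  # "can" was seen as AUX twice and NOUN once
```

real statistical taggers also look at the neighboring words instead of treating each word in isolation, so take this as the smallest possible version of the idea.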
an example of NLP architecture would be a parse tree. (left image) the linguistics version, which is understandable to a human. (right image) the computational linguistics version, which contains nodes that each simply reference a data point holding a word (similar to how symbols work: a node symbolically references a word the same way a written symbol references a letter or the visual form of a word). that abstracted, node-based version of a language tree can easily be translated into another language's tree, which is then decoded as that other language. as a metaphor, think of a german tree being turned into a french tree.
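here's a rough python sketch of that node idea, under some big assumptions: the `Node` class, the `leaves` and `realize` helpers, and the two word tables are all hypothetical names i made up, and i picked a sentence where german and french happen to share the same word order. the leaf nodes don't hold words, only an id that points into a word table, so swapping the table "decodes" the same tree as a different language.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Node:
    label: str                              # e.g. "S", "NP", "VERB"
    children: List["Node"] = field(default_factory=list)
    word_id: Optional[int] = None           # leaves point at a word, they don't store it

def leaves(node: Node) -> List[int]:
    """collect the word ids of the leaves, left to right."""
    if node.word_id is not None:
        return [node.word_id]
    ids: List[int] = []
    for child in node.children:
        ids.extend(leaves(child))
    return ids

def realize(node: Node, table: Dict[int, str]) -> str:
    """decode the abstract tree into a sentence using one language's word table."""
    return " ".join(table[i] for i in leaves(node))

# one abstract tree: S -> NP (DET NOUN) + VP (VERB)
tree = Node("S", [
    Node("NP", [Node("DET", word_id=0), Node("NOUN", word_id=1)]),
    Node("VP", [Node("VERB", word_id=2)]),
])

# hypothetical word tables for the same three ids
GERMAN = {0: "die", 1: "Katze", 2: "schläft"}
FRENCH = {0: "le", 1: "chat", 2: "dort"}

print(realize(tree, GERMAN))  # die Katze schläft
print(realize(tree, FRENCH))  # le chat dort
```

actual tree-to-tree translation also has to reorder and restructure nodes (and deal with things like gender and agreement), so a straight table swap like this only works for the metaphor.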
computational linguistics nowadays is centered on large language models (LLMs), a type of generative AI, which take the probability feature of previous NLP models to the max. an LLM trains on millions of data samples and learns how important each syntactic unit is relative to the other syntactic units in a sentence and in the entire paragraph (how far that reaches depends on how much electrical power and time you can spare). this is called the "attention mechanism", and it's roughly what people mean by "context". the model then does that again dozens and dozens of times until it generates an output that we, humans, understand as language.
comp linguistics also isn't limited to visual language (reading/writing). some of it deals with spoken or audio language: the ability to recognize or generate auditory language. it's based on the same premise, except there's an additional axis of data in the form of audio files. phonetics gets involved, so phoneticians and phonologists are valuable in that aspect of comp linguistics.
one part of computational linguistics is trying to find a more efficient way to do all that attention computation, one that takes up less time and electricity. another part is trying to figure out a better way to represent language as a math formula (in a way). there's also fixing errors in, or improving the fidelity of, NLP results. there's also the subfield of computational semantics, which deals with tokenizing or quantifying the meaning/interpretation/semantics/pragmatics of language and embedding that into, or using it to advance, current NLP models. there are other parts too, like cryptography in comp linguistics, where you try to figure out a more abstracted model of language and how to assign word classes to each word, or a more efficient way to do so. that specialty leans very heavily into "pure linguistics", even if you eventually have to turn it into symbolic mathematics.
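going back to the attention mechanism: here's a small pure-python sketch of scaled dot-product attention, which is the standard formula behind it. the token vectors in `TOKENS` are made-up numbers and the function names are my own. a real LLM would first pass each token through learned weight matrices to get separate query/key/value vectors, use many attention "heads", and stack dozens of layers. this skips all of that and only shows the core step of scoring every unit against every other unit.

```python
import math

def softmax(xs):
    """turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """for each query, weigh every value by how well its key matches the query."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# hypothetical 2-dimensional vectors standing in for three tokens
TOKENS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# self-attention: the same vectors act as queries, keys, and values
outputs, weights = attention(TOKENS, TOKENS, TOKENS)
for row in weights:
    print([round(w, 3) for w in row])
```

each printed row shows how much one token "pays attention" to every token in the sequence. the cost grows with the square of the sequence length (every token is scored against every other token), which is part of why making attention cheaper is its own research area.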
i hope this level of abstraction makes sense. if not, just ask! i'm literally doing research on this stuff right now, and one of the issues with computational linguistics research, especially in AI, is the lack of interpretability T_T
















