Researchers are addressing a critical need for enhanced technical understanding of artificial intelligence within the language and translation industries. Ralph Krüger from the Institute of Translation and Multilingual Communication at TH Köln (University of Applied Sciences Cologne), in collaboration with colleagues, details a novel technical curriculum designed to cultivate domain-specific AI literacy amongst translation and specialised communication stakeholders. The curriculum focuses on core concepts including vector embeddings, tokenization and transformer architectures, aiming to develop computational thinking and algorithmic awareness. The significance of this work lies in its potential to bolster digital resilience within a rapidly evolving, AI-driven professional landscape, as demonstrated through initial testing within an MA course at TH Köln.

Scientists are developing a new technical curriculum designed to enhance artificial intelligence (AI) literacy within the language and translation (L&T) industry, addressing a growing need for professionals to understand the underlying principles of modern language-oriented AI, particularly as large language models (LLMs) reshape workflows and automation capabilities. The curriculum moves beyond simply using AI tools and instead focuses on building a foundational understanding of how these technologies actually function, empowering stakeholders with the computational thinking and algorithmic awareness necessary to navigate increasingly AI-driven work environments and maintain agency in the face of rapid technological change.

The curriculum covers four core areas: vector embeddings, a method of representing words and sentences as numerical data; the technical foundations of neural networks, complex computational systems inspired by the human brain; tokenization, the process of breaking down text into manageable units; and transformer neural networks, the architecture powering many of the most advanced LLMs currently available. By demystifying these concepts, the research seeks to foster a more informed and resilient workforce capable of critically evaluating and effectively utilising AI technologies.

Researchers tested the curriculum’s effectiveness within an AI-focused Master of Arts course at the Institute of Translation and Multilingual Communication at TH Köln. The study constructed a series of interactive notebooks to deliver foundational knowledge of language-oriented AI, specifically targeting stakeholders needing to develop computational thinking and algorithmic awareness.

Exploration began with vector embeddings, which represent words and phrases as numerical vectors to facilitate computational analysis. This first notebook introduced the idea of placing linguistic units in a multi-dimensional space so that machines can capture semantic relationships between them.

The curriculum then delved into the technical foundations of neural networks, including dot product calculation and non-linear and softmax activation functions, preparing users for the complexities of transformer architectures.

The third notebook focused on tokenization, a crucial process for reducing the vocabulary size processed by neural networks. Users progressed from word- and character-based tokenization, evaluating the trade-offs of each approach, to subword tokenization, which combines the benefits of both. Three prominent subword algorithms, Byte-Pair Encoding (BPE), WordPiece, and Unigram, were implemented, allowing users to directly compare their outputs on example sentences and examine how each algorithm handles word boundaries. This hands-on experience extended to exploring token IDs, the unique identifiers assigned to each token, and to accessing the vocabulary of the GPT-2 language model to understand its internal representation of language.

Building upon these foundations, the fourth notebook provided an in-depth examination of transformer neural networks, differentiating between encoder-decoder, encoder-only, and decoder-only models, and illustrating their suitability for various tasks such as text summarization, question answering, and text generation. A detailed exploration of the encoder and decoder components within an encoder-decoder transformer model followed, focusing on the original architecture proposed by Vaswani et al. to provide a clear understanding of modern LLMs.
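To make the embedding idea concrete, here is a minimal sketch, not drawn from the curriculum’s notebooks, showing how cosine similarity over word vectors captures semantic relatedness. The three-dimensional toy vectors are invented purely for illustration; real embedding models learn hundreds or thousands of dimensions from corpora.

```python
import numpy as np

# Toy 3-dimensional embeddings (invented for illustration only).
embeddings = {
    "cat": np.array([0.9, 0.1, 0.3]),
    "dog": np.array([0.8, 0.2, 0.35]),
    "translate": np.array([0.1, 0.9, 0.7]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Semantically related words end up closer together in vector space.
print(cosine_similarity(embeddings["cat"], embeddings["dog"]))        # high
print(cosine_similarity(embeddings["cat"], embeddings["translate"]))  # lower
```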
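The neural-network foundations covered in the second notebook can likewise be reduced to a few lines. The following sketch is an illustrative reconstruction, not the notebook’s own code: it shows a single neuron’s dot-product computation, a non-linear activation (ReLU is one common choice), and the softmax function that turns raw scores into a probability distribution.

```python
import numpy as np

inputs = np.array([0.5, -1.2, 3.0])   # activations from the previous layer
weights = np.array([0.8, 0.1, -0.4])  # learned connection weights
bias = 0.2

# A neuron computes a weighted sum (dot product) of its inputs plus a bias...
z = np.dot(inputs, weights) + bias

# ...and passes it through a non-linear activation function such as ReLU.
activation = np.maximum(0.0, z)

def softmax(scores: np.ndarray) -> np.ndarray:
    """Map raw scores to a probability distribution (numerically stable)."""
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

# Softmax over example output scores: results are positive and sum to 1.
print(softmax(np.array([2.0, 1.0, 0.1])))
```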
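The notebooks implement the three subword algorithms directly; one way to reproduce such a side-by-side comparison is with pretrained Hugging Face tokenizers that each employ one of the algorithms. The model choices below (gpt2 for BPE, bert-base-uncased for WordPiece, albert-base-v2 for Unigram) and the example sentence are my own, not ones named in the paper.

```python
from transformers import AutoTokenizer  # pip install transformers sentencepiece

# Pretrained tokenizers built on each subword algorithm:
tokenizers = {
    "BPE (GPT-2)": AutoTokenizer.from_pretrained("gpt2"),
    "WordPiece (BERT)": AutoTokenizer.from_pretrained("bert-base-uncased"),
    "Unigram (ALBERT)": AutoTokenizer.from_pretrained("albert-base-v2"),
}

sentence = "Tokenization splits unbelievably long words."
for name, tok in tokenizers.items():
    # Each algorithm segments rare words at different subword boundaries.
    print(name, "->", tok.tokenize(sentence))

# Token IDs: each subword maps to a unique integer in the vocabulary.
gpt2 = tokenizers["BPE (GPT-2)"]
print(gpt2(sentence)["input_ids"])
print("GPT-2 vocabulary size:", gpt2.vocab_size)  # 50257 entries
```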
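As a rough illustration of the three transformer variants and the task types they suit, one might pair each with a Hugging Face pipeline as below. The specific models and example strings are my own choices for demonstration, not the paper’s.

```python
from transformers import pipeline  # pip install transformers

# Encoder-decoder models read a full input and generate a new sequence,
# suiting tasks such as summarization (T5 is one example):
summarizer = pipeline("summarization", model="t5-small")

# Encoder-only models build rich representations of given text, suiting
# understanding tasks such as extractive question answering:
qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

# Decoder-only models predict the next token, suiting text generation:
generator = pipeline("text-generation", model="gpt2")

context = ("Transformer neural networks come in encoder-decoder, "
           "encoder-only and decoder-only variants, each suited to "
           "different language tasks.")
print(qa(question="What variants do transformers come in?",
         context=context)["answer"])
print(generator("Transformer models are",
                max_new_tokens=15)[0]["generated_text"])
print(summarizer(context, min_length=5, max_length=20)[0]["summary_text"])
```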
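At the heart of the Vaswani et al. architecture is scaled dot-product self-attention, in which each token’s query is compared against every token’s key to weight a sum of values: Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. The numpy sketch below is a bare-bones reconstruction of that formula with random toy matrices, not code from the curriculum.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8  # 4 tokens, 8-dimensional queries/keys/values (toy sizes)

# In a real transformer, Q, K and V are linear projections of the token
# embeddings; random stand-ins suffice to show the mechanics.
Q = rng.standard_normal((seq_len, d_k))
K = rng.standard_normal((seq_len, d_k))
V = rng.standard_normal((seq_len, d_k))

# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V   (Vaswani et al., 2017)
scores = Q @ K.T / np.sqrt(d_k)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
contextualized = weights @ V

print(weights.round(2))      # each row sums to 1: how much each token attends
print(contextualized.shape)  # (4, 8): one contextualized vector per token
```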
Where significant differences exist between the original architecture and current models, such as position embedding creation or layer normalization, ‘architecture update alerts’ directed users to relevant literature. The self-attention process was visualized using the BertViz package, allowing users to dynamically explore how contextualized representations are created from initial input embeddings.

Initial results indicate the curriculum successfully imparts technical knowledge, but participant feedback suggests that integrating it within a broader didactic framework, such as enhanced lecturer support, would optimise the learning experience. This highlights the importance of not only delivering technical content but also providing the necessary scaffolding for students to effectively absorb and apply it.

Participants demonstrated a substantial increase in self-assessed technical knowledge of language-oriented AI, moving from a mean score of 3.72 (standard deviation = 2.13) in the pre-test to a mean of 6.76 (standard deviation = 1.43) in the post-test, a highly statistically significant improvement with a p-value of less than 0.001. Gains appeared across individual dimensions: understanding of the basic operating principles of language-oriented AI increased from a mean of 3.67 (SD = 2.24) to 6.73 (SD = 1.33); knowledge regarding the training and fine-tuning of these technologies improved from a mean of 3.04 (SD = 2.18) to 5.87 (SD = 1.77); and the ability to assess how language-oriented AI supports translation technologies rose from a mean of 4.46 (SD = 2.77) to 7.67 (SD = 2.09).

A retrospective self-assessment, conducted within the post-test questionnaire, revealed an initial underestimation of existing knowledge: participants retrospectively rated their pre-course knowledge at a mean of 2.93 (SD = 2.19), notably lower than the initial pre-test score of 3.72 (SD = 2.13). This response shift suggests that engagement with the curriculum fostered increased metacognitive awareness, allowing participants to more accurately gauge their own competence. The effect size calculated from this retrospective assessment reached d = 2.07, further supporting the conclusion of a substantial treatment effect beyond initial self-selection biases.

The work represents a significant step towards bridging the gap between AI development and practical application within the L&T sector, ultimately aiming to cultivate algorithmic agency, the ability to understand, evaluate, and influence algorithmic systems, and to contribute to the digital resilience of language professionals. By fostering technical AI literacy, the study suggests a pathway for stakeholders to move beyond being passive users of AI and become active participants in shaping its future within their field, offering a valuable resource for ensuring a human-centred approach to AI implementation in translation and specialised communication.
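The paper names BertViz as the visualization tool; a minimal usage sketch along the lines of BertViz’s standard head_view workflow might look as follows. The model choice and example sentence here are my own assumptions, and the notebooks’ actual code may differ.

```python
# pip install bertviz transformers
from transformers import AutoTokenizer, AutoModel
from bertviz import head_view

# Any encoder model that returns attention weights will do; BERT is a
# common demonstration choice (an assumption, not the paper's setup).
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)

inputs = tokenizer.encode("The translator reviewed the machine output.",
                          return_tensors="pt")
attention = model(inputs).attentions          # one tensor per layer
tokens = tokenizer.convert_ids_to_tokens(inputs[0])

# Renders an interactive view (inside a Jupyter notebook) of how each
# attention head distributes weight across the input tokens.
head_view(attention, tokens)
```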
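The reported effect size of d = 2.07 is consistent with Cohen’s d computed from the retrospective pre-test mean (2.93, SD 2.19) and the post-test mean (6.76, SD 1.43) using a pooled standard deviation. The short check below makes that arithmetic explicit; the paper’s exact formula is an assumption on my part.

```python
import math

pre_mean, pre_sd = 2.93, 2.19    # retrospective pre-course self-rating
post_mean, post_sd = 6.76, 1.43  # post-test self-rating

# Cohen's d with a pooled standard deviation (assumed formula):
pooled_sd = math.sqrt((pre_sd**2 + post_sd**2) / 2)
d = (post_mean - pre_mean) / pooled_sd
print(round(d, 2))  # 2.07, matching the reported effect size
```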
The relentless advance of artificial intelligence demands a new kind of professional development, and this curriculum represents a pragmatic response to that need. For years, discussions around AI’s impact on work have focused on displacement and automation, often overlooking the crucial gap in understanding how these technologies actually function. Simply knowing that a large language model exists is insufficient for professionals who must collaborate with, manage, or even critique its output.

The focus on core concepts (vector embeddings, tokenization, and the transformer architecture) is particularly astute: these aren’t merely technical details but the building blocks of modern AI, and understanding them fosters a crucial form of ‘algorithmic agency’. While the pilot study demonstrates the curriculum’s effectiveness, the call for greater didactic scaffolding is a vital observation. True literacy isn’t simply about absorbing information but about integrating it into existing workflows and critical thinking processes.

Looking ahead, this type of targeted AI literacy needs to move beyond isolated courses and become embedded within professional training standards. The challenge isn’t just teaching the ‘what’ of AI, but the ‘why’ and the ‘when’: when to trust its outputs, when to question them, and how to mitigate potential risks. Furthermore, broader efforts should explore how these technical foundations can be adapted for other professions facing similar disruptions, fostering a more resilient and informed workforce across multiple sectors.

👉 More information
🗞 A technical curriculum on language-oriented artificial intelligence in translation and specialised communication
🧠 ArXiv: https://arxiv.org/abs/2602.12251