Regex for Translation QA: Smarter Checks with Simpler Rules

Introduction

In this post, I share how I used regular expressions (Regex) to streamline my translation QA process. These automated checks helped me catch common issues like incorrect punctuation, untranslated text, and formatting inconsistencies. Designed with practical examples and intuitive logic, this project gave me hands-on insight into how small automations can make a big difference in quality assurance. Whether you're a new translator or a workflow optimizer, I hope this post offers helpful guidance for smarter QA practices.


Tips for Regex (Regular Expression)

When I check a translation from Chinese to English, the first thing I look at is punctuation: I need to make sure I haven’t typed any Chinese punctuation marks into the target text.

So I used the “Find and Replace” function in the top right corner of the “Home” section. I entered the most frequently used Chinese punctuation marks in the “Find” field and replaced them one by one with the corresponding English punctuation marks.

The punctuation marks I entered were the full stop, comma, colon, semicolon, question mark, exclamation mark, and ellipsis.
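The same one-by-one replacement can be sketched outside the CAT tool. The snippet below is only an illustration in Python: the mapping table and the function name `replace_chinese_punctuation` are my own assumptions, not something the tool provides.

```python
# Common full-width Chinese punctuation mapped to English equivalents.
# The exact set is an assumption based on the marks listed above.
PUNCTUATION_MAP = {
    "。": ".",
    "，": ",",
    "：": ":",
    "；": ";",
    "？": "?",
    "！": "!",
    "……": "...",
}


def replace_chinese_punctuation(text: str) -> str:
    """Return text with each mapped Chinese mark replaced by its English one."""
    # Replace longer keys first so the two-character ellipsis is handled whole.
    for source in sorted(PUNCTUATION_MAP, key=len, reverse=True):
        text = text.replace(source, PUNCTUATION_MAP[source])
    return text


print(replace_chinese_punctuation("Is this correct？Yes，it is。"))
```

Chinese punctuation is not followed by a space, so the output may still need a space added after each replaced mark.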

Second, I wanted to make sure no punctuation mark was missing at the end of a completed sentence. So I went to the “Verify” function in the top right corner of the “Review” section and added a new item under Regular Expressions. I described it as “Sentence ends without a punctuation mark”. It warns me whenever the target text matches the regex “\w+(?=\s*$)”.

This pattern matches a run of word characters (“\w+”) only when nothing but optional whitespace follows it up to the end of the string (the lookahead “(?=\s*$)”). In other words, it flags a segment whose last word has no punctuation mark after it.

The one drawback is that the pattern also flags titles and headings, which legitimately end without punctuation. In my translation, for example, the first line was marked with a warning even though it is the headline.
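To see the behaviour of this pattern, including the false positive on titles, here is a small Python sketch. The function name `ends_without_punctuation` and the sample segments are made up for illustration.

```python
import re

# Same pattern as in the verification rule: a word followed only by
# optional whitespace before the end of the string.
ENDS_WITHOUT_PUNCTUATION = re.compile(r"\w+(?=\s*$)")


def ends_without_punctuation(segment: str) -> bool:
    """Return True if the segment's last word has no punctuation after it."""
    return ENDS_WITHOUT_PUNCTUATION.search(segment) is not None


samples = [
    "This sentence is complete.",
    "This sentence has no full stop",
    "Tips for Regex",  # a title: flagged even though it is correct
]
for segment in samples:
    print(repr(segment), ends_without_punctuation(segment))
```

The title is flagged just like the unfinished sentence, so headings still need a manual look.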

Third, I wanted to check whether I had mistakenly copied source text into the target text without translating it. That meant finding every Chinese character in my target text. At this step I realized that, since I was translating from Chinese into English, I could approach the problem from the other direction: listing all Chinese characters is hard, but finding characters that are not basic Latin is simple. So I added a second item under Regular Expressions with the description “Mistakenly copied source text into target text”. The regex is “[^\p{IsBasicLatin}]+”.

“\p{IsBasicLatin}” matches any character in the Unicode Basic Latin block (U+0000 to U+007F, the ASCII range), and “[^…]” negates it, so I get a warning for any run of characters outside that range. Note that this also flags accented letters and curly quotation marks, not only Chinese characters. And it worked: line 14 was marked with a warning.
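The same check can be tried in Python, with one caveat: Python's built-in `re` module does not support the `\p{IsBasicLatin}` syntax, so this sketch substitutes the equivalent explicit range `[^\x00-\x7F]`. The function name `find_non_basic_latin` is hypothetical.

```python
import re

# Any run of characters outside U+0000..U+007F (the Basic Latin block).
# This range stands in for [^\p{IsBasicLatin}]+, which `re` cannot parse.
NON_BASIC_LATIN = re.compile(r"[^\x00-\x7F]+")


def find_non_basic_latin(segment: str) -> list[str]:
    """Return every run of non-Basic-Latin characters in the segment."""
    return NON_BASIC_LATIN.findall(segment)


print(find_non_basic_latin("The 翻译 was left untranslated."))
print(find_non_basic_latin("Fully translated sentence."))
```

An empty list means the segment contains only Basic Latin characters.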

Fourth, I wanted to check that I hadn’t typed two or more consecutive whitespaces by mistake. So I went back to the “Find and Replace” function. I entered “\s{2,}” in the “Find” field and “ ” (a single space) in the “Replace” field. “\s{2,}” matches two or more consecutive whitespace characters; note that the quantifier needs curly braces, not parentheses.

It found the multiple consecutive whitespaces in line 4 and replaced them with a single space.
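For reference, a minimal Python sketch of the same find-and-replace step, using the quantifier with curly braces (`\s{2,}`). The function name `collapse_whitespace` is just an illustrative choice.

```python
import re

# Two or more consecutive whitespace characters.
MULTIPLE_WHITESPACE = re.compile(r"\s{2,}")


def collapse_whitespace(text: str) -> str:
    """Replace every run of two or more whitespace characters with one space."""
    return MULTIPLE_WHITESPACE.sub(" ", text)


print(collapse_whitespace("There are   too many  spaces here."))
```

Because `\s` also matches tabs and line breaks, a run such as a double line break would be collapsed to a single space as well, which may or may not be wanted.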

While I was doing this project, I found the website regex101.com really helpful. It not only shows what a pattern matches but also explains each part of the pattern in detail.


Team CAT Project

In this group project, we simulated a project from start to finish, which helped us apply the skills learned in class and identify knowledge gaps to fill. I connected with our team, met with our client at a scheduled time, and worked on the proposal/SOW, translation, and presentation.

Specifications:

The Team CAT Project requires a team of students to complete a translation using Trados Studio (and any other technologies deemed necessary). The translation will simulate the experience of translating in a small, in-house translation team or in a small group of associated freelancers.

Project Proposal:

Project Deliverables:

Group Project Source Text:

Translation Style Guide:

Group Project Pseudo translation:

Glossary:

Target Text:
