How to Fix Garbled Chinese Text and Convert TXT Files to UTF-8
If a Chinese TXT file opens as question marks, blocks, strange symbols, or mixed Latin-looking characters, the text is usually not destroyed. In many cases, the bytes are still there, but the app is reading them with the wrong character encoding. This is called mojibake.
The fastest fix is to open the file with the right source encoding and save a clean UTF-8 copy. You can do that with the WristTale TXT encoding converter. The converter runs locally in your browser, supports common legacy CJK encodings, and gives you a preview before you download the UTF-8 file.
If you want to understand the detection step first, read the file encoding detector guide.
What garbled Chinese text usually means
A .txt file is a stream of bytes. To display readable text, software has to know how those bytes map to characters. Modern apps usually expect UTF-8. Older Chinese files are often GBK, GB2312, GB18030, or Big5.
If a GBK novel is decoded as UTF-8, the result may look like corrupted text. If a Big5 file is decoded as GBK, its traditional Chinese characters may turn into unrelated characters and symbols. If a Japanese Shift_JIS file is decoded as a Chinese encoding, punctuation and kana can become unreadable.
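To see the effect on a small scale, the sketch below encodes a short sample string as GBK and then decodes the same bytes two ways. The sample text is made up for illustration; Python's built-in codecs stand in for whatever app is opening the file.

```python
# A short sample sentence, saved the way an older editor might have saved it.
original = "第一章 开始"
raw = original.encode("gbk")

# Wrong guess: these GBK bytes are not valid UTF-8, so the decoder
# substitutes U+FFFD (the replacement character) for the invalid bytes.
wrong = raw.decode("utf-8", errors="replace")

# Right guess: the same bytes decode cleanly with a GB-family decoder.
right = raw.decode("gb18030")

print(repr(wrong))  # garbled, contains \ufffd
print(right)        # readable again
```

The bytes in `raw` never changed. Only the decoding choice did.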
The file may still be recoverable when:
- The original file was only opened with the wrong encoding.
- The file has not been saved again after becoming garbled.
- The same garbled pattern appears consistently throughout the file.
- Another editor or device can still display the text correctly.
Recovery is less likely when an app has already saved the visible garbled output back into the file. At that point, the original byte sequence may be gone.
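A rough illustration of why a re-save can be destructive, assuming the app substituted replacement characters while decoding (which is common but not universal). The sample string is invented for the demo.

```python
original = "第一章 开始"
raw = original.encode("gbk")

# The app decodes with the wrong encoding and swaps invalid bytes for U+FFFD...
garbled = raw.decode("utf-8", errors="replace")
# ...then saves what it displayed. The file now holds the bytes of U+FFFD,
# not the original GBK bytes.
resaved = garbled.encode("utf-8")

# Trying the correct encoding afterwards no longer recovers the text.
recovered = resaved.decode("gbk", errors="replace")
print(resaved == raw)         # False: the original bytes are gone
print(recovered == original)  # False

# By contrast, a wrong decoding that maps every byte to some character
# (Latin-1 does this) looks just as garbled but can be reversed exactly.
lossless = raw.decode("latin-1").encode("latin-1")
print(lossless == raw)        # True
```

This is why the same file can be recoverable after one wrong-encoding mistake and unrecoverable after another.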
Common source encodings to try
Start with auto detection, then test the most likely encoding manually if the preview still looks wrong.
| Source file type | Try first | Fallbacks |
|---|---|---|
| Simplified Chinese TXT novel | GB18030 | GBK, GB2312 |
| Traditional Chinese TXT novel | Big5 | GB18030 |
| Japanese TXT file | Shift_JIS | EUC-JP |
| Korean TXT file | EUC-KR | UTF-8 |
| Windows notes or old Western text | Windows-1252 | ISO-8859-1 |
| File where every other byte looks blank (NUL) | UTF-16 LE | UTF-16 BE |
GB18030 is a practical first choice for many simplified Chinese files because it is backward compatible with GBK, which in turn extends GB2312, so one GB18030 decoder can read files from all three. Big5 is the usual candidate for older traditional Chinese text from Taiwan or Hong Kong. Shift_JIS and EUC-JP are common suspects for older Japanese plain text.
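If you prefer to test candidates in a script, the sketch below tries each encoding in strict mode and keeps a short preview for every one that decodes without error. The candidate list and the function name `strict_previews` are assumptions for illustration, not how the WristTale converter works internally. More than one candidate can succeed on the same bytes, so a human still has to read the previews.

```python
from typing import Dict, Sequence

# Assumed candidate list, loosely following the table above.
CANDIDATES = ("utf-8", "gb18030", "big5", "shift_jis", "euc_jp", "euc_kr")

def strict_previews(raw: bytes,
                    candidates: Sequence[str] = CANDIDATES,
                    length: int = 80) -> Dict[str, str]:
    """Return {encoding: preview} for each candidate that decodes strictly."""
    previews = {}
    for name in candidates:
        try:
            previews[name] = raw.decode(name)[:length]
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; skip it.
            pass
    return previews

# Sample bytes standing in for a file read with open(path, "rb").read().
sample = "简体中文小说".encode("gbk")
for encoding, preview in strict_previews(sample).items():
    print(f"{encoding:10} {preview}")
```

A strict decode failing is strong evidence against an encoding. A strict decode succeeding is only weak evidence for it.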
Step-by-step fix
- Open the TXT encoding converter.
- Select the garbled `.txt` file.
- Leave source encoding on auto detect and convert once.
- Read the preview. Do not download yet if the preview still looks wrong.
- Manually try GB18030, GBK, Big5, Shift_JIS, or UTF-16 depending on the file source.
- When the preview is readable, download the UTF-8 copy.
- Import the new `utf8_...txt` file into WristTale or your editor.
Do not repeatedly save the broken file in different desktop editors while testing. Keep the original file unchanged and create a separate UTF-8 copy only after the preview is correct.
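The same keep-the-original rule can be followed in a script. This is a minimal sketch, assuming you already know the source encoding from a preview. The function name `convert_to_utf8` and the `utf8_` prefix on the output name are illustrative choices, not the converter's exact behavior.

```python
from pathlib import Path

def convert_to_utf8(src: Path, source_encoding: str) -> Path:
    """Write a UTF-8 (with BOM) copy next to src and leave src untouched."""
    raw = src.read_bytes()
    # Strict decoding: raise instead of silently inserting replacement characters.
    text = raw.decode(source_encoding)

    dest = src.with_name("utf8_" + src.name)
    if dest.exists():
        raise FileExistsError(f"{dest} already exists; refusing to overwrite")

    # "utf-8-sig" prepends the UTF-8 BOM. Writing bytes keeps line endings as-is.
    dest.write_bytes(text.encode("utf-8-sig"))
    return dest
```

For example, `convert_to_utf8(Path("novel.txt"), "gb18030")` would create `utf8_novel.txt` in the same folder. If the strict decode raises `UnicodeDecodeError`, nothing is written, which is a sign to try a different source encoding.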
How to tell the difference between encoding problems and font problems
Encoding problems and font problems can look similar, but they need different fixes.
An encoding problem often shows:
- Many unrelated symbols across the whole file.
- Chinese characters replaced by question marks or replacement characters.
- Chapter titles and body text broken in the same pattern.
- Different results when you choose a different source encoding.
A font problem often shows:
- Boxes for only some rare characters.
- Most common Chinese text still readable.
- The same file displaying correctly on another device with better fonts.
- No improvement when changing the source encoding.
The converter can help with encoding problems. It cannot add missing fonts to a device, and it cannot reconstruct text that was already overwritten after corruption.
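One rough way to check whether a file was already overwritten after corruption: if the bytes are valid UTF-8 and the decoded text contains U+FFFD, some app most likely saved replacement characters into the file. The helper name `saved_replacement_count` is hypothetical, and the check is only a heuristic.

```python
from typing import Optional

def saved_replacement_count(raw: bytes) -> Optional[int]:
    """Count U+FFFD characters stored in a valid UTF-8 file.

    Returns None when the bytes are not valid UTF-8, because the check
    says nothing about files in other encodings.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return text.count("\ufffd")

# Simulate a GBK file that was opened as UTF-8 and then saved again.
damaged = "第一章".encode("gbk").decode("utf-8", errors="replace").encode("utf-8")
print(saved_replacement_count(damaged))                  # greater than zero
print(saved_replacement_count("第一章".encode("utf-8")))  # 0
print(saved_replacement_count("第一章".encode("gbk")))    # None
```

A count above zero suggests looking for an older backup rather than trying more source encodings.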
Why UTF-8 is the target format
UTF-8 is the safest output format for modern web, mobile, editor, and Garmin watch workflows. It supports Chinese, Japanese, Korean, Latin text, punctuation, and rare Unicode characters in one format.
For WristTale, a clean UTF-8 file also makes chapter detection easier. If a chapter heading is garbled, WristTale cannot reliably recognize it as a chapter even when the original title format was correct.
The converter downloads UTF-8 with a BOM because some Windows tools still use it as a hint for plain text files. The Unicode FAQ describes the BOM as a signature that can identify the encoding of otherwise unmarked text files; in UTF-8 it has nothing to do with byte order.
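For readers who want to see what the BOM looks like in bytes, here is a small sketch using Python's standard `utf-8-sig` codec. It only demonstrates the signature itself and makes no claim about how the converter writes its output.

```python
import codecs

text = "第一章"
with_bom = text.encode("utf-8-sig")   # UTF-8 with the signature prepended
without_bom = text.encode("utf-8")    # plain UTF-8

# The only difference is the three-byte signature EF BB BF at the start.
print(codecs.BOM_UTF8.hex())                         # efbbbf
print(with_bom == codecs.BOM_UTF8 + without_bom)     # True

# Decoding with "utf-8-sig" strips the signature; plain "utf-8" keeps it
# as an invisible U+FEFF character at the start of the text.
stripped = with_bom.decode("utf-8-sig")
kept = with_bom.decode("utf-8")
print(stripped == text)                              # True
print(kept == "\ufeff" + text)                       # True
```

If a tool shows a stray invisible character at the top of the first line, a BOM that was decoded as plain UTF-8 is a likely cause.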
Preparing the file for WristTale
After conversion, import the UTF-8 copy into WristTale and check the preview before syncing to your Garmin watch.
For long novels, the most stable workflow is:
- Keep the original TXT file as a backup.
- Convert a copy to UTF-8.
- Preview the converted text.
- Fix obvious repeated ads or boilerplate if needed.
- Check chapter headings.
- Sync a small batch of chapters to the watch first.
If chapter detection still misses sections, the problem may be the heading format rather than the encoding. In that case, convert the file to Markdown and use `# Chapter title` headings before importing.
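A minimal sketch of that Markdown conversion, assuming headings look like `第12章 ...` or `Chapter 12 ...` and are short lines. Both the pattern and the 40-character limit are guesses for illustration, since WristTale's own chapter detection rules are not described here. Adjust them to match your file.

```python
import re

# Hypothetical heading pattern: "第<number>章/回/节/節/卷" or "Chapter <digits>".
HEADING = re.compile(r"^(第[0-9零〇一二两三四五六七八九十百千]+[章回节節卷]|Chapter\s+\d+)")
MAX_HEADING_LENGTH = 40  # assumed cap to avoid tagging long body sentences

def add_markdown_headings(text: str) -> str:
    """Prefix lines that look like chapter titles with '# '."""
    out = []
    for line in text.splitlines(keepends=True):
        candidate = line.lstrip()
        if HEADING.match(candidate) and len(candidate.strip()) <= MAX_HEADING_LENGTH:
            out.append("# " + candidate)
        else:
            out.append(line)
    return "".join(out)

sample = "第一章 开始\n正文第一段\nChapter 2 Next\n"
print(add_markdown_headings(sample))
```

Check the result by eye before importing, because a body sentence that happens to start with a chapter number would also be tagged.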
Privacy note
The WristTale TXT encoding converter processes the file on your current device. The browser reads the file, decodes it, shows a preview, and creates the downloaded UTF-8 copy locally. The TXT file is not uploaded to a WristTale server.
That matters for personal notes, legally owned ebooks, training plans, race manuals, study materials, and other text that should stay on your computer.
References
- WHATWG Encoding Standard defines the browser encoding API and legacy encoding labels used by `TextDecoder`.
- Unicode UTF-8, UTF-16, UTF-32 & BOM FAQ explains how byte order marks work and why UTF-8 does not have byte-order ambiguity.