Text file encoding

You can use these calculators to encode text in a chosen encoding.

In a previous article, I touched on the topic of text encodings and described Unicode and its UTF-8 representation in more detail. UTF-8 encodes each character as a variable-length sequence of bytes.
This calculator can convert text to several outdated encodings. I call them outdated because modern applications can use Unicode and its most convenient representation, UTF-8, instead.
However, old encodings can still be useful when text needs to be encoded compactly, for example for subsequent compression and transmission, provided the receiving party knows exactly which encoding was used. For example, Russian text encoded in Windows-1251 takes up roughly half as much space as the same text in UTF-8, because each Cyrillic letter takes one byte instead of two.
The calculator below lets you download a file in the selected encoding or view a hexadecimal dump of the encoded text.
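The size difference is easy to check by encoding the same string both ways. This is a minimal sketch in Python using its built-in `cp1251` and `utf-8` codecs; it illustrates the arithmetic and is not the calculator's own code.

```python
# Compare the encoded size of the same Russian text in Windows-1251 and UTF-8.
text = "Привет, мир!"  # "Hello, world!" in Russian: 9 Cyrillic letters + 3 ASCII chars

cp1251_bytes = text.encode("cp1251")  # one byte per character
utf8_bytes = text.encode("utf-8")     # two bytes per Cyrillic letter, one per ASCII char

print(len(cp1251_bytes), len(utf8_bytes))  # 12 21
```

The ratio is not exactly one half because spaces and punctuation take one byte in both encodings.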
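A hexadecimal dump simply lists the bytes of the encoded text. The sketch below assumes the common offset / hex / printable-ASCII layout; the function name `hex_dump` and its `width` parameter are hypothetical, and the calculator's own dump format may differ.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as rows of: offset, hex bytes, printable-ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Bytes outside printable ASCII are shown as "."
        ascii_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return "\n".join(lines)

encoded = "Привет, мир!".encode("cp1251")
print(hex_dump(encoded))
```

For this input the single row starts with `00000000  CF F0 E8 E2 E5 F2 2C 20 EC E8 F0 21`: the Cyrillic letters appear as dots in the ASCII column, while the comma, space and exclamation mark stay readable.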

You can view the created file using the Text file decoder.

The calculator returns an error if the selected encoding cannot represent the text. With Unicode this cannot happen, since it contains the characters of all modern languages. Outdated 8-bit encodings, however, contain a limited set of characters (at most 256). For text in several languages, a suitable encoding may not exist at all.
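The same behavior can be seen with Python's strict encoding mode, which raises `UnicodeEncodeError` when a character has no mapping. This is a minimal sketch; the helper name `try_encode` is hypothetical.

```python
from typing import Optional

def try_encode(text: str, encoding: str) -> Optional[bytes]:
    """Encode text strictly; return None if the encoding lacks a character."""
    try:
        return text.encode(encoding)  # errors="strict" is the default
    except UnicodeEncodeError as err:
        # err.start and err.end delimit the characters the codec could not map
        print(f"{encoding}: cannot encode {text[err.start:err.end]!r}")
        return None

try_encode("Привет", "cp1252")        # Windows-1252 has no Cyrillic letters
try_encode("Привет", "cp1251")        # Windows-1251 covers Russian
try_encode("Привет, Γειά", "cp1251")  # Russian plus Greek: Windows-1251 has no Greek
```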
Many encodings were invented for different languages and character sets in the years before Unicode, so choosing the right encoding for your text can be a daunting task. The following calculator finds all encodings compatible with the entered text.
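Finding compatible encodings amounts to trying each candidate and keeping the ones that do not fail. The sketch below uses Python codec names for a small, hypothetical subset of the encodings listed on this page; it is not the calculator's actual candidate list.

```python
# Hypothetical subset of the encodings listed below, by Python codec name.
CANDIDATES = ["cp037", "iso8859_5", "koi8_r", "koi8_u", "mac_cyrillic",
              "cp866", "cp1251", "cp1252", "cp1253"]

def compatible_encodings(text: str, candidates: list[str]) -> list[str]:
    """Return the candidate encodings that can represent every character of text."""
    result = []
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # at least one character has no mapping in this encoding
        result.append(name)
    return result

print(compatible_encodings("Привет", CANDIDATES))
# ['iso8859_5', 'koi8_r', 'koi8_u', 'mac_cyrillic', 'cp866', 'cp1251']
```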

The calculators support 70 different encodings:

IBM EBCDIC

EBCDIC is an 8-bit encoding developed by IBM for use on IBM mainframes.

Encoding | Languages / Countries
EBCDIC 037 USA/Canada | USA, Canada, Portugal, Brazil, Australia, New Zealand, South Africa
EBCDIC 424 Hebrew | Hebrew
EBCDIC 500 International | International
EBCDIC 875 Greek | Greek
EBCDIC 1026 Turkish | Turkish
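Unlike the other families on this page, EBCDIC is not ASCII-compatible: even plain Latin letters get different byte values. A minimal sketch using Python's `cp037` codec (EBCDIC 037):

```python
text = "Hello"

ebcdic = text.encode("cp037")        # EBCDIC 037 (USA/Canada)
ascii_bytes = text.encode("ascii")

print(ebcdic.hex(" "))               # c8 85 93 93 96
print(ascii_bytes.hex(" "))          # 48 65 6c 6c 6f
print(ebcdic.decode("cp037"))        # Hello (round trip)
```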

ISO 8859 encodings

A family of ASCII-compatible 8-bit encodings developed by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC).

Encoding | Languages / Countries
ISO/IEC 8859-1 (Latin-1) | Western European languages
ISO/IEC 8859-2 (Latin-2) | Eastern European languages using the Latin alphabet
ISO/IEC 8859-3 (Latin-3) | Turkish, Maltese, Esperanto
ISO/IEC 8859-4 (Latin-4) | Estonian, Latvian, Lithuanian, Greenlandic, Sami
ISO/IEC 8859-5 | Cyrillic
ISO/IEC 8859-6 | Arabic
ISO/IEC 8859-7 | Modern Greek
ISO/IEC 8859-8 | Hebrew
ISO/IEC 8859-9 (Latin-5) | Turkish
ISO/IEC 8859-10 (Latin-6) | Northern European languages
ISO/IEC 8859-11 | Thai
ISO/IEC 8859-13 (Latin-7) | Estonian, Latvian, Lithuanian
ISO/IEC 8859-14 (Latin-8) | Celtic languages
ISO/IEC 8859-15 (Latin-9) | Western European languages
ISO/IEC 8859-16 (Latin-10) | South-Eastern European languages using the Latin alphabet
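All parts of ISO/IEC 8859 share the ASCII range 0x00-0x7F and differ only in the upper half, so the same byte means a different character in each part. A small sketch using Python's `iso8859_*` codecs:

```python
parts = ("iso8859_1", "iso8859_5", "iso8859_7")

# The same byte 0xE9 decoded with three different parts of ISO/IEC 8859.
for codec in parts:
    ch = bytes([0xE9]).decode(codec)
    print(codec, ch, f"U+{ord(ch):04X}")
# iso8859_1 é U+00E9
# iso8859_5 щ U+0449
# iso8859_7 ι U+03B9

# The lower half (0x00-0x7F) decodes identically to ASCII in every part.
ascii_range = bytes(range(0x80))
print(all(ascii_range.decode(c) == ascii_range.decode("ascii") for c in parts))  # True
```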

KOI8 encoding family

KOI8 is a family of 8-bit, ASCII-compatible encodings for the letters of Cyrillic alphabets.

Encoding | Languages
KOI8-R | Russian
KOI8-U | Ukrainian
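The difference between the two variants shows up with Ukrainian-specific letters such as "ї" (as in "Київ"), which KOI8-U includes and KOI8-R lacks. A minimal sketch with Python's `koi8_r` and `koi8_u` codecs; the helper `can_encode` is hypothetical.

```python
def can_encode(text: str, encoding: str) -> bool:
    """True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

for word in ("Москва", "Київ"):
    print(word, {c: can_encode(word, c) for c in ("koi8_r", "koi8_u")})
# Москва {'koi8_r': True, 'koi8_u': True}
# Київ {'koi8_r': False, 'koi8_u': True}
```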

Mac OS Encodings

Encoding | Languages / Countries
Mac OS Celtic | Celtic languages
Mac OS Gaelic | Gaelic
Mac OS Central European | Central European languages
Mac OS Croatian | Croatian
Mac OS Cyrillic | Cyrillic
Mac OS Greek | Greek
Mac OS Icelandic | Icelandic
Mac OS Inuit | Inuit
Mac OS Roman | Western European languages
Mac OS Romanian | Romanian
Mac OS Turkish | Turkish

DOS Code Pages

Encodings for MS-DOS and similar operating systems.

Encoding | Languages / Countries
DOS Latin US (CP437) | English (USA)
DOS Greek (CP737) | Greek
DOS Baltic Rim (CP775) | Estonian, Latvian, Lithuanian
DOS Latin 1 (CP850) | Western European languages
DOS Latin 2 (CP852) | Eastern European languages using the Latin alphabet
DOS Cyrillic (CP855) | Cyrillic
CP 856 Hebrew | Hebrew
DOS Turkish (CP857) | Turkish
DOS Portuguese (CP860) | Portuguese
DOS Icelandic (CP861) | Icelandic
DOS Hebrew (CP862) | Hebrew
DOS French Canada (CP863) | French (Canada)
DOS Arabic (CP864) | Arabic
DOS Nordic (CP865) | Nordic languages
DOS Cyrillic Russian (CP866) | Russian
DOS Greek 2 (CP869) | Greek

Windows encodings

Encoding | Languages / Countries
Windows-1250 | Central and Eastern European languages
Windows-1251 | Russian, Ukrainian, Belarusian, Serbian, Macedonian, Bulgarian
Windows-1252 | Western European languages
Windows-1253 | Modern Greek
Windows-1254 | Turkish
Windows-1255 | Hebrew
Windows-1256 | Arabic
Windows-1257 | Estonian, Latvian, Lithuanian
Windows-1258 | Vietnamese
Windows-874 | Thai
Windows-932 | Japanese
Windows-936 | Simplified Chinese
Windows-949 | Korean
Windows-950 | Traditional Chinese
KZ-1048 | Kazakh
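Several tables above contain encodings for the same language. For Russian, the DOS, Windows, KOI8, ISO and Mac encodings place the Cyrillic letters at different byte values, which is why the receiving side must know exactly which encoding was used. A small sketch using Python codec names for these encodings:

```python
text = "Привет"
names = ["cp866", "cp1251", "koi8_r", "iso8859_5", "mac_cyrillic"]

# The same word produces a different byte sequence in each encoding.
for name in names:
    print(f"{name:<13}", text.encode(name).hex(" "))

# Reading Windows-1251 bytes as if they were CP866 yields garbled text.
garbled = text.encode("cp1251").decode("cp866")
print(garbled)
```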

Others

Encoding | Description
Atari ST | Encoding used in Atari ST home computers
GSM 03.38 | Encoding used in GSM networks for SMS, Cell Broadcast (CB) messages, and USSD
KPS 9566 | Encoding developed in North Korea to support Korean Hangul characters
ISO 8-bit Urdu (IBM CP1006) | Encoding used by IBM on the AIX operating system in Pakistan for the Urdu language
ISO-IR-68 | Encoding for the characters of the APL programming language

The rules for converting these encodings to Unicode were taken from the unicode.org site [1].
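Most mapping files on unicode.org use a simple text layout: a byte value and a Unicode code point in hexadecimal, with `#` starting a comment. The sketch below assumes that two-column layout (some files, for example certain Apple mappings, use extra syntax it does not handle); the sample is a tiny hand-written excerpt in that layout, not a copy of a real file, and `parse_mapping` and `decode_bytes` are hypothetical names.

```python
# Hand-written excerpt in the two-column layout: byte, code point, "# name".
SAMPLE = """\
#   Illustrative excerpt, not a complete table
0x41\t0x0041\t#LATIN CAPITAL LETTER A
0x98\t      \t#UNDEFINED
0xC0\t0x0410\t#CYRILLIC CAPITAL LETTER A
"""

def parse_mapping(text: str) -> dict[int, str]:
    """Build a byte -> character table, skipping comments and undefined bytes."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop the comment, split columns
        if len(fields) < 2:
            continue  # blank line, comment-only line, or undefined byte
        table[int(fields[0], 16)] = chr(int(fields[1], 16))
    return table

def decode_bytes(data: bytes, table: dict[int, str]) -> str:
    """Decode using the table; raises KeyError for a byte with no mapping."""
    return "".join(table[b] for b in data)

table = parse_mapping(SAMPLE)
print(decode_bytes(b"\x41\xc0", table))  # "A" followed by Cyrillic "А"
```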


1. Unicode encoding mappings, http://www.unicode.org/Public/MAPPINGS/
