Project Nayuki

What are binary and text files?


On a computer, every file is a long string of ones and zeros. Specifically, a file is a finite-length sequence of bytes, where each byte is an integer between 0 and 255 inclusive (represented in binary as 00000000 to 11111111). Files can be broadly classified as either binary or text. The two categories have different characteristics and require different tools. Knowing the differences between binary and text files can save you time and prevent mistakes when reading or writing data.
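To make the "sequence of bytes" idea concrete, here is a minimal Python sketch. The helper name read_byte_values and the file name demo.bin are made up for illustration; the snippet writes three bytes to a temporary file and reads them back as integers in the range 0 to 255.

```python
import os
import tempfile

def read_byte_values(path):
    """Return the file's contents as a list of integers, one per byte."""
    with open(path, "rb") as f:
        return list(f.read())

# Write three bytes to a temporary file, then read them back as integers.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")  # hypothetical file name
    with open(demo_path, "wb") as f:
        f.write(bytes([0, 65, 255]))
    values = read_byte_values(demo_path)

# Show each byte as an 8-digit binary string.
binary_strings = [format(v, "08b") for v in values]
print(values)          # [0, 65, 255]
print(binary_strings)  # ['00000000', '01000001', '11111111']
```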

Here is the primary difference: Binary files have no inherent constraints (they can be any sequence of bytes), and must be opened in an appropriate program that knows the specific file format (such as Media Player, Photoshop, or Office). Text files must represent reasonable text (explained later), and can be edited in any text editor program.

Remember that all files, whether binary or text, are composed of bytes. The difference between binary and text files is in how these bytes are interpreted. Every text file can technically be treated as a binary file, but that interpretation gives us no useful text operations to work with. The reverse is not true: an arbitrary binary file is generally not valid text, and treating it as a text file can lead to data corruption. As a method of last resort, a hex editor can always be used to view and edit the raw bytes in any file.
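The raw-bytes view that a hex editor provides can be approximated in a few lines of Python. This is only a rough sketch: the function name hex_dump and its layout (offset, hex values, printable ASCII) are assumptions chosen for illustration, not the format of any particular hex editor.

```python
def hex_dump(data, width=16):
    """Format bytes as lines of: offset, hex values, printable ASCII."""
    pad = width * 3 - 1  # width of a full row of "xx xx xx ..." text
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Bytes outside the printable ASCII range are shown as dots.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part.ljust(pad)}  {ascii_part}")
    return lines

# Works the same way on text-like and arbitrary bytes.
for line in hex_dump(b"Hi\n\x00"):
    print(line)
```

The same function works on the bytes of any file, text or binary, because it never tries to interpret them as characters beyond the printable-ASCII column.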

File extensions

We can usually tell if a file is binary or text based on its file extension. This is because by convention the extension reflects the file format, and it is ultimately the file format that dictates whether the file data is binary or text.

Common extensions that are binary file formats:

Common extensions that are text file formats:

Binary file characteristics

Binary file in application (good)

Binary file in hex editor (okay)

Binary file in text editor (bad)

Most software that people use in their daily lives consumes and produces binary files. Examples of such software include Microsoft Office, Adobe Photoshop, and various audio/video/media players. A typical computer user works with mostly binary files and very few text files.

A binary file always needs matching software to read or write it. For example, an MP3 file can be produced by a sound recorder or audio editor, and it can be played in a music player or audio editor. But an MP3 file cannot be played in an image viewer or database software.

Some binary formats are popular enough that a wide variety of programs can produce or consume them. Image formats like JPEG are the best example – not only can they be used in image viewers and editors, they can be viewed in web browsers, audio players (for album art), and document software (such as adding a picture into a Word doc). But other binary formats, especially for niche proprietary software, might have only one program in the world that can read and write them. For example, a high-end video editing program might let you save your project to a file, but that program is the only one that can understand its own file format; the binary file will never be useful anywhere else.

If you use a text editor to open a binary file, you will see copious amounts of garbage, seemingly random accented and Asian characters, and long lines overflowing with text – this exercise is safe but pointless. However, editing or saving a binary file in a text editor will corrupt the file, so never do this. Corruption happens because applying a text mode interpretation changes certain byte sequences – such as discarding NUL bytes, converting newlines, or discarding sequences that are invalid under a certain character encoding – which means that opening and saving a binary file will almost surely produce a file with different bytes.
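The corruption can be reproduced in a small Python sketch. It assumes a hypothetical editor that decodes the file as UTF-8 (substituting a replacement character for invalid sequences) and normalizes CR LF to LF; real editors vary in what they do. The first eight sample bytes are the PNG file signature, followed by a NUL byte and 0xFF.

```python
# Sample binary data: the 8-byte PNG signature, then a NUL byte and 0xFF.
original = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A, 0x00, 0xFF])

# "Open" the data as text: invalid UTF-8 bytes become U+FFFD.
text = original.decode("utf-8", errors="replace")

# Assumed editor behavior: normalize Windows newlines (CR LF) to LF.
text = text.replace("\r\n", "\n")

# "Save" the text back to bytes.
round_tripped = text.encode("utf-8")

print(len(original), len(round_tripped))  # 10 13
print(round_tripped == original)          # False
```

The bytes 0x89 and 0xFF are not valid UTF-8 here, so each is replaced by a three-byte replacement character, and the CR byte is lost. The saved data no longer matches the original.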

Text file characteristics

Text file in text editor (good)

Text file in hex editor (inconvenient)

By convention, the data in every text file obeys a number of rules:

Observations regarding the general computing environment around text files:

Programming considerations

Every practical programming language provides separate facilities for working with binary versus text files. Generally speaking, mixing them up goes wrong in one of four ways. If you read a binary file in text mode, you will get unhelpful data that looks like garbage. If you write a binary file in text mode, it will probably be corrupt. If you read a text file in binary mode, you can't perform any useful text operations on the bytes. And if you write a text file in binary mode, you will need to manually convert characters to bytes. So it pays to use the right tool for the job. To illustrate with concrete examples, let's look briefly at how binary vs. text files work in three popular programming languages.
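As a brief illustration of the two modes, here is a sketch in Python; the file name note.txt and the choice of UTF-8 are assumptions made for the example. Text mode accepts and returns strings and performs the character encoding for you, while binary mode exposes the raw bytes.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.txt")  # hypothetical file name

    # Text mode: we write a str, and the library encodes it to bytes.
    # newline="\n" disables platform-specific newline translation.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write("caf\u00e9\n")  # the word "cafe" with an accented e

    # Binary mode: we get the raw bytes exactly as stored on disk.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode again: the library decodes the bytes back to a str.
    with open(path, "r", encoding="utf-8", newline="") as f:
        text = f.read()

print(raw)                  # b'caf\xc3\xa9\n'
print(len(raw), len(text))  # 6 5
```

The accented letter occupies two bytes in UTF-8 but counts as one character in the string, which is why the two lengths differ.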