What is file compression?

Data compression definition & explanation of terms

Storage space on a computer is always limited, so it pays to keep stored data as small as possible. The same applies to data transmission: the more data is sent over the Internet, for example, the longer the transmission takes. To use as little storage and transmission capacity as possible, methods were developed that store or transmit the same information content with a smaller amount of data. These methods are summarized under the term data compression.

There are many different ways to compress data. When choosing a method, it is important to consider the type of data, because different methods suit different kinds of data: some are designed for compressing text, while others are suited to images or audio data.

An important distinction between compression methods is whether they are lossy or lossless. With lossless compression, the data is available in exactly its original state after decompression. This is particularly important for documents and executable files, since any change in the data can change their content or behavior. With lossy compression, on the other hand, the compressed data cannot be restored to its original state. These methods are appropriate for data where a slight variation does not cause major problems.

Another distinction is made between asymmetric and symmetric data compression. With asymmetric compression, compressing the data takes considerably longer than decompressing it. With symmetric compression, on the other hand, both processes take about the same time.
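The difference between the two processes can be measured directly. The following is a minimal sketch, assuming Python's standard `lzma` module as an example of a compressor that typically behaves asymmetrically; the function name `time_roundtrip` and the sample data are made up for illustration, and the measured times depend on the machine and the data.

```python
import lzma
import time

def time_roundtrip(data: bytes, preset: int = 9):
    """Return (compress_seconds, decompress_seconds, restored_bytes)."""
    start = time.perf_counter()
    packed = lzma.compress(data, preset=preset)
    compress_seconds = time.perf_counter() - start

    start = time.perf_counter()
    restored = lzma.decompress(packed)
    decompress_seconds = time.perf_counter() - start
    return compress_seconds, decompress_seconds, restored

# Hypothetical sample data: one sentence repeated many times.
sample = b"The storage space on a computer is always limited. " * 5000
c_time, d_time, restored = time_roundtrip(sample)
print(f"compress: {c_time:.4f}s, decompress: {d_time:.4f}s")
```

If compression takes noticeably longer than decompression in such a measurement, the method is behaving asymmetrically for that input.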

Lossless data compression

In many cases it is essential that the data can be restored exactly to its original state. In an executable file, for example, a single changed bit can render the entire program unusable, so lossless data compression must be chosen here. This form of compression works by reducing redundancy, that is, repetitions of the same data.

To compress a file, the compression software searches it for repeating segments. Segments that occur several times in the file are replaced by a shorter abbreviation. The compression program records which abbreviation stands for which data segment and stores this information with the compressed data so that it is available for decompression.
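The effect of repeated segments can be observed with an existing lossless compressor. The sketch below assumes Python's standard `zlib` module and compares highly repetitive data with random bytes of the same length; the helper `compressed_size` and the sample inputs are illustrative only.

```python
import os
import zlib

def compressed_size(data: bytes) -> int:
    """Size in bytes after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))

repetitive = b"ABCDEFGH" * 1000   # 8000 bytes, the same segment repeated
random_like = os.urandom(8000)    # 8000 bytes with almost no repetition

print("repetitive:", compressed_size(repetitive), "bytes")
print("random:    ", compressed_size(random_like), "bytes")

# Lossless: decompression returns exactly the original bytes.
assert zlib.decompress(zlib.compress(repetitive, 9)) == repetitive
```

The repetitive input shrinks to a small fraction of its size, while the random input hardly shrinks at all, because there are no repeated segments to abbreviate.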

To make this process easier to picture, think of the file as a text. In any longer text, many words are repeated. If the most frequent words are each replaced by a number, the number takes up considerably less storage space than the original word. During compression, it must be recorded which number stands for which word. To decompress the text, the numbers are simply replaced by the words again. There are many other methods of lossless data compression, but the basic principle is similar. With this form of compression, the original data can be restored exactly.
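The word-replacement idea can be written down in a few lines. This is a toy sketch of the analogy above, not how real compressors are implemented; the function names `compress_words` and `decompress_words` and the parameter `table_size` are made up for illustration.

```python
from collections import Counter

def compress_words(text: str, table_size: int = 3):
    """Replace the most frequent words by their index in a table."""
    words = text.split(" ")
    table = [word for word, _ in Counter(words).most_common(table_size)]
    codes = {word: index for index, word in enumerate(table)}
    # Frequent words become integers; all other words stay as strings.
    tokens = [codes.get(word, word) for word in words]
    return tokens, table

def decompress_words(tokens, table) -> str:
    """Replace every integer by the word it stands for."""
    return " ".join(table[t] if isinstance(t, int) else t for t in tokens)

text = "the cat and the dog and the bird"
tokens, table = compress_words(text)
print(tokens)
print(decompress_words(tokens, table) == text)  # prints True
```

The table has to be kept together with the tokens, since without it the numbers cannot be turned back into words.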

Lossy data compression

Lossy compression is usually based on so-called irrelevance reduction: information that is not considered important is no longer stored. In this way, the amount of data can be reduced considerably. Because what counts as irrelevant depends on the type of data, the form of compression must be adapted to it, which is why there are very different methods. However, the discarded information is often not completely irrelevant, which can lead to a loss of quality in the data.

One example of lossy data compression for image files is JPEG compression. A digital image consists of many individual color points (pixels). When neighboring pixels are very similar, the human eye can hardly perceive the differences between them. In simplified terms, JPEG compression exploits this: areas of similar color are given a uniform color that corresponds to an average of the original colors. This significantly reduces the amount of data. If the unified areas are small, the difference is hardly noticeable to humans, but the amount of data saved is also quite small. With stronger compression, the amount of data can be reduced further, but the loss of quality becomes clearly visible.
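The averaging idea can be illustrated on a tiny grayscale image. The sketch below only demonstrates the simplified principle described above and is not the JPEG algorithm itself, which transforms 8×8 pixel blocks and quantizes the result; the function `average_blocks` and the sample image are made up for illustration.

```python
def average_blocks(pixels, block=2):
    """Replace every block x block area by its (integer) average value."""
    height, width = len(pixels), len(pixels[0])
    result = [row[:] for row in pixels]
    for top in range(0, height, block):
        for left in range(0, width, block):
            cells = [(y, x)
                     for y in range(top, min(top + block, height))
                     for x in range(left, min(left + block, width))]
            mean = sum(pixels[y][x] for y, x in cells) // len(cells)
            for y, x in cells:
                result[y][x] = mean
    return result

image = [
    [10, 12, 200, 202],
    [11, 13, 201, 203],
    [50, 50, 90, 94],
    [50, 50, 92, 96],
]
smoothed = average_blocks(image)
for row in smoothed:
    print(row)
# [11, 11, 201, 201]
# [11, 11, 201, 201]
# [50, 50, 93, 93]
# [50, 50, 93, 93]
```

After averaging, only one value per block has to be stored, but the original pixel values can no longer be recovered. A larger `block` saves more data and loses more detail.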

Another example is the MP3 format for audio data, which combines lossless and lossy compression. The lossless part consists of taking the difference between the stereo channels. A stereo recording normally uses two full channels, but the information in both channels is often very similar. To reduce the data, the second channel can be stored as its difference from the first. If the two channels are similar, this reduces the amount of data considerably, and the step itself is lossless.

The lossy part consists of removing parts of the sound that the human ear cannot perceive. In addition, tones at the limits of the audible range, which cannot be distinguished so clearly, can be masked by other tones, so that a single tone is stored instead of many different ones. The effect is barely noticeable to humans.
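The lossless stereo step described above, storing the second channel as its difference from the first, can be sketched as follows. This is a simplified illustration with made-up sample values and hypothetical function names; actual MP3 encoders use a more elaborate scheme.

```python
def encode_stereo(left, right):
    """Keep the first channel; store the second as a difference to it."""
    diff = [r - l for l, r in zip(left, right)]
    return left[:], diff

def decode_stereo(first, diff):
    """Rebuild both channels exactly from first channel and difference."""
    right = [l + d for l, d in zip(first, diff)]
    return first[:], right

left = [100, 120, 140, 135, 110]
right = [101, 119, 142, 135, 108]

first, diff = encode_stereo(left, right)
print(diff)  # [1, -1, 2, 0, -2]
print(decode_stereo(first, diff) == (left, right))  # True
```

When the channels are similar, the difference values are small numbers, which can be stored with fewer bits than the full second channel, and decoding still restores both channels exactly.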
