Medium2 marksStructured
Fundamentals of data representationGeneralcompressionRLErun length encoding

AQA GCSE · Question 17.2 · Fundamentals of data representation

Figure 2 shows a string.
Figure 2
MISSISSIPPI
One method for compressing data is run length encoding (RLE). When using RLE, the data in Figure 2 become:
1M 1I 2S 1I 2S 1I 2P 1I
Explain why RLE is not a suitable method for compressing the data in Figure 2.

How to approach this question

1. **Understand RLE:** Run Length Encoding works by replacing consecutive sequences ("runs") of the same character with a count and the character itself. For example, `AAAAA` becomes `5A`. 2. **Analyze the input string:** Look at "MISSISSIPPI". Are there any long runs of identical characters? The longest run is "SS" (twice). Most characters appear individually. 3. **Analyze the output:** The RLE output is `1M 1I 2S 1I 2S 1I 2P 1I`. 4. **Compare sizes:** Count the characters in the original (11). Count the characters in the compressed version (16, if you count each number and letter). 5. **Formulate the explanation:** Conclude that RLE is unsuitable because it has made the data larger, and explain that this is because the original string lacks long runs of repeating characters.

Full Answer

RLE is not suitable because the compressed data is longer than the original data. The original string "MISSISSIPPI" is 11 characters long. The RLE version "1M 1I 2S 1I 2S 1I 2P 1I" (ignoring spaces) is 16 characters long (8 pairs of count and character). Therefore, it has increased the file size instead of reducing it. This happens because RLE is only effective when there are long runs of repeating consecutive characters, which this string does not have.
Run Length Encoding (RLE) is a compression algorithm that works by finding "runs" of consecutive, identical data values and storing them as a single value and a count. For example, `WWWWWHH` would be compressed to `5W2H`. In the case of the string `MISSISSIPPI`: - The original string has 11 characters. - The RLE output is `1M 1I 2S 1I 2S 1I 2P 1I`. If we count the characters needed to store this (e.g., 8 numbers and 8 letters), we get 16 characters. The primary reason RLE is unsuitable here is that **the compressed data is larger than the original data**. This phenomenon, where a compression algorithm increases the data size, is known as expansion. It occurs with RLE when the data does not contain many consecutive runs of repeating characters. In "MISSISSIPPI", most characters are not followed by the same character, so representing them as `1M`, `1I` etc., takes two characters instead of one.

Common mistakes

✗ Just stating "it doesn't work" without explaining why. ✗ Incorrectly calculating the lengths of the strings. ✗ Confusing RLE with other compression methods like Huffman coding.

Practice the full AQA GCSE Computer Science Paper 2

46 questions · hints · full answers · grading

More questions from this exam