Run-Length Encoding
About the code
Introduction to the method
#include <stdio.h>
#include "main.h"
#include <string.h>
typedef unsigned char u8;
int main(int argc, char* argv[])
{
This is our string stored in an array. And we store its length in a variable:
u8 input[] = "AAAAAEEEFFFFAAAHHHHJJJJJHHHH";
int inputSize = strlen((char*)input);
We will store the encoded result in another array. In the worst case it will use 2 times more bytes.
u8 output[inputSize * 2];
int outputSize = 0;
These 2 lines will display the input string on the screen:
printf("input: %s\n", input);
printf("input size: %d\n", inputSize);
We will parse the input string characters one by one.
int runStart = 0;
int currentByte = 1;
Then we start the loop over each characters, and whe calculate the length of the current run.
while (true)
{
int runLength = currentByte - runStart;
If the current byte is different from the first of the run.
if (input[currentByte] != input[runStart] ||
currentByte == inputSize ||
runLength == 255)
{
... Then we write the run to the output array, and we start a new one.
output[outputSize++] = runLength;
output[outputSize++] = input[runStart];
runStart = currentByte;
}
If we hit the end of the input string, we exit the loop.
if (currentByte == inputSize)
break;
Else we increment the current byte index and we loop back
currentByte++;
}
Finally, we print the result:
printf("output: ");
for (int i = 0; i < outputSize; i += 2)
{
printf("(%d)", output[i]);
printf("%c ", output[i + 1]);
}
printf("\n");
printf("output size: %d\n", outputSize);
return 0;
}
Download source code
Download executable for Windows
As expected, the result of this program is:
input: AAAAAEEEFFFFAAAHHHHJJJJJHHHH
input size: 28
output: (5)A (3)E (4)F (3)A (4)H (5)J (4)H
output size: 14
Now this example works, but what happens if we try with a string like "AEFAHJHABGHOZ" where there is no repetition?
input: AEFAHJHABGHOZ
input size: 13
output: (1)A (1)E (1)F (1)A (1)H (1)J (1)H (1)A (1)B (1)G (1)H (1)O (1)Z
output size: 26
Download source code
Download executable for Windows
The output is twice as long as the input. This is not a very efficient compression...
Codes, counts, runs and strings
#include <stdio.h>
#include "main.h"
#include <string.h>
typedef unsigned char u8;
#define MIN_RUN 3
#define MAX_RUN 127
#define MIN_STRING 1
#define MAX_STRING 127
I have put the compression routine in a seperate function for clarity:
void compress(u8* input, int& inputSize, u8* output, int& outputSize)
{
As before, we will use an index for the start of the current run and for the current character.
int stringStart = 0;
int runStart = 0;
int currentByte = 1;
bool isInRun = false;
We begin the parsing loop.
while (true)
{
If we are in the "string" state...
if (isInRun == false)
{
// string mode
At first we will update the runStart index if we found a character that is not the same of the current run.
if (input[currentByte] != input[runStart])
runStart = currentByte;
Then I tried to write this code in the same way as the one we saw in the first chapter.
int stringLength = runStart - stringStart;
int runLength = currentByte + 1 - runStart;
Then we check the conditions to end the string:
if (currentByte == inputSize ||
runLength == MIN_RUN ||
stringLength == MAX_STRING)
{
If one of these conditions is fulfilled, we output the "string".
// emit string
if (stringLength >= MIN_STRING)
{
output[outputSize++] = stringLength;
for (int i = 0; i < stringLength; i++)
output[outputSize++] = input[stringStart++];
}
Then, only if the minimal run length condition was true, we switch to the "run" state.
// go to run mode
if (runLength == MIN_RUN)
isInRun = true;
}
}
The "run" state is nearly identical to the code in the first chapter. The main difference is that we add 128 to the
else
{
// run mode
stringStart = currentByte;
int runLength = currentByte - runStart;
if (input[currentByte] != input[runStart] ||
currentByte == inputSize ||
runLength == MAX_RUN)
{
// emit run
output[outputSize++] = 128 + runLength;
output[outputSize++] = input[runStart];
runStart = currentByte;
// go to string mode
isInRun = false;
}
}
Then the end of the loop is the same as what we saw in the first chapter.
if (currentByte == inputSize)
break;
currentByte++;
}
}
The main function only feeds the input string to the compression function and prints out the result, taking into
int main(int argc, char* argv[])
{
u8 input[] = "ABCDAAAABBCDDDDEEEEE";
int inputSize = strlen((char*)input);
u8 output[inputSize * 2];
int outputSize = 0;
printf("input: %s\n", input);
printf("input size: %d\n", inputSize);
compress(input, inputSize, output, outputSize);
printf("output: ");
int i = 0;
while (i != outputSize)
{
printf("(%d)", output[i]);
if (output[i] < 128)
{
int length = output[i++];
for (int j = 0; j < length; j++)
printf("%c", output[i++]);
}
else
{
i++;
printf("%c", output[i++]);
}
printf(" ");
}
printf("\n");
printf("output size: %d\n", outputSize);
return 0;
}
Download source code
Download executable for Windows
The result of this code shows us a real RLE encoding.
input: ABCDAAAABBCDDDDEEEEE
input size: 20
output: (4)ABCD (132)A (3)BBC (132)D (133)E
output size: 15
Small optimisation
#define MIN_RUN 3
#define MAX_RUN 127
#define MIN_STRING 1
#define MAX_STRING 127
The minimal length of a run is 3. So the "count" will never be 128, 129 or 130.
// emit run
output[outputSize++] = 128 + runLength - MIN_RUN;
Similarly we will substract MIN_STRING from the "string" count.
output[outputSize++] = stringLength - MIN_STRING;
And now the counts can have greater maximum values
#define MIN_RUN 3
#define MAX_RUN (127 + MIN_RUN)
#define MIN_STRING 1
#define MAX_STRING (127 + MIN_STRING)
Download source code
Download executable for Windows
We also had to make a small modification when we print out the result on the screen, but that's not the most
input: ABCDAAAABBCDDDDEEEEE
input size: 20
output: (3)ABCD (129)A (2)BBC (129)D (130)E
output size: 15
Notice the first code (3)ABCD means that we will copy (3 + 1) characters.
Uncompressing
void uncompress(u8* input, int& inputSize, u8* output, int& outputSize)
{
int i = 0;
outputSize = 0;
Then we will loop until we reach the end of the input:
while (i != inputSize)
{
The first byte we "read" is the counter. If it is less than 128, we have a string and we output it.
if (input[i] < 128)
{
int length = input[i++] + MIN_STRING;
for (int j = 0; j < length; j++)
output[outputSize++] = input[i++];
}
If the counter was greater or equal to 128, we have a run and we output it.
else
{
int length = input[i++] - 128 + MIN_RUN;
for (int j = 0; j < length; j++)
output[outputSize++] = input[i];
i++;
}
}
}
So we will simply call this function after we compressed our string and see what we get:
uncompress(output, outputSize, input, inputSize);
printf("uncomp: %s\n", input);
printf("uncompSize: %d\n", inputSize);
Download source code
Download executable for Windows
And here is the result of this program. As expected we find our starting string.
input: ABCDAAAABBCDDDDEEEEE
input size: 20
output: (3)ABCD (129)A (2)BBC (129)D (130)E
output size: 15
uncomp: ABCDAAAABBCDDDDEEEEE
uncompSize: 20
Compressing files
int fileGetSize(char* fileName)
{
int fileSize;
FILE* f = fopen(fileName, "rb");
fseek(f, 0, SEEK_END);
fileSize = ftell(f);
fclose(f);
return fileSize;
}
The fileLoad function allocates memory and loads the file into it.
int fileLoad(char* fileName, u8** mem)
{
int fileSize = fileGetSize(fileName);
*mem = (u8*)malloc(fileSize);
FILE* f = fopen(fileName, "rb");
fread(*mem, 1, fileSize, f);
fclose(f);
return fileSize;
}
The fileSave function will be used to save both a compressed or an uncompressed file.
void fileSave(char* fileName, u8* mem, int fileSize, int originalSize)
{
FILE* f = fopen(fileName, "wb");
if (originalSize != 0)
fwrite(&originalSize, sizeof(int), 1, f);
fwrite(mem, 1, fileSize, f);
fclose(f);
}
Now using these utility functions we can write the functions that will compress or uncompress a file.
void fileCompress(char* srcName, char* destName)
{
u8* input;
int inputSize;
u8* output;
int outputSize = 0;
inputSize = fileLoad(srcName, &input);
printf("original size: %d\n", inputSize);
output = (u8*)malloc(inputSize * 2);
compress(input, inputSize, output, outputSize);
fileSave(destName, output, outputSize, inputSize);
printf("compressed size: %d\n", outputSize);
free(input);
free(output);
}
Then the fileUncompress function.
void fileUncompress(char* srcName, char* destName)
{
u8* input;
int inputSize;
u8* output;
int outputSize = 0;
inputSize = fileLoad(srcName, &input);
outputSize = *((int*)input);
output = (u8*)malloc(outputSize);
inputSize -= sizeof(int);
uncompress(input + sizeof(int), inputSize, output, outputSize);
fileSave(destName, output, outputSize, 0);
printf("uncompressed size: %d\n", outputSize);
free(input);
free(output);
}
To test these functions I chose to compress a 16 colors uncompressed Targa image.
int main(int argc, char* argv[])
{
fileCompress("image.tga", "comp.rle");
fileUncompress("comp.rle", "uncomp.tga");
return 0;
}
Download source code
Download executable for Windows
And here is the output of this program.
original size: 459986
compressed size: 170209
uncompressed size: 459986
Variants