UNCOMMON SENSE
Welcome to my website, largely devoted to showcasing some of the algorithms I have designed. Presently, it features the following two coding methods namely,
To discover this form of coding, we adopt a practical, hands on, touch and feel approach first which paves the way for a more rigorous theoretical approach later. Interestingly, the practical solution to this problem is a common geometric pattern and the theoretical solution can be a geometric progression so if one chooses, one can call this form of coding as geometric coding.
The key idea behind this is, if we are trying to code a longer bit string as a shorter bit string, then we need to find a way of redistributing the weights of the various positions of the longer bit string among the various positions of the shorter bit string. Ideally, some of the desirable properties that we would like such a redistribution of weights to possess are,
Conservation of sum : The sum of the weights assigned to the bit positions of the shorter string must add up to the sum of the weights of the longer string.
Preservation of Order : For any given position in the shorter bit string, it's weight must be larger(or at least not less than) the sum of the weights of all the positions preceding it. If is the weight of the position of the shorter string and , then must be satisfied.
Optimality : The growth of these weights is fairly optimal. This means, as decreases, even though decreases yet, increases and does so fairly optimally.
Minimality : The least weight is not too large.
We find the following different triangular distributions of weights, which map a bit string of length to a bit string of length possess most of the above characteristics.
The weights of the successive positions of the longer bit string are clearly . For the sake of convenience we can use the subscript to denote the weight .
Let us begin by examining the various possible distributions for the simple case where . We start off with the vertical distribution which is as follows,
14 
13 
11 
8 
4 

12 
10 
7 
3 


9 
6 
2 



5 
1 




0 
Bear in mind that the weights for the five individual positions of the shorter string are the sum of the weights of the five respective columns of the above triangle.
But this distribution is not too promising. So, next we try a horizontal distribution which is from right to left as follows,
4 
3 
2 
1 
0 
8 
7 
6 
5 

11 
10 
9 


13 
12 



14 




This is certainly better, but we find that moving from left to right is the most ideal and the best as follows,
0 
1 
2 
3 
4 
5 
6 
7 
8 

9 
10 
11 


12 
13 



14 




which is also the same as the diagonal distribution as follows,
14 
13 
11 
8 
4 
12 
10 
7 
3 

9 
6 
2 


5 
1 



0 




In practise, if we wish to code a number which is less than , we first distribute as weights by succesive halving among its various bit positions. This length will never be of the form . So we could try extending or reducing, whichever is more closer and hence more suitable. For instance consider reducing a bit string of arbitrary length . We first determine the largest such that . Then we take the weights of the leading positions of the bit string and distribute them in the chosen manner. Thereafter, we distribute the remaining weights of the trailing bit positions again in some(possibly same or possibly different) chosen manner, either among a few of the lower rungs of the triangle or over all of it's rungs whichever is possible. Extending needs no elaboration.
Other possibilities conceivable are, increasing the vertical step which may not be all that helpful on it's own, or increasing the horizontal step which is reasonable, or optimally doing both. For example the following is horizontal left to right distribution of the weights with a vertical step of two,
0 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 

14 
15 
16 
17 

18 
19 
20 


21 
22 
23 


24 
25 



26 
27 



28 




29 




Although this increases the compression ratio, it also increases the ratio considerably which is not so promising on it's own.
Increasing the horizontal step performs better. Consider a left to right horizontal distribution with horizontal step two of the weights 19, 18, 17, ......, 3, 2, 1, 0 .
We first choose alternative weights and form two triangles as below,
0 
2 
4 
6 
8 
10 
12 

14 
16 


18 



and,
1 
3 
5 
7 
9 
11 
13 

15 
17 


19 



Next, we merge the two triangles by laying the columns alternatively side by side as follows,
1 
0 
3 
2 
5 
4 
7 
6 
9 
8 
11 
10 
13 
12 


15 
14 
17 
16 




19 
18 






As is clearly evidenced, increasing the horizontal step decreases the compression ratio but also decreases the ratio. So, one could try striking a balance.
Again, this right triangle pattern need not be the only one. One can lay two right triangles back to back to form the shape of an isosceles triangle. If the horizontal steps of both those triangles are different this will result in a shape that resembles a scalene triangle.
However, this is a one time fixed redistribution of the weights of the longer string alone. To achieve compression we need to repeat the process as follows,
Suppose has been distributed among bit positions in a triangular fashion and a number has been coded as and the rightmost in that code occurs at bit position , we retain the bit code for the leading bits of , note , discard the trailing of the code and repeat the triangular coding for the new value of which is now exactly for the corresponding new value of which is now where here of course is the complete code with the trailing included.
Now we attempt to adopt a more theoretical approach. If is distributed among bits then we require the following conditions to be satisfied,
, assuming a geometric progression for the ratio's involving the succesive weights.
We need to determine the optimal value of for given value's of and . Working upwards we have,
Similarly we need,
Likewise,
We gather that the factor is common for all and the other factor is a series involving and since , for the largest weight , this series can be approximated by it's most dominant or largest term. So, we can say .This means when we sum up all the there will be a minimal constant such that .Now by mere observation of the terms , we can see that taking the upper bound of to be will be safe and reasonably close to optimal. Besides, the inequality can be proven indirectly by Mathematical Induction. Hence, we need to determine such that,
need not be too small although it can be as it is in the case of the practical approach. A smaller value of will favour coding of smaller numbers.
Reason tells us that this form of coding can only be a special case of a more general form of coding for any arbitrary base not less than 2, and that increasing the base should yield better compression.
The following familiar observations about the growth of numbers as a result of performing the common arithmetic operations of addition, multiplication and exponentiation on them is the motivation behind discovering this form of coding. These observation are pretty plain but I list them below nonetheless.
Addition increments lengths. If we add two numbers of length bits each, the resulting sum will be a number of length not more than bits.
Multiplication adds lengths. If we multiply two numbers of length bits each, the resulting product will be a number of length not more than bits.
Exponentiation multiplies lengths by powers of 2. If we raise to the power two numbers of length bits each, the resulting power will be a number of length no more than bits.
This form of coding replaces the more expensive operation of exponentiation by the less expensive operations of multiplication and addition. The idea for this is very simple but does take a flash of inspiration and a bit of imagination on one's part. Loosely speaking, this form of coding can be called guess work coding.
Suppose we wish to code a number , we first choose a base (arbitrary for now, optimal later). Clearly, we will have a length such that,
This means there will exist largest digits such that,
We bear in mind that is always and not .
Let be one of the above digits for which is the closest value not more than . In all, there are digits so can be coded as a value which will be the base of the code. Now there will exist some digit along with some radix such that is the smallest value larger than . So the new value of will now simply be and the new value of will be . Plainly speaking, this code is simply based on the near identity,
An Improvement
By enlarging the base of the code for we should get much better compression. In general each of the terms can be replaced by one of the values in the range over where . This increases the base of the code to , but should offer way much better compression.