Choreonumeric Entropic Data Reduction

In May 1972, Schalkwijk, Antonio, Petry published an article on variable-to-fixed-length source encoding data-reduction,- by my design. [Proceedings of the Fifth Hawaii International Conference on System Sciences, pg. 498] Ca 1975, I proposed an optimization and generalization to my [*UCSD] algorithm

REVIEW:

A simple algorithm approximating arithmetic coding (multiplicative fractionating) by selective index cumulation:

[entropic progress encoding] The advance of signal-information entropy (-log p_k) in a sequence of signals p, probability p, taken of a repertoire, can be plotted as a tree of branches of various lengths to nodes of more or less comparable progress -coinciding most commonly if the p have coincident powers- in an array of multiplicative terms p₁^i₁·p₂^i₂·p₃... (or logarithm thereof, a more information-obvious representation of the cumulating entropy) ... if all-but-one p are small the practical tree is a one-branch run-length code; In our run-skip-length codes it is trimmed triangular.

In our coding trellis (a tree reconnected at coincident progress) [ibid] this was represented as counts of paths-out at each node:

Eg. for the case of p,q where q¹=p², drawing several views of the same triangle:

pºqº p¹qº p²qº p³qº p⁴q° 1.000 .618 .382 .236 .146  8 5 3 2 1  8
pºq¹ p¹q¹ p²q¹(p³q¹)     0.382 .236 .146(.090)      3 2 1 ¹   3 5  (3 via q + 5 via p)
pºq²(p¹q²)               0.146(.090)                1 ¹      1 2 3
                                                              ¹ 1 2  (¹,1, are 0-info)
[NB The bottom-edge steps can be omitted by boundary-checking]   ¹ 1

The coding/decoding procedure selectively cumulates/decumulates the path-counts skipped stepwise (depending on which step path is taken),- thus partitioning the total path-count (which should fill a power of two for efficiency in binary signalling) and distinguishing each path uniquely, efficiently. Eg. code value 6 of 0-7 here represents the path q-p-q (skip 5 p + take 2 p + skip 1 p = 6 p skipped).

... which summarily reconstitutes our article of May 1972.

The triangular form obviously extended to multidimensional alphabets of p_n, planar-faceted, hypercube corners, and to sources with hysteresis, pre and post memory, cusped long-tail triangles where p-probabilities tend to favorably remember their usage statistics, prompt-adaptive, which get at the run-lengths when those occur (*) ... but specializations though obvious were not articled there.

* (A prompt-adaptive algorithm needs the probabilities of the probabilities.)

THE NEW VIEW:

Linearization of the multidimension array, and generalization of algorithmic progress:

Obviously in the regular-edge triangle case the shorter row-runs of probability terms or our path-counts can be compactly stored as end-runs of the longer rows (eg. 3-2-1 in 8-5-3-2-1): a fold-up. And similarly at higher dimensions, spreads stored as end-fans ... etc.

But by interpretation this is a linear hopscotch of paths representing run-skip-length progress on an information-entropy -log- plot, with distinct nodes regrouped with near nodes as integral steps (a slight quantization loss). This interpretation generalizes, no longer requiring strictly regular skips nor particular shape but overall entropy progress: At each step is a vector of the path-out count and skip-length (plural for higher dimensions; shortest first). And the technique is applicable to anisotonic memory-chained stochastics.

PROCESS SUMMARY, EXAMPLE:

A coding pass is a linear choreograph hopscotch of steps-taken, as indicated by the total coding sum; a uniquely, lossless, compressed message. Each step is a quantum approximation-estimation, with direction 'arrows' to its immediate subsequents; (frequency-weighted quantization error being its loss of data-compression efficiency).

Choreonumeration simply counts any course across the 'floor'. The algorithm is essentially fast linear run-step/skip ... for any signal repertoire, spelling alphabet, or sampling domain. Furthermore, steps through entropy space need not be regular,- though that was our introduction: This algorithm is not, constrained to memoryless codes.

Where needed, path-lengths can be resolved to fractional steps of entropy on parallel paths until remerged at whole common steps, as essentially integerized logarithmically implemented arithmetic coding. Even the final whole step can be fractionally trimmed by having the next coding begin an alternative hopscotch (in both sender and receiver) assigned and tuned to effect probability concatenation.

(The resulting data structures are fast additive rather than multiplicative, and can be simply implemented by simple machines.)

LONGER ARRAYS, EFFICIENTLY:

As choreonumeration is an approximation to arithmetic coding, its subcounts need not be compacted integer but "hull"-measure, each bounding its sum, to keep path enumerations unique and stepping efficient albeit not exactly all utilized. This allows floating-integer format, where the scale is about the number of digits to process and the fraction approximates the stepped coding precision keeping the fill tight. Scale can be assumed along the way with cumulative tweaking; and long run-arrays can thus be utilized very compactly. For the process-summary example above, the array extended leftward,--

... 978793 532159 289329 157305 85525 46499 25281 13745 7473 4063 2209 1201 653 355 193 105 57 31 17 9 5 3
--might be compacted (in decimal-based radix for quick illustration) to:--
... 995000 541000 294000 160000 86600 47100 25600 13900 7520 4090 2220 1210 653 355 193 105 57 31 17 9 5 3
= 3:995 3:541 3:294 3:160 2:866 2:471 2:256 2:139 1:752 1:409 1:222 1:121 0:653 0:355 0:193 0:105 0:57 0:31 0:17 0:9 0:5 0:3
--which can be futher compacted by recording cumulative tweaks, to:--
= [4] [-1] 995 541 294 160 [-1] 866 471 256 139 [-1] 752 409 222 121 [-1] 653 355 193 105 [-1] 570 310 170 [-1] 900 500 300

It is conceivable also, from this example, that an efficient run-skip-hop array can contain the entire cycle (1000-long here), and be reused cyclically (thereby more effectually) ... Thus again indicating this process is simply arithmetic coding done by array-lookup. And at this point it may be worth noting that the compact cyclic array can be built from top-down rather than bottom up; or both; And in this case, the array extended rightward might be:-- [under construction momentarily]

= [4] [-1] 995 541 294 160 [-1] 868 472 257 139 [-1] 756 412 222 122 [-1] 660 355 201 107 [-1] 580 315 171 [-1] 932 507 276

FINAL COMMENTS; More to do; More mathematics; More under construction:

1. This algorithm approaches fixed-length-to-variable coding, Huffman or a variant, as triangles become shorter and the various tails concatentate onto alternative short triangles (the -group- not the triangles themselves being closed under the concatenation process).

2. One more insight I considered at the time believing we were probably implementing arithmetic coding despite a given necessity to examine an algorithm [*1972], was to track the interval with cumulations of log₂ p,q to many places: bounding and sending bits in the clear not straddling. A way to prevent straddling appeared to involve flipflop the code domain so that the smaller q always decoded, and the larger p was always delayed, until the end-flush. A priori the algorithm looked like it may have needed to stack intermediate cumulations. [under construction]

* [Our original undergraduate work came about because arithmetic coding by partitioning the unit probability interval was computationally expensive in floating point multiplications and Dr. Schalkwijk's designed savings using combinatoric integer factoring of n!/k!n-k! effecting a rectangular section of Pascal's Triangle approximation to a probability-coefficients triangle was the subject of our group study;- but noticing that his procedure was equivalent to index-cumulation in a trellis, I cut his rectangle into a triangle and applied Pascal's technique from '1's on the jagged hypotenuse:- a small efficient memory eliminating multiplications altogether (selective-cumulation itself constitutes a model-adapted multiplication). I left multidimensional adaptation as obvious; and later simplified the process by folding up a regular-edge triangle, ca 1975;- and thence simplified by choreonumerating altogether. Recently I added mention of floating-integer formatting]

A premise discovery under the title,