movergence compressed video

High-powered video compression can achieve picture-phone ideals directly over extant low-bandwidth digital telephony circuits. We examine an implementable, near-ocular model of vision.

[REF: the tele-phane/vid-link distance conferencing model]

The ideal video display (not too presumptuously estimated) consists of 3-D objects set in foreground and background scenes, moving horizontally and vertically, approaching and receding, turning freely, and exhibiting (self-similar) sub-object sub-motion: e.g. a speaker's face turning left and right, lips moving, eyes glancing and blinking, against an indoor background of people milling about.

Ideal image recognition and video compression will likewise objectify the speaker, the people, and the stationary objects in the room, and reconstruct the image at the receiver, having sent only (minimal) continual motion and change information. That information is dominated by movements and convergences, together called movergence.

A rudimentary object cognizer (as discussed here) detects image-local translatory motions with their directions and rates, compares them for local convergences, and sends regular update reports (signals). The receiver retains the same information, so it can reconstitute objects and make them reappear as they turn away and back into view.
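The text does not say how the cognizer finds local motions or convergences. One conventional way, swapped in here as an assumption, is exhaustive block matching (minimum sum of absolute differences) to get one translation per block, followed by the divergence of that block motion field as a crude convergence measure. The function names `block_motion` and `local_divergence`, the block size, and the search range are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]    # grayscale intensities, indexed frame[y][x]
Vector = Tuple[int, int]   # (dx, dy): displacement of content from prev to curr


def block_motion(prev: Frame, curr: Frame, block: int = 4,
                 search: int = 2) -> List[List[Vector]]:
    """Estimate one translation per block of `curr` by exhaustive block
    matching against `prev`, minimising the sum of absolute differences."""
    h, w = len(curr), len(curr[0])
    # Try small displacements first so ties resolve toward "no motion".
    candidates = sorted(((dx, dy) for dy in range(-search, search + 1)
                         for dx in range(-search, search + 1)),
                        key=lambda v: abs(v[0]) + abs(v[1]))
    field: List[List[Vector]] = []
    for by in range(0, h - block + 1, block):
        row: List[Vector] = []
        for bx in range(0, w - block + 1, block):
            best, best_sad = (0, 0), None
            for dx, dy in candidates:
                sy, sx = by - dy, bx - dx   # where the block sat in prev
                if sy < 0 or sx < 0 or sy + block > h or sx + block > w:
                    continue                # candidate falls off the frame
                sad = sum(abs(curr[by + j][bx + i] - prev[sy + j][sx + i])
                          for j in range(block) for i in range(block))
                if best_sad is None or sad < best_sad:
                    best, best_sad = (dx, dy), sad
            row.append(best)
        field.append(row)
    return field


def local_divergence(field: List[List[Vector]], bx: int, by: int) -> float:
    """Central-difference divergence of the block motion field at an interior
    block. Negative: neighboring motions converge (e.g. an object receding);
    positive: they diverge (e.g. an object approaching)."""
    du_dx = (field[by][bx + 1][0] - field[by][bx - 1][0]) / 2
    dv_dy = (field[by + 1][bx][1] - field[by - 1][bx][1]) / 2
    return du_dx + dv_dy
```

A real cognizer would also group blocks with similar vectors into objects and track them over time; this sketch stops at per-block motion and a local convergence number.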

A typical medium-resolution [500x500=250K] digital color camera for the tele-phane/vid-link application, presenting 4 individuals plus diagrams in conference on a standard high-resolution display of about 2K x 1K (=2M) pixels divided into 4+4 (=8) sub-frames of 500x500 (=250K) pixels each, captures moving imagery rapidly [typically 30-60 frames/sec] for picture clarity and motion estimation. From this, a full image must be built up to good resolution in a fraction of a second, and then maintained and sharpened. The sender and the receiver must also identify overlapping portions (typically one partially transparent) moving at distinct rates. And individual pixels are in good color, typically 8-bit [4+2+2]. This data bandwidth far exceeds the 56 kbit/s typical of modem telephony. Pre-loading images, however, is facilitated by the image-caching model in the tele-phane/vid-link design, which speeds up the initial view; subsequent viewing resolution is maintained by the cache on the receiver side.
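The figures above can be checked with a little arithmetic. This sketch assumes the "56K" modem rate means 56 kbit/s, that "8-bit" color means 8 bits per pixel in total, and it takes the low end (30 frames/sec) of the quoted range; the constant and function names are hypothetical.

```python
DISPLAY_W, DISPLAY_H = 2048, 1024   # "about 2K x 1K" display
SUBFRAMES = 8                       # 4 + 4 sub-frames
SUB_W, SUB_H = 500, 500             # medium-resolution camera image
BITS_PER_PIXEL = 8                  # 8-bit color, e.g. [4+2+2]
FPS = 30                            # low end of the 30-60 frames/sec range
MODEM_BPS = 56_000                  # 56K modem, read as 56 kbit/s


def raw_bitrate(w: int = SUB_W, h: int = SUB_H, bpp: int = BITS_PER_PIXEL,
                fps: int = FPS, n: int = SUBFRAMES) -> int:
    """Uncompressed bits per second for n sub-frames of w x h pixels."""
    return w * h * bpp * fps * n


pixels_per_subframe = DISPLAY_W * DISPLAY_H // SUBFRAMES         # 262,144
seconds_per_image = SUB_W * SUB_H * BITS_PER_PIXEL / MODEM_BPS   # about 35.7
shortfall = raw_bitrate() / MODEM_BPS                            # about 8,571

print(pixels_per_subframe, round(seconds_per_image, 1), raw_bitrate(), round(shortfall))
```

The results, roughly 36 seconds per raw image and a shortfall near 8,600x against 480 Mbit/s of raw video, are consistent with the round figures of 40 seconds and 10K used in this article.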

THEORY [compressing image information]

Image pixels are sent sparsely (similar to video-scan interlacing), with subsequent transmissions filling in the interstitial pixels until the whole image has been sent. Sent directly, pixel by pixel, one medium-resolution color image would take about 40 seconds at 56 kbit/s, while telephony needs 30 images per second for 8 pictures: a speed-up by a factor of about 10K.
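The article does not fix a particular sparse-to-dense ordering. A minimal sketch, assuming a scheme in the spirit of PNG's Adam7 interlacing (a swapped-in technique), sends every 8th pixel in both directions first and then halves the spacing on each pass, sending only the interstitial pixels not yet sent. The generator name `interleaved_order` and the default starting spacing of 8 are hypothetical.

```python
from typing import Iterator, Tuple


def interleaved_order(width: int, height: int,
                      start_step: int = 8) -> Iterator[Tuple[int, int]]:
    """Yield every (x, y) exactly once, coarse to fine.

    Pass 1 sends every start_step-th pixel in both directions; each later
    pass halves the spacing and sends only pixels not already sent.
    start_step must be a power of two."""
    if start_step < 1 or start_step & (start_step - 1):
        raise ValueError("start_step must be a power of two")
    coarser = 0                 # spacing of the previous pass (0 = none yet)
    step = start_step
    while step >= 1:
        for y in range(0, height, step):
            for x in range(0, width, step):
                if coarser and x % coarser == 0 and y % coarser == 0:
                    continue    # already sent in an earlier, coarser pass
                yield (x, y)
        coarser, step = step, step // 2
```

For a 500x500 sub-frame the first pass is 63 x 63 = 3,969 pixels, about 32 kbit at 8 bits per pixel, or a little over half a second at 56 kbit/s; the receiver can fill each unsent position from its nearest sent neighbor until later passes arrive.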

Instead, each pixel is sent with its spatio-temporal (positional) offset relative to its placement in the preceding frame. This requires a high-powered image-processing engine, but it lets the pixels be retained for subsequent reuse, even while momentarily hidden, until they expire through refreshment or replacement.
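A minimal receiver-side sketch of this retention scheme, assuming each pixel carries a sender-assigned id: the color crosses the link once, later frames send only offsets, and a hidden pixel is kept until it is refreshed, replaced, or has stayed hidden past a limit. The class `PixelCache`, its methods, and the `max_hidden_frames` rule are hypothetical; the text names refreshment and replacement but no time limit.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CachedPixel:
    x: int
    y: int
    color: int       # packed color value, e.g. an 8-bit code
    visible: bool
    last_seen: int   # frame number of the last update


class PixelCache:
    """Receiver-side store: pixels are sent once, then moved by offsets."""

    def __init__(self, max_hidden_frames: int = 90):
        self.pixels: Dict[int, CachedPixel] = {}
        self.max_hidden_frames = max_hidden_frames

    def send(self, pid: int, x: int, y: int, color: int, frame: int) -> None:
        # New or refreshed pixel: replaces any earlier entry with this id.
        self.pixels[pid] = CachedPixel(x, y, color, True, frame)

    def move(self, pid: int, dx: int, dy: int, frame: int,
             visible: bool = True) -> None:
        # Only the offset (and visibility) crosses the link; color is reused.
        p = self.pixels[pid]
        p.x += dx
        p.y += dy
        p.visible = visible
        p.last_seen = frame

    def expire(self, frame: int) -> int:
        # Drop pixels that have stayed hidden longer than the limit.
        stale = [pid for pid, p in self.pixels.items()
                 if not p.visible and frame - p.last_seen > self.max_hidden_frames]
        for pid in stale:
            del self.pixels[pid]
        return len(stale)

    def render(self, width: int, height: int, background: int = 0) -> List[List[int]]:
        # Paint visible cached pixels at their current positions.
        img = [[background] * width for _ in range(height)]
        for p in self.pixels.values():
            if p.visible and 0 <= p.x < width and 0 <= p.y < height:
                img[p.y][p.x] = p.color
        return img
```

In use, a pixel sent once at frame 0 can be moved, hidden, and shown again at a new position without its color being resent, as long as it reappears before `expire` removes it.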

Thus a typical connection can send a simple, full-color, good-resolution image within the first second of connection, and then move and update it continually, running about 1 sec behind live.

VISIBLE RESOLUTION

Stationary objects require the highest resolution. Very slow linear motion still requires high resolution, but faster motion needs less, since the object moves against an interesting background and quickly out of range. Accelerating objects need much less resolution, as they are neither readily anticipated nor followed, though at slow speeds simple linear motion tracks them sufficiently. Turning objects require fairly high resolution, as the turn stays within the focal span and is anticipated from visual practice. Turning motion is non-linear, non-simple acceleration, labelled vergence: edges disappear and reappear.
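The ordering above can be written as a simple decision rule for how much resolution to spend on a region. This is a sketch only: the function name `resolution_scale`, the speed and acceleration thresholds, and the returned fractions are all hypothetical; only their relative order follows the text.

```python
def resolution_scale(speed: float, accel: float, turning: bool,
                     slow_speed: float = 1.0, accel_limit: float = 0.5) -> float:
    """Fraction of full resolution to spend on a region.

    speed is in pixels/frame and accel in pixels/frame^2. Thresholds and
    fractions are hypothetical; only their ordering follows the text."""
    if turning:
        return 0.75   # turns stay in the focal span: fairly high resolution
    if speed == 0 and accel == 0:
        return 1.0    # stationary: highest resolution
    if speed < slow_speed:
        return 0.9    # very slow, tracked as linear motion: still high
    if accel > accel_limit:
        return 0.25   # accelerating: hard to anticipate, much less resolution
    return 0.5        # faster linear motion: less resolution
```

Slow accelerating regions deliberately fall into the "very slow" case, matching the remark that simple linear motion tracks them sufficiently at slow speeds.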

The simplest image reduction for telephony would be to capture a complete image of the speaker's face, estimate the shape of the head, and map the face image onto the head, letting a cartoon script [JAVA] match the face motions to the audio channel.
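One crude way such a script might drive the mouth, sketched here in Python rather than Java and offered only as an assumption, is to turn the loudness of each audio slice into a mouth-opening value for the matching video frame. The function name `mouth_openness`, the 8 kHz sample rate (the usual telephony rate), and the `full_open_rms` level are hypothetical choices, not the design described above.

```python
import math
from typing import List, Sequence


def mouth_openness(samples: Sequence[float], sample_rate: int = 8000,
                   fps: int = 30, full_open_rms: float = 0.3) -> List[float]:
    """One mouth-opening value in [0, 1] per video frame, from audio RMS.

    samples are normalised to [-1, 1]; full_open_rms is a hypothetical
    loudness at which the cartoon mouth is drawn fully open."""
    per_frame = sample_rate // fps          # audio samples per video frame
    out: List[float] = []
    for start in range(0, len(samples) - per_frame + 1, per_frame):
        chunk = samples[start:start + per_frame]
        rms = math.sqrt(sum(s * s for s in chunk) / per_frame)
        out.append(min(1.0, rms / full_open_rms))
    return out
```

Loudness alone cannot distinguish vowel shapes, so a fuller script would need phoneme cues; this only opens and closes the mouth in step with speech.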

[under construction]

. . .

Grand-Admiral Petry
'Majestic Service in a Solar System'
Nuclear Emergency Management

© 2001 GrandAdmiralPetry@Lanthus.net