One of the technical challenges in the VR/voice-recognition vocoder and music audio compression industry is the cost of detecting pitch and computing harmonic content and formant identities. We examine a lower-cost mathematical solution.
The audio (listening) spectrum is logarithmic---the steps in a tempered music "octave" (cf. piano key half-steps) progress by a multiplicative factor of 2^(1/12) ≈ 1.06 (more exactly 1.0595, but both are more precise than noticeable over the 6-7 octave range of music). [Nom. "octave" is a closed range by a factor of 2x]
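As a minimal sketch of that arithmetic (the function name `octave_steps` and the 440 cps starting pitch are illustrative assumptions, not part of the method itself), the tempered half-step ratio and one closed octave of frequencies can be computed as follows:

```python
# Tempered half-step ratio: twelve equal multiplicative steps per octave.
SEMITONE = 2 ** (1 / 12)  # about 1.0595

def octave_steps(base_hz, steps_per_octave=12):
    """Return one octave of frequencies, both endpoints included (closed range)."""
    ratio = 2 ** (1 / steps_per_octave)
    return [base_hz * ratio ** k for k in range(steps_per_octave + 1)]

# Hypothetical usage: the octave starting at A-440.
freqs = octave_steps(440.0)
print(round(SEMITONE, 4), len(freqs), round(freqs[1], 2), round(freqs[-1], 2))
```

The list has thirteen entries because the octave is taken as a closed range: the last entry is the first multiplied by 2.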
It is usually estimated that music does not need finer definition---indeed, adjacent mid-range audio tones "beat" rather than separate distinctly (e.g. A-440 beats with A#-466 at 26 cps: within the phoneme rate)---and harmonic correlations reveal more distinguishability than dissonant beating does (but slow beating is sometimes a musically desirable undulation, like the organ "Leslie"). For utility, quarter-tones, logarithmically 1.03x apart, are deemed indistinguishable beyond acute tuning requirements.
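The beating of two adjacent tones can be checked numerically. The sketch below assumes two equal-amplitude sine tones (the helper names are hypothetical, and the A# pitch is rounded to 466 cps as in the text). It relies on the standard sum-to-product identity: the pair is equivalent to a tone at the mean frequency whose envelope is a cosine at half the difference frequency, so the loudness swells at the full difference, 26 times per second.

```python
import math

def beat_rate(f1_hz, f2_hz):
    """Audible beat rate of two tones: the difference of their frequencies."""
    return abs(f1_hz - f2_hz)

def two_tone(f1_hz, f2_hz, t):
    """Plain sum of two equal-amplitude sine tones at time t (seconds)."""
    return math.sin(2 * math.pi * f1_hz * t) + math.sin(2 * math.pi * f2_hz * t)

def product_form(f1_hz, f2_hz, t):
    """Same signal written as a mean-frequency tone times a slow envelope.

    The envelope cosine runs at half the difference frequency; its magnitude
    peaks twice per envelope cycle, giving beats at the full difference.
    """
    envelope = 2 * math.cos(math.pi * (f1_hz - f2_hz) * t)
    carrier = math.sin(math.pi * (f1_hz + f2_hz) * t)
    return envelope * carrier

print(beat_rate(440.0, 466.0))
```

The two forms agree sample for sample, which shows how a purely spectral description of the pair can surface as a modulation at half the beat frequency.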
It is however only slightly discriminating: Average amplitude is 50% about 1.5x/ frequency. And detection-points are harmonic-sensitive: High frequencies have repetitious nulls at lower sub-harmonic frequencies---but various further techniques can reduce these: E.g. add back the upper harmonics to their sub-harmonics' pre-outputs; or include simple parameters that adjust the reflection in the delay line (the low-frequency detectors are at multiplied wavelengths of the higher, and so are desensitized in multiplied proportion by their slight smearing---but such smearing must increase successively for lower frequencies); or filter using octaval filters (simply efficient with single-bit multiply: power-of-2 fast-shifting and single addition).
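One common way to build a filter whose only multiply is a power-of-2 shift is a single-pole integer low-pass. Whether this is the exact octaval filter intended here is an assumption, and the names below are hypothetical. In this sketch each additional bit of shift roughly halves the cutoff (for larger shifts), which suits filters spaced an octave apart.

```python
def shift_lowpass(samples, shift):
    """Single-pole low-pass on integer samples using only subtract, shift, add.

    Equivalent to y += (x - y) * 2**-shift, with the multiply replaced by an
    arithmetic right shift (Python's >> floors, including for negative values).
    """
    y = 0
    out = []
    for x in samples:
        y += (x - y) >> shift
        out.append(y)
    return out

# Hypothetical usage: response to a constant input of 1024 with a 3-bit shift.
step = shift_lowpass([1024] * 200, 3)
print(step[0], step[-1])
```

Because the shift discards the low bits, the output settles within 2**shift - 1 counts of the input rather than exactly on it; that residue is the price of skipping a true multiply.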
Or, both distinctness and harmonic reduction can be handled by accumulating more "taps" along the sample line: About 16 taps, with the signal over-sampled about 8x, keep the filtering efficient within 3% amplitude and 3% aurum frequency spacings.
And this is efficiently applicable to logarithmic scales where frequency (sub)steps are ordered by a (constant) ratio: For example, the audio discriminator receives 5% of its 3%-shy adjacent neighbor frequencies (a 95%-null) at half the ordinary music-pitched half-steps (half of 5.95%) -- needing a dozen cycles to resolve it. Discriminating has significantly faster sensitivity than ordinary transform detectors, but does take a longer dwell to resolve it against the noisy average total. Also, by slightly smearing the reflection to reduce sub-harmonic detection, the high-end frequency spectrum is also smeared, reducing its nulling depth -- requiring longer detection dwell, but overall detecting a thick (sub)step of close frequencies (which ordinary transform discriminators overly detail at the high end).
The QNL method, made simplistic in digital, tracks a key band of frequencies (which can be a full key step 1.06x rather than half-step) and finds the best tracking, yielding that amplitude---it also tends to find the central frequency when compounded signals are presented at the input, thus resulting in proper "audio beating", replacing the common Fourier/Hilbert/Maclaurin/Laplace transforms' pure-spectral analyses, which significantly lack the temporal responsivity component and so misrepresent "beating" as modulation (typically half the beat-frequency).
The QNL tends to be insensitive to integral harmonics, and if computed peak-to-peak cyclic, insensitive to subharmonics. Thus it is ideal for power-aurum processing. Further, whatever harmonics do get through owing to the use of squarish sampling (in the simplest implementation) are further reduced by statistical dither in the time base as the QNL tracks.
A premise discovery under the title,