computer processor theory history notes

Before these things disappear utterly from standard use (today they are typically subcircuits, one of a thousand inside a larger computer, itself within a distributed-processing network thereof)...

THE EARLY CONCEPT: CPU Central Processing Unit, today's PU/LPU/SPU Local Sub-Processing Unit, not necessarily central:

ALU/AU/ALFU/FCU/NLU Arithmetic Numeric Logic Function Calculation Unit; also digit-manipulation (bit, binary; trit, trinary); numbers in sub-process format (unnormalized-carry), basic arithmetic, fast-as-possible, hardware-efficient, chip-realestate-efficient;
REGISTERS: first-concept memory, fast-as-possible vs. convenient-as-possible; machine state, flags, tags, numbers in fixed-width sub-process format, (typ. 4-32: operator, operand, result, instruction-program-address, data-pointer/index), typ. interchangeable, multi-register extended-precision; loaded from instruction, via ALU, via other register, loaded-saved via memory, via IOP, vs. same-as-lowest-memory (typ. the first 4-32 multibyte-'words' thereof);
CENTRAL CONTROL AND INSTRUCTION DECODE: interpretation of instructions to hardware-action, primarily linear-sequential-program memory addressing with conditional secondary address jumping;
MEMORY: bulk-large memory, slower than registers; programs, data, data-arrays; numbers in compact format, tagged, parity-detection, error-correction, protected areas, segment-paging, hot-paging, other-process buffering, alteration-change-save-monitoring; short-range 'relative'-addressing vs. large-field 'absolute'-addressing;
IOP Input-Output Processor: slower peripheral interface processes, box-internal or external, virtual-extended memory, data-storage (e.g. disks), other fast-processors;
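A minimal sketch of the control-and-decode idea above, assuming a hypothetical four-instruction machine (the opcode names LOAD, ADD, JNZ, HALT, the 16-bit width and the tuple instruction layout are illustrative assumptions, not taken from any real processor). It shows linear-sequential program addressing with one conditional secondary jump:

```python
# Hypothetical 4-instruction machine; opcodes and layout are assumptions for illustration.
LOAD, ADD, JNZ, HALT = range(4)

def run(program, max_steps=1000):
    regs = [0] * 4                  # small interchangeable register file
    pc = 0                          # instruction-program-address register
    for _ in range(max_steps):
        op, a, b = program[pc]      # fetch
        pc += 1                     # default: linear-sequential addressing
        if op == LOAD:              # decode and execute
            regs[a] = b             # register loaded from the instruction literal
        elif op == ADD:
            regs[a] = (regs[a] + b) & 0xFFFF   # fixed-width 16-bit wraparound
        elif op == JNZ:
            if regs[a] != 0:
                pc = b              # conditional secondary address jump
        elif op == HALT:
            return regs
    raise RuntimeError("step limit reached")

# Count register 0 down from 3, adding 5 to register 1 on each pass.
countdown = [
    (LOAD, 0, 3),
    (LOAD, 1, 0),
    (ADD, 1, 5),
    (ADD, 0, 0xFFFF),   # add -1 in 16-bit two's complement
    (JNZ, 0, 2),
    (HALT, 0, 0),
]
```

Running run(countdown) leaves register 0 at zero and register 1 holding the accumulated sum; the machine state here is just the program counter plus the four registers.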
MORE DETAIL:
Finite State Machine: usually referring only to Program Location and Register values, IOP likewise; allowed to run ad infinitum (or a finitely large approximation thereto); (theory-technically all data values comprise the machine state);
Interrupt structure: levels, arm, enable, trigger, corresponding memory allocations;
ALU: arithmetic add-subtract-negate, logical and-or-exor-complement, shift-left-carry, shift-right-signextend-carry, partial-word-access;
Instruction Set: typically vertical extent, constant-width, some multiples-thereof, minorly variable subpart-fields, 1-of-N-decoded (e.g. 256 instructions per 8-bit-width);
multiple-length instructions: instruction with long numeric literal, 'Huffman'-coded instruction set (less-often-used are longer instructions);
compact number format: typ. binary, short-integer aka indice, short zero-fill, short sign-extended, long-integer, probability-fraction, floating-point integer-scale-characteristic-exponent and fraction-mantissa, implicit high-order-bit, significant-digit-precision-monitoring, 2's-complement arithmetic/1's-complement (high-order-carry-wraparound, unused/wiped-zero);
Assistant-instructions: cache-control-functions, software-where-hardware-needs-more-time;
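The 2's-complement versus 1's-complement distinction in the compact number format entry can be sketched as follows, assuming an 8-bit word (the WIDTH constant and the function names are illustrative). In 2's complement the high-order carry is simply dropped (kept here as a flag); in 1's complement it wraps around into the low-order bit:

```python
WIDTH = 8                      # assumed word width for illustration
MASK = (1 << WIDTH) - 1

def add_twos(a, b):
    """Two's-complement add: returns (result, carry-out flag); the carry is not reused."""
    total = a + b
    return total & MASK, total >> WIDTH

def add_ones(a, b):
    """One's-complement add: the high-order carry wraps around into the low-order bit."""
    total = a + b
    total = (total & MASK) + (total >> WIDTH)   # end-around carry
    return total & MASK

def neg_twos(x):
    return (-x) & MASK         # invert and add one

def neg_ones(x):
    return (~x) & MASK         # invert only
```

For example, 5 + (-3) gives 2 in both systems, but the 2's-complement version reports a carry-out of 1 that it discards, while the 1's-complement version folds that carry back in to reach the same answer.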
MODERN EXTENSIONS:
REGISTER-STACKS (3-4 x 16-64), REGISTER-QUEUE-STREAMS (3 x 64-1024), REGISTER-STRING-DESCRIPTORS (3-4 x 10MB-1GB); half-as-many top-registers, one-bit sub-access-control (push-append-shift-'output', pop-depend-deshift-'input'); overflow-exception-handling; power-time-efficient large-addressing (adjacency-decoding); N.B. codependency;
INSTRUCTION SUBPROCESS PIPELINING (where not-already parallel);
SUB-INDEXED-REGISTER-PAIRS: (pipeline decision-streamlining avoiding conditional-program-flow-jumping: by equal-purpose-alternate-register-pairing) one-bit switched-sub-selection, key-and-indice streams, stream-length-cumulators, key-statistics/averaging-cumulators (n-log-n-sorting-thresholds);
SUB-INDEXED-FUNCTION-PAIRS: (pipeline decision-streamlining avoiding conditional-program-flow-jumping: by equal-purpose-alternate-function-pairing)
MULTI-ENVIRONMENT: banked machine-state-register-tags-flags, bank-per-environment (typ. 4-16), assignable by task-priority-intensity-IOP-level, minimax-scheduling; instantaneous low-overhead bank-switching, extensible into main memory, stacks, pages; (N.B. secure independency for unrelated tasks; codependency with single kernel task; shallower stacks-per);
TRAPPING MECHANISM: instantaneous memory-indirection redirection-step, subprocess-emulation-simulation-implementation-upgrade, occasional-overflow-error-prevention processing; N.B. event-conditional precedence-priority levels; N.B. a stack is queued in overflow;
EXTENDED-ALU: multiplication (fast direct, efficient double-complement-algorithm, semi-fast-or-serial e.g. 'Booth'-algorithm), conditional-subtract-shift-flag, division (usually by SAR Successive Approximation Register), encryption, DES;
COMPOUND-FUNCTIONS: (multi-register-read/write) complex-number arithmetic, quotient-&-remainder, rotation-sum-&-difference, full-precision-multiplication, address-descriptors, radix-(e.g. binary/bit)-interleaving/deinterleaving;
EXPANSIVE-FUNCTIONS: decode/encrypt-wide, multiplicative-squaring, (additive carry-and-overflow are like extra-bits just not full-word-extra);
PASS-CACHE: auto-access copy-aside-and-reuse passing data (one-bit sub-access-control reread-prior vs. reread-updated value; can obviate read-to-register);
MEMORY-CACHE: volatile fast-memory; array-processing, instruction-program-loops;
LOOP-CACHE: auto-access high efficiency program loops;
ARRAY-CACHE: efficient array-based process structuring, 'matrix-row-cache', 'submatrix-cache';
REGISTER-TABLES: fast, special-purpose register-files; paging-table for main memory, spectral-array-processing;
IOP-C-FIFO: inter-processor communications-FIFO; control-data queue-streams (2 x 64-1024);
AUTO-CODE: direct instruction-address-to-decode, faster-bigger variable-width instructions, FPGA Field-Programmable Gate Array, nonlinear instruction-addressing, (cf hyperthreading but which is more flexible-volatile); horizontal-architecture super-instructions each assigned to a set of machine-state-addresses so-interleaved (est. 2x-3x-faster than vertical-architecture), machine-state-determined-instructions, program-ownership-protection (post-QUAL machine-state-flow-signature-detection);
DATA-TYPING: indirection-control ('value-call/name-call'), system-and-process-protection;
DATA-UNITING: units-tracking, units-conversion, units-checking (cf torque vs. work both force×distance but torque is static-perpendicular and work is dynamic-inline);
MULTI-CORE/GPU/RISC: general-processing-units (plural-units), reduced-instruction-set-(optimized)-CPUs; specialized vector-processing;
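The conditional-subtract-shift division named under EXTENDED-ALU can be illustrated with the textbook unsigned restoring algorithm, which settles one quotient bit per step in successive-approximation fashion. This is a sketch under assumptions (16-bit default width, unsigned operands, the function name restoring_divide), not a description of any particular hardware:

```python
def restoring_divide(dividend, divisor, width=16):
    """Unsigned shift-and-conditional-subtract division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor is zero")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit the assumed width")
    quotient = 0
    remainder = 0
    for i in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        # Conditional subtract: the comparison outcome is the quotient bit (the 'flag').
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << i
    return quotient, remainder
```

One call yields both quotient and remainder, which is the multi-register-result pattern listed under COMPOUND-FUNCTIONS.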
BETTER EXTENSIONS:
Vertical architecture instruction-set constant-width-loading tends to be more flexible, more efficient with constant-width memory; loads faster (cf horizontal loads wide superinstructions -and- addresses, though fewer of them);
Array memory by paging-descriptor 'sparse' addressing to reduce micro-code address space, secure arrays from each other, and from single-valued items, and attach indexing (one register per, albeit multiple arrays may be defined as the same in perspectives);
Rapid-sequential-memory-sub-blocking (single-level-adjacency-addressing-decoding);
Semi-logical memory by trees: category-association and refinement, to reduce micro-code address space;
Elimination of micro-code indirect-addressing that only ever saved a bit of coding space but not time; the advantage had been for variables assumed, dynamic, rather than tagged-so;
Subcoded dynamic hardware configuration, register assignments, function assignments, (runtime internal-register management, context-switched multi-emulators);
Direct-cache-addressing, compiler-optimized cache-control-instructions, pre-caching, do-once-instructions;
Address-encrypted memory-parity-sense rapid-halting of errant processing, (alt. use: testrun breakpointing);
Intra-parity (instead of extra-bit) program-immediate-readable 'full' use of data (cf '50%-legal' distance-2 op-coding but meanwhile-also bulk-data-CRC);
Sub-calculable-cyclic-redundancy-checking (runtime-piecewise-CRC-updating with concurrent-rechecking) bulk-memory;
Address-field-subpartitioning by-page-passthrough transistor-fanout-speed-efficiency (low-capacitance-FET-Off-mode);
Pipeline-emulation of successive-approximation-processing hardware (high-overhead minimal-logic delayed-output vs. low-overhead massive-logic prompt-output) by preshifting into the ALU;
Auto-paging, associative-page/address-actuation-in-RAM-hardware, storage-sector-size-pages/half/double, pages-powered-on-demand/delay, priority-encoding (to prevent mispaged-responses; cf carry-generate/propagate);
Page-Scope/Compass/Utility priority-value/user-link-list, for retaining multi-use content e.g. multiply-common-subroutines, (already effected by implicit usage-statistics, but should be more explicit when any-user-pages, 'lower-priority', remain present);
Sub-paging/RAM-cache-cloning of code-fragments/sub-subroutines in-use/compiled-local, (sync with original in case of update);
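The runtime-piecewise-CRC-updating entry relies on a known property of CRCs: for equal-length messages the CRC of an XOR is the XOR of the CRCs, up to a constant that depends only on length. A minimal sketch using Python's zlib.crc32 follows; the function name crc_after_patch and its parameters are illustrative assumptions. For clarity it builds a full-length delta buffer, which a hardware scheme would presumably replace with precomputed shift factors:

```python
import zlib

def crc_after_patch(old_crc, total_len, offset, old_piece, new_piece):
    """CRC-32 of a block after overwriting old_piece with new_piece at offset,
    computed from the old CRC without rereading the unchanged bytes."""
    if len(old_piece) != len(new_piece):
        raise ValueError("patch must not change the block length")
    # delta is zero everywhere except where the block actually changed.
    delta = bytearray(total_len)
    for i, (o, n) in enumerate(zip(old_piece, new_piece)):
        delta[offset + i] = o ^ n
    # crc(old XOR delta) = crc(old) XOR crc(delta) XOR crc(all-zero block of same length)
    return old_crc ^ zlib.crc32(bytes(delta)) ^ zlib.crc32(bytes(total_len))

# Usage: patch 4 bytes inside a 64-byte block and compare against a full recheck.
block = bytearray(range(64))
old_crc = zlib.crc32(bytes(block))
patched = bytearray(block)
patched[10:14] = b"\xde\xad\xbe\xef"
updated_crc = crc_after_patch(old_crc, len(block), 10, block[10:14], patched[10:14])
```

The concurrent recheck mentioned in the entry corresponds to comparing updated_crc with a fresh zlib.crc32 over the patched block.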
ADVANCED CONCEPTS:
ALU CHECKING: ('check-by-nines-and-elevens' which in binary converges as only 'check-by-threes'), but also carry+overflow-checking...parallel processes but that are not affected by timing and design 'flaws', so that hardware does not 'reduplicate the trouble' to check for it...
SUBKERNEL/speed-trap runs at half-Clockrate, (trap-to-lower-rate-and-retry-successive-approximation);
System-authority to drop the Clockrate to a specified 'guaranteed-flawless' timing, (or instruction-clocking), for better signal settling, to prevent system kernel failure and allow for soft-fail recovery and Clockrate control of lower-authority processes, to ensure process resumption-if-possible, even to extend the operable temperature/voltage-range...
Checkpointing, with validity-testing, to allow for instantaneous backup and resume-recontinue;
Simultaneous-parallel-multi-writeback-to-memory to rapidify and radiation-harden checkpointing, milgrade, spacegrade, secgrade, (e.g. cosmic-ray bursts, 1000yr-spaceship, bank vault)...
Administrative recording efficiency (exp. fewer older, millennial-recycling, parallel-simulation);
'Self Test And Repair' ['STAR'] system design and-or configuration;
RAM/P: ALU-pipeline-processing at the RAM-level, obviating access-time between CPU and RAM, where the RAM-word/block-itself is a pipeline-slice-buffer; Gbops fast reduced-data-routing-distributed-process-slivering/slicing;
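The check-by-threes idea under ALU CHECKING can be sketched as a residue check: the residue of a sum or product must equal the sum or product of the operand residues, modulo 3. The residue path below uses only 2-bit digit sums (since 4 leaves remainder 1 on division by 3), so it does not repeat the main adder or multiplier. Function names are illustrative assumptions:

```python
def mod3(x):
    """Residue of a nonnegative integer mod 3 by repeatedly summing its 2-bit digits."""
    while x > 3:
        s = 0
        while x:
            s += x & 3      # each base-4 digit contributes its own value mod 3
            x >>= 2
        x = s
    return 0 if x == 3 else x

def checked_add(a, b):
    """Return (a + b, residue check passed)."""
    result = a + b
    return result, mod3(result) == (mod3(a) + mod3(b)) % 3

def checked_mul(a, b):
    """Return (a * b, residue check passed)."""
    result = a * b
    return result, mod3(result) == (mod3(a) * mod3(b)) % 3
```

Any single-bit error in a result changes it by a power of two, which is never a multiple of 3, so the residue comparison always catches it; some multi-bit errors can still slip through.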
ADDITIONAL CONCEPTS: security, routing, operating systems [2019]
...events should be added to a priority-time-sorted-queue (new data format) with gentle rate-tracking, (a responsivity-expectability problem we'd seen in the Internet-web where lost packets caused path-processing resets rather than scale-back the fill-proportion)...
TIME GRAINING: (cf RAM-Paging) timelocked integral processing by CPU/Core-resource-release-timer (typ. 3-1023 cycle ~ 3-1023 nsec × GHz) uninterruptible 'grains' of CPU/Core-time, short enough to not-interfere with thread-priority/interactivity-scheduling (typ. 20-5000×longer ~ 20 µsec ~ 20 kc CPU-dwell-increments), long enough to improve single-Core process-cohesion (atomicity; contrast single-data Compare-Conditional-Swap; cf independent multicore sharing processes advancing at square-root-of-time 'random-walk')... compiler-optimized instruction-order-closeness... n.b. timer-setting releases-and-relocks... a 'guarddog'-timer...
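The priority-time-sorted-queue mentioned above can be sketched with a binary heap. This is a minimal illustration assuming that a lower priority number means more urgent and that ties fall back to the earlier timestamp; the class name EventQueue is hypothetical, and the gentle rate-tracking is left out:

```python
import heapq
import itertools

class EventQueue:
    """Events ordered by priority first, then by time, then by arrival order."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # tie-breaker so event payloads are never compared

    def push(self, priority, time, event):
        heapq.heappush(self._heap, (priority, time, next(self._seq), event))

    def pop(self):
        priority, time, _, event = heapq.heappop(self._heap)
        return priority, time, event

    def __len__(self):
        return len(self._heap)

# Usage: three events pushed out of order come back priority-then-time sorted.
q = EventQueue()
q.push(2, 10, "log flush")
q.push(1, 20, "disk ready")
q.push(1, 5, "timer tick")
drained = [q.pop() for _ in range(len(q))]
```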

THE SYSTEM CONCEPT: Exponential Bootup

BOOT (hardware) tiny hardwired code, typically in main memory; starts up 1-4 environment threads to self-check, cross-check, load and start up the next level;
COMPUTER (software)
SYSTEM (software)
MANAGER (software)
USER (software) second party application software, compilers, word processing, data tabulation;
BENEFACTION (software) performance applications, peripherals, storage, third-party goods and services;
GOVERNMENT (background documentation) lawbase, global and archaic protocols, agreements, decorum;

REFINEMENTS, IMPROVEMENTS:

Hybrid immediate-value numbers:
- e.g. high-low-2+2-bits exponential+tweak expansion-decode-8-bit-to-31-bit (positive-only-32-bit):
  - :8-bit-values v(8) = 0-55 [00000000-00110111] high-end-zero-extend directly to the same, 31-bit v(31) = 0-55;
  - :8-bit-values v(8) = 56-255 [00111000-11111111] expand as bit-fields n(5),y(1),x(2) to v(31) = 1(1),y(1),0(n-4),x(2);
  - i.e. v(31) = 0,1,2,3,4,...,54,55,(+8),64,65,66,67,(+28),96,97,98,99,(+28),128,129,130,131,(+60),192,...,...,...,3×2²⁹+3;
- alt. non-field-parsed high-low-2+2-bits exponential+tweak expansion-decode-8-bit-to-32-bit (full):
  - :8-bit-values v(8) = 0-43 [00000000-00101011] extend likewise, 44-255 [00101100-11111111] expand similarly;
  - i.e. v(32) = 0,1,2,3,4,...,42,43,(+4),48,49,50,51,(+12),64,65,66,67,(+28),96,97,98,99,(+28),128,129,...,...,...,3×2³⁰+3;
  - alt. v(32) = 0,1,2,3,4,...,38,39,(+8),48,49,50,51,52,53,54,55,(+8),64,65,66,67,(+28),96,97,98,99,(+28),...,...,...,3×2³⁰+3.
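The first (field-parsed, 31-bit) expansion above can be sketched as a decoder. This follows one reading of the bit-field description: n is the top 5 bits, y the next bit, x the low 2 bits, and the output is a leading 1, then y, then n-4 zeros, then x. The function name decode_imm8 is an assumption for illustration:

```python
def decode_imm8(v):
    """Expand an 8-bit hybrid immediate to its positive 31-bit value."""
    if not 0 <= v <= 255:
        raise ValueError("expected an 8-bit value")
    if v < 56:
        return v                 # 0-55: zero-extended directly
    n = v >> 3                   # n(5): total bit-length of the expanded value
    y = (v >> 2) & 1             # y(1): the bit just below the leading 1
    x = v & 3                    # x(2): the two low-order 'tweak' bits
    # Layout 1(1), y(1), 0(n-4), x(2) occupies n bits in all.
    return (1 << (n - 1)) | (y << (n - 2)) | x
```

Under this reading decode_imm8(56) is 64, decode_imm8(60) is 96, decode_imm8(64) is 128, and the largest code, 255, expands to 2^30 + 2^29 + 3, which fits in 31 bits. The mapping is strictly increasing, so ordering comparisons can be made on the 8-bit codes directly.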
Rich-detail numbers:
- e.g. floatingpoint: sign, radix-2 scale, implicit MSB '1.0+', fraction, units-code (ibidem), infinity-code (or scale);
- also: deviation energy (cf significant digits), 2nd-order-deviation energy (or energy-form, TBD);
- also: finity weakness (cf infinite-infinitesimal product/integration);
- alt. floatingpoint-integer: scale=0 for integer, signed-scale for floatingpoint;
- alt. hypo-unity-floatingpoint: nonpositive-scale or mostly-negative-scale (cf cosmic-lightspeed relativistic);
- alt. differentiated-floatingpoint: differential part (cf derivative part, cf position+momentum);
- e.g. probability-fraction: implicit '+0.5' LSB (no 'absolute-0', no 'absolute-1') [0.000...05-0.999...95];
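The probability-fraction with implicit '+0.5' LSB can be sketched as a midpoint code: an n-bit code k stands for (k + 0.5) / 2^n, so neither exact 0 nor exact 1 is representable. The 8-bit default and the function names are illustrative assumptions:

```python
from fractions import Fraction

def prob_decode(code, bits=8):
    """Code k stands for the midpoint (k + 0.5) / 2**bits, never exactly 0 or 1."""
    return Fraction(2 * code + 1, 2 ** (bits + 1))

def prob_encode(p, bits=8):
    """Map a probability to the code whose cell contains it, clamped at both ends."""
    code = int(p * 2 ** bits)                 # truncate toward zero
    return max(0, min(2 ** bits - 1, code))
```

With 8 bits the representable range runs from 1/512 to 511/512, the binary counterpart of the 0.000...05 to 0.999...95 range above. The format is also symmetric under complement: 1 - prob_decode(k) equals prob_decode(255 - k).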

A premise discovery under the title,

computer processor theory history notes

Grand-Admiral Petry 'Majestic Service in a Solar System' Nuclear Emergency Management
