computer processor theory history notes

Notes recorded before these things disappear utterly from standard use, since today they are typically subcircuits, one-of-a-thousand inside a larger computer...

[See also notes on improving the computing-web]

THE EARLY CONCEPT: CPU Central Processing Unit, today's PU/LPU/SPU Local Sub-Processing Unit, not necessarily central:

  1. ALU/AU/ALFU/FCU/NLU Arithmetic Numeric Logic Function Calculation Unit; also digit-manipulation (bit, binary; trit, trinary); numbers in sub-process format (unnormalized-carry), basic arithmetic, fast-as-possible, hardware-efficient, chip-real-estate-efficient;
  2. REGISTERS: first-concept memory, fast-as-possible vs. convenient-as-possible; machine state, flags, tags, numbers in fixed-width sub-process format, (typ. 4-32: operator, operand, result, instruction-program-address, data-pointer/index), typ. interchangeable, multi-register extended-precision; loaded from instruction, via ALU, via other register, loaded-saved via memory, via IOP, vs. same-as-lowest-memory (typ. the first 4-32 multibyte-'words' thereof);
  3. CENTRAL CONTROL AND INSTRUCTION DECODE: interpretation of instructions to hardware-action, primarily linear-sequential-program memory addressing with conditional secondary address jumping;
  4. MEMORY: bulk-large memory, slower than registers; programs, data, data-arrays; numbers in compact format, tagged, parity-detection, error-correction, protected areas, segment-paging, hot-paging, other-process buffering, alteration-change-save-monitoring; short-range 'relative'-addressing vs. large-field 'absolute'-addressing;
  5. IOP Input-Output Processor: slower peripheral interface processes, box-internal or external, virtual-extended memory, data-storage (e.g. disks), other fast-processors;
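
A minimal sketch of items 1-5 working together, assuming a hypothetical accumulator machine written in Python (the opcode names, the single accumulator and the 16-bit width are illustrative, not taken from any real processor). The program counter advances linearly, and JNZ supplies the conditional secondary address:

```python
# Hypothetical toy accumulator machine; opcodes and 16-bit width are illustrative only.
MASK16 = 0xFFFF

def run(program, memory, max_steps=10_000):
    """Fetch-decode-execute loop over a list of (opcode, operand) pairs."""
    acc = 0  # accumulator register (ALU operand and result)
    pc = 0   # instruction-program-address register
    for _ in range(max_steps):
        op, arg = program[pc]  # fetch and decode
        pc += 1                # default: linear-sequential addressing
        if op == "LOAD":
            acc = memory[arg]
        elif op == "ADD":
            acc = (acc + memory[arg]) & MASK16  # ALU result wraps at 16 bits
        elif op == "SUB":
            acc = (acc - memory[arg]) & MASK16
        elif op == "STORE":
            memory[arg] = acc
        elif op == "JNZ":      # conditional secondary-address jump
            if acc != 0:
                pc = arg
        elif op == "HALT":
            return acc
        else:
            raise ValueError(f"illegal opcode {op!r}")
    raise RuntimeError("step limit reached")

# Multiply memory[1] by memory[2] by repeated addition; memory[3] holds the constant 1.
PROGRAM = [
    ("LOAD", 0), ("ADD", 1), ("STORE", 0),   # result += x
    ("LOAD", 2), ("SUB", 3), ("STORE", 2),   # count -= 1
    ("JNZ", 0),                              # loop while count != 0
    ("LOAD", 0), ("HALT", 0),
]
```

With memory [0, 7, 5, 1] the loop adds 7 five times and halts with 35 in the accumulator and in memory[0].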

    MORE DETAIL:

  6. Finite State Machine: usually referring only to Program Location and Register values, IOP likewise; allowed to run ad infinitum (or a finitely large approximation thereto); (theory-technically all data values comprise the machine state);
  7. Interrupt structure: levels, arm, enable, trigger, corresponding memory allocations;
  8. ALU: arithmetic add-subtract-negate, logical and-or-exor-complement, shift-left-carry, shift-right-signextend-carry, partial-word-access;
  9. Instruction Set: typically vertical extent, constant-width, some multiples-thereof, minorly variable subpart-fields, 1-of-N-decoded (e.g. 256 instructions per 8-bit-width);
  10. multiple-length instructions: instruction with long numeric literal, 'Huffman'-coded instruction set (less-often-used are longer instructions);
  11. compact number format: typ. binary, short-integer aka index, short zero-fill, short sign-extended, long-integer, probability-fraction, floating-point integer-scale-characteristic-exponent and fraction-mantissa, implicit high-order-bit, significant-digit-precision-monitoring, 2's-complement arithmetic vs. 1's-complement (high-order-carry-wraparound, unused/wiped negative-zero);
  12. Assistant-instructions: cache-control-functions, software-where-hardware-needs-more-time;
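
Items 8 and 11 can be made concrete with a short sketch, assuming an 8-bit word held as a plain Python int: 2's-complement addition discards the high-order carry, 1's-complement addition wraps it back into bit 0, and 1's complement has two zero patterns (0x00, and 0xFF as 'negative zero'). The sign-extending right shift of item 8 is included for comparison:

```python
# 8-bit width chosen for illustration; bit patterns are plain ints in 0..255.
BITS = 8
MASK = (1 << BITS) - 1

def add_twos(a, b):
    """2's complement: the carry out of the high-order bit is simply discarded."""
    return (a + b) & MASK

def add_ones(a, b):
    """1's complement: the high-order carry wraps around into bit 0 (end-around carry)."""
    s = a + b
    if s > MASK:
        s = (s & MASK) + 1  # cannot carry out again: s & MASK <= MASK - 1 here
    return s

def neg_twos(a):
    return (~a + 1) & MASK  # invert and add one

def neg_ones(a):
    return ~a & MASK        # invert only; 0x00 and 0xFF both mean zero

def to_int_ones(x):
    """Signed value of a 1's-complement pattern (0xFF is 'negative zero')."""
    return -(~x & MASK) if x >> (BITS - 1) else x

def sar(a, n):
    """Shift right with sign extension (2's-complement pattern)."""
    signed = a - (1 << BITS) if a >> (BITS - 1) else a
    return (signed >> n) & MASK  # Python's >> on negative ints sign-extends
```

For example add_ones(5, neg_ones(3)) forms 257 before the wraparound, and the end-around carry turns it into 2.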
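
The 'Huffman'-coded instruction set of item 10 can be sketched as follows, assuming hypothetical opcode usage counts (the names and numbers below are illustrative only). The standard Huffman construction repeatedly merges the two least-frequent subtrees, so rarely used instructions end up with the longest codes:

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Map each symbol to a prefix-free bitstring; frequent symbols get shorter codes."""
    if len(freqs) == 1:
        return {sym: "0" for sym in freqs}
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, low = heapq.heappop(heap)   # the two least-frequent subtrees
        f2, _, high = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

# Hypothetical opcode usage counts (per 100 executed instructions).
OPCODE_FREQS = {"LOAD": 40, "STORE": 25, "ADD": 20, "JUMP": 10, "DIV": 5}
```

With these counts LOAD gets a 1-bit code and DIV a 4-bit one, for an average of 2.1 bits per opcode against 3 bits for a fixed-width field covering five opcodes.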

    MODERN EXTENSIONS:

  13. REGISTER-STACKS (3-4 x 16-64), REGISTER-QUEUE-STREAMS (3 x 64-1024), REGISTER-STRING-DESCRIPTORS (3-4 x 10MB-1GB); half-as-many top-registers, one-bit sub-access-control (push-append-shift-'output', pop-depend-deshift-'input'); overflow-exception-handling; power-time-efficient large-addressing (adjacency-decoding); N.B. codependency;
  14. INSTRUCTION SUBPROCESS PIPELINING (where not-already parallel);
  15. SUB-INDEXED-REGISTER-PAIRS: (pipeline decision-streamlining avoiding conditional-program-flow-jumping: by equal-purpose-alternate-register-pairing) one-bit switched-sub-selection, key-and-index streams, stream-length-accumulators, key-statistics/averaging-accumulators (n-log-n-sorting-thresholds);
  16. SUB-INDEXED-FUNCTION-PAIRS: (pipeline decision-streamlining avoiding conditional-program-flow-jumping: by equal-purpose-alternate-function-pairing);
  17. MULTI-ENVIRONMENT: banked machine-state-register-tags-flags, bank-per-environment (typ. 4-16), assignable by task-priority-intensity-IOP-level, minimax-scheduling; instantaneous low-overhead bank-switching, extensible into main memory, stacks, pages; (N.B. secure independency for unrelated tasks; codependency with single kernel task; shallower stacks-per);
  18. TRAPPING MECHANISM: instantaneous memory-indirection redirection-step, subprocess-emulation-simulation-implementation-upgrade, occasional-overflow-error-prevention processing; N.B. event-conditional precedence-priority levels; N.B. a stack is queued in overflow;
  19. EXTENDED-ALU: multiplication (fast direct, efficient double-complement-algorithm, semi-fast-or-serial e.g. 'Booth'-algorithm), conditional-subtract-shift-flag, division (usually by SAR Successive Approximation Register), encryption, DES;
  20. COMPOUND-FUNCTIONS: (multi-register-read/write) complex-number arithmetic, quotient-&-remainder, rotation-sum-&-difference, full-precision-multiplication, address-descriptors, radix-(e.g. binary/bit)-interleaving/deinterleaving;
  21. EXPANSIVE-FUNCTIONS: decode/encrypt-wide, multiplicative-squaring, (additive carry-and-overflow are like extra-bits just not full-word-extra);
  22. PASS-CACHE: auto-access copy-aside-and-reuse passing data (one-bit sub-access-control reread-prior vs. reread-updated value; can obviate read-to-register);
  23. MEMORY-CACHE: volatile fast-memory; array-processing, instruction-program-loops;
  24. LOOP-CACHE: auto-access high efficiency program loops;
  25. ARRAY-CACHE: efficient array-based process structuring, 'matrix-row-cache', 'submatrix-cache';
  26. REGISTER-TABLES: fast, special-purpose register-files; paging-table for main memory, spectral-array-processing;
  27. IOP-C-FIFO: inter-processor communications-FIFO; control-data queue-streams (2 x 64-1024);
  28. AUTO-CODE: direct instruction-address-to-decode, faster-bigger variable-width instructions, FPGA Field-Programmable Gate Array, nonlinear instruction-addressing, (cf hyperthreading but which is more flexible-volatile); horizontal-architecture super-instructions each assigned to a set of machine-state-addresses so-interleaved (est. 2x-3x-faster than vertical-architecture), machine-state-determined-instructions, program-ownership-protection (post-QUAL machine-state-flow-signature-detection);
  29. DATA-TYPING: indirection-control ('value-call/name-call'), system-and-process-protection;
  30. DATA-UNITING: units-tracking, units-conversion, units-checking (cf torque vs. work both force×distance but torque is static-perpendicular and work is dynamic-inline);
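
A sketch of one register-stack from item 13, assuming a depth of 4 fast registers (the class and method names are hypothetical). The overflow-exception handling is modelled as spilling the bottom register to memory, and popping an empty register set refills from memory:

```python
class RegisterStack:
    """A stack whose top `depth` entries live in fast registers.

    Pushing onto a full register set runs the (modelled) overflow handler,
    which spills the bottom register to main memory; popping an empty
    register set refills from memory. `depth` is an illustrative parameter.
    """

    def __init__(self, depth=4):
        self.depth = depth
        self.regs = []     # fast top-of-stack registers; last element is the top
        self.memory = []   # spill area in slower main memory
        self.spills = 0    # how many times the overflow handler ran

    def push(self, value):
        if len(self.regs) == self.depth:
            self.memory.append(self.regs.pop(0))  # overflow: spill the oldest register
            self.spills += 1
        self.regs.append(value)

    def pop(self):
        if not self.regs:
            if not self.memory:
                raise IndexError("stack underflow")
            self.regs.append(self.memory.pop())   # underflow: refill from memory
        return self.regs.pop()
```

Pushing 1 through 6 spills twice, leaving 3-6 in registers and 1-2 in memory; popping six times still returns 6, 5, 4, 3, 2, 1.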
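
Item 15's one-bit switched sub-selection can be sketched as a loop with no if/else on the data path: the comparison bit indexes one of two paired accumulators. The function name and threshold parameter are hypothetical, and in Python the comparison is not guaranteed to compile to branch-free machine code; the sketch only shows the data-path structure:

```python
def paired_accumulate(keys, threshold):
    """Split a key stream around a threshold without if/else on the data path.

    Each key's comparison bit (0 or 1) indexes one of two paired
    accumulators, in the spirit of sub-indexed register pairs.
    """
    total = [0, 0]    # paired sum accumulators
    length = [0, 0]   # paired stream-length accumulators
    for k in keys:
        b = int(k >= threshold)   # the one-bit sub-selector
        total[b] += k
        length[b] += 1
    # averaging, done once after the stream ends
    means = [t / n if n else None for t, n in zip(total, length)]
    return total, length, means
```

For keys 3, 9, 1, 7, 5 and threshold 5 it returns sums [4, 21], lengths [2, 3] and means [2.0, 7.0]; per-side statistics like these could, for instance, feed the choice of a next sorting threshold.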
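
Item 19's 'Booth'-algorithm and conditional-subtract-shift division, sketched in Python for an 8-bit word (an assumption). The multiplier is standard radix-2 Booth recoding over the A, Q and q_1 registers. The divider is a compare-then-subtract form of restoring division, swapped in here as a stand-in for the SAR approach the item names, since it likewise fixes one quotient bit per step, most significant first:

```python
def booth_multiply(m, r, bits=8):
    """Radix-2 Booth multiplication of two signed `bits`-wide integers.

    Registers: A (upper product half), Q (multiplier / lower half) and the
    extra bit q_1. Valid for m > -2**(bits-1); the most negative m can
    overflow A in this bits-wide formulation.
    """
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    M, A, Q, q_1 = m & mask, 0, r & mask, 0
    for _ in range(bits):
        pair = (Q & 1, q_1)
        if pair == (1, 0):          # start of a run of 1s: subtract
            A = (A - M) & mask
        elif pair == (0, 1):        # end of a run of 1s: add
            A = (A + M) & mask
        # arithmetic right shift of the combined A:Q:q_1
        q_1 = Q & 1
        Q = (Q >> 1) | ((A & 1) << (bits - 1))
        A = (A >> 1) | (A & sign)
    product = (A << bits) | Q
    return product - (1 << 2 * bits) if product >> (2 * bits - 1) else product

def restoring_divide(dividend, divisor, bits=8):
    """Unsigned division (dividend < 2**bits): one conditional subtract-shift per quotient bit."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    remainder = quotient = 0
    for i in range(bits - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)  # bring down next bit
        if remainder >= divisor:     # trial subtraction succeeds
            remainder -= divisor
            quotient |= 1 << i       # set this quotient flag bit
    return quotient, remainder
```

For example booth_multiply(-5, 7) returns -35, and restoring_divide(200, 7) returns the quotient-&-remainder pair (28, 4).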
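
Item 20's radix-2 (bit) interleaving, assuming 16-bit inputs; this is the encoding also known as a Morton or Z-order code. In hardware it would be pure wiring between registers; the loops only make the bit movement explicit:

```python
def interleave(x, y, bits=16):
    """Bit-interleave two `bits`-wide values: x to even positions, y to odd."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def deinterleave(z, bits=16):
    """Inverse of interleave(): split even and odd bit positions back apart."""
    x = y = 0
    for i in range(bits):
        x |= ((z >> (2 * i)) & 1) << i
        y |= ((z >> (2 * i + 1)) & 1) << i
    return x, y
```

interleave(3, 1) returns 7: the two bits of x land on positions 0 and 2, and the bit of y on position 1.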
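
A minimal model of the item 27 inter-processor FIFO as a ring buffer, assuming a single-threaded simulation (the names are hypothetical; real hardware would also need each side's index update to be visible atomically to the other):

```python
class RingFifo:
    """Fixed-capacity FIFO ring buffer between a producer and a consumer.

    Single-threaded model only. Capacity 64 is just the low end of the
    range given in item 27.
    """

    def __init__(self, capacity=64):
        self.buf = [None] * capacity
        self.head = 0    # next slot the consumer reads
        self.tail = 0    # next slot the producer writes
        self.count = 0

    def put(self, item):
        """Return False instead of blocking when the FIFO is full."""
        if self.count == len(self.buf):
            return False
        self.buf[self.tail] = item
        self.tail = (self.tail + 1) % len(self.buf)
        self.count += 1
        return True

    def get(self):
        """Remove and return the oldest item; raise IndexError when empty."""
        if self.count == 0:
            raise IndexError("FIFO empty")
        item = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        self.count -= 1
        return item
```

The full flag lets the producer retry later rather than overwrite unread data; the indices wrap, so the same storage is reused indefinitely.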
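
A sketch of item 30's units-tracking and units-checking, assuming dimensions are carried as (kg, m, s) exponents (an illustrative subset of SI; the class and constant names are hypothetical). It also shows why the torque/work example matters: both come out as kg·m²/s², so exponents alone let work + torque through, and telling them apart would need a further tag:

```python
from dataclasses import dataclass

# Exponents of (kg, m, s): an illustrative subset of the SI base dimensions.
@dataclass(frozen=True)
class Quantity:
    value: float
    dims: tuple

    def __mul__(self, other):
        # units-tracking: multiplying quantities adds dimension exponents
        return Quantity(self.value * other.value,
                        tuple(a + b for a, b in zip(self.dims, other.dims)))

    def __add__(self, other):
        # units-checking: only like dimensions may be added
        if self.dims != other.dims:
            raise TypeError(f"unit mismatch: {self.dims} vs {other.dims}")
        return Quantity(self.value + other.value, self.dims)

NEWTON = (1, 1, -2)
METRE = (0, 1, 0)

force = Quantity(10.0, NEWTON)
work = force * Quantity(2.0, METRE)     # joules: (1, 2, -2)
torque = force * Quantity(0.5, METRE)   # newton-metres: also (1, 2, -2)
```

force + work raises TypeError, but work + torque is accepted.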

    BETTER EXTENSIONS:

  31. Vertical architecture instruction-set constant-width-loading tends to be more flexible and more efficient with constant-width memory; it loads faster (cf horizontal, which loads wide superinstructions -and- addresses, though fewer of them);
  32. Array memory by paging-descriptor 'sparse' addressing to reduce micro-code address space, secure arrays from each other and from single-valued items, and attach indexing (one index register per array, albeit multiple arrays may be defined over the same data as different perspectives);
  33. Rapid-sequential-memory-sub-blocking (single-level-adjacency-addressing-decoding);
  34. Semi-logical memory by trees: category-association and refinement, to reduce micro-code address space;
  35. Elimination of micro-code indirect-addressing that only ever saved a bit of coding space but not time; the advantage had been for variables assumed, dynamic, rather than tagged-so;
  36. Subcoded dynamic hardware configuration, register assignments, function assignments, (runtime internal-register management, context-switched multi-emulators);
  37. Direct-cache-addressing, compiler-optimized cache-control-instructions, pre-caching, do-once-instructions;
  38. Address-encrypted memory-parity-sense rapid-halting of errant processing, (alt. use: testrun breakpointing);
  39. Intra-parity (instead of extra-bit) program-immediate-readable 'full' use of data (cf '50%-legal' distance-2 op-coding but meanwhile-also bulk-data-CRC);
  40. Sub-calculable-cyclic-redundancy-checking (runtime-piecewise-CRC-updating with concurrent-rechecking) bulk-memory;
  41. Address-field-subpartitioning by-page-passthrough transistor-fanout-speed-efficiency (low-capacitance-FET-Off-mode);
  42. Pipeline-emulation of successive-approximation-processing hardware (high-overhead minimal-logic delayed-output vs. low-overhead massive-logic prompt-output) by preshifting into the ALU;
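
The '50%-legal' distance-2 op-coding that item 39 compares against can be checked exhaustively, assuming 8-bit opcodes: requiring even parity leaves exactly half of the 256 patterns legal, and any single flipped bit turns a legal opcode into an illegal one, so decode can trap it (the names are hypothetical):

```python
def parity(word):
    """1 if the word has an odd number of set bits, else 0."""
    return bin(word).count("1") & 1

# Distance-2 op-coding: only even-parity 8-bit words are legal opcodes.
LEGAL = [w for w in range(256) if parity(w) == 0]

def is_legal(word):
    return parity(word) == 0
```

The price is that every opcode carries one bit of redundancy, which is the overhead item 39 proposes to fold into the data itself.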
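
Item 40's sub-calculable CRC checking can be sketched with Python's zlib.crc32, assuming memory is divided into 256-byte sub-blocks (an illustrative size; the function names are hypothetical). A write re-CRCs only the sub-blocks it touches, while a background pass can re-check everything. zlib.crc32 also accepts a running CRC as its second argument, so a checksum can be extended piece by piece:

```python
import zlib

BLOCK = 256  # bytes per checked sub-block (illustrative)

def block_crcs(mem):
    """CRC-32 of every BLOCK-sized sub-block of memory."""
    return [zlib.crc32(mem[i:i + BLOCK]) for i in range(0, len(mem), BLOCK)]

def checked_write(mem, crcs, addr, data):
    """Write data (assumed to stay within mem), then re-CRC only the touched sub-blocks."""
    mem[addr:addr + len(data)] = data
    first, last = addr // BLOCK, (addr + len(data) - 1) // BLOCK
    for b in range(first, last + 1):
        crcs[b] = zlib.crc32(mem[b * BLOCK:(b + 1) * BLOCK])

def recheck(mem, crcs):
    """Background re-check: list sub-blocks whose stored CRC no longer matches."""
    return [i for i, c in enumerate(block_crcs(mem)) if c != crcs[i]]
```

A write at address 250 spans sub-blocks 0 and 1 and refreshes both; a later stray bit flip at address 700 shows up as a mismatch in sub-block 2.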

THE SYSTEM CONCEPT: Exponential Bootup REFINEMENTS, IMPROVEMENTS: ADVANCED CONCEPTS:

  1. ALU checking ('check-by-nines-and-elevens', which in binary reduces to only 'check-by-threes'), but also carry+overflow-checking... as parallel processes that are not affected by the same timing and design 'flaws', so that the checking hardware does not 'reduplicate the trouble' it is meant to check for...
  2. System-authority to drop the Clockrate by half, or to a specified 'guaranteed-flawless' timing, or for instruction-double-clocking, for better signal settling, to prevent system kernel failure and allow for soft-fail recovery and Clockrate control of lower-authority processes, to ensure process resumption-if-possible, even to extend the operable temperature/voltage-range...
  3. Clockrate-testing (trap-to-lower-rate-and-retry-successive-approximation)
  4. Checkpointing, with validity-testing, to allow for instantaneous backup and resume-recontinue;
  5. Simultaneous-parallel-multi-writeback-to-memory to speed up and radiation-harden checkpointing, mil-grade, space-grade (e.g. cosmic-ray bursts, 1000-year spaceship);
  6. Administrative recording efficiency (exp. fewer older, millennial-recycling, parallel-simulation);
  7. 'Self Test And Repair' system design and-or configuration;
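
The binary 'check-by-threes' of the first system-concept item, sketched under the assumption of a 16-bit unsigned adder. Casting out nines works in decimal because 10 ≡ 1 (mod 9), and casting out elevens because 10 ≡ -1 (mod 11); in binary the analogues are mod 1 (useless) and mod 3, since 2 ≡ -1 (mod 3). A single flipped result bit changes the value by 2^i, which is never a multiple of 3, so every single-bit error is caught. The fault argument is a hypothetical test hook:

```python
def mod3_from_bits(x):
    """Residue mod 3 from the binary digits alone: since 2 is congruent to -1
    (mod 3), alternately add and subtract bits (binary 'casting out elevens')."""
    total, sign = 0, 1
    while x:
        total += sign * (x & 1)
        sign = -sign
        x >>= 1
    return total % 3

def checked_add(a, b, bits=16, fault=0):
    """Add two unsigned bits-wide values and verify sum+carry by residue mod 3.

    `fault` is a hypothetical test hook XORed into the sum to simulate a
    flipped result bit.
    """
    full = a + b
    result = (full & ((1 << bits) - 1)) ^ fault
    carry = full >> bits
    expected = (mod3_from_bits(a) + mod3_from_bits(b)) % 3
    observed = (mod3_from_bits(result) + carry * mod3_from_bits(1 << bits)) % 3
    if observed != expected:
        raise ArithmeticError("mod-3 residue check failed")
    return result, carry
```

In hardware the residue path would be a small separate circuit running alongside the adder, so a timing fault in the adder is unlikely to repeat identically in the checker.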
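
The clock-rate successive approximation of system item 3, assuming pass/fail is monotone in clock rate and that passes() is a hypothetical hook running a self-test workload at a given rate. Each failed trial drops back and narrows the window, so the returned rate is always one that has actually passed:

```python
def find_max_clock(passes, lo_mhz, hi_mhz, steps=10):
    """Successive approximation of the highest clock rate that passes a self-test.

    `passes(rate_mhz)` is a hypothetical hook that runs a test workload at
    that rate and reports success; pass/fail is assumed monotone in rate.
    """
    if not passes(lo_mhz):
        raise RuntimeError("self-test fails even at the lowest rate")
    good, bad = lo_mhz, hi_mhz   # highest known-good, lowest assumed-bad
    for _ in range(steps):
        trial = (good + bad) / 2
        if passes(trial):
            good = trial          # raise the floor
        else:
            bad = trial           # trap, drop back, and narrow the window
    return good                   # always a rate that has actually passed
```

Each step halves the uncertainty window, so ten steps over a 100-1000 MHz range pin the limit to within about 0.9 MHz.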
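
System items 4 and 5 (validated checkpoints, multiple write-back copies) can be sketched as follows, assuming three replicas and a CRC-32 per copy as the validity test (the class name and replica count are illustrative). The replicas are written in a loop here; hardware would write them simultaneously:

```python
import pickle
import zlib

class Checkpointer:
    """Keep several CRC-stamped copies of a saved state; restore the first valid one."""

    def __init__(self, copies=3):
        self.copies = copies
        self.saved = []   # list of (crc, blob) replicas

    def checkpoint(self, state):
        blob = pickle.dumps(state)
        crc = zlib.crc32(blob)
        # multi-writeback: every replica gets the same bytes and CRC
        self.saved = [(crc, bytearray(blob)) for _ in range(self.copies)]

    def restore(self):
        for crc, blob in self.saved:
            if zlib.crc32(blob) == crc:   # validity test before use
                return pickle.loads(bytes(blob))
        raise RuntimeError("no valid checkpoint copy")
```

If one copy is corrupted, say by a cosmic-ray upset, restore() skips it and returns the next copy whose CRC still matches.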

A premise discovery under the title,

Grand-Admiral Petry
'Majestic Service in a Solar System'
Nuclear Emergency Management

2009,2014 GrandAdmiralPetry@Lanthus.net