computer processor theory history notes
Set down before these things disappear utterly from standard use, because
today these are typically subcircuits, one-of-a-thousand inside a
larger computer in a distributed-processing-network thereof...
[See also notes on improving the computing-web]
THE EARLY CONCEPT: CPU Central Processing Unit, today's PU/LPU/SPU Local
Sub-Processing Unit, not necessarily central:
- ALU/AU/ALFU/FCU/NLU Arithmetic Numeric Logic Function Calculation Unit; also
digit-manipulation (bit, binary; trit, trinary); numbers in sub-process format (unnormalized-carry),
basic arithmetic, fast-as-possible, hardware-efficient, chip-realestate-efficient;
- REGISTERS: first-concept memory, fast-as-possible vs. convenient-as-possible; machine
state, flags, tags, numbers in fixed-width sub-process format, (typ. 4-32: operator, operand,
result, instruction-program-address, data-pointer/index), typ. interchangeable, multi-register
extended-precision; loaded from instruction, via ALU, via other register, loaded-saved via
memory, via IOP, vs. same-as-lowest-memory (typ. the first 4-32 multibyte-'words' thereof);
- CENTRAL CONTROL AND INSTRUCTION DECODE: interpretation of instructions to
hardware-action, primarily linear-sequential-program memory addressing with conditional
secondary address jumping;
- MEMORY: bulk-large memory, slower than registers; programs, data, data-arrays;
numbers in compact format, tagged, parity-detection, error-correction, protected areas,
segment-paging, hot-paging, other-process buffering, alteration-change-save-monitoring;
short-range 'relative'-addressing vs. large-field 'absolute'-addressing;
- IOP Input-Output Processor: slower peripheral interface processes, box-internal or
external, virtual-extended memory, data-storage (e.g. disks), other fast-processors;
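As a rough illustration of how the units listed above cooperate, here is a minimal sketch of a fetch-decode-execute loop. The three-field instruction tuples, the opcode names (LOADI, ADD, DEC, JNZ, HALT), the 16-bit word and the four-register file are all assumptions invented for this example, not any particular machine.

```python
def run(program, num_registers=4, max_steps=1000):
    """Toy machine: linear-sequential program addressing with conditional jumps."""
    regs = [0] * num_registers      # REGISTERS: small, fast, fixed-width state
    pc = 0                          # instruction-program-address register
    for _ in range(max_steps):      # finite approximation of 'run forever'
        op, a, b = program[pc]      # fetch from program MEMORY
        pc += 1                     # default: next sequential address
        if op == "LOADI":           # decode: load register a from literal b
            regs[a] = b & 0xFFFF
        elif op == "ADD":           # ALU: regs[a] += regs[b], 16-bit wraparound
            regs[a] = (regs[a] + regs[b]) & 0xFFFF
        elif op == "DEC":           # ALU: decrement register a (b is unused)
            regs[a] = (regs[a] - 1) & 0xFFFF
        elif op == "JNZ":           # conditional secondary address jump
            if regs[a] != 0:
                pc = b
        elif op == "HALT":
            return regs
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("step limit reached")

# Sum 5+4+3+2+1 into register 0 using a countdown loop.
PROGRAM = [
    ("LOADI", 0, 0),   # r0 = 0 (running total)
    ("LOADI", 1, 5),   # r1 = 5 (loop counter)
    ("ADD",   0, 1),   # r0 += r1
    ("DEC",   1, 0),   # r1 -= 1
    ("JNZ",   1, 2),   # loop back to the ADD while r1 != 0
    ("HALT",  0, 0),
]
result = run(PROGRAM)
```

The IOP is left out of the sketch; in this picture it would be a second, slower loop of the same kind sharing the memory.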
MORE DETAIL:
- Finite State Machine: usually referring only to Program Location and Register values,
IOP likewise; allowed to run ad infinitum (or finitely large approximation thereto);
(theory-technically all data values comprise the machine state);
- Interrupt structure: levels, arm, enable, trigger, corresponding memory allocations;
- ALU: arithmetic add-subtract-negate, logical and-or-exor-complement,
shift-left-carry, shift-right-signextend-carry, partial-word-access;
- Instruction Set: typically vertical extent, constant-width, some multiples-thereof,
minorly variable subpart-fields, 1-of-N-decoded (e.g. 256 instructions per 8-bit-width);
- multiple-length instructions: instruction with long numeric literal,
'Huffman'-coded instruction set (less-often-used are longer instructions);
- compact number format: typ. binary, short-integer aka index, short zero-fill, short sign-extended,
long-integer, probability-fraction, floating-point integer-scale-characteristic-exponent and fraction-mantissa,
implicit high-order-bit, significant-digit-precision-monitoring, 2's-complement arithmetic/1's-complement
(high-order-carry-wraparound, unused/wiped-zero);
- Assistant-instructions: cache-control-functions, software-where-hardware-needs-more-time;
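The difference between 2's-complement and 1's-complement addition noted above (high-order-carry-wraparound) can be sketched as follows. The 8-bit word width and the function names are assumptions chosen for the example.

```python
WIDTH = 8                       # assumed word width for the example
MASK = (1 << WIDTH) - 1

def neg_twos(x):
    """2's-complement negation: complement and add one."""
    return (-x) & MASK

def neg_ones(x):
    """1's-complement negation: complement every bit."""
    return (~x) & MASK

def add_twos(a, b):
    """2's-complement add: the carry out of the top bit is simply dropped."""
    return (a + b) & MASK

def add_ones(a, b):
    """1's-complement add: the carry out of the top bit wraps around into bit 0."""
    total = a + b
    return ((total & MASK) + (total >> WIDTH)) & MASK   # end-around carry

five_minus_three_twos = add_twos(5, neg_twos(3))
five_minus_three_ones = add_ones(5, neg_ones(3))
three_minus_three_ones = add_ones(3, neg_ones(3))       # all ones: 'negative zero'
```

Both forms give 2 for 5-3, but 3-3 in 1's-complement lands on the all-ones pattern, the second representation of zero that 2's-complement does not have.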
MODERN EXTENSIONS:
- REGISTER-STACKS (3-4 x 16-64), REGISTER-QUEUE-STREAMS (3 x 64-1024),
REGISTER-STRING-DESCRIPTORS (3-4 x 10MB-1GB); half-as-many
top-registers, one-bit sub-access-control (push-append-shift-'output',
pop-depend-deshift-'input'); overflow-exception-handling; power-time-efficient
large-addressing (adjacency-decoding); N.B. codependency;
- INSTRUCTION SUBPROCESS PIPELINING (where not-already parallel);
- SUB-INDEXED-REGISTER-PAIRS: (pipeline decision-streamlining avoiding
conditional-program-flow-jumping: by equal-purpose-alternate-register-pairing)
one-bit switched-sub-selection, key-and-index streams, stream-length-cumulators,
key-statistics/averaging-cumulators (n-log-n-sorting-thresholds);
- SUB-INDEXED-FUNCTION-PAIRS: (pipeline decision-streamlining avoiding
conditional-program-flow-jumping: by equal-purpose-alternate-function-pairing)
- MULTI-ENVIRONMENT: banked machine-state-register-tags-flags, bank-per-environment
(typ. 4-16), assignable by task-priority-intensity-IOP-level, minimax-scheduling;
instantaneous low-overhead bank-switching, extensible into main memory, stacks, pages;
(N.B. secure independency for unrelated tasks; codependency with single kernel task;
shallower stacks-per);
- TRAPPING MECHANISM: instantaneous memory-indirection redirection-step,
subprocess-emulation-simulation-implementation-upgrade,
occasional-overflow-error-prevention processing; N.B.
event-conditional precedence-priority levels; N.B. a stack is queued in overflow;
- EXTENDED-ALU: multiplication (fast direct, efficient double-complement-algorithm,
semi-fast-or-serial e.g. 'Booth'-algorithm), conditional-subtract-shift-flag, division
(usually by SAR Successive Approximation Register), encryption, DES;
- COMPOUND-FUNCTIONS: (multi-register-read/write) complex-number arithmetic,
quotient-&-remainder, rotation-sum-&-difference, full-precision-multiplication,
address-descriptors, radix-(e.g. binary/bit)-interleaving/deinterleaving;
- EXPANSIVE-FUNCTIONS: decode/encrypt-wide, multiplicative-squaring,
(additive carry-and-overflow are like extra-bits just not full-word-extra);
- PASS-CACHE: auto-access copy-aside-and-reuse passing data (one-bit sub-access-control
reread-prior vs. reread-updated value; can obviate read-to-register);
- MEMORY-CACHE: volatile fast-memory; array-processing, instruction-program-loops;
- LOOP-CACHE: auto-access high efficiency program loops;
- ARRAY-CACHE: efficient array-based process structuring, 'matrix-row-cache', 'submatrix-cache';
- REGISTER-TABLES: fast, special-purpose register-files; paging-table for main memory,
spectral-array-processing;
- IOP-C-FIFO: inter-processor communications-FIFO;
control-data queue-streams (2 x 64-1024);
- AUTO-CODE: direct instruction-address-to-decode, faster-bigger variable-width
instructions, FPGA Field-Programmable Gate Array, nonlinear instruction-addressing,
(cf hyperthreading but which is more flexible-volatile); horizontal-architecture
super-instructions each assigned to a set of machine-state-addresses so-interleaved
(est. 2x-3x-faster than vertical-architecture), machine-state-determined-instructions,
program-ownership-protection (post-QUAL machine-state-flow-signature-detection);
- DATA-TYPING: indirection-control ('value-call/name-call'), system-and-process-protection;
- DATA-UNITING: units-tracking, units-conversion, units-checking (cf torque vs. work
both force×distance but torque is static-perpendicular and work is dynamic-inline);
- MULTI-CORE/GPU/RISC: graphics-processing-units (plural-units, now also general-purpose), reduced-instruction-set-(optimized)-CPUs;
specialized vector-processing;
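For the semi-fast-or-serial multiplication named under EXTENDED-ALU, a minimal sketch of the radix-2 'Booth' algorithm follows. The function name, the default 8-bit width and the one-bit guard extension on the accumulator (so the most negative multiplicand can be negated) are choices made for this example.

```python
def booth_multiply(multiplicand, multiplier, width=8):
    """Signed multiply of two width-bit values by radix-2 Booth recoding."""
    amask = (1 << (width + 1)) - 1       # accumulator keeps one guard bit
    qmask = (1 << width) - 1
    m = multiplicand & amask             # two's-complement image of the multiplicand
    acc, q, q_prev = 0, multiplier & qmask, 0
    for _ in range(width):
        q0 = q & 1
        if q0 == 1 and q_prev == 0:      # start of a run of ones: subtract
            acc = (acc - m) & amask
        elif q0 == 0 and q_prev == 1:    # end of a run of ones: add
            acc = (acc + m) & amask
        # arithmetic shift right of the combined [acc, q, q_prev] register
        q_prev = q0
        q = (q >> 1) | ((acc & 1) << (width - 1))
        acc = (acc >> 1) | (acc & (1 << width))      # replicate the sign bit
    product = ((acc << width) | q) & ((1 << (2 * width)) - 1)
    if product >> (2 * width - 1):       # reinterpret as a signed 2*width-bit value
        product -= 1 << (2 * width)
    return product

example = booth_multiply(3, -4)
```

Each pass costs one conditional add-or-subtract and one shift, which is why the scheme suits serial hardware; the fast direct multipliers mentioned in the same item spend far more logic to finish in fewer steps.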
BETTER EXTENSIONS:
- Vertical architecture instruction-set constant-width-loading tends to be more flexible,
more efficient with constant-width memory; loads faster (cf horizontal loads wide
superinstructions -and- addresses, though fewer of them);
- Array memory by paging-descriptor 'sparse' addressing to reduce micro-code address space,
secure arrays from each other, and from single-valued items, and attach indexing
(one register per, albeit multiple arrays may be defined as the same in perspectives);
- Rapid-sequential-memory-sub-blocking (single-level-adjacency-addressing-decoding);
- Semi-logical memory by trees: category-association and refinement, to reduce
micro-code address space;
- Elimination of micro-code indirect-addressing that only ever saved a bit of coding space but not time;
the advantage had been for variables assumed, dynamic, rather than tagged-so;
- Subcoded dynamic hardware configuration, register assignments, function assignments,
(runtime internal-register management, context-switched multi-emulators);
- Direct-cache-addressing, compiler-optimized cache-control-instructions, pre-caching, do-once-instructions;
- Address-encrypted memory-parity-sense rapid-halting of errant processing, (alt. use: testrun breakpointing);
- Intra-parity (instead of extra-bit) program-immediate-readable 'full' use of data
(cf '50%-legal' distance-2 op-coding but meanwhile-also bulk-data-CRC);
- Sub-calculable-cyclic-redundancy-checking (runtime-piecewise-CRC-updating with concurrent-rechecking)
bulk-memory;
- Address-field-subpartitioning by-page-passthrough transistor-fanout-speed-efficiency
(low-capacitance-FET-Off-mode);
- Pipeline-emulation of successive-approximation-processing hardware (high-overhead minimal-logic delayed-output
vs. low-overhead massive-logic prompt-output) by preshifting into the ALU;
- Auto-paging, associative-page/address-actuation-in-RAM-hardware, storage-sector-size-pages/half/double,
pages-powered-on-demand/delay, priority-encoding (to prevent mispaged-responses; cf carry-generate/propagate);
- Page-Scope/Compass/Utility priority-value/user-link-list, for retaining multi-use content
e.g. multiply-common-subroutines, (already effected by implicit usage-statistics, but should be more explicit
when any-user-pages, 'lower-priority', remain present);
- Sub-paging/RAM-cache-cloning of code-fragments/sub-subroutines in-use/compiled-local, (sync with original in case of update);
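One way to read the runtime-piecewise-CRC-updating item above: because a CRC is linear over XOR (up to a length-dependent constant), the checksum of a block after a small overwrite can be derived from the old checksum and the changed bytes alone. The sketch below shows this with the standard-library `zlib.crc32`; the function name `patched_crc` and its parameters are hypothetical, and the choice of CRC-32 is an assumption.

```python
import zlib

def patched_crc(old_crc, length, offset, old_bytes, new_bytes):
    """CRC-32 of a length-byte block after old_bytes at offset become new_bytes.

    Relies on crc(a XOR b) == crc(a) XOR crc(b) XOR crc(zeros) for equal lengths,
    so the unchanged part of the block is never reread.
    """
    if len(old_bytes) != len(new_bytes):
        raise ValueError("patch must not change the block length")
    delta = bytearray(length)                    # zero except where data changed
    for i, (o, n) in enumerate(zip(old_bytes, new_bytes)):
        delta[offset + i] = o ^ n
    zeros_crc = zlib.crc32(bytes(length))        # the length-dependent constant
    return old_crc ^ zlib.crc32(bytes(delta)) ^ zeros_crc

block = bytearray(b"The quick brown fox jumps over the lazy dog")
old_crc = zlib.crc32(block)
offset = block.index(b"dog")
updated = patched_crc(old_crc, len(block), offset, b"dog", b"cat")
block[offset:offset + 3] = b"cat"                # now apply the write itself
recomputed = zlib.crc32(block)                   # full rescan, for comparison only
```

In this software sketch the delta buffer still spans the whole block, so nothing is saved in time; the point is only that the update needs no access to the untouched data, which is what would let hardware recheck concurrently.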
ADVANCED CONCEPTS:
- ALU CHECKING: ('check-by-nines-and-elevens' which in binary converges as only 'check-by-threes'),
but also carry+overflow-checking...parallel processes, but ones that are not affected by timing and design 'flaws',
so that hardware does not 'reduplicate the trouble' to check for it...
- SUBKERNEL/speed-trap runs at half-Clockrate, (trap-to-lower-rate-and-retry-successive-approximation);
- System-authority to drop the Clockrate to a specified 'guaranteed-flawless' timing, (or instruction-clocking),
for better signal settling, to prevent system kernel failure and allow for soft-fail recovery and Clockrate control of
lower-authority processes, to ensure process resumption-if-possible, even to extend the operable
temperature/voltage-range...
- Checkpointing, with validity-testing, to allow for instantaneous backup and resume-recontinue;
- Simultaneous-parallel-multi-writeback-to-memory to rapidify and radiation-harden checkpointing, milgrade,
spacegrade, secgrade, (e.g. cosmic-ray bursts, 1000yr-spaceship, bank vault)...
- Administrative recording efficiency (exp. fewer older, millennial-recycling, parallel-simulation);
- 'Self Test And Repair' ['STAR'] system design and-or configuration;
- RAM/P: ALU-pipeline-processing at the RAM-level, obviating access-time between CPU and RAM, where the
RAM-word/block-itself is a pipeline-slice-buffer; Gbops fast reduced-data-routing-distributed-process-slivering/slicing;
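The 'check-by-threes' under ALU CHECKING can be sketched as a residue check. The reading assumed here: decimal casting-out works modulo radix-1 and radix+1 (9 and 11), which in binary are 1 (trivial) and 3, leaving only the modulo-3 check. The function name and the `op` strings are made up for the example.

```python
def residue3_ok(a, b, result, op):
    """True if result is consistent, modulo 3, with a op b."""
    if op == "add":
        expected = ((a % 3) + (b % 3)) % 3
    elif op == "mul":
        expected = ((a % 3) * (b % 3)) % 3
    else:
        raise ValueError(f"unsupported op {op!r}")
    return result % 3 == expected

a, b = 1234, 5678
good_sum = a + b
# Flip each of the low 16 bits in turn, as a single-bit ALU fault would.
faulty_sums = [good_sum ^ (1 << bit) for bit in range(16)]
detected = [not residue3_ok(a, b, s, "add") for s in faulty_sums]
```

Every single-bit flip changes the value by a power of two, and no power of two is divisible by 3, so each such fault is caught; the checker itself is a narrow two-bit datapath rather than a duplicate of the ALU.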
ADDITIONAL CONCEPTS: security, routing, operating systems [2019]
- ...events should be added to a priority-time-sorted-queue (new data format) with gentle rate-tracking, (a
responsivity-expectability problem we'd seen in the Internet-web where lost packets caused path-processing resets
rather than scaling back the fill-proportion)...
- TIME GRAINING: (cf RAM-Paging) timelocked integral processing by CPU/Core-resource-release-timer
(typ. 3-1023 cycle ~ 3-1023 nsec × GHz) uninterruptible 'grains' of CPU/Core-time, short enough to not-interfere
with thread-priority/interactivity-scheduling (typ. 20-5000×longer ~ 20 µsec ~ 20 kc CPU-dwell-increments),
long enough to improve single-Core process-cohesion (atomicity; contrast single-data Compare-Conditional-Swap;
cf independent multicore sharing processes advancing at square-root-of-time 'random-walk')... compiler-optimized
instruction-order-closeness... n.b. timer-setting releases-and-relocks... a 'guarddog'-timer...
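The priority-time-sorted-queue for events mentioned above might look like the following minimal sketch, built on the standard-library `heapq`. The class name, the convention that a lower number means higher priority, and the insertion-order tie-break are assumptions; the gentle rate-tracking is not modeled.

```python
import heapq
import itertools

class EventQueue:
    """Events come out by priority first, then by due time, then by arrival order."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()    # tie-breaker; events are never compared

    def push(self, priority, due_time, event):
        entry = (priority, due_time, next(self._arrival), event)
        heapq.heappush(self._heap, entry)

    def pop(self):
        _priority, _due_time, _order, event = heapq.heappop(self._heap)
        return event

    def __len__(self):
        return len(self._heap)

queue = EventQueue()
queue.push(1, 5.0, "refresh display")
queue.push(0, 9.0, "disk interrupt")
queue.push(1, 2.0, "network packet")
order = [queue.pop() for _ in range(len(queue))]
```

Insertion and removal are both logarithmic in the queue length, so a burst of late or lost events lengthens the queue without forcing a reset of whatever is consuming it.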
THE SYSTEM CONCEPT: Exponential Bootup
- BOOT (hardware) tiny hardwired code, typically main memory; starts up 1-4 environment threads to self-check,
cross-check, load and startup the next level;
- COMPUTER (software)
- SYSTEM (software)
- MANAGER (software)
- USER (software) second party application software, compilers, word processing, data tabulation;
- BENEFACTION (software) performance applications, peripherals, storage, third-party goods and services;
- GOVERNMENT (background documentation) lawbase, global and archaic protocols, agreements, decorum;
REFINEMENTS, IMPROVEMENTS:
- Hybrid immediate-value numbers:
- e.g. high-low-2+2-bits exponential+tweak expansion-decode-8-bit-to-31-bit (positive-only-32-bit):
- :8-bit-values v(8) = 0-55 [00000000-00110111] high-end-zero-extend directly to the same, 31-bit v(31) = 0-55;
- :8-bit-values v(8) = 56-255 [00111000-11111111] expand as bit-fields n(5),y(1),x(2) to v(31) = 1(1),y(1),0(n-4),x(2);
- i.e. v(31) = 0,1,2,3,4,...,54,55,(+8),64,65,66,67,(+28),96,97,98,99,(+28),128,129,130,131,(+60),192,...,...,...,3×2^30+3;
- alt. non-field-parsed high-low-2+2-bits exponential+tweak expansion-decode-8-bit-to-32-bit (full):
- :8-bit-values v(8) = 0-43 [00000000-00101011] extend likewise, 44-255 [00101100-11111111] expand similarly;
- i.e. v(31) = 0,1,2,3,4,...,42,43,(+4),48,49,50,51,(+12),64,65,66,67,(+28),96,97,98,99,(+28),128,129,...,...,...,3×2^31+3;
- alt. v(31) = 0,1,2,3,4,...,38,39,(+8),48,49,50,51,52,53,54,55,(+8),64,65,66,67,(+28),96,97,98,99,(+28),...,...,...,3×2^31+3.
- Rich-detail numbers:
- e.g. floatingpoint: sign, radix-2 scale, implicit MSB '1.0+', fraction, units-code (ibidem), infinity-code (or scale);
- also: deviation energy (cf significant digits), 2nd-order-deviation energy (or energy-form, TBD);
- also: finity weakness (cf infinite-infinitesimal product/integration);
- alt. floatingpoint-integer: scale=0 for integer, signed-scale for floatingpoint;
- alt. hypo-unity-floatingpoint: nonpositive-scale or mostly-negative-scale (cf cosmic-lightspeed relativistic);
- alt. differentiated-floatingpoint: differential part (cf derivative part, cf position+momentum);
- e.g. probability-fraction: implicit '+0.5' LSB (no 'absolute-0', no 'absolute-1') [0.000...05-0.999...95];
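The first hybrid immediate-value decode listed above (8-bit to 31-bit, fields n(5),y(1),x(2)) can be sketched as follows. Treating n as the total bit-length of the expanded value is an assumption, chosen because it reproduces the listed values 64, 96, 128 and 192; the top of the range is not checked here, and the function name is made up.

```python
def decode_hybrid8(v8):
    """Expand an 8-bit immediate: 0-55 pass through, 56-255 decode exponentially."""
    if not 0 <= v8 <= 255:
        raise ValueError("expected an 8-bit value")
    if v8 < 56:
        return v8                    # high-end zero-extend, value unchanged
    n = v8 >> 3                      # 5-bit field: bit-length of the result (7..31)
    y = (v8 >> 2) & 1                # 1-bit 'tweak' just below the leading one
    x = v8 & 3                       # 2 low-order bits copied through
    # layout: 1(1), y(1), 0(n-4), x(2)
    return (1 << (n - 1)) | (y << (n - 2)) | x

samples = [decode_hybrid8(v) for v in (55, 56, 57, 59, 60, 63, 64, 67, 68)]
```

The decode is a shift and two ORs with no table lookup, which is what makes such a format plausible as an instruction-immediate field.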
A premise discovery under the title,
© 2009,2014,2019 GrandAdmiralPetry@Lanthus.net