Section Seven: Weird and Innovative Chips
Part I: Intel 432, Extraordinary complexity
The Intel iAPX 432 was a complex, object oriented 32-bit processor that included high level operating system support in hardware, such as process scheduling and interprocess messaging. Originally named the 8800 (a progression from the previous 8008 and 8080), it was intended to be Intel's main microprocessor. The 16-bit 8086 was envisioned as a short term "plan B" product until the 432 was available, so little effort was spent on its design (some say two engineers took only three weeks, but that was probably only the initial architecture). Others say the 8086 was envisioned as a step between the 8080 and the 432. The 432 actually included four chips. The GDP (processor) and IP (I/O controller) were introduced first, and the BIU (Bus Interface Unit) and MCU (Memory Control Unit) came later (but were not widely used). The GDP complexity was split into 2 chips (decode/sequencer and execution units, like the Western Digital MCP-1600), so it wasn't really a microprocessor.
The GDP was exclusively object oriented: normal linear memory access was not allowed, there was hardware support for data hiding, methods, inheritance, late binding, and access protection, and it was promoted as being ideal for the Ada programming language. To enforce this, permission and type checks on every memory access (via a 2 stage segmentation) slowed execution (despite cached segment tables). It supported up to 2^24 segments, each limited to 64K in size (within a 2^40 address space), but the object oriented nature of the design meant that was not a real limitation. The stack oriented design meant the GDP had no user data registers. Instructions were bit encoded (and bit-aligned in memory), ranging from 6 bits to 321 bits long (the T-9000 has variable length byte encoded/aligned instructions), and could be very complex.
The BIU defined the bus, designed for multiprocessor support, allowing many modules (BIU or MCU) on a bus and up to 8 independent buses (allowing memory interleaving to speed access). The MCU did automatic parity checking and ECC error correction. The total system was designed to be fault tolerant to a large degree, and each of these parts contributed to that reliability.
Despite these advanced features, the 432 didn't catch on. The main reason was that it was slow, sometimes up to five or ten times slower than a Motorola 68000 or Intel's own 80286. Part of this was the lack of local (user) data registers, or a data cache. Part of this was the fault-tolerant BIU, which defined a clocked bus with an asynchronous protocol that resulted in a large share of the access time being spent in wait states. The instructions weren't aligned on bytes or words, and took longer to decode. In addition, the protections imposed on the objects slowed data access. Finally, the implementation of the GDP on two chips instead of one produced a slower product. However, the fact that this complex design was produced and bug free is impressive.
Its high level architecture was similar to the Transputer systems, but it was implemented in a way that was much slower than other processors, while the T-414 was not just innovative, but much faster than other processors of the time.
The Intel i960 is sometimes considered a successor of the 432 (also called "RISC applied to the 432"), and does have similar hardware support for context switching. This path came about indirectly through the 960MC designed for the BiiN machine, which was still very complex (it included many 432 object-oriented ideas, including a tagged memory system). The 960MC design predated the released i960 (which removed tag bits and complex instruction microcode), but was released later.
Part II: Rekursiv, an object oriented processor
The Rekursiv processor is actually a 4 chip processor motherboard, not a microprocessor, but it is neat. It was created by a Scottish Hi-Fi manufacturing company called Linn, to control their manufacturing system. The owner (Ivor) was a believer in automation, and had automated the company as much as possible with VAXes, but was not satisfied, so he hired software experts to design a new system, which they called LINGO. It was completely object oriented, like Smalltalk (and unlike C++, which allows object concepts but handles them in a conventional way), but too slow on the VAXes, so Linn commissioned a processor designed for the language.
This is not the only processor designed specifically for a language that is slow on other CPUs. Several specialized LISP processors, such as the Scheme-79 lisp processor, were created, but this chip is unique in its object oriented features at a time when the concept was not well-known (actually, I hadn't the foggiest idea of what object-oriented programming was when I first learned about it, as is obvious from reading unrevised versions of this description). It also manages to support objects without the slowness of the Intel 432. The Rekursiv processor features a writable instruction set, and is highly parallel. The four chips were Numerik, Logik, Objekt, and Klock. The CPU itself consisted of Numerik and Logik. Numerik was the ALU, based on AMD bit-slice CPU components (registers, ALU, barrel shifter, multiplier). The CPU was similar to the Patriot Scientific PSC, with sixteen registers, an evaluation stack, and a return address stack. A writable control store held the microcode, allowing an instruction set to be constructed on the fly, and the instruction set could change for different objects. There were two program counters, one for application instructions and one for the microcode routines which implement them. Microcode used wide, multi-field words. Logik was the instruction sequencer. Objekt was the object manager/MMU, which swapped objects to and from disk as needed (completely invisible to the CPU, allowing microcoded instructions to access objects without generating exceptions and forcing them to roll back and restart; microcode could be recursive, hence the processor's name). Objects were identified by numeric tags, with the actual reference, types, and sizes stored in three 64K hash tables (collisions were resolved by a fourth table holding the ID of the object actually stored in the tables at a given time).
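The object lookup described above can be pictured as direct-mapped tables indexed by a hash of the object ID, with a fourth table recording which ID currently owns each slot, so a collision is detected by comparing IDs. The Python sketch below is only an illustration under stated assumptions: the hash (low 16 bits of the ID), the evict-on-collision policy, and the names `ObjectTables`, `install`, and `lookup` are hypothetical, not taken from Rekursiv documentation.

```python
TABLE_SIZE = 1 << 16  # 64K entries, matching the table size given in the text


class ObjectTables:
    """Hypothetical sketch: three data tables plus a fourth 'owner ID' table."""

    def __init__(self, size=TABLE_SIZE):
        self.size = size
        self.address = [None] * size   # where the object's data currently lives
        self.type = [None] * size
        self.length = [None] * size
        self.owner_id = [None] * size  # which object ID occupies each slot

    def _slot(self, obj_id):
        # Assumed hash: the low 16 bits of the object ID.
        return obj_id % self.size

    def install(self, obj_id, address, obj_type, length):
        """Place an object in its slot; return the ID it displaced, if any."""
        slot = self._slot(obj_id)
        evicted = self.owner_id[slot]
        if evicted == obj_id:
            evicted = None
        self.owner_id[slot] = obj_id
        self.address[slot] = address
        self.type[slot] = obj_type
        self.length[slot] = length
        return evicted

    def lookup(self, obj_id):
        """Return (address, type, length) if resident, else None (a miss)."""
        slot = self._slot(obj_id)
        if self.owner_id[slot] != obj_id:
            return None
        return self.address[slot], self.type[slot], self.length[slot]


tables = ObjectTables()
tables.install(0x12345, 0x4000, "String", 12)
print(tables.lookup(0x12345))   # (16384, 'String', 12)
# A second ID that hashes to the same slot displaces the first one.
print(tables.install(0x12345 + TABLE_SIZE, 0x5000, "Point", 8))  # 74565
print(tables.lookup(0x12345))   # None
```

A miss here simply means the object must be brought back in, which is the kind of work the text says Objekt hid from the CPU.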
Objects could be relocated transparently, facilitating garbage collection. Klock was the clock and support circuitry. The Rekursiv executed LINGO fast enough, and is a perfect match between language and CPU, but it could also use more conventional languages, such as Smalltalk or C. Unfortunately, Linn did not have the resources to pursue this very promising architecture (the prototype was "surprisingly easy" to implement). However, the writable instruction set concept (specifically, isolating CPU implementation from the program code) was resurrected and automated in the Transmeta Crusoe (Intel x86 instruction set on a custom VLIW processor).
Rekursiv: http://www.brouhaha.com/~eric/retrocomputing/rekursiv/
Part III: MISC M17: Casting Forth in Silicon [1]
Forth is used widely for programming embedded systems because of its simplicity and efficiency. It explicitly manipulates data on a stack, and so defines a simple virtual machine architecture which makes programs independent of the CPU: only the interpreter needs to be ported. Because of this, extra CPU features are wasted when running Forth programs, and since cost reduction is important to embedded systems, it's logical to want a simpler, cheaper CPU which runs only Forth programs.
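To make the "simple virtual machine" concrete, here is a minimal Python sketch of a Forth-style interpreter loop. For brevity it assumes only integer literals and a handful of words (`+`, `-`, `*`, `dup`, `swap`, `drop`); real Forth systems also have a return stack, a dictionary of user-defined words, and much more. Porting programs written for this machine means porting only this loop, which is the property the paragraph describes.

```python
def run(source):
    """Interpret a whitespace-separated Forth-like program; return the data stack."""
    stack = []
    binary_ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    for word in source.split():
        if word in binary_ops:
            b = stack.pop()          # top of stack is the right-hand operand
            a = stack.pop()
            stack.append(binary_ops[word](a, b))
        elif word == "dup":
            stack.append(stack[-1])
        elif word == "swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif word == "drop":
            stack.pop()
        else:
            stack.append(int(word))  # anything else is treated as a number literal
    return stack


print(run("2 3 + 4 *"))   # [20]
print(run("10 3 -"))      # [7]
print(run("5 dup *"))     # [25]
```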
The Minimum Instruction Set Computer (MISC) Inc. M17 CPU was not the first Forth microprocessor (the Novix NC4016 (1985?) designed by Forth inventor Chuck Moore came before), but the M17 is a good example of low cost Forth CPUs. It featured two 16 bit stack pointers (Data and Return (subroutine) stacks), plus three 16-bit top of stack data registers (X, Y, Z, plus an extra LastX which could hold values popped from X). An I/O register buffered data during I/O while the ALU operated concurrently. Finally, there was an Index register which normally held the top element of the Return stack, but could also be used as a loop counter, and a 6 instruction buffer (for short loops, like the Motorola 68010). Address space was 64K, but external memory could be either a single bank or up to five banks, signaled by status pins, depending on the context: data stack, return stack, program code, A or B buffers. Some other Forth processors include on chip stack memory, and while most (including the M17) were 16-bit, some 32-bit Forth processors have also been developed.
The simplicity of design allows the M17 (and most other Forth CPUs, such as the more recent 7,000 transistor MuP21, also designed by Chuck Moore, which includes a composite video generator on chip) to execute instructions in only two cycles (load, execute), or one cycle each from the instruction cache, making them faster than more complex CPUs (though instructions do less, the higher clock speed usually compensates). Stack advocates often cite this as the strongest advantage for stack based designs, though critics contend that because every stack instruction implicitly depends on the current top of the stack, stacks make conventional speedup tricks such as pipelining and superscalar execution far more complex than a register array does. As it is, register-based load-store processors dominate when it comes to speed.
Other prominent Forth-based microprocessors include the Harris RTX 2000, a descendant of the NC4016 (the "2000", like the name of the Motorola 68000, comes from the fact that it only uses about 2000 gates in its design), which has the ability to group certain instructions like the T-9000 Transputer and microJava processors. Chuck Moore went on to design the 20-bit MuP21, and is involved in the highly integrated F21 CPU. A 32-bit CPU, the FRISC-3 (Forth Reduced Instruction Set Computer), was produced by Silicon Composers and renamed the SC32; it includes an automatic stack-to-memory cache, eliminating the main weakness of Forth chips, the fixed stack sizes.
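The M17's X, Y and Z registers act as a small register cache for the top of the data stack. The Python model below is a hypothetical illustration of that idea only: it assumes the item pushed out of Z goes to a memory-resident part of the stack and comes back when the stack shrinks, and that LastX keeps the most recently popped X value. The class and method names are invented, and no claim is made that the M17 moves data in exactly this way.

```python
class CachedStack:
    """Data stack whose top three items live in registers X, Y, Z (hypothetical model)."""

    def __init__(self):
        self.x = self.y = self.z = 0
        self.last_x = 0      # value most recently popped from X
        self.memory = []     # deeper items, standing in for stack RAM
        self.depth = 0       # total number of items on the stack

    def push(self, value):
        if self.depth >= 3:
            self.memory.append(self.z)          # Z spills to memory
        self.z, self.y, self.x = self.y, self.x, value
        self.depth += 1

    def pop(self):
        if self.depth == 0:
            raise IndexError("data stack underflow")
        value = self.x
        self.last_x = value
        self.x, self.y = self.y, self.z
        self.z = self.memory.pop() if self.memory else 0  # refill Z from memory
        self.depth -= 1
        return value

    def add(self):
        """Forth '+': pop two items, push their sum."""
        b = self.pop()
        a = self.pop()
        self.push(a + b)


s = CachedStack()
for n in (1, 2, 3, 4):
    s.push(n)
print(s.x, s.y, s.z, s.memory)   # 4 3 2 [1]
s.add()
print(s.x, s.y, s.z, s.memory)   # 7 2 1 []
```

Only pushes beyond three items and pops that drain the registers touch memory, which is why small expression evaluations can run entirely in registers.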
[1] Sun Microelectronics' first slogan for its Java processors was "Casting Java in Silicon".
Stack Computers: the new wave (online book): http://www.cs.cmu.edu/~koopman/stack_computers/index.html
http://www.ultratechnology.com/chips.htm
Part IV: AT&T CRISP/Hobbit, CISC amongst the RISC (1992)
The AT&T Hobbit ATT92010 (around 1992) was a commercial version of the CRISP processor, inspired by the Bell Labs C Machine project, aimed at a design optimized for the C language (designed in part by David Ditzel, who later worked on the 64-bit SPARC). Like the AMD 29000, Hobbit keeps the top of the stack in an on-chip stack cache (in Hobbit it's much smaller, but is easily expandable), and Hobbit has no global registers. Addresses can be memory direct or indirect (for pointers) relative to the stack pointer without extra instructions or operand bits. The cache is not optimized for multiprocessors.
Hobbit has an instruction prefetch buffer (3K in the 92010, 6K in the 92020), but decodes the variable length (1, 3 or 5 halfword (16 bit)) instructions into a thirty-two entry instruction cache. Branches are not delayed, and a prediction bit directs speculative branch execution. The decode unit folds branches into the decoded instructions (which include next and alternate next PC), so a predicted branch does not take any clock cycles. The three stage execution unit takes instructions from the decode cache. Results can be forwarded when available to any prior stage as needed.
Though CISC in philosophy, the Hobbit is greatly simplified compared to traditional memory-data designs, and features some very elegant design features. AT&T prefers to call it a RISC processor, and performance is comparable to similar load-store designs such as the ARM. Its most prominent use was in the EO Personal Communicator, a competitor to Apple's Newton (which used the ARM processor), as well as a prototype development machine for BeOS. The product and name were discontinued.
As an aside, the complexity in making a stack-based CPU fast led fellow AT&T researchers working on the Inferno operating system to decide on a register based virtual machine, rather than a stack-based one like Sun's Java and Microsoft's .NET IL.
Wide hardware and applications support for AT&T Hobbit chips: http://www.att.com/press/ /921116.mea.html
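The Hobbit branch folding described above can be sketched in Python as a decode step that absorbs a conditional branch into the preceding operation's decoded entry, which then carries both a predicted next PC and an alternate next PC. The raw instruction format (tuples of kind, payload and length in halfwords), the `Decoded` record and the `decode` function are all hypothetical; Hobbit's actual decoded-instruction format is not described here.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical raw instruction format, addressed in halfwords:
#   ("op", name, length)                       an ordinary operation
#   ("br", (target, predict_taken), length)    a conditional branch with a prediction bit
Instruction = Tuple[str, object, int]


@dataclass
class Decoded:
    op: str
    next_pc: int                  # predicted next PC
    alt_pc: Optional[int] = None  # alternate next PC, used if a folded branch mispredicts


def decode(program: Dict[int, Instruction], pc: int) -> Decoded:
    """Decode the operation at pc, folding a directly following branch into it."""
    kind, payload, length = program[pc]
    if kind != "op":
        raise ValueError("this sketch only decodes starting at ordinary operations")
    fall_through = pc + length
    following = program.get(fall_through)
    if following is not None and following[0] == "br":
        (target, predict_taken), branch_length = following[1], following[2]
        after_branch = fall_through + branch_length
        if predict_taken:
            return Decoded(payload, next_pc=target, alt_pc=after_branch)
        return Decoded(payload, next_pc=after_branch, alt_pc=target)
    return Decoded(payload, next_pc=fall_through)


# A loop body: the branch at address 4 is folded into "sub", so it never issues on its own.
program = {
    0: ("op", "load", 3),
    3: ("op", "sub", 1),
    4: ("br", (0, True), 1),
    5: ("op", "store", 1),
}
print(decode(program, 3))  # Decoded(op='sub', next_pc=0, alt_pc=5)
```

If the prediction turns out wrong, execution simply continues at `alt_pc`; when it is right, the branch has cost nothing, matching the zero-cycle predicted branch in the text.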
While the transputers were originally faster than their contemporaries, recent load-store designs have surpassed them. The T-9000 was an attempt to regain the lead. It starts with the architecture of the T-800, which contains only three 32-bit integer and three 64-bit floating point registers, which are used as an evaluation stack; they are not general purpose. Instead, like the TMS 9900, it uses memory, addressed relative to the workspace register (the 9900 workspace contained only sixteen registers, while the Transputer workspace can be any length, though access slows down with every 4 bits used for the offset from the workspace register: the first sixteen locations can be accessed with just one instruction, the first 256 need two, and so on). This allows very fast context switching, in less than a microsecond, speeding and simplifying process scheduling enough that it is automated in hardware (supporting two priority levels and event handling (link messages and interrupts)). The Intel 432 also attempted some hardware process scheduling, but was unsuccessful. Unlike the TMS 9900, the T-9000 is far faster than memory, so the CPU has several levels of high speed caches and memory types. The main cache is 16K, and is designed for 3 reads and 1 write simultaneously. The workspace cache is based on rotating word buffers, and allows 2 reads and 1 write simultaneously. Instructions are in bytes, consisting of a 4 bit op code and 4 bit data (usually a word offset into the workspace), but prefix instructions can load extra data for an instruction which follows, 4 bits at a time.
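The prefixing scheme is easy to express in code. The Python sketch below follows the encoding generally documented for the Transputer instruction set: each byte holds a 4-bit function and 4 bits of data, `pfix` (function 2) shifts the accumulated operand left by 4 bits, and `nfix` (function 6) complements the accumulated operand before shifting, for negative values. Only `ldl` (load local, function 7) is used as the final instruction, and the helper names `encode` and `decode` are illustrative. It also checks the counts in the text: offsets below 16 take one byte, offsets below 256 take two.

```python
PFIX, NFIX, LDL = 0x2, 0x6, 0x7  # Transputer function codes for pfix, nfix, ldl


def encode(function, operand):
    """Encode a direct function and operand as prefix bytes plus a final byte."""
    if 0 <= operand < 16:
        return [(function << 4) | operand]
    if operand >= 16:
        return encode(PFIX, operand >> 4) + [(function << 4) | (operand & 0xF)]
    # Negative operand: nfix complements what has been accumulated so far.
    return encode(NFIX, (~operand) >> 4) + [(function << 4) | (operand & 0xF)]


def decode(code):
    """Return (function, operand) pairs, folding prefixes into an operand register."""
    result, oreg = [], 0
    for byte in code:
        function, data = byte >> 4, byte & 0xF
        oreg |= data
        if function == PFIX:
            oreg <<= 4
        elif function == NFIX:
            oreg = (~oreg) << 4
        else:
            result.append((function, oreg))
            oreg = 0  # operand register clears after each real instruction
    return result


for offset in (5, 200, 256, -3):
    code = encode(LDL, offset)
    print(offset, [hex(b) for b in code], decode(code))
# 5 ['0x75'] [(7, 5)]
# 200 ['0x2c', '0x78'] [(7, 200)]
# 256 ['0x21', '0x20', '0x70'] [(7, 256)]
# -3 ['0x60', '0x7d'] [(7, -3)]
```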
Less frequent instructions can be encoded with 2 bytes (such as process start, message I/O) or more (CRC calculations, floating point operations, 2D block copies and scheduler queue management). The stack architecture makes instructions very compact, but executing one instruction byte per clock can be slow for multibyte instructions, so the T-9000 has a grouper which gathers instruction bytes (up to eight) into a single CISC-type instruction, which is then sent into the 5 stage pipeline (fetching four per cycle, grouping up to 8 if slow earlier instructions allow it to catch up). For example, two concurrent memory loads (simple or indexed), a stack/ALU operation and a store (a[i] = b[2] + c[3]) can be grouped. The T-9000 contains 4 main internal units: the CPU; the VCP (Virtual Channel Processor, which handles communication over the links, a job that needed software on the previous chips); the PMI, which manages memory; and the Scheduler. This processor is ideal for a model of parallel processing known as systolic arrays (a pipeline is a simple example). Even larger networks can be created with the C104 crossbar switch, which can connect transputers or other C104 switches into a network hundreds of thousands of processors large. The C104 acts like an instant switch, not a network node, so the message is passed through, not stored. Communication can be at close to the speed of direct memory access.
Like many CPUs, the Transputers can adapt to buses of different widths, down to 8 bits. They can also feed off a 5 MHz clock, generating their own faster internal clock from this signal, and they contain internal RAM, making them good for high performance embedded applications. Unfortunately, excessive delays in the T-9000 design (partly because of the stack based design) left it uncompetitive with other CPUs. The T-4xx and T-8xx architecture still exists in the SGS-Thomson ST20 microcore family. SGS-Thomson and Hitachi teamed up for a successor based on the Hitachi SH-4, named SH-5 by Hitachi. As a note, the T-800 FPU is probably the first large scale commercial device to be proven correct through formal design methods. To simplify interrupt handling, the multi-cycle square root instruction was implemented as single cycle "step" instructions, executed three (single precision) or seven (double precision) times to perform a complete square root, a strategy also used in the first SPARC systems for integer multiply.
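The idea of splitting a long operation into short, repeatable "step" instructions can be illustrated with an integer square root that decides one result bit per step, so an interrupt could be taken between any two steps. This is a hypothetical Python sketch of the general technique (the classic bit-by-bit binary square root), not the T-800's floating point algorithm; `sqrt_step` and `isqrt_by_steps` are invented names.

```python
import math


def sqrt_step(state):
    """One 'step': decide one more bit of the root. State is (remainder, root, bit)."""
    remainder, root, bit = state
    if remainder >= root + bit:
        remainder -= root + bit
        root = (root >> 1) + bit
    else:
        root >>= 1
    return remainder, root, bit >> 2


def isqrt_by_steps(n, width=32):
    """Integer square root of an unsigned width-bit n, using width // 2 step calls."""
    if not 0 <= n < (1 << width):
        raise ValueError("n must fit in an unsigned width-bit word")
    state = (n, 0, 1 << (width - 2))  # start at the highest power of four that fits
    for _ in range(width // 2):       # fixed step count, like a fixed instruction sequence
        state = sqrt_step(state)
    return state[1]


print(isqrt_by_steps(144))                        # 12
print(isqrt_by_steps(2**32 - 1))                  # 65535
print(isqrt_by_steps(1000) == math.isqrt(1000))   # True
```

Each step is short and leaves all its progress in the state tuple, which is what makes the sequence safe to interrupt.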
SGS-Thomson Products Contents: http://www.st.com/stonline/books/index.htm
Part VII: Sun picoJava, not another language-specific processor!
Sun first introduced Java as a combination of language, integrated classes, and a run time system called the Java Virtual Machine (JVM). To support Java, Sun Microelectronics designed picoJava and microJava hardware to execute Java bytecode programs faster than a virtual machine or recompiled code.
The picoJava I is a stack oriented CPU core like the JVM, with a stack cache (similar to the Patriot Scientific ShBoom PSC), but there are interesting differences between it and Forth-style stack CPUs. Java only uses a single stack (like many languages such as C, which the AT&T Hobbit and AMD 29K were designed to support), and the picoJava CPU enhances performance with a "dribbler" unit which constantly updates a complete copy of the stack cache in memory, without affecting other CPU operations (similar to a write-back cache), so stack frames can be added without waiting for a stack frame to be stored. Some Java instructions are complex, so the CPU has microcoded instructions, and a 4 stage pipeline (fetch, decode, execute/cache, stack writeback). Finally, picoJava groups (or "folds") load and stack operations together, executing both at once (treating the top of stack as an accumulator). This is a much simpler version of the instruction grouping tried in the Transputer T-9000, and usually eliminates 63% of stack operation inefficiency. Seldom used instructions aren't implemented, but are emulated using trap handlers.
The picoJava II (October 1997) core is used in the first actual CPU from Sun, the microJava 701. It extends the pipeline to 6 stages, and can fold up to four instructions into one operation. It also adds an FPU and separate instruction and data caches. Following waning interest, Sun released the picoJava core design (as well as certain older SPARC designs) to the public (with certain reserved rights) as a type of "open source" CPU, manufactured by Fujitsu, among others. Although Sun's CPU did not include peripherals such as I/O or timers, licensed versions do.
While Sun delivered the picoJava CPU first, an engineering group at Rockwell Collins also created a Java CPU called JEM1, but the group was spun off in July 2000 into a company called aJile to produce the aJ-100. The aJ-100 implements thread control instructions, unlike the picoJava, which emulates them in software or using an OS. It is also multithreaded, supporting two JVMs operating independently. A lower cost version uses an 8-bit data bus rather than the wider bus of the aJ-100.
Advancel initially made designs based on the picoJava I and II, but later designed their own TinyJ CPU, which translates simple Java bytecodes to a conventional load/store execution unit (like the ARM CPU). Complex bytecodes are trapped and emulated.
The ALU is a load-store style unit with sixteen registers, a "top of stack" accumulator used in bytecode interpretation, and a four stage pipeline with variable length (one to four byte) instructions. Non-Java programs are executed directly, and Java programs are interpreted using the hardware bytecode decoder, while the rest of the JVM runs directly as a conventional non-Java program (various JVMs can be used).
Sun Microsystems: http://www.sun.com/
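The picoJava instruction folding described above can be sketched as a peephole pass over bytecodes: a run of `iload`, `iload`, an arithmetic bytecode and `istore` collapses into a single three-operand operation, the four-instruction case the picoJava II is described as handling. The tuple representation and the output operation names (`add`, `sub`, `mul`) are hypothetical, and real hardware does this matching in its instruction buffer rather than as a separate pass.

```python
# Hypothetical bytecode representation: (mnemonic, operand...) tuples.
FOLDABLE = {"iadd": "add", "isub": "sub", "imul": "mul"}


def fold(bytecodes):
    """Collapse iload/iload/<op>/istore runs into (op, dest, src1, src2) operations."""
    out, i = [], 0
    while i < len(bytecodes):
        w = bytecodes[i:i + 4]
        if (len(w) == 4
                and w[0][0] == "iload" and w[1][0] == "iload"
                and w[2][0] in FOLDABLE and w[3][0] == "istore"):
            out.append((FOLDABLE[w[2][0]], w[3][1], w[0][1], w[1][1]))
            i += 4                     # four stack bytecodes become one operation
        else:
            out.append(bytecodes[i])   # anything else passes through unchanged
            i += 1
    return out


# local3 = local1 + local2; return local3
code = [("iload", 1), ("iload", 2), ("iadd",), ("istore", 3), ("iload", 3), ("ireturn",)]
print(fold(code))  # [('add', 3, 1, 2), ('iload', 3), ('ireturn',)]
```

The folded form never pushes the two operands onto the stack, which is where the savings in stack traffic come from.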
Part VIII: Imsys Cjip, embedded WISC (Writable Instruction Set Computer)
Swedish company Imsys AB started making components for embedded imaging systems, and decided to expand into more general microcontroller systems with the Cjip (pronounced … somehow).
Binary compatibility has been a problem since the beginning of programmable computers, in that it ties software (abstract, theoretical) to particular hardware (fixed, physically limited). There have been attempts to reduce this through hardware using rewritable microcode (Western Digital MCP-1600), as well as through software (the Patriot Scientific ShBoom PSC, which recompiles Java bytecodes to its native instruction set when loaded). Since the Cjip is a very low resource CPU, the software overhead would be unacceptable, so Imsys followed the hardware approach using rewritable microcode. Imsys had some experience with UCSD Pascal, an early VM system. Unlike the DEC Alpha (PALCode), Cjip uses actual wide microcode, which is far more efficient but harder to program, while unlike the MCP-1600, Cjip microcode can be modified at runtime. In addition, instructions can be emulated with regular program subroutines. Four initial instruction sets are available: a legacy Z80-style set, and three stack-based virtual machines: C/C++ and 32-bit Forth, Java, and a second Forth. The microcode sees four banks of bytes, split into: evaluation stack, internal locals stack (microcode subroutines), general data (emulated registers), and microcode internal variables. The evaluation and data stacks spill into external RAM. The general data stack is in external memory only. Language-specific processors have generally failed, because the economies of widespread use of general-purpose processors allow new technology to be incorporated into them more quickly.
The difference with Cjip is that its language support is not limited to just one language, or to any language at all. It will be interesting to see if the advantages of generalized language support are enough to win acceptance over competing processors.
Imsys AB: http://www.imsys.se/
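A writable instruction set like the Cjip's can be modeled as a dispatch table from opcodes to microcode routines that software may rewrite while the machine runs, with unmapped opcodes trapping to ordinary subroutines (the emulation path the text mentions). The Python sketch below is a hypothetical illustration of that idea only; the class, opcode names and routines are invented and do not reflect Imsys's actual microcode interface.

```python
class WritableISA:
    """Opcodes dispatch through a table that can be rewritten at runtime (hypothetical)."""

    def __init__(self, trap_handler):
        self.table = {}            # opcode -> "microcode" routine
        self.trap = trap_handler   # software emulation for opcodes not in the table
        self.traps_taken = 0

    def load_microcode(self, opcode, routine):
        self.table[opcode] = routine

    def execute(self, program, stack):
        for opcode, *args in program:
            routine = self.table.get(opcode)
            if routine is None:
                self.traps_taken += 1
                self.trap(opcode, args, stack)
            else:
                routine(stack, *args)
        return stack


def push(stack, value):
    stack.append(value)


def add(stack):
    stack.append(stack.pop() + stack.pop())


def square(stack):
    x = stack.pop()
    stack.append(x * x)


def emulate(opcode, args, stack):
    """Trap handler: emulate missing instructions with ordinary code."""
    if opcode == "square":
        square(stack)
    else:
        raise ValueError(f"illegal opcode {opcode!r}")


cpu = WritableISA(emulate)
cpu.load_microcode("push", push)
cpu.load_microcode("add", add)
program = [("push", 3), ("push", 4), ("add",), ("square",)]
print(cpu.execute(program, []), cpu.traps_taken)   # [49] 1
cpu.load_microcode("square", square)               # move "square" into microcode
print(cpu.execute(program, []), cpu.traps_taken)   # [49] 1
```

The same program runs unchanged before and after "square" moves from trap emulation into the table; only the speed path differs, which is the decoupling of software from hardware the section describes.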
Copyright © CPUShack.Net. All pictures and content are property of CPUShack.Net. All rights reserved. This material may not be published, broadcast, rewritten, or redistributed without the express written permission of CPUShack.Net.