
Great Microprocessors of the Past, Hacker News



Section Seven: Weird and Innovative Chips


Part I: Intel 432, Extraordinary complexity

The Intel iAPX 432 was a complex, object oriented 32-bit processor that included high level operating system support in hardware, such as process scheduling and interprocess messaging. Originally named the 8800 (a progression from the previous 8008 and 8080), it was intended to be the main Intel microprocessor. The 8086 was envisioned as a short term "plan B" product until the 432 was available when it was delayed, so little effort was spent on the design (some say two engineers took only three weeks, but that was probably only the initial architecture). Others say the 80286 was envisioned as a step between the 8086 and the 432, rushed through design when the 432 was late and resulting in its own design problems, but it was actually designed later. The 432 actually included four chips. The GDP (processor) and IP (I/O controller) were introduced first, and the BIU (Bus Interface Unit) and MCU (Memory Control Unit) were introduced later (but not widely). The GDP complexity was split into 2 chips (decode/sequencer and execution units, like the Western Digital MCP-1600), so it wasn't really a microprocessor.

The GDP was exclusively object oriented: normal linear memory access was not allowed, and there was hardware support for data hiding, methods, inheritance, late binding, and access protection, and it was promoted as being ideal for the Ada programming language. To enforce this, permission and type checks for every memory access (via a 2 stage segmentation) slowed execution (despite cached segment tables). It supported up to 2^24 segments, each limited to 64K in size (within a 2^40 address space), but the object oriented nature of the design meant that was not a real limitation. The stack oriented design meant the GDP had no user data registers. Instructions were bit encoded (and bit-aligned in memory), ranging from 6 bits to 321 bits long (the T-9000 has variable length byte encoded/aligned instructions), and could be very complex.

The BIU defined the bus, designed for multiprocessor support allowing multiple modules (BIU or MCU) on a bus and up to 8 independent buses (allowing memory interleaving to speed access). The MCU did automatic parity checking and ECC error correcting. The total system was designed to be fault tolerant to a large degree, and each of these parts contributes to that reliability.

Despite these advanced features, the 432 didn't catch on. The main reason was that it was slow, sometimes up to five or ten times slower than a 68000 or Intel's own 80286. Part of this was the lack of local (user) data registers, or a data cache. Part of this was the fault-tolerant BIU, which defined an asynchronous protocol on a clocked bus that resulted in a substantial fraction of the access time being used by wait states. The instructions weren't aligned on bytes or words, and took longer to decode. In addition, the protections imposed on the objects slowed data access. Finally, the implementation of the GDP on two chips instead of one produced a slower product. However, the fact that this complex design was produced and bug free is impressive.

Its high level architecture was similar to the Transputer systems, but it was implemented in a way that was much slower than other processors, while the T-414 was not just innovative, but much faster than other processors of the time.

The Intel i960 is sometimes considered a successor of the 432 (also called "RISC applied to the 432"), and does have similar hardware support for context switching. This path came about indirectly through the 960 MC designed for the BiiN machine, which was still very complex (it included many i432 object-oriented ideas, including a tagged memory system). The M-series design predated the released i960 (which removed tag bits and complex instruction microcode), but was released later.

Part II: Rekursiv, an object oriented processor

The Rekursiv processor is actually a 4 chip processor motherboard, not a microprocessor, but is neat. It was created by a Scottish Hi-Fi manufacturing company called Linn, to control their manufacturing system. The owner (Ivor) was a believer in automation, and had automated the company as much as possible with VAXes, but was not satisfied, so hired software experts to design a new system, which they called LINGO. It was completely object oriented, like Smalltalk (and unlike C++, which allows object concepts, but handles them in a conventional way), but too slow on the VAXes, so Linn commissioned a processor designed for the language.

This is not the only processor designed specifically for a language that is slow on other CPUs. Several specialized LISP processors, such as the Scheme-79 lisp processor, were created, but this chip is unique in its object oriented features at a time when the concept was not well-known (actually, I hadn't the foggiest idea of what object-oriented programming was when I first learned about it, which is obvious from reading unrevised versions of this description). It also manages to support objects without the slowness of the Intel 432.

The Rekursiv processor features a writable instruction set, and is highly parallel. The four chips were Numerik, Logik, Objekt, and Klock.

The CPU itself consisted of Numerik and Logik. Numerik was the ALU, based on AMD bitslice CPU components (registers, ALU, barrel shifter, multiplier). The CPU was similar to the Patriot Scientific PSC1000, with sixteen registers, an evaluation stack, and a return address stack. A control store held microcode, allowing an instruction set to be constructed on the fly, and it could change for different objects. There were two program counters, one for application instructions, one for the microcode routines which implement them. Microcode used wide, multi-field words.

Logik was the instruction sequencer.
Objekt was the object manager/MMU, which swapped objects to and from disk as needed (completely invisible to the CPU, allowing microcoded instructions to access objects without generating exceptions and forcing them to roll back and restart; microcode could be recursive, hence the processor's name). Objects were identified by tags, with the actual reference, types, and sizes stored in three 64K hash tables (collisions were resolved by a fourth table holding the ID of the object actually stored in the tables at a given time). Objects could be relocated transparently, facilitating garbage collection.

Klock was the clock and support circuitry.

It executed LINGO fast enough, and is a perfect match between language and CPU, but it could also use more conventional languages, such as Smalltalk or C. Unfortunately, Linn did not have the resources to pursue this very promising architecture (the prototype was "surprisingly easy" to implement). However, the writable instruction set concept (specifically, isolating CPU implementation from the program code) was resurrected and automated in later designs that run the Intel x86 instruction set on a custom VLIW processor.
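The Objekt lookup described above behaves much like a direct-mapped cache: a tag hashes to one slot shared by the three data tables, and the fourth table says which object currently owns that slot. A minimal sketch of the idea follows. The hash function (low bits of the tag), the class and field names, and the dictionary standing in for the disk are all assumptions made for illustration, not details of the real hardware.

```python
TABLE_SIZE = 64 * 1024  # 64K slots per table, as in the description above


class ObjectTables:
    """Three data tables plus a fourth table of resident object IDs."""

    def __init__(self, backing_store):
        self.address = [None] * TABLE_SIZE      # where the object lives
        self.type_ = [None] * TABLE_SIZE        # object type
        self.size = [None] * TABLE_SIZE         # object size
        self.resident_id = [None] * TABLE_SIZE  # fourth table: slot owner
        self.backing_store = backing_store      # hypothetical stand-in for disk
        self.misses = 0

    @staticmethod
    def slot(object_id):
        # Assumed hash: keep the low 16 bits of the tag.
        return object_id % TABLE_SIZE

    def lookup(self, object_id):
        i = self.slot(object_id)
        if self.resident_id[i] != object_id:
            # Empty slot or collision: load this object's entry, replacing
            # whatever was there. The caller never sees an exception.
            self.misses += 1
            addr, typ, size = self.backing_store[object_id]
            self.address[i], self.type_[i], self.size[i] = addr, typ, size
            self.resident_id[i] = object_id
        return self.address[i], self.type_[i], self.size[i]
```

Two tags that hash to the same slot simply evict each other, which is why the ID table is needed: without it there would be no way to tell whose reference, type, and size are sitting in the slot.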


Part III: MISC M17: Casting Forth in Silicon [1]

Forth is used widely for programming embedded systems because of its simplicity and efficiency. It explicitly manipulates data on a stack, and so defines a simple virtual machine architecture which makes programs independent of the CPU: only the interpreter needs to be ported. Because of this, extra CPU features are wasted when running Forth programs, and since cost reduction is important to embedded systems, it's logical to want a simpler, cheaper CPU which runs only Forth programs.
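To make the "simple virtual machine" point concrete, here is a minimal sketch of a Forth-like evaluator. It handles only integers and a handful of words, and the function name `run` is just a label chosen for this example; a real Forth also has a return stack, a dictionary, and user-defined words.

```python
def run(program):
    """Evaluate a whitespace-separated Forth-like program; return the data stack."""
    stack = []
    binary_ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    for word in program.split():
        if word in binary_ops:
            b = stack.pop()          # top of stack is the second operand
            a = stack.pop()
            stack.append(binary_ops[word](a, b))
        elif word == "dup":
            stack.append(stack[-1])
        elif word == "swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif word == "drop":
            stack.pop()
        else:
            stack.append(int(word))  # anything else is treated as a literal
    return stack
```

For example, `run("2 3 + 4 *")` leaves a single value on the stack. Every operation finds its operands implicitly on the stack, so no instruction needs register or address fields, which is what lets a Forth CPU stay so small.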

The Minimum Instruction Set Computer (MISC) Inc. M17 CPU was not the first Forth microprocessor (the Novix NC4016 (1985?), designed by Forth inventor Chuck Moore, came before), but the M17 is a good example of low cost Forth CPUs. It featured two 16 bit stack pointers (Data and Return (subroutine) stacks), plus three 16-bit top of stack data registers (X, Y, Z, plus an extra LastX which could hold values popped from X). An I/O register buffered data during I/O while the ALU operated concurrently. Finally, there was an Index register which normally held the top element of the Return stack, but could also be used as a loop counter, and a 6 instruction buffer (for short loops, like the Motorola 68010).

Address space was 64K, but external memory could be either a single bank or up to five banks, signaled by status pins, depending on the context: data stack, return stack, program code, A or B buffers. Some other Forth processors include on chip stack memory, and while most (including the M17) were 16 bit, some 32 bit Forth processors have also been developed.

The simplicity of design allows the M17 (and most other Forth CPUs, such as the more recent 7,000 transistor MuP21 (also designed by Chuck Moore), which includes a composite video generator on chip) to execute instructions in only two cycles (load, execute), or one cycle each from the instruction cache, making them faster than more complex CPUs (though instructions do less, the higher clock speed usually compensates). Stack advocates often cite this as the strongest advantage for stack based designs, though critics contend that the stateful nature of stacks compared to registers makes conventional speedup tricks such as pipelining and superscalar execution far more complex than using a register array. As it is, register-based load-store processors dominate when it comes to speed.

Other prominent Forth-based microprocessors include the Harris RTX 2000, a descendant of the NC4016 (the "2000", like the name of the Motorola 68000, comes from the fact that it only uses about 2000 gates in its design), which has the ability to group certain instructions like the T-9000 Transputer and microJava processors. Chuck Moore went on to design the 21-bit MuP21, and is involved in the highly integrated F21 CPUs.

A 32 bit CPU, the FRISC-3 (Forth Reduced Instruction Set Computer) was produced by Silicon Composers and renamed the SC32, and includes an automatic stack-to-memory cache, eliminating the main weakness of Forth chips, the fixed stack sizes.

[1] Sun Microelectronics' first slogan for its Java Processors was "Casting Java in Silicon".

Stack Computers & Forth (links): http://www-2.cs.cmu.edu/~koopman/stack.html


Part IV: AT&T CRISP / Hobbit, CISC amongst the RISC (1992)

The AT&T Hobbit was a commercial version of the CRISP processor, inspired by the Bell Labs C Machine project, aimed at a design optimized for the C language (designed in part by David Ditzel, who later worked on SPARC designs). It has a stack cache like that of the AMD 29K (in Hobbit it's much smaller, but is easily expandable), and Hobbit has no global registers. Addresses can be memory direct or indirect (for pointers) relative to the stack pointer without extra instructions or operand bits. The cache is not optimized for multiprocessors.

Hobbit has an instruction prefetch buffer (3K or 6K, depending on the version), but decodes the variable length (1, 3 or 5 halfword (16 bit)) instructions into a thirty-two entry instruction cache. Branches are not delayed, and a prediction bit directs speculative branch execution. The decode unit folds branches into the decoded instructions (which include next and alternate next PC), so a predicted branch does not take any clock cycles. The three stage execution unit takes instructions from the decode cache. Results can be forwarded when available to any prior stage as needed.

Though CISC in philosophy, the Hobbit is greatly simplified compared to traditional memory-data designs, and features some very elegant design features. AT&T prefers to call it a RISC processor, and performance is comparable to similar load-store designs such as the ARM. Its most prominent use was in the EO Personal Communicator, a competitor to Apple's Newton which used the ARM processor, as well as a prototype development machine for BeOS. The product and name were discontinued.

As an aside, the complexity in making a stack-based CPU fast led fellow AT&T researchers working on the Inferno operating system to decide on a register based virtual machine, rather than stack-based like Sun Java and Microsoft .NET IL.

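The branch folding described for the Hobbit decode unit can be sketched in a few lines. In this simplified model (an assumption for illustration, not the real decode format) a program is a list of tuples, a branch is absorbed into the operation that precedes it, and each decoded entry carries a predicted next PC plus an alternate PC for the path not predicted. The names `Decoded` and `fold_branches` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decoded:
    op: str                       # the non-branch operation to execute
    next_pc: int                  # predicted successor
    alt_pc: Optional[int] = None  # other path, if a conditional branch was folded in


def fold_branches(program):
    """program[pc] is ('op', name), ('jmp', target) or ('jcc', target, predict_taken)."""
    decoded = {}
    for pc, instr in enumerate(program):
        if instr[0] != "op":
            continue  # branches are absorbed into the preceding operation
        follower = program[pc + 1] if pc + 1 < len(program) else None
        if follower and follower[0] == "jmp":
            decoded[pc] = Decoded(instr[1], next_pc=follower[1])
        elif follower and follower[0] == "jcc":
            taken, fallthrough = follower[1], pc + 2
            if follower[2]:   # prediction bit set: assume taken
                decoded[pc] = Decoded(instr[1], taken, fallthrough)
            else:
                decoded[pc] = Decoded(instr[1], fallthrough, taken)
        else:
            decoded[pc] = Decoded(instr[1], next_pc=pc + 1)
    return decoded
```

Because the branch never occupies its own decoded entry, a correctly predicted branch costs no execution slot; a misprediction just restarts from `alt_pc`.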

Part V: T-9000, parallel computing (1994)

The INMOS T-9000 was the latest version of the Transputer architecture, a processor designed to be hooked up to other processors for parallel processing. The previous versions were the 16 bit T-212 and the 32 bit T-414 and T-800 (which included a 64 bit FPU) processors, the latter from 1987. The instruction set is minimised, like a RISC design, but is based on a stack/accumulator design (similar in idea to the PDP-8), and designed around the OCCAM language. The most important feature is that each chip contains 4 serial links to connect the chips in a network.

While the transputers were originally faster than their contemporaries, recent load-store designs have surpassed them. The T-9000 was an attempt to regain the lead. It starts with the architecture of the T-800, which contains only three 32 bit integer and three 64 bit floating point registers which are used as an evaluation stack: they are not general purpose. Instead, like the TMS 9900, it uses memory, addressed relative to the workspace register (the 9900 workspace contained only sixteen registers, the Transputer workspace can be any length, though access slows down with every 4 bits used for offset from the workspace register: sixteen bytes can be accessed with just one instruction, 256 need two, and so on). This allows very fast context switching, less than a microsecond, speeding and simplifying process scheduling enough that it is automated in hardware (supporting two priority levels and event handling (link messages and interrupts)). The Intel 432 also attempted some hardware process scheduling, but was unsuccessful.

Unlike the TMS 9900, the T-9000 is far faster than memory, so the CPU has several levels of high speed caches and memory types. The main cache is 16K, and is designed for 3 reads and 1 write simultaneously. The workspace cache is based on rotating buffers, and allows 2 reads and 1 write simultaneously.

Instructions are in bytes, consisting of 4 bit op code and 4 bit data (usually an offset into the workspace), but prefix instructions can load extra data for an instruction which follows, 4 bits at a time. Less frequent instructions can be encoded with 2 (such as process start, message I/O) or more bytes (CRC calculations, floating point operations, 2D block copies and scheduler queue management). The stack architecture makes instructions very compact, but executing one instruction byte per clock can be slow for multibyte instructions, so the T-9000 has a grouper which gathers instruction bytes (up to eight) into a single CISC-type instruction then sent into the 5 stage pipeline (fetching four per cycle, grouping up to 8 if slow earlier instructions allow it to catch up). For example, two concurrent memory loads (simple or indexed), a stack/ALU operation and a store (a[i] = b[2] + c[3]) can be grouped.

The T-9000 contains 4 main internal units, the CPU, the VCP (handling the individual links of the previous chips, which needed software for communication), the PMI, which manages memory, and the Scheduler.

This processor is ideal for a model of parallel processing known as systolic arrays (a pipeline is a simple example). Even larger networks can be created with the C104 crossbar switch, which can connect transputers or other C104 switches into a network hundreds of thousands of processors large. The C104 acts like an instant switch, not a network node, so the message is passed through, not stored. Communication can be at close to the speed of direct memory access.
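The prefix scheme described above (4 bits of op code, 4 bits of data, with prefixes supplying more data 4 bits at a time) can be sketched as an encoder and decoder pair. The function code values `PFIX` and `LDC` below should be treated as illustrative assumptions, the helper names are hypothetical, and negative operands (which need a separate negative prefix) are left out to keep the sketch short.

```python
PFIX = 0x2  # assumed function code for "prefix"
LDC = 0x4   # assumed function code for "load constant"


def encode(function, operand):
    """Encode a non-negative operand for `function`, one nibble per byte."""
    if operand < 0:
        raise ValueError("negative operands need a negative prefix, omitted here")
    nibbles = []
    while True:
        nibbles.append(operand & 0xF)
        operand >>= 4
        if operand == 0:
            break
    nibbles.reverse()  # most significant nibble goes first
    out = [(PFIX << 4) | n for n in nibbles[:-1]]
    out.append((function << 4) | nibbles[-1])
    return bytes(out)


def decode(code):
    """Return (function, operand) pairs, accumulating prefix data."""
    operand = 0
    result = []
    for byte in code:
        function, data = byte >> 4, byte & 0xF
        operand |= data
        if function == PFIX:
            operand <<= 4          # make room for the next 4 bits
        else:
            result.append((function, operand))
            operand = 0            # operand register clears after each instruction
    return result
```

A constant below sixteen fits in one byte, a constant below 256 takes two, and so on, which matches the "slows down with every 4 bits" remark about workspace offsets.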
Like many CPUs, the Transputers can adapt to narrower buses, down to 8 bits. They can also feed off a 5 MHz clock, generating their own faster internal clock from this signal, and contain internal RAM, making them good for high performance embedded applications.

Unfortunately excessive delays in the T-9000 design (partly because of the stack based design) left it uncompetitive with other CPUs. The T-4xx and T-8xx architectures still exist in the SGS-Thomson ST20 microcore family. SGS-Thomson and Hitachi teamed up for a successor based on the Hitachi SH-4, named ST50 by SGS-Thomson and SH-5 by Hitachi.

As a note, the T-800 FPU is probably the first large scale commercial device to be proven correct through formal design methods. To simplify interrupt handling, the multi-cycle square root instruction was implemented in single cycle "step" instructions, executed three (single precision) or seven (double precision) times to perform a complete square root, a strategy also used in the first SPARC systems for integer multiply.


Part VI: Patriot Scientific ShBoom: from Forth to Java (April 1998)

An innovative stack-oriented processor, the 32 bit ShBoom PSC1000 was originally meant for high speed embedded Forth applications (like the M17 and others), but Patriot Scientific has decided to position it as a Java processor as well. Though it does not directly execute Java bytecodes, ShBoom instructions are also byte length, and Java bytecodes can be translated very closely to the native ShBoom instruction set. In addition, unlike pure stack-based machines, the ShBoom has several general registers.

At 106 MHz, the microprocessing unit (MPU) executes about one instruction per cycle, without normal instruction/data caches. Byte instructions are loaded in groups of four (32 bits), and executed sequentially. The problem of loading constants is handled in a unique way. Processors such as the PDP-11 could load a constant stored in program memory following the current instruction, and the Hitachi SH uses a similar PC-relative mode to load constants. MIPS processors load half a constant at a time using two instructions. Transputers always contain 4 bits of data and 4 bits of op code in each byte instruction.

The ShBoom loads single bytes of data from the rightmost byte of the current instruction group, and words from program memory following the current group. For example, a load byte instruction could be in position one, two or three from the left, and the data would always be in the fourth (rightmost) byte. Four consecutive load word instructions would be grouped together, and the constants taken from the four 32 bit words following the group. This ensures data alignment without extra circuitry (but may get in the way in the future, such as for wider versions).

There are sixteen 32 bit global registers (g0 to g15), a sixteen register local stack (r0 to r15, which can be used as a stack frame, where one register is not user visible, or as a Forth return stack), and an eighteen element operand stack (s0 to s17, accessed only by data stack operations). The stacks automatically spill and refill to and from memory. s0 and r0 can also be used as index registers, and g0 is used for multiply and divide instructions. There's also an extra index register x, a loop counter ct, and a mode register (like a CC or PSW register).
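The constant loading scheme above (byte literals from the rightmost byte of the group, word literals from the words after the group) is easier to see in a small model. The opcode values, the `run_group` name, and the tiny instruction set here are all hypothetical; only the placement of the literals follows the description.

```python
NOP, PUSH_BYTE, PUSH_WORD, ADD = 0x00, 0x01, 0x02, 0x03  # hypothetical opcodes


def run_group(memory, pc, stack):
    """Execute the 4-byte group in memory[pc]; return the index of the next group."""
    group = memory[pc].to_bytes(4, "big")   # leftmost byte executes first
    next_word = pc + 1                      # word literals follow the group
    # If a byte literal is used, the rightmost byte is data, not an opcode.
    uses_byte_literal = PUSH_BYTE in group[:3]
    opcodes = group[:3] if uses_byte_literal else group
    for op in opcodes:
        if op == PUSH_BYTE:
            stack.append(group[3])
        elif op == PUSH_WORD:
            stack.append(memory[next_word])
            next_word += 1
        elif op == ADD:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        # NOP does nothing
    return next_word
```

Since literal words always sit on group boundaries, they are naturally aligned, and the next group starts right after the last literal consumed.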
The CPU also contains an I/O coprocessor on chip for simultaneous I/O (much more advanced than the I/O buffer register of the M17, but the same idea), which communicates with the MPU via the global data registers. It's a simple, independent unit which executes small data transfer programs until I/O is complete. There are also a programmable memory interface, 8 channel DMA controller, and interrupt controller.

The system was later renamed to the more markety IGNITE. It is a very innovative and elegant attempt at combining stack and register oriented architectures, with emphasis on the stack operation simplicity. It would give Java a good home.


Part VII: Sun picoJava - not another language-specific processor! (October 1997)

Sun first introduced Java as a combination of language, integrated classes, and a run time system called the Java Virtual Machine (JVM). To support Java, Sun Microelectronics designed picoJava and microJava hardware to execute Java bytecode programs faster than a virtual machine or recompiled code.

The picoJava I is a stack oriented CPU core like the JVM, with a stack cache (similar to the Patriot Scientific ShBoom PSC1000), but there are interesting differences between it and Forth-style stack CPUs. Java only uses a single stack (like many languages such as C, which the AT&T Hobbit and AMD 29K were designed to support) and the picoJava CPU enhances performance with a 'dribbler' unit which constantly updates a complete copy of the stack cache in memory, without affecting other CPU operations (similar to a write-back cache), so stack frames can be added without waiting for a stack frame to be stored. Some Java instructions are complex, so the CPU has microcoded instructions, and a 4 stage pipeline (fetch, decode, execute/cache, stack writeback). Finally, picoJava groups (or 'folds') load and stack operations together, executing both at once (treating the top of stack as an accumulator). This is a much simpler version of instruction grouping tried in the Transputer T-9000, and usually eliminates 63% of stack operation inefficiency. Seldom used instructions aren't implemented, but are emulated using trap handlers.

The picoJava II (October 1997) core is used in the first actual CPU from Sun, the microJava 701. It extends the pipeline to 6 stages, and can fold up to four instructions into one operation. It also adds a FPU and separate 16K I/D caches. Following waning interest, Sun released the picoJava core design (as well as certain older SPARC designs) to the public (with certain reserved rights) as a type of "open source" CPU, manufactured by Fujitsu, among others. Although Sun's CPU did not include peripherals such as I/O or timers, licensed versions do.

While Sun delivered the picoJava CPU first, an engineering group at Rockwell Collins also created a Java CPU called JEM1, but it was spun off in July 2000 into a company called aJile to produce the aJ-100. The aJ-100 implements thread control instructions unlike the picoJava, which emulates them in software or using an OS. It is also multithreaded, supporting two JVMs operating independently. A lower cost aJ-80 uses an 8-bit data bus rather than the 32 bits of the aJ-100.
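The folding that picoJava applies to load and stack operations can be illustrated with a toy pass over a bytecode-like list. This sketch folds only one pattern (a local load immediately followed by an add) into a single hypothetical `iadd_local` operation that treats the top of stack as an accumulator; the tuple encoding and function names are assumptions for illustration, and real hardware folds more patterns at decode time rather than rewriting code.

```python
def fold(code):
    """Merge each ('iload', n) followed by ('iadd',) into one ('iadd_local', n)."""
    folded = []
    i = 0
    while i < len(code):
        if code[i][0] == "iload" and i + 1 < len(code) and code[i + 1] == ("iadd",):
            folded.append(("iadd_local", code[i][1]))
            i += 2
        else:
            folded.append(code[i])
            i += 1
    return folded


def execute(code, local_vars):
    """Run the code on a fresh stack; return the final local variables."""
    stack = []
    local_vars = list(local_vars)
    for instr in code:
        op = instr[0]
        if op == "iload":
            stack.append(local_vars[instr[1]])
        elif op == "istore":
            local_vars[instr[1]] = stack.pop()
        elif op == "iadd":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "iadd_local":
            stack[-1] += local_vars[instr[1]]  # top of stack acts as accumulator
        else:
            raise ValueError(f"unknown operation {op!r}")
    return local_vars
```

The folded sequence produces the same result with fewer operations, because the value no longer has to be pushed only to be popped again by the very next instruction.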
Advancel initially made designs based on the picoJava I and II, but later designed their own TinyJ CPU, which translates simple Java bytecodes into operations for a conventional load/store execution unit (like the ARM CPU). Complex bytecodes are trapped and emulated. The ALU is a load-store style unit with sixteen 29-bit registers, a “top of stack” accumulator used in bytecode interpretation, and a four stage pipeline with variable length (one to four byte) instructions. Non-Java programs are executed directly, and Java programs are interpreted using the decoder for the bytecode, while a conventional JVM executes directly as a non-Java program (various JVMs can be used).
Part VIII: Imsys Cjip – embedded WISC (Writable Instruction Set Computer)

Swedish company Imsys AB started making components for embedded imaging systems, and decided to expand into more general microcontroller systems with the Cjip (pronounced … somehow).

Binary compatibility has been a problem since the beginning of programmable computers, in that it ties software (abstract, theoretical) to particular hardware (fixed, physically limited). There have been attempts to reduce this through hardware using rewritable microcode (Western Digital MCP –), as well as through software (the Patriot Scientific ShBoom PSC, which recompiles Java bytecodes to its native instruction set when loaded). Since the Cjip is a very low resource CPU, the software overhead would be unacceptable, so Imsys followed the hardware approach using rewritable microcode. Imsys had some experience with UCSD Pascal, an early VM system.

Unlike the DEC Alpha PALCode or the Rekursiv CPU, Cjip uses actual wide microcode, which is far more efficient but harder to program, while unlike the MCP –, Cjip microcode can be modified at runtime. In addition, instructions can be emulated with regular program subroutines. The four initial instruction sets available include a legacy Z – style set and three stack-based virtual machines: C/C++ and 32-bit Forth, Java, and a second Forth of a different bit width.

The microcode sees four banks of bytes, split into: evaluation stack, internal locals stack (microcode subroutines), general data (emulated registers), and microcode internal variables. The evaluation and data stack spill into external RAM. The general data stack is in external memory only.

Language-specific processors have generally failed, because economies from widespread use of general-purpose processors allow new technology to be incorporated more quickly. The difference with Cjip is that its language support is not limited to just one language, or any language at all. It will be interesting to see if the advantages of generalized language support are enough to win acceptance over competing processors.
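As a rough illustration of the writable instruction set idea described for the Cjip, the sketch below models an opcode table that can be changed while the machine is running, with a fallback to subroutines built from other instructions. It is only a conceptual model, not Imsys's microcode format: the class `WritableMachine`, its methods, and the opcodes `push`, `dup`, `add` and `double` are all hypothetical names chosen for the example, and Python callables stand in for microcode routines.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Instr = Tuple  # (opcode, *operands), e.g. ("push", 21) or ("add",)


class WritableMachine:
    """Toy stack machine whose instruction set can be redefined at runtime."""

    def __init__(self) -> None:
        self.stack: List[int] = []
        # opcode -> callable standing in for a microcode routine
        self.microcode: Dict[str, Callable[..., None]] = {}
        # opcode -> body made of other instructions (emulated instruction)
        self.subroutines: Dict[str, List[Instr]] = {}

    def define_micro(self, opcode: str, routine: Callable[..., None]) -> None:
        """Install or replace a 'microcoded' instruction."""
        self.microcode[opcode] = routine

    def define_subroutine(self, opcode: str, body: Sequence[Instr]) -> None:
        """Install an instruction emulated by a regular instruction sequence."""
        self.subroutines[opcode] = list(body)

    def execute(self, program: Sequence[Instr]) -> None:
        for opcode, *operands in program:
            if opcode in self.microcode:
                self.microcode[opcode](self, *operands)
            elif opcode in self.subroutines:
                self.execute(self.subroutines[opcode])
            else:
                raise KeyError(f"undefined opcode: {opcode}")


def push(machine: WritableMachine, value: int) -> None:
    machine.stack.append(value)


def dup(machine: WritableMachine) -> None:
    machine.stack.append(machine.stack[-1])


def add(machine: WritableMachine) -> None:
    rhs = machine.stack.pop()
    lhs = machine.stack.pop()
    machine.stack.append(lhs + rhs)


def build_machine() -> WritableMachine:
    """Create a machine with a small base instruction set."""
    machine = WritableMachine()
    machine.define_micro("push", push)
    machine.define_micro("dup", dup)
    machine.define_micro("add", add)
    return machine


if __name__ == "__main__":
    m = build_machine()
    # First, emulate "double" with a subroutine of existing instructions.
    m.define_subroutine("double", [("dup",), ("add",)])
    m.execute([("push", 21), ("double",)])
    print(m.stack)
    # Later, install a direct routine for the same opcode at runtime.
    m.define_micro("double", lambda mach: mach.stack.append(mach.stack.pop() * 2))
    m.execute([("double",)])
    print(m.stack)
```

The same program text keeps working whether `double` is emulated or installed directly, which is the property that lets one chip host several instruction sets.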


Copyright © CPUShack.Net. All pictures and content are property of CPUShack.Net. All rights reserved. This material may not be published, broadcast, rewritten, or redistributed without the express written permission of CPUShack.Net
