Great Microprocessors of the Past, Hacker News

Section Seven: Weird and Innovative Chips

Part I: Intel iAPX 432, Extraordinary complexity

The Intel iAPX 432 was a complex, object-oriented 32-bit processor that included high-level operating system support in hardware, such as process scheduling and interprocess messaging. Originally named the 8800 (a progression from the previous 8008 and 8080), it was intended to be the main Intel 16-bit microprocessor. The 8086 was envisioned as a short-term "plan B" product to cover the delay until the 432 was available, so little effort was spent on its design (some say two engineers took only three weeks, but that was probably only the initial architecture). Others say the 80286 was envisioned as a step between the 8086 and the 432 (rushed through design when the 432 was late, resulting in its own design problems), but it was actually designed later.

The 432 actually included four chips. The GDP (processor) and IP (I/O controller) were introduced first, and the BIU (Bus Interface Unit) and MCU (Memory Control Unit) followed later (but not widely). The GDP complexity was split into 2 chips (decode/sequencer and execution units, like the Western Digital MCP-1600), so it wasn't really a microprocessor.

The GDP was exclusively object oriented: normal linear memory access was not allowed, and there was hardware support for data hiding, methods, inheritance, late binding, and access protection, and it was promoted as being ideal for the Ada programming language. To enforce this, permission and type checks for every memory access (via a 2 stage segmentation) slowed execution (despite cached segment tables). It supported up to 2^21 segments, each limited to 64K in size, but the object-oriented nature of the design meant that was not a real limitation. The stack-oriented design meant the GDP had no user data registers. Instructions were bit encoded (and bit-aligned in memory), ranging from 6 bits to 321 bits long (the T-9000 has variable-length byte-encoded and byte-aligned instructions), and could be very complex.

The BIU defined the bus, designed for multiprocessor support, allowing multiple modules (BIU or MCU) on a bus and up to 8 independent buses (allowing memory interleaving to speed access). The MCU did automatic parity checking and ECC error correcting. The total system was designed to be fault tolerant to a large degree, and each of these parts contributed to that reliability.

Despite these advanced features, the 432 didn't catch on. The main reason was that it was slow, sometimes up to five or ten times slower than a Motorola 68000 or Intel's own 80286. Part of this was the lack of local (user) data registers, or a data cache. Part of this was the fault-tolerant BIU, which defined a clocked bus (with an asynchronous protocol) that resulted in a substantial fraction of the access time being used by wait states. The instructions weren't aligned on bytes or words, and took longer to decode. In addition, the protections imposed on the objects slowed data access. Finally, the implementation of the GDP on two chips instead of one produced a slower product. However, the fact that this complex design was produced, and was bug free, is impressive.

Its high-level architecture was similar to the Transputer systems, but it was implemented in a way that was much slower than other processors, while the T-414 was not just innovative, but much faster than other processors of the time.

The Intel i960 is sometimes considered a successor of the 432 (also called "RISC applied to the 432"), and does have similar hardware support for context switching. This path came about indirectly through the 960 MC designed for the BiiN machine, which was still very complex (it included many 432 object-oriented ideas, including a tagged memory system). The M-series design predated the released i960 (which removed tag bits and complex instruction microcode), but was released later.

Part II: Rekursiv, an object oriented processor

The Rekursiv processor is actually a 4 chip processor motherboard, not a microprocessor, but is neat. It was created by a Scottish Hi-Fi manufacturing company called Linn, to control their manufacturing system. The owner (Ivor) was a believer in automation, and had automated the company as much as possible with VAXes, but was not satisfied, so he hired software experts to design a new system, which they called LINGO. It was completely object oriented, like Smalltalk (and unlike C++, which allows object concepts, but handles them in a conventional way), but too slow on the VAXes, so Linn commissioned a processor designed for the language.


Part III: MISC M17: Casting Forth in Silicon

Forth is used widely for programming embedded systems because of its simplicity and efficiency. It explicitly manipulates data on a stack, and so defines a simple virtual machine architecture which makes programs independent of the CPU: only the interpreter needs to be ported. Because of this, extra CPU features are wasted when running Forth programs, and since cost reduction is important to embedded systems, it's logical to want a simpler, cheaper CPU which runs only Forth programs.
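As a rough illustration of why a Forth-style virtual machine is so small, here is a minimal sketch of a stack interpreter in Python. It is not any real Forth system; the function name `run_forth` and the tiny word set (`+`, `-`, `*`, `dup`, `swap`, `drop`) are assumptions chosen for the example. Every operation only touches the top of an explicit data stack, which is the property a Forth CPU exploits.

```python
def run_forth(program, stack=None):
    """Interpret whitespace-separated Forth-like tokens on a data stack.

    Hypothetical toy word set; real Forth has many more words plus a
    separate return stack for subroutine calls.
    """
    stack = [] if stack is None else stack
    binary_ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    for token in program.split():
        if token in binary_ops:
            b = stack.pop()          # top of stack is the right operand
            a = stack.pop()
            stack.append(binary_ops[token](a, b))
        elif token == "dup":
            stack.append(stack[-1])  # copy the top element
        elif token == "swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif token == "drop":
            stack.pop()
        else:
            stack.append(int(token))  # anything else is a number literal
    return stack


if __name__ == "__main__":
    print(run_forth("2 3 + 4 *"))
```

Porting this "machine" to another host means rewriting only the loop above, not the programs it runs, which is the portability argument made in the text.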

The Minimum Instruction Set Computer (MISC) Inc. M17 CPU was not the first Forth microprocessor (the Novix NC4000 (1985?), designed by Forth inventor Chuck Moore, came before), but the M17 is a good example of low cost Forth CPUs. It featured two 16 bit stack pointers (Data and Return (subroutine) stacks), plus three 16-bit top of stack data registers (X, Y, Z, plus an extra LastX which could hold values popped from X). An I/O register buffered data during I/O while the ALU operated concurrently. Finally, there was an Index register which normally held the top element of the Return stack, but could also be used as a loop counter, and a 6 instruction buffer (for short loops, like the Motorola 68010). Address space was 64K, but external memory could be either a single bank or up to five banks, signaled by status pins, depending on the context: data stack, return stack, program code, A or B buffers. Some other Forth processors include on-chip stack memory.

Stack Computers & Forth (links): http://www-2.cs.cmu.edu/~koopman/stack.html

Forth Chips (more links): http://www.ultratechnology.com/chips.htm


Part IV: AT&T CRISP/Hobbit, CISC amongst the RISC (1992)

The AT&T Hobbit ATT92010 (around 1992) was a commercial version of the CRISP processor, inspired by the Bell Labs C Machine project, aimed at a design optimized for the C language (designed in part by David Ditzel, who later worked on the 64-bit SPARC). Like the AMD 29000, it caches the top of the stack on chip (in Hobbit the stack cache is much smaller, but is easily expandable), and Hobbit has no global registers. Addresses can be memory direct or indirect (for pointers) relative to the stack pointer without extra instructions or operand bits. The cache is not optimized for multiprocessors.

Hobbit has an instruction prefetch buffer (3K or 6K, depending on the model), but decodes the variable length (1, 3 or 5 halfword (16 bit)) instructions into a thirty-two entry instruction cache. Branches are not delayed, and a prediction bit directs speculative branch execution. The decode unit folds branches into the decoded instructions (which include next and alternate next PC), so a predicted branch does not take any clock cycles. The three stage execution unit takes instructions from the decode cache. Results can be forwarded when available to any prior stage as needed.

Though CISC in philosophy, the Hobbit is greatly simplified compared to traditional memory-data designs, and has some very elegant design features. AT&T prefers to call it a RISC processor, and performance is comparable to similar load-store designs such as the ARM. Its most prominent use was in the EO Personal Communicator, a competitor to Apple's Newton (which used the ARM processor), as well as in a prototype development machine for BeOS. The product and name were discontinued.

As an aside, the complexity in making a stack-based CPU fast led fellow AT&T researchers working on the Inferno operating system to decide on a register based virtual machine, rather than stack-based like Sun Java and Microsoft .NET IL.

Wide hardware and applications support for AT&T Hobbit chips: http : //www.att.com/press/ / 921116 .mea.html

Hobbit: http://mes.loyola.edu/faculty/phs_eg/hobbit.htm

The design of the Inferno virtual machine: http://www.cs.bell-labs.com/cm/cs/who/rob/hotchips.html

Part V: T-9000, parallel computing (1994)

The INMOS T-9000 was the latest version of the Transputer architecture, a processor designed to be hooked up to other processors for parallel processing. The previous versions were the 16 bit T-212 and the 32 bit T-414 and T-800 (which included a 64 bit FPU) processors (1985 and 1987). The instruction set is minimised, like a RISC design, but is based on a stack/accumulator design (similar in idea to the PDP-8), and designed around the OCCAM language. The most important feature is that each chip contains 4 serial links to connect the chips in a network.

While the transputers were originally faster than their contemporaries, recent load-store designs have surpassed them. The T-9000 was an attempt to regain the lead. It starts with the architecture of the T-800, which contains only three 32 bit integer and three 64 bit floating point registers which are used as an evaluation stack; they are not general purpose. Instead, like the TMS 9900, it uses memory, addressed relative to the workspace register (the 9900 workspace contained only sixteen registers; the Transputer workspace can be any length, though access slows down with every 4 bits used for the offset from the workspace register: sixteen bytes can be accessed with just one instruction, 256 needs two, and so on). This allows very fast context switching, less than a microsecond, speeding and simplifying process scheduling enough that it is automated in hardware (supporting two priority levels and event handling (link messages and interrupts)). The Intel 432 also attempted some hardware process scheduling, but was unsuccessful.

Unlike the TMS 9900, the T-9000 is far faster than memory, so the CPU has several levels of high speed caches and memory types. The main cache is 16K, and is designed for 3 reads and 1 write simultaneously. The workspace cache is based on rotating word buffers, and allows 2 reads and 1 write simultaneously.

Instructions are in bytes, consisting of a 4 bit op code and 4 bits of data (usually a byte offset into the workspace), but prefix instructions can load extra data for an instruction which follows, 4 bits at a time. Less frequent instructions can be encoded with 2 (such as process start, message I/O) or more bytes (CRC calculations, floating point operations, 2D block copies and scheduler queue management). The stack architecture makes instructions very compact, but executing one instruction byte per clock can be slow for multibyte instructions, so the T-9000 has a grouper which gathers instruction bytes (up to eight) into a single CISC-type instruction, which is then sent into the 5 stage pipeline (fetching four per cycle, grouping up to 8 if slow earlier instructions allow it to catch up). For example, two concurrent memory loads (simple or indexed), a stack/ALU operation and a store (a[i] = b[2] + c[3]) can be grouped.

The T-9000 contains 4 main internal units: the CPU, the VCP (handling the individual links of the previous chips, which needed software for communication), the PMI, which manages memory, and the Scheduler.

This processor is ideal for a model of parallel processing known as systolic arrays (a pipeline is a simple example). Even larger networks can be created with the C104 crossbar switch, which can connect transputers or other C104 switches into a network hundreds of thousands of processors large. The C104 acts like an instant switch, not a network node, so the message is passed through, not stored. Communication can be at close to the speed of direct memory access.
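The byte encoding described earlier in this part (a 4 bit op code plus 4 bits of data, extended 4 bits at a time by prefix instructions) can be sketched in Python as follows. This is an illustrative model, not INMOS reference code: the opcode nibbles `PFIX = 0x2` and `LDL = 0x7` are assumptions recalled from the published Transputer instruction set, and only non-negative operands are handled (the real architecture also has a negative prefix, omitted here).

```python
PFIX = 0x2  # assumed opcode nibble for the "prefix" instruction
LDL = 0x7   # assumed opcode nibble for "load local" (workspace-relative)


def encode(opcode, operand):
    """Encode one instruction as bytes, adding prefix bytes for large operands."""
    if operand < 0:
        raise ValueError("negative operands need the negative prefix, not modelled")
    if operand < 16:
        return bytes([(opcode << 4) | operand])
    # Emit prefixes for the upper bits first, then the final instruction byte.
    return encode(PFIX, operand >> 4) + bytes([(opcode << 4) | (operand & 0xF)])


def decode(code):
    """Decode a byte sequence into (opcode, operand) pairs."""
    operand = 0
    instructions = []
    for byte in code:
        opcode, data = byte >> 4, byte & 0xF
        operand |= data
        if opcode == PFIX:
            operand <<= 4            # make room for the next 4 bits
        else:
            instructions.append((opcode, operand))
            operand = 0              # operand register clears after each instruction
    return instructions


if __name__ == "__main__":
    for offset in (5, 0x12, 0x123):
        code = encode(LDL, offset)
        print(offset, code.hex(), decode(code))
```

Each extra 4 bits of offset costs one more instruction byte, which is why small workspace offsets are the fast case and why the T-9000's grouper, which fuses several bytes into one operation, matters for longer encodings.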
Like many CPUs, the Transputers can adapt to a 64, 32, 16, or 8 bit bus. They can also feed off a 5 MHz clock, generating their own faster internal clock from this signal, and contain internal RAM, making them good for high performance embedded applications.

Unfortunately, excessive delays in the T-9000 design (partly because of the stack based design) left it uncompetitive with other CPUs. The T-4xx and T-8xx architectures still exist in the SGS-Thomson ST20 microcore family. SGS-Thomson and Hitachi teamed up for a successor based on the Hitachi SH-4, named ST50 by SGS-Thomson and SH-5 by Hitachi.

As a note, the T-800 FPU is probably the first large scale commercial device to be proven correct through formal design methods. To simplify interrupt handling, the multi-cycle square root instruction was implemented in single cycle "step" instructions, executed three (single precision) or seven (double precision) times to perform a complete square root, a strategy also used in the first SPARC systems for integer multiply.

SGS-Thomson Products Contents: http://www.st.com/stonline/books/index.htm

The Transputer archive (links): http://www.afm.sbu.ac.uk/transputer/

IPCA: Parallel: Vendors: Inmos (links): http://wotug.ukc.ac.uk/parallel/vendors/inmos/

Advanced Risc Machines, SGS-Thomson and Siemens: http://www.omimo.be/members/book_molina.html

Part VI: Patriot Scientific ShBoom: from Forth to Java (April 1998)

An innovative stack-oriented processor, the 32 bit ShBoom PSC1000 was originally meant for high speed embedded Forth applications (like the M17 and others), but Patriot Scientific has decided to position it as a Java processor as well. Though it does not directly execute Java bytecodes, ShBoom instructions are also byte length, and Java bytecodes can be translated very closely to the native ShBoom instruction set. In addition, unlike pure stack-based machines, the ShBoom has several general registers.

At 100 MHz, the microprocessing unit (MPU) executes about one instruction per cycle, without normal instruction/data caches. Byte instructions are loaded in groups of four (32 bits), and executed sequentially.

The problem of loading constants is handled in a unique way. The PDP-11 and similar processors could load a constant stored in program memory following the current instruction, and the Hitachi SH uses a similar PC-relative mode to load constants. Processors like the MIPS R2000 load half a constant at a time using two instructions. Transputers always contain 4 bits of data and 4 bits of op code in each byte instruction.

The ShBoom loads single bytes of data from the rightmost bytes of the current instruction group, and words from program memory following the current group. For example, a load byte instruction could be in position one, two or three from the left, and the data would always be in the fourth (rightmost) byte. Four consecutive load word instructions would be grouped together, and the constants taken from the four 32 bit words following the group. This ensures data alignment without extra circuitry (but may get in the way in the future, such as for wider versions).

There are sixteen 32 bit global registers (g0 to g15), a sixteen register local stack (numbered from r0, usable as a stack frame, with one register not user visible, or as a Forth return stack), and an eighteen element operand stack (numbered from s0, accessed only by data stack operations). The stacks automatically spill and refill to and from memory. s0 and r0 can also be used as index registers, and g0 is used for multiply and divide instructions. There's also an extra index register x, a loop counter ct, and a mode register (like a CC or PSW register).
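The constant-loading scheme described above (byte constants taken from the rightmost byte of the current four-byte group, word constants taken from the words following the group) can be modelled with a small Python sketch. The opcode values `NOP`, `PUSH_BYTE`, `PUSH_WORD` and `ADD` are hypothetical placeholders, not real ShBoom encodings, and the memory model (a list of 32 bit words, leftmost byte most significant) is an assumption made for the example.

```python
# Hypothetical one-byte opcodes, NOT the real ShBoom instruction encoding.
NOP, PUSH_BYTE, PUSH_WORD, ADD = 0x00, 0x01, 0x02, 0x03


def run_groups(memory):
    """Execute a list of 32 bit words as four-byte instruction groups."""
    stack = []
    pc = 0
    while pc < len(memory):
        group = memory[pc].to_bytes(4, "big")  # leftmost byte executes first
        pc += 1
        slots = 4
        i = 0
        while i < slots:
            op = group[i]
            if op == PUSH_BYTE:
                stack.append(group[3])  # data is always the rightmost byte
                slots = 3               # so that byte is not executed as code
            elif op == PUSH_WORD:
                stack.append(memory[pc])  # constant is the next aligned word
                pc += 1
            elif op == ADD:
                b = stack.pop()
                a = stack.pop()
                stack.append((a + b) & 0xFFFFFFFF)
            i += 1
    return stack


if __name__ == "__main__":
    program = [
        0x0100002A,  # PUSH_BYTE, NOP, NOP, data byte 0x2A
        0x02020300,  # PUSH_WORD, PUSH_WORD, ADD, NOP
        100000,      # first word constant
        200000,      # second word constant
    ]
    print(run_groups(program))
```

Because word constants always sit in whole words after the group, the fetch logic never has to realign data, which is the "alignment without extra circuitry" point made in the text.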
The CPU also contains an I/O coprocessor on chip for simultaneous I/O (much more advanced than the I/O buffer register of the M17, but the same idea), which communicates with the MPU via the global data registers. It's a simple, independent unit which executes small data transfer programs until I/O is complete. There are also a programmable memory interface, an 8 channel DMA controller, and an interrupt controller.

The system was later renamed to the more markety IGNITE. It is a very innovative and elegant attempt at combining stack and register oriented architectures, with emphasis on the simplicity of stack operation. It would give Java a good home.

Patriot Scientific Corporation: http://www.ptsc.com/


Part VII: Sun picoJava - not another language-specific processor! (October 1997)

Sun first introduced Java as a combination of language, integrated classes, and a run time system called the Java Virtual Machine (JVM). To support Java, Sun Microelectronics designed picoJava and microJava hardware to execute Java bytecode programs faster than a virtual machine or recompiled code.

The picoJava I is a stack oriented CPU core like the JVM, with a stack cache (similar to the Patriot Scientific ShBoom PSC1000), but there are interesting differences between it and Forth-style stack CPUs. Java only uses a single stack (like many languages such as C, which the AT&T Hobbit and AMD 29K were designed to support), and the picoJava CPU enhances performance with a 'dribbler' unit which constantly updates a complete copy of the stack cache in memory, without affecting other CPU operations (similar to a write-back cache), so stack frames can be added without waiting for a stack frame to be stored. Some Java instructions are complex, so the CPU has microcoded instructions, and a 4 stage pipeline (fetch, decode, execute/cache, stack writeback). Finally, picoJava groups (or 'folds') load and stack operations together, executing both at once (treating the top of stack as an accumulator). This is a much simpler version of the instruction grouping tried in the Transputer T-9000, and usually eliminates 63% of stack operation inefficiency. Seldom used instructions aren't implemented, but are emulated using trap handlers.

The picoJava II (October 1997) core is used in the first actual CPU from Sun, the microJava 701. It extends the pipeline to 6 stages, and can fold up to four instructions into one operation. It also adds an FPU and separate 16 Kb I/D caches. Following waning interest, Sun released the picoJava core design (as well as certain older SPARC designs) to the public (with certain reserved rights) as a type of "open source" CPU, manufactured by Fujitsu, among others. Although Sun's CPU did not include peripherals such as I/O or timers, licensed versions do.

While Sun delivered the picoJava CPU first, an engineering group at Rockwell Collins also created a Java CPU called JEM1, which was spun off in July 2000 into a company called aJile to produce the aJ-100. The aJ-100 implements thread control instructions, unlike the picoJava, which emulates them in software or using an OS. It is also multithreaded, supporting two JVMs operating independently. A lower cost aJ-80 uses an 8-bit data bus rather than the wider bus of the aJ-100.

Advancel initially made designs based on the picoJava I and II, but later designed their own TinyJ CPU, which translates simple Java bytecodes to a conventional load/store execution unit (like the ARM CPU). Complex bytecodes are trapped and emulated. The ALU is a load-store style unit with sixteen 32-bit registers, a 32-bit "top of stack" accumulator used in bytecode interpretation, and a four stage pipeline with variable length (one to four byte) instructions. Non-Java programs are executed directly, and Java programs are interpreted using the decoder for the bytecode, while a conventional JVM executes directly as a non-Java program (various JVMs can be used).

Sun Microsystems: http://www.sun.com/

picoJava Core: http://www.sun.com/microelectronics/picoJava/

Fujitsu Java Solutions: http://www.fujitsu.com/services/microelectronics/product/micom/java/webpage_pdt-mic-java.html

aJile: http://www.ajile.com/aj[...].htm

Advancel Logic Corp. - Product Datasheets: http://[...]/advancel/datasheets.htm

Part VIII: Imsys Cjip - embedded WISC (Writable Instruction Set Computer)

          Swedish company Imsys AB started making components for embedded imaging           systems, and decided to expand into more general microcontroller systems           with the Cjip (pronounced … somehow).

Binary compatibility has been a problem since the beginning of programmable computers, in that it ties software (abstract, theoretical) to particular hardware (fixed, physically limited). There have been attempts to reduce this through hardware using rewritable microcode (Western Digital MCP-1600), as well as through software (the Patriot Scientific ShBoom PSC1000, which recompiles Java bytecodes to its native instruction set when loaded). Since the Cjip is a very low resource CPU, the software overhead would be unacceptable, so Imsys followed the hardware approach using rewritable microcode. Imsys had some experience with UCSD Pascal, an early VM system.

Unlike the DEC Alpha (PALcode) or the Rekursiv CPU, Cjip uses actual wide-word microcode, which is far more efficient but harder to program, while unlike the MCP-1600, Cjip microcode can be modified at runtime. In addition, instructions can be emulated with regular program subroutines. Four initial instruction sets available include a legacy Z-80 style, and three stack-based virtual machines: C/C++ and 32-bit Forth, Java, and 16-bit Forth.

The microcode sees four banks of memory, split into: evaluation stack, internal locals stack (microcode subroutines), general data (emulated registers), and microcode internal variables. The evaluation and data stack spill into external RAM. The general data stack is in external memory only.

Language-specific processors have generally failed, because economies from widespread use of general-purpose processors allow new technology to be incorporated more quickly. The difference with Cjip is that its language support is not limited to just one language, or any language at all. It will be interesting to see if the advantages of generalized language support are enough to win acceptance over competing processors.

Imsys AB: http://www.imsys.se/


Copyright © CPUShack.Net. All pictures and content are property of CPUShack.Net. All rights reserved. This material may not be published, broadcast, rewritten, or redistributed without the express written permission of CPUShack.Net.
