
Compiling my own SPARC CPU inside a cheap FPGA, Hacker News


                 

            


(October 2019)

For the TL;DR crowd:

I am developing a strange habit – I keep “twisting” products from failed companies … into nice toys.

For science! 🙂

After completing my AtomicPI saga, something else caught my attention – a very cheap FPGA board that came out of yet another failed company: the Pano Logic G2.

Let's make a CPU!

Why not compile an open-source CPU inside this FPGA?
And in fact, since this FPGA is quite big, why not make it a multi-core one?

And then compile and run programs inside it – with an open-source cross-compiler, that uses an open-source real-time OS? The same OS that most European satellites and their instruments are using?

Why not, indeed!

(he said, a month ago – and dove into the abyss).

How fast could it go? And since the Pano Logic was meant to be a thin client, and comes with VGA, USB, Ethernet … it’s packing all the pieces necessary to create a standalone computer! Could it be that this can be made into a fully open-source computer?

Keep reading – I believe you’ll learn a thing or two.

PS. The material is heavily technical and long, so I’ll try to lighten it up here and there, with the occasional rant / funny picture. Also, please remember that I am a software developer, not a HW one; I simply enjoy fooling around with technology like this, so take everything said in this blog post – and in the referenced repos – with a grain of salt.

First step: the hardware

This adventure began a few months ago, when I read a magnificent article from Tom Verbeure – a principal hardware engineer at NVIDIA. Tom built a real-time ray tracer on a dirt-cheap FPGA board; and “dirt-cheap” is not an exaggeration, since Tom bought 20 G1s and 25 G2s off eBay …

I’ll just quote him, so you can understand the “why” and “how”:

  

Pano Logic was a Bay Area startup that wanted to get rid of PCs in large organizations by replacing them with tiny, CPU-less thin clients; connected to a central server. Think of them as VNC replacements. No CPU? No software upgrades! No viruses!

… The thin clients had a wired Ethernet interface, a couple of USB ports, an audio port and a video port. And all this was glued together with an FPGA

… The company has been defunct since 2013 and the clients are not supported by anything. But they are amazing for hobby purposes and can be bought dirt cheap on eBay.

So I got my hands on a Pano Logic – in particular, a G2 model; with the Spartan6 LX 100 FPGA inside it. This is a rather large FPGA, promising far more power than any hobbyist has a right to want – but since Pano Logic (the company) failed, the product itself is of no use to anyone but hackers and tinkerers; and it’s therefore sold at amazing bargain prices.

I followed Tom’s instructions – first dismantling the box, and then soldering wires to the FPGA connector:

Soldering JTAG wires

On the other end, I soldered 6 pins from pin header strips – and used a small piece of perfboard to create an “adapter” of sorts. This allowed me to “plug” the 6 cables into the JTAG connector of a Xilinx programmer. Note that these programmers can be found for cheap on eBay (see Tom’s article linked above for details).

When soldering, Blu-Tack is sometimes better than helping hands

The end result – FPGA visible in IMPACT

The last picture you see above, shows the IMPACT tool – made by Xilinx, the company that created these FPGAs – being able to see the chip.

Intermission – on open source, and the abyss

Just like many other engineers, I learned over the years to hate non-determinism; in all its forms, and all its manifestations. This means that I gravitate towards open-source operating systems; where I can use my engineering skills to fully trace what happened, and why; and fully control the OS’s behavior.

I don’t want my computer to decide to upgrade while I am giving a presentation. I don’t want some fancy antivirus to decide that it must “scan” every .c and .cpp file read by my compiler during a build, because it performs “on-access scans”.

I want myself – not some mega-corporation – to be in control of my own hardware. And to automate all the workflows and processes that I need; like installing my development environments on new machines by running a few simple one-liners …

bash $ sudo apt install gcc-8 vim git make cscope exuberant-ctags tmux
bash $ git clone https://github.com/ttsiodras/dotfiles
bash $ git clone https://github.com/ttsiodras/dotvim .vim
...

… and seeing all the myriads of complex dependencies being perfectly resolved under my Debian (or in a similar way, under my Arch) …

… or orchestrating the creation of a complex open-source cross-compiler; allowing me to deterministically build applications with a real-time, freely-accessible OS (that happens to fly on many European satellites)

… or installing a company’s HW synthesis tools – via …

bash $ sudo apt install spartan6-xilinx-synthesis

Actually ….
That last one was a lie.

NEVER GONNA HAPPEN. EVER. Abandon all hope on that front, kids. Think about it – the way commercial companies operate, it makes ZERO financial sense for them to break down 15 GB installs of monstrous hodge-podges of kitchen-sinks into a proper dependency graph of packages using each other – and allow you to `apt install` only the parts you need.

HW is cheap – throw money at the problem! Yes?

But what does that mean for our endeavor with the Pano Logic G2?

Well, the last version of the Xilinx synthesis tools that supported the Spartan family under Linux was the freely available ISE 14.7 WebPACK. I have installed this on my machine, and it does – thankfully – allow me to synthesize for an older Spartan3 board I have.

It’s also minuscule. So tiny!

bash $ du -s -h Xilinx/14.7/
…G  Xilinx/14.7/

But I digress – and forgot to mention … that final Linux version of WebPACK doesn’t support Spartan6 chips.

Let me repeat that – in case you didn’t catch it – in a way that will make it clear:

bash $ sudo apt install gcc-8
We are sorry, but we detected a 9 year old CPU that is not supported
by the freely available version of our compiler
Please buy our BRAND NEW CPU - WITH BONUS NSA MANAGEMENT EXTENSIONS!
Or sell your left kidney and buy our BRAND NEW COMPILER TOOLCHAIN that supports everything!

(sigh)

Searching the Xilinx site some more, we see that there is a free version of the ISE WebPACK that targets Spartan6 devices – but only for Windows.

After downloading and unzipping this package … what do you know! That setup actually installs a Virtual Machine, containing …

… a Linux distribution!

Now that would be a nice example of something actually “ironic” – if one were inclined to, erm, tell Alanis Morissette about the true meaning of the word.

But let’s continue our investigation – and have a look at this .ova file:

$ cd Xilinx_ISE_S6_Win10_14.7_ISE/ova
$ tar tvf ISE_S6_VM.ova
-rw-r----- vboxovf10/vbox_v5.2.VBOX_VERSION_PATCHr 1112425 2018-02-03 00:39 ISE_S6_VM.ovf
-rw-rw---- vboxovf10/vbox_v5.2.VBOX_VERSION_PATCHr 117253232128 2018-02-03 00:39 ISE_S6_VM-disk001.vmdk
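Since `tar tvf` works on it, an .ova is just a plain tar archive, so the same inspection can be scripted. What follows is a minimal Python sketch, not part of the original workflow: the function names `list_ova_members` and `extract_disk_images` are made up for illustration, and only the standard `tarfile`, `shutil` and `os` modules are used.

```python
import os
import shutil
import tarfile


def list_ova_members(ova_path):
    """Return (name, size_in_bytes) for every file packed inside an .ova archive."""
    # "r:" opens a plain, uncompressed tar, which is what an .ova is.
    with tarfile.open(ova_path, mode="r:") as archive:
        return [(member.name, member.size) for member in archive.getmembers()]


def extract_disk_images(ova_path, destination="."):
    """Copy only the .vmdk members into destination; return their names."""
    with tarfile.open(ova_path, mode="r:") as archive:
        disks = [m for m in archive.getmembers()
                 if m.isfile() and m.name.lower().endswith(".vmdk")]
        for member in disks:
            target = os.path.join(destination, os.path.basename(member.name))
            # Stream the member out, so a multi-GB disk never sits in memory.
            with archive.extractfile(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
        return [m.name for m in disks]
```

Calling `list_ova_members("ISE_S6_VM.ova")` should report the same two members that `tar tvf` printed.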

The .vmdk file contained inside is a virtual drive. After extracting it from the .ova with tar, we discover that this is a dynamic volume; so it can’t be mounted as-is with qemu-nbd.

It must first be converted to a “normal” VMDK – and then, we can mount it:

$ qemu-img convert ISE_S6_VM-disk001.vmdk -O vmdk plain.vmdk
$ qemu-nbd -r -c /dev/nbd0 plain.vmdk
$ mount /dev/nbd0p1 /iso4/
$ ls -l /iso4/opt/Xilinx
drwxrwxr-x. 3 500 500 4096 Dec  8  2016 14.7

… and of course, there the Xilinx toolchain is – right where we’d expect it to be … In the same folder as the “Spartan3-supporting” version!

Maybe we don’t have to boot this thing at all – we’ll just copy the entire tree of ISE_DS, to create two folders – one with the normal (2013-era) WebPACK ISE that we used for the ZestSC1 / Spartan3 Mandelbrot experiments, and this new one (2016-era) for the upcoming Pano Logic ones.

A symlink will point to one or the other:

$ ls -l
drwxr-xr-x  4 ttsiod users 4096 Oct 12 21:14 ./
drwxr-xr-x 11 ttsiod users 4096 Oct 12 20:59 ../
lrwxrwxrwx  1 root   root    15 Oct 12 20:46 ISE_DS -> ISE_DS.Spartan6/
drwxr-xr-x  7 ttsiod users 4096 Mar  4 …     ISE_DS.Spartan3/
drwxrwxr-x  6 ttsiod users 4096 Dec  8  2016 ISE_DS.Spartan6/

… and since Xilinx tools depend on license files, a script will switch everything from one form to the other – depending upon what we want to do:

#!/bin/bash

# Abort unless the given path is a symbolic link - we never want to
# "rm" a real license file or a real ISE_DS folder by mistake.
check_symlink() {
    if [ ! -h "$1" ] ; then
        echo "$1 was not a symlink! Aborting..."
        exit 1
    fi
}

if [ $# -eq 0 ] ; then
    # No arguments: just show where the symlinks currently point.
    cd || exit 1
    ls -l .Xilinx/Xilinx.lic Xilinx/Xilinx.lic Xilinx/14.7/ISE_DS
    echo
    echo Use xilinx.sh 3 or xilinx.sh 6
    echo
else
    if [ "$1" -ne 3 -a "$1" -ne 6 ] ; then
        echo Use xilinx.sh 3 or xilinx.sh 6
        exit 1
    fi
    XIL="$1"
    cd || exit 1
    # License file under ~/.Xilinx
    cd .Xilinx || exit 1
    check_symlink Xilinx.lic
    rm Xilinx.lic || exit 1
    ln -s Xilinx.lic.spartan${XIL} Xilinx.lic || exit 1
    # The toolchain tree itself
    cd ../Xilinx/14.7/ || exit 1
    check_symlink ISE_DS
    rm ISE_DS || exit 1
    ln -s ISE_DS.Spartan${XIL} ISE_DS || exit 1
    # License file under ~/Xilinx
    cd .. || exit 1
    check_symlink Xilinx.lic
    rm Xilinx.lic || exit 1
    ln -s Xilinx.lic.spartan${XIL} Xilinx.lic || exit 1
    cd || exit 1
    ls -l .Xilinx/Xilinx.lic Xilinx/Xilinx.lic Xilinx/14.7/ISE_DS
    echo
    echo Now go run this:
    echo "cd ~/Xilinx/14.7/ISE_DS"
    echo ". settings64.sh"
fi

$ xilinx.sh 6
lrwxrwxrwx 1 ttsiod users 15 Oct 19 16:10 Xilinx/14.7/ISE_DS -> ISE_DS.Spartan6
lrwxrwxrwx 1 ttsiod users 19 Oct 19 16:10 .Xilinx/Xilinx.lic -> Xilinx.lic.spartan6
lrwxrwxrwx 1 ttsiod users 19 Oct 19 16:10 Xilinx/Xilinx.lic -> Xilinx.lic.spartan6

Now go run this:
cd ~/Xilinx/14.7/ISE_DS
. settings64.sh
$
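The script removes the old link and then creates the new one, which is perfectly fine for interactive use. If the switch ever needs to be atomic (say, because a build may be running in another terminal), a variant of the same idea could look like the Python sketch below. `switch_symlink` is a hypothetical helper, not part of the actual script; it keeps the same "refuse to touch anything that is not a symlink" check, and relies on `os.replace` renaming the temporary link over the old one in a single step on POSIX systems.

```python
import os


def switch_symlink(link_path, new_target):
    """Re-point an existing symlink at new_target, refusing to touch real files."""
    if not os.path.islink(link_path):
        raise RuntimeError(f"{link_path} was not a symlink! Aborting...")
    temp_link = link_path + ".tmp"
    if os.path.lexists(temp_link):
        os.remove(temp_link)             # leftover from an interrupted run
    os.symlink(new_target, temp_link)    # build the new link next to the old one
    os.replace(temp_link, link_path)     # rename over the old link in one step
```

For example, `switch_symlink("ISE_DS", "ISE_DS.Spartan6")` run inside `~/Xilinx/14.7/` would do what the `rm` / `ln -s` pair does for the toolchain tree.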

Additionally, to avoid wasting a metric ton of hard drive storage, we use rdfind to identify the files that are identical between these two subtrees – and form hard links, so they only occupy space once:

$ rdfind -makehardlinks true ISE_DS.Spartan{3,6}/

After this finished, it became clear that the two trees shared almost EVERYTHING. In fact, the total storage cost went BELOW the original storage cost used for just the single ISE for the Spartan3! …

In the absence of miracles, this can only mean one thing: that apparently there are plenty of copies of files spread all over – even within the same folder subtree.
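To check a claim like this without trusting a tool's summary, one could count the duplicates directly. A naive Python sketch follows; it hashes every single file, so expect it to be slow on a tree of this size, and the helper names are invented for this example. Files that are already hard-linked to each other share an inode and are therefore counted once.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Map (size, digest) -> paths, keeping only content stored in more than one inode."""
    by_content = defaultdict(dict)       # (size, digest) -> {(device, inode): path}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue                 # skip symlinks and special files
            info = os.stat(path)
            key = (info.st_size, file_digest(path))
            by_content[key].setdefault((info.st_dev, info.st_ino), path)
    return {key: sorted(paths.values())
            for key, paths in by_content.items() if len(paths) > 1}


def reclaimable_bytes(duplicates):
    """Bytes freed if every group were hard-linked down to a single copy."""
    return sum(size * (len(paths) - 1)
               for (size, _digest), paths in duplicates.items())
```

Running `reclaimable_bytes(find_duplicates("ISE_DS.Spartan3"))` on a single, not yet hard-linked tree would show how much of it is copies of itself.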

How do we live like this

So … can we now, finally, launch the thing?

Err … no.

The *free* WebPACK version, put inside a Linux Virtual Machine, and made by its makers to specifically target Spartan6 devices … will first check that the MAC address of the `eth0` Ethernet adapter *has a specific value*.

I don’t know what else to say about this. I believe the situation describes itself, very eloquently – about the merits of closed-source software.

Let’s check the .ovf file in the original package:

$ grep MAC ISE_S6_VM.ovf | head -1

We see here that the Virtual Machine is equipped with an “eth0” Ethernet adapter, with a specific MAC address. Since my laptop only has a “wlan0” interface, I added a dummy one – making it the way Xilinx apparently expects it:

$ cd /etc/systemd/network
$ cat 25-dummy.netdev
[Match]

[NetDev]
Name=eth0
Kind=dummy
MACAddress=08:00:27:68:C9:35

$ sudo systemctl restart systemd-networkd
$ sudo ifconfig eth0 up
$ sudo ifconfig eth0 | grep ether
    ether 08:00:27:68:C9:35  txqueuelen 1000  (Ethernet)
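A quick way to confirm that the dummy interface really carries the expected address is to read it back from sysfs. The sketch below assumes a Linux system that exposes `/sys/class/net/<interface>/address`; the helper names are hypothetical, and the `sysfs_root` parameter exists only so the check can be pointed at a fake directory when testing.

```python
import os


def read_mac(interface, sysfs_root="/sys/class/net"):
    """Return the MAC address Linux reports for an interface, or None if absent."""
    try:
        with open(os.path.join(sysfs_root, interface, "address")) as handle:
            return handle.read().strip().lower()
    except OSError:
        return None                      # no such interface (or not readable)


def has_expected_mac(interface, expected, sysfs_root="/sys/class/net"):
    """True when the interface exists and its MAC matches, ignoring letter case."""
    return read_mac(interface, sysfs_root) == expected.strip().lower()
```

With the dummy interface up, `has_expected_mac("eth0", "08:00:27:68:C9:35")` should return True.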

So does it work now?

Nope.

First, you have to disable your wlan0 adapter (!) – otherwise the lmhostid detected by the Xilinx tools is the MAC address of the wlan0 adapter!

Clearly, Xilinx doesn’t check whether there’s an eth0 with the MAC they want … No, they look up which network adapter internet traffic goes through – and check that adapter’s MAC.

Maybe.

Or maybe they stop at the first network adapter they find during enumeration.

Or maybe they draw lottery tickets from /dev/urandom – and perform an rm -rf /usr Russian roulette once in a blue moon.

Remember, we are talking about the free version of WebPACK here – that is officially distributed for people who somehow paid the company to get Xilinx Spartan6 chips, and want to program them.

And, yet, the free version distributed, has to perform checks like these – because, erm, it has to … IT JUST HAS TO.

(facepalm)

The heaven of closed-source

The next time you wonder about the impact of Linus Torvalds, and Richard Stallman, and Fabrice Bellard, and all the other magnificent fellows of the open-source SW world … JUST PAY A VISIT TO ANY OF THE GRAVEYARDS OF CLOSED SOURCE “HEAVENS”.

And you’ll then remember how the world was … before these giants decided to rescue us.

Back to “compiling” our CPU

Now that we have sacrificed our firstborns and are able to run the synthesis toolchain for our target – and see our FPGA being detected in IMPACT – we can finally move to “compiling” our CPU.

Over the last 4 years, I’ve been working as a real-time embedded SW engineer in the European Space Agency. In a very large percentage of our missions, our SW runs on one form or another of an open-source CPU design – specifically, on a SPARC derivative called LEON.

So when I started fooling around with CPU synthesis and FPGAs, I forked this repository; it contains a mirror of the open-source version of GRLIB, the home of LEONs. My own copy is here; please remember that I am a software developer, not a HW one; I just enjoy playing around with technology. What you are reading is just one of my hobbies – don’t go and bet the family farm on my repository’s code quality :-)

Also, don’t expect this post to start from a ‘hello, VHDL world’ and end with a working LEON3. That would require a book, not a blog post. Instead, we will follow along the traditional paths of engineering; we will base our efforts on pre-existing designs, and tweak them to match our own target. This is in fact one of the roles served by the designs folder in the original repository.

Programming languages – living in the HW and SW worlds

As one might expect, writing code for programmable HW shares similarities with the SW development workflows. You have stages of processing your inputs in both worlds – instead of compiling compilation units into object files and then linking them, the FPGA tools perform synthesis, followed by placement-and-routing. You run your unit and integration tests prior to deploying your SW in production – just as you run your VHDL testbenches in your simulator prior to deploying your circuit to your FPGA.

And you edit your VHDL or Verilog code with your Vim, or perhaps your Emacs – NOTHING ELSE, INFIDEL – just like you would for your traditional SW programming languages.

I am utterly lying here, of course; the truth is that most of the HW designers I know are editing inside their Vendor-provided IDEs. Remember, these are walled gardens – with the designers pretty much “trapped” inside them.

Some of my friends have literally invested their lives in learning the peculiarities of specific toolchains – heck, even of specific versions of the toolchains!

Think about it – what else can you do, when you don’t have the source code of a tool? All you can do, is “learn”, over decades, the things to avoid … So that the black-box you build your designs with, doesn’t go … banana.

But there are significant differences, too.

For example, HW tools have far more issues with re-using previous work. If you touch a single .c file in a codebase containing thousands of source files, only that one will be recompiled when you run make – you’ll pay the small price for a quick compilation of a single file, and a re-link. Fast build-compile-test cycles.
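The rule that makes this work is tiny: a target is rebuilt only when it is missing, or older than one of its inputs. Here is a minimal Python sketch of that timestamp check; it illustrates the idea and is not how make is actually implemented, and `needs_rebuild` and `stale_objects` are made-up names.

```python
import os


def needs_rebuild(target, prerequisites):
    """Rebuild if the target is missing or older than any of its inputs."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(source) > target_mtime
               for source in prerequisites)


def stale_objects(sources):
    """Return the .c files whose matching .o file has to be recompiled."""
    return [src for src in sources
            if needs_rebuild(os.path.splitext(src)[0] + ".o", [src])]
```

Touch one .c file out of thousands, and `stale_objects` returns a list with exactly that one entry; everything else is reused as-is.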

But in the HW world, that doesn’t seem to be the case. There is no edit-compile-run cycle; there’s an edit-compile-GoForAtripToTheAlpsAndStayForAweek-then-run cycle.

Another crazy difference I experienced was that builds are NOT deterministic; in the sense that in a design that utilises almost all resources of your FPGA, you may try rebuilding your code after just adding a comment – only to see it fail to satisfy the timing constraints it did in the previous build!

I am NOT joking. The placement and routing stages, in particular, are apparently very “tough” (algorithmically speaking). Heuristics are applied, in the cost functions that are used to estimate routing and timing performance … These in turn “feed” the gradient descents and simulated annealings that try to find the best location in the search space. In the end, this translates to, potentially, your “compilation” ending up trapped in a different, worse, “local minimum” than the one it found in your previous build.
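To get a feeling for why this happens, here is a toy model in Python. It is emphatically NOT what the Xilinx tools do: it places a handful of cells on a line, uses total wire length as the cost function, and runs a bare-bones simulated annealing driven by a seeded random generator. All names and parameters are invented for the illustration.

```python
import math
import random


def wire_length(placement, nets):
    """Total distance between connected cells; placement[cell] is its slot on a line."""
    return sum(abs(placement[a] - placement[b]) for a, b in nets)


def anneal_placement(cell_count, nets, seed, steps=2000, start_temp=5.0):
    """Toy annealer: swap two cells per step, sometimes accepting a worse layout."""
    rng = random.Random(seed)            # every random decision comes from here
    placement = list(range(cell_count))
    cost = wire_length(placement, nets)
    best_cost, best_placement = cost, placement[:]
    for step in range(steps):
        # Linear cooling; the tiny offset keeps the temperature above zero.
        temperature = start_temp * (1.0 - step / steps) + 1e-9
        i, j = rng.sample(range(cell_count), 2)
        placement[i], placement[j] = placement[j], placement[i]
        new_cost = wire_length(placement, nets)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            cost = new_cost              # accept the move
            if cost < best_cost:
                best_cost, best_placement = cost, placement[:]
        else:
            placement[i], placement[j] = placement[j], placement[i]  # undo it
    return best_cost, best_placement
```

Feed it a fixed list of nets and a few different seeds, and the final costs may differ from run to run: the "different local minimum" effect in miniature. Feed it the same seed twice, and the result is identical.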

Which is why you see HW designers COMMITTING the bitfiles they generated, after they see them actually work on the chip.

Put simply:

A SW developer committing an executable in his source repository, is an idiot.
A HW developer doing the same, is a wise man.

Apparently.

Randomness in synthesis

I am told this has been improved in newer versions of HW toolchains; that they now allow you to “seed” the random processes driving the search space, so that they at least behave deterministically.

Which is nice.

Executive Summary: HW design is a strange land. It is, after all, a land full of clocks!

Tweaking, and then simulating with GHDL

As we said above, we will now base our efforts on pre-existing designs, and tweak them to match our own target.

After cloning my repository, navigate to designs; and copy the folder of my previous (unexpectedly successful!) attempt to bootstrap a LEON3 inside my Spartan3 board:

bash $ cp -a leon3-zestsc1-xc3s1000 lets-make-a-cpu

First of all, the master configuration file – config.vhd – defines a number of things that are FPGA specific. We are targeting a Spartan6 now, not a Spartan3 – so …

--- ../leon3-zestsc1-xc3s1000/config.vhd 2019-03-17 09:37:11.… +0100
+++ config.vhd 2019-10-19 09:21:21.950877962 +0200
@@ -1,7 +1,7 @@
 --------------------------------------------------------------------------------
--- My customizations for my ZestSC1 board - based on the original design
+-- My customizations for my PanoLogic G2 board - based on the original design
 -- for the leon3-digilent-xc3s1000.
 -- Hey.
 -- Original Copyright:
@@ -15,22 +15,22 @@
 package config is
 -- Technology and synthesis options
-  constant CFG_FABTECH : integer := spartan3;
-  constant CFG_MEMTECH : integer := spartan3;
-  constant CFG_PADTECH : integer := spartan3;
+  constant CFG_FABTECH : integer := spartan6;
+  constant CFG_MEMTECH : integer := spartan6;
+  constant CFG_PADTECH : integer := spartan6;
   constant CFG_TRANSTECH : integer := TT_XGTP0;
   constant CFG_NOASYNC : integer := 0;
   constant CFG_SCAN : integer := 0;
 -- Clock generator
-  constant CFG_CLKTECH : integer := spartan3;
+  constant CFG_CLKTECH : integer := spartan6;
  • We change all references of spartan3 to spartan6

  • We set CFG_CLKMUL and CFG_CLKDIV to the same value – e.g. 5 – so for now, the LEON will be running at the same speed as the board’s clock (25 MHz). After we’ve done our first successful synthesis / placement / routing, we’ll see the maximum frequency our circuit can run at – and we will bump up the clock accordingly.

  • In the Makefile.inc, we change to using the proper HW parts:

--- ../leon3-zestsc1-xc3s1000/Makefile.inc 2019-02-28 20:55:35.143510266 +0100
+++ Makefile.inc 2019-10-19 08:55:49.590853311 +0200
@@ -1,12 +1,12 @@
-TECHNOLOGY=Spartan3
-PART=xc3s1000
-PACKAGE=ft256
-SPEED=-5
+TECHNOLOGY=Spartan6
+PART=xc6slx100
+PACKAGE=fgg484
+SPEED=-2
 SYNFREQ=48
 #PROMGENPAR=-x xcf04s -u 0 $(TOP).bit -p mcs -w -o digilent-xc3s1000
 MANUFACTURER=Xilinx
-MGCPART=3s1000$(PACKAGE)
+MGCPART=6slx100$(PACKAGE)
 MGCTECHNOLOGY=$(TECHNOLOGY)
 MGCPACKAGE=$(PACKAGE)
  • In the ZestSC1 experiments, we used a USB / TTL dongle, that we connected to a couple of GPIO pins – and through that, we obtained access to the LEON3 Debug Support Unit. But we are using a (much faster!) JTAG interface now – so we adapt the configuration to disable the former and enable the latter:
   constant CFG_AHB_MONWAR : integer := 0;
   constant CFG_AHB_DTRACE : integer := 0;
 -- DSU UART
-  constant CFG_AHB_UART : integer := 1;
+  constant CFG_AHB_UART : integer := 0;
 -- JTAG based DSU interface
-  constant CFG_AHB_JTAG : integer := 0;
+  constant CFG_AHB_JTAG : integer := 1;
  • Finally, since this FPGA is a monster compared to the Spartan3 XC3S1000, we can bump up the amount of BlockRAM (used to create the “memory” of the LEON cores) by 16 times! [2]
 -- LEON2 memory controller
   constant CFG_MCTRL_LEON2 : integer := 1;
   constant CFG_MCTRL_RAM8BIT : integer := 0;
@@ -132,7 +133,7 @@
   constant CFG_ROMMASK : integer := 16#E00# + 16#100#;
 -- AHB RAM
   constant CFG_AHBRAMEN : integer := 1;
-  constant CFG_AHBRSZ : integer := 16;
+  constant CFG_AHBRSZ : integer := 256;
   constant CFG_AHBRADDR : integer := 16#400#;
   constant CFG_AHBRPIPE : integer := 0;
 -- UART 1

And for now, that’s it – LEON configuration wise.
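The arithmetic behind the clock multiplier and divider is simple enough to sanity-check by hand, or with a throwaway Python snippet like the one below. It mirrors the CPU_FREQ expression used in leon3mp.vhd (BOARD_FREQ * CFG_CLKMUL / CFG_CLKDIV, all in KHz); the function name is made up for this example, and the result is truncated the way VHDL integer division is.

```python
def cpu_freq_khz(board_freq_khz, clk_mul, clk_div):
    """CPU clock in KHz: BOARD_FREQ * CFG_CLKMUL / CFG_CLKDIV, truncated."""
    if clk_div <= 0:
        raise ValueError("clk_div must be positive")
    return board_freq_khz * clk_mul // clk_div
```

With the Pano's 25000 KHz board clock and both constants set to 5, `cpu_freq_khz(25000, 5, 5)` gives 25000: the LEON runs at exactly the board's speed. Any later bump of the clock is then just a matter of picking a different multiplier/divider pair.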

Now, there are many ways to use LEONs in one’s design. To make things easier, for this 1st test, I will be using the freely available evaluation version of GRMON. GRMON is a debugging monitor / control tool specifically made to assist development with LEONs. For later stages in particular, where we will be loading the software we compiled inside our CPU, GRMON offers a GDB server; allowing us to debug things over good old GDB. Very convenient.

GRMON is not open-source, sadly – but at least the developers behind it know what they are doing. You don’t download 15 GB of kitchen sinks, you download 5 MB of a properly made, platform-specific command-line tool; that does one thing, and does it well.

One might even call this a Philosophy.

Speaking of small, nice tools, you had better download xc3sprog as well. It can be compiled from source – it’s fully open; and then, instead of launching IMPACT to program our XC6SLX100, we will be able to spawn a tiny 300 KB executable – and do all the work via a simple incantation in our Makefile:

xc3sprog -c xpc -v YourBitfileGoesHere

But enough about tooling, let’s get back to the code.

What about leon3mp.vhd – the VHDL file that describes our LEON3 core?

--- ../leon3-zestsc1-xc3s1000/leon3mp.vhd 2019-03-17 09:36:24.901622577 +0100
+++ leon3mp.vhd 2019-10-19 08:…:55.520843754 +0200
@@ -60,15 +60,14 @@
     use_ahbram_sim : integer := 0
   );
   port (
-    resetn : in std_ulogic;
-    clk : in std_ulogic;
-    iu_error : out std_ulogic;
-    dsuact : out std_ulogic;
-    dsu_rx : out std_ulogic;
-    dsu_tx : in std_ulogic;
-    rx : out std_ulogic;
-    tx : in std_ulogic;
-    IO : inout std_logic_vector(46 downto 0)
+    resetn : in std_ulogic;
+    clk : in std_ulogic;
+    iu_error : out std_ulogic;
+    dsuact : out std_ulogic;
+    rx : out std_ulogic;
+    tx : in std_ulogic
   );
 end;
@@ -76,7 +75,7 @@
   constant blength : integer := 12;
   constant fifodepth : integer := 8;
-  constant maxahbm : integer := CFG_NCPU + CFG_AHB_UART; -- A truly "Spartan" set of AHB masters :-)
+  constant maxahbm : integer := CFG_NCPU + CFG_AHB_JTAG; -- A truly "Spartan" set of AHB masters :-)
  • Compared to the previous ZestSC1 / Spartan3 design, GRMON won’t be controlling the LEON’s Debug Support Unit (DSU) via special serial data; we will be using JTAG instead (spawning grmon -u -xilusb – or, if you are using a Digilent HS2-compatible device, grmon -u -digilent). We therefore need to drop these DSU-serial signals (dsu_rx, dsu_tx).

  • The Pano UCF file also has no IO. It contains many signals towards other parts that look like a lot of fun, though – VGA signals, for instance … 🙂 Looking forward to hooking my HW Mandelbrot directly on a monitor :-)

   -- my ZestSC1 board's frequency in KHz
-  constant BOARD_FREQ : integer := 48000;
-  -- CPU frequency in KHz will be 34000 - as per my S/P/R results,
+  constant BOARD_FREQ : integer := 25000;
+  -- CPU frequency in KHz will be 25000 - as per my S/P/R results,
   -- my design can easily reach this speed.
   constant CPU_FREQ : integer := BOARD_FREQ * CFG_CLKMUL / CFG_CLKDIV;
   constant IOAEN : integer := 0;
@@ -126,13 +123,9 @@
   attribute syn_keep : boolean;
   attribute syn_preserve : boolean;
   -- RS232 APB Uart
-  signal rxd1 : std_logic;
-  signal txd1 : std_logic;
-
   -- A "heartbeat" LED for the DSU - I used it to make sure the
-  -- locally instantiated clock here beats indeed at 34 MHz
-  -- (search below for 34000000 to see the logic)
+  -- locally instantiated clock here beats indeed at 25 MHz
  • The clock in the Pano runs at 25 MHz, not 48 MHz.

  • We also need to instantiate the JTAG controller – and remove the DSU-controlling UART:

@@ -199,35 +192,28 @@
     dsuo.tstop …
+  ahbjtaggen0 : if CFG_AHB_JTAG = 1 generate
+    ahbjtag0 : ahbjtag generic map (tech => fabtech, hindex => CFG_NCPU)
+      port map (rstn, clkm, tck, tms, tdi, tdo, ahbmi, ahbmo(CFG_NCPU),
+                open, open, open, open, open, open, open, gnd(0));
+  end generate;
   -- To verify that the clock shenanigans actually work on my board,
   -- I hooked this up to LED6 (i.e. the 2nd from the right) and
   -- confirmed that the clock driving the LEON3 and the DSU and all
-  -- the rest is indeed a 34 MHz clock.
+  -- the rest is indeed a 25 MHz clock.
   process (clkm)
   begin
     if rising_edge(clkm) then
       counter_dsu …
-      if counter_dsu = 34000000 then
+      if counter_dsu = 25000000 then
         counter_dsu …
-  -- Debug UART
-  dcomgen : if CFG_AHB_UART = 1 generate
-    dcom0 : ahbuart
-      generic map (hindex => CFG_NCPU, pindex => 4, paddr => 7)
-      port map (rstn, clkm, dui, duo, apbi, apbo(4), ahbmi, ahbmo(CFG_NCPU));
-    dui.rxd …
-  end generate;
-  nouah : if CFG_AHB_UART = 0 generate apbo(4) …
-
-  urx_pad : inpad generic map (tech => padtech) port map (dsu_tx, rxd1);
-  utx_pad : outpad generic map (tech => padtech) port map (dsu_rx, txd1);
-  txd1 …
-
 ----------------------------------------------------------------------
 ---  APB Bridge and various periherals -------------------------------
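The heartbeat trick works because a counter that is compared against 25000000 needs exactly one second to get there when it is clocked at 25 MHz. A small Python sketch of that check follows, assuming the counter advances by one on every rising edge of clkm; the function name is hypothetical.

```python
def heartbeat_period_s(counter_limit, clock_hz):
    """Seconds a once-per-clock-edge counter needs to reach counter_limit."""
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    return counter_limit / clock_hz
```

`heartbeat_period_s(25_000_000, 25_000_000)` is 1.0, so an LED driven by this counter that changes once per second tells you the clock really is what you think it is. If the clock were, say, 50 MHz instead, the same limit would be reached in half a second, and the LED would visibly beat too fast.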

All of these components that we are using – the AHBUART we just removed, the AHBJTAG we just added – they are coming from the open-source contents of the GRLIB. And this relates to an important concern about the HW world vs the SW one: the ecosystem of pre-made “library IP blocks” that one needs to make a system operational.

Now, I am not the only one stating that – when compared to their SW counterparts – the HW synthesis toolchains are in an abysmal state. There is a movement underway to implement open-source alternatives (e.g. see Yosys, arachne-pnr, etc). But for these efforts to succeed, an ecosystem of open library IPs needs to be developed around them.

I know the current “DNA” of HW engineers is very much of a proprietary nature – but IMHO, the HW design community needs to evolve beyond this. Become open-source mutants, like us SW people!

I am pretty sure some truly spectacular super-powers would come out of such a mutation.

Finally, let’s update our testbench to comply with all the changes we did to our LEON3 design:

  • Adapt to new interfaces (remove DSU TX / RX, etc)
  • Remove the big test sending serial data over TX to control the DSU
  • And just reset the LEON for a little while.
--- ../leon3-zestsc1-xc3s1000/testbench.vhd 2019-03-08 17:41:44.… +0100
+++ testbench.vhd 2019-10-20 08:…:26.754650548 +0200
@@ -31,6 +31,7 @@
 library gaisler;
 use gaisler.libdcom.all;
 use gaisler.sim.all;
 library techmap;
 use techmap.gencomp.all;
 use std.textio.all;
@@ -56,8 +57,9 @@
   signal rstn : std_ulogic := '1';
   signal iu_error : std_ulogic;
   signal dsuact : std_ulogic;
-  signal dsu_tx : std_logic;
-  signal dsu_rx : std_logic;

   component leon3mp
     port (
@@ -65,8 +67,8 @@
       resetn : in std_ulogic;
       iu_error : out std_ulogic;
       dsuact : out std_ulogic;
-      dsu_rx : out std_ulogic; -- UART1 tx data
-      dsu_tx : in std_ulogic -- UART1 rx data
     );
   end component;

@@ -75,12 +77,12 @@
 begin
   d3 : leon3mp
     port map (
-      clk => CLK,
       resetn => rstn,
+      clk => CLK,
       iu_error => iu_error,
       dsuact => dsuact,
-      dsu_rx => dsu_rx,
-      dsu_tx => dsu_tx
     );

   clk …
@@ -94,79 +96,21 @@
     severity failure;
   end process;

-  dsucom : process
-    procedure dsucfg(signal dsutx : out std_ulogic; signal dsurx : in std_ulogic) is
-      variable w32 : std_logic_vector(31 downto 0);
-      variable c8 : std_logic_vector(7 downto 0);
-      constant txp : time := 320 * 1 ns;
-      variable l : line;
-    begin
-      dsutx …
-      write(l, String'("Resetting for 40 cycles"));
-      writeline(output, l);
-      rstn …
-      wait for 40 * CLK_PERIOD;
-      rstn …
-      wait for 10 * CLK_PERIOD;
-
-      wait for 5000 ns;
-
-      -- Send exactly what grmon3 sends.
-      txc(dsutx, 16#55#, txp);
-      txc(dsutx, 16#55#, txp);
-      txc(dsutx, 16#55#, txp);
-      txc(dsutx, 16#55#, txp);
-      txc(dsutx, 16#80#, txp);
-      txc(dsutx, 16#ff#, txp);
-      txc(dsutx, 16#ff#, txp);
-      txc(dsutx, 16#ff#, txp);
-      txc(dsutx, 16#f0#, txp);
-      txc(dsutx, 16#80#, txp);
-      txc(dsutx, 16#ff#, txp);
-      txc(dsutx, 16#ff#, txp);
-      txc(dsutx, 16#ff#, txp);
-      txc(dsutx, 16#f0#, txp);
-      txc(dsutx, 16#ff#, txp);
-
-      -- and look at the magnificent output from our design;
-      -- the DSU replies with 00 00 10 70; the proper response!
-
-      -- This test can also be used - it is the original
-      -- scenario taken from digilent-xc3s1000.
-
-      -- txc(dsutx, 16#55#, txp); -- sync uart
-
-      -- txc(dsutx, 16#c0#, txp);
-      -- txa(dsutx, 16#90#, 16#00#, 16#00#, 16#00#, txp);
-      -- txa(dsutx, 16#00#, 16#00#, 16#20#, 16#2e#, txp);
-
-      -- wait for 25000 ns;
-      -- txc(dsutx, 16#c0#, txp);
-      -- txa(dsutx, 16#90#, 16#00#, 16#00#, 16#20#, txp);
-      -- txa(dsutx, 16#00#, 16#00#, 16#00#, 16#…#, txp);
-
-      -- txc(dsutx, 16#c0#, txp);
-      -- txa(dsutx, 16#90#, 16#40#, 16#00#, 16#24#, txp);
-      -- txa(dsutx, 16#00#, 16#00#, 16#00#, 16#0D#, txp);
-
-      -- txc(dsutx, 16#c0#, txp);
-      -- txa(dsutx, 16#90#, 16#70#, 16#11#, 16#78#, txp);
-      -- txa(dsutx, 16#91#, 16#00#, 16#00#, 16#0D#, txp);
-
-      -- txa(dsutx, 16#90#, 16#40#, 16#00#, 16#44#, txp);
-      -- txa(dsutx, 16#00#, 16#00#, 16#20#, 16#00#, txp);
-
-      -- txc(dsutx, 16#80#, txp);
-      -- txa(dsutx, 16#90#, 16#40#, 16#00#, 16#44#, txp);
-
-      -- Look! The DSUACT signal goes high! All good.
-      wait for 50000 ns;
-
-      write(l, String'("Test completed."));
-      writeline(output, l);
-    end procedure;
+  jtagproc : process
+    variable l : line;
   begin
-    dsucfg(dsu_tx, dsu_rx);
-    wait;
-  end process;
+    write(l, String'("Resetting for 40 cycles"));
+    writeline(output, l);
+    rstn …
+    wait for 40 * CLK_PERIOD;
+    rstn …
+    wait for 10 * CLK_PERIOD;
+    wait for 5000 ns;
+    write(l, String'("Looks like we are booting."));
+    writeline(output, l);
+    assert false report "Reached end of test" severity failure;
+  end process;
 end;

Time to launch GHDL to simulate this circuit. GHDL is a magnificent open-source simulator that you can compile from source (or install via your Linux distribution’s repositories):

bash$ make simulation-setup
...
bash$ make simulation
...
Resetting for 40 cycles
Panologic G2 LX100 Demonstration design
GRLIB Version …, build 4208
Target technology: spartan6, memory library: spartan6
ahbctrl: AHB arbiter/multiplexer rev 1
ahbctrl: Common I/O area disabled
ahbctrl: AHB masters: 2, AHB slaves: 8
ahbctrl: Configuration area at 0xfffff…, 4 kbyte
ahbctrl: mst0: Cobham Gaisler LEON3 SPARC V8 Processor
ahbctrl: mst1: Cobham Gaisler JTAG Debug Link
ahbctrl: slv1: Cobham Gaisler AHB/APB Bridge
ahbctrl:       memory at 0x…, size 1 Mbyte
ahbctrl: slv2: Cobham Gaisler LEON3 Debug Support Unit
ahbctrl:       memory at 0x…, size 256 Mbyte
ahbctrl: slv3: Cobham Gaisler Single-port AHB SRAM module
ahbctrl:       memory at 0x…, size 1 Mbyte, cacheable, prefetch
ahbctrl: slv4: Cobham Gaisler Test report module
ahbctrl:       memory at 0x…, size 1 Mbyte
apbctrl: APB Bridge at 0x… rev 1
apbctrl: slv1: Cobham Gaisler Generic UART
apbctrl:       I/O ports at 0x…, size 256 byte
apbctrl: slv2: Cobham Gaisler Multi-processor Interrupt Ctrl.
apbctrl:       I/O ports at 0x…, size 256 byte
apbctrl: slv3: Cobham Gaisler Modular Timer Unit
apbctrl:       I/O ports at 0x…, size 256 byte
testmod4: Test report module
ahbram3: AHB SRAM Module rev 1, 256 kbytes
gptimer3: Timer Unit rev 1, 8-bit scaler, 2 32-bit timers, irq 8
irqmp: Multi-processor Interrupt Controller rev 4, …
apbuart1: Generic UART rev 1, fifo 4, irq 2, scaler bits 12
ahbjtag AHB Debug JTAG rev 2
dsu3_2: LEON3 Debug support unit + AHB Trace Buffer, 2 kbytes
leon3_0: LEON3 SPARC V8 processor rev 3: iuft: 0, fpft: 0, cacheft: 0
leon3_0: icache 1*8 kbyte, dcache 1*8 kbyte
clkgen_spartan3e: spartan3/e sdram/pci clock generator, version 1
clkgen_spartan3e: Frequency 25000 KHz, DCM divisor 5/5
1750 ns : cpu0: 0x… 00 00 00 unimp (trapped)
Looks like we are booting.
testbench.vhd:113:5:@6us:(assertion failure): Reached end of test
ghdl:error: assertion failed
  from: process work.testbench(behav).jtagproc at testbench.vhd:113
ghdl:error: simulation failed
make: *** [Makefile:38: simulation] Error 1

All good! The LEON3 traps after 1.75 microseconds, since it reads a nice 32-bit zero from our “ram”, which is not valid code for a SPARC (the log shows it decoded as unimp).
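To see why an all-zero word ends up as "unimp", here is a minimal Python sketch of how a SPARC V8 "format 2" instruction word is classified. The field positions (op in bits 31:30, op2 in bits 24:22) follow the SPARC V8 architecture manual; the function name `decode_sparc_format2` is made up for this illustration and is not part of GRLIB, GHDL or any other tool used in this post.

```python
def decode_sparc_format2(word: int) -> str:
    """Classify a 32-bit SPARC V8 instruction word of format 2 (op == 0)."""
    op = (word >> 30) & 0x3          # bits 31:30 select the instruction format
    if op != 0:
        return "not format 2"
    op2 = (word >> 22) & 0x7         # bits 24:22 select the format-2 instruction
    names = {
        0: "unimp",   # explicitly "unimplemented": executing it traps
        2: "bicc",    # integer conditional branch
        4: "sethi",   # also encodes nop (sethi 0, %g0)
        6: "fbfcc",   # floating-point conditional branch
        7: "cbccc",   # coprocessor conditional branch
    }
    return names.get(op2, "reserved")


# The word the simulated LEON3 fetched from the empty RAM:
print(decode_sparc_format2(0x00000000))
# For comparison, the canonical SPARC nop:
print(decode_sparc_format2(0x01000000))
```

So a RAM full of zeroes is, from the CPU's point of view, a RAM full of instructions that are defined to trap.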

The “assertion failure” is normal, since that’s how the testbench ends:

assert false report "Reached end of test" severity failure;

Run Forrest, Run

Now, there’s plenty more things we can do here – like configuring the simulated RAM to have a binary we compile ourselves.

But we are insane SW people here, playing with forces we don’t comprehend.

Let’s launch the thing in the real HW!

$ make ise
...
... laptop fans wake up - sounds like an airplane here ...
... 5 minutes pass ...
... there's no edit-compile-run cycle ... there's ...
... edit-compile-GoForAtripToTheAlpsAndStayForAweek-then-maybe-run cycle ...
...
FLEXnet Licensing error:-5,357
For further information, refer to the FLEXnet Licensing documentation,
available at "www.flexerasoftware.com".
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR:Map:258 - A problem was encountered attempting to get the license for this
   architecture.

Ah yes, I forgot!

$ # No wireless lan interface tolerated by Xilinx;
$ # temporarily remove the driver for wlan0 from the kernel
$ sudo rmmod wl

$ # Must also have 'eth0' - with the magic MAC address set,
$ # so process my /etc/systemd/network/25-dummy.netdev
$ sudo systemctl restart systemd-networkd
$ sudo ifconfig eth0 up

I refuse to memorize idiocy, so I just added these commands to the Makefile: the network is automatically set up the way Xilinx wants it every time the build takes place, and is then automatically set back to normal (modprobe wl; dhclient wlan0).

At least in the UNIX way of doing things, you can easily cope – and automatically handle – even insane requirements.

Come to think of it, perhaps I should investigate making synthesis happen inside a Docker container, and set up this insane network inside the container. Hmm.

Oh well, postponed for later investigation.

For now, take 2:

$ make ise
...
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
INFO:Security:56 - Part 'xc6slx100' is not a WebPack part.
WARNING:Security:42 - Your software subscription period has lapsed. Your current
version of Xilinx tools will continue to function, but you no longer qualify for
Xilinx software updates or new releases.
----------------------------------------------------------------------
...
(panic attack at first - but thankfully, synthesis continues fine regardless)

(shakes head)

You poor, poor HW people …

...
All constraints were met.
...
Generating Pad Report.

All signals are completely routed.

Design statistics:
   Minimum period: … (Maximum frequency: 83.… MHz)
...
Creating bit map ...
Saving bit stream in "TheBigLeonski.bit".
Creating bit mask ...
Saving mask bit stream in "TheBigLeonski.msk".
Bitstream generation is complete.

Woohoo! We’re good. Way beyond good, in fact – we can bump up our LEON’s clock way above our current setting of 25 MHz.

But before we do that, let’s bump the number of cores. In fact, this FPGA is such a monster that it can easily accommodate 2, even 4 LEONs. The utilization report, which shows the percentage of utilized resources, is far from maxed out in everything (except BlockRAMs [2]).

So we bump up the number of cores:

constant CFG_NCPU : integer := (2);

… and we bump up the clock, to a very safe 50 MHz:

constant CFG_CLKMUL : integer := (10);
constant CFG_CLKDIV : integer := (5);

This is the part that should make you sit up and take notice: here we are, casually specifying, in code, that we want 2 cores in our CPU. FPGAs are amazing.
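As a sanity check on the clock constants, the arithmetic can be sketched in a few lines of Python. This is only a sketch under assumptions: that the clock generator produces input × CFG_CLKMUL / CFG_CLKDIV, and that the input clock is 25 MHz (inferred from the simulation log, which reports 25000 KHz with a 5/5 setting). The helper names are hypothetical, and the 83 MHz ceiling is the integer part of the figure reported for the earlier single-core build, so it may differ for the dual-core one.

```python
from fractions import Fraction

# Assumption: 25 MHz input clock, inferred from the simulation log
# ("Frequency 25000 KHz, DCM divisor 5/5").
INPUT_CLOCK_MHZ = 25

# Integer part of the "Maximum frequency" in the timing summary of the
# earlier single-core build; rounding down keeps the check conservative.
REPORTED_FMAX_MHZ = 83


def cpu_clock_mhz(clkmul: int, clkdiv: int, input_mhz: int = INPUT_CLOCK_MHZ) -> Fraction:
    """Output frequency, assuming f_out = f_in * clkmul / clkdiv."""
    return Fraction(input_mhz * clkmul, clkdiv)


def meets_timing(clkmul: int, clkdiv: int, fmax_mhz: int = REPORTED_FMAX_MHZ) -> bool:
    """True if the requested clock does not exceed the reported maximum."""
    return cpu_clock_mhz(clkmul, clkdiv) <= fmax_mhz


for mul, div in [(5, 5), (10, 5), (20, 5)]:
    print(f"CLKMUL={mul} CLKDIV={div}: {cpu_clock_mhz(mul, div)} MHz, "
          f"meets timing: {meets_timing(mul, div)}")
```

With these assumptions, 10/5 gives 50 MHz, comfortably below the reported maximum, while 20/5 (100 MHz) would not pass.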

We run our synthesis again, and a few minutes later …

That’s it – GRMON sees both our cores, running at 50 MHz:

JTAG chain (1): xc6slx100
GRLIB build version: 4208
Detected frequency:  … MHz

Component                            Vendor
LEON3 SPARC V8 Processor             Cobham Gaisler
LEON3 SPARC V8 Processor             Cobham Gaisler
JTAG Debug Link                      Cobham Gaisler
AHB/APB Bridge                       Cobham Gaisler
LEON3 Debug Support Unit             Cobham Gaisler
Single-port AHB SRAM module          Cobham Gaisler
Generic UART                         Cobham Gaisler
Multi-processor Interrupt Ctrl.      Cobham Gaisler
Modular Timer Unit                   Cobham Gaisler

Time to compile some SW and run it inside this!

The cross-compiler

One can compile a cross-compiler for this target from source (and in fact I frequently do, as part of my duties in the Agency). But to avoid making this gigantic blog post even heavier, let’s just use the precompiled open-source toolchain of BCC2, from here. We un-tar it under /opt and build our hello world:

$ cat hello.c
#include <stdio.h>
int main() { puts("Hello, Big Leonski!"); }

$ /opt/bcc-2.0.8-gcc/bin/sparc-gaisler-elf-gcc -mcpu=leon3 \
      -o hello hello.c

$ /opt/grmon-eval-3.1.0/linux/bin64/grmon -u -xilusb
...
grmon3> load hello
  40000000 .text      25.2kB /  25.2kB   [===============>] 100%
  40006500 .rodata      …B               [===============>] 100%
  40006580 .data       1.2kB /   1.2kB   [===============>] 100%
  Total size: 26.…kB (1.… Mbit/s)
  Entry point 0x40000000
  Image /var/tmp/hello loaded

grmon3> run
Hello, Big Leonski!

  CPU 0:  Program exited normally.
  CPU 1:  Power down mode
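The load addresses GRMON prints also allow a quick back-of-the-envelope check of how much of the on-chip memory the image uses. The Python sketch below rests on two assumptions that are not confirmed anywhere in this post: that the 256 kbyte AHB SRAM reported in the simulation log is the RAM mapped at 0x40000000, and that the sections are laid out back to back, so the distance between consecutive start addresses is only an upper bound on each section's size (it includes any alignment padding).

```python
# Section start addresses, copied from the "load hello" output above.
SECTION_STARTS = [
    (".text", 0x40000000),
    (".rodata", 0x40006500),
    (".data", 0x40006580),
]

RAM_BASE = 0x40000000   # assumption: the on-chip SRAM starts at the load address
RAM_SIZE = 256 * 1024   # assumption: "AHB SRAM Module rev 1, 256 kbytes"


def section_spans(starts):
    """Distance from each section start to the next section start, in bytes."""
    return {
        name: next_addr - addr
        for (name, addr), (_, next_addr) in zip(starts, starts[1:])
    }


spans = section_spans(SECTION_STARTS)
data_offset = SECTION_STARTS[-1][1] - RAM_BASE

for name, size in spans.items():
    print(f"{name}: at most {size} bytes ({size / 1024:.2f} KiB)")
print(f".data starts {data_offset} bytes into RAM, "
      f"leaving {RAM_SIZE - data_offset} bytes from there on")
```

The span computed for .text is about 25.25 KiB, which is in line with the 25.2kB GRMON reports, so under these assumptions the whole hello world occupies roughly a tenth of the SRAM.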

And that’s it – we have ourselves a multi-core CPU, built from our own source code, running binaries built from our own source code, with a cross-compiler that can also be built from openly accessible source code.

What’s next?

Ideally, one would want to support the remaining pieces of this board: it has two USB slots, Ethernet, and most importantly 128 MB of DDR2 SDRAM. These last two pieces in particular would elevate it to something like the first “serious” machine I worked with, back when I was a student: a SPARCstation. I’d love that; and if the HW controllers involved are supported by Linux, bootstrapping the undisputed king of OSes inside this would be a breeze.

Alas, I am told by my friends that DDR controllers are no joke; they are not the playground of bored SW engineers.

Sigh :-)

Still, I hope you found this (very long) read an interesting one.

Discuss on Reddit

Discuss on Hacker News

Notes

  1. To HW designers reading this: please remember who the intended audience of this blog post is. Come to think of it, remember this is written by a SW developer; cue appropriate meme.

  2. Until an actual HW wizard makes the Pano Logic DDR2 SDRAM work, which will give us an insane 128 MB of space … At that point, I will boot Linux in this thing.


        
