Tag Archives: assembler

Digging deeper into the highRISC

After 7 years mainly doing research on Transputers and the i860, I had the feeling it’s time to do some more digging into the highRISC card.
If you have read my initial post about the miroHIGHRISC (and the Tiger) you remember the undocumented 20pin socket on the card (pictured in the upper right corner):

HighRiscLeft

Let’s have another look at the “UART port” again:

The pinout (the connector is rotated 90° counter clock-wise):

GND  oo  /WR0
D0   oo  INT2
D1   oo  /RD
D2   oo  /IOSEL
D3   oo  RESET (most likely)
D4   oo  A2
D5   oo  A3
D6   oo  A4
D7   oo  A5
VCC  oo  A23

Reading a bit of the BIOS’ disassembly, I stumbled across routines to talk to an UART.  A very common (D)UART of those days was the SCN2681. If you take a look at this chips specs, they perfectly fit to the signals provided at the HIGRISCs UART-port!

Here’s its pinout with the corresponding pins marked:

 

A2-A5 are used for A0-A3 on the 2681 and the only pin not directly represented is A23 which might be used to decode. Also, it nicely reveals that CPU INT2 is used for the UART.

The LR33000 datasheet tells me that there’s an 4MB IO-area starting at 0x1E000000 reaching up to 0x1EFFFFFF- most likely the 2681 will live there… and the corresponding signal called /IOSEL is available on the UART-port (and will perfectly help as chip-select decoder). Tadaa!

So after the UART we need to get the RX/TX signals to a higher level, i.e. the +/-15V of RS232 – this is the call for our old friend MAX232.

[current bread-board experiments sadly didn’t yield into ‘instant success(tm)’… I’m missing out something – need more time to investigate]

Bootcode / BIOS

The LR330xx CPU also has an /EPSEL EPROM select signal, indicating it’s accessing an EPROM expected to start at 0x1F000000 and ends at 0x1FFFFFFF (4MB right below the IO-area).
Using this knowledge and knowing that the MIPS standard boot-vector is at 0x1FC00000, it’s easy to feed the ROM-dump I did some years ago into the disassembler with the correct start address to do his job.

We need to get an understanding of this bootcode first, so that we can get an idea of “what is where” (e.g. ISA bus, UART etc) and later upload our own code and use those addresses.
Just to stick my head a bit into the clouds, the aim is to first port a then common MIPS monitor-program called ‘PMON’ and use that to run some sort of μLinux. But that’s probably another handful of years ahead…
PMON was a good source of information, because it’s originally written by LSI, supporting all the LSI eval-boards. Lo and behold, some of them had a 2681 UART, too… located at 0xBE000000, which is extensively used in my BIOS disassembly  😉
I have a certain feeling that miro borrowed some design ideas from the LSI Pocket Rocket evaluation board (don’t Google it, it’s a mythic being – if you have documentation, mail me!).

So this is the 2681 memory-map then:

#define BASE_2681 0xbe000000
#define SRA_2681 ((1*4)+BASE_2681) // 0xbe000004 status register 
#define THRA_2681 ((3*4)+BASE_2681) // 0xbe00000C Rx/Tx holding register 
#define ACR_2681 ((4*4)+BASE_2681) // 0xbe000010 Aux contrl. register 
#define ISR_2681 ((5*4)+BASE_2681) // 0xbe000014 interrupt state register 
#define CTU_2681 ((6*4)+BASE_2681) // 0xbe000018 Counter timer upper 
#define CTL_2681 ((7*4)+BASE_2681) // 0xbe00001C counter timer lower 
#define START_2681 ((14*4)+BASE_2681) // 0xbe000038 start timer 
#define STOP_2681 ((15*4)+BASE_2681) // 0xbe00003C stoptimer

Using those addresses we should easily identify the comms routines.

Something happens at 0xBE800000 which seems not UART related. So that’s probably the reason why A23 is available on the connector. That way we can ignore access to that address by OR’ing it with /IOSEL to create a /CS.

The DOS side of things

The tool to load a MIPS executable into the HIGHRISC is called DL.EXE. Loading the test-program prints this to the console:

miroHIGHRISC download program. V 1.00
(c) miro Computer Products AG , Germany

CONFIG: I/O-register-address: 0x368 
CONFIG: DRAM - base-address : 0xD000 
CONFIG: DRAM - size : 8 MB
CONFIG: TIGER - RAM - size : 8 MB

Resetcount = 87340

Loading test.zor
text : start=0x80030000 size=0x52c0
data : start=0x800352c0 size=0x520
bss : start=0x800357e0 size=0x150
entry : 0x800301a0
TIGER comm.address : 0x3ffd00
max_used_address : 0x35930 
real_DRAM : 0x800000 
Heapsize : 0x7CA6D0

test.zor sucessfully downloaded.

This gives us valuable information. The DOS-side uses the IO port 0x368 and has a memory window of 16K from  0xD000 to D3FF.
MIPS programs are loaded to 0x80030000 and the 16K seems to be mapped to 0x003FFD00, just 128K below the 4MB boundary of the LR33k address space.

As usual – this is heavily work-in-progress. So this post will be edited while making any new progress. TBC…

Putting the DSM to use

So after the lengthy description of the DSM cards – how can we make use of them? As said in the previous chapter, they were shipped with an assembler and even an early version of GCC (1.3) so development is pretty straightforward.

Activation

First, you have to understand how the cards integrate themselves into an ISA/EISA system. While the three versions (8, 16, 32bit) differ in some areas, the integration is more or less similar:

Each version offers a latch for controlling the card. This means to activate the card by writing bits to that latch to define a memory-window inside the hosts RAM to blend-in the cards dual-ported RAM  and/or resetting it etc.. The latch is accessible through an IO-port set by jumpers on the card (default 0x300).

So for the ISA cards you have to for example write a 0xC2 at that port-adress to reset & activate the card and use the mem-window of 0xDC000-DC7FF. In Turbo-Pascal this would be something like:

port[$300] := $c2;

This gives you a 2K mem-window to exchange data between the DSM and the host (just 1K for the DSM860-8).

The EISA cards obviously use other ports depending on the slot-number, so this would be an example to do the same for am DSM860-32, this time in Turbo-C:

outportb(slot_no * 0x1000 + 0x800, 0xc2); // For slot #2 this would be 0x2800

This would also open a mem-window at 0xDC000, this time up to 0xDCFFF, i.e. 4K long.

Memory

As mentioned above, the Host and the DSM-card are communicating through a memory-window of diffenrent sizes, depending on the DSM used. Due to their nature, the memory is looking different though. That said, at least they’re both litte-endian, so no byte-swapping needed.

The 80×86 side

For the hosting PC, memory looks pretty straightforward. 1KB-4KB of RAM somewhere in ‘lower-RAM’, that’s it.
While we don’t use it, it’s worth mentioning that there’s a 2nd memory window called “Common“. This is fixed at a specific address and is shared between all possible cards plugged into one host. I guess you already got it: This enables easy multi-processor communication… and gives a lot of possibilities for f**k-ups.

The i860 side

The memory-mapping on the i860-side is the same for the 16 and 32bit cards, the dual-ported RAM is located at 0xd0040000 (0xC0000000 for the DSM-8).
In any case the i860 memory is linear, 64bit wide and always on a 64-bit boundary. This means you have to read the DP-RAM area differently depending on which card you run your code. Here’s an example of how the DP-RAM looks like on the Host- and i860 side:

Host DP-RAM in DOS ‘debug’
-d dc00:0000
DC00:0000 11 22 33 44 55 66 77 88 ...

which would look like this on the i860 side:

DSM/8
C0000000 - 11 xx xx xx xx xx xx xx 22 xx xx xx xx xx xx xx
C0000010 - 33 xx xx xx xx xx xx xx 44 xx xx xx xx xx xx xx
C0000020 - 55 xx xx xx xx xx xx xx 66 xx xx xx xx xx xx xx
C0000030 - 77 xx xx xx xx xx xx xx 88 xx xx xx xx xx xx xx

DSM/16
D0040000 - 11 22 xx xx xx xx xx xx 33 44 xx xx xx xx xx xx
D0040010 - 55 66 xx xx xx xx xx xx 77 88 xx xx xx xx xx xx

DSM/32
D0040000 - 11 22 33 44 xx xx xx xx 55 66 77 88 xx xx xx xx

So reading and writing from/to the DP-RAM involves some thinking to be done by the programmer. Here are two code-snippets showing the difference between reading the DP-RAM on a DSM860-8 and an DSM860-16. First the ‘8 bit version’:

mov 4*8,r4
readloop:
ld.b 0(r15),r14  // Load BYTE from DP-RAM
st.b r14,0(r29)  // store it destination
addu 8,r15,r15   // add 8 to read-mem-pointer
addu 1,r29,r29   // add 1 to desitination-mem-pointer
addu -1,r4,r4    // loop-counter
xor r0,r4,r0     // Test Zero
bnc readloop

And the same for the DSM860-16:

mov 2*8,r4
readloop:
ld.s 0(r15),r14  // Load SHORT (2 Bytes) from DP-RAM
st.s r14,0(r29)  // store it destination
addu 8,r15,r15   // add 8 to read-mem-pointer
addu 2,r29,r29   // add 2(!) to desitination-mem-pointer  (short <> byte)
addu -1,r4,r4    // loop-counter
xor r0,r4,r0     // Test Zero
bnc readloop

Because of reading SHORTs (ld.s) the DSM860-16 version has to loop just 16 times while the 8-bit version has to do that 32 times.
Same applies to writing. You will find an example in the Mandelbrot program (Commented source file).

[This is work-in-progress and will expanded over time]

Action!

So here we go, finally some program running showing all the power behind the i860. I took the Mandelbrot example from R.D.Klein and modified it a bit, well quite a bit as it was written for the DSM860-8 and provided CGA output (yuck!).

Like most “external accelerator” programs, there’s one part running on the accelerator (the i860 in this case) and one part running on the host doing useful things with the provided data. In this case we have an i860 assembler code doing the number-crunching on the Mandelbrot algorithm using the i860’s ability of ‘dual instruction-mode‘ and some code done in Trubo-Pascal handling the display and zooming.
The latter was extended to use SVGA (640x480x256) output and providing an interrupt driven timer. [sourcecode package cleanup is still work in progress]

Here are the two running full steam ahead:

Some things worth to mention:

  • The host being used here is a P1 133MHz, a bit unfair comparing that to a 40MHz i860 – OTOH they seem quite comparable when it comes down to Mandelbrot crunching speed.
  • To calculate the Mandelbrot the same speed as it took the Pentium (~15s) I needed five T800-20 according to my benchmarks.
  • To even achieve the 8.2s of the i860 I had to run 9(!) T800-20 in parallel.
  • A i486DX/33 took 66 sec to do the same (8.25 times slower!), while it still took 34s for a i486DX2/66!

So while all that moaning about the bad ‘programmability’ and slow context-changes of the i860 are completely correct, in certain tasks that CPU was indeed a real screamer!

Run Forest, run!

After having figured out the basics, it was time to prove the concept. First, I needed a (simple) program to run on the NumberSmasher. Here it is, reeeeally simple. It’s just an excerpt of the original 77 assembly lines. If you’re interested in the whole thing, it’s here.

The main-loop is just this. Read a byte from the C012 link, add 1 to it and write it back to the C012:

mov    iobase, %r4

loop:
  call    getlink        # (watch following delay slot!)
   shl    %r0, %r4, %r16    # mov r4, r16 - save base address

  addu    1, %r16, %r17    # add 1 and move into r17

  call    putlink        # (watch following delay slot!)
   shl    %r0, %r4, %r16    # mov r4, r16 - save base address

  br        loop
nop

What a task for a “Cray on a chip”! 😉 Ok, putting this into the assembler (currently gnu-as on Linux), linking and finally making it a pure binary with ‘objcopy -O binary hb_test.cof hb_test.bin‘ I got this binary.

How do I get it into the NumberSmasher? Again, that required a bit of coding… say Hi to nc_load.exe.
This little tool loads a pure i860-binary, pokes it into the NumberSmashers RAM and optionally starts it afterwards, ie. ‘nc_load hb_test.bin 20 start‘ loads my test program to address 0x14 and starts it from there.
[Having read the previous post, you should know that you could omit the ‘start’ parameter and just poke 20 to address 0 to start the code]

Test, test, one, two, three…

And now the exciting part: Does it work? The easiest way to test this is good ol’ debug again:

C:\> debug
-o 151 41
-i 150
42

Yay! If this isn’t proving the sense of life, what else!?!? 😉 Ok, what happened here is simple. I wrote 41 to port 151 on which the C012 is listening for input, then I read from port 150 which is the result of adding 1 to the input. Quod erat demonstandum. Program is running successfully!

Be aware that after starting a programm, the NumberSmasher is continously running that code, i.e. peek & poke do not work anymore because the Boot-ROM is out of the game.
You have to reset the NC which is done in classic INMOS-style, i.e. sending a zero to port 0x160. Thats ‘-o 160 0’ in debug. NB: Resetting the NC does not clear its RAM. The previously uploaded program is still available and can be re-started by pokeing the start-address to 0.