Tag Archives: assembler

Carrera in an SE/30 – the code part 3

Ahh, back in cosy main: – looks much easier now after that crazy MMU stuff in the previous part, right?

The next subroutine called is proc32. In the complete source code (reminder: Available at GitHub) I commented that with “works (get some RSC strings)“… and well, that sums it up pretty good.
proc32 loads (i.e. creates handles) from the resource-fork, e.g. the icons used in the menu-bar and several error-messages like “This application must run on the 68030 processor, please quit all other 68040 applications and re-run this application.“. That’s it. Boring…

That boredom instantly changes when we get to the next subroutine proc43located at 0x29DA…

I did it my way…

One fascinating thing about classic Mac OS is how easy it is to patch system calls, aka Toolbox traps. For example in the previous post we came about _BlockMove, which is a Toolbox call to copy an amount of RAM from A to B.
For example you have just read this article about a faster BlockMove method, you’re totally free to patch (read: replace) _BlockMove with your speedier version and automatically use this throughout your application – or even system-wide, if you’ve created an INIT…  [If you want to know all about it… here’s a book for you]

And that’s what proc43 heavily does. Because it’s a long subroutine (230 lines) so I will give you just one example – the inline comments should do…

2BE2:        MOVE    #$A02E,D0     ; BlockMove
2BE6:        _GetTrapAddress newOS ; (D0/trapNum:Word):A0\ProcPtr 
2BE8:        MOVE.L  A0,$270(A5)   ; oldBlockmove
2BEC:        LEA     data42,A0     ; myBlockMove
2BF0:        TST.B   MMU32bit      ; loMem global "current address mode"
2BF4:        BNE.S   lae_70        ; skip if 32bit clean machine else
2BF6:        LEA     data43,A0     ; use a different entry for dirty machines
2BFA: lae_70 MOVE.L  A0,$274(A5)   ; save routine pointer to $274(A5)	
2BFE:        LEA     data41,A0     ; DC.L 0000 0000
2C02:        MOVE.L  $270(A5),(A0) ; save oldBlockmove vector into there
2C06:        MOVE.L  #$A02E,D0     ; BlockMove
2C0C:        LEA     data40,A0     ; aaaand replace it by myBlockmove
2C10:        _SetTrapAddress newOS; (A0/trapAddr:ProcPtr; D0/trapNum:Word)

This is the sum up what else being done:

  • Save all debugger vectors into A5-world locations (suspicious. I sense Macsbug killing…)
  • Load the PACK4 resource, that’s the Floating Point emulation package (aka SANE) if no FPU found
  • Check & read several system Gestalt codes into A5-world (0x2AAC-0x2B44)
  • Patch several Toolbox traps
    • SwapMMUMode replaced by data19
    • VM_Displatch by data22
    • Pack4 by data10
    • Pack5 by data11
    • BlockMove by data40
    • jClearCache by myClearCache
    • GetNextEvent by myGetNextEvent
    • GetResource by myGetResource
    • SCSIdispatch by mySCSIdispatch
    • DrawMenuBar by myDrawMB
    • LoadSeg by data31
    • UnLoadSeg by data32
    • HWPriv by data33
    • vStdExit by data34

So far, so many.  Then there’s some RAM copying going on, of which I’m currently not quite sure what it is good for (0x2CAC-0x2CD8) 💡 .

Finally, the myShutdown routine is installed into the Shutdown Manager, i.e. it will executed before the Mac is powered down/restarting (it simply switches the host back to its own 68030). After that, RTS into main…

“There and back again…”

Barely back in main, a JSR 12(A6) warps us into MacII_4th, the last of the four handlers every supported system has.

This loads specific data from the FPSP into RAM (namely IDs 0x12C and 0x12D).
Finally a special floppy driver is installed (myFloppyDrvr @ 0x954) which IMHO just differs from the original in handling the ‘040 caches correctly. That was that and back to main…

The next sub-routine in line is chkATalkVer. I can rightfully name that routine because it’s short and crystal clear: Figure out if AppleTalk is installed, and if true, return its version in D0 (and also write it into A5-world). C’est ca…

This is the end…

It’s getting ugly (for now)… proc42 will be called – the last subroutine in main before my SE/30 crashes and burns  😥

The first few lines (0x28F4-0x293C) are comparably harmless. They are working around a bug in System 7.1 which was corrected in 2/17/92 according to some dark sources (“Corrected value of timeSCSIDB from 0DA6 to 0B24”).
After that, proc38 (0x293C) is called which again calls proc39 and something’s done with the TimeManager, not really sure what’s exactly going on, but it feels like a timing-benchmark heavily using InsTime, PrimeTime and RmvTime Toolbox calls.

[hold yer breath] Then we’re getting closer to the flat line… The stack is filled with these parameters:

2940:   CLR.L   -(A7)       ;PUSH.L 00000000 
2942:   CLR.L   -(A7)       ;PUSH.L 00000000 
2944:   CLR.L   -(A7)       ;PUSH.L 00000000 
2946:   PUSH.L  #$80008000  ;       80008000
294C:   CLR.L   -(A7)       ;PUSH.L 00000000 
294E:   PUSH.L  #64         ;       00000040
2954:   PUSH.L  #1          ;       00000001

and SpeedProc is called…

…To be continued 😉

P.S: I changed course (again) and started to investigate more into the C040’s hardware. The more I understand of the INIT/CP workings the more I can’t fight the idea that it really might be a hardware timing issue.

Carrera in an SE/30 – the code part 2

3rd handler

Next up is the 3rd handler, MacII_3rd: (0x3F94) in our case. Actually it’s called with JSR 8(A6), but that’s an 8 byte offset to the ‘base-address’ of any handler. Clever stuff, huh (Google for ‘pointer-table’)?

This subroutine contains serious magic and was a real hard nut to crack. Especially because it tricked me into believing that I’ve found the ‘crashsite’… which, to spoil the tension, isn’t.
It just kept on killing  Macsbug, because it’s so low-level.

What this routine does is replacing the Vector Base Register (VBR) which ‘lives’ at address 0x00000000. Evil stuff.

  • After disabling interrupts and switching to 32bit-mode a field with 6 long-words (data107) will be populated with data generated in other routines.
    For now I can only guess what these entries are (Values from my SE/30 given in brackets). We’ll discuss all that further down.
  • 0x3FC6 to 0x3FD8 calculates the size of the chunk of code starting at data106 (0x4008) to the beginning of MacII_4th (i.e. the end of Mac_3rd), which is 180 bytes.
  • Using this length, the routine first saves the current VBR onto the stack using the system call _BlockMove.
    Then the original VBR (+some more) will be replaced by the new version beginning at data106. (Killing Macsbug – more on that later)
  • BSR 53_cmd_1x is been called. This brings the Carrera040 into life most likely using the just copied VBR (This is discussed in much detail further down).
  • Now the contents of the stack (= copy of the original VBR) will be copied back into its place, this time using a classic DBRA loop (0x3FF4). My guess, no Toolbox call possible at the moment.
  • Adjust the stack, back to 16bit mode, restore Registers and return-from-subroutiene. Done.

Here’s the code doing all this:

3F94:MacII_3rd: MOVE    SR,-(A7)     ; 3rd call from MacII handler
3F96:           ORI     #$700,SR       ; Set bit 9-11 of SR (disable Interrupts)
3F9A:           MOVEM.L D0-D2/A0-A2,-(A7)
3F9E:           MOVEQ   #1,D0
3FA0:           _SwapMMUMode  
3FA2:           PUSH.B  D0
3FA4:           SUBA.L  A2,A2         ; faster movea.l #0,a2
3FA6:           LEA     data107,A0    ; Filling the data into the 6x32 field
3FAA:           MOVE.L  96(A5),D0
3FAE:           MOVE.L  D0,(A0)+      ; SE30: 9FE00
3FB0:           LEA     data69,A1
3FB4:           MOVE.L  A1,(A0)+      ; SE30: 9D6E2 (User/Supervisor Rootpointer?)
3FB6:           MOVE.L  $64(A5),(A0)+ ; 807FC040
3FBA:           MOVE.L  $6C(A5),(A0)+ ; 807FC040
3FBE:           MOVE.L  $68(A5),(A0)+ ; 00000000
3FC2:           MOVE.L  $70(A5),(A0)+ ; 00000000
3FC6:           LEA     MacII_4th,A0
3FCA:           MOVE.L  A0,D2
3FCC:           LEA     data106,A0
3FD0:           SUB.L   A0,D2         ; 'distance' from data106 to MacII_4th
3FD2:           SUBA.L  D2,A7
3FD4:           MOVEA.L A2,A0
3FD6:           MOVEA.L A7,A1
	  	; save the current VBR to the stack
3FD8:           MOVE.L  D2,D0
	  	; A0 = SE30: 00000000 (src)  - IIci: $FBB08000
	  	; A1 = SE30: 027ff34c (dest) - IIci: $3BF9FC6
	  	; D0 = B4    (count)   - SAME on the IIci!    
3FDA:           _BlockMove ; (A0/srcPtr,A1/destPtr:Ptr; D0/byteCount:Size) 
	  	; write my own VBR...
                ; This copies 180 bytes into 0x000000000 replacing the original VBR. 
                ; ... and kills Macsbug if not circumvented properly.
3FDC:           LEA     data106,A0
3FE0:           MOVEA.L A2,A1
3FE2:           MOVE.L  D2,D0
	  	; A0 = 9F900 (src)   - IIci 10C4EA (data88)    
	  	; A1 = 00000 (dest)  - IIci FBB08000
	  	; D0 = B4    (count) - IIci same  
3FE4:           _BlockMove ; (A0/srcPtr,A1/destPtr:Ptr; D0/byteCount:Size) 
3FE6:           BSR     53_cmd_1x  ; Bring the C040 to life
3FEA:           MOVEA.L A7,A0  ; SP to A0
3FEC:           MOVEA.L A2,A1  ; SE30: 00000000
3FEE:           MOVE.L  D2,D0  ; the code length (B4 again)
3FF0:           BRA.S   lae_163
3FF2:   lae_162 MOVE.B  (A0)+,(A1)+ ; Write the VBR back from the stack
3FF4:   lae_163 DBRA    D0,lae_162
3FF8:           ADDA.L  D2,A7 ; adjust the stack
3FFA:           POP.B   D0
3FFC:           _SwapMMUMode  
3FFE:           MOVEM.L (A7)+,D0-D2/A0-A2
4002:           MOVE    (A7)+,SR
4004:           MOVEQ   #0,D0
4006:           RTS     
 
; Start of VBR replacement- and 040-Code being copied to 0x0 the by line 0x3FE4 
; /if/ theses are the Vectors 0-17, then their meaning would be:
 
4008: data106:  DC.L    #$00001000 ; Reset initial Stack Pointer
400C:           DC.L    #$00000050 ; Reset initial Program Counter
; - ALL of these Vectors point to addr 4050 (offset 0x48) -
4010:           DC.L    #$00000048 ; Buserror  
4014:           DC.L    #$00000048 ; Adress Error
4018:           DC.L    #$00000048 ; Illegal Instruction
401C:           DC.L    #$00000048 ; Zero Divide
4020:           DC.L    #$00000048 ; CHK, CHK2 instruction
4024:           DC.L    #$00000048 ; cpTRAPcc, TRAPcc, TRAPV instruction
4028:           DC.L    #$00000048 ; Privilige Violation
402C:           DC.L    #$00000048 ; Trace
4030:           DC.L    #$00000048 ; LINE 1010 Emulation
4034:           DC.L    #$00000048 ; LINE 1111 Emulation
 
; THESE are definitely no vectors, they are dynamically written by the code above
; and to be used to setup the 040 MMU registers.
4038: data107:  DC.L    #$0009FE00 ;  
403C:           DC.L    #$0009D6E2; 
4040:           DC.L    #$807FC040 ; 
4044:           DC.L    #$807FC040 ; SE30: 
4048:           DC.L    #$00000000 ; SE30: 00000000 
404C:           DC.L    #$00000000 ; SE30: 00000000 
 
4050:           CLR.L   $53000000  ; Poke 0 to $53000000
4056:           BRA     lae_164    ; This points to itself... I'm lost at the moment.
4058:           LEA     data107,A0 ; SE30: 9F900
405C:           MOVE.L  (A0)+,D1   ; SE30: 0009FE00 (User/Supervisor Rootpointer)
405E:           MOVEA.L (A0)+,A1   ; 0009D6E2
4060:           MOVE.L  (A0)+,D4   ; 807FC040
4062:           MOVE.L  (A0)+,D5   ; 807FC040
4064:           MOVE.L  (A0)+,D6   ; 00000000
4066:           MOVE.L  (A0)+,D7   ; 00000000
4068:           MOVE.L  #$C000,D0
406E$           MOVEC   D0,ITT0   ; Set Instruction Transparent Translation
4072$           MOVEC   D0,DTT0   ; Set Data Transparent Translation
4076$           MOVEC   D1,SRP    ; Set Supervisor Rootpointer
407A$           MOVEC   D1,URP    ; Set User Rootpointer
407E:           MOVE.L  #$C000,D0
4084$           PFLUSHA           ; Invalidates all entries in the address translation cache
4086$           MOVEC   D0,TC 
408A:           LEA     data108,A0
408E:           ADDA.L  #$53002000,A0 ; (=0x530A1900)
4094:           JMP     (A0)      ; JuMP to data108 code (below) in C040 RAM range?     
 
4096:  data108: MOVEQ   #0,D0
4098$           MOVEC   D0,ITT0 
409C$           MOVEC   D0,DTT0 
40A0$           MOVEC   D4,ITT0 
40A4$           MOVEC   D5,DTT0 
40A8$           MOVEC   D6,ITT1 
40AC$           MOVEC   D7,DTT1 
40B0$           CINVA   BC
40B2:           NOP     
40B4:           MOVEQ   #0,D0
40B6$           MOVEC   D0,CACR
40BA:           JMP     (A1)    ; 0009D6E2
 
; END 040 Code being copied to somewhere by line 3FE4 
40BC: MacII_4th: MOVEM.L D1-D7/A0-A4,-(A7)  ; 4th subroutine called my MacII_handler
[...]

The Vector Base Register

I wasn’t precise when I initially said “replacing the VBR”. What actually happens is that this routine uses what I’d call an interim-VBR for the moment it initializes the 68040 on the C040. You’ve probably saw the link referring to what the VBR is in the 1st post of this series, but let me go a bit more into detail.

The VBR is a list of addresses (aka vectors) the CPU refers to in case of an exception – and this is true for every 68k system out there, e.g. Mac, SUN, NeXT, Amiga or Atari. Some of them might do some relocation using their MMU, but even the virtual address will be 0x00000000 and the order is the same.  There are 16 basic vectors as listed here:

If for example a divide-by-zero happens, the CPU would call a handler which address is stored in 0x14.  Pretty simple.
So let’s have a look what MacII_3rd left in the VBR (and below that) when the ‘interim VBR’ is in place:

0000: data106:  DC.L    #$00001000 ; Reset initial Stack Pointer
0004:           DC.L    #$00000050 ; Reset initial Program Counter
     ; - ALL of these Vectors point to addr 0x48 -
0008:           DC.L    #$00000048 ; Buserror  
000C:           DC.L    #$00000048 ; Adress Error
0010:           DC.L    #$00000048 ; Illegal Instruction
0014:           DC.L    #$00000048 ; Zero Divide
0018:           DC.L    #$00000048 ; CHK, CHK2 instruction
001C:           DC.L    #$00000048 ; cpTRAPcc, TRAPcc, TRAPV instruction
0020:           DC.L    #$00000048 ; Privilige Violation
0024:           DC.L    #$00000048 ; Trace
0028:           DC.L    #$00000048 ; LINE 1010 Emulation
002C:           DC.L    #$00000048 ; LINE 1111 Emulation
     ; - THESE are definitely no vectors, they are dynamically written by the 
     ;   code above and to be used to setup the 040 MMU registers.
0030:  data107: DC.L    #$00000000 ; SE30: 0009FE00  (12)
0034:           DC.L    #$00000000 ; SE30: 0009D6E2  (13)
0038:           DC.L    #$00000000 ; SE30: 807FC040  (14) 
003C:           DC.L    #$00000000 ; SE30: 807FC040  (15)
0040:           DC.L    #$00000000 ; SE30: 00000000 
0044:           DC.L    #$00000000 ; SE30: 00000000 
 
0048:           CLR.L   $53000000  ; Poke 0 to $53000000 ; C040 off
004C:  blocker3 BRA     blocker3    ; Points to itself... probably a "blocker"
0050:           LEA     data107,A0 ; initial Program Counter (SE30: 9F900)
0054:           MOVE.L  (A0)+,D1   ; SE30: 0009FE00 (User/Supervisor Rootpointer)
0058:           MOVEA.L (A0)+,A1   ; 0009D6E2
005C:           MOVE.L  (A0)+,D4   ; 807FC040
0060:           MOVE.L  (A0)+,D5   ; 807FC040
0064:           MOVE.L  (A0)+,D6   ; 00000000
0068:           MOVE.L  (A0)+,D7   ; 00000000
006C:           MOVE.L  #$C000,D0
0070:           MOVEC   D0,ITT0    ; Set Instruction Transparent Translation
0074:           MOVEC   D0,DTT0    ; Set Data Transparent Translation
0078:           MOVEC   D1,SRP     ; Set Supervisor Rootpointer
007C:           MOVEC   D1,URP     ; Set User Rootpointer
0080:           MOVE.L  #$C000,D0
0084:           PFLUSHA            ; Invalidates all entries in the address translation cache
0088:           MOVEC   D0,TC 
008C:           LEA     data108,A0
0090:           ADDA.L  #$53002000,A0 ; (=0x530A1900)
0094:           JMP     (A0)       ; JuMP to data108 code (below) in C040 RAM range? 
 
009C:  data108: MOVEQ   #0,D0
00A0:           MOVEC   D0,ITT0 ; 0
00A4:           MOVEC   D0,DTT0 ; 0
00A8:           MOVEC   D4,ITT0 ; 807FC040
00AC:           MOVEC   D5,DTT0 ; 807FC040
00B0:           MOVEC   D6,ITT1 ; 00000000
00B4:           MOVEC   D7,DTT1 ; 00000000
00B8:           CINVA   BC
00BC:           NOP     
00C0:           MOVEQ   #0,D0
00C4:           MOVEC   D0,CACR
00C8:           JMP     (A1)    ; 0009D6E2
Farewell, old friend

At this point, my SE/30 always froze and I thought this must be the point where to find incompatibilities between the IIci and SE/30.
But after understanding, what’s really going on, it was clear that overwriting the TRAP exception (Nr.7), Macsbug was simply kicked out of the game as this exception is triggered after every step/trace you do in a debugger…
So to get beyond this point, I had to modify the program counter to skip the point where TRAP is copied-over… which is done inside the Toolbox’ _BlockMove call. So I had to single-step into that and find the right call/time to do a ‘pc=pc+2’ 😉 (Good thing you can define a macro for that).

Okayyyyy. After that’s been written, 53_cmd_1x is called, presumably telling the C040 to come to life.
And keen as it is, it’ll look up the “Reset initial Program Counter” (VBR: 0xC) and starts executing code from 0x50. Any other occurring exception will call the ‘handler’ at 0x48, simply switching the C004 off and sit in an endless loop (0x4C) – probably making the 68030 to take over again.

EmEmYou!

Given everything’s fine, the code at 0x50 will start reading the previously populated data from data107 into several registers.
Then some serious 68040 MMU table setup happens – so this is some kind of ‘040 initialization routine… and the ‘040 is actually running. Woohoo!

Time for some special register explanation:
As we all know, the 68040 has two in-build 4k caches and an MMU. The latter can be programmed how and what to cache. This is defined in 4 registers of which only 2 are of interest here: ITT0 and DTT0, the Instruction and Data Transparent Translation registers, both sharing the same bit-fields following this pattern:

BBBBBBBBMMMMMMMMESS000UU0CC00W00

  • BLogical Address Base – compared with address bits A31-A24. Addresses that match in this comparison are transparently translated
  • MLogical Address Mask – setting a bit in this field causes corresponding bit in Base field to be ignored
  • E – Enable Bit – 1 – translation enabled; 0 – disabled
  • SSupervisor Mode – 00 – match only in user mode 01 – match only in supervisor mode 1x – ignore mode when matching
  • U – User Page Attributes – ignored by 040
  • CCache mode – 00 – Cacheable, Write-through 01 – Cacheable, Copyback 10 – Noncacheable, Serialized 11 – Noncacheable
  • WWrite protect – 0 – write permitted; 1 – write disabled

Here’s an example:

807FC040 = 10000000011111111100000001000000 
           BBBBBBBBMMMMMMMMESS000UU0CC00W00

which means:

  • a bit more than 2GB transparently translated (2063MB)
  • translation enabled
  • Supervisor Mode: ignore mode when matching
  • Cache mode: Noncacheable, Serialized
  • Write permitted

So let’s have a look at the code again:

At 0x70/0x74 the MMU is set to 0xC000, i.e. Enable translation, apply for user & supervisor mode, write-though cache, for logical address space 0x00000000-0x00ffffff (16MB).
Then Supervisor & User Rootpointer are set to 0x9FE00, then the address translation cache is flushed to finally set the Translation Control register to Enable & 8K page size (0x88)… up to here this was pretty much ‘by the book’ of how to set-up MMU tables.

Having its MMU all set, the 68040 now gets something to chew on:
The address of data108 is added to 0x53002000 and jumped to!
💡  Does 0x53002000 equal 0x00000000 for the C040?

Let’s assume the C040 executes the code at data108 for now. That is:

  • Clear the ITT/DTT registers
  • Set the MMU to 0x807FC040 (see decoding example above)
  • invalidate caches and wait’a’NOP to have that happened
  • then disable all caches
  • and jump to where A1 points to. In my SE/30 that’s 0x9D6E2, previously loaded from data107 in 0x58

Writing all this from the top of my head, I’m not 100% sure where this address is pointing to. I must be somewhat back into MacII_3rd (0x3FEA), because this is where the program execution resumes (Need to check this with Macsbug and will update).

For now, I’m tempted to call MacII_3rd something like ‘C040_MMU_setup‘… but I’d love to have this confirmed  💡 by somebody who knows more than me 😉

Next up will be continuing working further through the main: procedure again… so move over here.

Carrera in an SE/30 – the code part 1

The disassembled code of the Micromac Carrera 040 control panel is quite big: 6000+ lines of 68030/40 assembly…

While these posts might be entertaining and giving you an insight into classic MacOS driver code, they are also meant as a notebook to myself to get into the source quickly – especially after some weeks or months of distraction 😉

That said, I will not discuss each and every line of code. There are many parts which aren’t important (for now) or just not reached yet.
Still, it will take several parts/chapters to cover everything I worked on.

The complete code is available over here on GitHub and will updated every time I’m working on it.
Whenever I’m mentioning addresses I’m referring to this code on GitHub. NB: I will never use line-numbers as these might change during editing the source.
Also, when you’ll see a light-bulb  💡  somewhere, this is where I’m not sure and happy about enlightenment or comments from you 😉

This article is totally work-in-progress. E.g. every now and then my theories about what a certain code does changes, I learn new things and all the sudden whole blocks of code make sense… so this post will change/grow, too.

Approaching… difficulties.

What’s the main job of this code? From a 30000ft perspective the simple answer is “switching the Carrera040 on and off”, i.e. toggling between the hosts slower on-board 68030 and the insanely fast 68040 on the C040. At boot-time… as well as during the system is running (by user interaction).

Sounds pretty simple, huh? Lowering our flight altitude to 3000ft more things come into play:
Identify the hosting Macintosh. As mentioned in the previous chapter, the C040 was able to run in a Mac II, IIx, IIcx, IIvx, IIvi, IIvm, IIsi, IIci, LC and LCII… all of them different in many places. These differences have to be handled…
Down at 30ft we have to admit that there are differences between a 68030 and his younger brother 68040, mainly concerning caches, FPU and the MMU.
Finally hitting the ground, it’s becoming clear that it is everything but trivial to halt a running processor, save its complete context and start another (slightly different) processor with that. And back again…

Some given things before we start:

  • We will concentrate on the IIx “branch” as this machine is closest to the SE/30 like not-32bit-clean, memory-map, the GLUE chip, two real VIAs with the same register layout etc.
  • I learned from the code that the C040 is memory-mapped at 0x53000000 in some of the supported models, especially the IIx and IIci. This means 32bit addressing is a must (-> need “mode32” INIT or clean ROM)
  • I tried to comment as much as possible/understood inline (i.e in the code) – a good bit of 68k machine language knowledge is still required 😉
  • If something needs more explanation, I’ll try to provide this before the code quote or afterwards.

So this is the main routine (at 0x21FC):

main     MOVEM.L A4-A6,-(A7)
         MOVE.L  D0,D7
         MOVE.L  #$31E,D0    ; need 798 bytes
         _NewPtr ,CL_SY      ; allocate requested amount of memory (D0) in system
                             ; heap (returned in A0) and initialize to zeroes
         TST     D0          ; success?
         BNE.S   lae_6       ; nope, exit.
         LEA     data2,A1    ; else
         MOVE.L  A0,(A1) ; Init A5 world and save into data2
         MOVEA.L A0,A5
         MOVE    #$A89F,D0   ; UnimplTrap
         _GetTrapAddress     ; (D0/trapNum:Word):A0\ProcPtr 
         MOVE.L  A0,$29C(A5) ; save the trap addr into 2 places 
         MOVE.L  A0,$2A0(A5) ; in the A5 world
         BSR     sysDetect   ; Jump to Machine detection routine 
         BNE.S   lae_6       ; success?
         MOVE.L  D7,D0
         JSR     4(A6)       ; We jump to the subroutine set in the detection routine
                             ; for the second time, this time offset 4...
			     ; i.e. we skip the 1st 'BRA' there
         BNE.S   lae_6       ; success?
         BSR     instFPSP    ; Install Motos FPSP
         BNE.S   lae_6       ; success?
         JSR     8(A6)       ; That's the 3rd call in the handler call cascade (needs hack for MacsBug!)
         BNE.S   lae_6       ; success?
         BSR     proc32      ; works (get some RSC strings)
         BSR     proc43      ; install traps
         BNE.S   lae_6       ; success?
         JSR     12(A6)      ; That's the 4th call in the handler call cascade
         BNE.S   lae_6       ; success?
         BSR     proc41      ; works (atalk?)
         BSR     proc42      ; VIA stuff and such - BOOM
         BSR     proc29
         MOVEM.L (A7)+,A4-A6
         MOVEQ   #0,D0

As you can see, there are 10 calls to subroutines- currently it crashes inside the 8th subroutine, currently called proc42… But let’s check these subroutines one by one.

sysDetect

This is the subroutine I had to “patch” to initially make the driver work with an SE/30. It starts at 0x2022 and does these things:

  • Check if the ‘Gestalt‘ trap is available at all (very good style!) else throw an error
  • If it is, read the machines Gestalt code into D0, throw an error if zero
  • Decide which ‘handler’ to choose given the Gestalt code.

Based on their Gestalt codes there are four groups of Macs defined in the following lines (0x204C – 0x20AC):

  • Mac II/IIx/IIcx — “dirty Macs”, not 32bit clean, no PDS
    • “Expansion I/O Space” from 0x51000000 to 0x5FFFFFFF
    •  the C040 installs with an adapter right into the CPU socket in the II/IIx/IIcx
    • SE/30 is also “dirty”, need mode32 or IIsi ROM in slot
    • these machines also use the GLUE chip to emulate the VIA2 like the SE/30
  • Mac IIvx, IIvi, IIvm — special kind of PDS slot
    • there’s no mentioning of support on the MicroMac page
  • Mac IIsi, IIci
    • Kind of interesting because the si has the same PDS slot like the SE/30
    • Uses the RBV (Ram Based Video) controller which emulates the VIA2
    • Therefore totally different memory layout (VRAM at 0x00000000 mapped by the MMU etc.)
  • Mac LC, LCII, Color Classic
    • These share the same LC-PDS slot

If your Mac is one of those (or patched at 0x2058), you’ll branch into sys_check: (0x20BA) which will make sure you run at least System 6.0.5, have virtual memory switched off and jumps into the selected handler code (address saved in A6) at 0x20EA  for the first time.

Here’s the code of what’s discussed above:

2022: sysDetect: MOVE.L  #$A0AD,D0  ; Gestalt
2028:         _GetTrapAddress newOS ; (D0/trapNum:Word):A0\ProcPtr 
202A:         MOVE.L  A0,D2
202C:         MOVE.L  #$A89F,D0     ; UnimplTrap
2032:         _GetTrapAddress newTool; (D0/trapNum:Word):A0\ProcPtr 
2034:         CMP.L   A0,D2
2036:         BEQ     OS_bad
203A:         MOVE.L  #'mach',D0
2040:         _Gestalt ; (A0/selector:OSType):D0\OSErr 
2042:         BNE     bad_conf      ; If we can't read it, fire general Error Msg
2046:         MOVE.L  A0,D0
2048:         MOVE.L  D0,2(A5)
 
; Check for several Mac models which are grouped into 3, each having its own handler routine. 
; 1) Mac II/IIx/IIcx 
; 2) IIvx, IIvi, IIvm 
; 3) IIsi, IIci     
; 4) LC, LCII, Color Classic
 
204C:         LEA     MacII_handler,A6   ; -- The dirty gang
2050:         CMPI.L  #6,D0        ; MacII 
2056:         BEQ.S   sys_check
2058:         CMPI.L  #7,D0        ; MacIIx - we replace this by the SE/30 #9
205E:         BEQ.S   sys_check
2060:         CMPI.L  #8,D0        ; IIcx
2066:         BEQ.S   sys_check
 
2068:         LEA     V_handler,A6   ; -- The "V" Macs.
206C:         CMPI.L  #48,D0       ; IIvx
2072:         BEQ.S   sys_check
2074:         CMPI.L  #44,D0       ; IIvi
207A:         BEQ.S   sys_check
207C:         CMPI.L  #45,D0       ; IIvm
2082:         BEQ.S   sys_check
 
2084:         LEA     IIci_handler,A6   ; -- IIci and IIsi
                                        ; BOTH share the same "Expansion I/O Space" (0x5300 0000)
2088:         CMPI.L  #11,D0       ; IIci
208E:         BEQ.S   sys_check
2090:         CMPI.L  #18,D0       ; IIsi 
2096:         BEQ.S   sys_check
 
2098:         LEA     LC_handler,A6    ; -- The LC-PDS family
209C:         CMPI.L  #19,D0       ; LC
20A2:         BEQ.S   sys_check
20A4:         CMPI.L  #37,D0       ; LCII
20AA:         BEQ.S   sys_check
20AC:         CMPI.L  #49,D0       ; Color Classic
20B2:         BEQ.S   sys_check
 
; Any other Model/Gestalt will bring up an error alert-box 
 
20B4:         MOVE    #$1B5B,D0    ; "Carrera040 does not support this Macintosh model."
20B8:         BRA.S   RET_err      ; -> "TST     D0 & RTS"
 
; We found a supported model, so keep on going checking for the OS version...
 
20BA: sys_check: MOVE.L  #'sysv',D0    ; Check OS version
20C0:         _Gestalt              ; (A0/selector:OSType):D0\OSErr 
20C2:         BNE.S   bad_conf      ; If we can't read it, fire general Error Msg
20C4:         MOVE.L  A0,D0
20C6:         CMPI    #$605,D0     ; System 6.0.5
20CA:         BGE.S   OS_ok        ; or greater
20CC: OS_bad: MOVE    #$1B5C,D0    ; "Carrera040 does not work with this version of the operating system."
20D0:         BRA.S   RET_err      ;
20D2: OS_ok:  MOVE.L  #'vm  ',D0   ; Check for enabled Virtual Memory
20D8:         _Gestalt             ; (A0/selector:OSType):D0\OSErr 
20DA:         BNE.S   bad_conf     ; If we can't read it, fire general Error Msg
20DC:         MOVE.L  A0,D0
20DE:         BTST    #0,D0
20E2:         BEQ.S   VM_ok
20E4:         MOVE    #$1B5D,D0    ; "Carrera040 does not work with Virtual Memory turned on. 
                                   ; Please turn off Virtual Memory in the Memory control panel and restart your Mac."
20E8:         BRA.S   RET_err
20EA: VM_ok:  JSR     (A6)         ; This is the actual HANDLER CALL, been set in $204C-$2098
20EC:         BNE.S   RET_err
20EE:         MOVE.B  34(A5),D0    ; 34(A5) seems to contanin the Jumper settings at the lowest 3 bits and only three of them are valid:
20F2:         CMPI.B  #7,D0        ; 7 -> 111
20F6:         BEQ.S   RET_ok
20F8:         CMPI.B  #6,D0        ; 6 -> 110
20FC:         BEQ.S   RET_ok
20FE:         CMPI.B  #5,D0        ; and 5 -> 101 
2102:         BEQ.S   RET_ok
2104:         MOVE    #$1B5E,D0    ; "Carrera040 does not recognize the jumper settings on the Speedster card. 
                                   ; Please check the settings against the manual.
2108:         BRA.S   RET_err
210A: RET_ok: MOVEQ   #0,D0        ; clear D0 (no errors)
210C:RET_err: TST     D0           ; Set the Z-Flag (D0 contains Err-Code) and
210E:         RTS                  ; return from Subroutine
2110:bad_conf:MOVE    #$1B5A,D0    ; "Carrera040 does not support your system configuration."
2114:         BRA     RET_err

Yes, there’s also stuff after the call to the handler, but let’s check that handler first.
As said in the beginning, I chose to take the “IIx route”. The MacII_handler code is actually just another vector jump-table which will later be used with offsets:

414E:  MacII_handler:  BRA   MacII_1st ; From II, IIx & IIcx
4152:                  BRA     MacII_2nd
4156:                  BRA     MacII_3rd
415A:                  BRA     MacII_4th

Let’s have a look into the first call MacII_1st:

3D14:  MacII_1st  MOVEM.L D1-D3/A0-A2/A6,-(A7) ; 1st call from MacII handler
3D18:           PUSH.L  8
3D1C:           LEA     data105,A0
3D20:           MOVE.L  A0,8       ; Is that the Bus Error Handler at 0x00000008?
3D24:           MOVE    #$1B5F,D3  ; 7007
3D28:           MOVEQ   #1,D0
3D2A:           _SwapMMUMode  
3D2C:           PUSH.B  D0
3D2E:           MOVEA.L A7,A6
3D30:           BSR     read_5300k2
3D34:           MOVEQ   #0,D3
3D36:  data105  MOVEA.L A6,A7
3D38:           POP.B   D0
3D3A:           _SwapMMUMode  
3D3C:           POP.L   8
3D40:           MOVEQ   #0,D0   ; ?
3D42:           MOVE    D3,D0   ; overwriting?
3D44:           BNE.S   lae_153
3D46:           MOVEM.L D1-D2/A0-A2,-(A7)
3D4A:           LEA     53_cmd_0,A0
3D4E:           MOVE.L  A0,6(A5)
3D52:           LEA     53_cmd_1x,A0
3D56:           MOVE.L  A0,10(A5)
3D5A:           LEA     read_5300k2,A0
3D5E:           MOVE.L  A0,14(A5)
3D62:           LEA     53_cmd_5.3,A0
3D66:           MOVE.L  A0,18(A5)
3D6A:           LEA     53_cmd_5.1,A0
3D6E:           MOVE.L  A0,26(A5)
3D72:           LEA     53_cmd_5.3.5.1,A0
3D76:           MOVE.L  A0,22(A5)
3D7A:           MOVEM.L (A7)+,D1-D2/A0-A2
3D7E:           BSR     read_5300k2
3D82:           ANDI.B  #7,D0
3D86:           MOVE.B  D0,34(A5)
3D8A:           MOVEQ   #0,D0
3D8C:  lae_153: MOVEM.L (A7)+,D1-D3/A0-A2/A6
3D90:           TST     D0
3D92:           RTS

As you can see, even in such simple and short subroutines are some things I just don’t get? For example why is the effective address of data105 written to 0x8? Is that replacing the Error Handler in the VBR?
Anyhow, I think I got the overall meaning of the rest of it. What happens is this:

After switching into 32bit mode (_SwapMMUMode) it reads a longword from 0x53000000. As initially mentioned, the C040 is mapped to this address. There are 2 identical functions to read from there, that’s why this one here called read_5300k2.
It looks like reading is sufficient because the result (returned in D7) is immediately overwritten by a pop. Also that BNE after two moves is beyond me (0x3D40)…  OTOH the rest of the code is pretty clear: It’s ‘populating’ the A5-world with subroutines I’d call 53-commands. These commands write a specific byte sequence to 0x53000000, obviously communicating with the C040. For better understanding I’ve named them e.g. 53_cmd_5.3.5.1 meaning writing 5, then 3, then 5 and finally 1 to this address.
At the end, 0x5300k is read again, this time the result is masked to the last bit and written to 34(A5) – this represents the C040 jumper-settings by the way.  Return from Subroutine…

Back in sys_check: this jumper-setting will be checked immediately for three valid settings: 111, 110 or 101 representing the supported CPU types (68040,68LC040,68EC040). If the setting is ok we’re done with sysDetect:and return to main:.

2nd handler

Located at 0x3D94 this is kind of  a ’50/50 subroutine’. One half is totally obvious (check RAM, ROM and addressing mode) and the other half is all greek to me… e.g. what is all that PUSHing about? There’s not a single POP inside this routine (or subroutines call from within).  Here’s a wild guess of mine:
It looks like 1 to 4  ‘RAM range triplets’ being pushed onto the stack and after that gestaltPhysicalRAMSize (#’ram ‘) is called, for example:

    3D9A:          CLR.L   -(A7) ; faster 'PUSH.L #00000000'
    3D9C:          CLR.L   -(A7) ; PUSH.L #00000000
    3D9E:          CLR.L   -(A7) ; PUSH.L #00000000
 
    3DA0:          PUSH.L  #$100000
    3DA6:          PUSH.L  #$50F00041
    3DAC:          PUSH.L  #$50F00000
 
    3DB2:          PUSH.L  #$2000
    3DB8:          PUSH.L  #$53000041
    3DBE:          PUSH.L  #$53000000
 
    3DC4:          PUSH.L  #$2000
    3DCA:          PUSH.L  #1
    3DD0:          PUSH.L  #$53002000
 
    3DD6:         MOVE.L  #'ram ',D0  ; Returns the number of bytes of the physical RAM 
    3DDC:         _Gestalt ; (A0/selector:OSType):D0\OSErr

But the gestaltPhysicalRAMSize call does not take parameters and simply returns the amount of available RAM.

The good thing is, this sub-routine works flawlessly on the SE/30 and we can move on…

instFPSP

instFPSP is the next call in line. I’m not going to discuss this code in detail because it actually doesn’t do much. Still there are many inline comments in this routine if you like to know more. Here’s the background:

The FPU in the 68040 was made incapable of IEEE transcendental functions, which had been supported by both the 68881 and 68882 and were used by the popular  fractal generating software of the time and little else.  The Motorola floating point support package (FPSP) emulated these instructions  in software under interrupt. As this was an exception handler, heavy use of  the transcendental functions caused severe performance penalties.

TLDR; Check for FPU(type) and load the FPSP code from the resource-fork into RAM. Done. Return to main:.

Phew, that’s it for now. In the next post/chapter we’ll touch the 3rd handler, which was really hard to decipher but interesting stuff to learn, too.

Digging deeper into the highRISC

After 7 years mainly doing research on Transputers and the i860, I had the feeling it’s time to do some more digging into the highRISC card.
If you have read my initial post about the miroHIGHRISC (and the Tiger) you remember the undocumented 20pin socket on the card (pictured in the upper right corner):

HighRiscLeft

Let’s have another look at the “UART port” again:

The pinout (the connector is rotated 90° counter clock-wise):

GND  oo  /WR0
D0   oo  INT2
D1   oo  /RD
D2   oo  /IOSEL
D3   oo  RESET (most likely)
D4   oo  A2
D5   oo  A3
D6   oo  A4
D7   oo  A5
VCC  oo  A23

Reading a bit of the BIOS’ disassembly, I stumbled across routines to talk to an UART.  A very common (D)UART of those days was the SCN2681. If you take a look at this chips specs, they perfectly fit to the signals provided at the HIGRISCs UART-port!

Here’s its pinout with the corresponding pins marked:

 

A2-A5 are used for A0-A3 on the 2681 and the only pin not directly represented is A23 which might be used to decode. Also, it nicely reveals that CPU INT2 is used for the UART.

The LR33000 datasheet tells me that there’s an 4MB IO-area starting at 0x1E000000 reaching up to 0x1EFFFFFF- most likely the 2681 will live there… and the corresponding signal called /IOSEL is available on the UART-port (and will perfectly help as chip-select decoder). Tadaa!

So after the UART we need to get the RX/TX signals to a higher level, i.e. the +/-15V of RS232 – this is the call for our old friend MAX232.

[current bread-board experiments sadly didn’t yield into ‘instant success(tm)’… I’m missing out something – need more time to investigate]

Bootcode / BIOS

The LR330xx CPU also has an /EPSEL EPROM select signal, indicating it’s accessing an EPROM expected to start at 0x1F000000 and ends at 0x1FFFFFFF (4MB right below the IO-area).
Using this knowledge and knowing that the MIPS standard boot-vector is at 0x1FC00000, it’s easy to feed the ROM-dump I did some years ago into the disassembler with the correct start address to do his job.

We need to get an understanding of this bootcode first, so that we can get an idea of “what is where” (e.g. ISA bus, UART etc) and later upload our own code and use those addresses.
Just to stick my head a bit into the clouds, the aim is to first port a then common MIPS monitor-program called ‘PMON’ and use that to run some sort of μLinux. But that’s probably another handful of years ahead…
PMON was a good source of information, because it’s originally written by LSI, supporting all the LSI eval-boards. Lo and behold, some of them had a 2681 UART, too… located at 0xBE000000, which is extensively used in my BIOS disassembly  😉
I have a certain feeling that miro borrowed some design ideas from the LSI Pocket Rocket evaluation board (don’t Google it, it’s a mythic being – if you have documentation, mail me!).

So this is the 2681 memory-map then:

#define BASE_2681 0xbe000000
#define SRA_2681 ((1*4)+BASE_2681) // 0xbe000004 status register 
#define THRA_2681 ((3*4)+BASE_2681) // 0xbe00000C Rx/Tx holding register 
#define ACR_2681 ((4*4)+BASE_2681) // 0xbe000010 Aux contrl. register 
#define ISR_2681 ((5*4)+BASE_2681) // 0xbe000014 interrupt state register 
#define CTU_2681 ((6*4)+BASE_2681) // 0xbe000018 Counter timer upper 
#define CTL_2681 ((7*4)+BASE_2681) // 0xbe00001C counter timer lower 
#define START_2681 ((14*4)+BASE_2681) // 0xbe000038 start timer 
#define STOP_2681 ((15*4)+BASE_2681) // 0xbe00003C stoptimer

Using those addresses we should easily identify the comms routines.

Something happens at 0xBE800000 which seems not UART related. So that’s probably the reason why A23 is available on the connector. That way we can ignore access to that address by OR’ing it with /IOSEL to create a /CS.

The DOS side of things

The tool to load a MIPS executable into the HIGHRISC is called DL.EXE. Loading the test-program prints this to the console:

miroHIGHRISC download program. V 1.00
(c) miro Computer Products AG , Germany

CONFIG: I/O-register-address: 0x368 
CONFIG: DRAM - base-address : 0xD000 
CONFIG: DRAM - size : 8 MB
CONFIG: TIGER - RAM - size : 8 MB

Resetcount = 87340

Loading test.zor
text : start=0x80030000 size=0x52c0
data : start=0x800352c0 size=0x520
bss : start=0x800357e0 size=0x150
entry : 0x800301a0
TIGER comm.address : 0x3ffd00
max_used_address : 0x35930 
real_DRAM : 0x800000 
Heapsize : 0x7CA6D0

test.zor sucessfully downloaded.

This gives us valuable information. The DOS-side uses the IO port 0x368 and has a memory window of 16K from  0xD000 to D3FF.
MIPS programs are loaded to 0x80030000 and the 16K seems to be mapped to 0x003FFD00, just 128K below the 4MB boundary of the LR33k address space.

As usual – this is heavily work-in-progress. So this post will be edited while making any new progress. TBC…

Putting the DSM to use

So after the lengthy description of the DSM cards – how can we make use of them? As said in the previous chapter, they were shipped with an assembler and even an early version of GCC (1.3) so development is pretty straightforward.

Activation

First, you have to understand how the cards integrate themselves into an ISA/EISA system. While the three versions (8, 16, 32bit) differ in some areas, the integration is more or less similar:

Each version offers a latch for controlling the card. This means to activate the card by writing bits to that latch to define a memory-window inside the hosts RAM to blend-in the cards dual-ported RAM  and/or resetting it etc.. The latch is accessible through an IO-port set by jumpers on the card (default 0x300).

So for the ISA cards you have to for example write a 0xC2 at that port-adress to reset & activate the card and use the mem-window of 0xDC000-DC7FF. In Turbo-Pascal this would be something like:

port[$300] := $c2;

This gives you a 2K mem-window to exchange data between the DSM and the host (just 1K for the DSM860-8).

The EISA cards obviously use other ports depending on the slot-number, so this would be an example to do the same for am DSM860-32, this time in Turbo-C:

outportb(slot_no * 0x1000 + 0x800, 0xc2); // For slot #2 this would be 0x2800

This would also open a mem-window at 0xDC000, this time up to 0xDCFFF, i.e. 4K long.

Memory

As mentioned above, the Host and the DSM-card are communicating through a memory-window of diffenrent sizes, depending on the DSM used. Due to their nature, the memory is looking different though. That said, at least they’re both litte-endian, so no byte-swapping needed.

The 80×86 side

For the hosting PC, memory looks pretty straightforward. 1KB-4KB of RAM somewhere in ‘lower-RAM’, that’s it.
While we don’t use it, it’s worth mentioning that there’s a 2nd memory window called “Common“. This is fixed at a specific address and is shared between all possible cards plugged into one host. I guess you already got it: This enables easy multi-processor communication… and gives a lot of possibilities for f**k-ups.

The i860 side

The memory-mapping on the i860-side is the same for the 16 and 32bit cards, the dual-ported RAM is located at 0xd0040000 (0xC0000000 for the DSM-8).
In any case the i860 memory is linear, 64bit wide and always on a 64-bit boundary. This means you have to read the DP-RAM area differently depending on which card you run your code. Here’s an example of how the DP-RAM looks like on the Host- and i860 side:

Host DP-RAM in DOS ‘debug’
-d dc00:0000
DC00:0000 11 22 33 44 55 66 77 88 ...

which would look like this on the i860 side:

DSM/8
C0000000 - 11 xx xx xx xx xx xx xx 22 xx xx xx xx xx xx xx
C0000010 - 33 xx xx xx xx xx xx xx 44 xx xx xx xx xx xx xx
C0000020 - 55 xx xx xx xx xx xx xx 66 xx xx xx xx xx xx xx
C0000030 - 77 xx xx xx xx xx xx xx 88 xx xx xx xx xx xx xx

DSM/16
D0040000 - 11 22 xx xx xx xx xx xx 33 44 xx xx xx xx xx xx
D0040010 - 55 66 xx xx xx xx xx xx 77 88 xx xx xx xx xx xx

DSM/32
D0040000 - 11 22 33 44 xx xx xx xx 55 66 77 88 xx xx xx xx

So reading and writing from/to the DP-RAM involves some thinking to be done by the programmer. Here are two code-snippets showing the difference between reading the DP-RAM on a DSM860-8 and an DSM860-16. First the ‘8 bit version’:

mov 4*8,r4
readloop:
ld.b 0(r15),r14  // Load BYTE from DP-RAM
st.b r14,0(r29)  // store it destination
addu 8,r15,r15   // add 8 to read-mem-pointer
addu 1,r29,r29   // add 1 to desitination-mem-pointer
addu -1,r4,r4    // loop-counter
xor r0,r4,r0     // Test Zero
bnc readloop

And the same for the DSM860-16:

mov 2*8,r4
readloop:
ld.s 0(r15),r14  // Load SHORT (2 Bytes) from DP-RAM
st.s r14,0(r29)  // store it destination
addu 8,r15,r15   // add 8 to read-mem-pointer
addu 2,r29,r29   // add 2(!) to desitination-mem-pointer  (short <> byte)
addu -1,r4,r4    // loop-counter
xor r0,r4,r0     // Test Zero
bnc readloop

Because of reading SHORTs (ld.s) the DSM860-16 version has to loop just 16 times while the 8-bit version has to do that 32 times.
Same applies to writing. You will find an example in the Mandelbrot program (Commented source file).

[This is work-in-progress and will expanded over time]

Action!

So here we go, finally some program running showing all the power behind the i860. I took the Mandelbrot example from R.D.Klein and modified it a bit, well quite a bit as it was written for the DSM860-8 and provided CGA output (yuck!).

Like most “external accelerator” programs, there’s one part running on the accelerator (the i860 in this case) and one part running on the host doing useful things with the provided data. In this case we have an i860 assembler code doing the number-crunching on the Mandelbrot algorithm using the i860’s ability of ‘dual instruction-mode‘ and some code done in Trubo-Pascal handling the display and zooming.
The latter was extended to use SVGA (640x480x256) output and providing an interrupt driven timer. [sourcecode package cleanup is still work in progress]

Here are the two running full steam ahead:

Some things worth to mention:

  • The host being used here is a P1 133MHz, a bit unfair comparing that to a 40MHz i860 – OTOH they seem quite comparable when it comes down to Mandelbrot crunching speed.
  • To calculate the Mandelbrot the same speed as it took the Pentium (~15s) I needed five T800-20 according to my benchmarks.
  • To even achieve the 8.2s of the i860 I had to run 9(!) T800-20 in parallel.
  • A i486DX/33 took 66 sec to do the same (8.25 times slower!), while it still took 34s for a i486DX2/66!

So while all that moaning about the bad ‘programmability’ and slow context-changes of the i860 are completely correct, in certain tasks that CPU was indeed a real screamer!

Run Forest, run!

After having figured out the basics, it was time to prove the concept. First, I needed a (simple) program to run on the NumberSmasher. Here it is, reeeeally simple. It’s just an excerpt of the original 77 assembly lines. If you’re interested in the whole thing, it’s here.

The main-loop is just this. Read a byte from the C012 link, add 1 to it and write it back to the C012:

mov    iobase, %r4

loop:
  call    getlink        # (watch following delay slot!)
   shl    %r0, %r4, %r16    # mov r4, r16 - save base address

  addu    1, %r16, %r17    # add 1 and move into r17

  call    putlink        # (watch following delay slot!)
   shl    %r0, %r4, %r16    # mov r4, r16 - save base address

  br        loop
nop

What a task for a “Cray on a chip”! 😉 Ok, putting this into the assembler (currently gnu-as on Linux), linking and finally making it a pure binary with ‘objcopy -O binary hb_test.cof hb_test.bin‘ I got this binary.

How do I get it into the NumberSmasher? Again, that required a bit of coding… say Hi to nc_load.exe.
This little tool loads a pure i860-binary, pokes it into the NumberSmashers RAM and optionally starts it afterwards, ie. ‘nc_load hb_test.bin 20 start‘ loads my test program to address 0x14 and starts it from there.
[Having read the previous post, you should know that you could omit the ‘start’ parameter and just poke 20 to address 0 to start the code]

Test, test, one, two, three…

And now the exciting part: Does it work? The easiest way to test this is good ol’ debug again:

C:\> debug
-o 151 41
-i 150
42

Yay! If this isn’t proving the sense of life, what else!?!? 😉 Ok, what happened here is simple. I wrote 41 to port 151 on which the C012 is listening for input, then I read from port 150 which is the result of adding 1 to the input. Quod erat demonstandum. Program is running successfully!

Be aware that after starting a programm, the NumberSmasher is continously running that code, i.e. peek & poke do not work anymore because the Boot-ROM is out of the game.
You have to reset the NC which is done in classic INMOS-style, i.e. sending a zero to port 0x160. Thats ‘-o 160 0’ in debug. NB: Resetting the NC does not clear its RAM. The previously uploaded program is still available and can be re-started by pokeing the start-address to 0.