Tag Archives: reverse engineer

The Number Cruncher – FPU for the Apple II

June 23, 2021 Axel 3 Comments

This is a post I’d like to write since 2006… so 15 years after I’ve put all the Transputer, MIPS, i860 and-what-not stuff aside and made some room for my other (late) love: The Apple IIgs

You might have stumbled about my post/project connecting a Transputer to the Apple II called the T2A2… well, what I did not mention was, that the inspiration to this came from another card built in 1988, called the FPE made by Innovative Systems. That’s a small card featuring just a buffer, an old XILINX FPGA (actually the first of its kind) and the Motorola 68881 floating point co-processor.

Co-pro-ces-sor! I was hooked! 😯

After more research I learned that the FPE was actually a diva like this newsgroup post says:

“The FPE is suffering from a major problem, namely the coproc is crashing internally and has to be reset in software. This happens in a non deterministic way, and software written for that engineering junk must be adapted to that. “

But a bit later there was something better available: The Number Cruncher (NC for short). Fine German engineering 😉 And the newsgroup post was quite nice to it:
“The Number Cruncher is compatible with the FPE but is actually what the FPE was supposed to be – a math coproc that works. It perfoms very well.”

This is the “marketing blurb” from back then

totally compatible with the FPE from Innovative Systems
much less sensible to heat, voltage problems etc.
supports the FPE SANE patch for speeding up any program that does floating point calculations
you can compile ORCA/Pascal and C programs to use it directly with the special floatlib for the FPE provided by The Byte Works
works in any slot (slot 3 & 4 without need for setting it to “Your Card”)
works with TransWarp GS and Zip GS accelerators and RamFAST SCSI card
comes with lots of mathematical software and an enhanced SANE patch by Albert Chin-A-Young

I directly contacted the creators of the NC, Dirk and Andreas, but both of them hadn’t had a tiny bit of the NC left. No schematic, no HDL, no nothing 🙁

During looooong and wild eBay raids I managed to get my hands onto an FPE as well as on an NumberCruncher. Woohoo! They both work with the same driver and yes, the NC is much more robust than the FPE.
But it seems that there are only a handful of them survived, if anymore at all. So I thought it might be interesting for me and others to make a re-release…

Let’s re-animate the Number Cruncher!

So, again, it’s up to me to save the world… reverse-engineering time again!
This job consists of two parts – dissect the hardware and to get a better understanding of this, the software/driver, too.

Hardware

Well, you could simply rebuild the card, copy the bitstream from the serial PROM for the FPGA and you’re done.
But it’s not so easy, as the FPGA which has been used, the XILINX XC2064 is long time EOL’ed and if you buy old stock, they’re more expensive than a comparably recent CPLD – if they’re not Chinese fakes and/or broken… not mentioning that a simple copy-paste job is not really manly 😉

The XC2064 is a 5V FPGA (the first of its kind actually), has 600-1000 logical gates and 58 user IO pins of which 32 are used. It is getting its bitstream fed by an external 12K serial PROM. I’ve pulled the bitstream in a file available here, but because XILINX never documented that format, it’s pretty much useless.

The rest is all standard stuff. A ‘245 transceiver, an oscillator (~12MHz) and some chicken feed.

Revving it up

To prove that I got everything right and having a better ~~attack~~ analysis vector I designed the “NumberCruncher Reloaded v0.1“… more or less a 1:1 clone but bringing out all FPGA signals to pin-headers to have a convenient access for my logic analyzer.
Also I used a PLCC version of the 68881, because I have more of those.

Then, out went the logic analyzer and weeks and weeks of looking/listening to the the FPGAs conversation to the the FPU and Apple bus while having the 68881 manual on my lap, I got an idea of what’s going on.

Software

Later I found the original floppy which came with the NumberCruncher containing just the init-files for GS/OS patching SANE to use the FPU instead of the 65c816 (download the ShrinkIt archive here) – well, at least something!

Next came my ol’ buddy Mr. disassembler… a lot. About a year on-and-off… and here’s my initial, slightly commented disassembly of the SANE patch.

Again, some years down the road I found the floppies which were delivered with the FPE and those are quite helpful with code examples, assembler macros and libs etc. – that would have been quite helpful during the disassembly 😕 Anyhow, it’s good to see that most of my interpretations were correct.
Also, the FPE came with quite a nice manual – which actually was more valuable than the card 😉
It very well describes the basic functionality, the FPU registers used and the programming side of things – even the most important parts from the 68881 manual are cited. Very good job on that Innovative Systems!

Having all this at my hands I was prepared to start and after some huffing and puffing the NumberCruncher Reloaded v.1.0 was born.

68000, Apple 68k, Carrera040

Carrera in an SE/30 – the code part 3

April 11, 2019 Axel Leave a comment

Ahh, back in cosy main: – looks much easier now after that crazy MMU stuff in the previous part, right?

The next subroutine called is proc32. In the complete source code (reminder: Available at GitHub) I commented that with “works (get some RSC strings)“… and well, that sums it up pretty good.
proc32 loads (i.e. creates handles) from the resource-fork, e.g. the icons used in the menu-bar and several error-messages like “This application must run on the 68030 processor, please quit all other 68040 applications and re-run this application.“. That’s it. Boring…

That boredom instantly changes when we get to the next subroutine proc43located at 0x29DA…

I did it my way…

One fascinating thing about classic Mac OS is how easy it is to patch system calls, aka Toolbox traps. For example in the previous post we came about _BlockMove, which is a Toolbox call to copy an amount of RAM from A to B.
For example you have just read this article about a faster BlockMove method, you’re totally free to patch (read: replace) _BlockMove with your speedier version and automatically use this throughout your application – or even system-wide, if you’ve created an INIT… [If you want to know all about it… here’s a book for you]

And that’s what proc43 heavily does. Because it’s a long subroutine (230 lines) so I will give you just one example – the inline comments should do…

2BE2:        MOVE    #$A02E,D0     ; BlockMove
2BE6:        _GetTrapAddress newOS ; (D0/trapNum:Word):A0\ProcPtr 
2BE8:        MOVE.L  A0,$270(A5)   ; oldBlockmove
2BEC:        LEA     data42,A0     ; myBlockMove
2BF0:        TST.B   MMU32bit      ; loMem global "current address mode"
2BF4:        BNE.S   lae_70        ; skip if 32bit clean machine else
2BF6:        LEA     data43,A0     ; use a different entry for dirty machines
2BFA: lae_70 MOVE.L  A0,$274(A5)   ; save routine pointer to $274(A5)	
2BFE:        LEA     data41,A0     ; DC.L 0000 0000
2C02:        MOVE.L  $270(A5),(A0) ; save oldBlockmove vector into there
2C06:        MOVE.L  #$A02E,D0     ; BlockMove
2C0C:        LEA     data40,A0     ; aaaand replace it by myBlockmove
2C10:        _SetTrapAddress newOS; (A0/trapAddr:ProcPtr; D0/trapNum:Word)

This is the sum up what else being done:

Save all debugger vectors into A5-world locations (suspicious. I sense Macsbug killing…)
Load the PACK4 resource, that’s the Floating Point emulation package (aka SANE) if no FPU found
Check & read several system Gestalt codes into A5-world (0x2AAC-0x2B44)
Patch several Toolbox traps
- SwapMMUMode replaced by data19
- VM_Displatch by data22
- Pack4 by data10
- Pack5 by data11
- BlockMove by data40
- jClearCache by myClearCache
- GetNextEvent by myGetNextEvent
- GetResource by myGetResource
- SCSIdispatch by mySCSIdispatch
- DrawMenuBar by myDrawMB
- LoadSeg by data31
- UnLoadSeg by data32
- HWPriv by data33
- vStdExit by data34

So far, so many. Then there’s some RAM copying going on, of which I’m currently not quite sure what it is good for (0x2CAC-0x2CD8) 💡 .

Finally, the myShutdown routine is installed into the Shutdown Manager, i.e. it will executed before the Mac is powered down/restarting (it simply switches the host back to its own 68030). After that, RTS into main…

“There and back again…”

Barely back in main, a JSR 12(A6) warps us into MacII_4th, the last of the four handlers every supported system has.

This loads specific data from the FPSP into RAM (namely IDs 0x12C and 0x12D).
Finally a special floppy driver is installed (myFloppyDrvr @ 0x954) which IMHO just differs from the original in handling the ‘040 caches correctly. That was that and back to main…

The next sub-routine in line is chkATalkVer. I can rightfully name that routine because it’s short and crystal clear: Figure out if AppleTalk is installed, and if true, return its version in D0 (and also write it into A5-world). C’est ca…

This is the end…

It’s getting ugly (for now)… proc42 will be called – the last subroutine in main before my SE/30 crashes and burns 😥

The first few lines (0x28F4-0x293C) are comparably harmless. They are working around a bug in System 7.1 which was corrected in 2/17/92 according to some dark sources (“Corrected value of timeSCSIDB from 0DA6 to 0B24”).
After that, proc38 (0x293C) is called which again calls proc39 and something’s done with the TimeManager, not really sure what’s exactly going on, but it feels like a timing-benchmark heavily using InsTime, PrimeTime and RmvTime Toolbox calls.

[hold yer breath] Then we’re getting closer to the flat line… The stack is filled with these parameters:

2940:   CLR.L   -(A7)       ;PUSH.L 00000000 
2942:   CLR.L   -(A7)       ;PUSH.L 00000000 
2944:   CLR.L   -(A7)       ;PUSH.L 00000000 
2946:   PUSH.L  #$80008000  ;       80008000
294C:   CLR.L   -(A7)       ;PUSH.L 00000000 
294E:   PUSH.L  #64         ;       00000040
2954:   PUSH.L  #1          ;       00000001

and SpeedProc is called…

…To be continued 😉

P.S: I changed course (again) and started to investigate more into the C040’s hardware. The more I understand of the INIT/CP workings the more I can’t fight the idea that it really might be a hardware timing issue.

68000, Apple 68k, Carrera040

Carrera in an SE/30 – the code part 2

January 29, 2019 Axel 6 Comments

3rd handler

Next up is the 3rd handler, MacII_3rd: (0x3F94) in our case. Actually it’s called with JSR 8(A6), but that’s an 8 byte offset to the ‘base-address’ of any handler. Clever stuff, huh (Google for ‘pointer-table’)?

This subroutine contains serious magic and was a real hard nut to crack. Especially because it tricked me into believing that I’ve found the ‘crashsite’… which, to spoil the tension, isn’t.
It just kept on killing Macsbug, because it’s so low-level.

What this routine does is replacing the Vector Base Register (VBR) which ‘lives’ at address 0x00000000. Evil stuff.

After disabling interrupts and switching to 32bit-mode a field with 6 long-words (data107) will be populated with data generated in other routines.
For now I can only guess what these entries are (Values from my SE/30 given in brackets). We’ll discuss all that further down.
0x3FC6 to 0x3FD8 calculates the size of the chunk of code starting at data106 (0x4008) to the beginning of MacII_4th (i.e. the end of Mac_3rd), which is 180 bytes.
Using this length, the routine first saves the current VBR onto the stack using the system call _BlockMove.
Then the original VBR (+some more) will be replaced by the new version beginning at data106. (Killing Macsbug – more on that later)
BSR 53_cmd_1x is been called. This brings the Carrera040 into life most likely using the just copied VBR (This is discussed in much detail further down).
Now the contents of the stack (= copy of the original VBR) will be copied back into its place, this time using a classic DBRA loop (0x3FF4). My guess, no Toolbox call possible at the moment.
Adjust the stack, back to 16bit mode, restore Registers and return-from-subroutiene. Done.

Here’s the code doing all this:

3F94:MacII_3rd: MOVE    SR,-(A7)     ; 3rd call from MacII handler
3F96:           ORI     #$700,SR       ; Set bit 9-11 of SR (disable Interrupts)
3F9A:           MOVEM.L D0-D2/A0-A2,-(A7)
3F9E:           MOVEQ   #1,D0
3FA0:           _SwapMMUMode  
3FA2:           PUSH.B  D0
3FA4:           SUBA.L  A2,A2         ; faster movea.l #0,a2
3FA6:           LEA     data107,A0    ; Filling the data into the 6x32 field
3FAA:           MOVE.L  96(A5),D0
3FAE:           MOVE.L  D0,(A0)+      ; SE30: 9FE00
3FB0:           LEA     data69,A1
3FB4:           MOVE.L  A1,(A0)+      ; SE30: 9D6E2 (User/Supervisor Rootpointer?)
3FB6:           MOVE.L  $64(A5),(A0)+ ; 807FC040
3FBA:           MOVE.L  $6C(A5),(A0)+ ; 807FC040
3FBE:           MOVE.L  $68(A5),(A0)+ ; 00000000
3FC2:           MOVE.L  $70(A5),(A0)+ ; 00000000
3FC6:           LEA     MacII_4th,A0
3FCA:           MOVE.L  A0,D2
3FCC:           LEA     data106,A0
3FD0:           SUB.L   A0,D2         ; 'distance' from data106 to MacII_4th
3FD2:           SUBA.L  D2,A7
3FD4:           MOVEA.L A2,A0
3FD6:           MOVEA.L A7,A1
	  	; save the current VBR to the stack
3FD8:           MOVE.L  D2,D0
	  	; A0 = SE30: 00000000 (src)  - IIci: $FBB08000
	  	; A1 = SE30: 027ff34c (dest) - IIci: $3BF9FC6
	  	; D0 = B4    (count)   - SAME on the IIci!    
3FDA:           _BlockMove ; (A0/srcPtr,A1/destPtr:Ptr; D0/byteCount:Size) 
	  	; write my own VBR...
                ; This copies 180 bytes into 0x000000000 replacing the original VBR. 
                ; ... and kills Macsbug if not circumvented properly.
3FDC:           LEA     data106,A0
3FE0:           MOVEA.L A2,A1
3FE2:           MOVE.L  D2,D0
	  	; A0 = 9F900 (src)   - IIci 10C4EA (data88)    
	  	; A1 = 00000 (dest)  - IIci FBB08000
	  	; D0 = B4    (count) - IIci same  
3FE4:           _BlockMove ; (A0/srcPtr,A1/destPtr:Ptr; D0/byteCount:Size) 
3FE6:           BSR     53_cmd_1x  ; Bring the C040 to life
3FEA:           MOVEA.L A7,A0  ; SP to A0
3FEC:           MOVEA.L A2,A1  ; SE30: 00000000
3FEE:           MOVE.L  D2,D0  ; the code length (B4 again)
3FF0:           BRA.S   lae_163
3FF2:   lae_162 MOVE.B  (A0)+,(A1)+ ; Write the VBR back from the stack
3FF4:   lae_163 DBRA    D0,lae_162
3FF8:           ADDA.L  D2,A7 ; adjust the stack
3FFA:           POP.B   D0
3FFC:           _SwapMMUMode  
3FFE:           MOVEM.L (A7)+,D0-D2/A0-A2
4002:           MOVE    (A7)+,SR
4004:           MOVEQ   #0,D0
4006:           RTS     
 
; Start of VBR replacement- and 040-Code being copied to 0x0 the by line 0x3FE4 
; /if/ theses are the Vectors 0-17, then their meaning would be:
 
4008: data106:  DC.L    #$00001000 ; Reset initial Stack Pointer
400C:           DC.L    #$00000050 ; Reset initial Program Counter
; - ALL of these Vectors point to addr 4050 (offset 0x48) -
4010:           DC.L    #$00000048 ; Buserror  
4014:           DC.L    #$00000048 ; Adress Error
4018:           DC.L    #$00000048 ; Illegal Instruction
401C:           DC.L    #$00000048 ; Zero Divide
4020:           DC.L    #$00000048 ; CHK, CHK2 instruction
4024:           DC.L    #$00000048 ; cpTRAPcc, TRAPcc, TRAPV instruction
4028:           DC.L    #$00000048 ; Privilige Violation
402C:           DC.L    #$00000048 ; Trace
4030:           DC.L    #$00000048 ; LINE 1010 Emulation
4034:           DC.L    #$00000048 ; LINE 1111 Emulation
 
; THESE are definitely no vectors, they are dynamically written by the code above
; and to be used to setup the 040 MMU registers.
4038: data107:  DC.L    #$0009FE00 ;  
403C:           DC.L    #$0009D6E2; 
4040:           DC.L    #$807FC040 ; 
4044:           DC.L    #$807FC040 ; SE30: 
4048:           DC.L    #$00000000 ; SE30: 00000000 
404C:           DC.L    #$00000000 ; SE30: 00000000 
 
4050:           CLR.L   $53000000  ; Poke 0 to $53000000
4056:           BRA     lae_164    ; This points to itself... I'm lost at the moment.
4058:           LEA     data107,A0 ; SE30: 9F900
405C:           MOVE.L  (A0)+,D1   ; SE30: 0009FE00 (User/Supervisor Rootpointer)
405E:           MOVEA.L (A0)+,A1   ; 0009D6E2
4060:           MOVE.L  (A0)+,D4   ; 807FC040
4062:           MOVE.L  (A0)+,D5   ; 807FC040
4064:           MOVE.L  (A0)+,D6   ; 00000000
4066:           MOVE.L  (A0)+,D7   ; 00000000
4068:           MOVE.L  #$C000,D0
406E$           MOVEC   D0,ITT0   ; Set Instruction Transparent Translation
4072$           MOVEC   D0,DTT0   ; Set Data Transparent Translation
4076$           MOVEC   D1,SRP    ; Set Supervisor Rootpointer
407A$           MOVEC   D1,URP    ; Set User Rootpointer
407E:           MOVE.L  #$C000,D0
4084$           PFLUSHA           ; Invalidates all entries in the address translation cache
4086$           MOVEC   D0,TC 
408A:           LEA     data108,A0
408E:           ADDA.L  #$53002000,A0 ; (=0x530A1900)
4094:           JMP     (A0)      ; JuMP to data108 code (below) in C040 RAM range?     
 
4096:  data108: MOVEQ   #0,D0
4098$           MOVEC   D0,ITT0 
409C$           MOVEC   D0,DTT0 
40A0$           MOVEC   D4,ITT0 
40A4$           MOVEC   D5,DTT0 
40A8$           MOVEC   D6,ITT1 
40AC$           MOVEC   D7,DTT1 
40B0$           CINVA   BC
40B2:           NOP     
40B4:           MOVEQ   #0,D0
40B6$           MOVEC   D0,CACR
40BA:           JMP     (A1)    ; 0009D6E2
 
; END 040 Code being copied to somewhere by line 3FE4 
40BC: MacII_4th: MOVEM.L D1-D7/A0-A4,-(A7)  ; 4th subroutine called my MacII_handler
[...]

The Vector Base Register

I wasn’t precise when I initially said “replacing the VBR”. What actually happens is that this routine uses what I’d call an interim-VBR for the moment it initializes the 68040 on the C040. You’ve probably saw the link referring to what the VBR is in the 1st post of this series, but let me go a bit more into detail.

The VBR is a list of addresses (aka vectors) the CPU refers to in case of an exception – and this is true for every 68k system out there, e.g. Mac, SUN, NeXT, Amiga or Atari. Some of them might do some relocation using their MMU, but even the virtual address will be 0x00000000 and the order is the same. There are 16 basic vectors as listed here:

If for example a divide-by-zero happens, the CPU would call a handler which address is stored in 0x14. Pretty simple.
So let’s have a look what MacII_3rd left in the VBR (and below that) when the ‘interim VBR’ is in place:

0000: data106:  DC.L    #$00001000 ; Reset initial Stack Pointer
0004:           DC.L    #$00000050 ; Reset initial Program Counter
     ; - ALL of these Vectors point to addr 0x48 -
0008:           DC.L    #$00000048 ; Buserror  
000C:           DC.L    #$00000048 ; Adress Error
0010:           DC.L    #$00000048 ; Illegal Instruction
0014:           DC.L    #$00000048 ; Zero Divide
0018:           DC.L    #$00000048 ; CHK, CHK2 instruction
001C:           DC.L    #$00000048 ; cpTRAPcc, TRAPcc, TRAPV instruction
0020:           DC.L    #$00000048 ; Privilige Violation
0024:           DC.L    #$00000048 ; Trace
0028:           DC.L    #$00000048 ; LINE 1010 Emulation
002C:           DC.L    #$00000048 ; LINE 1111 Emulation
     ; - THESE are definitely no vectors, they are dynamically written by the 
     ;   code above and to be used to setup the 040 MMU registers.
0030:  data107: DC.L    #$00000000 ; SE30: 0009FE00  (12)
0034:           DC.L    #$00000000 ; SE30: 0009D6E2  (13)
0038:           DC.L    #$00000000 ; SE30: 807FC040  (14) 
003C:           DC.L    #$00000000 ; SE30: 807FC040  (15)
0040:           DC.L    #$00000000 ; SE30: 00000000 
0044:           DC.L    #$00000000 ; SE30: 00000000 
 
0048:           CLR.L   $53000000  ; Poke 0 to $53000000 ; C040 off
004C:  blocker3 BRA     blocker3    ; Points to itself... probably a "blocker"
0050:           LEA     data107,A0 ; initial Program Counter (SE30: 9F900)
0054:           MOVE.L  (A0)+,D1   ; SE30: 0009FE00 (User/Supervisor Rootpointer)
0058:           MOVEA.L (A0)+,A1   ; 0009D6E2
005C:           MOVE.L  (A0)+,D4   ; 807FC040
0060:           MOVE.L  (A0)+,D5   ; 807FC040
0064:           MOVE.L  (A0)+,D6   ; 00000000
0068:           MOVE.L  (A0)+,D7   ; 00000000
006C:           MOVE.L  #$C000,D0
0070:           MOVEC   D0,ITT0    ; Set Instruction Transparent Translation
0074:           MOVEC   D0,DTT0    ; Set Data Transparent Translation
0078:           MOVEC   D1,SRP     ; Set Supervisor Rootpointer
007C:           MOVEC   D1,URP     ; Set User Rootpointer
0080:           MOVE.L  #$C000,D0
0084:           PFLUSHA            ; Invalidates all entries in the address translation cache
0088:           MOVEC   D0,TC 
008C:           LEA     data108,A0
0090:           ADDA.L  #$53002000,A0 ; (=0x530A1900)
0094:           JMP     (A0)       ; JuMP to data108 code (below) in C040 RAM range? 
 
009C:  data108: MOVEQ   #0,D0
00A0:           MOVEC   D0,ITT0 ; 0
00A4:           MOVEC   D0,DTT0 ; 0
00A8:           MOVEC   D4,ITT0 ; 807FC040
00AC:           MOVEC   D5,DTT0 ; 807FC040
00B0:           MOVEC   D6,ITT1 ; 00000000
00B4:           MOVEC   D7,DTT1 ; 00000000
00B8:           CINVA   BC
00BC:           NOP     
00C0:           MOVEQ   #0,D0
00C4:           MOVEC   D0,CACR
00C8:           JMP     (A1)    ; 0009D6E2

Farewell, old friend

At this point, my SE/30 always froze and I thought this must be the point where to find incompatibilities between the IIci and SE/30.
But after understanding, what’s really going on, it was clear that overwriting the TRAP exception (Nr.7), Macsbug was simply kicked out of the game as this exception is triggered after every step/trace you do in a debugger…
So to get beyond this point, I had to modify the program counter to skip the point where TRAP is copied-over… which is done inside the Toolbox’ _BlockMove call. So I had to single-step into that and find the right call/time to do a ‘pc=pc+2’ 😉 (Good thing you can define a macro for that).

Okayyyyy. After that’s been written, 53_cmd_1x is called, presumably telling the C040 to come to life.
And keen as it is, it’ll look up the “Reset initial Program Counter” (VBR: 0xC) and starts executing code from 0x50. Any other occurring exception will call the ‘handler’ at 0x48, simply switching the C004 off and sit in an endless loop (0x4C) – probably making the 68030 to take over again.

EmEmYou!

Given everything’s fine, the code at 0x50 will start reading the previously populated data from data107 into several registers.
Then some serious 68040 MMU table setup happens – so this is some kind of ‘040 initialization routine… and the ‘040 is actually running. Woohoo!

Time for some special register explanation:
As we all know, the 68040 has two in-build 4k caches and an MMU. The latter can be programmed how and what to cache. This is defined in 4 registers of which only 2 are of interest here: ITT0 and DTT0, the Instruction and Data Transparent Translation registers, both sharing the same bit-fields following this pattern:

BBBBBBBBMMMMMMMMESS000UU0CC00W00

B – Logical Address Base – compared with address bits A31-A24. Addresses that match in this comparison are transparently translated
M – Logical Address Mask – setting a bit in this field causes corresponding bit in Base field to be ignored
E – Enable Bit – 1 – translation enabled; 0 – disabled
S – Supervisor Mode – 00 – match only in user mode 01 – match only in supervisor mode 1x – ignore mode when matching
U – User Page Attributes – ignored by 040
C – Cache mode – 00 – Cacheable, Write-through 01 – Cacheable, Copyback 10 – Noncacheable, Serialized 11 – Noncacheable
W – Write protect – 0 – write permitted; 1 – write disabled

Here’s an example:

807FC040 = 10000000011111111100000001000000 
           BBBBBBBBMMMMMMMMESS000UU0CC00W00

which means:

a bit less than 2GB transparently translated (2032MB)
translation enabled
Supervisor Mode: ignore mode when matching
Cache mode: Noncacheable, Serialized
Write permitted

So let’s have a look at the code again:

At 0x70/0x74 the MMU is set to 0xC000, i.e. Enable translation, apply for user & supervisor mode, write-though cache, for logical address space 0x80000000-0x00ffffff (2GB minus the bottom 16MB).
Then Supervisor & User Rootpointer are set to 0x9FE00, then the address translation cache is flushed to finally set the Translation Control register to Enable & 8K page size (0x88)… up to here this was pretty much ‘by the book’ of how to set-up MMU tables.

Having its MMU all set, the 68040 now gets something to chew on:
The address of data108 is added to 0x53002000 and jumped to!
💡 Does 0x53002000 equal 0x00000000 for the C040?

Let’s assume the C040 executes the code at data108 for now. That is:

Clear the ITT/DTT registers
Set the MMU to 0x807FC040 (see decoding example above)
invalidate caches and wait’a’NOP to have that happened
then disable all caches
and jump to where A1 points to. In my SE/30 that’s 0x9D6E2, previously loaded from data107 in 0x58

Writing all this from the top of my head, I’m not 100% sure where this address is pointing to. I must be somewhat back into MacII_3rd (0x3FEA), because this is where the program execution resumes (Need to check this with Macsbug and will update).

For now, I’m tempted to call MacII_3rd something like ‘C040_MMU_setup‘… but I’d love to have this confirmed 💡 by somebody who knows more than me 😉

Next up will be continuing working further through the main: procedure again… so move over here.

68000, Apple 68k, Carrera040

Carrera in an SE/30 – the code part 1

January 28, 2019 Axel 2 Comments

The disassembled code of the Micromac Carrera 040 control panel is quite big: 6000+ lines of 68030/40 assembly…

While these posts might be entertaining and giving you an insight into classic MacOS driver code, they are also meant as a notebook to myself to get into the source quickly – especially after some weeks or months of distraction 😉

That said, I will not discuss each and every line of code. There are many parts which aren’t important (for now) or just not reached yet.
Still, it will take several parts/chapters to cover everything I worked on.

The complete code is available over here on GitHub and will updated every time I’m working on it.
Whenever I’m mentioning addresses I’m referring to this code on GitHub. NB: I will never use line-numbers as these might change during editing the source.
Also, when you’ll see a light-bulb 💡 somewhere, this is where I’m not sure and happy about enlightenment or comments from you 😉

This article is ~~totally work-in-progress~~ closed.
~~E.g. every now and then my theories about what a certain code does changes, I learn new things and all the sudden whole blocks of code make sense… so this post will change/grow, too.~~Bolle did a lot of hardware research and in the end it became clear that the INIT/Driver has nothing to do with the non-function of the Carrera in an SE/30. After Bolle modified his adapter, the Carrera 040 is happily running in my ’30 now.But still, this series of posts is definitely worth reading, especially if you’re into reverse engineering 68k assembly code.

Approaching… difficulties.

What’s the main job of this code? From a 30000ft perspective the simple answer is “switching the Carrera040 on and off”, i.e. toggling between the hosts slower on-board 68030 and the insanely fast 68040 on the C040. At boot-time… as well as during the system is running (by user interaction).

Sounds pretty simple, huh? Lowering our flight altitude to 3000ft more things come into play:
Identify the hosting Macintosh. As mentioned in the previous chapter, the C040 was able to run in a Mac II, IIx, IIcx, IIvx, IIvi, IIvm, IIsi, IIci, LC and LCII… all of them different in many places. These differences have to be handled…
Down at 30ft we have to admit that there are differences between a 68030 and his younger brother 68040, mainly concerning caches, FPU and the MMU.
Finally hitting the ground, it’s becoming clear that it is everything but trivial to halt a running processor, save its complete context and start another (slightly different) processor with that. And back again…

Some given things before we start:

We will concentrate on the IIx “branch” as this machine is closest to the SE/30 like not-32bit-clean, memory-map, the GLUE chip, two real VIAs with the same register layout etc.
I learned from the code that the C040 is memory-mapped at 0x53000000 in some of the supported models, especially the IIx and IIci. This means 32bit addressing is a must (-> need “mode32” INIT or clean ROM)
I tried to comment as much as possible/understood inline (i.e in the code) – a good bit of 68k machine language knowledge is still required 😉
If something needs more explanation, I’ll try to provide this before the code quote or afterwards.

So this is the main routine (at 0x21FC):

main     MOVEM.L A4-A6,-(A7)
         MOVE.L  D0,D7
         MOVE.L  #$31E,D0    ; need 798 bytes
         _NewPtr ,CL_SY      ; allocate requested amount of memory (D0) in system
                             ; heap (returned in A0) and initialize to zeroes
         TST     D0          ; success?
         BNE.S   lae_6       ; nope, exit.
         LEA     data2,A1    ; else
         MOVE.L  A0,(A1) ; Init A5 world and save into data2
         MOVEA.L A0,A5
         MOVE    #$A89F,D0   ; UnimplTrap
         _GetTrapAddress     ; (D0/trapNum:Word):A0\ProcPtr 
         MOVE.L  A0,$29C(A5) ; save the trap addr into 2 places 
         MOVE.L  A0,$2A0(A5) ; in the A5 world
         BSR     sysDetect   ; Jump to Machine detection routine 
         BNE.S   lae_6       ; success?
         MOVE.L  D7,D0
         JSR     4(A6)       ; We jump to the subroutine set in the detection routine
                             ; for the second time, this time offset 4...
			     ; i.e. we skip the 1st 'BRA' there
         BNE.S   lae_6       ; success?
         BSR     instFPSP    ; Install Motos FPSP
         BNE.S   lae_6       ; success?
         JSR     8(A6)       ; That's the 3rd call in the handler call cascade (needs hack for MacsBug!)
         BNE.S   lae_6       ; success?
         BSR     proc32      ; works (get some RSC strings)
         BSR     proc43      ; install traps
         BNE.S   lae_6       ; success?
         JSR     12(A6)      ; That's the 4th call in the handler call cascade
         BNE.S   lae_6       ; success?
         BSR     proc41      ; works (atalk?)
         BSR     proc42      ; VIA stuff and such - BOOM
         BSR     proc29
         MOVEM.L (A7)+,A4-A6
         MOVEQ   #0,D0

As you can see, there are 10 calls to subroutines- currently it crashes inside the 8th subroutine, currently called proc42… But let’s check these subroutines one by one.

sysDetect

This is the subroutine I had to “patch” to initially make the driver work with an SE/30. It starts at 0x2022 and does these things:

Check if the ‘Gestalt‘ trap is available at all (very good style!) else throw an error
If it is, read the machines Gestalt code into D0, throw an error if zero
Decide which ‘handler’ to choose given the Gestalt code.

Based on their Gestalt codes there are four groups of Macs defined in the following lines (0x204C – 0x20AC):

Mac II/IIx/IIcx — “dirty Macs”, not 32bit clean, no PDS
- “Expansion I/O Space” from 0x51000000 to 0x5FFFFFFF
- the C040 installs with an adapter right into the CPU socket in the II/IIx/IIcx
- SE/30 is also “dirty”, need mode32 or IIsi ROM in slot
- these machines also use the GLUE chip to emulate the VIA2 like the SE/30
Mac IIvx, IIvi, IIvm — special kind of PDS slot
- there’s no mentioning of support on the MicroMac page
Mac IIsi, IIci
- Kind of interesting because the si has the same PDS slot like the SE/30
- Uses the RBV (Ram Based Video) controller which emulates the VIA2
- Therefore totally different memory layout (VRAM at 0x00000000 mapped by the MMU etc.)
Mac LC, LCII, Color Classic
- These share the same LC-PDS slot

If your Mac is one of those (or patched at 0x2058), you’ll branch into sys_check: (0x20BA) which will make sure you run at least System 6.0.5, have virtual memory switched off and jumps into the selected handler code (address saved in A6) at 0x20EA for the first time.

Here’s the code of what’s discussed above:

2022: sysDetect: MOVE.L  #$A0AD,D0  ; Gestalt
2028:         _GetTrapAddress newOS ; (D0/trapNum:Word):A0\ProcPtr 
202A:         MOVE.L  A0,D2
202C:         MOVE.L  #$A89F,D0     ; UnimplTrap
2032:         _GetTrapAddress newTool; (D0/trapNum:Word):A0\ProcPtr 
2034:         CMP.L   A0,D2
2036:         BEQ     OS_bad
203A:         MOVE.L  #'mach',D0
2040:         _Gestalt ; (A0/selector:OSType):D0\OSErr 
2042:         BNE     bad_conf      ; If we can't read it, fire general Error Msg
2046:         MOVE.L  A0,D0
2048:         MOVE.L  D0,2(A5)
 
; Check for several Mac models which are grouped into 3, each having its own handler routine. 
; 1) Mac II/IIx/IIcx 
; 2) IIvx, IIvi, IIvm 
; 3) IIsi, IIci     
; 4) LC, LCII, Color Classic
 
204C:         LEA     MacII_handler,A6   ; -- The dirty gang
2050:         CMPI.L  #6,D0        ; MacII 
2056:         BEQ.S   sys_check
2058:         CMPI.L  #7,D0        ; MacIIx - we replace this by the SE/30 #9
205E:         BEQ.S   sys_check
2060:         CMPI.L  #8,D0        ; IIcx
2066:         BEQ.S   sys_check
 
2068:         LEA     V_handler,A6   ; -- The "V" Macs.
206C:         CMPI.L  #48,D0       ; IIvx
2072:         BEQ.S   sys_check
2074:         CMPI.L  #44,D0       ; IIvi
207A:         BEQ.S   sys_check
207C:         CMPI.L  #45,D0       ; IIvm
2082:         BEQ.S   sys_check
 
2084:         LEA     IIci_handler,A6   ; -- IIci and IIsi
                                        ; BOTH share the same "Expansion I/O Space" (0x5300 0000)
2088:         CMPI.L  #11,D0       ; IIci
208E:         BEQ.S   sys_check
2090:         CMPI.L  #18,D0       ; IIsi 
2096:         BEQ.S   sys_check
 
2098:         LEA     LC_handler,A6    ; -- The LC-PDS family
209C:         CMPI.L  #19,D0       ; LC
20A2:         BEQ.S   sys_check
20A4:         CMPI.L  #37,D0       ; LCII
20AA:         BEQ.S   sys_check
20AC:         CMPI.L  #49,D0       ; Color Classic
20B2:         BEQ.S   sys_check
 
; Any other Model/Gestalt will bring up an error alert-box 
 
20B4:         MOVE    #$1B5B,D0    ; "Carrera040 does not support this Macintosh model."
20B8:         BRA.S   RET_err      ; -&gt; "TST     D0 &amp; RTS"
 
; We found a supported model, so keep on going checking for the OS version...
 
20BA: sys_check: MOVE.L  #'sysv',D0    ; Check OS version
20C0:         _Gestalt              ; (A0/selector:OSType):D0\OSErr 
20C2:         BNE.S   bad_conf      ; If we can't read it, fire general Error Msg
20C4:         MOVE.L  A0,D0
20C6:         CMPI    #$605,D0     ; System 6.0.5
20CA:         BGE.S   OS_ok        ; or greater
20CC: OS_bad: MOVE    #$1B5C,D0    ; "Carrera040 does not work with this version of the operating system."
20D0:         BRA.S   RET_err      ;
20D2: OS_ok:  MOVE.L  #'vm  ',D0   ; Check for enabled Virtual Memory
20D8:         _Gestalt             ; (A0/selector:OSType):D0\OSErr 
20DA:         BNE.S   bad_conf     ; If we can't read it, fire general Error Msg
20DC:         MOVE.L  A0,D0
20DE:         BTST    #0,D0
20E2:         BEQ.S   VM_ok
20E4:         MOVE    #$1B5D,D0    ; "Carrera040 does not work with Virtual Memory turned on. 
                                   ; Please turn off Virtual Memory in the Memory control panel and restart your Mac."
20E8:         BRA.S   RET_err
20EA: VM_ok:  JSR     (A6)         ; This is the actual HANDLER CALL, been set in $204C-$2098
20EC:         BNE.S   RET_err
20EE:         MOVE.B  34(A5),D0    ; 34(A5) seems to contanin the Jumper settings at the lowest 3 bits and only three of them are valid:
20F2:         CMPI.B  #7,D0        ; 7 -&gt; 111
20F6:         BEQ.S   RET_ok
20F8:         CMPI.B  #6,D0        ; 6 -&gt; 110
20FC:         BEQ.S   RET_ok
20FE:         CMPI.B  #5,D0        ; and 5 -&gt; 101 
2102:         BEQ.S   RET_ok
2104:         MOVE    #$1B5E,D0    ; "Carrera040 does not recognize the jumper settings on the Speedster card. 
                                   ; Please check the settings against the manual.
2108:         BRA.S   RET_err
210A: RET_ok: MOVEQ   #0,D0        ; clear D0 (no errors)
210C:RET_err: TST     D0           ; Set the Z-Flag (D0 contains Err-Code) and
210E:         RTS                  ; return from Subroutine
2110:bad_conf:MOVE    #$1B5A,D0    ; "Carrera040 does not support your system configuration."
2114:         BRA     RET_err

Yes, there’s also stuff after the call to the handler, but let’s check that handler first.
As said in the beginning, I chose to take the “IIx route”. The MacII_handler code is actually just another vector jump-table which will later be used with offsets:

414E:  MacII_handler:  BRA   MacII_1st ; From II, IIx &amp; IIcx
4152:                  BRA     MacII_2nd
4156:                  BRA     MacII_3rd
415A:                  BRA     MacII_4th

Let’s have a look into the first call MacII_1st:

3D14:  MacII_1st  MOVEM.L D1-D3/A0-A2/A6,-(A7) ; 1st call from MacII handler
3D18:           PUSH.L  8
3D1C:           LEA     data105,A0
3D20:           MOVE.L  A0,8       ; Is that the Bus Error Handler at 0x00000008?
3D24:           MOVE    #$1B5F,D3  ; 7007
3D28:           MOVEQ   #1,D0
3D2A:           _SwapMMUMode  
3D2C:           PUSH.B  D0
3D2E:           MOVEA.L A7,A6
3D30:           BSR     read_5300k2
3D34:           MOVEQ   #0,D3
3D36:  data105  MOVEA.L A6,A7
3D38:           POP.B   D0
3D3A:           _SwapMMUMode  
3D3C:           POP.L   8
3D40:           MOVEQ   #0,D0   ; ?
3D42:           MOVE    D3,D0   ; overwriting?
3D44:           BNE.S   lae_153
3D46:           MOVEM.L D1-D2/A0-A2,-(A7)
3D4A:           LEA     53_cmd_0,A0
3D4E:           MOVE.L  A0,6(A5)
3D52:           LEA     53_cmd_1x,A0
3D56:           MOVE.L  A0,10(A5)
3D5A:           LEA     read_5300k2,A0
3D5E:           MOVE.L  A0,14(A5)
3D62:           LEA     53_cmd_5.3,A0
3D66:           MOVE.L  A0,18(A5)
3D6A:           LEA     53_cmd_5.1,A0
3D6E:           MOVE.L  A0,26(A5)
3D72:           LEA     53_cmd_5.3.5.1,A0
3D76:           MOVE.L  A0,22(A5)
3D7A:           MOVEM.L (A7)+,D1-D2/A0-A2
3D7E:           BSR     read_5300k2
3D82:           ANDI.B  #7,D0
3D86:           MOVE.B  D0,34(A5)
3D8A:           MOVEQ   #0,D0
3D8C:  lae_153: MOVEM.L (A7)+,D1-D3/A0-A2/A6
3D90:           TST     D0
3D92:           RTS

As you can see, even in such simple and short subroutines are some things I just don’t get? For example why is the effective address of data105 written to 0x8? Is that replacing the Error Handler in the VBR?
Anyhow, I think I got the overall meaning of the rest of it. What happens is this:

After switching into 32bit mode (_SwapMMUMode) it reads a longword from 0x53000000. As initially mentioned, the C040 is mapped to this address. There are 2 identical functions to read from there, that’s why this one here called read_5300k2.
It looks like reading is sufficient because the result (returned in D7) is immediately overwritten by a pop. Also that BNE after two moves is beyond me (0x3D40)… OTOH the rest of the code is pretty clear: It’s ‘populating’ the A5-world with subroutines I’d call 53-commands. These commands write a specific byte sequence to 0x53000000, obviously communicating with the C040. For better understanding I’ve named them e.g. 53_cmd_5.3.5.1 meaning writing 5, then 3, then 5 and finally 1 to this address.
At the end, 0x5300k is read again, this time the result is masked to the last bit and written to 34(A5) – this represents the C040 jumper-settings by the way. Return from Subroutine…

Back in sys_check: this jumper-setting will be checked immediately for three valid settings: 111, 110 or 101 representing the supported CPU types (68040,68LC040,68EC040). If the setting is ok we’re done with sysDetect:and return to main:.

2nd handler

Located at 0x3D94 this is kind of a ’50/50 subroutine’. One half is totally obvious (check RAM, ROM and addressing mode) and the other half is all greek to me… e.g. what is all that PUSHing about? There’s not a single POP inside this routine (or subroutines call from within). Here’s a wild guess of mine:
It looks like 1 to 4 ‘RAM range triplets’ being pushed onto the stack and after that gestaltPhysicalRAMSize (#’ram ‘) is called, for example:

    3D9A:          CLR.L   -(A7) ; faster 'PUSH.L #00000000'
    3D9C:          CLR.L   -(A7) ; PUSH.L #00000000
    3D9E:          CLR.L   -(A7) ; PUSH.L #00000000
 
    3DA0:          PUSH.L  #$100000
    3DA6:          PUSH.L  #$50F00041
    3DAC:          PUSH.L  #$50F00000
 
    3DB2:          PUSH.L  #$2000
    3DB8:          PUSH.L  #$53000041
    3DBE:          PUSH.L  #$53000000
 
    3DC4:          PUSH.L  #$2000
    3DCA:          PUSH.L  #1
    3DD0:          PUSH.L  #$53002000
 
    3DD6:         MOVE.L  #'ram ',D0  ; Returns the number of bytes of the physical RAM 
    3DDC:         _Gestalt ; (A0/selector:OSType):D0\OSErr

But the gestaltPhysicalRAMSize call does not take parameters and simply returns the amount of available RAM.

The good thing is, this sub-routine works flawlessly on the SE/30 and we can move on…

instFPSP

instFPSP is the next call in line. I’m not going to discuss this code in detail because it actually doesn’t do much. Still there are many inline comments in this routine if you like to know more. Here’s the background:

The FPU in the 68040 was made incapable of IEEE transcendental functions, which had been supported by both the 68881 and 68882 and were used by the popular fractal generating software of the time and little else. The Motorola floating point support package (FPSP) emulated these instructions in software under interrupt. As this was an exception handler, heavy use of the transcendental functions caused severe performance penalties.

TLDR; Check for FPU(type) and load the FPSP code from the resource-fork into RAM. Done. Return to main:.

Phew, that’s it for now. In the next post/chapter we’ll touch the 3rd handler, which was really hard to decipher but interesting stuff to learn, too.

68000, Apple 68k, Carrera040, Hardware

Carrera 040 in an SE/30

January 28, 2019 Axel 5 Comments

As promised in my blog entry nearly one year ago, here’s the (monster) post about this project.

Background

Boy, what a ride! This is definitely my most complex (and ~~still ongoing~~ finished) software reverse engineering stunt ever!!
When starting this venture I was a blue-eyed Mac user and just-for-fun programmer and never imagined to learn this much about those machines I loved since 1985… by the way of a very nice guy I was finally able to get an SE/30. Immediately I thought of accelerating the cutie.
This first post will give you an insight about the workflow, hardware and software used. Following posts will then guide you deep into the code…

The MicroMac Carrera040

For many years I had a Carrera040 (or C040 for short) – a Motorola 68040 accelerator for Apple Macintoshes – in my locker which I bought in wise foresight without even owning a Mac to plug it in. The C040 I got was meant for usage in a Macintosh IIci, plugged into its L2 cache-slot. That said, using special adapters, the C040 could also be used in other 68030 Macs like the IIx, IIcx, IIsi, IIvi/vx and the LC/LC II.

Is the Carrera a Speedster?

What’s this question about? Well, you might also have come about notions of an accelerator called the ‘Mobius Speedster‘ which is pretty similar to the C040.
Well, it is and my wild assumption is that at one point MicroMac bought the design from Mobius. There’s even a leftover in the C040’s ReadMe:
“Applications that do not work with Quadra or Centris Macs are not likely to work on ‘040 accelerators, including the Carrera040. Generally, these incompatibilities are limited to the ‘040’s copy-back cache, or FAST mode on the Speedster.“

So when I had my glorious SE/30 sitting on my desk it immediately came to my mind to make this card running in it.
You have to know, that the SE/30 is a somewhat shrinked-down version of a Mac IIx which again is pretty close to the IIci – and there was an adapter in existence to use another popular IIci accelerator in an SE/30 (Daystar Turbo 040). But it’s very rare and there’s next to no chance to find one. Anyhow, it’s doable, so I was hooked.
I stumbled across a cry for help in the 68kmla forum, a user owning such an adapter and a C040 tried to get it running in his SE/30… to no avail. So while still not having the proper adapter (yet) I thought “why not start looking into the driver while waiting for the hardware?”.
So the journey started…

MacNosy – a users nightmare, a hackers heaven.

My natural reflex is to reach deep into my tool-bag, get out my favorite disassembler/hex-viewer and start digging through its output. But for System 7 my bag was empty. Is there any disassembler at all?
While the good thing is, that most software packages which cost plenty of $$$ back then are abandonware today, the bad thing is that many are undocumented and unsupported. After some research it became clear that MacNosy was and still is the best m68k MacOS disassembler around.

Boy, this disassembler is powerful! But it seems to be written by Steve Jasic for, well, Steve Jasic. I know that kind of tools – I’ve written some of those… and never showed it to anybody because it was, erm, special. Prepare yourself for “everything will be different than you’ll expect it”. Steve gave a sh!# about UI or keyboard conventions. Cope with it.

Luckily there’s a very good review and some sort of documentation can be found here.

Same but different – which is where?

Does ‘A5-world’ ring a bell to you? No? Don’t worry, it was the same for me, even I am using Macs for a long time.
Even it’s an 68k system, there are so many things done different than e.g. in Amiga OS or Ataris TOS – so you have to learn a lot.

Because it would absolutely bloat this post, I will link to external pages explaining the used term. So watch for the first mentioning, it’ll be a URL…

The provided Carrera040 “drivers” consist of an INIT/Extension (“Startup Carrera”) and a Control Panel (“Carrera 040 1.8”).
In the provided readme file there’s the line “With version 1.8 we have included an extension which ensures the Carrera040 code to load very early in the boot process.”

And indeed, the INIT code does not do much more than loading a specific resource from the control panels resource-fork.
So I concentrated on the control panel (CP for short). Using ResEdit, you’ll find the main detection and control-code in its resource fork called “SPDR’ (SPeedster DRiver, got it?).
While working through the code, commenting whatever I immediately understood (which wasn’t much in the beginning), I stumbled over several things you should also have an idea about before reading the disassembly in the coming chapters – so here’s a growing reading list:

The 68040 User Manual is always a good thing to have handy…
A5-World
Low Memory Globals
Vector base register (VBR, it’s a 68k thing, need to find a better doc)
to be continued...

Macsbug reloaded

During all that code-gazing, head-scratching and learning-new-things-every-day great luck struck and I virtually-met ‘Bolle‘. A guy who created a clone of the mystical PDS-to-IIci-slot-adapter. Woohoo!

Even those 120pin DIN connectors are incredibly hard to find.

So after spending some Euros I was finally able to jump into the ‘the real thing’ and try my patches in-vivo, or watch the code being executed. Thanks again, Bolle!

The drill

The weapon-of-choice for watching code run is definitely Macsbug, the official debugger from Motorola, heavily modified by Apple through all the years until MacOS 9.2.
Back in the days my contact with Macsbug was very brief. When a program ‘bombed’, I’ve entered “g” (for Go) and hoped the system will somewhat heal and keeps running…

Ok, now I had to be somewhat more serious – and my skills had improved over the last 20 years, so my routine turned into single-stepping and tracing through the code, skip certain instructions which might kill the code, watching all the registers and most important and watch how the Carrera “driver” behaves in an SE/30 vs. IIci.
I even created some macros (which have to saved into Macsbug own resource-fork!) and started an endless try-and-crash drill.

The working drill is tedious: You step through the instructions, while following your steps in the disassembled source, to the point where it crashes. Remember/note the point (address) where it crashed and try again.
This means you have to manually trace closer to “the edge” but try not to fall off the cliff. And when you did – and I did many times – rinse & repeat.
Sometimes you can ‘skip’ complete function calls containing hundreds of instructions (called ‘Trace’), sometimes you have to sit-through (i.e. single-step) a very, very, very long loop just to be sure it works 100%.

The next post/chapters will finally dive into the control panels code.
While it’s all about this specific ‘driver’ I’m sure it’ll help everybody who starts the adventure of understanding pretty low-level 68k Macintosh code.

That said, in Dec. 2019, continuously working with Bolle, we came to the conclusion it has to be a hardware problem and Bolle was able to prove this and most importantly found a way to fix it.
There will be a 4th and final post concluding all of our findings.

News

Nothing happening in 2018?

March 30, 2018 Axel 3 Comments

Hey Axel, what’s up? No new post for months… are you still alive?

Yes, shame on me… it’s been really silent the last months. But that does not necessarily mean I’m not working on something.
Actually I was pulled into a 68k Apple Macintosh vortex which really got be big time. As you probably know I have a soft spot for all things 68000 and was a Mac user from day one (well, ~1985). User means that even I did not own one until 1994 (i.e. paying for it with own money) I borrowed, worked, sat in front of many models…

Long story short: Got my hands onto a SE/30, pulled an 040 accelerator from the basement, learned that it will never work in there… got lit 😉 Bought a IIci for checking and from there it went on uncontrolled… So the challenge is on to get this IIci accelerator working in sweet lil’ SE/30. Read the thread here (68kmla.org forum).

While I was a Mac user, I never went real deep into programming them. So here I am, digging through 6098 lines of disassembly and learn ‘new’ things like A5-world, Mac boot processes, the dirtiest tricks in MacsBug, INITs, how control panels work and last but not least patching Toolbox calls from the machine language side of things. It’s a steep learning curve, a total waste of time & energy. And I love it!

P.S.: ~~I’ll write down the whole stunt into a post one day~~. But for now it’s totally work-in-progress… It’s done. Read the whole thing here. (There are 4 chapters)

miroHIGHRISC & miroTIGER

Digging deeper into the highRISC

June 26, 2017 Axel Leave a comment

After 7 years mainly doing research on Transputers and the i860, I had the feeling it’s time to do some more digging into the highRISC card.
If you have read my initial post about the miroHIGHRISC (and the Tiger) you remember the undocumented 20pin socket on the card (pictured in the upper right corner):

Let’s have another look at the “UART port” again:

The pinout (the connector is rotated 90° counter clock-wise):

GND  oo  /WR0
D0   oo  INT2
D1   oo  /RD
D2   oo  /IOSEL
D3   oo  RESET (most likely)
D4   oo  A2
D5   oo  A3
D6   oo  A4
D7   oo  A5
VCC  oo  A23

Reading a bit of the BIOS’ disassembly, I stumbled across routines to talk to an UART. A very common (D)UART of those days was the SCN2681. If you take a look at this chips specs, they perfectly fit to the signals provided at the HIGRISCs UART-port!

Here’s its pinout with the corresponding pins marked:

A2-A5 are used for A0-A3 on the 2681 and the only pin not directly represented is A23 which might be used to decode. Also, it nicely reveals that CPU INT2 is used for the UART.

The LR33000 datasheet tells me that there’s an 4MB IO-area starting at 0x1E000000 reaching up to 0x1EFFFFFF- most likely the 2681 will live there… and the corresponding signal called /IOSEL is available on the UART-port (and will perfectly help as chip-select decoder). Tadaa!

So after the UART we need to get the RX/TX signals to a higher level, i.e. the +/-15V of RS232 – this is the call for our old friend MAX232.

[current bread-board experiments sadly didn’t yield into ‘instant success(tm)’… I’m missing out something – need more time to investigate]

Bootcode / BIOS

The LR330xx CPU also has an /EPSEL EPROM select signal, indicating it’s accessing an EPROM expected to start at 0x1F000000 and ends at 0x1FFFFFFF (4MB right below the IO-area).
Using this knowledge and knowing that the MIPS standard boot-vector is at 0x1FC00000, it’s easy to feed the ROM-dump I did some years ago into the disassembler with the correct start address to do his job.

We need to get an understanding of this bootcode first, so that we can get an idea of “what is where” (e.g. ISA bus, UART etc) and later upload our own code and use those addresses.
Just to stick my head a bit into the clouds, the aim is to first port a then common MIPS monitor-program called ‘PMON’ and use that to run some sort of μLinux. But that’s probably another handful of years ahead…
PMON was a good source of information, because it’s originally written by LSI, supporting all the LSI eval-boards. Lo and behold, some of them had a 2681 UART, too… located at 0xBE000000, which is extensively used in my BIOS disassembly 😉
I have a certain feeling that miro borrowed some design ideas from the LSI Pocket Rocket evaluation board (don’t Google it, it’s a mythic being – if you have documentation, mail me!).

So this is the 2681 memory-map then:

#define BASE_2681 0xbe000000
#define SRA_2681 ((1*4)+BASE_2681) // 0xbe000004 status register 
#define THRA_2681 ((3*4)+BASE_2681) // 0xbe00000C Rx/Tx holding register 
#define ACR_2681 ((4*4)+BASE_2681) // 0xbe000010 Aux contrl. register 
#define ISR_2681 ((5*4)+BASE_2681) // 0xbe000014 interrupt state register 
#define CTU_2681 ((6*4)+BASE_2681) // 0xbe000018 Counter timer upper 
#define CTL_2681 ((7*4)+BASE_2681) // 0xbe00001C counter timer lower 
#define START_2681 ((14*4)+BASE_2681) // 0xbe000038 start timer 
#define STOP_2681 ((15*4)+BASE_2681) // 0xbe00003C stoptimer

Using those addresses we should easily identify the comms routines.

Something happens at 0xBE800000 which seems not UART related. So that’s probably the reason why A23 is available on the connector. That way we can ignore access to that address by OR’ing it with /IOSEL to create a /CS.

The DOS side of things

The tool to load a MIPS executable into the HIGHRISC is called DL.EXE. Loading the test-program prints this to the console:

miroHIGHRISC download program. V 1.00
(c) miro Computer Products AG , Germany

CONFIG: I/O-register-address: 0x368 
CONFIG: DRAM - base-address : 0xD000 
CONFIG: DRAM - size : 8 MB
CONFIG: TIGER - RAM - size : 8 MB

Resetcount = 87340

Loading test.zor
text : start=0x80030000 size=0x52c0
data : start=0x800352c0 size=0x520
bss : start=0x800357e0 size=0x150
entry : 0x800301a0
TIGER comm.address : 0x3ffd00
max_used_address : 0x35930 
real_DRAM : 0x800000 
Heapsize : 0x7CA6D0

test.zor sucessfully downloaded.

This gives us valuable information. The DOS-side uses the IO port 0x368 and has a memory window of 16K from 0xD000 to D3FF.
MIPS programs are loaded to 0x80030000 and the 16K seems to be mapped to 0x003FFD00, just 128K below the 4MB boundary of the LR33k address space.

As usual – this is heavily work-in-progress. So this post will be edited while making any new progress. TBC…

Myriad Dash860

ThruView – Software for the DASH!860

April 30, 2016 Axel Leave a comment

With the DASH!860/ShadeMASTER combo came the software package “ThruView” (this is the sales brochure).

Sadly -and usual in those days- it’s protected by one of those nasty dongle keys plugged into the PCs parallel port. If you were into computers in the 80s/90s you surely remember them, most likely full of hate:
They were flaky sometimes, didn’t reliably work with the printer looped through them and there was a 50% chance they refrain from working when upgrading your PC.

Running ThruView

Anyhow, starting ThruView, it greets you with a friendly

Ok. Thanks for that ma’am. Here we are. 25 years later and the last dongle for it probably went the way of the Dodo.

But you wouldn’t be at “the home of real mens hardware” when you wouldn’t do, what a man has to do in this case…
Out went the Disassemblers. I recommend IDA Pro for comfortable work on your recent Windoze workstation and SoftICE to work on the bare metal itself.
Ahh, finally, cracking time again. Missed that during the last years…
Half a day later, Thru-View greets me with this:

Yay. One hurdle taken… here’s the next one: You have to open an image file. In the ubiquitous PICtor format. doh! (So I thought…)

Ok, somewhere I was able to get some sample .PIC files… select that and open the damn thing, click OK and…

…followed by…

As far as I can see, XNIX is loaded correctly. For now I have the suspicion that there are communication issues with the DASH!860 due to my PC workhorse is a ‘modern’ Pentium 1 MMX… and we all know how lazy programmers were back in the days when it came to delay-loops etc..

So next up: Tow the trusty 80486 system out of the basement and check it with that one.

Two days later…

Ok, on my 486-PC I was able to successfully load the XNIX kernel in real-mode (‘x.exe /r’) and using the debug-flag I saw some errors about config-files not found in C:\TVPLUS… wtf? All paths are set in the .cfg file but it seems some are just ignored and hard-coded.
Ok, so I created that folder and copied everything over and called ‘rstub ___tv1’ again… and this time it worked!
So let’s open that PIC file… it reads and reads and:

Ohhhh-kayyyy. So my assumption that ‘.pic’ meant a PICtor file was wrong. Some intense Google’ing later I’ve learned that the file format is the “Biorad PIC“. I could have guessed that before. Those times were the times of proprietary formats. How to get such a file to play around with it?
Luckily others had the same problem. ImageJ seems to be the main tool for converting scientific visual data, and it has a native support for reading Biorad PICs. But how to create one?
Well, thanks to this tool, you can create it when creating .raw files before using ImageJ. A bit cumbersome – but hey at least something.

…another day later…

Alrighty – that brought me a bit further: As far I can see, ThruView is working! My self-built file was successfully loaded and I was able to play with all modules. Here’s a slideshow, showing the available modules (loader, builder, animator):

While my file loads fine, I wasn’t able to get a picture on the ShadeMASTER VGA output yet. I can hear the relay switching the outputs and my LCD display catches the signal correctly (dimensions as well as refreshes) but it’s just black.

Here’s another “finally!”: Loading my Z-Stack PIC file takes 4.7MB of the DASH’s RAM… changing back from the builder into the loader module I got this error message:

This is a ‘special moment’ for me, as this is the first time that the available RAM of one of my i860 cards was actually filled with meaningful data.
But rest assure: As soon the couple works as supposed the hardware tweaking will begin 😉

Intel 80860, Number Smasher 860

Dissection time!

November 5, 2010 Axel Leave a comment

Ok, as I’m probably the last person on this planet fiddling around with the NumberSmasher i860, it was either “help yourself” or bust.

Given the fact that there’s an INMOS C012 on the card I tried my luck with the standard address of 0x150 and checked it with DOS’ crappy old ‘debug’. To my surprise I was able to talk to the C012, so it was very worth to investigate further.
So out went the good ol’ EPROM programmer and the EPROM of the card was dumped into a file.
I have 3 of those boards, two having a label on the EPROM saying “v1.1” and “BOOT_B2”. Both are identical… if you happen to own a NumberSmasher with a different label, get the dumpfile here for comparison.

That was easy, now the harder part: Disassembly. I had to revive my i860 machine language skills again, so it took me 2 days (on and off) to get a full understanding, what’s happening in there.
For those i860 assembly geeks among you, here‘s the fully commented code.

This “BIOS” is actually pretty simple. It’s just what I’d call a “PeekPokeStarter”. The main loop is waiting for a ‘command’ coming in by the way of the C012. This command can be either “0” or “1” as mentioned in the article on the previous page.

“0” means POKE (ie. write) and expects 4 bytes for the address and 4 bytes of data to be written (Both LSB first, Intel-style). So the full command reads: 0 00 00 00 20 78 56 34 12 or “write to address 0x20000000 the value 0x12345678”

So in consequence “1” means PEEK (ie. read) which just needs 4 bytes for the address to be read. The command would then be 1 00 00 00 20 or “read from 0x20000000”. The “BIOS” will then put 4 bytes to the C012 port at 0x150, which requires 4 reads from the PC side getting “78”, “56”, “34” and “12”.

Pretty simple, huh? But how can I start a program after it’s been painstakingly poked into the NumberSmashers RAM? Here’s the trick:

Poking to address 0x00000000 means start from the address given as data. E.g. “write to address 0x00000000 the value 0x20000000” is actually “start from 0x20000000”, or as command-chain: 0 00 00 00 00 00 00 00 20 – so beware of poking to 0!
Also, starting a program seems to disable the EPROM, so communication to the C012 is cut off if the running program isn’t handling this itself.

That’s about it. Nothing more in the “BIOS”… that’s why only 495 bytes(!) of the 8K EPROM are actually used. This simplicity leads to a very simple memory-map:

Base = 0xF8000000
C012-InData = [base] + 0x07
C012-OutData = [base] + 0x0F
C012-InStatus = [base] + 0x17
C012-OutStatus = [base] + 0x1F

Next task: Get a program running on the NumberSmasher.

[11/11/10] Great News…again! It was easier than expected… the first program running on the NumberSmasher-860! So read on in the next post…

GeekDot

Tag Archives: reverse engineer

The Number Cruncher – FPU for the Apple II

Let’s re-animate the Number Cruncher!

Hardware

Revving it up

Software

Carrera in an SE/30 – the code part 3

I did it my way…

“There and back again…”

This is the end…

Carrera in an SE/30 – the code part 2

3rd handler

The Vector Base Register

EmEmYou!

Carrera in an SE/30 – the code part 1

Approaching… difficulties.

sysDetect

2nd handler

instFPSP

Carrera 040 in an SE/30

Background

The MicroMac Carrera040

MacNosy – a users nightmare, a hackers heaven.

Same but different – which is where?

Macsbug reloaded

The drill

Nothing happening in 2018?

Hey Axel, what’s up? No new post for months… are you still alive?

Digging deeper into the highRISC

Bootcode / BIOS

The DOS side of things

ThruView – Software for the DASH!860

Running ThruView

Dissection time!

home of real men's hardware