Category Archives: 68000

All 68k systems are great!

Carrera 040 in an SE/30

January 28, 2019 Axel 5 Comments

As promised in my blog entry nearly one year ago, here’s the (monster) post about this project.

Background

Boy, what a ride! This is definitely my most complex (and ~~still ongoing~~ finished) software reverse engineering stunt ever!!
When starting this venture I was a blue-eyed Mac user and just-for-fun programmer and never imagined to learn this much about those machines I loved since 1985… by the way of a very nice guy I was finally able to get an SE/30. Immediately I thought of accelerating the cutie.
This first post will give you an insight about the workflow, hardware and software used. Following posts will then guide you deep into the code…

The MicroMac Carrera040

For many years I had a Carrera040 (or C040 for short) – a Motorola 68040 accelerator for Apple Macintoshes – in my locker which I bought in wise foresight without even owning a Mac to plug it in. The C040 I got was meant for usage in a Macintosh IIci, plugged into its L2 cache-slot. That said, using special adapters, the C040 could also be used in other 68030 Macs like the IIx, IIcx, IIsi, IIvi/vx and the LC/LC II.

Is the Carrera a Speedster?

What’s this question about? Well, you might also have come about notions of an accelerator called the ‘Mobius Speedster‘ which is pretty similar to the C040.
Well, it is and my wild assumption is that at one point MicroMac bought the design from Mobius. There’s even a leftover in the C040’s ReadMe:
“Applications that do not work with Quadra or Centris Macs are not likely to work on ‘040 accelerators, including the Carrera040. Generally, these incompatibilities are limited to the ‘040’s copy-back cache, or FAST mode on the Speedster.“

So when I had my glorious SE/30 sitting on my desk it immediately came to my mind to make this card running in it.
You have to know, that the SE/30 is a somewhat shrinked-down version of a Mac IIx which again is pretty close to the IIci – and there was an adapter in existence to use another popular IIci accelerator in an SE/30 (Daystar Turbo 040). But it’s very rare and there’s next to no chance to find one. Anyhow, it’s doable, so I was hooked.
I stumbled across a cry for help in the 68kmla forum, a user owning such an adapter and a C040 tried to get it running in his SE/30… to no avail. So while still not having the proper adapter (yet) I thought “why not start looking into the driver while waiting for the hardware?”.
So the journey started…

MacNosy – a users nightmare, a hackers heaven.

My natural reflex is to reach deep into my tool-bag, get out my favorite disassembler/hex-viewer and start digging through its output. But for System 7 my bag was empty. Is there any disassembler at all?
While the good thing is, that most software packages which cost plenty of $$$ back then are abandonware today, the bad thing is that many are undocumented and unsupported. After some research it became clear that MacNosy was and still is the best m68k MacOS disassembler around.

Boy, this disassembler is powerful! But it seems to be written by Steve Jasic for, well, Steve Jasic. I know that kind of tools – I’ve written some of those… and never showed it to anybody because it was, erm, special. Prepare yourself for “everything will be different than you’ll expect it”. Steve gave a sh!# about UI or keyboard conventions. Cope with it.

Luckily there’s a very good review and some sort of documentation can be found here.

Same but different – which is where?

Does ‘A5-world’ ring a bell to you? No? Don’t worry, it was the same for me, even I am using Macs for a long time.
Even it’s an 68k system, there are so many things done different than e.g. in Amiga OS or Ataris TOS – so you have to learn a lot.

Because it would absolutely bloat this post, I will link to external pages explaining the used term. So watch for the first mentioning, it’ll be a URL…

The provided Carrera040 “drivers” consist of an INIT/Extension (“Startup Carrera”) and a Control Panel (“Carrera 040 1.8”).
In the provided readme file there’s the line “With version 1.8 we have included an extension which ensures the Carrera040 code to load very early in the boot process.”

And indeed, the INIT code does not do much more than loading a specific resource from the control panels resource-fork.
So I concentrated on the control panel (CP for short). Using ResEdit, you’ll find the main detection and control-code in its resource fork called “SPDR’ (SPeedster DRiver, got it?).
While working through the code, commenting whatever I immediately understood (which wasn’t much in the beginning), I stumbled over several things you should also have an idea about before reading the disassembly in the coming chapters – so here’s a growing reading list:

The 68040 User Manual is always a good thing to have handy…
A5-World
Low Memory Globals
Vector base register (VBR, it’s a 68k thing, need to find a better doc)
to be continued...

Macsbug reloaded

During all that code-gazing, head-scratching and learning-new-things-every-day great luck struck and I virtually-met ‘Bolle‘. A guy who created a clone of the mystical PDS-to-IIci-slot-adapter. Woohoo!

Even those 120pin DIN connectors are incredibly hard to find.

So after spending some Euros I was finally able to jump into the ‘the real thing’ and try my patches in-vivo, or watch the code being executed. Thanks again, Bolle!

The drill

The weapon-of-choice for watching code run is definitely Macsbug, the official debugger from Motorola, heavily modified by Apple through all the years until MacOS 9.2.
Back in the days my contact with Macsbug was very brief. When a program ‘bombed’, I’ve entered “g” (for Go) and hoped the system will somewhat heal and keeps running…

Ok, now I had to be somewhat more serious – and my skills had improved over the last 20 years, so my routine turned into single-stepping and tracing through the code, skip certain instructions which might kill the code, watching all the registers and most important and watch how the Carrera “driver” behaves in an SE/30 vs. IIci.
I even created some macros (which have to saved into Macsbug own resource-fork!) and started an endless try-and-crash drill.

The working drill is tedious: You step through the instructions, while following your steps in the disassembled source, to the point where it crashes. Remember/note the point (address) where it crashed and try again.
This means you have to manually trace closer to “the edge” but try not to fall off the cliff. And when you did – and I did many times – rinse & repeat.
Sometimes you can ‘skip’ complete function calls containing hundreds of instructions (called ‘Trace’), sometimes you have to sit-through (i.e. single-step) a very, very, very long loop just to be sure it works 100%.

The next post/chapters will finally dive into the control panels code.
While it’s all about this specific ‘driver’ I’m sure it’ll help everybody who starts the adventure of understanding pretty low-level 68k Macintosh code.

That said, in Dec. 2019, continuously working with Bolle, we came to the conclusion it has to be a hardware problem and Bolle was able to prove this and most importantly found a way to fix it.
There will be a 4th and final post concluding all of our findings.

68000, Apple 68k, Carrera040

Carrera in an SE/30 – the code part 1

January 28, 2019 Axel 2 Comments

The disassembled code of the Micromac Carrera 040 control panel is quite big: 6000+ lines of 68030/40 assembly…

While these posts might be entertaining and giving you an insight into classic MacOS driver code, they are also meant as a notebook to myself to get into the source quickly – especially after some weeks or months of distraction 😉

That said, I will not discuss each and every line of code. There are many parts which aren’t important (for now) or just not reached yet.
Still, it will take several parts/chapters to cover everything I worked on.

The complete code is available over here on GitHub and will updated every time I’m working on it.
Whenever I’m mentioning addresses I’m referring to this code on GitHub. NB: I will never use line-numbers as these might change during editing the source.
Also, when you’ll see a light-bulb 💡 somewhere, this is where I’m not sure and happy about enlightenment or comments from you 😉

This article is ~~totally work-in-progress~~ closed.
~~E.g. every now and then my theories about what a certain code does changes, I learn new things and all the sudden whole blocks of code make sense… so this post will change/grow, too.~~Bolle did a lot of hardware research and in the end it became clear that the INIT/Driver has nothing to do with the non-function of the Carrera in an SE/30. After Bolle modified his adapter, the Carrera 040 is happily running in my ’30 now.But still, this series of posts is definitely worth reading, especially if you’re into reverse engineering 68k assembly code.

Approaching… difficulties.

What’s the main job of this code? From a 30000ft perspective the simple answer is “switching the Carrera040 on and off”, i.e. toggling between the hosts slower on-board 68030 and the insanely fast 68040 on the C040. At boot-time… as well as during the system is running (by user interaction).

Sounds pretty simple, huh? Lowering our flight altitude to 3000ft more things come into play:
Identify the hosting Macintosh. As mentioned in the previous chapter, the C040 was able to run in a Mac II, IIx, IIcx, IIvx, IIvi, IIvm, IIsi, IIci, LC and LCII… all of them different in many places. These differences have to be handled…
Down at 30ft we have to admit that there are differences between a 68030 and his younger brother 68040, mainly concerning caches, FPU and the MMU.
Finally hitting the ground, it’s becoming clear that it is everything but trivial to halt a running processor, save its complete context and start another (slightly different) processor with that. And back again…

Some given things before we start:

We will concentrate on the IIx “branch” as this machine is closest to the SE/30 like not-32bit-clean, memory-map, the GLUE chip, two real VIAs with the same register layout etc.
I learned from the code that the C040 is memory-mapped at 0x53000000 in some of the supported models, especially the IIx and IIci. This means 32bit addressing is a must (-> need “mode32” INIT or clean ROM)
I tried to comment as much as possible/understood inline (i.e in the code) – a good bit of 68k machine language knowledge is still required 😉
If something needs more explanation, I’ll try to provide this before the code quote or afterwards.

So this is the main routine (at 0x21FC):

main     MOVEM.L A4-A6,-(A7)
         MOVE.L  D0,D7
         MOVE.L  #$31E,D0    ; need 798 bytes
         _NewPtr ,CL_SY      ; allocate requested amount of memory (D0) in system
                             ; heap (returned in A0) and initialize to zeroes
         TST     D0          ; success?
         BNE.S   lae_6       ; nope, exit.
         LEA     data2,A1    ; else
         MOVE.L  A0,(A1) ; Init A5 world and save into data2
         MOVEA.L A0,A5
         MOVE    #$A89F,D0   ; UnimplTrap
         _GetTrapAddress     ; (D0/trapNum:Word):A0\ProcPtr 
         MOVE.L  A0,$29C(A5) ; save the trap addr into 2 places 
         MOVE.L  A0,$2A0(A5) ; in the A5 world
         BSR     sysDetect   ; Jump to Machine detection routine 
         BNE.S   lae_6       ; success?
         MOVE.L  D7,D0
         JSR     4(A6)       ; We jump to the subroutine set in the detection routine
                             ; for the second time, this time offset 4...
			     ; i.e. we skip the 1st 'BRA' there
         BNE.S   lae_6       ; success?
         BSR     instFPSP    ; Install Motos FPSP
         BNE.S   lae_6       ; success?
         JSR     8(A6)       ; That's the 3rd call in the handler call cascade (needs hack for MacsBug!)
         BNE.S   lae_6       ; success?
         BSR     proc32      ; works (get some RSC strings)
         BSR     proc43      ; install traps
         BNE.S   lae_6       ; success?
         JSR     12(A6)      ; That's the 4th call in the handler call cascade
         BNE.S   lae_6       ; success?
         BSR     proc41      ; works (atalk?)
         BSR     proc42      ; VIA stuff and such - BOOM
         BSR     proc29
         MOVEM.L (A7)+,A4-A6
         MOVEQ   #0,D0

As you can see, there are 10 calls to subroutines- currently it crashes inside the 8th subroutine, currently called proc42… But let’s check these subroutines one by one.

sysDetect

This is the subroutine I had to “patch” to initially make the driver work with an SE/30. It starts at 0x2022 and does these things:

Check if the ‘Gestalt‘ trap is available at all (very good style!) else throw an error
If it is, read the machines Gestalt code into D0, throw an error if zero
Decide which ‘handler’ to choose given the Gestalt code.

Based on their Gestalt codes there are four groups of Macs defined in the following lines (0x204C – 0x20AC):

Mac II/IIx/IIcx — “dirty Macs”, not 32bit clean, no PDS
- “Expansion I/O Space” from 0x51000000 to 0x5FFFFFFF
- the C040 installs with an adapter right into the CPU socket in the II/IIx/IIcx
- SE/30 is also “dirty”, need mode32 or IIsi ROM in slot
- these machines also use the GLUE chip to emulate the VIA2 like the SE/30
Mac IIvx, IIvi, IIvm — special kind of PDS slot
- there’s no mentioning of support on the MicroMac page
Mac IIsi, IIci
- Kind of interesting because the si has the same PDS slot like the SE/30
- Uses the RBV (Ram Based Video) controller which emulates the VIA2
- Therefore totally different memory layout (VRAM at 0x00000000 mapped by the MMU etc.)
Mac LC, LCII, Color Classic
- These share the same LC-PDS slot

If your Mac is one of those (or patched at 0x2058), you’ll branch into sys_check: (0x20BA) which will make sure you run at least System 6.0.5, have virtual memory switched off and jumps into the selected handler code (address saved in A6) at 0x20EA for the first time.

Here’s the code of what’s discussed above:

2022: sysDetect: MOVE.L  #$A0AD,D0  ; Gestalt
2028:         _GetTrapAddress newOS ; (D0/trapNum:Word):A0\ProcPtr 
202A:         MOVE.L  A0,D2
202C:         MOVE.L  #$A89F,D0     ; UnimplTrap
2032:         _GetTrapAddress newTool; (D0/trapNum:Word):A0\ProcPtr 
2034:         CMP.L   A0,D2
2036:         BEQ     OS_bad
203A:         MOVE.L  #'mach',D0
2040:         _Gestalt ; (A0/selector:OSType):D0\OSErr 
2042:         BNE     bad_conf      ; If we can't read it, fire general Error Msg
2046:         MOVE.L  A0,D0
2048:         MOVE.L  D0,2(A5)

; Check for several Mac models which are grouped into 3, each having its own handler routine. 
; 1) Mac II/IIx/IIcx 
; 2) IIvx, IIvi, IIvm 
; 3) IIsi, IIci     
; 4) LC, LCII, Color Classic
    
204C:         LEA     MacII_handler,A6   ; -- The dirty gang
2050:         CMPI.L  #6,D0        ; MacII 
2056:         BEQ.S   sys_check
2058:         CMPI.L  #7,D0        ; MacIIx - we replace this by the SE/30 #9
205E:         BEQ.S   sys_check
2060:         CMPI.L  #8,D0        ; IIcx
2066:         BEQ.S   sys_check

2068:         LEA     V_handler,A6   ; -- The "V" Macs.
206C:         CMPI.L  #48,D0       ; IIvx
2072:         BEQ.S   sys_check
2074:         CMPI.L  #44,D0       ; IIvi
207A:         BEQ.S   sys_check
207C:         CMPI.L  #45,D0       ; IIvm
2082:         BEQ.S   sys_check

2084:         LEA     IIci_handler,A6   ; -- IIci and IIsi
                                        ; BOTH share the same "Expansion I/O Space" (0x5300 0000)
2088:         CMPI.L  #11,D0       ; IIci
208E:         BEQ.S   sys_check
2090:         CMPI.L  #18,D0       ; IIsi 
2096:         BEQ.S   sys_check

2098:         LEA     LC_handler,A6    ; -- The LC-PDS family
209C:         CMPI.L  #19,D0       ; LC
20A2:         BEQ.S   sys_check
20A4:         CMPI.L  #37,D0       ; LCII
20AA:         BEQ.S   sys_check
20AC:         CMPI.L  #49,D0       ; Color Classic
20B2:         BEQ.S   sys_check

; Any other Model/Gestalt will bring up an error alert-box 

20B4:         MOVE    #$1B5B,D0    ; "Carrera040 does not support this Macintosh model."
20B8:         BRA.S   RET_err      ; -> "TST     D0 & RTS"

; We found a supported model, so keep on going checking for the OS version...

20BA: sys_check: MOVE.L  #'sysv',D0    ; Check OS version
20C0:         _Gestalt              ; (A0/selector:OSType):D0\OSErr 
20C2:         BNE.S   bad_conf      ; If we can't read it, fire general Error Msg
20C4:         MOVE.L  A0,D0
20C6:         CMPI    #$605,D0     ; System 6.0.5
20CA:         BGE.S   OS_ok        ; or greater
20CC: OS_bad: MOVE    #$1B5C,D0    ; "Carrera040 does not work with this version of the operating system."
20D0:         BRA.S   RET_err      ;
20D2: OS_ok:  MOVE.L  #'vm  ',D0   ; Check for enabled Virtual Memory
20D8:         _Gestalt             ; (A0/selector:OSType):D0\OSErr 
20DA:         BNE.S   bad_conf     ; If we can't read it, fire general Error Msg
20DC:         MOVE.L  A0,D0
20DE:         BTST    #0,D0
20E2:         BEQ.S   VM_ok
20E4:         MOVE    #$1B5D,D0    ; "Carrera040 does not work with Virtual Memory turned on. 
                                   ; Please turn off Virtual Memory in the Memory control panel and restart your Mac."
20E8:         BRA.S   RET_err
20EA: VM_ok:  JSR     (A6)         ; This is the actual HANDLER CALL, been set in $204C-$2098
20EC:         BNE.S   RET_err
20EE:         MOVE.B  34(A5),D0    ; 34(A5) seems to contanin the Jumper settings at the lowest 3 bits and only three of them are valid:
20F2:         CMPI.B  #7,D0        ; 7 -> 111
20F6:         BEQ.S   RET_ok
20F8:         CMPI.B  #6,D0        ; 6 -> 110
20FC:         BEQ.S   RET_ok
20FE:         CMPI.B  #5,D0        ; and 5 -> 101 
2102:         BEQ.S   RET_ok
2104:         MOVE    #$1B5E,D0    ; "Carrera040 does not recognize the jumper settings on the Speedster card. 
                                   ; Please check the settings against the manual.
2108:         BRA.S   RET_err
210A: RET_ok: MOVEQ   #0,D0        ; clear D0 (no errors)
210C:RET_err: TST     D0           ; Set the Z-Flag (D0 contains Err-Code) and
210E:         RTS                  ; return from Subroutine
2110:bad_conf:MOVE    #$1B5A,D0    ; "Carrera040 does not support your system configuration."
2114:         BRA     RET_err

Yes, there’s also stuff after the call to the handler, but let’s check that handler first.
As said in the beginning, I chose to take the “IIx route”. The MacII_handler code is actually just another vector jump-table which will later be used with offsets:

414E:  MacII_handler:  BRA   MacII_1st ; From II, IIx & IIcx
4152:                  BRA     MacII_2nd
4156:                  BRA     MacII_3rd
415A:                  BRA     MacII_4th

Let’s have a look into the first call MacII_1st:

3D14:  MacII_1st  MOVEM.L D1-D3/A0-A2/A6,-(A7) ; 1st call from MacII handler
3D18:           PUSH.L  8
3D1C:           LEA     data105,A0
3D20:           MOVE.L  A0,8       ; Is that the Bus Error Handler at 0x00000008?
3D24:           MOVE    #$1B5F,D3  ; 7007
3D28:           MOVEQ   #1,D0
3D2A:           _SwapMMUMode  
3D2C:           PUSH.B  D0
3D2E:           MOVEA.L A7,A6
3D30:           BSR     read_5300k2
3D34:           MOVEQ   #0,D3
3D36:  data105  MOVEA.L A6,A7
3D38:           POP.B   D0
3D3A:           _SwapMMUMode  
3D3C:           POP.L   8
3D40:           MOVEQ   #0,D0   ; ?
3D42:           MOVE    D3,D0   ; overwriting?
3D44:           BNE.S   lae_153
3D46:           MOVEM.L D1-D2/A0-A2,-(A7)
3D4A:           LEA     53_cmd_0,A0
3D4E:           MOVE.L  A0,6(A5)
3D52:           LEA     53_cmd_1x,A0
3D56:           MOVE.L  A0,10(A5)
3D5A:           LEA     read_5300k2,A0
3D5E:           MOVE.L  A0,14(A5)
3D62:           LEA     53_cmd_5.3,A0
3D66:           MOVE.L  A0,18(A5)
3D6A:           LEA     53_cmd_5.1,A0
3D6E:           MOVE.L  A0,26(A5)
3D72:           LEA     53_cmd_5.3.5.1,A0
3D76:           MOVE.L  A0,22(A5)
3D7A:           MOVEM.L (A7)+,D1-D2/A0-A2
3D7E:           BSR     read_5300k2
3D82:           ANDI.B  #7,D0
3D86:           MOVE.B  D0,34(A5)
3D8A:           MOVEQ   #0,D0
3D8C:  lae_153: MOVEM.L (A7)+,D1-D3/A0-A2/A6
3D90:           TST     D0
3D92:           RTS

As you can see, even in such simple and short subroutines are some things I just don’t get? For example why is the effective address of data105 written to 0x8? Is that replacing the Error Handler in the VBR?
Anyhow, I think I got the overall meaning of the rest of it. What happens is this:

After switching into 32bit mode (_SwapMMUMode) it reads a longword from 0x53000000. As initially mentioned, the C040 is mapped to this address. There are 2 identical functions to read from there, that’s why this one here called read_5300k2.
It looks like reading is sufficient because the result (returned in D7) is immediately overwritten by a pop. Also that BNE after two moves is beyond me (0x3D40)… OTOH the rest of the code is pretty clear: It’s ‘populating’ the A5-world with subroutines I’d call 53-commands. These commands write a specific byte sequence to 0x53000000, obviously communicating with the C040. For better understanding I’ve named them e.g. 53_cmd_5.3.5.1 meaning writing 5, then 3, then 5 and finally 1 to this address.
At the end, 0x5300k is read again, this time the result is masked to the last bit and written to 34(A5) – this represents the C040 jumper-settings by the way. Return from Subroutine…

Back in sys_check: this jumper-setting will be checked immediately for three valid settings: 111, 110 or 101 representing the supported CPU types (68040,68LC040,68EC040). If the setting is ok we’re done with sysDetect:and return to main:.

2nd handler

Located at 0x3D94 this is kind of a ’50/50 subroutine’. One half is totally obvious (check RAM, ROM and addressing mode) and the other half is all greek to me… e.g. what is all that PUSHing about? There’s not a single POP inside this routine (or subroutines call from within). Here’s a wild guess of mine:
It looks like 1 to 4 ‘RAM range triplets’ being pushed onto the stack and after that gestaltPhysicalRAMSize (#’ram ‘) is called, for example:

    3D9A:          CLR.L   -(A7) ; faster 'PUSH.L #00000000'
    3D9C:          CLR.L   -(A7) ; PUSH.L #00000000
    3D9E:          CLR.L   -(A7) ; PUSH.L #00000000

    3DA0:          PUSH.L  #$100000
    3DA6:          PUSH.L  #$50F00041
    3DAC:          PUSH.L  #$50F00000
    
    3DB2:          PUSH.L  #$2000
    3DB8:          PUSH.L  #$53000041
    3DBE:          PUSH.L  #$53000000
    
    3DC4:          PUSH.L  #$2000
    3DCA:          PUSH.L  #1
    3DD0:          PUSH.L  #$53002000
  
    3DD6:         MOVE.L  #'ram ',D0  ; Returns the number of bytes of the physical RAM 
    3DDC:         _Gestalt ; (A0/selector:OSType):D0\OSErr

But the gestaltPhysicalRAMSize call does not take parameters and simply returns the amount of available RAM.

The good thing is, this sub-routine works flawlessly on the SE/30 and we can move on…

instFPSP

instFPSP is the next call in line. I’m not going to discuss this code in detail because it actually doesn’t do much. Still there are many inline comments in this routine if you like to know more. Here’s the background:

The FPU in the 68040 was made incapable of IEEE transcendental functions, which had been supported by both the 68881 and 68882 and were used by the popular fractal generating software of the time and little else. The Motorola floating point support package (FPSP) emulated these instructions in software under interrupt. As this was an exception handler, heavy use of the transcendental functions caused severe performance penalties.

TLDR; Check for FPU(type) and load the FPSP code from the resource-fork into RAM. Done. Return to main:.

Phew, that’s it for now. In the next post/chapter we’ll touch the 3rd handler, which was really hard to decipher but interesting stuff to learn, too.

68000, Apple 68k, Carrera040

Carrera in an SE/30 – the code part 2

January 29, 2019 Axel 6 Comments

3rd handler

Next up is the 3rd handler, MacII_3rd: (0x3F94) in our case. Actually it’s called with JSR 8(A6), but that’s an 8 byte offset to the ‘base-address’ of any handler. Clever stuff, huh (Google for ‘pointer-table’)?

This subroutine contains serious magic and was a real hard nut to crack. Especially because it tricked me into believing that I’ve found the ‘crashsite’… which, to spoil the tension, isn’t.
It just kept on killing Macsbug, because it’s so low-level.

What this routine does is replacing the Vector Base Register (VBR) which ‘lives’ at address 0x00000000. Evil stuff.

After disabling interrupts and switching to 32bit-mode a field with 6 long-words (data107) will be populated with data generated in other routines.
For now I can only guess what these entries are (Values from my SE/30 given in brackets). We’ll discuss all that further down.
0x3FC6 to 0x3FD8 calculates the size of the chunk of code starting at data106 (0x4008) to the beginning of MacII_4th (i.e. the end of Mac_3rd), which is 180 bytes.
Using this length, the routine first saves the current VBR onto the stack using the system call _BlockMove.
Then the original VBR (+some more) will be replaced by the new version beginning at data106. (Killing Macsbug – more on that later)
BSR 53_cmd_1x is been called. This brings the Carrera040 into life most likely using the just copied VBR (This is discussed in much detail further down).
Now the contents of the stack (= copy of the original VBR) will be copied back into its place, this time using a classic DBRA loop (0x3FF4). My guess, no Toolbox call possible at the moment.
Adjust the stack, back to 16bit mode, restore Registers and return-from-subroutiene. Done.

Here’s the code doing all this:

3F94:MacII_3rd: MOVE    SR,-(A7)     ; 3rd call from MacII handler
3F96:           ORI     #$700,SR       ; Set bit 9-11 of SR (disable Interrupts)
3F9A:           MOVEM.L D0-D2/A0-A2,-(A7)
3F9E:           MOVEQ   #1,D0
3FA0:           _SwapMMUMode  
3FA2:           PUSH.B  D0
3FA4:           SUBA.L  A2,A2         ; faster movea.l #0,a2
3FA6:           LEA     data107,A0    ; Filling the data into the 6x32 field
3FAA:           MOVE.L  96(A5),D0
3FAE:           MOVE.L  D0,(A0)+      ; SE30: 9FE00
3FB0:           LEA     data69,A1
3FB4:           MOVE.L  A1,(A0)+      ; SE30: 9D6E2 (User/Supervisor Rootpointer?)
3FB6:           MOVE.L  $64(A5),(A0)+ ; 807FC040
3FBA:           MOVE.L  $6C(A5),(A0)+ ; 807FC040
3FBE:           MOVE.L  $68(A5),(A0)+ ; 00000000
3FC2:           MOVE.L  $70(A5),(A0)+ ; 00000000
3FC6:           LEA     MacII_4th,A0
3FCA:           MOVE.L  A0,D2
3FCC:           LEA     data106,A0
3FD0:           SUB.L   A0,D2         ; 'distance' from data106 to MacII_4th
3FD2:           SUBA.L  D2,A7
3FD4:           MOVEA.L A2,A0
3FD6:           MOVEA.L A7,A1
	  	; save the current VBR to the stack
3FD8:           MOVE.L  D2,D0
	  	; A0 = SE30: 00000000 (src)  - IIci: $FBB08000
	  	; A1 = SE30: 027ff34c (dest) - IIci: $3BF9FC6
	  	; D0 = B4    (count)   - SAME on the IIci!    
3FDA:           _BlockMove ; (A0/srcPtr,A1/destPtr:Ptr; D0/byteCount:Size) 
	  	; write my own VBR...
                ; This copies 180 bytes into 0x000000000 replacing the original VBR. 
                ; ... and kills Macsbug if not circumvented properly.
3FDC:           LEA     data106,A0
3FE0:           MOVEA.L A2,A1
3FE2:           MOVE.L  D2,D0
	  	; A0 = 9F900 (src)   - IIci 10C4EA (data88)    
	  	; A1 = 00000 (dest)  - IIci FBB08000
	  	; D0 = B4    (count) - IIci same  
3FE4:           _BlockMove ; (A0/srcPtr,A1/destPtr:Ptr; D0/byteCount:Size) 
3FE6:           BSR     53_cmd_1x  ; Bring the C040 to life
3FEA:           MOVEA.L A7,A0  ; SP to A0
3FEC:           MOVEA.L A2,A1  ; SE30: 00000000
3FEE:           MOVE.L  D2,D0  ; the code length (B4 again)
3FF0:           BRA.S   lae_163
3FF2:   lae_162 MOVE.B  (A0)+,(A1)+ ; Write the VBR back from the stack
3FF4:   lae_163 DBRA    D0,lae_162
3FF8:           ADDA.L  D2,A7 ; adjust the stack
3FFA:           POP.B   D0
3FFC:           _SwapMMUMode  
3FFE:           MOVEM.L (A7)+,D0-D2/A0-A2
4002:           MOVE    (A7)+,SR
4004:           MOVEQ   #0,D0
4006:           RTS     

; Start of VBR replacement- and 040-Code being copied to 0x0 the by line 0x3FE4 
; /if/ theses are the Vectors 0-17, then their meaning would be:

4008: data106:  DC.L    #$00001000 ; Reset initial Stack Pointer
400C:           DC.L    #$00000050 ; Reset initial Program Counter
; - ALL of these Vectors point to addr 4050 (offset 0x48) -
4010:           DC.L    #$00000048 ; Buserror  
4014:           DC.L    #$00000048 ; Adress Error
4018:           DC.L    #$00000048 ; Illegal Instruction
401C:           DC.L    #$00000048 ; Zero Divide
4020:           DC.L    #$00000048 ; CHK, CHK2 instruction
4024:           DC.L    #$00000048 ; cpTRAPcc, TRAPcc, TRAPV instruction
4028:           DC.L    #$00000048 ; Privilige Violation
402C:           DC.L    #$00000048 ; Trace
4030:           DC.L    #$00000048 ; LINE 1010 Emulation
4034:           DC.L    #$00000048 ; LINE 1111 Emulation

; THESE are definitely no vectors, they are dynamically written by the code above
; and to be used to setup the 040 MMU registers.
4038: data107:  DC.L    #$0009FE00 ;  
403C:           DC.L    #$0009D6E2; 
4040:           DC.L    #$807FC040 ; 
4044:           DC.L    #$807FC040 ; SE30: 
4048:           DC.L    #$00000000 ; SE30: 00000000 
404C:           DC.L    #$00000000 ; SE30: 00000000 

4050:           CLR.L   $53000000  ; Poke 0 to $53000000
4056:           BRA     lae_164    ; This points to itself... I'm lost at the moment.
4058:           LEA     data107,A0 ; SE30: 9F900
405C:           MOVE.L  (A0)+,D1   ; SE30: 0009FE00 (User/Supervisor Rootpointer)
405E:           MOVEA.L (A0)+,A1   ; 0009D6E2
4060:           MOVE.L  (A0)+,D4   ; 807FC040
4062:           MOVE.L  (A0)+,D5   ; 807FC040
4064:           MOVE.L  (A0)+,D6   ; 00000000
4066:           MOVE.L  (A0)+,D7   ; 00000000
4068:           MOVE.L  #$C000,D0
406E$           MOVEC   D0,ITT0   ; Set Instruction Transparent Translation
4072$           MOVEC   D0,DTT0   ; Set Data Transparent Translation
4076$           MOVEC   D1,SRP    ; Set Supervisor Rootpointer
407A$           MOVEC   D1,URP    ; Set User Rootpointer
407E:           MOVE.L  #$C000,D0
4084$           PFLUSHA           ; Invalidates all entries in the address translation cache
4086$           MOVEC   D0,TC 
408A:           LEA     data108,A0
408E:           ADDA.L  #$53002000,A0 ; (=0x530A1900)
4094:           JMP     (A0)      ; JuMP to data108 code (below) in C040 RAM range?     

4096:  data108: MOVEQ   #0,D0
4098$           MOVEC   D0,ITT0 
409C$           MOVEC   D0,DTT0 
40A0$           MOVEC   D4,ITT0 
40A4$           MOVEC   D5,DTT0 
40A8$           MOVEC   D6,ITT1 
40AC$           MOVEC   D7,DTT1 
40B0$           CINVA   BC
40B2:           NOP     
40B4:           MOVEQ   #0,D0
40B6$           MOVEC   D0,CACR
40BA:           JMP     (A1)    ; 0009D6E2
    
; END 040 Code being copied to somewhere by line 3FE4 
40BC: MacII_4th: MOVEM.L D1-D7/A0-A4,-(A7)  ; 4th subroutine called my MacII_handler
[...]

The Vector Base Register

I wasn’t precise when I initially said “replacing the VBR”. What actually happens is that this routine uses what I’d call an interim-VBR for the moment it initializes the 68040 on the C040. You’ve probably saw the link referring to what the VBR is in the 1st post of this series, but let me go a bit more into detail.

The VBR is a list of addresses (aka vectors) the CPU refers to in case of an exception – and this is true for every 68k system out there, e.g. Mac, SUN, NeXT, Amiga or Atari. Some of them might do some relocation using their MMU, but even the virtual address will be 0x00000000 and the order is the same. There are 16 basic vectors as listed here:

If for example a divide-by-zero happens, the CPU would call a handler which address is stored in 0x14. Pretty simple.
So let’s have a look what MacII_3rd left in the VBR (and below that) when the ‘interim VBR’ is in place:

0000: data106:  DC.L    #$00001000 ; Reset initial Stack Pointer
0004:           DC.L    #$00000050 ; Reset initial Program Counter
     ; - ALL of these Vectors point to addr 0x48 -
0008:           DC.L    #$00000048 ; Buserror  
000C:           DC.L    #$00000048 ; Adress Error
0010:           DC.L    #$00000048 ; Illegal Instruction
0014:           DC.L    #$00000048 ; Zero Divide
0018:           DC.L    #$00000048 ; CHK, CHK2 instruction
001C:           DC.L    #$00000048 ; cpTRAPcc, TRAPcc, TRAPV instruction
0020:           DC.L    #$00000048 ; Privilige Violation
0024:           DC.L    #$00000048 ; Trace
0028:           DC.L    #$00000048 ; LINE 1010 Emulation
002C:           DC.L    #$00000048 ; LINE 1111 Emulation
     ; - THESE are definitely no vectors, they are dynamically written by the 
     ;   code above and to be used to setup the 040 MMU registers.
0030:  data107: DC.L    #$00000000 ; SE30: 0009FE00  (12)
0034:           DC.L    #$00000000 ; SE30: 0009D6E2  (13)
0038:           DC.L    #$00000000 ; SE30: 807FC040  (14) 
003C:           DC.L    #$00000000 ; SE30: 807FC040  (15)
0040:           DC.L    #$00000000 ; SE30: 00000000 
0044:           DC.L    #$00000000 ; SE30: 00000000 
       
0048:           CLR.L   $53000000  ; Poke 0 to $53000000 ; C040 off
004C:  blocker3 BRA     blocker3    ; Points to itself... probably a "blocker"
0050:           LEA     data107,A0 ; initial Program Counter (SE30: 9F900)
0054:           MOVE.L  (A0)+,D1   ; SE30: 0009FE00 (User/Supervisor Rootpointer)
0058:           MOVEA.L (A0)+,A1   ; 0009D6E2
005C:           MOVE.L  (A0)+,D4   ; 807FC040
0060:           MOVE.L  (A0)+,D5   ; 807FC040
0064:           MOVE.L  (A0)+,D6   ; 00000000
0068:           MOVE.L  (A0)+,D7   ; 00000000
006C:           MOVE.L  #$C000,D0
0070:           MOVEC   D0,ITT0    ; Set Instruction Transparent Translation
0074:           MOVEC   D0,DTT0    ; Set Data Transparent Translation
0078:           MOVEC   D1,SRP     ; Set Supervisor Rootpointer
007C:           MOVEC   D1,URP     ; Set User Rootpointer
0080:           MOVE.L  #$C000,D0
0084:           PFLUSHA            ; Invalidates all entries in the address translation cache
0088:           MOVEC   D0,TC 
008C:           LEA     data108,A0
0090:           ADDA.L  #$53002000,A0 ; (=0x530A1900)
0094:           JMP     (A0)       ; JuMP to data108 code (below) in C040 RAM range? 
       
009C:  data108: MOVEQ   #0,D0
00A0:           MOVEC   D0,ITT0 ; 0
00A4:           MOVEC   D0,DTT0 ; 0
00A8:           MOVEC   D4,ITT0 ; 807FC040
00AC:           MOVEC   D5,DTT0 ; 807FC040
00B0:           MOVEC   D6,ITT1 ; 00000000
00B4:           MOVEC   D7,DTT1 ; 00000000
00B8:           CINVA   BC
00BC:           NOP     
00C0:           MOVEQ   #0,D0
00C4:           MOVEC   D0,CACR
00C8:           JMP     (A1)    ; 0009D6E2

Farewell, old friend

At this point, my SE/30 always froze and I thought this must be the point where to find incompatibilities between the IIci and SE/30.
But after understanding, what’s really going on, it was clear that overwriting the TRAP exception (Nr.7), Macsbug was simply kicked out of the game as this exception is triggered after every step/trace you do in a debugger…
So to get beyond this point, I had to modify the program counter to skip the point where TRAP is copied-over… which is done inside the Toolbox’ _BlockMove call. So I had to single-step into that and find the right call/time to do a ‘pc=pc+2’ 😉 (Good thing you can define a macro for that).

Okayyyyy. After that’s been written, 53_cmd_1x is called, presumably telling the C040 to come to life.
And keen as it is, it’ll look up the “Reset initial Program Counter” (VBR: 0xC) and starts executing code from 0x50. Any other occurring exception will call the ‘handler’ at 0x48, simply switching the C004 off and sit in an endless loop (0x4C) – probably making the 68030 to take over again.

EmEmYou!

Given everything’s fine, the code at 0x50 will start reading the previously populated data from data107 into several registers.
Then some serious 68040 MMU table setup happens – so this is some kind of ‘040 initialization routine… and the ‘040 is actually running. Woohoo!

Time for some special register explanation:
As we all know, the 68040 has two in-build 4k caches and an MMU. The latter can be programmed how and what to cache. This is defined in 4 registers of which only 2 are of interest here: ITT0 and DTT0, the Instruction and Data Transparent Translation registers, both sharing the same bit-fields following this pattern:

BBBBBBBBMMMMMMMMESS000UU0CC00W00

B – Logical Address Base – compared with address bits A31-A24. Addresses that match in this comparison are transparently translated
M – Logical Address Mask – setting a bit in this field causes corresponding bit in Base field to be ignored
E – Enable Bit – 1 – translation enabled; 0 – disabled
S – Supervisor Mode – 00 – match only in user mode 01 – match only in supervisor mode 1x – ignore mode when matching
U – User Page Attributes – ignored by 040
C – Cache mode – 00 – Cacheable, Write-through 01 – Cacheable, Copyback 10 – Noncacheable, Serialized 11 – Noncacheable
W – Write protect – 0 – write permitted; 1 – write disabled

Here’s an example:

807FC040 = 10000000011111111100000001000000 
           BBBBBBBBMMMMMMMMESS000UU0CC00W00

which means:

a bit less than 2GB transparently translated (2032MB)
translation enabled
Supervisor Mode: ignore mode when matching
Cache mode: Noncacheable, Serialized
Write permitted

So let’s have a look at the code again:

At 0x70/0x74 the MMU is set to 0xC000, i.e. Enable translation, apply for user & supervisor mode, write-though cache, for logical address space 0x80000000-0x00ffffff (2GB minus the bottom 16MB).
Then Supervisor & User Rootpointer are set to 0x9FE00, then the address translation cache is flushed to finally set the Translation Control register to Enable & 8K page size (0x88)… up to here this was pretty much ‘by the book’ of how to set-up MMU tables.

Having its MMU all set, the 68040 now gets something to chew on:
The address of data108 is added to 0x53002000 and jumped to!
💡 Does 0x53002000 equal 0x00000000 for the C040?

Let’s assume the C040 executes the code at data108 for now. That is:

Clear the ITT/DTT registers
Set the MMU to 0x807FC040 (see decoding example above)
invalidate caches and wait’a’NOP to have that happened
then disable all caches
and jump to where A1 points to. In my SE/30 that’s 0x9D6E2, previously loaded from data107 in 0x58

Writing all this from the top of my head, I’m not 100% sure where this address is pointing to. I must be somewhat back into MacII_3rd (0x3FEA), because this is where the program execution resumes (Need to check this with Macsbug and will update).

For now, I’m tempted to call MacII_3rd something like ‘C040_MMU_setup‘… but I’d love to have this confirmed 💡 by somebody who knows more than me 😉

Next up will be continuing working further through the main: procedure again… so move over here.

68000, Apple 68k, Carrera040

Carrera in an SE/30 – the code part 3

April 11, 2019 Axel Leave a comment

Ahh, back in cosy main: – looks much easier now after that crazy MMU stuff in the previous part, right?

The next subroutine called is proc32. In the complete source code (reminder: Available at GitHub) I commented that with “works (get some RSC strings)“… and well, that sums it up pretty good.
proc32 loads (i.e. creates handles) from the resource-fork, e.g. the icons used in the menu-bar and several error-messages like “This application must run on the 68030 processor, please quit all other 68040 applications and re-run this application.“. That’s it. Boring…

That boredom instantly changes when we get to the next subroutine proc43located at 0x29DA…

I did it my way…

One fascinating thing about classic Mac OS is how easy it is to patch system calls, aka Toolbox traps. For example in the previous post we came about _BlockMove, which is a Toolbox call to copy an amount of RAM from A to B.
For example you have just read this article about a faster BlockMove method, you’re totally free to patch (read: replace) _BlockMove with your speedier version and automatically use this throughout your application – or even system-wide, if you’ve created an INIT… [If you want to know all about it… here’s a book for you]

And that’s what proc43 heavily does. Because it’s a long subroutine (230 lines) so I will give you just one example – the inline comments should do…

2BE2:        MOVE    #$A02E,D0     ; BlockMove
2BE6:        _GetTrapAddress newOS ; (D0/trapNum:Word):A0\ProcPtr 
2BE8:        MOVE.L  A0,$270(A5)   ; oldBlockmove
2BEC:        LEA     data42,A0     ; myBlockMove
2BF0:        TST.B   MMU32bit      ; loMem global "current address mode"
2BF4:        BNE.S   lae_70        ; skip if 32bit clean machine else
2BF6:        LEA     data43,A0     ; use a different entry for dirty machines
2BFA: lae_70 MOVE.L  A0,$274(A5)   ; save routine pointer to $274(A5)	
2BFE:        LEA     data41,A0     ; DC.L 0000 0000
2C02:        MOVE.L  $270(A5),(A0) ; save oldBlockmove vector into there
2C06:        MOVE.L  #$A02E,D0     ; BlockMove
2C0C:        LEA     data40,A0     ; aaaand replace it by myBlockmove
2C10:        _SetTrapAddress newOS; (A0/trapAddr:ProcPtr; D0/trapNum:Word)

This is the sum up what else being done:

Save all debugger vectors into A5-world locations (suspicious. I sense Macsbug killing…)
Load the PACK4 resource, that’s the Floating Point emulation package (aka SANE) if no FPU found
Check & read several system Gestalt codes into A5-world (0x2AAC-0x2B44)
Patch several Toolbox traps
- SwapMMUMode replaced by data19
- VM_Displatch by data22
- Pack4 by data10
- Pack5 by data11
- BlockMove by data40
- jClearCache by myClearCache
- GetNextEvent by myGetNextEvent
- GetResource by myGetResource
- SCSIdispatch by mySCSIdispatch
- DrawMenuBar by myDrawMB
- LoadSeg by data31
- UnLoadSeg by data32
- HWPriv by data33
- vStdExit by data34

So far, so many. Then there’s some RAM copying going on, of which I’m currently not quite sure what it is good for (0x2CAC-0x2CD8) 💡 .

Finally, the myShutdown routine is installed into the Shutdown Manager, i.e. it will executed before the Mac is powered down/restarting (it simply switches the host back to its own 68030). After that, RTS into main…

“There and back again…”

Barely back in main, a JSR 12(A6) warps us into MacII_4th, the last of the four handlers every supported system has.

This loads specific data from the FPSP into RAM (namely IDs 0x12C and 0x12D).
Finally a special floppy driver is installed (myFloppyDrvr @ 0x954) which IMHO just differs from the original in handling the ‘040 caches correctly. That was that and back to main…

The next sub-routine in line is chkATalkVer. I can rightfully name that routine because it’s short and crystal clear: Figure out if AppleTalk is installed, and if true, return its version in D0 (and also write it into A5-world). C’est ca…

This is the end…

It’s getting ugly (for now)… proc42 will be called – the last subroutine in main before my SE/30 crashes and burns 😥

The first few lines (0x28F4-0x293C) are comparably harmless. They are working around a bug in System 7.1 which was corrected in 2/17/92 according to some dark sources (“Corrected value of timeSCSIDB from 0DA6 to 0B24”).
After that, proc38 (0x293C) is called which again calls proc39 and something’s done with the TimeManager, not really sure what’s exactly going on, but it feels like a timing-benchmark heavily using InsTime, PrimeTime and RmvTime Toolbox calls.

[hold yer breath] Then we’re getting closer to the flat line… The stack is filled with these parameters:

2940:   CLR.L   -(A7)       ;PUSH.L 00000000 
2942:   CLR.L   -(A7)       ;PUSH.L 00000000 
2944:   CLR.L   -(A7)       ;PUSH.L 00000000 
2946:   PUSH.L  #$80008000  ;       80008000
294C:   CLR.L   -(A7)       ;PUSH.L 00000000 
294E:   PUSH.L  #64         ;       00000040
2954:   PUSH.L  #1          ;       00000001

and SpeedProc is called…

…To be continued 😉

P.S: I changed course (again) and started to investigate more into the C040’s hardware. The more I understand of the INIT/CP workings the more I can’t fight the idea that it really might be a hardware timing issue.

68000, Atari, STGATW

The STG[A]TW

October 13, 2023 Axel 2 Comments

This is my first ever project I did for one of my favorite computers, the ATARI Mega-ST. Like told in one of my blog posts, the ATARI ST was my 2nd greatest love ❤ (after the C64) and being part of a very cool company back in the days I only have fond and happy memories of it.

After all the years of fiddling with nearly every machine on the market, it’s like coming home by just looking at its system font or hearing it’s specific bell-sound (even the ever-annoying key-click sound it makes by default).
And now it’s time to do something cool with it… adding, what I’ve missed back then: Color and -of course- Transputers 😉

TLDR;

Ok, so you’re in a hurry or suffer from severe ADHD?

This is a graphics card for the ATARI Mega ST internal bus including a Transputer interface.

Got it. More details please…
What about software? (links to a different post)
Why, for god’s sake!?
There’s a relocator, too?
Ok, how much?

NB: This card is now superseded by the ATW800/2

Say hello to the STG[A]TW!

What’s that about the strange naming?! Well, this card is a hybrid of a classic STGA ISA graphics-card adapter and a Transputer interface for the Mega-ST bus.
Mega-ST, high-res graphics and Transputers? Mhh, does this ring a bell? Yes, component-wise this is exactly the configuration of an ATARI ATW800, the famous and rare ATARI Transputer Workstation (for which I designed a Farmcard, just in case you own an ATW).
So adding the two, it’s an STGA-ATW or STG[A]TW for short… and it looks like this:

Looking at the top you’ll spot the 90° angled ISA Slot at the right edge, giving (selected) ET4000 graphic cards a home.
To the left there are two Transputer TRAM slots making it possible to use two size-1 or a single size-2 TRAM.
Obviously, an ISA card and the TRAMs would collide, so you have to choose… or you’re a lucky owner of a low-profile ET4000. Then you could use your VGA card plus one TRAM like this:

But even if your ET4000 card is covering the whole STG[A]TW don’t despair! Looking at the backside you can spot the external Transputer link connector (on the right edge):

Using this you can connect to e.g. an external Transputer(-farm) of any size… for example something like my 64 CPU Final Cube 🔥

Looking further around the backside you can spot a preparation for a CR2032 coin-shape battery holder. That is meant to replace the two AA batteries used in the original case-lid because depending on the TRAMs used, it might be necessary to remove the battery compartment (yes, you’d need to cut it out 😰) .

Talking about power… at the bottom you can see the external power connector which supply is mandatory – you need to connect at least 5V and ground, optionally 12V if your ET4000 needs that.
That said, I highly recommend to make sure your Mega ST’s PSU is powerful enough – best would be to replace it by e.g. a Maxwell RD-50A.

Why?!

I knew you’d ask. Well in case you haven’t noticed yet, I’m a total Transputer nut. It’s a fabulous, genius CPU and design. The more you dig into it, the more you’ll love it.

Back then I adored the ATW800 and always wanted to own one. But it was insanely expensive and -to be honest – wasn’t a real member auf the ST/TT-family anyhow.
This is because the Mega-ST1 inside the ATW was mainly used as a bootup machine for the Transputer and after that was up and running, everything the ST did was file- and user-I/O (Mouse, Keyboard, RS232).

In my humble opinion, the STG[A]TW is (somewhat) the way how ATARI should have done it back then. Instead of creating an ‘island solution’ they should have used the existing install-base and offer an expansion to it. Plug in the missing parts (graphics & Transputer) and keep the TOS/GEM eco-system in charge.
Users could keep running their applications and use the extra ‘ooomph’ to speed them up. Think of all the accelerators Apple Macintosh users had available to speed up PhotoShop filters or have it do the heavy number crunching of science applications etc.
Even all data has to travel over the bus to the Transputer and back, this is still faster than the 8MHz 68000.

Given that in 1990 about 350 ATW800 were produced and sold at 5000-7000 GBP which equaled to about 13700 DM or 8000$ (that’s about 11400 GBP, 13700 EUR or the same in US$ today),
I bet the number of a “ATW for the poor” would have been much higher.

So, again, why? Well to have Mandelbrot fractals calculated fast and in colo(u)r, of course!
Fast means ~60sec, even using slow GEM routines. Using the same algorithm and iteration depth, the ST’s 8MHz 68000 took nearly 3 hours to calculate the same fractal.

Here’s a quick peek how ‘fast’ looks like:

Evolution – a quick excursion

If you’re into hardware development you might wonder why there’s a very vintage GAL and a semi-vintage CPLD used in this design.
Here’s my explanation and shameful justification 😉

From the very simple and basic design of the STGA I took the usual nerdy feature-creep road to hell 🙄
My initial design naturally included the GALs logic into one big CPLD. And having all address-lines available on this, that design also included (on top of the ISA and Transputer interface) a 68882 FPU, an IDE interface and a ROM decoder… everything worked fine BUT all ‘modern’ ET4000 cards didn’t.
I stared at logic-analyzer traces for weeks and weeks and compared them to the original STGA they were absolutely identical. But whatever I did, I wasn’t able to get ET4k cards with a Rev. TC6100AF chip working.
In the end I decided to keep the STGA part as-is, including the external AND-ing of /LDS & /UDS and inverting of /DTACK and put the Transputer handing into a smaller (and cheaper) CPLD.
Thus the FPU, IDE and ROM decoding was off the table and to be honest, there are other solutions which do that job better anyhow.

From left to right: STGA, the Über-STGA and the final STG[A]TW

So there you have it: Colorful high-res GEM combined with the mighty Transputer power… but I understand, that those low-profile ET4k cards are getting rarer and rarer and not everybody has an external Transputer farm to connect to.
So I made another card or better a so-called CPU relocator…

The TRAM-Relocator

Most (Mega) ST users out there already have one or more expansions to their system, mostly plugging into or onto the CPU creating a ‘stack’ of PCBs.
Because the STGA (as well as the STG[A]TW) overlaps over the Mega STs CPU socket you might want relocate the CPU a bit away from the Mega-Bus socket. Simple relocators simply move it a bit towards the front of the case. But that still results in having a stack of multiple extensions. For example here’s a Storm ST (Alt-RAM) on top of a Cloudy (4x ROM) plugged into a Lightning ST (IDE & USB):

This can get tricky in some crowded Mega ST cases…

I really liked the ‘Bus I/O port design’ of the Exxos’ STF Remake Project having multiple sockets next to each other.
And if you have your original TOS ROMs removed (and replaced by e.g. a Cloudy) there’s actually some space to roll out 4 of them having the Relocator sitting flush on the Mega-ST mainboard (make sure the backside of the Relocator is completely isolated!):

4 Sockets and a cool TRAM socket 😁

Like clearly written on the PCB, SOCK1 goes into the (to-be-retrofitted) CPU Socket and using ‘hollow pins’, it can take a CPU itself.

SOCK2-4 are available to extensions of your choice – all 3 of them are protected against power-surges by a fuse and a diode.
This design decision has been made due to my own painful experience loosing everything which had been plugged into the CPU socket… and the Blitter 😥

In the lower right corner are pins for an additional external power connector, also protected. That might be necessary depending what you’re plugging into those sockets.

Finally, the left edge is a Transputer TRAM socket which can be connected to the STG[A]TW by a 10pin flat-cable providing link signals and a 5MHz clock signal.
That way, you can use the STG[A]TW with an internal Transputer even your ET4000 card is big as a baking-tray.
It is highly recommended to use external power when doing so. The poor 68000 power-pins won’t be enough for it.

If needed, the whole TRAM part can be snapped-off from the Relocator to, uhm, relocate the TRAM elsewhere in- or outside the case or use it stand-alone. For that matter itself features an optional connector for power as well as a place to solder a required 5MHz oscillator and 2 mounting holes.

With everything in place, your “ATW800 for the poor” could look like this:

What you see here is the STG[A]TW plugged in, giving home to a low-profile ET4000 and a Size-1 TRAM.
The Relocator was plugged into the CPU socket and in its 1st slot the Cloudy-Storm and the 68000 sitting on top of it, took seat.
Slot 2 of the Relocator is taken by a Lightning-ST… and last but not least, a second TRAM was put onto the Relocator (you can spot the grey flat-cable connecting it to the STG[A]TW.

Want one?

All this sounds so cool that you want to own a STG[A]TW?
Well, first check out this list:

There’s next to no GEM user software for it yet
👉 but we’re working on it and there’s a pretty good system support in place already – and Helios is running already! 🥳
An extra post on that is ~~currently in the works~~ available here.
Do you have an ET4000 card of which you know it’s working with the NOVA drivers?
👉 I am not able to support you in getting your specific card working – there are just too many models and permutations of possible TOS/GEM/Driver installations. See this atari-forum.com thread to get an idea…
Do you own a TRAM?
👉 I might provide you with one at extra cost, mail me.
Do you have time to wait?
I’m manually building these boards and it’s a lot of work (0.5pich SMD, lots of trough-hole pins to cut and file down etc.)

If that’s 4 times “Yes” I can build & sell you one of the 6 which I have left for 100€ (plus shipping)… yes, that’s hefty but the quite large PCB is 4 layers (for stable power-distribution), just the ISA slot connector is 10€ already, Mega Bus 5€, GAL, CPLD etc.etc…. plus, as said, it takes quite some time to build & test them.
~~Drop me a mail on the bottom of that page if interested…~~

SOLD OUT… sorry 😥

As for the CPU-relocator, I’m selling un-populated PCBs for 8€ (Or get the gerbers here and have yours made at your favorite PCB manufacturer).
I’m not building them because the CPU ‘socket’ (SOCK1) is made of 64 single pins which you have to pry/get out of precision pin-headers.

That’s a tedious work you most likely want to do once… but not many times.

All that said – If you weren’t able to get a STG[A]TW, don’t despair.
I consider this as my stepping stone and learning platform for something cooler to come 😎.
Because I don’t like vapor-ware and hot-air-talking, I’ll tell you more when it’s a) done and b) working.

68000, Atari, STGATW

STG[A]TW programming and software

February 9, 2024 Axel Leave a comment

Ok, you read/heard about the STG[A]TW and want to know more about how to use it and -most importantly- for what it’s good for?

First and foremost, a Transputer is a computer-system of its own connected to a host. In this case an ATARI Mega ST.
But given an available host-adapter that could also be e.g. a Unix machine, a classic PC, an Apple II or even a Commodore C64, C128 or Plus/4…
That host communicates with the Transputer over a link-interface using specific memory addresses or, if available, a library. That way the host can send executable binaries to the Transputer, send or receive data to/from it and control it (boot, debug, etc.).

Because each host system is different, these addresses are different, too. But the transfer protocol and Transputer executables are always the same. So looking at this BASIC code example for the C64 gives you an idea, how it works – the steps are the same for every host-communication no matter which host-system used.

As usual, here’s a table of contents for those being in a rush..

Quick intro about standards & history
How to write code talking to a Transputer
Demos/Programs already available

Quick intro about standards & history

Yes, there have been very different ATARI ST and Transputer interfaces in the past. “Two and a half” systems were most prominent – let’s have a look at them before we go into details of the STG[A]TW.

The Atari Transputer Workstation aka ATW800

I think I’ve already wrote a lot about the ATW800 in several post on this page, even designed an expansion card for it – despite I don’t own an ATW myself.
To make a long story short: This is basically a design, where the ATARI Mega-ST is used as a boot device and after that just handles file- and user-I/O. The Transputer is attached to the ST via DMA and runs the Helios OS and has direct access to the graphics controller called ‘Blossom’. Totally different concept.

KUMA K-MAX

The KUMA K-MAX was a box connected to the ATARI ROM-module port and thus acted as pure ‘number cruncher on a leash’.
There are two reviews still available: The English review of atarimagazines.com and the German ST-Computer article even showing some photographs of which I ‘borrowed’ this:

Transfertech

Outside “the scene” this is a relatively unknown German company which actually made a lot of Transputer-centric hardware.
For the ATARI series they had 3 host interfaces:

A ROM port interface (all ST models)
A Mega ST bus interface (ROM port design botched onto the bus)
A VME-card (Mega-STE, TT)

Like the KUMA K-MAX, this design also attached the Transputer(s) as number cruncher.
As I own all of them, I might write a dedicated post about them some day.

This is how we do it

As all of the above did their own thing, there is and was no standard for interfacing the ATARI ST series – So I defined one with the other ATARI ST Transputer enthusiast André Saischowa, who did some intense ATARI Transputing fiddling back in the days.

In case of the ATARI ST the link-interface ( e.g. STG[A]TW) ‘lives’ at the base address 0xFFFAC0 and uses 18 bytes from there up to 0xFFFAD2. So the complete adress-range looks like this (uneven, so we can address the lower byte of a 68000 word):

#define base 0xfffac0
#define inreg base+1 /* C012 */
#define outreg ((base)+3)
#define instat ((base)+5)
#define outstat ((base)+7)
#define reset ((base)+17) /* writing*/
#define analyse ((base)+19)
#define errflag reset /* reading*/

But you don’t have to bother with those as we provide two more convenient ways to talk to a Transputer.

☝ Some words of warning to the programmers:

While the 68000 in your ATARI is big-endian, Transputers are little-endian. So data being send back and forth might need conversion.
Floating-point variables used by the Transputer are IEEE 754-1985, thus 32 Bit (single precision) or 64 Bit (double precision).
Some compilers like Turbo/Pure-C on the ATARI ST use 80bit doubles.
Those need to be converted by e.g. the xdcnv call from the PCFLTLIB library.

The static way

The raw-way is using an include file called “trproc.h”. It’s – like everything else – included in the program archive, located in the “DEVELOP” folder.

This include-file provides you these calls to receive (get) or send (put) data to/from your Transputer:

get/puttrchar(char)	read/send one byte
get/puttrshort(short)	read/send a short (2 bytes)
get/puttrint(int)	read/send an integer (32 bytes)
get/puttrlong(long)	read/send a long (32 bytes)
get/puttrfloat(float)	read/send a float (32 bytes)
get/puttrdouble(double)	read/send a double (64 bytes)
get/puttrraw(char *array, int length)	read/send an array of length

The calls marked blue are doing the endian-conversion for you.

Additionally there’s a call to check for an available Transputer: checkTransputer(int checkType)

If checkType is ‘0’, this function will return ‘1’ if it was able to find a Transputer or ‘0’ when not.
Setting checkType to ‘1’, the return value will give you the “family” of the found Transputer:

0 – No Transputer found
1 – Found a C004 link-switch
2 – A 16bit T2xx Transputer was found
4 – A 32bit T4xx/T8xx Transputer was found
-1 – Found something unknown

The elegant way – TBIOS

The much more elegant way is provided by André who extended the ‘ALIABIOS’ from a project published in the German computer magazine c’t back in 1989.
It’s a GEMDOS driver called “TBIOS.PRG” and can be put into your AUTO folder or called manually when needed. This driver has all the bells’n’whistles like a proper XBRA-ID etc.

DOS# call-name - result (D0=0 Ok) 

100 SetLinkAdr(Adr:W) D0 =-1 not ready 
101 ByteToLink(Value:W) D0 =-1 Timeout 
102 ByteFromLink() D0 =-1 " 
103 LongWordToLink(Value:L) D0 =-1 " 
104 LongWordFromLink((Value):L) D0 =-1 " 
105 SliceToLink((Buf):L,Len:L) D0 RealLen 
106 SliceFromLink((Buf):L,Len:L) D0 " 
111 TestError() D0 =1 Transputer Error 
112 SetReset() D0 =0 
113 SetAnalyse() D0 =0 
114 BootRoot((FileName):L) D0 <0 Error 
115 NewFunkOk() D0 ="ELK1" functions available 
116 BlockToLink((Buf):L,Len:L) D0= sent bytes
117 BlockFromLink((Buf):L,Len:L) D0= sent bytes -
118 BlockFromLink((Buf):L,XLen:L,YLen:L,Offset:L) without timeout
119 GetCommand(Buf) D0 =-1 no command found 
(as SliceFromLink but shorter timeout)

👉 Need short coding examples here

Programs and demos

As ATARI never planned something like this card, there’s no ready-to-use software… it’s up to you to create miracles 😊
But compared to my 8bit Transputer adapters, there’s quite some stuff to start with:

💾 Visit the Atari Transputer Software repo at GitHub (most recent) or get this ZIP archive containing everything discussed below.

Basic Testing

Yes, literally, we’re testing if your Transputer is working correctly using a BASIC program called T_TEST.GFA – so right, it’s GfA Basic in this case. But in essence it’s nearly the same used for my C64 or Apple II interfaces.
This little Program checks if it can find a link-interface, a Transputer and if so, which kind (16 or 32 bit). If that went OK, it does a little coms-speed test by reading 4KB from the Transputer and times that.

Mandelbrot fractal

You knew that this has to be the first thing to be written 😜
There are two Transputer binaries…

TMANDEL.PRG – the evil, dirty, down-to-the-metal, direct-to-screen-writing version.
This is good for getting an idea of how fast data is being pushed to the Atari ST without much handling overhead.
As this writes to the Screen directly, it only runs in “ST-High” resolution (i.e. 640x400x1).

GEMMAN.PRG – The well behaving GEM version.
It opens a window max’ed to the current resolution and starts plotting the fractal in 16 colors. This takes longer than TMANDEL, as it does quite a bit of GEM juggling before plotting a pixel…

Getting serious

So, this is the part for doing serious things with your Transputer(s) and specifically André Saischowas domain.
He did not only port all needed INMOS tools like iserver to run all the available development tools from back in the days (OCCAM, C, etc) but also ported the Helios server, i.e. the software which runs on the host (i.e. your ATARI) and communicates with the Helios Kernel(s) running on your insane Transputer Farm!
This is a good 75% of what the ATW800 offered – the missing 25% are the graphics which ran on the Blossom chip and was only accessible by the Transputer.

That said you’ll currently find 2 folders in the archive:

C-Code – contains the Mandelbrot demos
Andres – the serious stuff containing
- AUTO – the TBIOS driver and stuff needed during ATARI bootup
- BIN – the INMOS tools like iserver as well as the always-needed ispy utility
- D72UNI – contains the transputer hosted compiler environment based on d7205a (OCCAM) and d7214c (C-Compiler). Visit transputer.net for plenty of documentation on those. See the README in that folder.
- HELIOS11 – well, that’s the Helios v1.1 distribution. It’s way smaller than the v1.3 and good for an initial try. You can later switch to v1.3.1 following these steps.

There you have it (for now) – the ATARI ST is therefore the currently third best supported host platform after the PC running DOS or Windoze NT(!) or SPARCStations running Solaris 2.

68000, Atari, DIY Transputer interfacing

Tto68k

February 11, 2024 Axel Leave a comment

The Tto68k project started by a classic “phone call doodling” situation… but instead of drawing strange patterns I was fiddling alternately with one of Transputer TRAMs and a spare 68000 CPU I had laying on my desk.
At one point it dawned to me, that the 68000 classic 64pin DIL package perfectly fits in-between a TRAMs socket-pins 😲.

Obviously this discovery immediately had to go into a project which I called Tto68k – actually it is a spin-off from the STG[A]TW project which I recently did for the Atari Mega-ST.
So this is fully compatible and everything developed for that card (minus the VGA stuff, obviously).

Where space allows, the PCB offers certain features:

2 LEDs showing the Transputer status (running/error)
An external Link, compatible with the STGATW and my CPU-relocator. Thus you can connect to another TRAM on that one.
Dedicated 5V/GND pins to feed-in external power (if needed)
Version 1.1 will have two “multi-purpose” pins (see below)

So while the features are pretty basic compared to the STGATW, it has one advantage: The 68000 socket is system-agnostic. And I don’t mean just the different ATARI ST models (520, 1040, Mega) but other systems, too. E.g. the AMIGA, the entry Macintosh line etc. As some of them have more advanced bus management than the ATARI, I saved two of the CPLDs pins as “multi-purpose” pins.
For example in the case of an AMIGA these could be used for the configuration chain (/CFGINn, /CFGOUTn).
While in the ATARI STs those will be used for TOS ROM decoding… or whatever comes to my/your mind.

All that said, this post is just an announcement for now.
Like mentioned, I’m working on a Version 1.1 which will be much more usable, especially for other systems than just the ATARI ST.

Background

The MicroMac Carrera040

MacNosy – a users nightmare, a hackers heaven.

Same but different – which is where?

Macsbug reloaded

The drill

Approaching… difficulties.

sysDetect

2nd handler

instFPSP

3rd handler

The Vector Base Register

EmEmYou!

I did it my way…

“There and back again…”

This is the end…

TLDR;

Say hello to the STG[A]TW!

Why?!

Evolution – a quick excursion

The TRAM-Relocator

Want one?

Quick intro about standards & history

The Atari Transputer Workstation aka ATW800

KUMA K-MAX

Transfertech

This is how we do it

The static way

The elegant way – TBIOS

Programs and demos

Basic Testing

Mandelbrot fractal

Getting serious

home of real men's hardware