So here we go, C64 first. Some moons ago I was at least midwife if not the mother of the 8BitBaby manufactured by the faries and elves over at individual computers.
This little card features 4 slot-connectors to the most used home-computer systems as well as a CPLD… and makes a perfect basis for my first C64 “cartridge” ever 😉
The schematic was taken from “Das Transputerbuch” by U. Gerlach and being put into the CPLD saving lots of TTL chips. Actually all you need then is an Inmos C012 or C011 link-interface chip, a 5MHz oscillator and some resistors and caps.
I refrained from designing a complete Transputer system but went the easy route by interfacing to a TRAM. A TRAM is a complete “Transputer system on-a-stick”, like those used in my Tower of Power. All you need for a TRAM is an interface to your system. For the C64 you would need the thing I’ve called T2C64 (aka “Transputer to C64”)
The T2C64 prototype in its full glory:
A short overview what’s on the card:
The left edge is the connector to the C64s expansion slot (ignore the other edges).
The square black chip is the Altera CPLD mainly containing the bus-interface and the Transputer Analyse/Reset/Error signal handling. Ah, and it controls the 2 LEDs.
The long black chip above the CPLD is an Inmos C012. This guy interfaces an 8bit parallel bus to the serial Inmos link ‘bus’.
The silver thing is a 5MHz oscillator to clock the C012.
And finally the longish module at the top edge is a TRAM with 128KB RAM and a mucho-macho T800 Transputer.
The CPLD maps the C012 register into the memory area of 0xde00, using just 6 addresses:
Data in 0xde00
Data out 0xde01
reset/error 0xde08 (
analyse 0xde0c (
So programming it pretty easy, still, you have to read some Inmos manuals about the protocol etc… that said the next chapters will show you some code examples I’ve slapped together just to test the hardware. All sources and binaries (.D64 image) are available in a zip file over here, obviously most of that does not make sense without having a T2C64…
Read on in the next posts for demo-code and even a video. “uuh, videos!”
These 4 sequential PEEKs print 12 34 56 78. Yay! A very simple Ram-disk if you want.
Now something more serious. A very small programm is sent to the T and executed. Instead of peek and poke (1/0) the initial ‘command’ is the length of the program (23 in this case), so the T starts executing after he received the 23rd byte.
The program itself is trivial, after some initialisation it writes
to link0out. The use of it is very simple. If the Transputer is working correctly the first 2 bytes will allways be “0xAA”, if it is just a 16bit Transputer (T2xx) there won’t be more than 0xAAAA. A 32bit Transputer returns the full 0xAAAA0000. Voilá, that’s a simple T-detection.
To handle timeouts end error conditions I had to do some ugly things in BASIC. Sorry, don’t blame me, BASIC v2 is just… very basic.
128 DIM R(4)
129 PRINT"SENDING PROGRAM TO TRANSPUTER..."
130 FOR X=1 TO 24
140 READ T:POKE56833,T
150 WAIT 56835,1
160 NEXT X
170 PRINT:PRINT"READING RESULT:"
181 IF C=10GOTO 220
189 IF TI>N THEN ER=ER+1:IF ER=10 GOTO 220
190 IF (PEEK(56834)AND1)=0 GOTO 189
210 GOTO 180
211 REM ------------------------
220 IF C=1 THEN PRINT"C004 FOUND"
230 IF C=2 THEN PRINT"16 BIT TRANSPUTER FOUND"
240 IF C=4 THEN PRINT"32 BIT TRANSPUTER FOUND"
1000 DATA 23,177,209,36,242,33,252,36,242,33,248
1001 DATA 240,96,92,42,42,42,74,255,33,47,255,2,0
Next up a bit more useful example, reading the Transputer executable directly from a binary file instead of having it in-line.
This example actually puts the Transputer to some sort of use… well, it’s still way beneath him but it does its job as an example.
Two tasks: a) Read the Transputer executable from disk and b) exchange data with the Transputer.
Here we go:
10 PRINT"INIT TRANSPUTER"
15 IF (PEEK(57107))=100 THEN POKE57105,9
20 POKE 56840,0
30 POKE 56844,0
40 POKE 56840,1:POKE 56840,0
50 FOR W=0 TO 1000:NEXT W
60 POKE 56834,0
70 POKE 56835,0
80 REM READ STATI
90 PRINT "I STATUS: ";(PEEK(56834)AND1)
100 PRINT"O STATUS: ";(PEEK(56835)AND1)
110 PRINT"ERROR: ";(PEEK(56840)AND1)
140 PRINT"O STATUS: ";(PEEK(56835)AND1)
Up to here everything should be clear. Init Transputer, read stati -just to be sure.
Now we’re reading the executable (“GRAB”) from floppy and poke the bytes to the Transputer as they come flying crawling in:
150 DIM R(4)
160 PRINT"SENDING PROGRAM TO TRANSPUTER..."
170 REM READ TRANSPUTER BIN FILE
180 OPEN 1,8,2,"0:GRAB,S,R"
190 FOR I=0 TO 1 STEP 0
200 IF ST=64 GOTO 240
220 POKE 56833,A
230 NEXT I
240 CLOSE 1
Ok, the Transputer binary expects us to tell it the number of integers we’re about to send to it (AN), the we send these (X) and finally re-read the results… which should all be X+7.
255 POKE 56833,AN
260 FOR AA=1 TO AN
270 PRINT"ENTER ";AA:INPUT X
290 NEXT AA
295 PRINT"GETTING";PEEK(56832);" RESULTS"
300 FOR BB=1 TO AN
320 NEXT BB
Some final words about Transputer executables. There are many kinds depending which development system was used. Some need special loaders or bootfiles before the actual binary will be send to the Transputer. A ‘raw’ upload is only possible with binaries mostly having the extension of “.btl” (BooTfromLink). AFAIK only the OCCAM SDK and the Inmos C-compiler ‘icc’ will create those executables.
For those geeks who actually (still) do know their OCCAM, here’s the source of GRAB:
PROC first(CHAN OF BYTE in, out)
And now something which you knew it would come: Mandelbrot time! 😉
I don’t want to bore you with all the details before you had it seen in action… so here we go:
“Man, that was brilliant! And even you had a lot of geek-babble in there, I want to know more!”
Ok Timmy, let’s go into detail…
Like I said in the video, the Transputer is finally doing something for real, he’s actually doing the most of the work, crunching through a 320x200x8 Mandelbrot, 32 iterations in double-precision floating point. The code itself was written 1988 in OCCAM by Neil Franklin and is available on his page.
This shows the general beauty of Transputers: If the code is written flexible enough to fit into any topology, it’ll run on any platform!
That said, I ran into a certain limitation on the C64. The Transputer mandelbrot executable expects the initial data (resolution, coordinates, iterations) to be send in a specific order and format. While the order isn’t the problem, the format is: The coordinates have to be doubles (C-lingo i.e. 64bit IEEE 754 compliant float). The C-Complier I’ve used for this demo (CC65) doesn’t now a flying s*** about floats or even doubles.
So to get that demo done ASAP I tricked myself a bit and used the same technique we’ve seen in my 1st demo when POKEing something the Inmos-way:
To get the coordinates for left (-2.0), right (1.0), top (1.125) and bottom (-1.125) over to the Transputer they had to be converted into 64bit IEEE 754 format, ordered into little-endianess and finally put into an array like these:
Yes, that’s a bit awkward, but it was OK to get a fast start. The ‘problem’ with this quick-hack is that the demo is pretty static, i.e. no zooming into the Mandelbrot. If you know of a quick way to create IEEE 754 compliant doubles from a long (which is the biggest floating point variable CC65 can handle, so
isn’t an option here) I’m happy to hear from you.
Of course I could have the Transputer do the typecasting but in this case, as part of a demo, I wanted to keep the original binary untouched.
In the video you saw (or didn’t because of the blurry picture) that the timer printout was about 70s… and as said, normally it takes about 60s to complete the fractal – and it did in the video, too! Watch the video-timer or check with a stopwatch. What I’ve forgot to take out from the timing was the actual upload of the code into the Transputer.
The Transputer binary is in this case a bigger array in the C-source, so it’s not being loaded from floppy but directly pushed to the Transputer after it was initialised. This takes some extra time which also went into the stopwatch timing… I’ll correct that in a later version.
“Later version” is a good catchword. If this wouldn’t be just a demo for now, there are obviously plenty of ways to optimise things:
First of all one should take off the burden of converting the colors from the C64 and let the Transputer do that.
My second idea would be to reduce the communication overhead (polling) by having the Transputer to render the whole screen into his own RAM and when done have it ‘pumped’ down to the C64
Yes, DMA would be cool but that’s not possible (yet)
Ok, that’s about it for now. The T2C64 is still in its prototype stage and I can image many more cool things to add… but first I will have a ‘proper’ circuit board being made.
Final words: Don’t get too excited about the acceleration of the C64… it’s not accelerated at all. It’s more like a co-processor attached to it. And even then, you’ll need something really calculation-intensive to justify the time you’ll loose due to communication between the C64 and the Transputer. A single square-root for example wouldn’t make sense at all. 100 sqrts in one go would certainly do.
Of course adding another linkOut/In to the T2C64 to get more Transputers involved into the calculation would be the final step. This is planned for the next version of the hardware but the bigger part of the work would be a complete rewrite of the Mandelbrot code to have it broken down to parts being run in parallel on each Transputer… which closes the loop to today where programmers are trying to wrap their brains around multithreaded programming. 22 years after the first Transputer was released 😉