Tag Archives: Transputer

The T2C=

It had to be done… and now, 12 years later 😱, it is done:
Finally the T2C64 was poured into a proper PCB and some bells’n’whistles had been added – so it’s now on par with the T2A2 for the Apple II series. Say hello to the T2C=!

TLDR;

Here’s a quick intro for those being to lazy to follow the link to the T2C64.

This card enables your Commodore 8-bit computer to communicate with a Transputer Module piggybacked onto the card.
A Transputer is a 32bit RISC(ish) CPU from the 80’s that has the unique ability to connect to other Transputers by a very simple 2-wire protocol making it possible to create large, powerful computing networks – at least by 1980+ measures 😉

What can I do with it?
How to talk to the Transputer?
Can I do something useful with that?
How fast can I move data back and forth?
Ok, how much?

After many years the Commodore 8-bit bug had bitten me again and it was due time to put some love into my C64 Transputer interface.
But while at it, I thought it would be handy to use this card not just with my C64 but also on all other Commodore machines featuring an expansion port.

This led to the (to my knowledge) first 8-bit Commodore ‘flipper card’, i.e. it has a port connector on each end. One for the C64 & C128 and one for the C264 family, namely the C16, C116 and Plus/4. Yes, it works with all of them. Pull it from your C64, flip it 180° and plug it into e.g. your Plus/4. Cool, huh?
So here’s a quick feature list:

  • Edge connectors to connect to the Commodore
    • C64
    • C128
    • C16/C116
    • Plus/4
  • Each edge connector offers 2 I/O address ranges to be set by a jumper (0xDE00/0xDF00 and 0xFD90/0xFDF0)
  • Offers two TRAM (TRAnsputer Module) slots to connect either 2 Size-1 or one Size-2 TRAM
  • External Transputer-Link connector to connect the T2C= to larger external Transputer networks (pinout is the same as on the T2A2) – not populated on the pictures here.
  • The data-bus is fully buffered to prevent interference with other cards when used in e.g. an expander
  • 3.3V CPLD used to reduce power-consumption as much as possible.

Here’s the card in full glory… without a TRAM plugged in:

….and with one size-1 TRAM which itself provides a 32bit T800 Transputer and 128K RAM. The 2nd slot is still free.

TRAMs came in a wild range of variations. Be it CPUs used on them and/or the amount of RAM. But there also been peripherals like SCSI controllers or graphic cards – check my little TRAM page if you like to get an idea.
Yes, TRAMs are quite vintage and thus hard (or expensive) to get… but don’t despair… I’ve designed my own and also have some old ones in stock – probably enough to serve the hand-full of interested nerds 😉

What can I do with these?

I knew that was coming 😜
Well it mostly depends on you. The T2C= is an accelerator running its own code in its own RAM and can exchange data with your Commodore 8-bit machine – everything is possible.

All code examples and sources are available in this archive.
Commodore files are in a .D64 disk image.

Personally I always have the initial reflex to run a Mandelbrot fractal on everything’s slightly capable to do so. Most of the time, that’s where my euphoria ends and my project-ADHD kicks in… but that doesn’t stop you from having cool ideas.

Technically this setup isn’t much different from slapping a Raspberry Pi to your Commie and let that do stuff… but there’s something I’d like to call the “5 connoisseurs C’s” which might not be everyone’s cup of tee but very tempting to others:

  • Contemporary: Transputers are from the same era like your Commodore machine while being much more powerful – we’re talking about ~15MIPS here.
  • Completely different: Transputers are natively programmed in OCCAM, a very interesting, different language than the one you might be used to.
    That said, no worries, there are C, Pascal and Fortran compilers, too. Here’s a page offering a little “SDK” I created – it’s a VirtualBox image coming with everything you need to start coding.
  • Connected: Transputers are made to be networked into a parallel network… making your well programmed application running even faster as benchmarks show.
  • Challenging: “Well programmed” means wrapping your genius brain around multi-threaded, parallel paradigms or use the fast 2 or 4K on-chip RAM the most clever way.
  • Communicate: And finally, find a clever way to communicate with the host (i.e. your Commodore) and vice versa.

Ideas for using it could be raytracing, do complex calculations, heavily compress/manipulate data, use it as a simple storage (“Stupid-REU”) or write a Helios server for it and use your C= as a terminal [Helios is a UNIXish OS running on 1 to infinitive Transputers]

A final word of warning: While the T2C= uses very little power, a Transputer (and the RAM on a TRAM) does use quite some juice.
Depending on the TRAM this can be as little as 500mA up to 1A – which means your power-supply should be a stronger one.

Communication

Let’s start with the most simple and actually useful code:
Detect a connected T2C=/Transputer and check if it’s working correctly. This code was already shown on my T2C64 posts but now it’s enhanced for newly added machines and runs in BASIC V2 as well as V3.5 or V7.

After telling the base-address it does the following:

  • Init/Reset the Transputer to a sane state
  • Read & display the statuses of the Link interface
  • Write some data into the Transputers RAM and read it back
  • Finally, send a small program to the Transputer which makes it possible to find out its model (16bit T2xx, 32bit T4xx/T8xx or just a C004 programmable link switch)

So here’s the new TDETECT code:

100 SY=peek(65534):print chr$(147);"This seems to be a";
110 if peek(1177)=63 then poke1177,62:sy=peek(65534):poke1177,63
120 if sy=72 then print " c64": goto 160
130 if sy=23 then print " c128": goto 160
140 if sy=179 then print " plus 4 or c16":goto 160
150 print"n unknown model";print
160 print "select T2C= base address"
170 print "1: c64/c128 $de00 (56832, default)"
180 print "2: c64/c128 $df00 (57088)"
190 print "3: c264 $fd90 (64912, default)"
200 print "4: c264 $fdf0 (65008)"
210 print "5: enter your own"
220 input m
230 if m=1 then ba=56832: goto 290
240 if m=2 then ba=57088: goto 290
250 if m=3 then ba=64912: goto 290
260 if m=4 then ba=65008: goto 290
270 if m>5 goto 160
280 input "base address:";ba
290 print"initializing transputer"
300 do=ba+1:rem data out
310 is=ba+2:rem in status
320 os=ba+3:rem out status
330 re=ba+8:rem reset/error
340 an=ba+12:rem analyze
350 rem ------------------
360 poke re,1
370 poke an,0
380 poke re,0
390 rem clear i/o enable
400 poke is,0
410 poke os,0
420 print"reading statuses"
430 print"i status: ";(peek(is)and1)
440 print"o status: ";(peek(os)and1)
450 print"error: ";(peek(re)and1)
460 print"sending poke command"
470 pokedo,0
480 print"o status: ";(peek(os)and1)
490 :
500 print"sending test-data to t. (12345678)"
510 poke do,0:poke do,0:poke do,0:poke do,128
520 poke do,12:poke do,34:poke do,56:poke do,78
530 print"i status: ";(peek(is)and1)
540 :
550 print"reading back from t."
560 poke do,1:rem peeking
570 poke do,0:poke do,0:poke do,0:poke do,128
580 print peek(ba);peek(ba);peek(ba);peek(ba)
590 :
600 dimr(4)
610 print"sending program to transputer..."
620 forx=1to24
630 readt:poke do,t
640 wait os,1
650 nextx
660 print:print"reading result:"
670 c=0
680 n=ti+50
690 ifc=10 goto 760
700 if ti>n then ee=ee+1:if ee=10 goto 760
710 if(peek(is)and1)=0 goto 700
720 r(c)=peek(ba)
730 c=c+1
740 goto 680
750 rem ------------------------
760 if c=1 then print"c004 found"
770 if c=2 then print"16 bit transputer found"
780 if c=4 then print"32 bit transputer found"
790 data 23,177,209,36,242,33,252,36,242,33,248
800 data 240,96,92,42,42,42,74,255,33,47,255,2,0

Do something useful?

So now that we have detected a connected Transputer on our Commodore, it should do something useful like… adding numbers.
While this is way beneath his dignity, it’s a good example of uploading code to the Transputer and how to send and read data.

For this, I’d like to redirect you to the 2nd code example I’ve posted for the T2C64…

And finally in this former post I coded a Mandelbrot fractal (Video inside! 😉) for the C64 using cc65 and the TGI graphics library which calculates and displays the initial fractal within a minute or so.

Now having the whole family of C264 machines added, I thought it would be nice to have a demo for them, too.
So, just because it can do graphics out of the box,  I wrote the Mandelbrot “frontend” in BASIC 3.5. It worked but it was brutally slow…it takes like 10 minutes or so to get this screen 🐌

That is – of course – because BASIC is darn slow in doing the IO and plotting. Looking at one of the above examples, reading a byte from the Transputer means read a byte, set “pen” to the next coordinate, decide if to plot or not, repeat – in code (provided in the D64 disk image) this looks like that:

390 for y=0 to 199
400 :for x=0 to 319
420 ::px=peek(ba)
430 ::if px=32 then draw 0,x,y:else draw 1,x,y
440 :next x
450 next y

Because it’s so slow, I even didn’t need to check the input-status of the link-interface as the Transputer delivers the data much quicker than BASIC can say “next”…

This of course will be the ultimate show-stopper. What’s the sense of such a fast number cruncher, if you can’t get the data out of it fast enough?

Speed?

Mhh, so how long does it take to (just) read data from the T2C=?
Let’s start with BASIC to have milestone. This is “BAS-SPEEDTEST”, a very simple benchmark.
It loads a tiny Program into the Transputer which makes him spitting out an endless loop of counting from 1 to 10. Then we read the amount of 4KB and stop the time on that.

NB: As seen on the examples above, there’s an automatic handshaking in the way that the C012 link-interface chip on the T2C= sets a flag (Out-Status) each time there’s a byte ready to be fetched. But BASIC is so slow, that there’s always new data available the next round reading.

100 ba=56832:rem Adjust your base accordingly
110 dd=ba+1:rem data out
120 is=ba+2:rem in status
130 os=ba+3:rem out status
140 re=ba+8:rem reset/error
150 an=ba+12:rem analyze
160 rem ------------------
170 poke an,0
180 poke re,0
190 poke re,1
200 for d=1 to 500:next d
210 poke re,0
220 print"sending program to transputer..."
230 forx=1to33
240 readt:pokedd,t
250 waitos,1
260 nextx
270 print"reading incoming data..."
280 zeit=ti
290 for l=1 to 4096
300 in=peek(ba)
310 next
320 print"time for 4k:";(ti-zeit)/60
370 rem --- for the transputer
380 data 32,181,36,242,33,248,36,242,33,252,37,247
390 data 34,249,70,33,251,36,242,74,251,96,7,1,2,3
400 data 4,5,6,7,8,9,10

That showed it clearly… Basic is an IO-sloth.
Even without waiting for the Input-State ready it took the C64 16.5 seconds to read 4KB – nearly twice as long if we check for the input-status.

Machine without WAIT IS with WAIT IS
C64 16.5 29.4
C128 24.28* 40.5
Plus/4 19.8 33

*) It’s strange, that Basic V7 is even slower – investigation is ongoing
Talking to “the 128 Master” (Johan Grip) the mystery was solved.
Basic 7 also does a “long” fetch through extra vectors and code in ram. That does add quite a bit of overhead.
You could say that BASIC 7 has a “bad peek performance” 🙂

More speed, please!

Ok, let’s use something more mature… like the cc65 creating nice code for all our beloved Commodore machines.

This is a longer one... so please expand
#pragma static-locals(1);
 
#include <stdlib.h>
#include <time.h>
#include <conio.h>
#include <peekpoke.h>
#include "trproc.h" // that's in the provided archive
 
#define TOBEREAD 4096 // how many bytes should be read
 
static char tcode[33] = {
0x20, 0xB5, 0x24, 0xF2, 0x21, 0xF8, 0x24, 0xF2, 0x21, 0xFC, 0x25, 0xF7,
0x22, 0xF9, 0x46, 0x21, 0xFB, 0x24, 0xF2, 0x4A, 0xFB, 0x60, 0x07, 0x01,
0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0A
};
 
int main (void)
{
clock_t t;
unsigned long sec, kbps;
unsigned sec10;
 
int i;
char onechar;
 
clrscr ();
 
#if defined(__C64__) || defined(__C128__)
cprintf ("Expecting T2C= at 0xde00\r\n"); 
#elif defined(__PLUS4__) || defined(__C16__)
cprintf ("Expecting T2C= at 0xfd90\r\n"); 
#endif
 
/* Init Transputer - fixed to plus/4 dfeault for now */ 
init_t(); 
 
/* upload Transputer code */
cprintf ("Sending code to Transputer\r\n");
puttr(tcode, (sizeof(tcode)));
cprintf ("start reading %d bytes...", TOBEREAD);
t = clock ();
 
/* reading X KB byte by byte*/
for(i=0; i < TOBEREAD; i++) {
gettrchar(onechar);
}
cprintf ("done\r\n");
t = clock () - t;
 
/* Calculate stats */
sec = (t * 10) / 50;
sec10 = sec % 10;
sec /= 10;
kbps = TOBEREAD / sec;
 
/* Output stats */
cprintf ("\r\nDuration: %lu.%us (%lu byte/s)\n\r", sec, sec10, kbps);
 
/* Done */
return EXIT_SUCCESS;
}

But this time it just took 4.4/4.6/3.6 seconds on a 64/128/+4! 🏍💨 Four times faster than BASIC.

Are we there yet?

That looks promising and there’s still a C-compiler which we can optimize… read: replacing it with assembly code super-power.

For that I wrote a little macro-library for KickAssembler. Any other assembler will do, too, of course.
Besides the Transputer initialization and detection stuff there are macros for reading and writing a single byte, up to a “page” (256bytes) and the full 64K using two zero-page adresses… this is how we read the 4KB in the benchmark:

.label base = $de00 // define according to setting
.label inreg = base
.label outreg = inreg + 1
.label instat = inreg + 2
.label outstat = inreg + 3
.label reset = inreg + 8
.label analyse = inreg + $c
.label errflag = reset
 
		* = $C000 // or wherever you like
start:		
		inittr()  // macro init_transputer
 
		lda #$2E  // print a "." at 1/1 for debugging ;)
		sta $0400
 
		puttr_1page(bench, 34)  // upload the benchmark code
 
		lda #$00  // Set destination pointer to base ($8100)
		sta $FB   // in zero page
		lda #$81
		sta $FC
 
		gettr_big($FB, $1000) // get 4KB data and write it to $8100
 
		rts
 
		// code for benchmarking & testing (33 bytes)
		// The first byte is always 'sizeof(code)'
  bench:	.byte $21, $20, $B5, $24, $F2, $21, $F8, $24, $F2, $21, $FC, $25, $F7
		.byte $22, $F9, $46, $21, $FB, $24, $F2, $4A, $FB, $60, $07, $01
		.byte $02, $03, $04, $05, $06, $07, $08, $09, $0A

“Wrapped” as a SYS call into a BASIC program to stop the time – including the Transputer code upload and checking the input-status – this code takes 0.5 seconds to read 4KB and even with writing the read data to a defined memory area! 🚀
That’s 33 times faster than BASIC and still 8 times faster than cc65.

Getting one?

Still with me and you’re really interested in getting one of these?
Please go through this checklist first:

  • You’re aware that there’s no software for it yet
  • You’re aware that you have to code for the Transputer and are up to learning new things
  • You also need to write code on the Commodore side – assembly needed for maximum speed
  • Besides the T2C= you will need a TRAM. So you need to own/purchase one, too.

If you can answer all of them with “Yes”, “fine with me” and/or “sure!” I can provide you with a T2C= for €40 plus shipping.

I also have TRAMs available in different configurations:

  • My own design, the AM-B404  (size-1, 2MB SRAM): 40€
  • Various manufacturers: size-1, 1MB DRAM: Ask me.
  • “bargain offer”: original INMOS IMS-B404 (size-2, 2MB): 25€
    (given their size, they will clog your T2C= completely)

available CPUs are:

  • T425-20: 7€
  • T800-20: 12€
  • T800-25: 16€

For example, if you like to have a T2C=, my AM-B404 TRAM and a 20MHz T800 that would be 40 + 40 + 12 = 92€ plus shipping

Shipping with tracking is
European Union 9€  (Just in case, for Germany it’ll be 6€ as “Paket”)
UK/Switzerland 13€
USA 15€

⇒ drop me a mail tonobody likes SPAM
(Sorry, you have to type that into your mail-client – nobody likes SPAM, so do I)

The final Cube – Snow white coffin

This is it. Hooray! The final Cube as I always wanted it to be.
It just took me about 2 years of planning, blood, sweat & tears, huffing and puffing. Many tries to find the right parts, plenty materials evaluated always trying to keep the budget low.

The sleeping beauty – yeah, it’s a bit snow white coffin’ish

Meet the ancestors

You might have followed the route I took for quite some time now:
It began with the ‘Tower of Power‘, basically a component carrier with a power-supply.
After some years it was replaced by the first Cube.  Well, yes, while it had a somewhat cubic’ish case, is still was just a dull standard industrial case. Not really what I imagined how my computer should look.

Form follows function

If you’ve read some posts here on GeekDot, you might already got the impression that I’m a sucker for design. Well, not that kind of a surprise, given I studied design some decades ago 🙄
I’m also heavily influenced by the works and philosophy of Dieter Rams (mainly for BRAUN) and Hartmut Esslinger (of frog design), of which you might not have heard about, but you know their designs for sure…
So I fell in love with the Parsytec x’plorer and other iconic computer designs like these ‘cubistic’ examples:

Yes, I am a strong believer that a computer, while basically being a rather boring calculation tool, should look good, timeless and might give you an idea of its innards are actually doing something.
We could probably go on forever, defining how a well designed computer should look like. But like the Romans used to say: “non potest argui per gustum” (You can’t argue about taste)…
Let’s say, I’m probably not totally off, given that most designs I like are also on display at the Museum Of Modern Art 😉

So mentioning the Final Cubes design, it’s  case we’re talking about: If you’re really picky, then yes, the Final Cube is actually two cubes:

The carrier-cage on the top which I tried to keep simplistic and invisible to give the PCBs as much stage as possible. The user should be able to see the many, shining CPUs. So 10x10mm aluminum square tubes are connected by 3D-printed frame-corners to provide maximum view onto the technology.
For protection but also as a design statement and tribute to Rams’/Gugelots famous ‘snow white’s coffin‘ everything is surrounded by a 30x30x30cm translucent acrylic cube.

The white base actually isn’t cubical at all being much wider than tall. Nevertheless, its design should be even more simplistic and cautious to serve three purposes:

  • Give the computing-parts above it a proper podium
  • House the LED array which provides the fitting aura
  • …and finally house all the tech the user should not care about

With quite a big fan between the base and the top both work like a chimney (following the convection) sucking the air from the bottom and blowing it through 169 holes in the top plate of the cube.
Here’s an idea out how it looks “working”:

When one thing comes to another

The parts of which the Final Cube is build from aren’t all created this year. Actually only the case and the cage-frame are from 2019 – all other parts were designed by me some years before.

The core of everything are TRAMs – these are Transputer computing modules defined by Inmos back in 1990. The specific TRAMs used are my own AM-B404, each containing a 25MHz T800 and 2MB of  fast SRAM.

Finest home-made TRAMs

16 of these TRAMs are placed onto an Inmos B012 (or compatible) carrier board. And up to 10 of these carriers can be put into the Cubes carrier-frame creating the cluster.

Under the hood

Below the carrier-frame, in the base, you can spot a 32×16 LED panel. This one is actually from 2012 when I designed the T2i2c, an i2c-bus to Transputer TRAM.

Yes, that’s an Arduino Micro on top of a TRAM

So it was a natural move to make the T2i2c into a system-controller. It does not only controls the LEDs displaying the current load of all Transputers, but also using a photo-diode to set the display brightness as well as measuring the internal temperature and overall power consumption.

Here’s an overview of the base internals:

I know, the venting holes are not pretty – but they do their job and prevent you from accidentally touching the power-supply.

The red arrow points to the T2i2c being connected to the LED panel to the left as well as to a hall-sensor (blue arrow) measuring the power consumption, a temperature sensor (orange) and a photo-diode (green).
You cannot overlook the 22cm fan in the back sucking air from the bottom along the power-supply and pushing it up to the Transputers above to keep them cool.

And their power consumption is not to be trivialized. In average a single Transputer TRAM requires is about 1 ampere… so the math is easy. This means the quest for a powerful power-supply was on.
After some months I found what used to be the power-supply meant for a 3Com Corebuilder 7000: The mighty 3C37010A. A whopping 90A@5V should be OK for starters… here’s the fitting procedure:

Mooooore powerrrrrrr, Igor! You touch, you die!

The back of the medal…

The backside did not change compared to the previous Cube back  – well besides the supply-cabling which now goes down into the base instead to the side of the cage.


In consequence you’ll spot the power-connector there. No switch though  – still thinking about that… as well as a nicer cable-management for the link-cable which is normally connected to the host.

Next up would be a host matching the look. Mhhhh….

ATW800 Farmcard

Sometimes a man has to do what a man has to do 😉 And this time it got to be an ATW800 Farmcard.
A what? ATW800 stands for ATARI Transputer Workstation – basically an Atari ST and a Transputer subsystem in a tower case (just about 250 were build and sold) and looks like this:

Atw front.jpg

Background

Even I still do not own an ATARI Transputer Workstation (yet), I “know” it from the very first presentation when working at the 1988 CeBIT Atari booth back then. The initial demo version used a standard Atari Mega-ST connected to a blunt Atari-PC3 case which housed the Transputer stuff… and the whole thing ran the Helios Operation System about which you can read quite a lot in this GeekDot chapter. Well, actually that was no wonder given that Perihelion Ltd. was not only the creator of Helios but also the birthplace of the ATW800 (Called ABAQ in the early stages).

The final version (the one in the tower case) used so-called Farmcards to expand the number of Transputers available to the system. A Farmcard featured 4 fixed Transputers (T425 or T800) with 1MB of RAM each. Here’s a piccy (click to zoom):

That might be plenty back then, but even in those days progress was made quickly and INMOS presented their TRAM Modules at nearly the same time… and these give you so much more flexibility and also save space. Actually, if you closely read the marketing material from Perihelion you will spot the announcement of a TRAM Farmcard. It just never made it…

I did it my way…

So it was just a question of time until I start to design a TRAM-Farmcard. After many years of unsuccessfully hunting an ATW which is for sale, I at least met shock__ (atari-home.de forum) a lucky ATW800 owner who was happy to help. So using the available documentation and the data provided by shock__ I created this:

Features

So what gives? Well, first of all, there aren’t any original Farmcards available anymore. Even if there were, they’re pretty limited. One Megabyte is enough for a computing slave using pure OCCAM but if you’re using Helios, 300k will already ate up by its core demons. Then there’s space for just 4 Transputers, not many given the size of the card.
That said here’s what this new Farmcard provides:

  • 8 TRAM slots. Can be used for anything. Computing, SCSI, Ethernet or graphics TRAMs. Be it size-1 to 8. There were many, many cool modules available. Today you can at least use my AM-B404 TRAM, a super-fast 2MB size-1 compute TRAM.
    This would mean twice the CPU oomp, and double the RAM.
  • Freely programmable Altera EPM7032/64 CPLD. It will implement Perihelions diagnostic-bus, meant for controlling each Transputer separately. Updates can be done in-field.
  • An external RS422-link. Using differential transceivers (26LS31/32) one can connect external Transputer networks over a distance up to 20-30 meters (vs. 30cm in pure TTL).
  • 4 layer PCB design giving stable power distribution throughout the board. Still, simple layout, easy to patch and only through-hole parts being used to make building as easy as possible.
  • 2 LEDs. Just say’in…

Because of the added Transputer-Slots the network topology looks a bit different than is used to be on the original Farmcard.
This is the “old one”, a simple square, each Transputer has two free links being available at the edge connectors J1-J8:

And this is how the 8 slots are connected:

Now each Transputer has just one link connected to the edge-connectors. Slot 7 uses link-3 to connect to the optional external RS422 connector.

Free as free beer!

Because there are probably only two handful of ATW800 owners out there, I made the schematics freely available here under the GPL license. But to make things clear:
This is untested stuff! The card is huge. It is not cheap to have it manufactured. The best price for 10 PCBs I was able to find was about $250.

The hard part

While the cards design was pretty straightforward the “firmware” of the CPLD will be a different beast. As said, Perihelion used a proprietary bus called “diagnostic bus” (DBus) – a two wire bus used to address all or just a single Transputer in a network to either reset or put him into the so-called analyze-state to do some post-mortem debugging. Pretty advanced stuff given that standard Transputer networks simply used a global reset.

Luckily the DBus is documented in the (short) Farmcard Manual (p.8-10). So we have a rough idea what’s going to be expected:

The DBus is daisy-chained through all Farmcards on the slots as well as on the edge-connectors J9 & J10 (in/out). Because I hadn’t had an original Farmcard at hand I wasn’t sure which of these signals are needed to be controlled by the CPLD. So I connected them all using all available I/O ports.  This is the pinout:

pin function
04 5_ANA
05 1_RES
06 1_ANA
08 LED1 (Error)
09 L_IN
11 L_OUT
12 FAST_IN
14 FAST_OUT
16 SLOW_IN
17 SLOW_OUT
18 GLOBAL_RESET_IN
19 GLOBAL_RESET_OUT
20 T_ERR_IN
21 T_ERR_OUT
24 LED2 (Opt)
25 4_ANA
26 4_RES
27 2_ANA
28 2_RES
29 3_ANA
31 3_RES
33 7_ANA
34 7_RES
36 6_ANA
37 6_RES
39 0_ANA
40 0_RES
41 5_RES

So, this is it for now… it’s an ongoing project and it depends on you how fast progress will be made.

  • Do you own an ATW800? We’re looking for brave testers!
  • Are you a VHDL hero? Grab the manual and do your thing!
  • Transputer nut? Feel free to check the schematic… I’m not swearing it’s 100% bug-free.

To be continued…

RS422 Differential Adapter

This is a Q’n’D but handy design which I was planning for a long time, but you know how things go… A differential adapter is a simple device which translates a single signal/data line in one positive and one negative line, i.e. one is carrying the inverted signal of the other. This is also known as RS422… But why do we need such thing?

Well, differential signals are/were quite common in the world of Transputing to deliver OS-link signals over longer distances (<10m/30ft) or noisy environments and good examples are systems from Parsytec, the AVM-T1, ISA interfaces like the HEMA TA2 or even some custom measure/data-logging solutions.

So to connect to such systems using a non-RS422 (i.e TTL) interface card, e.g. the INMOS B004/008 or the Gerlach-Card you’ll need such an differential adapter.

Technical data

A very simple design on a 5x5cm PCB. All done in trough-hole technology (THT) to keep this a simple DIY project.
It all based around two AM26LS32 receivers and one AM26LS31 transceiver, some resistors and an unavoidable LED  😉

Usage is very simple, too. Just connect the TTL signals to the standard 2×5 shrouded PCB header connector and RS422 signals are provided on the 2×7 header connector on the opposite side of the PCB.
The TTL connector is the same as the one used on the Gerlach card or my T2A2 Apple II interface:

As you can see, pins 9 & 10 can provide 5V to the PCB so that no external power-supply is needed.
If you connect a system not able to provide power to the differential adapter there’s also a 2pin connector provided on the PCB itself.

The RS422-side of the adapter has this pin-layout:

Nice-to-know: You could also use this adapter to simply invert signals. For example if you need a positive ERR signal instead of the not’ed, you simply take the positive pin (i.e. even pin-number 6 in this case) of the 2×7 connector (ignoring the negative).
Mind there’s no power supply pin on that side and pins 11/12 are not connected.

T2A2 for everyone

While the guys over at OpenApple were speculating in their latest podcast what a Transputer actually is… well, there are about a 100 links and ways on this page to find out – but hey… just click here – and of course that box saying search would do great wonders, too.

But seriously, my bad: I should have mentioned that this page your looking at is really just one post of many… so make sure you start at a Chapter (Menu. At the Top. ↑…yes up there) and read them in the order you’d read a book 😉

After quite some years (and a handful Apple II users requests) I felt the urge to finally put the T2A2 prototype (read that post to get a more general understanding) into a real expansion card… and here it is: Say “Hello!” to the T2A2 version 1.1!

It came a long way…

t2a2_evolution

…in more detail:

t2a2_front

As you can see, the final T2A2 is much smaller than the prototype (which used an 8bit Baby one-for-all PCB) and offers many additional features

  • 2 size-1 TRAM slots (or one size-2) – double the processing power!
  • Low-power, low-profile parts used where possible (3.3V CPLD, HCT logic)
  • External Transputer-link available as edge connector – extend your network to “near infinitum
  • Jumpers for LinkSpeed and optional power to the Transputer-link connector
  • Fully buffered to be a good Apple II bus citizen
  • Works in any slot set to “your card”

Beyond this, everything said about the prototype is still true.

Most important:

a) It’s not tested in the ][ or ][+. I simply don’t own one of those.
Update: One owner reported it’s perfectly working in his IIe!
b) The T2A2 won’t instantly speed up any of your Apple II[e/gs] applications.

It’s more like a co-processor attached to it. And even then, you’ll need something really calculation-intensive to justify the time you’ll loose due to communication between the Apple and the Transputer. A single square-root for example wouldn’t make much sense – but having a complex algorithm (like the Mandelbrot fractal in my demo)  does absolutely make sense, as you just pass the parameters to the Transputer and let him do the sweating.

But on the other hand, FPU cards like the Innovative Systems FPE (using an M68881) or my even faster clone “NumberCruncher Reloaded” did just send instruction by instruction. So I somehow fancy the idea writing a SANE driver for GS/OS to integrate the T2A2 more transparently.

Want your own T2A2?

So now you’re keen to get one yourself? Please check this list first:

  • Do you have a TRAM already? (*)
  • You are aware that there’s no real software (yet) besides my little Mandelbrot demo?
  • You are keen to program something yourself – or are fine to wait until somebody else did?

While the Apple II side of coding is pretty easy, you have to get a grip about the Transputer development, too. That includes a DOS/Windows (<=XP) setup and some knowledge of C and/or OCCAM. I’ve created a little Transputer “SDK”, namely a VirtualBox image running DOS.

Plenty of Dev-Docs are available here.  
I suggest using the INMOS cross-compilers for C or OCCAM. An alternative C compiler came from LSC, which might suit you more if you don’t like the INMOS stuff.

Ok, so you’re still with me… so I have the first batch of 20 T2A2 PCBs ready which I will populate on-demand, and for 40€ (plus shipping) one of them can be yours.
(*) I might be able to offer you a TRAM, too. The price depends on available model, RAM size and CPU used.

⇒ drop me a mail tonobody likes SPAM
(Sorry, you have to type that into your mail-client – nobody likes SPAM, so do I)

Also check the T2A2 forum for current availability, shipping procedure and built status.

Some more technical details

Here’s a T2A2 with a size-1 TRAM installed in Slot-0:

t2a2_with_tram

The T2A2’s CPLD programming can be updated any time through a JTAG port (the lower 2×5 pin-row at the edge).
The jumper above it can be used to set the linkspeed for the TRAMs (10 or 20mbps). If you look very close, there’s a tiny LED next to that jumper. It’s the error LED controlled by the CPLD.
The next single jumper enables the VCC pins on the external Link connector, meant for (small!) external extensions. This connector is the same used on the Gerlach card and is very convenient because of its ubiquitous standard 2×5 shrouded pcb header connector. Here’s the pinout:

As said, the V1.1 T2A2 offers two size-1 TRAM slots. Before plugging in 40MIPS of raw processing power consider the amount of juice being pulled there. Depending on the amount of RAM and load a single TRAM can use up to 800mA of power!  😯
Given the max. of 4A on a standard IIgs power-supply, two TRAMs could bring your souped-up GS into trouble… it’s better to use the external connector with just one smaller TRAM or even simply bridge the Link0In/Out pins with Link3In/Out so that the T2A2 works as a TRAM-less adapter.
That said, there are size-2 TRAMs in existence which will snugly fit  and won’t hurt that much.

T2shield – Arduino to Transputers

[The T2shield is currently a work-in-progress project – the first alpha hardware is operating but lots of coding lies ahead!!]

The idea for the T2shield was born when I thought about getting recent peripherals like SD-cards, Ethernet, displays etc. into a Transputer network.
Sure, there are the old, original TRAMs like the STM228 SCSI controller, the B431 Ethernet TRAM or quite some choice of graphics controllers. But they’re all rare like chicken teeth these days which means unaffordable when they rarely pop up on ePay.
Rebuilding them is also a no-no given the obsolete parts they’ve used… and to be honest: A noisy, vintage SCSI-1 drive isn’t what I thought of.
So after some time I came to the conclusion it will be the easiest and cheapest way to re-use what you can buy for a few bucks from China these days: Arduino shields.
As a lucky incident such a shield perfectly fits onto a size-2 TRAM – Yay!

Without much further ado, here is the T2shield (v0.1) with a shield loosely put on top to get an idea…

t2shield

Arduino shields mainly use the SPI bus for data transfer. With its 10+Mbps SPI is much faster than e.g. I²C (which I used for my T2I2C TRAM) and can cope with fast(er) peripherals.
But this also ruled out a slow IMSC011/12 link-adapter design like the one used on the T2I2C. Also there’s no of-the-shelve SPI master controller to simply glue a bus to it.
So the design turned out much more advanced this time:

  • A 16bit Transputer (with 64KB SRAM) handles the high OS-link speeds as well as the ‘glue’ like File/Ethernet handling.
  • A comparably big CPLD (100pins, 128 marcocells) implements the SPI master controller as well as handles the T2xx memory- and interrupt-handling.
  • All Arduino digital-pins and nearly all analog-pins are connected to the CPLD to adjust to any available shield.

Also this new TRAM design is a “first” for me in many aspects:

  • 1st usage of a 16-bit T2xx Transputer (vs. 32bit)
  • 1st serious utilization of a CPLD (besides the baby-steps taken with the T2A2)
  • 1st Size-2 TRAM

Initial status

Transputer

[wppb progress =100 option=green] 100% done

A T222 Transputer is -after fixing some stupid errors- happily running and has control over his external RAM.
Here’s the first sign of life by RSPY (a brilliant ispy‘ish tool of Michael I really need to write a post about soon):

# Part rt Link0 Link1 Link2 Link3 RAM@cycle
0 T2   20 HOST  ...   ...   ...   4K@1,56K@2|

That means: A T2xx @ 20mbps linkspeed is connected to the host (PC) and has 4K internal and 56K external RAM… wait a second! just 56K? Yes, that’s because the internal 4K overlaps and RSPY leaves the up-most 4K untouched as most peripherals are mapped there and poking there could create unwanted effects…

CPLD

[wppb progress =10 option=yellow] 10% done

All basic Transputer controls are in place, LEDs are controlled by the CPLD

TODO: Implement the SPI master in VHDL. That’s THE biggest challenge for me being a total VHDL noob. Again, I wouldn’t got this far already without the tremendous help from Michael Brüstle, my VHDL Jedi-Master.

Drivers

[wppb progress =100 option=red] 0% done

Write the (Helios) driver to handle file and/or network access. This should be (optimistically) be compatible to existing file- or Ethernet ‘servers’. This will probably be the most time consuming task.

Progress?

This is (also) the first time I’m posting about a project which is not finished. So I will keep you posted about the progress with new posts in this “T2shield” chapter of this wonderful little page.
If you do own

  • a Transputer network
  • a free size-2 TRAM slot
  • a T2xx Transputer
  • some VHDL/Occam/Helios driver skills & knowledge

shoot me a mail and I might provide you a T2shield to join the development.

That said, here’s a first little sign of life:
The Transputer writes numbers to a mem-mapped address (0xFC00) while the CPLD reads the lowest 2 bits and displays them by 2 of the 3 LEDs on the T2shield

Inmos B008

Introduced at the sunset of the Transputer era, the INMOS B008 was the successor of the B004 of which it dramatically differed:

  • 16-bit ISA connector (for Interrupts/DMA, it’s still an 8 bit data path)
  • 10 TRAM slots
  • C004 and T2xx on board

The usual source provides the full manual/documentation… down to the GAL equations.
All these enhancements enabled the B008 to create simple, yet powerful Transputer networks on a single expansion card and still makes it the perfect platform for todays Transputer retro experiments/fiddlings.

Here’s a late “Rev.F” version of the IMSB008:

IMSB008E

The previous revision E had a ceramic version of the T2xx and used GALs instead of the CPLD you can stop in aboves picture.

B008_revE

Clones

Obviously the Inmos B008 had clones as the B004 did.

Transtech TMB08

Well, I guess there’s nothing which Transtech did not clone and/or improve…
In this case they used a AMD MACH CPLD from the beginning, a PLCC version of the T225 and everything else was SMD.

TMB08

Alta SuperLink/XL Transputer PC Card

Alta Technology was a spin-off of Computer System Architects (CSA).
The SuperLink/XL board used TRAM modules like the Inmos B008, but featured a 20 MHz IMST225 Transputer to handle external interfacing and a beefed-up host interface design. Like the B008 it supported 10 TRAM’s

Altacor SuperLink XL

ARADEX

One of my first TRAMs was an ARADEX one, so this is a first for me, too. Actually I wasn’t aware that ARADEX also made TRAM carriers… but they seem to do quite some:

TransAT Plus

Their interpretation of the B008 theme features just 8 TRAM slots (vs. the 10 “standard”) which are strangely enough sorted in ascending order (0-7) and not like the B008 order of 1, 5, 6, 2, 0, 4, 7, 3, 8, 9. Also the TransAT provides two 9pin D-subminiature connectors in contrast to the B008s 37pin connector – most likely its layout is the so called Aalener Link-Interface. That includes the RS-422 differential drivers and receivers next to the connector (AM26LS23’s and AM26LS33’s).

Concerning other parts, besides an C012 they used some sort of CPLD – presumably to control the 16 bit bus – in the photos I have, I can clearly see the traces coming form the ISA Bus’ D[8-15].
It also uses the IRQs 10-12.

Last fun bit: They named the two GALs with their JED filenames. Very handy 😉

aradex_transat-plus

TransAT-8

The “8” in its name makes it a valid member of this collection – but it misses an C004 and also has just 8 TRAM slots… so maybe it should be considered somewhere between an IMSB004 and B008?

This time TIs 74ALS192/193 differential drivers are used, the PLCC chip is yet unknown… and in this case, a TTL chip (next to the ISA slot) is missing… all in all, it looks like a refresh of the TransAT Plus to me.

I even have a piccy of the back-side. Why the heck is somebody putting a GAL there?

Tuning the Mandelbrot benchmark

It’s an open secret that the CSA mandelbrot benchmark tool (available in my ‘basic Transputer tools‘ package) is one of my favorite benchmark and test-tool when playing around with my various Transputer toys.
One fine day I thought VGA with more than 16 colo(u)rs  would be nice… and the coding began. First step: Put the original source (well, already enhanced by a timer and some debugging) on github.

The original CSA Mandel program uses the official 640×480 16 color VGA mode (aka 0x12) and uses its own calls for that, i.e. no external 3rd party libs. Very manly 😉 but not very colorful…

Mandel_16

So I created the first branch (aka Mandel_3) added a more “modern” command-line options handling and dived into hand-coding VBE (VESA BIOS Extensions) matters. That was very instructive and fun… and the first results showed that I didn’t just got 256 colors now but draw speed was increased, too  😯

Look Mom! More colors:

Mandel_256

Running in host-mode (/t) on my P200MMX the initial screen took 6.6s vs 7.1s for 16-colors – so a difference of 0.5s or 7% should be much higher on Transputers, so I thought. And should this mean that bigger Transputer farms had been bottleneck’ed by the actual plotting of pixels?

Because 256 colors and higher resolutions (up to 1280×1024 depending on your VGA cards VESA BIOS) are fine, but even more colors are better, I branched the code a 2nd time (MANDEL_BGI) and replaced the VBE code by a BGI SVGA interface.
While originally Borland only supports VGA, there are 2 BGI drivers written by 3rd party developers which do support SVGA and up to 24-bit colors.
It’s commonly known that BGI is not the fastest graphics interface on planet earth… and the benchmark proved this:

P200MMX Orig VESA SVGA SVGA256
1 7.123 6.623 8.911 6.915
2 38.258 36.635 39.717 37.725

I was hoping the change would have more impact when running the same on my Cube system… well it didn’t:

65x T800 (integer) Orig VESA SVGA SVGA256
1 2.323 2.288 3.940 2.383
2 8.163 8.173 8.181 8.164

So as final conclusion, I will stay with the VBE SVGA drivers included in the V3.x code – it’s a good compromise between overall code/distribution size, comfort and speed.
The original VGA mode (0x12) will stay in the code forever to get comparable benchmark measurements – if you really need CGA/EGA/Hercules, you can always use the 2.x version.

The Cube

Meet The Cube – this is the Transputer Power-House successor to the Tower of Power, which was a bit of a hacked frame-case and based on somewhat non-standard TRAM carriers with a max. capacity of just 24 size-1 TRAMs…

The Cube hardware

This time I went for something slightly bigger  😎 …A clear bow towards the Parsytec GigaCube within a GigaCluster.
The Cube uses genuine INMOS B012 double-hight Euro-card carriers, giving home to 16 size-1 TRAMs – Parsytec would call this a cluster and so will I.
Currently The Cube uses 4 clusters, making a perfect cube of 4x4x4 Transputers… 64 in total. Wooo-hooo, this seems to be the biggest Transputer network running on this planet (to my knowledge)
If not, there still room left for more  😯

Just to give you a quick preview, this is what ispy responds when ran against the Cube:

Using 150 ispy 3.23 | mtest 3.22  # Part rate Link# [ Link0 Link1 Link2 Link3 ] RAM,cycle  0 T800d-24 276k 0 [ HOST ... ... 1:1 ] 4K,1 1024K,3; Display all 64 lines
1 T800d-25 1.7M 1 [ ... 0:3 2:1 3:0 ] 4K,1 2048K,3; 2 T800d-24 1.8M 1 [ ... 1:2 4:1 5:0 ] 4K,1 2048K,3; 3 T800d-25 1.8M 0 [ 1:3 6:2 5:1 7:0 ] 4K,1 2048K,3; 4 T800d-24 1.8M 1 [ ... 2:2 6:1 8:0 ] 4K,1 2048K,3; 5 T800d-25 1.8M 0 [ 2:3 3:2 8:1 9:0 ] 4K,1 2048K,3; 6 T800d-24 1.8M 2 [ ... 4:2 3:1 10:0 ] 4K,1 2048K,3; 7 T800d-24 1.8M 0 [ 3:3 10:2 9:1 11:0 ] 4K,1 2048K,3; 8 T800d-25 1.8M 0 [ 4:3 5:2 10:1 12:0 ] 4K,1 2048K,3; 9 T800d-25 1.8M 0 [ 5:3 7:2 12:1 13:0 ] 4K,1 2048K,3; 10 T800d-24 1.8M 0 [ 6:3 8:2 7:1 14:0 ] 4K,1 2048K,3; 11 T800d-24 1.8M 0 [ 7:3 14:2 13:1 15:0 ] 4K,1 2048K,3; 12 T800d-25 1.8M 0 [ 8:3 9:2 14:1 16:0 ] 4K,1 2048K,3; 13 T800d-25 1.8M 0 [ 9:3 11:2 16:1 17:0 ] 4K,1 2048K,3; 14 T800d-24 1.8M 0 [ 10:3 12:2 11:1 18:0 ] 4K,1 2048K,3; 15 T800d-25 1.8M 0 [ 11:3 ... 17:1 19:0 ] 4K,1 2048K,3; 16 T800d-24 1.8M 0 [ 12:3 13:2 18:1 20:0 ] 4K,1 2048K,3; 17 T800d-25 1.8M 0 [ 13:3 15:2 20:1 21:0 ] 4K,1 2048K,3; 18 T800d-25 1.8M 0 [ 14:3 16:2 ... 22:0 ] 4K,1 2048K,3; 19 T800d-25 1.8M 0 [ 15:3 22:2 21:1 23:0 ] 4K,1 2048K,3; 20 T800d-25 1.8M 0 [ 16:3 17:2 22:1 24:0 ] 4K,1 2048K,3; 21 T800d-25 1.8M 0 [ 17:3 19:2 24:1 25:0 ] 4K,1 2048K,3; 22 T800d-25 1.8M 0 [ 18:3 20:2 19:1 26:0 ] 4K,1 2048K,3; 23 T800d-25 1.8M 0 [ 19:3 26:2 25:1 27:0 ] 4K,1 2048K,3; 24 T800d-24 1.8M 0 [ 20:3 21:2 26:1 28:0 ] 4K,1 2048K,3; 25 T800d-25 1.8M 0 [ 21:3 23:2 28:1 29:0 ] 4K,1 2048K,3; 26 T800d-25 1.7M 0 [ 22:3 24:2 23:1 30:0 ] 4K,1 2048K,3; 27 T800d-24 1.8M 0 [ 23:3 30:2 29:1 31:0 ] 4K,1 2048K,3; 28 T800d-25 1.8M 0 [ 24:3 25:2 30:1 32:0 ] 4K,1 2048K,3; 29 T800d-25 1.8M 0 [ 25:3 27:2 32:1 33:0 ] 4K,1 2048K,3; 30 T800d-25 1.8M 0 [ 26:3 28:2 27:1 34:0 ] 4K,1 2048K,3; 31 T805d-20 1.7M 0 [ 27:3 ... 33:1 35:0 ] 4K,1 1024K,3; 32 T800d-24 1.8M 0 [ 28:3 29:2 34:1 36:0 ] 4K,1 2048K,3; 33 T800d-20 1.8M 0 [ 29:3 31:2 36:1 37:0 ] 4K,1 1024K,3; 34 T800d-24 1.8M 0 [ 30:3 32:2 ... 38:0 ] 4K,1 2048K,3; 35 T800c-20 1.8M 0 [ 31:3 38:2 37:1 39:0 ] 4K,1 1024K,3; 36 T805d-20 1.7M 0 [ 32:3 33:2 38:1 40:0 ] 4K,1 1024K,3; 37 T800c-20 1.6M 0 [ 33:3 35:2 40:1 41:0 ] 4K,1 1024K,3; 38 T800d-20 1.6M 0 [ 34:3 36:2 35:1 42:0 ] 4K,1 1024K,3; 39 T800d-20 1.7M 0 [ 35:3 42:2 41:1 43:0 ] 4K,1 1024K,3; 40 T800d-20 1.8M 0 [ 36:3 37:2 42:1 44:0 ] 4K,1 1024K,3; 41 T800d-20 1.7M 0 [ 37:3 39:2 44:1 45:0 ] 4K,1 1024K,3; 42 T800d-20 1.8M 0 [ 38:3 40:2 39:1 46:0 ] 4K,1 1024K,3; 43 T800d-20 1.8M 0 [ 39:3 46:2 45:1 47:0 ] 4K,1 1024K,3; 44 T800d-20 1.8M 0 [ 40:3 41:2 46:1 48:0 ] 4K,1 1024K,3; 45 T800d-20 1.8M 0 [ 41:3 43:2 48:1 49:0 ] 4K,1 1024K,3; 46 T800d-20 1.7M 0 [ 42:3 44:2 43:1 50:0 ] 4K,1 1024K,3; 47 T800d-20 1.8M 0 [ 43:3 ... 49:1 51:0 ] 4K,1 1024K,3; 48 T800d-20 1.8M 0 [ 44:3 45:2 50:1 52:0 ] 4K,1 1024K,3; 49 T800d-20 1.6M 0 [ 45:3 47:2 52:1 53:0 ] 4K,1 1024K,3; 50 T800d-20 1.8M 0 [ 46:3 48:2 ... 54:0 ] 4K,1 1024K,3; 51 T800d-20 1.8M 0 [ 47:3 54:2 53:1 55:0 ] 4K,1 1024K,3; 52 T800d-20 1.8M 0 [ 48:3 49:2 54:1 56:0 ] 4K,1 1024K,3; 53 T800d-20 1.8M 0 [ 49:3 51:2 56:1 57:0 ] 4K,1 1024K,3; 54 T800d-20 1.6M 0 [ 50:3 52:2 51:1 58:0 ] 4K,1 1024K,3; 55 T800d-20 1.8M 0 [ 51:3 58:2 57:1 59:0 ] 4K,1 1024K,3; 56 T800d-20 1.7M 0 [ 52:3 53:2 58:1 60:0 ] 4K,1 1024K,3; 57 T800d-20 1.8M 0 [ 53:3 55:2 60:1 61:0 ] 4K,1 1024K,3; 58 T800d-20 1.8M 0 [ 54:3 56:2 55:1 62:0 ] 4K,1 1024K,3; 59 T800d-20 1.8M 0 [ 55:3 ... 61:1 ... ] 4K,1 1024K,3; 60 T800d-20 1.7M 0 [ 56:3 57:2 62:1 63:0 ] 4K,1 1024K,3; 61 T800d-20 1.6M 0 [ 57:3 59:2 63:1 ... ] 4K,1 1024K,3; 62 T800d-20 1.8M 0 [ 58:3 60:2 ... 64:0 ] 4K,1 1024K,3; 63 T800d-20 1.8M 0 [ 60:3 61:2 64:1 ... ] 4K,1 1024K,3; 64 T800d-20 1.7M 0 [ 62:3 63:2 ... ... ] 4K,1 1024K,3;

Here are some more figures:

  • 32 x T800@25Mhz/2MB  (my very own AM-B404 TRAMs)
  • 32 x T800@20MHz/1MB  (mainly TRAMs from MSC and ARADEX)
  • -> 96MB of total RAM
  • -> 70-130 MFLOPS (single precision)
  • ~800MIPS combined integer power
  • ~60Amps @5V needed (That’s 300W  😯 )

So we’re talking about 70-130 MFLOPS here – depending which documentation you trust and what language (OCCAM vs. Fortran) and/or OS you’re using. That was quite a powerhouse back in 1990 (Cray XM-P class!)… and dwarfed by a simple Pentium III some years later 😉
Just for to give you an comparison with recent hardware (Linpack MFlops):

Raspberry Pi Model B+ (700 MHz) ~40 DP Mflops
Raspberry Pi 2 Model B (1000 MHz – one core) ~134 DP Mflops
Raspberry Pi 3 Model B (1200 MHz – one core) ~176 DP Mflops

Short break for contemplation about getting old…

Ok, let’s go on… you want to see it. Here it is – the front, one card/cluster pulled, 3 still in. On the left the mighty ol’ 60A power supply:

Cube_Front_4x

Well, this is the evaluation version in a standard case, i.e. this is meant for testing and improving. I’m planning for a somehow cooler and more stylish case for the final version (read: Blinkenlights etc.).

And here’s the IMHO more interesting view… the backside. It shows the typical INMOS cabling.

CubeBack

As usual, I color coded some of the cables.
The green arrow points to the uplink to the host system to which The Cube is connected to. Red are the daisy-chained Analyse/Reset/Error (ARE) signals. The yellow so-called jumper-cables connect some of the IMSB004 links back into the boards network. And in the upper row (blue) four ‘edge-links’ of each board are connected to its neighbor.

This setup connects four 4×4 matrices (using my C004 dummies as discussed here)  into a big 4×16 matrix. Finally I will ‘wrap’ that matrix into a torus. Yeah, there might be more clever topologies, but for now I’m fine with this.

Building up power

For completeness, here’s a quick look at how things came together.

The 4 carriers/clusters with lots of size-1 TRAMs… upper right one is the C004-dummy test board (now also fully populated). Upper left is pure AM-B404 love <3

ITEM_futter

Fixing/replacing the broken power-supply (in the back), including the somewhat difficult search for a working cooling solution:

ITEM_repairing

The Cube software

Well there isn’t any specific software needed to run The Cube, but it definitely cries out loud for some heavily multi-threaded stuff.

So the first thing has definitely to be a Mandelbrot zoom. As usual, I used my very own version with a high-precision timer, available in my Transputer Toolkit.

Here’s the quick run in real-time – you can still figure out visually each Transputer delivering its result:

Other Transputer and x86 results of this benchmark can be seen in this post over here.

We need (even) more power, Igor!

So this is running fine – using internal RAM only. On the other hand, it seems that the current power supply has some issues with, well, the electric current.
When booting Helios onto all 64/65 Transputers which uses all of the external RAM, very soon some of them do crash or go into a constant reboot-loop.
By just reducing the network definition (i.e. not pulling any Transputers) to 48, Helios boots and runs rock-solid.
Because measuring the voltage during a 64-T boot shows a solid 5.08V on all TRAM-slots it most likely means the power supply either can’t deliver the needed amount of Amps (~60) or produces noise etc.  😥
So this is the next construction site I have to tackle.

To be continued…

Lies, damn lies and benchmarks

As soon you’re talking about Transputers with people which weren’t there back in 1985 you’ll be asked this very soon: “How fast are these Transputer thingies”? Then there’s a stakkato of “MIPS? Whetstones? Dhrystones?” etc…

As always with benchmarks, the only valid answer is “it depends”. Concerning Transputers that’s even more true.
First, I suggest you read this Lies, Damn lies and benchmarks document from INMOS itself. It pretty much describes the dilemma and all the smoke and mirrors around that matter.

Benchmarks? It depends.

So you’ve read the above INMOS document? As you might saw, it’s full of OCCAM code. That’s the #1 prerequisite to get fast, competitive code (as long you’re not into Transputer assembler). From there it gets worse if you use a C compiler or even FORTRAN…

My little benchmark

Because it scales so well, works with integer as well as floating point CPUs and also runs on the x86 host while using at least the same graphic output routines, my personal benchmark is CSAs Mandelbrot tool (DOS only).
My slightly modified version is part of my Transputer Toolkit, which is downloadable here. You will need that version because I extended the code of this Mandelzoom with a high precision timer (TCHRT, shareware, can’t remove the splashscreen, sorry) when run with the “-a” parameter. You’ll need my provided default “MAN.DAT” file, which contains 2 coordinates to calculate (1st & 2nd run) to get comparable numbers.

So to bench your Transputer system start it with:

 man -v -a

which runs it in VGA mode (640x480x16c), loads the coordinates from “MAN.DAT” and when done presents you with a summary screen like this:

csa_mandel_timer

To run it on your hosts x86 CPU, call it with “man -t -v -a”

The Results

Here are my results of the different Mandelzoon runs I made in the past. The blue background marks the host machine results, yellow are the integer timings and green is where the mucho macho things are happening.. well, sort of 😉
There are two columns for the results, the HD timer and the hand-timed runtimes. This is because these are from days before I enhanced the Mandelzoom.
This table will continously updated of course. e.g. the last row is pretty new – what might that system be?  😯

The sources are available in my github repository – so we can collaborate on enhancing and optimizing it.

HD in-programm Timer (s) Hand-Timed
System 1st 2nd 1st run 2nd run Comment
i386DX/33 (0kb L2) 1800 0 1:30:00
(canceled)
0 Canceled 1st run after a quarter of Mandelbrot was done…
i386DX/33 (0kb L2) + 387 588 3316 0:09:48 0:55:16
Am386/40 (0kb L2) + 387 490 2980 0:08:10 0:49:40  21% faster clock but only 10.5% better result
i386DX/33 (128k L2) + 387 274 1547 0:04:34 0:25:47
Am386DX/40 (128k L2) + 387 228 1292 0:03:48 0:21:32
i486DX/33 (8k L1, 0k L2) 01:06.24 368.56 Pretty close to a single T800-20
i486DX2/66 (8k L1, 128k L2) 00:33.72 185.51 Very close to 2x T800-20
Pentium 133 (256kb L2) 00:09.09 00:55.01 About 8x T800-20
Pentium 200 MMX 00:07.13 00:38.06 About 9x T800-20
AMD K6-3+/266 00:06.00 00:32.00 Downclocked, 64k L1, 256kb L2, 1M L3
Core i3-2120 3.3GHz 00:01.66 00:02.13 VirtualBox,1 CPU
1x T425-20 0:00:25 0:02:28   There’s something wrong here – needs re-run
2x T425-20 00:51.55 04:56.60
3x T425-20 00:34.42 03:17.81
4x T425-20 00:25.86 02:28.56
5x T425-20 00:20.74 01:58.96
6x T425-20 00:17.37 01:39.19
9x T425-20 11 62 0:00:11 0:01:02
13x T425-20 8 42 0:00:08 0:00:42
21x T425-20 5 27 0:00:05 0:00:27
25x T425-20 4 23 0:00:04 0:00:23
65xT425 (48x25Mhz, 16x20MHz) 00:02.323 00:08.163 Actually it was 64xT800 and one T425 forcing the calculation to integer
1x T800-20 01:09.13 06:27.18
1x T800-25  0:00:55 0:05:09 25% higher clockrate should result in 17.5% speedup. Incl comm-overhead that pretty much fits
1x T800-30 00:00.46 00:04.30
2x T800-20 00:35.65 03:13.79
3x T800-20 00:23.16 02:09.32
4x T800-20 00:17.43 01:37.04
5x T800-20 00:14.04 01:17.74
6x T800-20 00:11.82 01:04.83
5x T800-25 11 62 0:00:11 0:01:02
9x T800-20 8 40 0:00:08 0:00:40
13x T800-20 5 30 0:00:05 0:00:30
17x T800-25 00:03.8 00:18.59  “1st run” shows that the slow ISA interface is really  getting a bottleneck
21x T800-20 4 18 0:00:04 0:00:18
33x T800-20 00:02.88 00:11.97
65x T800 (32×25, 33x20Mhz) 00:02.21 00:05.74