Tag Archives: Transputer

GigaCluster

The GC system (GigaCluster) was Parsytec’s entry into the supercomputing world, aiming at the TOP500 list.
Actually there were two models available. The pure Transputer model was called “GCel” (GigaCluster entry level) while its successor using Motorola’s PPC601 CPUs was simply called “GC”. Sources on the Web mix those two model names at random – I’m sticking to GC throughout this article, even if it’s the Transputer model.

Architecture

A GC machine is built up from a number of GigaCubes. Each GigaCube represents a self-contained unit/case with its own power supply, I/O channels and interconnection to other GigaCubes. Each GigaCube contains 64 T805 processors packaged at high-density.
A GigaCube was available as a stand-alone machine, called the GC-1/64 (Peak performance = 12.8 GIPS (32-bit), 1.6 GFLOPS (64-bit))
This is a picture of 4 GigaCubes (GC-2, i.e. 256 Transputers)

GigaCubeCases

This picture shows the biggest GC ever built. It’s the GC-3 (1024 CPUs) delivered to the University of Paderborn (Germany), which is now stored away somewhere in the Heinrich-Nixdorf Computer Museum in the same city (thanks to Abraham V. for giving me the update that they removed it from public display. Shame on them! – been there myself in 2018 and yes, it’s still not on display):

GC_paderborn

Machines larger than the GC-3 (>= 1024 processors) would have required water cooling which is facilitated by the use of “heat pipes”. Here’s a sketch I’ve found showing where the cooling was located in the GigaCube housing.

GC-Cooling

How did Parsytec connect those heat-pipes with the board? By a lucky incident a nice guy provided me with new, detailed photos describing how cooling was managed.

Contrary to a “normal” GigaCluster CPU board (or cluster, as described further down), CPU boards in GC-3 were covered by a massive aluminum heat-spreader, featuring “half-pipes” milled into the aluminum as shown in this picture:

GC_CPUboards

When put back-to-back into the Backplane, those half-pipes created a full pipe into which the heat-pipe was “sunk”. Very clever, indeed.

GC_CPUboardsPipes

Fun-findings: If a GC-4 had ever been built, it would have looked like this rendering (bear in mind: 4096 Transputers!)

And here’s a nice press-photograph showing Angela Merkel building a GC… naah, just kidding:

Node

A GigaCube consists of four clusters of 16 processors and has self-contained redundancy, control processor, power supply and cooling.
A cluster is the basic architectural unit and consists of 16 Inmos IMS T805 transputers running at 30MHz, the EDC-protected memories (up to 4Mbytes per T805), a further redundant T805, the local link connections and 4 Inmos C004 routing chips. Each link of the T805 is connected to a different C004, thus making it hardware fault tolerant. Redundancy in a cluster ensures overall probability of failure is less than that of a single typical chip.
This is a diagram of how a cluster was connected internally:

cluster

And this is a photograph of a cluster as it was used in the Parsytec x’plorer. Besides the missing C004s it is identical to those used in the GC:

FullBoard

I/O

Inside a GigaCube each processor cluster has eight dedicated links with a bidirectional bandwidth of 20 Mbytes/s. Each of the two sets of 16 links with an additional control link forms a basic I/O channel. These are logically driven by the control processor and therefore allow it to control the attached devices if required. For the largest systems, sharing I/O devices amongst the GigaCubes is achieved with a special module (IONM) which may be cascaded.

Up to 4 clusters were plugged into a Backplane which looks like this:

GC_Backplane_1

Next to the four cluster-boards, a buffer-board was seated handling the communication “outside of the cube”:

GC_Bufferboard_1

As you most certainly know, Blinkenlights are an absolute must-have (and the 6th commandment in Axel’s laws ;)), so each Cube had its own LED panel – 128 LEDs, 2 for each Transputer in the Cube – hidden behind a sleek acrylic glass:

GC_LEDpanel

Topology

The communications structure of the machine is software configurable. Each T805 has four hard links and up to 16384 virtual links. Each hard link is connected to a C004 32×32 way cross-bar switch. The C004 can determine the destination of a message and switch automatically with extremely low latency.
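To make “software configurable” a bit more tangible, here is a minimal Python sketch of a 32×32 crossbar like the C004. It assumes that configuration software explicitly connects each output to one input; the class and function names are made up for this illustration and do not come from any Inmos or Parsytec tool.

```python
class Crossbar32:
    """Toy model of a 32-way crossbar switch (hypothetical, for illustration only)."""

    SIZE = 32

    def __init__(self):
        # Maps an output port to the input port currently routed to it.
        # A missing key means the output is disconnected.
        self.routes = {}

    def connect(self, inp, out):
        """Route input `inp` to output `out`, replacing any earlier route to `out`."""
        for port in (inp, out):
            if not 0 <= port < self.SIZE:
                raise ValueError(f"port {port} out of range 0..{self.SIZE - 1}")
        self.routes[out] = inp

    def disconnect(self, out):
        """Remove whatever is routed to output `out` (no error if nothing is)."""
        self.routes.pop(out, None)

    def source_of(self, out):
        """Return the input feeding output `out`, or None if it is disconnected."""
        return self.routes.get(out)


def link_pair(switch, a, b):
    """A bidirectional Transputer link between ports a and b needs two one-way routes."""
    switch.connect(a, b)
    switch.connect(b, a)
```

With four such switches per cluster, changing the network topology boils down to rewriting these routing tables instead of replugging cables.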

Operating System

Officially the GC machines were meant to be used with PARIX. PARIX is based on UNIX with parallel extensions, supports Remote Procedure Calls, and its I/O library is a subset of the POSIX standard.
Being a “good Transputer system”, the GCs can also run Helios. Helios supports Parsytec’s special reset mechanism out of the box.

Parsytec was involved in the first phase of the GPMIMD project. However, disagreement with the other members (Meiko, Parsys, Inmos and Telmat) on the adoption of a single physical architecture prompted them to announce their own T9000 machine, based on the design of the GC machines. Due to INMOS’ problems with the T9000, Parsytec switched to the Motorola 604 PowerPC CPUs. This led to their “hybrid” systems, demoting Transputers to communication processors and having the PPCs do the computational work.
While I cannot support that policy, here’s a separate Post about those bastard machines due to frequent requests 😉

SuperCluster

The Parsytec SuperCluster was the first system moving away from the classic Eurocard (wikipedia) based machines like the predecessor “MegaFrame” – also, due to the availability of the FPU in the new T800, the previously supporting Motorola M68881 was dropped.

This is what a SuperCluster looks like (NB: two MegaFrames sitting on top of it):

SuperClusterTotale

(This specific system still stands as an exhibit in the halls of the University of Paderborn. It has 320 Transputers, 4MB RAM each, total performance is about 1.1 GFLOPS. Picture courtesy of “[CD]Overkill”)

This is the CPU Module (4 nodes, 4MB each) – while being bigger because of the use of many DIP parts, the basic design is very similar to the later GigaCluster node:

MegaFrameTotal

And this is the so-called NCU- or XBAR-Module (Network Connection Unit/Crossbar, in this case the “L” version) consisting of 13 C004 Network-switches and one controlling T425 Transputer:

MegaFrameXbarDetail

A “Map” of the NCU (Red CPU-Node, Blue C004s, Orange RS422 transceiver):

MegaFrameXbarMap

If my interpretation is correct, 4 CPU Modules were combined with one NCU, so 16 Transputers were connected to a 12-C004 Network using the 13th C004 to connect this “Cluster” to the ‘outside’, e.g. backplane with more Clusters.

For completeness’ sake, here’s a picture of the NCU “version R”. It consists of just 10 C004s and the controlling Transputer is a T222 – given its layout in pure DIL, this might be the predecessor of the L-model.
As I’ve learned from another “SuperCluster Enthusiast” (that probably makes it 2 worldwide ;-)), both NCUs were used in a SuperCluster. “L” and “R” simply stand for “Link” and “Reset”, so those signals were handled by separate NCUs.

SC_XBAR-R

The two larger ICs in front of the T222 are SRAMs (64K total) making the design even simpler.

x’plorer

It’s yet unclear how to write its name correctly. Some sources use XPlorer, others xPlorer and many other permutations. As my model has this logo at the front of its case xpl_logo

I’m going to use x’plorer. Having this burning issue off my chest, let’s go into details 😉

The x’plorer is more or less a single “slice” (Parsytec called them clusters) of a GigaCube, which used 4 of those clusters in its smallest 64-Transputer version (GC-1). Thus, the x’plorer features 16 Transputers, each having access to 4MB RAM, in a gorgeous desktop case. If you want, you could call it a “GC-0.25”. xplorer_freigestellt

Sorry, but I have to rave a bit about its case design. I first saw it at the CeBIT fair in 1995, when it was on display at the “IF forum design” (I explained that in the Parsytec intro already). I was literally pulled through the room and stood in front of it for minutes.
It looked like a typical FROG design work (famous for the NeXT cube, Apple //c, some 68k Macs, SPARC Stations and much more) and I told this to everybody who asked… but I just figured out I was wrong and it was actually designed by a relatively small studio called Via 4 Design.

The grey L-shaped base/back is made of sheet metal and contains the power supply in the base, a tiny bit of logic (described further down) and 2 big fans in the back.

The big blue main body is made of cast metal – no puny plastic crap. Yes, it’s heavy.
At first sight everybody thinks “hey, that’s a clever design, using the case as a heat sink!”. That’s obvious as those fingers/burlings sticking out to the left and right really do look like they are just that. And yes, as far as I was told, initially they planned to use them for cooling to circumvent noisy fans… but for some reason this didn’t work out and so they’re just a -admittedly very cool- design element.

Dissection

Let’s open that beauty. After loosening two screws at the back you can remove the whole back-panel holding the 2 big fans. The first thing you’ll see is a rather simple, 2-layer circuit board containing the RS422 interface logic, i.e. 2 Am2631 and 2 Am2632.
This is the board removed from the case. Front:

RS422_1

….and the back. A bit patched, some resistors added.

RS422_3

Those extra cables and resistors were retrofitted to allow the user to “partition” the x’plorer into either 1×16 or 2×8 Transputers. Therefore my x’plorer has a simple switch on its back to either select the full or half system. As there are 4 connectors in the back (2x TTL, 2x RS422) two users can use the system at the same time.

After removing the L-Shaped base from the main part, you can slide-out the blue side-panels leaving you with the dark grey frame holding the main circuit board… looking like this:

xPlorer_Workbench

You might wonder about the ‘workshop environment’ – well, actually you need quite some tools to open the x’plorer. Like with some Italian cars ;-), you need a different screwdriver for each screw. Here’s what I remember: Allen key size 2, 2.5 and 3. Also handy are a Torx size 3 and a Phillips size 2 as well as a rubber hammer to carefully remove the dark-grey frame (which consists of 3 parts).
All in all, given the wild mixture of techniques and material used, I have the strange feeling that the x’plorer was created in a somewhat shirt-sleeve approach.

After some wiggling, careful knocking and a bit of cursing, you will have the main circuit-board lying in front of you – actually there are two boards, back-to-back, connected by a tiny backplane still hidden in the metal frame in the picture below.
Definitely this is the high-tech part of the whole x’plorer using state-of-the-art technology of its time (1992):

FullBoard

A closer look

The board’s layout is nice and clean. You can understand its design by simply looking at it – so let’s do so. Here’s a top-view of 3 ‘Blocks’ or ‘Columns’ as an example… all 16 look the same:

Topview_3_blocks

At the top there’s the T805-30MHz Transputer. Right below is an IDT49C460, a 32-bit Error Detection and Correction chip which generates check bits on a 32-bit data field. Then there are lots of buffers and drivers which connect to the 4MB of RAM below them (80ns, so 4 cycle speed). Nice little feature: 2 SMD LEDs for error and running.

On each side, between 4 of those ‘blocks’, two little circuit boards are seated in 2 sockets (labelled ‘U1004’ and ‘U1005’). These are hard-wired replacements for the normally installed C004 link switches. “Normally” means that if this cluster had been installed in a GigaCube, there would be 4 C004s on each cluster allowing free and dynamic configuration of the Transputer network topology. So in an x’plorer you have to live with a fixed 4×4 network, which is not bad, because configuring 4 C004s can be painful 😉

C004_dummies1

At the same spot, there’s a nice marking of the ‘fathers’ of this board, namely Mr. M.Vyskcocil and L. Wassmann.
Also, the official product name of this board is “GCT8PEDC” and it’s Revision 1.3.

Model_GCT8PEDC

The “other x’plorer”

Well, yes, there was another x’plorer in existence. As with the GC-system, Parsytec later offered the “PowerXplorer” – you may have guessed it already: It used the cluster from the PPC601 powered GC systems. I wrote a dedicated post over here about those hybrids, bastards, you-name-it 😉

Let’s roll

Now that we know what’s inside that beautiful case, let’s see if it’s still working. As mentioned before, Parsytec had a special way of resetting the Transputer(s). Instead of resetting all the Transputers in the network at the same time (like INMOS did), Parsytec systems were designed so that each Transputer can reset its 4 direct neighbours. So you need to load special software into a Transputer to do so.
That said, after powering on the system, all Transputers will be reset automatically. This is the only chance to run standard Transputer tools like ‘ispy’… once.

So I connected my trusty old Gerlach card to my selfbuilt RS422 interface and brought a cable from there into the x’plorer.
At its back, the x’plorer features four 8-pin sockets made by LEMO (sockets are EGG.2B.308, so the plug might be FGG.2B.308 I guess) looking like this:

xpl_connectors

If you’re lucky enough to own one of those expensive plugs/cables, this is the wiring:

      1        1 Reset out+    2 Reset out-
   2     8     3 Link out+     4 Link out-
  3       7    5 Link in-      6 Link in+
   4     6     7 Reset in-     8 Reset in+
      5

If you don’t have a LEMO plug, use the 2×5 sockets inside the x’plorer case right behind the LEMO sockets. Their pin-out should be:

10  RO+  RO-  9
 8  LO+  LO-  7
 6  NC   NC   5
 4  LI-  LI+  3
 2  RI-  RI+  1
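If you want to build an adapter between the two connectors, the two pin-outs above can be cross-referenced mechanically. The following Python sketch does just that; it assumes the abbreviations RO/LO/LI/RI on the 2×5 header stand for Reset out, Link out, Link in and Reset in, and it simply trusts both tables as listed (the header pin-out is itself only a “should be”).

```python
# Signal on each LEMO pin, taken from the wiring table above.
LEMO = {
    1: "RO+", 2: "RO-", 3: "LO+", 4: "LO-",
    5: "LI-", 6: "LI+", 7: "RI-", 8: "RI+",
}

# Signal on each pin of the internal 2x5 header (NC = not connected).
HEADER = {
    10: "RO+", 9: "RO-", 8: "LO+", 7: "LO-", 6: "NC",
    5: "NC", 4: "LI-", 3: "LI+", 2: "RI-", 1: "RI+",
}


def lemo_to_header():
    """Return {LEMO pin: header pin} for pins carrying the same signal."""
    by_signal = {sig: pin for pin, sig in HEADER.items() if sig != "NC"}
    return {pin: by_signal[sig] for pin, sig in LEMO.items()}


if __name__ == "__main__":
    for lemo_pin, header_pin in sorted(lemo_to_header().items()):
        print(f"LEMO {lemo_pin} ({LEMO[lemo_pin]}) -> header pin {header_pin}")
```

Double-check the result with a continuity tester before plugging anything in; this is only bookkeeping, not a verified cable diagram.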

Now let’s call ispy and see what happens…

Using 150 ispy 3.23 | mtest 3.22
# Part rate Link# [  Link0  Link1  Link2  Link3 ] RAM,cycle
0 T800d-24 265k 0 [   HOST    1:0    ...    ... ] 4K,1 1024K,3;
1 T805d-30 1.3M 0 [    0:1    2:1    3:2    4:3 ] 4K,1 4092K,4.
2 T805d-30 1.8M 1 [    5:0    1:1    6:2    7:3 ] 4K,1 4092K,4.
3 T805d-30 1.8M 2 [    7:0    ...    1:2    8:3 ] 4K,1 4092K,4.
4 T805d-30 1.8M 3 [    6:0    ...    ...    1:3 ] 4K,1 4092K,4.
5 T805d-30 1.8M 0 [    2:0    9:1   10:2   11:3 ] 4K,1 4092K,4.
6 T805d-30 1.8M 2 [    4:0   11:1    2:2    ... ] 4K,1 4092K,4.
7 T805d-30 1.8M 3 [    3:0   10:1   12:2    2:3 ] 4K,1 4092K,4.
8 T805d-30 1.8M 3 [    ...   12:1    ...    3:3 ] 4K,1 4092K,4.
9 T805d-30 1.8M 1 [    ...    5:1   13:2   14:3 ] 4K,1 4092K,4.
10 T805d-30 1.8M 2 [   14:0    7:1    5:2   15:3 ] 4K,1 4092K,4.
11 T805d-30 1.8M 3 [   13:0    6:1    ...    5:3 ] 4K,1 4092K,4.
12 T805d-30 1.8M 2 [   15:0    8:1    7:2    ... ] 4K,1 4092K,4.
13 T805d-30 1.8M 2 [   11:0    ...    9:2    ... ] 4K,1 4092K,4.
14 T805d-30 1.8M 3 [   10:0    ...   16:2    9:3 ] 4K,1 4092K,4.
15 T805d-30 1.8M 3 [   12:0   16:1    ...   10:3 ] 4K,1 4092K,4.
16 T805d-30 1.8M 2 [    ...   15:1   14:2    ... ] 4K,1 4092K,4.

Tadaa! There they are: 16 30MHz Transputers running at full steam ahead! Some intense brain-boggling hours later, I was able to draw a “mesh-map”:

Host
|
4---1---3---8
|   |   |   |
6---2---7--12
|   |   |   |
11--5--10--15
|   |   |   |
13--9--14--16
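The brain-boggling part can be double-checked by machine. Below is a small Python sketch (not one of the tools I used back then) that parses the ispy rows, verifies that every link is reported from both ends, and compares the resulting connections with the 4×4 mesh as drawn. The RAM columns are dropped from the rows to keep it short, and treating processor 0 as the host-side Transputer that is not part of the mesh is an assumption of this sketch.

```python
import re

# ispy rows from the run above (RAM columns dropped to keep the sketch short).
ISPY = """\
0 T800d-24 265k 0 [   HOST    1:0    ...    ... ]
1 T805d-30 1.3M 0 [    0:1    2:1    3:2    4:3 ]
2 T805d-30 1.8M 1 [    5:0    1:1    6:2    7:3 ]
3 T805d-30 1.8M 2 [    7:0    ...    1:2    8:3 ]
4 T805d-30 1.8M 3 [    6:0    ...    ...    1:3 ]
5 T805d-30 1.8M 0 [    2:0    9:1   10:2   11:3 ]
6 T805d-30 1.8M 2 [    4:0   11:1    2:2    ... ]
7 T805d-30 1.8M 3 [    3:0   10:1   12:2    2:3 ]
8 T805d-30 1.8M 3 [    ...   12:1    ...    3:3 ]
9 T805d-30 1.8M 1 [    ...    5:1   13:2   14:3 ]
10 T805d-30 1.8M 2 [   14:0    7:1    5:2   15:3 ]
11 T805d-30 1.8M 3 [   13:0    6:1    ...    5:3 ]
12 T805d-30 1.8M 2 [   15:0    8:1    7:2    ... ]
13 T805d-30 1.8M 2 [   11:0    ...    9:2    ... ]
14 T805d-30 1.8M 3 [   10:0    ...   16:2    9:3 ]
15 T805d-30 1.8M 3 [   12:0   16:1    ...   10:3 ]
16 T805d-30 1.8M 2 [    ...   15:1   14:2    ... ]
"""

# The mesh as drawn above, row by row.
GRID = [[4, 1, 3, 8], [6, 2, 7, 12], [11, 5, 10, 15], [13, 9, 14, 16]]


def parse_ispy(text):
    """Return {processor: [(peer, peer_link) or None for each of the 4 links]}."""
    net = {}
    for line in text.splitlines():
        row = re.match(r"\s*(\d+)\s.*\[(.*)\]", line)
        if not row:
            continue
        links = []
        for field in row.group(2).split():
            peer = re.fullmatch(r"(\d+):(\d)", field)  # "HOST" and "..." give None
            links.append((int(peer.group(1)), int(peer.group(2))) if peer else None)
        net[int(row.group(1))] = links
    return net


def asymmetric_links(net):
    """List (processor, link) pairs whose far end does not point back at them."""
    bad = []
    for proc, links in net.items():
        for link, peer in enumerate(links):
            if peer is None:
                continue
            other, other_link = peer
            if net.get(other, [None] * 4)[other_link] != (proc, link):
                bad.append((proc, link))
    return bad


def edges(net):
    """Undirected connections between processors, ignoring link numbers."""
    return {frozenset((p, peer[0])) for p, links in net.items() for peer in links if peer}


def grid_edges(grid):
    """Connections between horizontal and vertical neighbours of a drawn mesh."""
    result = set()
    for r, row in enumerate(grid):
        for c, node in enumerate(row):
            if c + 1 < len(row):
                result.add(frozenset((node, row[c + 1])))
            if r + 1 < len(grid):
                result.add(frozenset((node, grid[r + 1][c])))
    return result


if __name__ == "__main__":
    net = parse_ispy(ISPY)
    print("one-sided links:", asymmetric_links(net))
    host_link = {frozenset((0, 1))}  # processor 0 hangs off node 1, outside the mesh
    print("matches the drawn mesh:", edges(net) - host_link == grid_edges(GRID))
```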

After this ispy run you can’t reset the network again if you don’t use proper Parsytec tools… or Helios.

So quickly running the ispy output through my little Perl script (available in the Helios chapter on this page), some small adjustments and here it is, the Helios network map:

Network /Net {
Processor 00 { ~IO, ~01, , ; system; }
Processor IO { ~00; IO; }
{
Reset { driver; ; pa_ra.d }
processor 01 { ~00, ~02, ~03, ~04; }
processor 02 { ~05, ~01, ~06, ~07; }
processor 03 { ~07,    , ~01, ~08; }
processor 04 { ~06,    ,    , ~01; }
processor 05 { ~02, ~09, ~10, ~11; }
processor 06 { ~04, ~11, ~02,    ; }
processor 07 { ~03, ~10, ~12, ~02; }
processor 08 {    , ~12,    , ~03; }
processor 09 {    , ~05, ~13, ~14; }
processor 10 { ~14, ~07, ~05, ~15; }
processor 11 { ~13, ~06,    , ~05; }
processor 12 { ~15, ~08, ~07,    ; }
processor 13 { ~11,    , ~09,    ; }
processor 14 { ~10,    , ~16, ~09; }
processor 15 { ~12, ~16,    , ~10; }
processor 16 {    , ~15, ~14,    ; }
}
}
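For illustration, the `processor` lines of such a map can be generated from a plain neighbour list. This is a Python sketch that merely reproduces the line layout seen above; it is not the Perl script mentioned earlier, and the function name is made up. The root processor, the IO processor and the Reset driver entry still have to be written by hand.

```python
def helios_processor_line(number, neighbours):
    """Format one 'processor' entry of a Helios resource map.

    `neighbours` holds four entries in link order (link 0..3): the number of
    the connected processor, or None for an unconnected link.
    """
    if len(neighbours) != 4:
        raise ValueError("a Transputer has exactly four links")
    # An unconnected link becomes an empty, three-character-wide field.
    fields = ["   " if n is None else f"~{n:02d}" for n in neighbours]
    return f"processor {number:02d} {{ {', '.join(fields)}; }}"


if __name__ == "__main__":
    # Two rows taken from the ispy output above.
    print(helios_processor_line(1, [0, 2, 3, 4]))
    print(helios_processor_line(3, [7, None, 1, 8]))
```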

Obviously you have to ‘compile’ a .map file out of that and adjust your initrc file – all this is described in the above mentioned Helios chapter.
When you have done everything correctly, Helios will happily boot your x’plorer and you can enjoy the mighty power of 16 (well, 17) Transputers!

The Inmos B020

Well, normally I wouldn’t list Inmos’ expansion cards here, as they’re normally just TRAM carrier boards and very well documented at the usual places… but for this card, it’s different.

Not only is there no documentation of the IMSB020 (just a brochure over at Ram’s page), but this specific card is an updated version of the one described in the brochure… I’d call it Rev.2.
It has a T805 instead of a T4xx and probably some more additions…

Ok, first of all: What is the Inmos B020 anyway? It was sold as an “X11 server on a card”… I’m intentionally writing ‘was sold’ because that’s just one way of using it.
More generally speaking, it’s an ISA card with a Transputer (with 4-12MB RAM) and a graphics controller on it, an Inmos G332 to be precise. That Transputer controls the G332 and can therefore create graphics on a VGA screen connected to the board.

As an avid reader of GeekDot, this should sound familiar to you because it was the typical 90’s approach of High-End Graphics… see the MiroTiger or the SPEA i860 boards.

With the right software being booted into the Transputer, it may act as an X11 (R4) Server… but it could also be used as a native Transputer system, e.g. running Helios or some OCCAM code and still create graphics with the G332 controller. As a nice add-on, the card features two size-2 TRAM slots, so more transputers or devices could be added into the equation.
Ah -and as a side note- most people called the B020 “the BOZO“… not very kind but it gets a credit for its pre-l33tc0de usage.

This is how my ‘rev.2’ looks in its full glory:

B020full_small

As usual, we go into more detail looking at each half of the card. In this full-view, you can identify the two TRAM-slots by the yellow markings at the lower edge of the card saying “SLOT 0” and “SLOT 1”.

So let’s start with the right half of the B020:

B020right

At the right-most edge, there’s the VGA-in and VGA-out connector, so that you can loop-through the video signal from your regular graphics-cards (a fancy ET4000 for example ;-)).

Between the two VGA connectors is an 8-pin Mini-DIN connector, which has this pinout:

1 Error/ (in)
2 Analyse (out)
3 Tram2 Link 2 out
4 Ground
5 Tram2 Link 2 in
6 Onboard T800 link 1 out
7 Reset (out)
8 Onboard T800 link 1 in

The upper 2/3, starting with the golden IC (IMSG332F-85F), is all graphics. This part consists of the G332 controller, 1MB VRAM above it, and some latches and 3 PALs to its left.
Below the G332, next to the slot-connector, is the ISA interface. That’s handled by a PAL (with the sticker peeling off) and 4 buffers/transceivers and a bit more “chicken food” around them.
Of course there’s also the inevitable 5MHz oscillator which is needed in every Transputer household.

Let’s move to the left half of the card:

B020left

This is ‘where the music plays’. At the lower edge of the right side you can spot the C012, which handles the ‘Bus data-to-Transputer’ translation, guarded by two PALs (with white labels). The two PALs are most likely the subsystem (i.e. handling Reset, Error, the C012’s RS0/1 registers etc.).
Then there is the King of the Hill, the Transputer himself – a T805-25 in a quite rare PLCC packaging. To his left are 4MB of RAM in ZIP-packaging, most likely configured and controlled by the two PALs below him.
Finally, there are 4 SIMM slots to expand the Transputer memory further to a whopping 8MB! As you can easily see, there are unpopulated solder-pads between the ZIP-RAM ICs (as well as the VRAMs). My assumption is that you could order this card with 8MB soldered, which leads to a max. RAM of 12MB, as mentioned in the brochure on Ram’s page. But I fear the PALs of the RAM-subsystem need to be programmed accordingly to support the larger amount of RAM (added later: And boy was I right!).

So much for the theory of the B020… and it might stay so for some time, because my B020 was broken 🙁

But I’m working on it and one fine day you will see an X11 server in an XT-PC at blazing speed 🙂

B020LA

Resurrection

[Nearly 4 weeks later…] As promised, I worked hard on getting the BOZO back to life. It started with weeks of tracing connections of the complete bus-interface, i.e. ISA-Bus to the buffers/transceivers, the PAL next to the ISA-bus, the C012 link-adapter etc. etc…
Then, knowing the signals now, came days of staring at the logic analyser, many many mails with Mike Brüstle and lots of hacking with DOS’s debug.exe.
Finally it became clear that the PAL 0211 was somehow “confused” and therefore not controlling the bus as it should. Because there’s documentation neither of the B020 nor of its PAL equations (like there is for e.g. the B008), I had to go the long and bitter route of reverse-engineering.

So the first thing was to remove the faulty part, which proved to be a bit tricky – that is a short story of its own, available in the DIY-section of this page.
So in went a proper socket and a copy of the original PAL (now a “modern” GAL). The copy proved to be a 100% one… which means it was not working as it should. But it behaved somewhat differently from my earlier tries… it was erratic. For example, only on I/O port 0x200 was I able to get something useful out of the signals (e.g. a well timed /OE for the ‘244 buffers) while jumpering the I/O range to 0x150 or 0x300 gave total silence. So I went one step closer to the bus, listening to what the PAL gets on its A4-A11 lines.
There you go! A4-A6 were constantly high and A7 was glitching while A8-A11 were happily bouncing. According to my tracing A4-A7 were handled by a different ‘244 than those working. So Sauron turned his eye on the little SMD octal buffers… out came the soldering iron and all pins of all buffers were re-soldered.
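As a quick reference for this kind of probing, here is a tiny Python sketch listing which of the A4-A11 lines should be high for each of the three base addresses mentioned. It assumes the PAL simply sees the plain binary I/O port address on those lines; the actual PAL equations are not reproduced here, and the function name is made up.

```python
def pal_address_inputs(port):
    """Return {'A4': bit, ..., 'A11': bit} for an ISA I/O port address."""
    return {f"A{n}": (port >> n) & 1 for n in range(4, 12)}


for base in (0x150, 0x200, 0x300):
    high = [name for name, bit in pal_address_inputs(base).items() if bit]
    print(f"0x{base:03X}: high lines = {', '.join(high)}")
```

Lines that sit permanently high (or glitch) where this table expects a low are a strong hint at a bad buffer or solder joint in front of the PAL.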

Lo and behold, the next ispy run on 0x200 worked! Heureka!

Next up: Why is it working on port 0x200 only? Learning from the ‘244 issue I checked all solder-pads concerning the jumper for port 0x150 (that’s J5). Everything looked nice and re-soldering didn’t change a bit. So it was the PAL that was suspicious… again.
Luckily INMOS refrained from setting the protection fuse on the PAL, so I was able not only to copy it but also to read its programming. I was thinking about recoding the whole thing and had started doing so when I stumbled across a missing “.oe” (Output Enable) for exactly that input-pin connected to the J5 jumper. So quickly adding

/f14 = gnd
f14.oe = gnd

to the code, compile, burn, plug it into the socket. Tadaa! The B020 now also talks to 0x150… and 0x300 respectively.
[BTW: This happened right at the 30th birthday of DOS. Don’t know if it’s a good or bad sign… at least DOS is my favourite debugging tool 😉 ]

For those who happen to have the same issues with their B020 (erm, that’s probably 1-2 people on this planet by now), here’s my documentation and .jed file for the “PAL 0211B”.

Enhancement

Well, as said in Axel’s 10 commandments: “Thou should not leave a socket unpopulated!”
And my B020 has a lot of ’em. So I started with the Transputer RAM.
In preparation I checked if the empty solderpads (as you can see in the above pictures) do get all the required signals… well, kind of: I was lazy and just checked /OE with my logic analyser and there was a signal in sync with the one of the already installed RAM. Also all Data- and Addresslines were connected… so it was worth a try!

But I should have known it before: After clearing all 160 pin-holes of solder, putting self-made zig-zag sockets in place and filling them with the appropriate 1Mx4 RAM chips, ispy/mtest still reported just 4MB of RAM :-/

Obviously PAL0214, sitting right below the ZIP RAM was playing an important role here. So the good old routine of “buzzing it through” began. After 2 days I pretty much figured out the purpose of each pin of PAL0214. Yes, it must be the RAM decoder.
So there was no way around removing, reading and replacing it… like with PAL0211.
This time I got myself a desolder-gun – a hell of a facilitation! Again, luckily the PAL wasn’t protected so reading it wasn’t an issue.
As expected, the decoding for the 2nd RAM bank was missing – no idea where the initially measured /OE signals came from.
After the decoding of the 1st 4MB ZIP RAM the SIMM RAM followed immediately – so I had to squeeze in the 2nd bank of 4MB ZIP RAM and let the SIMM RAM follow after that. Even though my VHDL/PAL equation skills are quite limited, it wasn’t as hard as I expected. Compile, program, put the new GAL16V8 in place, start ispy/mtest, cross fingers….

Using 150 ispy 3.23 | mtest 3.22
# Part rate Link# [  Link0  Link1  Link2  Link3 ] RAM,cycle
0 T805d-25 340k 0 [   HOST    ...    ...    ... ] 4K,1 12284K,4;

Yay! 12MB including the SIMM RAM!! Here’s the view of a “BOZO to the MAX” 😉

BOZO_full_RAM
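The resulting memory layout and the odd-looking “12284K” can be reproduced with a few lines of Python. This is only a sketch of the bank order described above (first ZIP bank, second ZIP bank, then SIMM), assuming 4MB per bank and the usual Transputer conventions that addresses start at 0x80000000 and that the T805’s 4K of on-chip RAM overlays the bottom of the map. It is not a transcription of the real PAL0214 equations, and the bank names are just labels.

```python
KB = 1024
MB = 1024 * KB

BASE = 0x80000000   # lowest Transputer address (MinInt)
ON_CHIP = 4 * KB    # internal RAM of a T805, overlaying the bottom of the map

# Bank order as described in the text; names and sizes are this sketch's labels.
BANKS = [("ZIP bank 1", 4 * MB), ("ZIP bank 2", 4 * MB), ("SIMM", 4 * MB)]


def bank_for(address):
    """Return the name of the memory that answers at `address`, or None."""
    offset = address - BASE
    if offset < 0:
        raise ValueError("address below the start of the memory map")
    if offset < ON_CHIP:
        return "on-chip"
    start = 0
    for name, size in BANKS:
        if offset < start + size:
            return name
        start += size
    return None  # nothing decoded up here


def external_kbytes():
    """External RAM visible to mtest: all banks minus the part hidden by on-chip RAM."""
    return (sum(size for _, size in BANKS) - ON_CHIP) // KB


if __name__ == "__main__":
    print(external_kbytes(), "K external")   # 3 x 4096K minus the 4K overlay
    print(bank_for(BASE + 5 * MB))           # an address in the newly added bank
```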

Double the VRAM!

The last step in bringing the BOZO “to the max” was adding more VRAM, i.e. 2MB.
So first you have to get two more AM29C841 9-bit latches (or compatible parts, like those 74FCT841 I’ve used) in SO-24 package. That SMD package is still quite easy to solder onto the board. Here’s a before-and-after picture:

BOZO_VRAM1

On the right-hand side of the latches you can already spot the empty solder-holes for the VRAM. Like with the DRAM expansion before, I made my own sockets by using two 1×14 pin sockets. This is what the now fully populated VRAM section looks like afterwards:

BOZO_VRAM2

The first run using “ixcheck” from the INMOS iX package went without errors and the /OE signals of all the ‘841s are driven correctly according to my logic analyzer. I was a bit scared that INMOS might again not have completely programmed the PALs (left of the latches) to support this memory expansion, but it seems I’ve been lucky.

Next up: Find a tool using the 2MB and confirming that the extra VRAM is actually usable.

Inmos TRAMs

INMOS was obviously the first manufacturer of TRAMs. Over the period Inmos TRAMs were made, you can clearly see the progress the technology made. Starting with comparably big sizes, using DIP chips etc., the last of the breed were highly integrated PCBs crammed with SMD parts and chips.

The IMS B401 started it all. A 32KB SRAM size-1 TRAM for a 32-bit Transputer. Here’s the prototype, the final product and a picture of its schematic:

B401_Layout

The “huge” IMS B403 1Mbyte DRAM Size-4 TRAM:

IMSB403_1

The IMS B404 (2MB DRAM, Size-2) is where the fun starts. Size-2 is just OK not to totally hog your mainboard and 2MB is what you need for OCCAM or HELIOS to do something useful.
Actually, the B404 has 3 “levels” of RAM: 4K internally in the Transputer, 32K SRAM and 2048K DRAM. As they are superimposed (i.e. overlapping), the total amount is still 2MB, with different access speeds at 0-4k (1 cycle), 4-32k (3 cycles) and 4-5 cycles above.

IMSB404
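To illustrate the overlapping memory levels of the B404, here is a minimal Python sketch that returns the access time for a given offset into the 2MB. It assumes the SRAM window starts right after the 4K of on-chip RAM and ends at 32K; the function name is made up and the figures are simply the ones quoted above.

```python
KB = 1024
TOTAL = 2048 * KB  # the three levels overlap, so 2MB is all there is


def b404_cycles(offset):
    """Return (min, max) access cycles for a byte offset from the start of memory."""
    if not 0 <= offset < TOTAL:
        raise ValueError("offset outside the 2MB of the B404")
    if offset < 4 * KB:
        return (1, 1)   # on-chip RAM of the Transputer
    if offset < 32 * KB:
        return (3, 3)   # the part of the 32K SRAM not hidden by on-chip RAM
    return (4, 5)       # DRAM


if __name__ == "__main__":
    for off in (0, 16 * KB, 1024 * KB):
        print(f"offset {off // KB:5d}K -> cycles {b404_cycles(off)}")
```

This is why it pays off to place workspace and hot code as low in memory as possible on such a TRAM.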

There were also the IMSB402 (8Kbyte Size-1) and the IMSB405 8Mbyte Size-8 TRAMs of which I have no pictures yet.

The IMS B407 was an Ethernet TRAM (Size-8!), proving that a TRAM doesn’t need to be necessarily a Transputer Module for number crunching only.

imsb407

The IMS B408 and B409 were both part of a graphics system, so you didn’t need the host to render the graphics – which would have been much slower than those TRAMs.
The B408 was the “drawing pad image storage” (1.25MB dual-ported RAM), while the B409 had the timing generator and the CLUTs:

imsb408

imsb409

The IMS B411 can be seen as a new era: 1MB DRAM on just a size-1 TRAM made possible by the use of small ZIP-packaged DRAMs… which were very expensive, of course.

IMSB411-3B

The IMS B415 is a simple Transputer-Link to RS422 converter. This way, link connections can span up to 30 meters.

imsb415

The IMS B416 features a 16bit T2xx Transputer and 64KB SRAM

IMSB416_10P

The IMS B417 is a massive beast: 4MB on a size-4 TRAM… well, still better than the B403.

IMSB417-5A

The IMS B418 contains a 16-bit T222 and 256KB Flash ROM… quite modern stuff those days and probably a good way to boot your HELIOS system from 😉

IMSB418_1

The IMS B419 combined the two huge B408 and B409 modules into one size-6 TRAM. A nice graphics TRAM with 2MB DRAM and 2MB VRAM…

imsb419

The IMS B420 featured a ZORAN ZR34325 DSP (45 MFLOPS peak) with its own 256KB SRAM besides the obvious T800 Transputer and 4MB DRAM. My guess is, that it is comparable to the Quintek QVA-T TRAM.

imsb420_1

The IMS B421 enabled a Transputer system to talk to GPIB/IEEE-488 buses… like laboratory equipment or your old Commodore floppy drive 😉

IMSB421_1

Ahh, the real deal… the IMS B426 is what you want in your TRAM collection. 4MB RAM on a size-1 TRAM. This is what Helios really loves to run on.

IMSB426-5A

…and here’s a more recent version of the B426 (rev “-16C”, 1993), this time all SMD, featuring the latest T800 in a nice TQFP case

IMS_B426

Well, the IMS B426 is great… as long as you can’t get the IMS B427 😉 This Size-2 TRAM features a whopping 8MB RAM. Enough for running Helios and X11 on it.

IMSB427

Something more important than ever is a fast network connection. The last ethernet TRAM from INMOS was the IMS B431.
10 Mbps is the maximum you get… well, more wouldn’t make sense given the link speed of 20 Mbps.

IMSB431

The IMS B437 is a very neat little thing: A graphics TRAM as size-2 TRAM! A nice 25MHz SMD T805 and a G332 colo(u)r video controller. Rare as chicken teeth!
It looks like this was designed by Contex Systems Design Ltd. and OEM’ed by INMOS.

IMSB437_1

And when you thought you’ve seen them all, another one pops up:
The mighty IMS B438. As the name-code suggests, it’s an updated B437 – very updated and IMHO the ultimate graphics TRAM: 2MB VRAM, 4MB DRAM, 30MHz T805 and the last and final 32Bit video controller G335@130MHz. I really, really want one. Badly!

B438

Because it’s a beauty, here’s its back, too:

B438_back

The T9000

Again, Geekdot.com might be the only place left in the World Wide Web where you can get more information about the mysterious T9000 than what you will find on Wikipedia.

The T9000 never made it into the market in higher volumes. I don’t have any numbers on how many T9000 were actually produced, but only a few were evaluated outside INMOS by universities and science facilities, the most famous probably being CERN, which did some benchmarks and tests. Here’s a (now funny) promotion video of those days:

By another lucky incident I got myself an HTRAM mainboard for Christmas 2011. It’s the QT9M from Quintek, a spin-off of INMOS back then. Three slots of this 6-slot board were populated, one featuring the mythical T9000!

This is the naked board, the QT9M:

QT9M_total

At the first look you can see it’s a damn huge thing. At least one inch taller than the slot-backplate. But this way Quintek could squeeze 6 size-1 HTRAM slots (3×2 slots, horizontally) onto the card, while the “official” INMOS B108 (PDF) board just features 2 slots (but size-4, vertically).

The HTRAM slots in detail

(-> For a complete description of HTRAM visit this chapter of geekdot.com)

Not all slots offer the same features, though. Slot-0, slot-3 (the two middle ones) and slot-2 (lower left) offer the standard pin-blocks 1 and 2 as well as blocks 3 and 4.
Slot-1 (top left) offers the same pin-blocks but also a small 4×2 horizontally aligned block on its lower edge.
The same 4×2 pin-block can be found on slot-5 (upper right), but this one only features the standard pin-blocks 3 and 4.
Finally, slot-4 (lower right) also features only the standard pin-blocks 3 and 4, but it additionally has a 5×2 pin-block centred on the edge opposite the standard pin-blocks.
My assumption is that pin-blocks 3 & 4 offer a direct connection into the T9000 memory map, so I’d call slot-4 & 5 “mem-mapped-only slots”.

The interface logic

The interface logic on the board was kept surprisingly simple. Especially when you compare it to the already mentioned IMS B108, which had an FPGA and two C101 protocol converters.
Without any C101 you can be assured that there won’t be any downward-compatible communication as we are used to with the T4xx/T8xx Transputers (OS-Link), so this will be a hard nut to crack :-/

Let’s have a closer look at the “logic side of things”:

QT9M_detail

There are only 2 ICs: An AMD MACH 210 CPLD and an AMD 4701, which is an 8-bit bi-directional 512 byte deep FiFo Buffer and parity-generator. Given the fact that there’s a place left for IC2, I assume a 2nd 4701 could go there to make the interface 16bit wide (as all 16 ISA data-lines are connected on the slot-connector).

This leads to my totally uneducated guesses, that:

  • The DS data-links are directly connected (through the FiFo) to the ISA bus
  • The control registers of the AMD 4701 need to be set – if the default state is not sufficient

Quite a solid foundation to those assumptions is the following schematic I was able to dig out of the depth of the internet describing an IEEE 1355 interface. IEEE 1355 is the standard which was created out of INMOS’ DS-Links.

DS-Link_CPLD

Everything else…

The function of the colourful DIP-Switch is not fully known yet. Most likely some of these switches set the base-address for communication with the ISA bus.
The purpose of the two Sub-D connectors on the card’s backplate is unknown, too. Maybe some link-outs?

(To be continued…)

Mandelbrot and Video

Like with the C64, this is probably the reason you came here:

The friggin’ fastest Mandelbrot displayed on a IIgs, ever! 😉

Yeah, you’re right, it’s not the fastest Mandelbrot calculated by an IIgs (its very own 65c816 CPU, that is)… but hey, it’s still kinda cool – and sooooo much faster!

Okey-Dokey, here we go, a complete Mandelbrot in 60s. This time in color and zooming in! I couldn’t do that on the C64 as the C-Compiler didn’t natively support IEEE754 doubles (like Orca-C does), and having a mouse helps a bit, too:

Wow, that was nice, wasn’t it?! (Sorry for the shaking, need to get a tripod soon)
Especially when you take into account how ‘far’ the native 65c816 code got during the video on a ‘sped-up’ 10MHz TransWarp GS.

Like with the T2C64 version there are surely several things which could be improved, but the IIgs (even at native speed) is well capable of handling the little bit of extra work. The limiting factor is the bus speed, i.e. how quickly the Transputer can push its data into the host (IIgs). You can clearly see that in the time it took to display the 3 zooms: they all took about 60 seconds, even though each zoom means more calculation, as the iteration count is doubled with each zoom (in this case 32, 64, 128).
The Orca-C source/binary of this demo -and the previous AppleSoft sample- is available here (zip’ed PRODOS disk-image). It won’t make much sense without a T2A2 and is GS/OS-only as it uses QuickDraw II and the EventManager (for mouse & keyboard).

Final words: Don’t get too excited about the acceleration of the IIgs… it’s not accelerated at all. It’s more like a co-processor attached to it. And even then, you’ll need something really calculation-intensive to justify the time you’ll lose due to communication between the Apple and the Transputer. A single square-root for example wouldn’t make much sense.
But OTOH, that’s exactly how things are handled with the Innovative Systems FPE (using a M68881). So it might be worth evaluating. Maybe I’ll write a SANE driver if I have the time to get a deeper understanding of GS/OS.

As my two targets (C64 & Apple II) are working now, I’m thinking about creating a ‘real PCB’ in the medium term. Given the rarity of the Link-Adaptor (Inmos C012) I’m currently looking into the possibility to use a larger CPLD to move the C012 into that. This would actually make this ‘project’ a product to buy.
But don’t hold your breath, need to get an eval kit first. Then some 100 days of fiddling, cursing and crying… and then more.

Well, 6 years later it happened: The T2A2 became a proper PCB design… and got some additions, too!

1st basic example

Like with the T2C64, the CPLD on the T2A2 maps the C012 registers into the memory area of the used slot, using just 6 addresses starting at 0xc080 + (SLOT# * 0x10). So for e.g. Slot 4 this would be:

  • BASE:        0xc080 + (4 * 0x10) = 49344
  • Data in:     BASE       = 49344
  • Data out:    BASE + 1   = 49345
  • in-status:   BASE + 2   = 49346
  • out-status:  BASE + 3   = 49347
  • reset:       BASE + 8   = 49352 (writing)
  • analyse:     BASE + 12  = 49356
  • errorflag:   BASE + 8   = 49352 (reading)
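
The same address arithmetic can be written down as a small C sketch. The names (`t2a2_base`, `t2a2_reg`, the `T2A2_*` offsets) are made up for illustration and are not taken from the demo sources; the slot range 1..7 is my assumption, and the status-bit comments reflect how the BASIC listing further down uses bit 0.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets relative to the slot base, as listed above.
   The names are hypothetical, only the numbers come from the list. */
enum {
    T2A2_DATA_IN    = 0,   /* read: byte coming from the Transputer   */
    T2A2_DATA_OUT   = 1,   /* write: byte going to the Transputer     */
    T2A2_IN_STATUS  = 2,   /* bit 0 set: a byte is waiting to be read */
    T2A2_OUT_STATUS = 3,   /* bit 0 set: ready to take the next byte  */
    T2A2_RESET      = 8,   /* write: reset line, read: error flag     */
    T2A2_ANALYSE    = 12
};

/* Base address of the T2A2 registers for a given slot.
   Assumption: only slots 1..7 are valid for a card like this. */
uint16_t t2a2_base(unsigned slot)
{
    assert(slot >= 1 && slot <= 7);
    return (uint16_t)(0xC080u + slot * 0x10u);
}

/* Absolute address of one register in the given slot. */
uint16_t t2a2_reg(unsigned slot, unsigned offset)
{
    return (uint16_t)(t2a2_base(slot) + offset);
}
```

For slot 4, `t2a2_base(4)` gives 49344 and `t2a2_reg(4, T2A2_ANALYSE)` gives 49356, matching the list.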

With this ‘knowledge’ we can start talking to the Transputer… and to make our first baby steps we’re using BASIC. It’s pretty much the same code as used on the C64, with the exception that there’s no “elegant” timeout handling due to the missing clock in AppleSoft, so you have to wait a bit longer until you get the printout in the end.
For details about what’s going on here, see the C64 page.

 10  PRINT "INIT TRANSPUTER"
11 BASE = 49344:IN = BASE:OUT = BASE + 1:IS = BASE + 2
12 OS = BASE + 3:RESET = BASE + 8:ANA = BASE + 12
13  POKE RESET,0: POKE ANA,1: POKE RESET,1
20  REM CLEAR I/O ENABLE
21  POKE IS,0
22  POKE OS,0
30  REM READ STATI
31  PRINT "I STATUS: ";( PEEK (IS) AND 1)
35  PRINT "O STATUS: ";( PEEK (OS) AND 1)
40  PRINT "ERROR: ";( PEEK (RESET) AND 1)
45  PRINT "SENDING POKE COMMAND"
46  POKE OUT,0
50  PRINT "O STATUS: ";( PEEK (OS) AND 1)
58 :
59  PRINT "SENDING DATA TO T."
60  POKE OUT,0: POKE OUT,0: POKE OUT,0: POKE OUT,128
61  POKE OUT,12: POKE OUT,34: POKE OUT,56: POKE OUT,78
70  PRINT "I STATUS: ";( PEEK (IS) AND 1)
79 :
80  PRINT "READING FROM T."
90  POKE OUT,1: REM PEEKING
100  POKE OUT,0: POKE OUT,0: POKE OUT,0: POKE OUT,128
110  PRINT  PEEK (IN); PEEK (IN); PEEK (IN); PEEK (IN)
128  DIM R(9): REM ROOM FOR UP TO 10 RESULT BYTES
129  PRINT "SENDING PROGRAM TO TRANSPUTER..."
130  FOR X = 1 TO 24
140  READ T: POKE OUT,T
150  WAIT OS,1
160  NEXT X
170  PRINT : PRINT "READING RESULT:"
175 C = 0: REM RETRIES
180  IF C = 10 GOTO 220
181  FOR X = 0 TO 5000: NEXT X: REM DELAY
189 ER = ER + 1: IF ER = 10 GOTO 220
190  IF ( PEEK (IS) AND 1) = 0 GOTO 181
195 R(C) =  PEEK (IN)
200 C = C + 1:ER = 0
210  GOTO 180
211  REM ------------------------
220  IF C = 1 THEN  PRINT "C004 FOUND"
230  IF C = 2 THEN  PRINT "16 BIT TRANSPUTER FOUND"
240  IF C = 4 THEN  PRINT "32 BIT TRANSPUTER FOUND"
250  IF C = 0 OR C > 4 THEN  PRINT "COULD NOT IDENTIFY"
1000  DATA 23,177,209,36,242,33,252,36,242,33,248
1001  DATA 240,96,92,42,42,42,74,255,33,47,255,2,0 

Ok, if this is running fine, i.e. a Transputer was actually found and your Apple didn’t go up in smoke, we’re set for some serious number crunching… and a video! Yay! We love videos, don’t we?

(I’m skipping the other sample code available on the T2C64 page. It’ll work the same on an Apple, so no need for redundancy)
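
For reference, the byte sequences the BASIC listing above pokes into the link (lines 46 to 61 and 90 to 100) can be sketched in C. This assumes a 32-bit Transputer that has not been booted yet, where a first byte of 0 means “poke” and 1 means “peek”. The function names are hypothetical and only build the messages; actually pushing the bytes through the C012 is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { LINK_POKE = 0, LINK_PEEK = 1 };   /* control bytes before booting */

/* Store a 32-bit word least significant byte first. */
static void put_le32(uint8_t *dst, uint32_t w)
{
    dst[0] = (uint8_t)(w & 0xFFu);
    dst[1] = (uint8_t)((w >> 8) & 0xFFu);
    dst[2] = (uint8_t)((w >> 16) & 0xFFu);
    dst[3] = (uint8_t)((w >> 24) & 0xFFu);
}

/* Poke message: control byte 0, then address, then data (9 bytes). */
size_t build_poke(uint8_t out[9], uint32_t addr, uint32_t value)
{
    assert(out != NULL);
    out[0] = LINK_POKE;
    put_le32(out + 1, addr);
    put_le32(out + 5, value);
    return 9;
}

/* Peek message: control byte 1, then address (5 bytes).
   The Transputer answers with the 4 bytes stored at that address. */
size_t build_peek(uint8_t out[5], uint32_t addr)
{
    assert(out != NULL);
    out[0] = LINK_PEEK;
    put_le32(out + 1, addr);
    return 5;
}
```

Calling `build_poke(buf, 0x80000000, 0x4E38220C)` yields the bytes 0, 0, 0, 0, 128, 12, 34, 56, 78, which is exactly what BASIC lines 46, 60 and 61 send.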

T2A2

Ok, here it is. The T2A2. What sounds like a robot from StarWars is actually a Transputer to Apple II interface. Its design is pretty much the same as its cousin, the T2C64, with the addition of some buffers to behave like a good citizen of the Apple-II bus. So this is how the little beast looks as of now (v0.5):

T2A_front

To use the Apple-II bus-connector of the 8Bit-Baby (another brainchild of mine), I had to shuffle the parts around a bit. On top you’ll see the same TRAM used on the T2C64… 20MHz T800 Transputer, 128KB SRAM. Right below it is an LS245 octal bus-transceiver to handle signals like DevSel, R/W and A0-3. To its right is the IMS C012 link adapter, converting the 8-bit parallel bus into INMOS’ serial link protocol. Below that, there’s the silver 5MHz oscillator to clock the C012 as well as the Transputer, and another 245 to buffer the data lines (D0-7). Finally, at the bottom left there’s the CPLD which handles the Analyse, Reset and Error lines of the Transputer as well as chip-select and such (thanks to Mike for helping out on VHDL here!).

The picture above is the first prototype and it’s finally working… even though it’s a nightmare to look at its back side 😀

T2A_back

But because I’m a Commodore guy who wasn’t able to afford an Apple II in its heyday, I had to start from scratch and learn a lot…
That said, when I finally could afford an Apple II system, I went straight for the IIgs. I think the IIgs is the perfect platform for an 8-bit Transputer interface given the amount of available/addressable RAM, native access to harddisks and a decent screen resolution.

Because of its simple design, the T2A2 should also work in any Apple-IIe etc. There’s no ‘firmware’, no EPROM. Just plain simple reading and writing to some (slot)specific addresses.
Due to its close relationship to the T2C64, programming is quite similar. As a matter of fact I just slightly changed the examples I’ve used on the C64… which is the beauty of the idea.

So jump to the next post to see the first little test proggie in AppleSoft BASIC…

3rd sample and video

And now something you knew would come: Mandelbrot time! 😉

I don’t want to bore you with all the details before you have seen it in action… so here we go:

“Man, that was brilliant! And even though you had a lot of geek-babble in there, I want to know more!”
Ok Timmy, let’s go into detail…

Like I said in the video, the Transputer is finally doing something for real. It’s actually doing most of the work, crunching through a 320x200x8 Mandelbrot, 32 iterations in double-precision floating point. The code itself was written in 1988 in OCCAM by Neil Franklin and is available on his page.
This shows the general beauty of Transputers: If the code is written flexibly enough to fit into any topology, it’ll run on any platform!

That said, I ran into a certain limitation on the C64. The Transputer mandelbrot executable expects the initial data (resolution, coordinates, iterations) to be sent in a specific order and format. While the order isn’t the problem, the format is: The coordinates have to be doubles (C-lingo, i.e. 64bit IEEE 754 compliant floats). The C-Compiler I’ve used for this demo (CC65) doesn’t know a flying s*** about floats or even doubles.
So to get that demo done ASAP I tricked myself a bit and used the same technique we’ve seen in my 1st demo when POKEing something the Inmos-way:

To get the coordinates for left (-2.0), right (1.0), top (1.125) and bottom (-1.125) over to the Transputer, they had to be converted into 64bit IEEE 754 format, put into little-endian byte order and finally stored in arrays like these:

static char left[] = {0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xc0}; //-2.0
static char right[] = {0x00,0x00,0x00,0x00,0x00,0x00,0xf0,0x3f}; // 1.0
static char top[] = {0x00,0x00,0x00,0x00,0x00,0x00,0xf2,0x3f}; // 1.125
static char bottom[] = {0x00,0x00,0x00,0x00,0x00,0x00,0xf2,0xbf}; //-1.125

Yes, that’s a bit awkward, but it was OK to get a fast start. The ‘problem’ with this quick-hack is that the demo is pretty static, i.e. no zooming into the Mandelbrot. If you know of a quick way to create IEEE 754 compliant doubles from a long (which is the largest numeric type CC65 can handle, so StringToDouble() isn’t an option here) I’m happy to hear from you.
Of course I could have the Transputer do the typecasting but in this case, as part of a demo, I wanted to keep the original binary untouched.
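
One possible way to do that conversion with nothing but 32-bit integer operations is sketched below. It treats the long as a fixed-point number (value divided by 2^frac_bits) and writes out the 8 bytes in the same little-endian layout as the arrays above. This is a sketch written in standard C with `stdint.h` types and has not been tried on CC65; the function name and the fixed-point convention are my assumptions, not part of the original demo.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert the fixed-point number value / 2^frac_bits into the 8 bytes of
 * an IEEE 754 double, least significant byte first. Only 32-bit integer
 * operations are used. The result is exact, because a double has more
 * fraction bits (52) than a 32-bit integer can fill.
 */
void fixed_to_double_le(int32_t value, unsigned frac_bits, uint8_t out[8])
{
    uint32_t mag, hi, lo, frac31, exponent;
    unsigned top = 31;   /* bit position of the leading 1, found below */
    int i;

    assert(frac_bits <= 31);

    if (value == 0) {                       /* +0.0 is all zero bytes */
        for (i = 0; i < 8; ++i) out[i] = 0;
        return;
    }

    /* Magnitude; the unsigned negation also copes with INT32_MIN. */
    mag = (value < 0) ? (0u - (uint32_t)value) : (uint32_t)value;

    /* Normalise: shift left until the leading 1 sits in bit 31. */
    while ((mag & 0x80000000u) == 0) {
        mag <<= 1;
        --top;
    }

    exponent = 1023u + top - frac_bits;     /* biased exponent          */
    frac31   = mag & 0x7FFFFFFFu;           /* drop the implicit 1      */

    hi = (exponent << 20) | (frac31 >> 11); /* exponent + top 20 bits   */
    if (value < 0) hi |= 0x80000000u;       /* sign bit                 */
    lo = (frac31 & 0x7FFu) << 21;           /* remaining 11 bits        */

    for (i = 0; i < 4; ++i) {
        out[i]     = (uint8_t)(lo >> (8 * i));
        out[4 + i] = (uint8_t)(hi >> (8 * i));
    }
}
```

With this, 1.125 is simply 9/8, so `fixed_to_double_le(9, 3, buf)` produces the same bytes as the `top[]` array above, and `fixed_to_double_le(-2, 0, buf)` reproduces `left[]`. Zooming would then only need integer arithmetic on the fixed-point coordinates.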

In the video you saw (or didn’t because of the blurry picture) that the timer printout was about 70s… and as said, normally it takes about 60s to complete the fractal – and it did in the video, too! Watch the video-timer or check with a stopwatch. What I forgot to take out of the timing was the actual upload of the code into the Transputer.
The Transputer binary is in this case a bigger array in the C-source, so it’s not being loaded from floppy but directly pushed to the Transputer after it was initialised. This takes some extra time which also went into the stopwatch timing… I’ll correct that in a later version.

“Later version” is a good catchword. If this weren’t just a demo for now, there are obviously plenty of ways to optimise things:

  • First of all one should take the burden of converting the colors off the C64 and let the Transputer do that.
  • My second idea would be to reduce the communication overhead (polling) by having the Transputer render the whole screen into its own RAM and, when done, have it ‘pumped’ down to the C64
  • Yes, DMA would be cool but that’s not possible (yet)

Ok, that’s about it for now. The T2C64 is still in its prototype stage and I can imagine many more cool things to add… but first I will have a ‘proper’ circuit board made.

Final words: Don’t get too excited about the acceleration of the C64… it’s not accelerated at all. It’s more like a co-processor attached to it. And even then, you’ll need something really calculation-intensive to justify the time you’ll lose due to communication between the C64 and the Transputer. A single square-root for example wouldn’t make sense at all. 100 sqrts in one go would certainly do.

Of course adding another linkOut/In to the T2C64 to get more Transputers involved in the calculation would be the final step. This is planned for the next version of the hardware, but the bigger part of the work would be a complete rewrite of the Mandelbrot code to break it down into parts that run in parallel on each Transputer… which closes the loop to today, where programmers are trying to wrap their brains around multithreaded programming. 22 years after the first Transputer was released 😉
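
Just to illustrate what “broken down into parts” could mean, here is a minimal sketch in C (not OCCAM, and not Neil Franklin’s code) of the simplest possible split: each Transputer gets a contiguous band of screen rows. The function name and the static band scheme are assumptions for illustration only.

```c
#include <assert.h>

/*
 * Split 'height' image rows into contiguous bands, one per worker.
 * The first (height % n_workers) workers get one extra row, so the
 * band sizes differ by at most one row.
 */
void band_for_worker(unsigned worker, unsigned n_workers, unsigned height,
                     unsigned *first_row, unsigned *row_count)
{
    unsigned base, extra;

    assert(n_workers > 0 && worker < n_workers);

    base  = height / n_workers;   /* rows every worker gets            */
    extra = height % n_workers;   /* leftover rows, one each to the    */
                                  /* first 'extra' workers             */

    *row_count = base + (worker < extra ? 1u : 0u);
    *first_row = worker * base + (worker < extra ? worker : extra);
}
```

For the 200 rows used here and three Transputers this gives the bands 0..66, 67..133 and 134..199. Fixed bands are easy to reason about but balance poorly for a Mandelbrot, since rows crossing the set need far more iterations; handing out rows on demand (a worker “farm”) would probably work better.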