Tag Archives: Transputer

RS422 Differential Adapter

This is a Q’n’D but handy design which I was planning for a long time, but you know how things go… A differential adapter is a simple device which translates a single signal/data line in one positive and one negative line, i.e. one is carrying the inverted signal of the other. This is also known as RS422… But why do we need such thing?

Well, differential signals are/were quite common in the world of Transputing to deliver OS-link signals over longer distances (<10m/30ft) or noisy environments and good examples are systems from Parsytec, the AVM-T1, ISA interfaces like the HEMA TA2 or even some custom measure/data-logging solutions.

So to connect to such systems using a non-RS422 (i.e TTL) interface card, e.g. the INMOS B004/008 or the Gerlach-Card you’ll need such an differential adapter.

Technical data

A very simple design on a 5x5cm PCB. All done in trough-hole technology (THT) to keep this a simple DIY project.
It all based around two AM26LS32 receivers and one AM26LS31 transceiver, some resistors and an unavoidable LED  😉

Usage is very simple, too. Just connect the TTL signals to the standard 2×5 shrouded PCB header connector and RS422 signals are provided on the 2×7 header connector on the opposite side of the PCB.
The TTL connector is the same as the one used on the Gerlach card or my T2A2 Apple II interface:

As you can see, pins 9 & 10 can provide 5V to the PCB so that no external power-supply is needed.
If you connect a system not able to provide power to the differential adapter there’s also a 2pin connector provided on the PCB itself.

The RS422-side of the adapter has this pin-layout:

Nice-to-know: You could also use this adapter to simply invert signals. For example if you need a positive ERR signal instead of the not’ed, you simply take the positive pin (i.e. even pin-number 6 in this case) of the 2×7 connector (ignoring the negative).
Mind there’s no power supply pin on that side and pins 11/12 are not connected.

T2A2 for everyone

After quite some years (and a handful Apple II users requests) I felt the urge to finally put the T2A2 prototype (read that post to get a more general understanding) into a real expansion card… and here it is: Say “Hello!” to the T2A2 version 1.1!

It came a long way…

t2a2_evolution

…in more detail:

t2a2_front

As you can see, the final T2A2 is much smaller than the prototype (which used an 8bit Baby one-for-all PCB) and offers many additional features

  • 2 size-1 TRAM slots (or one size-2) – double the processing power!
  • Low-power, low-profile parts used where possible (3.3V CPLD, HCT logic)
  • External Transputer-link available as edge connector – extend your network to “near infinitum
  • Jumpers for LinkSpeed and optional power to the Transputer-link connector
  • Fully buffered to be a good Apple II bus citizen
  • Works in any slot set to “your card”

Beyond this, everything said about the prototype is still true.

Most important:

a) It’s not tested in the ][ or ][+. I simply don’t own one of those.
b) The T2A2 won’t instantly speed up any of your Apple II[e/gs] applications.

It’s more like a co-processor attached to it. And even then, you’ll need something really calculation-intensive to justify the time you’ll loose due to communication between the Apple and the Transputer. A single square-root for example wouldn’t make much sense – but having a complex algorithm (like the Mandelbrot fractal in my demo)  does absolutely make sense, as you just pass the parameters to the Transputer and let him do the sweating.

But on the other hand, FPU cards like the Innovative Systems FPE (using an M68881) did just send instruction by instruction. So I somehow fancy the idea writing a SANE driver for GS/OS to integrate the T2A2 more transparently.

Want your own T2A2?

So now you’re keen to get one yourself? Please check this list first:

  • Do you have a TRAM already? (*)
  • You are aware that there’s no real software (yet) besides my little Mandelbrot demo?
  • You are keen to program something yourself – or are fine to wait until somebody else did?

While the Apple II side of coding is pretty easy, you have to get a grip about the Transputer development, too. That includes a DOS/Windows (<=XP) setup and some knowledge of C and/or OCCAM.
Plenty of Dev-Docs are available here.  
I suggest using the INMOS cross-compilers for C or OCCAM. An alternative C compiler came from LSC, which might suit you more if you don’t like the INMOS stuff.

Ok, so you’re still with me… so I have the first batch of 20 T2A2 PCBs ready which I will populate on-demand, and for 50€/55USD (plus shipping) one of them can be yours.
(*) I might be able to offer you a TRAM, too. The price depends on available model, RAM size and CPU used.

⇒ drop me a mail tonobody likes SPAM
(Sorry, you have to type that into your mail-client – nobody likes SPAM, so do I)

Some more technical details

Here’s a T2A2 with a size-1 TRAM installed in Slot-0:

t2a2_with_tram

The T2A2’s CPLD programming can be updated any time through a JTAG port (the lower 2×5 pin-row at the edge).
The jumper above it can be used to set the linkspeed for the TRAMs (10 or 20mbps). If you look very close, there’s a tiny LED next to that jumper. It’s the error LED controlled by the CPLD.
The next single jumper enables the VCC pins on the external Link connector, meant for (small!) external extensions. This connector is the same used on the Gerlach card and is very convenient because of its ubiquitous standard 2×5 shrouded pcb header connector. Here’s the pinout:

As said, the V1.1 T2A2 offers two size-1 TRAM slots. Before plugging in 40MIPS of raw processing power consider the amount of juice being pulled there. Depending on the amount of RAM and load a single TRAM can use up to 800mA of power!  😯
Given the max. of 4A on a standard IIgs power-supply, two TRAMs could bring your souped-up GS into trouble… it’s better to use the external connector with just one smaller TRAM or even simply bridge the Link0In/Out pins with Link3In/Out so that the T2A2 works as a TRAM-less adapter.
That said, there are size-2 TRAMs in existence which will snugly fit  and won’t hurt that much.

T2shield – Arduino to Transputers

[The T2shield is currently a work-in-progress project – the first alpha hardware is operating but lots of coding lies ahead!!]

The idea for the T2shield was born when I thought about getting recent peripherals like SD-cards, Ethernet, displays etc. into a Transputer network.
Sure, there are the old, original TRAMs like the STM228 SCSI controller, the B431 Ethernet TRAM or quite some choice of graphics controllers. But they’re all rare like chicken teeth these days which means unaffordable when they rarely pop up on ePay.
Rebuilding them is also a no-no given the obsolete parts they’ve used… and to be honest: A noisy, vintage SCSI-1 drive isn’t what I thought of.
So after some time I came to the conclusion it will be the easiest and cheapest way to re-use what you can buy for a few bucks from China these days: Arduino shields.
As a lucky incident such a shield perfectly fits onto a size-2 TRAM – Yay!

Without much further ado, here is the T2shield (v0.1) with a shield loosely put on top to get an idea…

t2shield

Arduino shields mainly use the SPI bus for data transfer. With its 10+Mbps SPI is much faster than e.g. I²C (which I used for my T2I2C TRAM) and can cope with fast(er) peripherals.
But this also ruled out a slow IMSC011/12 link-adapter design like the one used on the T2I2C. Also there’s no of-the-shelve SPI master controller to simply glue a bus to it.
So the design turned out much more advanced this time:

  • A 16bit Transputer (with 64KB SRAM) handles the high OS-link speeds as well as the ‘glue’ like File/Ethernet handling.
  • A comparably big CPLD (100pins, 128 marcocells) implements the SPI master controller as well as handles the T2xx memory- and interrupt-handling.
  • All Arduino digital-pins and nearly all analog-pins are connected to the CPLD to adjust to any available shield.

Also this new TRAM design is a “first” for me in many aspects:

  • 1st usage of a 16-bit T2xx Transputer (vs. 32bit)
  • 1st serious utilization of a CPLD (besides the baby-steps taken with the T2A2)
  • 1st Size-2 TRAM

Initial status

Transputer

100% done

A T222 Transputer is -after fixing some stupid errors- happily running and has control over his external RAM.
Here’s the first sign of life by RSPY (a brilliant ispy‘ish tool of Michael I really need to write a post about soon):

# Part rt Link0 Link1 Link2 Link3 RAM@cycle
0 T2   20 HOST  ...   ...   ...   4K@1,56K@2|

That means: A T2xx @ 20mbps linkspeed is connected to the host (PC) and has 4K internal and 56K external RAM… wait a second! just 56K? Yes, that’s because the internal 4K overlaps and RSPY leaves the up-most 4K untouched as most peripherals are mapped there and poking there could create unwanted effects…

CPLD

10% done

All basic Transputer controls are in place, LEDs are controlled by the CPLD

TODO: Implement the SPI master in VHDL. That’s THE biggest challenge for me being a total VHDL noob. Again, I wouldn’t got this far already without the tremendous help from Michael Brüstle, my VHDL Jedi-Master.

Drivers

0% done

Write the (Helios) driver to handle file and/or network access. This should be (optimistically) be compatible to existing file- or Ethernet ‘servers’. This will probably be the most time consuming task.

Progress?

This is (also) the first time I’m posting about a project which is not finished. So I will keep you posted about the progress with new posts in this “T2shield” chapter of this wonderful little page.
If you do own

  • a Transputer network
  • a free size-2 TRAM slot
  • a T2xx Transputer
  • some VHDL/Occam/Helios driver skills & knowledge

shoot me a mail and I might provide you a T2shield to join the development.

That said, here’s a first little sign of life:
The Transputer writes numbers to a mem-mapped address (0xFC00) while the CPLD reads the lowest 2 bits and displays them by 2 of the 3 LEDs on the T2shield

Inmos B008

Introduced at the sunset of the Transputer era, the INMOS B008 was the successor of the B004 of which it dramatically differed:

  • 16-bit ISA connector (for Interrupts/DMA, it’s still an 8 bit data path)
  • 10 TRAM slots
  • C004 and T2xx on board

The usual source provides the full manual/documentation… down to the GAL equations.
All these enhancements enabled the B008 to create simple, yet powerful Transputer networks on a single expansion card and still makes it the perfect platform for todays Transputer retro experiments/fiddlings.

Here’s a late “Rev.F” version of the IMSB008:

IMSB008E

The previous revision E had a ceramic version of the T2xx and used GALs instead of the CPLD you can stop in aboves picture.

B008_revE

Clones

Obviously the Inmos B008 had clones as the B004 did.

Transtech TMB08

Well, I guess there’s nothing which Transtech did not clone and/or improve…
In this case they used a AMD MACH CPLD from the beginning, a PLCC version of the T225 and everything else was SMD.

TMB08

Alta SuperLink/XL Transputer PC Card

Alta Technology was a spin-off of Computer System Architects (CSA).
The SuperLink/XL board used TRAM modules like the Inmos B008, but featured a 20 MHz IMST225 Transputer to handle external interfacing and a beefed-up host interface design. Like the B008 it supported 10 TRAM’s

Altacor SuperLink XL

ARADEX TransAT Plus

Actually I wasn’t aware that ARADEX also made TRAM carriers… One of my first TRAMs was an ARADEX one, so this is a first for me, too.
Their interpretation of the B008 theme features just 8 TRAM slots (vs. the 10 “standard”) which are strangely enough sorted in ascending order (0-7) and not like the B008 order of 1, 5, 6, 2, 0, 4, 7, 3, 8, 9. Also the TransAT provides two 9pin D-subminiature connectors in contrast to the B008s 37pin connector – most likely its layout is the so called Aalener Link-Interface. That includes the RS-422 differential drivers and receivers next to the connector (AM26LS23’s and AM26LS33’s).

Concerning other parts, besides an C012 they used some sort of CPLD – presumably to control the 16 bit bus – in the photos I have, I can clearly see the traces coming form the ISA Bus’ D[8-15].
It also uses the IRQs 10-12.

Last fun bit: They named the two GALs with their JED filenames. Very handy 😉

aradex_transat-plus

Tuning the Mandelbrot benchmark

It’s an open secret that the CSA mandelbrot benchmark tool (available in my ‘basic Transputer tools‘ package) is one of my favorite benchmark and test-tool when playing around with my various Transputer toys.
One fine day I thought VGA with more than 16 colo(u)rs  would be nice… and the coding began. First step: Put the original source (well, already enhanced by a timer and some debugging) on github.

The original CSA Mandel program uses the official 640×480 16 color VGA mode (aka 0x12) and uses its own calls for that, i.e. no external 3rd party libs. Very manly 😉 but not very colorful…

Mandel_16

So I created the first branch (aka Mandel_3) added a more “modern” command-line options handling and dived into hand-coding VBE (VESA BIOS Extensions) matters. That was very instructive and fun… and the first results showed that I didn’t just got 256 colors now but draw speed was increased, too  😯

Look Mom! More colors:

Mandel_256

Running in host-mode (/t) on my P200MMX the initial screen took 6.6s vs 7.1s for 16-colors – so a difference of 0.5s or 7% should be much higher on Transputers, so I thought. And should this mean that bigger Transputer farms had been bottleneck’ed by the actual plotting of pixels?

Because 256 colors and higher resolutions (up to 1280×1024 depending on your VGA cards VESA BIOS) are fine, but even more colors are better, I branched the code a 2nd time (MANDEL_BGI) and replaced the VBE code by a BGI SVGA interface.
While originally Borland only supports VGA, there are 2 BGI drivers written by 3rd party developers which do support SVGA and up to 24-bit colors.
It’s commonly known that BGI is not the fastest graphics interface on planet earth… and the benchmark proved this:

P200MMX Orig VESA SVGA SVGA256
1 7.123 6.623 8.911 6.915
2 38.258 36.635 39.717 37.725

I was hoping the change would have more impact when running the same on my Cube system… well it didn’t:

65x T800 (integer) Orig VESA SVGA SVGA256
1 2.323 2.288 3.940 2.383
2 8.163 8.173 8.181 8.164

So as final conclusion, I will stay with the VBE SVGA drivers included in the V3.x code – it’s a good compromise between overall code/distribution size, comfort and speed.
The original VGA mode (0x12) will stay in the code forever to get comparable benchmark measurements – if you really need CGA/EGA/Hercules, you can always use the 2.x version.

The Cube

Meet The Cube – this is the Transputer Power-House successor to the Tower of Power, which was a bit of a hacked frame-case and based on somewhat non-standard TRAM carriers with a max. capacity of just 24 size-1 TRAMs…

The Cube hardware

This time I went for something slightly bigger  😎 …A clear bow towards the Parsytec GigaCube within a GigaCluster.
The Cube uses genuine INMOS B012 double-hight Euro-card carriers, giving home to 16 size-1 TRAMs – Parsytec would call this a cluster and so will I.
Currently The Cube uses 4 clusters, making a perfect cube of 4x4x4 Transputers… 64 in total. Wooo-hooo, this seems to be the biggest Transputer network running on this planet (to my knowledge)
If not, there still room left for more  😯

Just to give you a quick preview, this is what ispy responds when ran against the Cube:

Using 150 ispy 3.23 | mtest 3.22
  # Part rate Link# [ Link0 Link1 Link2 Link3 ] RAM,cycle
  0 T800d-24 276k 0 [ HOST ... ... 1:1 ] 4K,1 1024K,3;
Display all 64 lines
1 T800d-25 1.7M 1 [ ... 0:3 2:1 3:0 ] 4K,1 2048K,3; 2 T800d-24 1.8M 1 [ ... 1:2 4:1 5:0 ] 4K,1 2048K,3; 3 T800d-25 1.8M 0 [ 1:3 6:2 5:1 7:0 ] 4K,1 2048K,3; 4 T800d-24 1.8M 1 [ ... 2:2 6:1 8:0 ] 4K,1 2048K,3; 5 T800d-25 1.8M 0 [ 2:3 3:2 8:1 9:0 ] 4K,1 2048K,3; 6 T800d-24 1.8M 2 [ ... 4:2 3:1 10:0 ] 4K,1 2048K,3; 7 T800d-24 1.8M 0 [ 3:3 10:2 9:1 11:0 ] 4K,1 2048K,3; 8 T800d-25 1.8M 0 [ 4:3 5:2 10:1 12:0 ] 4K,1 2048K,3; 9 T800d-25 1.8M 0 [ 5:3 7:2 12:1 13:0 ] 4K,1 2048K,3; 10 T800d-24 1.8M 0 [ 6:3 8:2 7:1 14:0 ] 4K,1 2048K,3; 11 T800d-24 1.8M 0 [ 7:3 14:2 13:1 15:0 ] 4K,1 2048K,3; 12 T800d-25 1.8M 0 [ 8:3 9:2 14:1 16:0 ] 4K,1 2048K,3; 13 T800d-25 1.8M 0 [ 9:3 11:2 16:1 17:0 ] 4K,1 2048K,3; 14 T800d-24 1.8M 0 [ 10:3 12:2 11:1 18:0 ] 4K,1 2048K,3; 15 T800d-25 1.8M 0 [ 11:3 ... 17:1 19:0 ] 4K,1 2048K,3; 16 T800d-24 1.8M 0 [ 12:3 13:2 18:1 20:0 ] 4K,1 2048K,3; 17 T800d-25 1.8M 0 [ 13:3 15:2 20:1 21:0 ] 4K,1 2048K,3; 18 T800d-25 1.8M 0 [ 14:3 16:2 ... 22:0 ] 4K,1 2048K,3; 19 T800d-25 1.8M 0 [ 15:3 22:2 21:1 23:0 ] 4K,1 2048K,3; 20 T800d-25 1.8M 0 [ 16:3 17:2 22:1 24:0 ] 4K,1 2048K,3; 21 T800d-25 1.8M 0 [ 17:3 19:2 24:1 25:0 ] 4K,1 2048K,3; 22 T800d-25 1.8M 0 [ 18:3 20:2 19:1 26:0 ] 4K,1 2048K,3; 23 T800d-25 1.8M 0 [ 19:3 26:2 25:1 27:0 ] 4K,1 2048K,3; 24 T800d-24 1.8M 0 [ 20:3 21:2 26:1 28:0 ] 4K,1 2048K,3; 25 T800d-25 1.8M 0 [ 21:3 23:2 28:1 29:0 ] 4K,1 2048K,3; 26 T800d-25 1.7M 0 [ 22:3 24:2 23:1 30:0 ] 4K,1 2048K,3; 27 T800d-24 1.8M 0 [ 23:3 30:2 29:1 31:0 ] 4K,1 2048K,3; 28 T800d-25 1.8M 0 [ 24:3 25:2 30:1 32:0 ] 4K,1 2048K,3; 29 T800d-25 1.8M 0 [ 25:3 27:2 32:1 33:0 ] 4K,1 2048K,3; 30 T800d-25 1.8M 0 [ 26:3 28:2 27:1 34:0 ] 4K,1 2048K,3; 31 T805d-20 1.7M 0 [ 27:3 ... 33:1 35:0 ] 4K,1 1024K,3; 32 T800d-24 1.8M 0 [ 28:3 29:2 34:1 36:0 ] 4K,1 2048K,3; 33 T800d-20 1.8M 0 [ 29:3 31:2 36:1 37:0 ] 4K,1 1024K,3; 34 T800d-24 1.8M 0 [ 30:3 32:2 ... 38:0 ] 4K,1 2048K,3; 35 T800c-20 1.8M 0 [ 31:3 38:2 37:1 39:0 ] 4K,1 1024K,3; 36 T805d-20 1.7M 0 [ 32:3 33:2 38:1 40:0 ] 4K,1 1024K,3; 37 T800c-20 1.6M 0 [ 33:3 35:2 40:1 41:0 ] 4K,1 1024K,3; 38 T800d-20 1.6M 0 [ 34:3 36:2 35:1 42:0 ] 4K,1 1024K,3; 39 T800d-20 1.7M 0 [ 35:3 42:2 41:1 43:0 ] 4K,1 1024K,3; 40 T800d-20 1.8M 0 [ 36:3 37:2 42:1 44:0 ] 4K,1 1024K,3; 41 T800d-20 1.7M 0 [ 37:3 39:2 44:1 45:0 ] 4K,1 1024K,3; 42 T800d-20 1.8M 0 [ 38:3 40:2 39:1 46:0 ] 4K,1 1024K,3; 43 T800d-20 1.8M 0 [ 39:3 46:2 45:1 47:0 ] 4K,1 1024K,3; 44 T800d-20 1.8M 0 [ 40:3 41:2 46:1 48:0 ] 4K,1 1024K,3; 45 T800d-20 1.8M 0 [ 41:3 43:2 48:1 49:0 ] 4K,1 1024K,3; 46 T800d-20 1.7M 0 [ 42:3 44:2 43:1 50:0 ] 4K,1 1024K,3; 47 T800d-20 1.8M 0 [ 43:3 ... 49:1 51:0 ] 4K,1 1024K,3; 48 T800d-20 1.8M 0 [ 44:3 45:2 50:1 52:0 ] 4K,1 1024K,3; 49 T800d-20 1.6M 0 [ 45:3 47:2 52:1 53:0 ] 4K,1 1024K,3; 50 T800d-20 1.8M 0 [ 46:3 48:2 ... 54:0 ] 4K,1 1024K,3; 51 T800d-20 1.8M 0 [ 47:3 54:2 53:1 55:0 ] 4K,1 1024K,3; 52 T800d-20 1.8M 0 [ 48:3 49:2 54:1 56:0 ] 4K,1 1024K,3; 53 T800d-20 1.8M 0 [ 49:3 51:2 56:1 57:0 ] 4K,1 1024K,3; 54 T800d-20 1.6M 0 [ 50:3 52:2 51:1 58:0 ] 4K,1 1024K,3; 55 T800d-20 1.8M 0 [ 51:3 58:2 57:1 59:0 ] 4K,1 1024K,3; 56 T800d-20 1.7M 0 [ 52:3 53:2 58:1 60:0 ] 4K,1 1024K,3; 57 T800d-20 1.8M 0 [ 53:3 55:2 60:1 61:0 ] 4K,1 1024K,3; 58 T800d-20 1.8M 0 [ 54:3 56:2 55:1 62:0 ] 4K,1 1024K,3; 59 T800d-20 1.8M 0 [ 55:3 ... 61:1 ... ] 4K,1 1024K,3; 60 T800d-20 1.7M 0 [ 56:3 57:2 62:1 63:0 ] 4K,1 1024K,3; 61 T800d-20 1.6M 0 [ 57:3 59:2 63:1 ... ] 4K,1 1024K,3; 62 T800d-20 1.8M 0 [ 58:3 60:2 ... 64:0 ] 4K,1 1024K,3; 63 T800d-20 1.8M 0 [ 60:3 61:2 64:1 ... ] 4K,1 1024K,3; 64 T800d-20 1.7M 0 [ 62:3 63:2 ... ... ] 4K,1 1024K,3;

Here are some more figures:

  • 32 x T800@25Mhz/2MB  (my very own AM-B404 TRAMs)
  • 32 x T800@20MHz/1MB  (mainly TRAMs from MSC and ARADEX)
  • -> 96MB of total RAM
  • -> 70-130 MFLOPS (single precision)
  • ~800MIPS combined integer power
  • ~60Amps @5V needed (That’s 300W  😯 )

So we’re talking about 70-130 MFLOPS here – depending which documentation you trust and what language (OCCAM vs. Fortran) and/or OS you’re using. That was quite a powerhouse back in 1990 (Cray XM-P class!)… and dwarfed by a simple Pentium III some years later 😉
Just for to give you an comparison with recent hardware:

Raspberry Pi Model B+ (700 MHz) ~40 Mflops
Raspberry Pi 2 Model B (1000 MHz – one core) ~92 Mflops

Short break for contemplation about getting old…

Ok, let’s go on… you want to see it. Here it is – the front, one card/cluster pulled, 3 still in. On the left the mighty ol’ 60A power supply:

Cube_Front_4x

Well, this is the evaluation version in a standard case, i.e. this is meant for testing and improving. I’m planning for a somehow cooler and more stylish case for the final version (read: Blinkenlights etc.).

And here’s the IMHO more interesting view… the backside. It shows the typical INMOS cabling.

CubeBack

As usual, I color coded some of the cables.
The green arrow points to the uplink to the host system to which The Cube is connected to. Red are the daisy-chained Analyse/Reset/Error (ARE) signals. The yellow so-called jumper-cables connect some of the IMSB004 links back into the boards network. And in the upper row (blue) four ‘edge-links’ of each board are connected to its neighbor.

This setup connects four 4×4 matrices (using my C004 dummies as discussed here)  into a big 4×16 matrix. Finally I will ‘wrap’ that matrix into a torus. Yeah, there might be more clever topologies, but for now I’m fine with this.

Building up power

For completeness, here’s a quick look at how things came together.

The 4 carriers/clusters with lots of size-1 TRAMs… upper right one is the C004-dummy test board (now also fully populated). Upper left is pure AM-B404 love <3

ITEM_futter

Fixing/replacing the broken power-supply (in the back), including the somewhat difficult search for a working cooling solution:

ITEM_repairing

The Cube software

Well there isn’t any specific software needed to run The Cube, but it definitely cries out loud for some heavily multi-threaded stuff.

So the first thing has definitely to be a Mandelbrot zoom. As usual, I used my very own version with a high-precision timer, available in my Transputer Toolkit.

Here’s the quick run in real-time – you can still figure out visually each Transputer delivering its result:

Other Transputer and x86 results of this benchmark can be seen in this post over here.

We need (even) more power, Igor!

So this is running fine – using internal RAM only. On the other hand, it seems that the current power supply has some issues with, well, the electric current.
When booting Helios onto all 64/65 Transputers which uses all of the external RAM, very soon some of them do crash or go into a constant reboot-loop.
By just reducing the network definition (i.e. not pulling any Transputers) to 48, Helios boots and runs rock-solid.
Because measuring the voltage during a 64-T boot shows a solid 5.08V on all TRAM-slots it most likely means the power supply either can’t deliver the needed amount of Amps (~60) or produces noise etc.  😥
So this is the next construction site I have to tackle.

To be continued…

Lies, damn lies and benchmarks

As soon you’re talking about Transputers with people which weren’t there back in 1985 you’ll be asked this very soon: “How fast are these Transputer thingies”? Then there’s a stakkato of “MIPS? Whetstones? Dhrystones?” etc…

As always with benchmarks, the only valid answer is “it depends”. Concerning Transputers that’s even more true.
First, I suggest you read this Lies, Damn lies and benchmarks document from INMOS itself. It pretty much describes the dilemma and all the smoke and mirrors around that matter.

Benchmarks? It depends.

So you’ve read the above INMOS document? As you might saw, it’s full of OCCAM code. That’s the #1 prerequisite to get fast, competitive code (as long you’re not into Transputer assembler). From there it gets worse if you use a C compiler or even FORTRAN…

My little benchmark

Because it scales so well, works with integer as well as floating point CPUs and also runs on the x86 host while using at least the same graphic output routines, my personal benchmark is CSAs Mandelbrot tool (DOS only).
My slightly modified version is part of my Transputer Toolkit, which is downloadable here. You will need that version because I extended the code of this Mandelzoom with a high precision timer (TCHRT, shareware, can’t remove the splashscreen, sorry) when run with the “-a” parameter. You’ll need my provided default “MAN.DAT” file, which contains 2 coordinates to calculate (1st & 2nd run) to get comparable numbers.

So to bench your Transputer system start it with:

 man -v -a

which runs it in VGA mode (640x480x16c), loads the coordinates from “MAN.DAT” and when done presents you with a summary screen like this:

csa_mandel_timer

To run it on your hosts x86 CPU, call it with “man -t -v -a”

The Results

Here are my results of the different Mandelzoon runs I made in the past. The blue background marks the host machine results, yellow are the integer timings and green is where the mucho macho things are happening.. well, sort of 😉
There are two columns for the results, the HD timer and the hand-timed runtimes. This is because these are from days before I enhanced the Mandelzoom.
This table will continously updated of course. e.g. the last row is pretty new – what might that system be?  😯

The sources are available in my github repository – so we can collaborate on enhancing and optimizing it.

HD in-programm Timer (s) Hand-Timed
System 1st 2nd 1st run 2nd run Comment
i386DX/33 (0kb L2) 1800 0 1:30:00
(canceled)
0 Canceled 1st run after a quarter of Mandelbrot was done…
i386DX/33 (0kb L2) + 387 588 3316 0:09:48 0:55:16
Am386/40 (0kb L2) + 387 490 2980 0:08:10 0:49:40  21% faster clock but only 10.5% better result
i386DX/33 (128k L2) + 387 274 1547 0:04:34 0:25:47
Am386DX/40 (128k L2) + 387 228 1292 0:03:48 0:21:32
i486DX/33 (8k L1, 0k L2) 01:06.24 368.56 Pretty close to a single T800-20
i486DX2/66 (8k L1, 128k L2) 00:33.72 185.51 Very close to 2x T800-20
Pentium 133 (256kb L2) 00:09.09 00:55.01 About 8x T800-20
Pentium 200 MMX 00:07.13 00:38.06 About 9x T800-20
AMD K6-3+/266 00:06.00 00:32.00 Downclocked, 64k L1, 256kb L2, 1M L3
Core i3-2120 3.3GHz 00:01.66 00:02.13 VirtualBox,1 CPU
1x T425-20 0:00:25 0:02:28   There’s something wrong here – needs re-run
2x T425-20 00:51.55 04:56.60
3x T425-20 00:34.42 03:17.81
4x T425-20 00:25.86 02:28.56
5x T425-20 00:20.74 01:58.96
6x T425-20 00:17.37 01:39.19
9x T425-20 11 62 0:00:11 0:01:02
13x T425-20 8 42 0:00:08 0:00:42
21x T425-20 5 27 0:00:05 0:00:27
25x T425-20 4 23 0:00:04 0:00:23
65xT425 (48x25Mhz, 16x20MHz) 00:02.323 00:08.163 Actually it was 64xT800 and one T425 forcing the calculation to integer
1x T800-20 01:09.13 06:27.18
1x T800-25 55 309 0:00:55 0:05:09 25% higher clockrate should result in 17.5% speedup. Incl comm-overhead that pretty much fits
2x T800-20 00:35.65 03:13.79
3x T800-20 00:23.16 02:09.32
4x T800-20 00:17.43 01:37.04
5x T800-20 00:14.04 01:17.74
6x T800-20 00:11.82 01:04.83
5x T800-25 11 62 0:00:11 0:01:02
9x T800-20 8 40 0:00:08 0:00:40
13x T800-20 5 30 0:00:05 0:00:30
17x T800-25 00:03.8 00:18.59  “1st run” shows that the slow ISA interface is really  getting a bottleneck
21x T800-20 4 18 0:00:04 0:00:18
33x T800-20 00:02.88 00:11.97
65x T800 (32×25, 33x20Mhz) 00:02.21 00:05.74

Other Transputer interfaces

This is the collection of other Transputer interface cards (i.e. not from the usual suspects) I came by here and there. And no, I don’t actually own them all.
Some are rare, some are odd, may are cool and most are all three of them 😉

This post will be irregularly updated as new finds come along…

Pixar Transputer Card

Yes, that’s true. Pixar built its own Transputer interface back in the days, simply called “Render Accelerator”. Nomen es omen 😉
In constant search of raw computing power they tried out everything, eg. Intels i860 and others. It was just another logical step to try speeding-up Pixars Renderman package by using a Transputer farm. As an early adopter, they had to build their own stuff to suit their demands. Here’s a very interesting read about the long way Renderman came from and which hardware they’re were using. This article about texturing techniques is mentioning this card on pp.16.
The mentioned Jeffrey Mock was also quite busy during his Pixar days in helping Logical Systems (LSC) by writing a concurrency library for their C-Compiler.

Here’s the front of the full-size 8bit ISA card – filled up to the brim.
It features two T800-20 Transputers, each having its own 4MB of RAM. Two links are facing the outside-world – I presume it’s been meant like a up- and down-link to the next card.
The rest is standard early-days ISA bus, C012, Transputer DRAM design.

pixar_transputer_card1

No much to say about the back side. The silkscreen is a bit uncommon, but it’s nice when it comes down to repairing… and some “burning” shows, that seemed to be the case with this specific card.

pixar_transputer_card2

The Quintek T9

This is a very manly approach into transputing matters… don’t give me that step-by-step upgrading of an Transputer array. Just do it once and do it right 😉

Quintek_9T

Yes, that’s 9 Transputers, each with 1 or 4MB RAM and a C004 all directly soldered onto that 8-bit ISA card. It sure was expensive…
While there’s a C004 (the 6th golden IC from left), there’s no T2xx to control it. So I assume that one of the T800 is in charge to configure it… or this is done through the ISA link-interface as there are two C012 on the card,

 

The YARC ProTran

This is a fullsize 16-bit ISA card from YARC. It features 4 Transputers with the option of supporting them with a (for the time) big amount of RAM.

YARC-ProTran

The ProTran board can be equipped with 1 to 4 Transputers. The root Transputer (the rightmost) can have acces to 1-16 megabytes of RAM – it’s 8 in this picture. Each of the other three can be configured with 1 to 8 MB. That was very serious stuff back then.
What’s most interesting about this card is that YARC  didn’t gave much about standards and went a proprietary route in many places:

  • A proprietary bus interface is said to provide a peak I/O speed exceeding 1 MB/sec
  • All links, event inputs and subsystem control signals are fed to a proprietary pin-field array.
  • The memory design uses a multibank and interleaving technologies to achieve zero wait state performance

All this explains the excessive use of GALs in the lower half of the card. Beside the proprietary approach the ProTran offered a compatibility mode (read: B004 interface) to use standard INMOS tools.

CSA

CSA (Computer Systems Architects) was big in the educational market and produced smaller, better integrated B004 compatible Transputer cards. The higher integration of DRAM parts allowed half-length ISA cards meant for evaluation or as a starting point for building bigger systems later.
(The following pictures are courtesy “the PCPUTER” page. Permission to repost them were kindly granted)

Meet the “Transputer Kit PC Card” –  They came bundled with a slightly restricted version of the Transputer Toolset, together with a great manual which described lots of different programming and hardware interfacing lab-type experiments.  These cards used standard multi-pin circular DIN connectors/cables to route the link and reset signals, and provided the first hands-on introduction to distributed parallel processing for many people.  They included a “budget” T400A 32 bit non-floating point processor, and off-processor memory was an option

CSA Kit Board

This is th PART.1 four processor board – mainly a carrier for CSAs proprietary Transputer Modules (like INMOS’ TRAMs), and the single processor PART.2 board featuring a PC interface.

CSA PART1 and PART2 System

The “Gerlach card”

The card from the book “Das Transputerbuch” from A. Gerlach is a typical example. It has its own post on this page.<
Very simple design but only 2 layers and completely documented in the book.

Gerlach

 

 

Hema TA2

The Hema TA2 is some very special specimen of ISA interface cards. IMHO it’s the last and most sophisticated interface you can run in an ISA bus. These are the feature highlights:

  • 16 bit ISA interface
  • half-size card
  • 4 TRAM sockets
  • TTL and RS422 link connectors (if RS422 drivers are fitted, TTL is not usable)
  • B004 compatible ‘Fast Mode’ as well as 100% vanilla ‘Slow Mode’

The TA2 implements an Idea which can be found in some documents from those days, about getting the maximum speed from the sluggish ISA bus and a link-interface chip like the IMS C011/012:
Overlapping acknowledge by using FIFO buffers and a controlling FPGA.

This is how the TA2 looks like:

HemaTA2

I’ve marked the the important parts with colors/arrows:

  • red arrow – the IMS C012
  • orange arrow – the IMS C011 connected to
  • blue – two 1KB FIFOs controlled by
  • yellow – a MACH 110 CPLD and
  • green arrow – a PAL
  • purple – a XILINX 3030 FPGA doing the control logic
  • cyan & magenta – TTL and RS422 link connectors

And here’s the block-schematic using the same colors:

TA2_Block

The schematic also mentioned the two other cool features of the Hema TA2:
Four TRAM slots and the “hema LINK-Bus”, a proprietary two row DIN 41612 connector which provides all links/subsystem which were used otherwise by the 4 TRAMs.
Finally there is a 4-bit microswitch (upper right corner) to set a unique ID for the card so you can identify up to 16 cards in a single system.

Software

Using the provided control program “CTA2” everything can be set by software, e.g.

  • Base addresses for the fast- and slow link (0x150/0x158 by default)
  • Swapping fast/slow link configuration
  • Linkspeed for every link (fast/slow/TRAM/hema-bus)
  • Up- /Down-subsystem control
  • Interrupts per link
  • Waitstates

All the hardware wise ‘jumping through hoops’ still doesn’t do the job alone. To reach the ultimate ISA speed (the docs are talking about up to 1mbps) the communication needs to be tuned, too.
Lets talk a bit x86 assembler here (ahhhh), and DOS-only for sure:
It’s not enough to use simple in and out port instructions and constantly poll the C011/12 status register – that’s way too slow. You’ll need to go for the string variant(s) ins[b|w|d] combined with the rep instruction. Here’s an example for a C insb wapper function:

void insb(UINT16 port, void *buf, int count)
{
   _ES = FP_SEG(buf);   /* Buffer Segment */
   _DI = FP_OFF(buf);   /* Buffer Offset  */
   _CX = count;         /* Bytes to read  */
   _DX = port;          /* from Port xy   */
   asm   REP INSB;
}

Same goes for outs[b|w|d] respectively.  But there’s another extra to care for: The TA2 provides special  registers to give you deeper insight into its status, e.g. FiFo fill-rate (empty, half-full, full), FiFo interrupt settings.
So in effect, you couple the fast ins/outs instructions with interrupts attached to e.g. input half-full and output full.

That said, there are some caveats. ins[b|w] and outs[b|w] are supported from the i8018x and V20 on.  insd and outsd needs a 386.
And then there are possible speed penalties with 32nit processors (i.e. 386 and up) as they optimized the port instructions for virtualization (Virtual 8088 mode, not todays VM!) resulting in 100+ cycles per call.

So when everything is 100% optimal, hema says in its documents these are the possible transfer speeds to reach:

Function/Array size 1K 10K 100K 1MB
Read FiFo 150kB/s 390kB/s 570kB/s 615kB/s
Read Polled 160kB/s 160kB/s 160kB/s 160kB/s
Read Direct 600kB/s 610kB/s 610kB/s 610kB/s
Write FiFo 565kB/s 600kB/s 610kB/s 610kB/s
Write Polled 160kB/s 160kB/s 160kB/s 160kB/s
Write Direct 610kB/s 610kB/s 610kB/s 610kB/s

FiFo – Using interrupts and syncing status of fill level.
Polled: Each byte is synchronized with the C012 status
Direct: Like FiFo but no syncing.

Well, this has to be proved yet. Seem I need to write a benchmark… someday 😉

Inmos B004

I’d call the Inmos B004 the “mother of all Interface cards”, simply because it was the first ISA card sold by INMOS. And it wasn’t just the card but it also defined the (PC) standard of the software interface, mostly called the “B004-interface”. What a surprise 😉

So being the first card, it is quite big (full ISA length) while not offering really impressing specs: 8bit XT Bus, just one Transputer –TRAMs weren’t invented yet- and a max. of 2MB RAM (DILs). To do it justice, the manual rightfully calls it “Evaluation Board” and for that purpose it’s totally fine – remember that 2MB were quite an amount of RAM back in 1984.
To create a multi-Transputer network you had to either plug-in multiple B004s or connect an external network to the onboard connectors (the blue ones in the picture below).

As mentioned, the B004 software interface is what makes this card a keystone in the Transputer universe. All communication to the host (i.e. the XT/AT compatible PC) is done through a port range normally beginning at 0x150 (base, can be moved by some cards).
With certain offsets the host software can communicate with the Transputer, or the C011 to be precise:

Base Address Register Comment
+0x00 C011/12 input data  read
+0x01 C011/12 Output data write
+0x02 C011/12 input status register read = returns input status
write = set input interrupt on/off
+0x03 C011/12 Output status register read = returns output status
write = set output interrupt on/off
+0x10 Reset/Error register write: Reset Transputer & C011/12 and possibly subsystem (check manual)
read: Get Error status
+0x11 Analyse register  (un)set analyse

This mapping was used by more or less all ISA interface cards and extended by other more sophisticated interface cards later.

Clones

Needless to say, that very soon there were a couple of “inspired” models from other manufacturers. AFAIK all of them support Transputers up to 30MHz, which the B004 didn’t… so they’re actually better.

This example is from Microway (yes, those guys who later build the i860 Number Smasher), named Monoputer and dated 1987. Up to 2MB could be used on it. Mind the connectors being accessible from the outside:

Microway-B004

Later they produced the “Monoputer 2” which was more modern and used SIMM RAM modules instead of DIL parts. The Transputer and Linkinterface moved into the middle of the card and the link connectors were moved inside the pc case again – the connectors are the very same used on the NumberSmasher860:

monoputer2

And here’s the one Transtech made, calling it TMB04 mind the SIMM banks which enable the card to give home up to 16MB RAM (at 3 cycle speed!):

Transtech TMB04