Like with the C64, this is probably the reason you came here:
The friggin’ fastest Mandelbrot displayed on a IIgs, ever! 😉
Yeah, you’re right, it’s not the fastest Mandelbrot calculated by an IIgs (its very own 65c816 CPU, that is)… but hey, it’s still kinda cool – and sooooo much faster!
Okey-Dokey, here we go, a complete Mandelbrot in 60s. This time in color… and zooming in! I couldn’t do that on the C64 as the C-Compiler didn’t natively support IEEE754 doubles (like Orca-C does) and having a mouse also helps a bit, too:
Wow, that was nice, wasn’t it?! (Sorry for the shaking, need to get a tripod soon)
Especially when you take in concern how ‘far’ the native 65c816 code got during the video on a ‘sped-up’ 10MHz TransWarp GS.
Like with the T2C64 version there are surely several things which could be improved, but the IIgs (even at native speed) is well capable to handle the little bit of extra work. The limiting factor is the bus-speed, i.e. how quick the Transputer can push his data into the host (IIgs). You can clearly see that by the time it took the display the 3 zooms: They all took about 60 seconds, even each zoom means more calculations as the iteration is doubled each zoom, in this case 32, 64, 128.
The Orca-C source/binary of this demo -and the previous AppleSoft sample- is available here (zip’ed PRODOS disk-image). It won’t make much sense without a T2A2 and is GS/OS-only as it uses QuickDraw II and the EventManager (for mouse & keyboard).
Final words: Don’t get too excited about the acceleration of the IIgs… it’s not accelerated at all. It’s more like a co-processor attached to it. And even then, you’ll need something really calculation-intensive to justify the time you’ll loose due to communication between the Apple and the Transputer. A single square-root for example wouldn’t make much sense.
But OTOH, that’s exactly things are handled with the Innovative Systems FPE (using a M68881). So it might be worth evaluating. Maybe I’ll write a SANE driver if I have the time to get a deeper understanding of GS/OS.
As my two targets (C64 & Apple II) are working now, I’m thinking about creating a ‘real PCB’ in the medium term. Given the rarity of the Link-Adaptor (Inmos C012) I’m currently looking into the possibility to use a larger CPLD to move the C012 into that. This would actually make this ‘project’ a product to buy. But don’t hold your breath, need to get an eval kit first. Then some 100 days of fiddling, cursing and crying… and then more.
And now something which you knew it would come: Mandelbrot time! 😉
I don’t want to bore you with all the details before you had it seen in action… so here we go:
“Man, that was brilliant! And even you had a lot of geek-babble in there, I want to know more!”
Ok Timmy, let’s go into detail…
Like I said in the video, the Transputer is finally doing something for real, he’s actually doing the most of the work, crunching through a 320x200x8 Mandelbrot, 32 iterations in double-precision floating point. The code itself was written 1988 in OCCAM by Neil Franklin and is available on his page.
This shows the general beauty of Transputers: If the code is written flexible enough to fit into any topology, it’ll run on any platform!
That said, I ran into a certain limitation on the C64. The Transputer mandelbrot executable expects the initial data (resolution, coordinates, iterations) to be send in a specific order and format. While the order isn’t the problem, the format is: The coordinates have to be doubles (C-lingo i.e. 64bit IEEE 754 compliant float). The C-Complier I’ve used for this demo (CC65) doesn’t now a flying s*** about floats or even doubles.
So to get that demo done ASAP I tricked myself a bit and used the same technique we’ve seen in my 1st demo when POKEing something the Inmos-way:
To get the coordinates for left (-2.0), right (1.0), top (1.125) and bottom (-1.125) over to the Transputer they had to be converted into 64bit IEEE 754 format, ordered into little-endianess and finally put into an array like these:
Yes, that’s a bit awkward, but it was OK to get a fast start. The ‘problem’ with this quick-hack is that the demo is pretty static, i.e. no zooming into the Mandelbrot. If you know of a quick way to create IEEE 754 compliant doubles from a long (which is the biggest floating point variable CC65 can handle, so StringToDouble() isn’t an option here) I’m happy to hear from you.
Of course I could have the Transputer do the typecasting but in this case, as part of a demo, I wanted to keep the original binary untouched.
In the video you saw (or didn’t because of the blurry picture) that the timer printout was about 70s… and as said, normally it takes about 60s to complete the fractal – and it did in the video, too! Watch the video-timer or check with a stopwatch. What I’ve forgot to take out from the timing was the actual upload of the code into the Transputer.
The Transputer binary is in this case a bigger array in the C-source, so it’s not being loaded from floppy but directly pushed to the Transputer after it was initialised. This takes some extra time which also went into the stopwatch timing… I’ll correct that in a later version.
“Later version” is a good catchword. If this wouldn’t be just a demo for now, there are obviously plenty of ways to optimise things:
First of all one should take off the burden of converting the colors from the C64 and let the Transputer do that.
My second idea would be to reduce the communication overhead (polling) by having the Transputer to render the whole screen into his own RAM and when done have it ‘pumped’ down to the C64
Yes, DMA would be cool but that’s not possible (yet)
Ok, that’s about it for now. The T2C64 is still in its prototype stage and I can image many more cool things to add… but first I will have a ‘proper’ circuit board being made.
Final words: Don’t get too excited about the acceleration of the C64… it’s not accelerated at all. It’s more like a co-processor attached to it. And even then, you’ll need something really calculation-intensive to justify the time you’ll loose due to communication between the C64 and the Transputer. A single square-root for example wouldn’t make sense at all. 100 sqrts in one go would certainly do.
Of course adding another linkOut/In to the T2C64 to get more Transputers involved into the calculation would be the final step. This is planned for the next version of the hardware but the bigger part of the work would be a complete rewrite of the Mandelbrot code to have it broken down to parts being run in parallel on each Transputer… which closes the loop to today where programmers are trying to wrap their brains around multithreaded programming. 22 years after the first Transputer was released 😉
It was inevitable… the biggest system AVM built was the “T1”, a 30 channel ISDN controller in a sleek 1U 19 inch case of which nothing more than the above marketing picture seems to exist.
One fine day I had to had one – and today is the day!
I was able to find a AVM T1 on ePay which was not very well advertised so I had no “professional competition”. Even I didn’t spent a fortune it was a bit of gambling because I didn’t knew what to expect.
Besides AVMs own T1 PDF manual there’s next to nothing available in the Web – So this section is yet another WWW-exclusive brought to you by geekdot.com 😉 (Ok since 2009 others discovered this page and also this cheap entry into the wonderful world of multi Transputing)
Still, the docs said “a Transputer network with 9MB RAM” so I couldn’t go completely wrong. That said, I was expecting SMD T400s at AVMs usual sluggish speed…
When the box arrived first thing was getting out good ol’ screwdriver and open the case…
…and I was very surprised:
A socketed T425 – so that’s another easy upgrade then.
An external power supply (48V)! That’s strange but also neat – no noise and next to no heat in the case itself
Also, the board is very small… lot’s of room left in the case.
That’s done by intention as you could buy the T1-B, where “B” stands for the “Booster Board”, yet another board with 4 more Transputers and another 8MB of RAM giving a total of 7 Transputers and 17 Megs of memory. Quite a setup for just an ISDN controller.
Ok, this beast has to do something better than handling 30 boring B-Channels… Mandelbrot for example 😉 So let’s see how this thing is/was supposed to speak to the outside world.
The manual is talking about an ISA or PCI controller-card which will be connected to a 9-pin Sub-D connector. Having a closer look to the mainboard where that connector is seated I discovered some other old friends: AM26C31 and AM26C32.
Aaaaalrighty, RS422 time… that’s the same way my Tower of Power is transmitting its data. So I can use my TTL-to-RS422-converter I’ve built for the Gerlach card.
Out goes the multimeter and after a while I figured out the the traces on the board. For a better understanding, here’s the “map”:
Marked by the red arrows are the three Transputers: T1, a T425-25, is the “application processor” while T2 and T3 are more simple T400-20 handling the ISDN subsystem.
The yellow arrows mark the four links of the T425 – which is probably the reason why AVM used a 425 vs. their usual T400: this time they really needed 4 links. Link0 is connected to the 9-pin sub-D connector (via the RS-422 transmitters/receiver) for interfacing to the PC. Link1 and Link2 are directly connected to the T400s. Link3 goes to the connector on the lower edge of the board. I bet this is where the “booster board” would be connected… not a hard bet, I admit.
The pinout for the 9-pin sub-D connector (female) is:
9 MB is true. The “application processor” (T1) got 4MB while the two T400s got 1 (T2) and 4 MB (T3, obviously connected to the SIEMENS Munich32 Über-ISDN controller).
While the built-in T425 is spec’ed for 25Mhz it’s just running at 20MHz… what a waste of bang… and what an opportunity for improvement :->
The linkspeed is at maximum… which one would expect with directly connected links. But with AVM you’ll never know 😉
The RAM-speed is pretty good (compared to what they did to the B1) – even they just used 70ns RAM.
Next up: Having fun with Mandelbrot! Having just T4xx Transputers it can only use the integer algorithms (i.e. no floating point) but who cares for a quick start?!
It’s working and showed another nice gadget: LEDs! Each Transputer has a tiny SMD-LED connected to it’s Link-Out.
So having the T1 underneath the table I have quite a nice light-show while the three are working their a** off 😉
Here’s a video of Transputer-driven-Blinkenlights:
If you happen to have no access to a RS422 converter: Never say die!
Like said above, there’s still Link3 available – normally meant for the booster-board – and it’s pure TTL. All you need is a somewhat non-standard plug to this connector. Be creative but don’t forget that unbuffered link connections only allow a distance of a couple of inches/centimeters!
The pin-out (so far) is, counting from left to right:
1 - 5V VCC
2 - T1 Link3 OUT
3 - T1 Link3 IN
4 - RESET
5 - T3 Link1 OUT
6 - T3 Link1 IN
7 - GND
[UPDATE 11/14/10] Again, with some ePay-Luck I got another T1… and it again was some kind of lottery… and I had luck! This time it’s a T1-B!! This means, the “booster board” is installed. So opening the case, it looks like this. On the right the normal T1-board, to the left, “da mighty booster board” 😉 I’ll call it “BB” from here…
As expected, it’s connected via Link-3 of the T1 Board. On the lower edge of the picture you can spot the power-supply “module”. It’s longer than in the T1 configuration and provides 3.3V/GND to the BB, i.e. the BB is 3.3v only!!
Here’s the BB alone:
All in all the BB is more modern than the T1-board. Very suspicious are the JTAG connector on the lower left having its lines connected to a EEPROM (AT28V256, right edge of the BB board, above the row of RAMs). Further up, left to the CPU nearby is a pad with the lable “Boot from ROM/Link”. I wonder what the default is and what’s inside that EEPROM – will investigate later.
Most importantly the BB board features 4 ST20450 processors, which aren’t INMOS products anymore. They were designed by ST after they bought INMOS. For short, the ST20450 is a T425 on steroids. More on-chip RAM (16K), higher clocking (40MHz) and some more instructions.
Each ST20450 has its own 2MB of RAM and a GAL handling the memory etc.. Here’s a close-up of a single ST20450 “module”:
Mind the careful markings/labels on the board. The CPUs are numbered (“Processor 3”) and there are pads for Links etc.
Finally, I currently have no tools to check/use ST20450 processors. ispy finds the Transputers on the T1 board but freaks-out when it pings the ST20s.
Here’s another new addition: A picture of the official T1-PCI interface. It contains a PCI-controller (the big IC) and a XILINX FPGA… probably containing a synthesized C011.