Category Archives: Knowledge Base

Deciphering the INMOS IMSB430

The IMSB430 is a rare, yet interesting and important TRAM. It’s been meant for hardware developers to easily build and test prototypes before producing actual PCBs.

Interesting enough, next to no documentation besides a sales brochure has survived. So it’s time for reverse engineering… again. This is also an official call for help – If you by any chance know more about this TRAM, please contact me!

Here’s the left side of this size-4 TRAM. The other half is just the prototyping grid with lots of through holes which we can omit here.
I tried to number all jumpers on the board starting at the top – numbers in brackets are the jumper-numbers actually printed on the boards silk-screen. That numbering is a bit confusing and seems not to follow any logic.

IMSB430 Map

IC connections

Buzzing through all lines from/to the GAL leads to this table so far:

1 ProcClkOut 20 VCC
2 A13 19 Mem0 (IC2B „/CE“ + JP26)
3 A14 18 Mem1 (IC3B „/CE“ + „J20“)
4 A15 17 1Wait (IO 2-above JP8)
5 notMemCE 16 2Wait (IO above JP8)
6 WaitSEL0 (JP8) 15 3Wait (IO below JP8)
7 WaitSEL1 (JP9) 14 SelWait (MemWait+ IO below JP9)
8 Map0 (JP10) 13 BLK0 (IO below JP10)
9 Map1 (JP11) 12 BLK1 („IO1“ below JP11)
10 GND 11 (I/OE) GNDed

Each SRAM socket is actually a “double row”.
If you seat your SRAMs aligned to the right, they will be accessed word-wide (16bit, D0-A15). When aligned to the left, they are accessed byte-wide (8bit, D0-A7). That’s why the silkscreen print says “IC2B/IC2W“…

Jumpers

This is the official press photo. It seems to show the default jumper settings (using the full 64KB SRAM, word-access):

IMSB430_official

Some jumpers are already identified (“JP” precedes the official jumpers, “J” is my numbering):

Jumper description
 1  MemWait (FIT=GalPin14, else ????)
 2  DisIntRam (FIT=use internal RAM)
 3  ProcSpeedSel0 (FIT=HoldToGnd T222)
 4  MemReq (FIT=no request)
 5  EventReq (FIT=no request)
 6  MemBacc (FIT=word access)
 7 (JP6)  ProcSpeedSel2 (FIT=HoldToGnd T222)
 8  BootFromRom (FIT=BootFromLink)
 9 (JP7)  ProcSpeedSel1 (FIT=HoldToGnd T222)
JP8 Set MemWaitstate Bit0
JP9 Set MemWaitstate Bit1
JP26 Connects /OE with /CE of IC2 (upper SRAM)
J20 Connects /OE with /CE of IC3 (lower SRAM)

The external RAM access wait-states can be set with JP8/9:

JP8 + JP9 = 2 clock-cycles per MemAccess
JP9 = 3 clock-cycles
JP8 = 4 clock-cycles
none = 5 clock-cycles

To be continued…

Lies, damn lies and benchmarks

As soon you’re talking about Transputers with people which weren’t there back in 1985 you’ll be asked this very soon: “How fast are these Transputer thingies”? Then there’s a stakkato of “MIPS? Whetstones? Dhrystones?” etc…

As always with benchmarks, the only valid answer is “it depends”. Concerning Transputers that’s even more true.
First, I suggest you read this Lies, Damn lies and benchmarks document from INMOS itself. It pretty much describes the dilemma and all the smoke and mirrors around that matter.

Benchmarks? It depends.

So you’ve read the above INMOS document? As you might saw, it’s full of OCCAM code. That’s the #1 prerequisite to get fast, competitive code (as long you’re not into Transputer assembler). From there it gets worse if you use a C compiler or even FORTRAN…

My little benchmark

Because it scales so well, works with integer as well as floating point CPUs and also runs on the x86 host while using at least the same graphic output routines, my personal benchmark is CSAs Mandelbrot tool (DOS only).
My slightly modified version is part of my Transputer Toolkit, which is downloadable here. You will need that version because I extended the code of this Mandelzoom with a high precision timer (TCHRT, shareware, can’t remove the splashscreen, sorry) when run with the “-a” parameter. You’ll need my provided default “MAN.DAT” file, which contains 2 coordinates to calculate (1st & 2nd run) to get comparable numbers.

So to bench your Transputer system start it with:

 man -v -a

which runs it in VGA mode (640x480x16c), loads the coordinates from “MAN.DAT” and when done presents you with a summary screen like this:

csa_mandel_timer

To run it on your hosts x86 CPU, call it with “man -t -v -a”

The Results

Here are my results of the different Mandelzoon runs I made in the past. The blue background marks the host machine results, yellow are the integer timings and green is where the mucho macho things are happening.. well, sort of 😉
There are two columns for the results, the HD timer and the hand-timed runtimes. This is because these are from days before I enhanced the Mandelzoom.
This table will continously updated of course. e.g. the last row is pretty new – what might that system be?  😯

The sources are available in my github repository – so we can collaborate on enhancing and optimizing it.

HD in-programm Timer (s) Hand-Timed
System 1st 2nd 1st run 2nd run Comment
i386DX/33 (0kb L2) 1800 0 1:30:00
(canceled)
0 Canceled 1st run after a quarter of Mandelbrot was done…
i386DX/33 (0kb L2) + 387 588 3316 0:09:48 0:55:16
Am386/40 (0kb L2) + 387 490 2980 0:08:10 0:49:40  21% faster clock but only 10.5% better result
i386DX/33 (128k L2) + 387 274 1547 0:04:34 0:25:47
Am386DX/40 (128k L2) + 387 228 1292 0:03:48 0:21:32
i486DX/33 (8k L1, 0k L2) 01:06.24 368.56 Pretty close to a single T800-20
i486DX2/66 (8k L1, 128k L2) 00:33.72 185.51 Very close to 2x T800-20
Pentium 133 (256kb L2) 00:09.09 00:55.01 About 8x T800-20
Pentium 200 MMX 00:07.13 00:38.06 About 9x T800-20
AMD K6-3+/266 00:06.00 00:32.00 Downclocked, 64k L1, 256kb L2, 1M L3
Core i3-2120 3.3GHz 00:01.66 00:02.13 VirtualBox,1 CPU
1x T425-20 0:00:25 0:02:28   There’s something wrong here – needs re-run
2x T425-20 00:51.55 04:56.60
3x T425-20 00:34.42 03:17.81
4x T425-20 00:25.86 02:28.56
5x T425-20 00:20.74 01:58.96
6x T425-20 00:17.37 01:39.19
9x T425-20 11 62 0:00:11 0:01:02
13x T425-20 8 42 0:00:08 0:00:42
21x T425-20 5 27 0:00:05 0:00:27
25x T425-20 4 23 0:00:04 0:00:23
65xT425 (48x25Mhz, 16x20MHz) 00:02.323 00:08.163 Actually it was 64xT800 and one T425 forcing the calculation to integer
1x T800-20 01:09.13 06:27.18
1x T800-25 55 309 0:00:55 0:05:09 25% higher clockrate should result in 17.5% speedup. Incl comm-overhead that pretty much fits
2x T800-20 00:35.65 03:13.79
3x T800-20 00:23.16 02:09.32
4x T800-20 00:17.43 01:37.04
5x T800-20 00:14.04 01:17.74
6x T800-20 00:11.82 01:04.83
5x T800-25 11 62 0:00:11 0:01:02
9x T800-20 8 40 0:00:08 0:00:40
13x T800-20 5 30 0:00:05 0:00:30
17x T800-25 00:03.8 00:18.59  “1st run” shows that the slow ISA interface is really  getting a bottleneck
21x T800-20 4 18 0:00:04 0:00:18
33x T800-20 00:02.88 00:11.97
65x T800 (32×25, 33x20Mhz) 00:02.21 00:05.74

Weitek Abacus FPU

This is the first post on GeekDot about a single IC. While I did a lot of writing about ICs, mainly CPUs and FPUs, back in the BBS days (German) I stopped doing so after that and concentrated on collecting them during the following years – a true love never dies 😉
The Weitek Abacus FPUs were always special, exotic and unaffordable for the most of us. Today they additionally got a touch of a mythic being, especially as 386/486 boards featuring the special  WEITEK socket are dying out fast. Not mentioning the Abacus FPU itself.

Recently I had a mail conversation about the Abacuses and because they’re are memory mapped, I thought they might actually fit quite good into the range of my other accelerator post as the DSM860 or all those Transputer cards.

I wasn’t planning to go into all the details about the Abacus models 3167 and 4167 and went looking for a handy Wikipedia link.
To my surprise that article is pretty general and just briefly touches all product ever made by WEITEK.
So, the best source of technical information about these FPUs is the highly recommendable posting “copro16a.txt“, written by Norbert Juffa in 1994.
I’d call it the most comprehensive write-up about FPUs until that date. Because you’ll never know what happens to external links, here’s the (shortened) part about the Weitek chips:

The architecture of the Weitek chips differs significantly from the 80x87.
Strictly speaking, the Weitek Abacus 3167 and 4167 are not coprocessors in that they do not transparently extend the CPU architecture; rather, they could be described as highly-specialized, memory-mapped IO devices. But as the term "coprocessor" has been traditionally used for these chips, they will
be referred to as such here.

The Weitek coprocessors have a RISC-like architecture which has been tuned for maximum performance. Only a small instruction set has been implemented in the chip, but each instruction executes at a very high speed (usually only a few clock cycles each). [...] 
In contrast to the 80x87 family, the Weitek Abacus does not support a double extended format, has no built-in transcendental functions, and does not support denormals. The resources required to implement such features have instead been devoted to implement the basic arithmetic operations as fast as possible.

While the 80x87 coprocessors perform all internal calculations in double extended precision and therefore have about the same performance for single and double-precision calculations, the Weitek features explicit single and double-precision operations. For applications that require only single-precision operations, the Weitek can therefore provide very high performance, as single-precision operations are about twice as fast as their double-precision counterparts. Also, since the Weitek Abacus has more registers than the 80x87 coprocessors (31 versus 8), values can be kept in registers more often and have to be loaded from memory less frequently. This also leads to performance gains.
[...]

To the main CPU, the Weitek Abacus appears as a 64 KB block of memory starting at physical address 0C0000000h. Each address in this range corresponds to a coprocessor instruction. Accessing a specified memory location within this block with a MOV instruction causes the corresponding Weitek instruction to be executed. (The instructions have been cleverly assigned to memory locations in such a way that loads to consecutive coprocessor registers can make use of the 386/486 MOVS string instruction.)
This memory-mapped interface is much faster than the IO-oriented protocol that is used to couple the CPU to an 80287 or 80387 coprocessor. The Weitek's memory block can actually be assigned to any logical address using the MMU (memory management unit) in the 386/486's protected and virtual modes. This also means that the Weitek Abacus *cannot* be used in the real mode of those processors, since their physical starting address (0C0000000h) is not within the 1 MByte address range and the MMU is inoperable in real mode. However, DOS programs can make use of the Weitek by using a DOS extender or a memory manager (such as QEMM or EMM386) that runs in protected/virtual mode itself and can therefore map the Weitek's memory block to any desired location in the 1 MByte address range.

Typically the FS segment register is then set up to point to the Weitek's memory block. On the 80486, this technique has severe drawbacks, as using the FS: prefix takes an additional clock cycle, thereby nearly halving the performance of the 4167. Most DOS-based compilers exhibit this problem, so the only way around it is to code in assembly language.

Ok, so we have a good idea how the Abacuses work and how they can be used. … but what’s the story behind the company?

This is a ‘web found’ © by antiquetech.com
“Founded by Chi-Shin Wang, Edmund Sun, and Godfrey Fong (President and CEO) in 1981 in San Jose. All founders immigrated from China.
Weitek specialized in high-performance digital semiconductor components and systems for the computer and workstation industries. Weitek floating point units have been used with Inmos Transputers (Floating Point System T-series Hypercube, 1986), National Semiconductor NS32032’s (Encore Multimax, 1986), and Intel 386’s (1988). The 2048 was used in the Thinking Machines Corporations CM-2 Connection Machine. Weitek produced floating point processors for HP. HP allowed to Weitek to use it’s facilities to make chip for themselves and for their competition.”

When dramatically loosing ground to the 486DX2 Weitek moved away from the x86 platform and concentrated on SPARC CPU and FPUs as well as MIPS FPUs. That worked quite well for some time and in the 90s they finally moved into the frame buffer and graphics accelerator business. Their POWER P9000/P9100 models were quite successful but lost when players like S3 and ATI started to flex muscles.

Specs

So from our (PC-compatible) perspective there are two models of interest: The 3167 and the 4167. Basically they only differ in the bus protocol to either the Intel 386 or 486.
If you need more detailed data, I make the original specs available here for the 3167 and 4167 as PDF.
Both Abacuses clocked up to 33MHz. Especially the 4167-33MHz is very hard to find these days. That said, it might be possible to overclock a 25MHz version to 33MHz providing sufficient cooling. I found this press snippet where WEITEK proactively promoted overclocking a 33MHz to 50MHz by using a peltier element made by ICECAP – so when overclocking that by 51%, the 32% from 25 to 33MHz should be an issue:

WEITEK and ICECAP announce 33MHz version of Abacus 4167

Ram Ganapathi Mar 30 1992, 02:57 pm
WEITEK and ICECAP announce 33MHz version of Abacus 4167

Weitek Corporation and ICECAP Technologies announced today that users of 50MHz Intel 80486-based personal computers can realize a 50% performance increase for numeric-intensive applications by combining the 33 MHz version of Weitek’s Abacus 4167 math coprocessor with the ICECAP thermal management device. Superior system performance is achieved without degrading system reliability.

Using it

Applications that took advantage of the Weitek Abacus were scarce. AutoShade, Autodesk Renderman, 3-D Studio were the most prominent to use a Weitek coprocessor.
If you happen to own one, you might actually use or at least test it… so here’s the official test-suite from Weitek. If contains these tools

  • DOS TSR to update the BIOS for Abacus support (if missing)
  • A test tool wich checks for an Abacus presence using INT 11h BIOS calls.
  • A diagnose tool
  • 2 demos: A side-by-side Mandelbrot benchmark (yay!) and rendering a phong shaded beach ball
  • Abacus macros for using it natively in your cool assembly code

When I find the time to pull out my Weitek-PC, I’ll post pictures of the demos… feel free to comment if you feel I’ve been missing out something.

Vintage Fan replacement

Making some noise

Working with vintage computers has many aspects and one of them, nearly throughout every component, is noise.
Hard-drives whirr, floppy-disks rattle, the CRT emits a high-pitch whistle and on top of it all at least one fan is blowing like a jet-engine. Time for a fan replacement!

Luckily since these days, much quieter fans have been developed during the last 30 years – so let’s just swap the fan and… ahhh, silence. I did that with my Sun Blade 150 and it worked great!

…well, it’s actually not that simple all the time.

Blown by the wind

System cooling was handled a bit different back in the 80’s and 90’s. Practically there were just 3 levels of cooling:

  • None – convection had to do the job.
  • One for all – one fan cooled the whole system
  • Insane – Either huuuuuge Fans (>8″) or some mad-scientist liquid-cooling was used. I won’t touch these in this post…

The “no fan” class were all so-called home-computers and the lower-end models of the early 16-bit machines like the ATARI ST, Commodore Amiga 500 or the first Apple Macintoshes (Steve Jobs was fanatic about convection cooling).

The most prominent “single fan” family members were office PCs up to the 486-class as well as any desktop/deskside 68k Apple Macintosh. All these had one fan sitting in their power-supply, blowing the warm air out to the back of the case.
Please mind the warm air. We’re not talking hot streams of  death-rays here. While CPUs weren’t a big heat-source issue (until the advent of the i486/50) passive heatsinks were sufficient to cool them by the air-flow/draft through the case created by the PSU.
For such Personal Computer systems it can be perfectly fine to replace some old, noisy fans with recent high-tech whirls. Especially if the PSU was also replaced by something more modern (like I did with my Quadra 950) producing less heat than the original one.

This is a totally different story when it comes down to workstations.
An en-vogue (UNIX) workstation back in the days was mostly designed using a relatively small pizza-box sized case, nicely snuggling underneath a monstrous 21″ CRT. To name just a few there were

  • SUN SparcStation 1 to 20
  • SGI Indy
  • Digital VAX/DECstation
  • Many HP PA-RISC 7xx

These boxes were cramped (hard-drives, expansion cards, lots of RAM) and their high-end processors ran much hotter than those x86 and 68k in personal computers.
Surprisingly none of them had a dedicated CPU fan mounted – instead they all relied on the power of the fan installed in the PSU.

Under pressure

Searching the web, you will find many texts recommending to put a quieter PC into your workstation. Vintage-me says: Don’t!

In case of workstations (pun intended!) a new indicator is needed in the game of fan replacement.
While the ‘PC world’ just looks at the CFM value (cubic feet per minute, i.e. airflow) as a performance indicator, workstation owners need to check the static pressure delivered by a fan. This is measured in mm/H20 (millimeter of water) and means how strong is a fan pulling air over obstacles and/or through venting slits etc. – think vacuum cleaner.
In consequence, two fans having about the same CFM value might be completely different when it comes down to static pressure. This is a nice table I found on the web comparing some high-performance fans with standard PC ones and the Papst 8412N is a good example of what I just wrote: The much liked NF-A8  has just 25% less CFM but only half the mm/H20:

Fan Airflow [CFM] Static pressure [mmH2O] RPM Noise [dB]
Panaflo FBA08A12U1A 46.9 4.8 3450 38.2
EBM Papst 8412N 40.6 4 3100 32.0
Noctua NF-A8 FLX 30 1.96 2000 16.1
Noctua NF-R8 31 1.4 1800 17.1
Arctic F8 31 1 2000 20

That power naturally comes at a price: More revs and much more noise… which is inevitable at the given mm/H20 power.

What’s cooking?

So what happens if I chose the wrong fan?
If a low-pressure fan is placed in an airflow path with lots of obstacles, the fan’s airflow will reduce and it will cool only the nearby components, but won’t have enough juice to suck heat from parts further away.
This means it will only cool (parts of) the power supply and the rest of the workstation will be more-or-less cooled by convection and the internal temperature will accumulate. Running such a machine for a longer duration will lead to ‘effects’. From errors to crashes, even smoke and finally destruction.

C’mon, that’s folklore, Axel!” – Not a bit my friend.
I just had this experience when I thought that my MIPS RS2030 workstation could do just fine with a somewhat more silent, recent fan. The original one is/was a Delta AFB0812HH – a hellish loud fan. But even a  Noctua NF-A8 with a mm/H2O of nearly 2 made the small MIPS workstation unstable. First the PSU housing got really warm in the middle (switching regulators are screwed onto it there) and after 30 minutes of torture (compiling code) I got more and more SCSI I/O errors until the system completely froze.
Changing the fan back to the noisy one everything ran rock-solid. Quod erat demonstrandum!

So what should I do?

There’s no general advise to follow. If you want to keep your original PSU it might be possible to use a more silent fan for just the PSU and add one or more fans caring for the case ventilation.
Your milage may vary and multiple modern fans -which also need to be mounted somehow- might add up producing the same amount of noise like the single original did.

The cleanest solution is replacing the original PSU innards by more modern & smaller parts which then draw less power and therefore dissipate less heat, requiring less cooling.
This is especially advisable if you have the feeling that the original PSU gives fist signs of ageing (smell, heat, buzzing). Better safe than sorry!