Tag Archives: fpu

NumberCruncher Reloaded

The NumberCruncher Reloaded is a peripheral card for the Apple II series that features a math co-processor, often also called a Floating Point Unit (FPU) which is specialized on, well, floating point calculations. Doing so, it is much, much faster than any 6502 or 65816 CPU ever will be.

That said, the NumberCruncher Reloaded will not automatically speed-up your programs as CPU accelerators like the Transwarp GS or ZIP CHIP would do. Programs will have to be either specifically written to use the NumberCruncher Reloaded or use a floating-point library like the SANE interface which then needs to be patched to itself use the NumberCruncher Reloaded for calculations instead of the main CPU.

TLDR; In a hurry? Here are some shortcuts for this page:

In the beginning…

The “Reloaded” in its name hints towards the fact that this is a reboot of an already existing card. To make writing/reading easier, NumberCruncher Reloaded might be shortened to ‘NC-R’ further down this page…
In 1988, there was the Floating Point Engine (FPE) created by Innovative Systems (‘iS’ for short).
Read more about those in my separate post over here.

While it was a great idea, it wasn’t the most stable design – but it laid the foundation especially and most importantly for the software we’re still using today.
Due to the FPE’s issues there was quite some displeasure in the user-ship and in 1990 a German company called Alternative Systems announced the Number Cruncher, an ‘overhaul’ of the original design – here’s their newsgroup announcement:

“The FPE is suffering from a major problem, namely the coproc is crashing internally and has to be reset in software. This happens in a non-deterministic way, and software written for that engineering junk must be adapted to that.
The Number Cruncher is compatible with the FPE but is actually what the FPE was supposed to be – a math coproc that works. It performs very well.”

Over the years the FPE as well as the NC faded in unobtanium. Because they were cool, and I love processors of all kinds it was time to revive the Number Cruncher.

Revival!

If you have read the above mentioned post about the the FPE you learned that the predecessors were built around the first, very obsolete and proprietary FPGA, a 555 timer and the 68881 FPU. All these parts would have a Facebook status of ‘#complicated’ today and needed to be replaced.

Logic: The Xilinx XC2064 FPGA was replaced by something more recent like my universal 5V-tolerant weapon of choice, the Altera EPM3064 (aka MAX3000). That little fellow has enough logical gates and using the 100-pin version sufficient I/O-pins are available even when using ISP. The timer for blinking the busy-LED went into this, too.

FPU: There are still some 68881 around, but the 68882 is much easier to find  and both are cheaper in PLCC packaging than ceramic PGAs as of today. But as future NC-R owners might already own one or the other, so we’ll go with… both. Yes – to offer maximum flexibility, you can use either Pin-Grid-Array or PLCC packages.
Physical differences aside, the original FPE/NC did not work with the 68882 – the NumberCruncher Reloaded does.

To sum it up the NumberCruncher Reloaded was improved in many aspects to make it much more usable in the 21st century:

  • it also supports the enhanced and the easier to find MC68882
  • FPU’s can be used either in pin-grid-array or PLCC package thanks to the two sockets provided. Again, the latter being much more common these days
  • Further increased stability by using low-power SMD parts and a 4-layer PCB with dedicated supply layers
  • Speed optimized FPU protocol handling
  • 2 more LEDs, which I consider very important.
  • Updatable firmware (ALTERA ByteBlaster required)

Software

In contrast to my T2A2 Transputer Link-Adapter, there is already some software available for the NC-R.

The Tools Disk

This is a good start but was mainly intended for that warm fuzzy feeling of unboxing a real product 😉

Download: 2MG image or ShrinkIt image

Still, based on the original Innovative Systems disk (updated to the latest releases),  it provides everything you need to start:

  • All Apple IIGS-related software is located in the FPE.IIGS folder.
  • All Apple II/II+/IIe-related software is located in the FPE.6502 folder.
  • The Appleworks 2.x modification software is located in the APPPLEWORKS.FPE8 folder.

In the FPE.IIGS folder:

  • In the FPETOOLS.INIT folder you’ll find the FPE tool set named FPE.INIT.Sx . Be sure to delete any existing  SANE.INIT.x file.
  • The EXAMPLE folder contains an assembly language file which demonstrates the use of NC-R register-to-register operations to significantly improve floating point operations speed.
  • The BENCHMARK folder contains an ORCA/PASCAL version of the SAVAGE benchmark and an APW C version of the Byte Magazine floating point co-processor benchmark. (The program in example is an assembly language adaptation of the co-processor benchmark).
  • The APW.ORCA.FPE16 , MERLIN.FPEand LISA816 folders contain macros and equates files for use in assembly language programming.

The FPETOOLS.INIT requires some extra explanation because it is a very elegant solution to put the NC-R to use:

This init redirects every Standard Apple Numerics Environment™ (SANE) floating-point call to the NumberCruncher Reloaded – so as long an application using the SANE library calls it will be accelerated. All you need is to copy the INIT corresponding to the slot your NC-R is installed to…

…and you’re done!

As said in the FPE.6502 folder you will find all tools for the Apple II.

That also has a SANE patch which can be found in TOOLSET:FPE8.TOOLSET.  This toolset uses the following calls:

jsr $2100 ; to call the fp6502 routines 
jsr $2104 ; to call the ELEMS6502 routines

It loads into locations beginning at $(00)2100 and has a length of less than $1000 bytes.

I have included the APW.ORCA.FPE8 and MERLIN8.FPE macro folders.

The APPLEWORKS.FPE8 folder contains the Appleworks 2.x modification. To modify your copy of Appleworks, just run FPE.SYSTEM from the root folder and answer the questions. The modified code will automatically access the FPE whenever a floating point operation is required.

In the FPEfractal folder you will find Zoombaya (and other fractal programs, all by Glen Brendon).
This is my currently favorite tool for benchmarking and testing.
It’s written in Applesoft BASIC(!) and uses Glens cool so-called ProCMD module which sets up an interface between Applesoft programs. The downside is, that it uses some 65816/65802 specific commands. So running it on a 6502 CPU will lead to a crash.

“Real” Programs

I also prepared an archive, containing all programs (I was able to find) supporting an FPE card in some manner.

Download: ShrinkIt archive or ZIP file

Obviously, they’re mainly math packages… and sad but true, they’re all IIgs programs. :

GSnumerics (by Spring Branch Software)

GSnumerics

Symbolix (by Henrik Gudat of Bright Software)

Screen Shot

jazGraph (by Jason Perez)

MathGraphics (by Dirk Fröhling)

saneglue (by Söhnke Behrens)

From the README: “lsaneglue is a library that contains code to let you call SANE funtions directly from ORCA/C”.
This lib provides convenient functions like findfpcp() and most calls to floating-point operations.

FAQ

Q: Which Apple computers are compatible with the NC-R?
A:
I’ve tested the NC-R in my IIgs and IIe. Those work for sure.
The original FPE was communicated as being compatible with the II and II+, too. I don’t have those machines and while the compatibility is highly possible, it has yet to be proven.

Q: I’m experiencing crashes and instant lock-ups starting programs which are supposed to use the NC-R
A: Most likely your software is expecting the FPE/NC-R in another slot.
For speed sake, most current programs naively supporting an FPU card, expect the card in a certain slot. Especially the SANE INIT.
So please check if your NC-R is installed in the correct slot and try other programs if they are crashing, too.
I recommend the Mandelbrot program provided on the NC-R Tools disk. This program scans all slots for a FPE/NC-R by itself.

Q: What are these LEDs for?

  • The green BUSY LED blinks at every access to the FPU.
  • The yellow INFO LED doesn’t have a proper job yet. Currently it’s connected to DEVSEL, so you can see it blink very briefly, when your Apple II scans its bus.
  • The red ERROR LED will be lit when the FPU encounters a so-called ‘protocol violation’, i.e. there’s some problem in the communication between the Apple and the 68881/2.
    See page 30 in the manual for more details.

Q: Can I make the NC-R go faster? What about overclocking?
A: Not really. See the ‘Benchmark‘ section further down.

Q: On the pictures of the NC-R I can identify a 40MHz Motorola 68882. Do all NC-R have such fast FPU?
A: No. I use whatever 68882 is available on the market. Strangely enough, sometimes a 40MHz version is cheaper than say a 16MHz.
So whichever version is installed in your NC-R, it’ll be fast enough and always clocked at 16MHz anyhow.

Q: How can I write programs using the NC-R?
A: This is an extensive question which can’t be answered satisfyingly in an FAQ. Refer to chapter 3 of the manual to learn how the NC-R works internally and how to program it in assembler, C or even Basic.
But I’m also thinking about a dedicated post about just that matter.

Q: I wrote a small program to test the NC-R and it’s not really faster than using SANE on my IIgs
A: There’s a certain amount of calculations need to be done until the NC-R shows its performance. A single addition probably takes longer than it would need on e.g. a stock Transwarp GS because all the communication overhead.
This dramatically changes if you have lots of floating-point calculations in one stream, optimally using the 68881/2 internal registers.

Q: Are you writing software for the NC-R
A: Well, maybe in the future but currently I’m busy with other projects.
But so much can be revealed: Brutal Deluxe has its NC-R already 😲

Q: How can I ask you a question / get help / praise / complain / rant about the NC-R?
A: I reactivated my little Forum on this page. Therefore it got its own Apple II board.
This way everybody can participate in your question or complaint… speaking of complains: Do not complain that you have to register and that it took so long for the approval! This is a one-man show, there are time-zones and despite other rumors, I do have a job 😀

Q: Why did you chose green for the PCBs colo(u)r?
A: It’s a remake. So  it should at least somewhat resemble the original look. Also given there are translucent lid/case options available today I personally don’t like the idea of a motley bunch of green/blue/red/white cards standing out of my Apple II… yeah, I’m old-school 👨‍🦳

Benchmark

Still, we’re all hackers of some sort, so you might ask yourself “What’s the fastest option?” or “Can I make it even faster?”.
Well, while the 68882 is up to 30% faster than the 68881 due to its capability to execute commands in parallel this mostly doesn’t matter in the case of the NumberCruncher Reloaded.

As mentioned in the feature-list, thanks to protocol optimizations the NumberCruncher Reloaded is already a tiny bit (~4%) faster than the original at the same clock-speed, but more important is that it “saturates” later.
This means while with the FPE/NC you don’t see any improvements beyond 12MHz the NumberCruncher Reloaded still benefits from a faster clock up to 16MHz.

From there it does not make much sense as the Apple II bus, clocked at 1MHz, is the limiting factor here.

Using the ‘Zoombaya Mandelbrot’ program included on the Tools floppy disk on a TransWarp GS @ 10MHz gives these calculation times:

MHz Original NC NC-R 68882 NC-R 68881 Faster than NC
11 3:59 3:55 3:55 ~1.5%
12 3:55 3:45 3:45 ~4%
16 3:55 3:12 3:12 ~20%
20 N/A 3:12 N/A N/A

The speed of your Apple computer very much plays into the equation, too. It handles all the writing/reading and of course the endian conversion.
Here’s an NC-R clocked at the default 16MHz installed in an Apple IIgs running the same Mandelbrot at different clocking:

‘normal’  1MHz ‘fast’ 2.8MHz TWGS @ 10MHz TWGS @ 12 MHz
8:35 4:39 3:12 3:10

The TransWarp GS seems to saturate pretty early. Most likely because it’s running the code completely out of its cache.

Purchase

Of course you want one now 😉 And as long you can read this text, I have some available from the first batch of 35.
[currently available/non-reserved: 13 cards]
.

NB: I will be on vacation until August 6th.
Thus orders will be handled after my return.
Ahhh, look at those beauties!

The NumberCruncher Reloaded comes in a nice, eco-friendly box with a high quality, glossy spiral-bound manual and an 800k ProDos floppy.

You can choose from 3 different configurations:

Config 1 “Plug’n’Play”:
NC-R with just the PLCC socket, a 68882 FPU is already installed. Plug the card into your Apple and you’re ready to go:  €88.82

Config 2A “I have a PLCC FPU”:
NC-R with just the PLCC socket – you provide and install a PLCC 68881 or 68882 yourself: €80.82

Config 2B “I have a PGA FPU”:
NC-R with PLCC plus a Pin-Grid-Array (PGA) socket – you provide and install a PLCC or PGA 68881 or 68882 yourself €84.82
(There’s no PGA-only option because the PLCC slot is used for the  final function-test)

Shipping

It’s 2021 and the world is  suffering from the COVID19 pandemic. Shipping services are still providing very limited services.
I am shipping from Germany as trackable package only. Also I chose the box dimensions that way, that it snugly fits into DHLs shipping categories. In that case, I will wrap the original box in strong wrapping paper. It weights 250 grams (that’s 0.55lbs or 8.8oz or 0.039 stone – no idea what’s that converted to Ningi 😉 )
Alternatively I can ship the box in an extra box for better protection against “Crazy delivery men”. That will increase the size and thus the shipping price.

Shipping into the European Union is relatively affordable with 9€ – if you want a “box around the box” it’ll be 14€
(Just in case, for Germany it’ll be 6€ as “Paket”)

UK and Switzerland will be 13€ already. Box-in-box will be 18€ then.

 The US  can choose between 13€ (or 20€ for box-in-box) and risking a wait of 60+(!) days or a hefty 53€ which will be priority shipping.
Australia is even worse. 13€ for the wrapped box, 20€ for box-in-box or insane 68€ for priority shipping.

That said, the priority shipping will be fine for 5kg (11lbs, i.e. good for ~15 NC-Rs plus padding), so if you can organize a group-buy, I’m more than happy to support you.

Ordering

Because of bad experiences in the past, this is pre-paid only.
Again: This is a hobby project made for fun & the community, I’m not a business.
I accept PayPal (for friends) and bank-transfer.
To order, send a mail to ncr@geekdot.com stating

  • your Address
  • the NC-R version(s) you like
  • the preferred shipping method
    • wrapped
    • box-in-box standard
    • box-in-box priority

I will then send you all the details for paying & shipping. As soon I received the money, I will ship and send you the tracking information.
If you want to arrange a group-buy I can reserve a certain amount of NC-Rs for a max of 5 days. After that they will be sold on-demand.

The Number Cruncher – FPU for the Apple II

This is a post I’d like to write since 2006… so 15 years after I’ve put all the Transputer, MIPS, i860 and-what-not stuff aside and made some room for my other (late) love: The Apple IIgs

You might have stumbled about my post/project connecting a Transputer to the Apple II called the T2A2… well, what I did not mention was, that the inspiration to this came from another card built in 1988, called the FPE made by Innovative Systems. That’s a small card featuring just a buffer, an old XILINX FPGA (actually the first of its kind) and the Motorola 68881 floating point co-processor.

Co-pro-ces-sor! I was hooked!  😯

After more research I learned that the FPE was actually a diva like this newsgroup post says:

“The FPE is suffering from a major problem, namely the coproc is crashing internally and has to be reset in software. This happens in a non deterministic way, and software written for that engineering junk must be adapted to that. “

But a bit later there was something better available: The Number Cruncher (NC for short). Fine German engineering  😉 And the newsgroup post was quite nice to it:
“The Number Cruncher is compatible with the FPE but is actually what the FPE was supposed to be – a math coproc that works. It perfoms very well.”

This is the “marketing blurb” from back then

  • totally compatible with the FPE from Innovative Systems
  • much less sensible to heat, voltage problems etc.
  • supports the FPE SANE patch for speeding up any program that does floating point calculations
  • you can compile ORCA/Pascal and C programs to use it directly with the special floatlib for the FPE provided by The Byte Works
  • works in any slot (slot 3 & 4 without need for setting it to “Your Card”)
  • works with TransWarp GS and Zip GS accelerators and RamFAST SCSI card
  • comes with lots of mathematical software and an enhanced SANE patch by Albert Chin-A-Young

I directly contacted the creators of the NC, Dirk and Andreas, but both of them hadn’t had a tiny bit of the NC left. No schematic, no HDL, no nothing 🙁

During looooong and wild eBay raids I managed to get my hands onto an FPE as well as on an NumberCruncher. Woohoo! They both work with the same driver and yes, the NC is much more robust than the FPE.
But it seems that there are only a handful of them survived, if anymore at all. So I thought it might be interesting for me and others to make a re-release…

Let’s re-animate the Number Cruncher!

So, again, it’s up to me to save the world… reverse-engineering time again!
This job consists of two parts – dissect the hardware and to get a better understanding of this, the software/driver, too.

Hardware

Well, you could simply rebuild the card, copy the bitstream from the serial PROM for the FPGA and you’re done.
But it’s not so easy, as the FPGA which has been used, the XILINX XC2064 is long time EOL’ed and if you buy old stock, they’re more expensive than a comparably recent CPLD – if they’re not Chinese fakes and/or broken… not mentioning that a simple copy-paste job is not really manly 😉

The XC2064 is a 5V FPGA (the first of its kind actually), has 600-1000 logical gates and 58 user IO pins of which 32 are used. It is getting its bitstream fed by an external 12K serial PROM. I’ve pulled the bitstream in a file available here, but because XILINX never documented that format, it’s pretty much useless.

The rest is all standard stuff. A ‘245 transceiver, an oscillator (~12MHz)  and some chicken feed.

Revving it up

To prove that I got everything right and having a better attack analysis vector I designed the “NumberCruncher Reloaded v0.1“… more or less a 1:1 clone but bringing out all FPGA signals to pin-headers to have a convenient access for my logic analyzer.
Also I used a PLCC version of the 68881, because I have more of those.

Then, out went the logic analyzer and weeks and weeks of looking/listening to the the FPGAs conversation to the the FPU and Apple bus while having the 68881 manual on my lap, I got an idea of what’s going on.

Software

Later I found the original floppy which came with the NumberCruncher containing  just the init-files for GS/OS patching SANE to use the FPU instead of the 65c816  (download the ShrinkIt archive here) – well, at least something!

Next came my ol’ buddy Mr. disassembler… a lot. About a year on-and-off… and here’s my initial, slightly commented disassembly of the SANE patch.

Again, some years down the road I found the floppies which were delivered with the FPE and those are quite helpful with code examples, assembler macros and libs etc. – that would have been quite helpful during the disassembly  😕 Anyhow, it’s good to see that most of my interpretations were correct.
Also, the FPE came with quite a nice manual – which actually was more valuable than the card 😉
It very well describes the basic functionality, the FPU registers used and the programming side of things – even the most important parts from the 68881 manual are cited. Very good job on that Innovative Systems!

Having all this at my hands I was prepared to start and after some huffing and puffing the NumberCruncher Reloaded v.1.0 was born.

Weitek Abacus FPU

This is the first post on GeekDot about a single IC. While I did a lot of writing about ICs, mainly CPUs and FPUs, back in the BBS days (German) I stopped doing so after that and concentrated on collecting them during the following years – a true love never dies 😉
The Weitek Abacus FPUs were always special, exotic and unaffordable for the most of us. Today they additionally got a touch of a mythic being, especially as 386/486 boards featuring the special  WEITEK socket are dying out fast. Not mentioning the Abacus FPU itself.

Recently I had a mail conversation about the Abacuses and because they’re are memory mapped, I thought they might actually fit quite good into the range of my other accelerator post as the DSM860 or all those Transputer cards.

I wasn’t planning to go into all the details about the Abacus models 3167 and 4167 and went looking for a handy Wikipedia link.
To my surprise that article is pretty general and just briefly touches all product ever made by WEITEK.
So, the best source of technical information about these FPUs is the highly recommendable posting “copro16a.txt“, written by Norbert Juffa in 1994.
I’d call it the most comprehensive write-up about FPUs until that date. Because you’ll never know what happens to external links, here’s the (shortened) part about the Weitek chips:

The architecture of the Weitek chips differs significantly from the 80x87.
Strictly speaking, the Weitek Abacus 3167 and 4167 are not coprocessors in that they do not transparently extend the CPU architecture; rather, they could be described as highly-specialized, memory-mapped IO devices. But as the term "coprocessor" has been traditionally used for these chips, they will
be referred to as such here.

The Weitek coprocessors have a RISC-like architecture which has been tuned for maximum performance. Only a small instruction set has been implemented in the chip, but each instruction executes at a very high speed (usually only a few clock cycles each). [...] 
In contrast to the 80x87 family, the Weitek Abacus does not support a double extended format, has no built-in transcendental functions, and does not support denormals. The resources required to implement such features have instead been devoted to implement the basic arithmetic operations as fast as possible.

While the 80x87 coprocessors perform all internal calculations in double extended precision and therefore have about the same performance for single and double-precision calculations, the Weitek features explicit single and double-precision operations. For applications that require only single-precision operations, the Weitek can therefore provide very high performance, as single-precision operations are about twice as fast as their double-precision counterparts. Also, since the Weitek Abacus has more registers than the 80x87 coprocessors (31 versus 8), values can be kept in registers more often and have to be loaded from memory less frequently. This also leads to performance gains.
[...]

To the main CPU, the Weitek Abacus appears as a 64 KB block of memory starting at physical address 0C0000000h. Each address in this range corresponds to a coprocessor instruction. Accessing a specified memory location within this block with a MOV instruction causes the corresponding Weitek instruction to be executed. (The instructions have been cleverly assigned to memory locations in such a way that loads to consecutive coprocessor registers can make use of the 386/486 MOVS string instruction.)
This memory-mapped interface is much faster than the IO-oriented protocol that is used to couple the CPU to an 80287 or 80387 coprocessor. The Weitek's memory block can actually be assigned to any logical address using the MMU (memory management unit) in the 386/486's protected and virtual modes. This also means that the Weitek Abacus *cannot* be used in the real mode of those processors, since their physical starting address (0C0000000h) is not within the 1 MByte address range and the MMU is inoperable in real mode. However, DOS programs can make use of the Weitek by using a DOS extender or a memory manager (such as QEMM or EMM386) that runs in protected/virtual mode itself and can therefore map the Weitek's memory block to any desired location in the 1 MByte address range.

Typically the FS segment register is then set up to point to the Weitek's memory block. On the 80486, this technique has severe drawbacks, as using the FS: prefix takes an additional clock cycle, thereby nearly halving the performance of the 4167. Most DOS-based compilers exhibit this problem, so the only way around it is to code in assembly language.

Ok, so we have a good idea how the Abacuses work and how they can be used. … but what’s the story behind the company?

This is a ‘web found’ © by antiquetech.com
“Founded by Chi-Shin Wang, Edmund Sun, and Godfrey Fong (President and CEO) in 1981 in San Jose. All founders immigrated from China.
Weitek specialized in high-performance digital semiconductor components and systems for the computer and workstation industries. Weitek floating point units have been used with Inmos Transputers (Floating Point System T-series Hypercube, 1986), National Semiconductor NS32032’s (Encore Multimax, 1986), and Intel 386’s (1988). The 2048 was used in the Thinking Machines Corporations CM-2 Connection Machine. Weitek produced floating point processors for HP. HP allowed to Weitek to use it’s facilities to make chip for themselves and for their competition.”

When dramatically loosing ground to the 486DX2 Weitek moved away from the x86 platform and concentrated on SPARC CPU and FPUs as well as MIPS FPUs. That worked quite well for some time and in the 90s they finally moved into the frame buffer and graphics accelerator business. Their POWER P9000/P9100 models were quite successful but lost when players like S3 and ATI started to flex muscles.

Specs

So from our (PC-compatible) perspective there are two models of interest: The 3167 and the 4167. Basically they only differ in the bus protocol to either the Intel 386 or 486.
If you need more detailed data, I make the original specs available here for the 3167 and 4167 as PDF.
Both Abacuses clocked up to 33MHz. Especially the 4167-33MHz is very hard to find these days. That said, it might be possible to overclock a 25MHz version to 33MHz providing sufficient cooling. I found this press snippet where WEITEK proactively promoted overclocking a 33MHz to 50MHz by using a peltier element made by ICECAP – so when overclocking that by 51%, the 32% from 25 to 33MHz should be an issue:

WEITEK and ICECAP announce 33MHz version of Abacus 4167

Ram Ganapathi Mar 30 1992, 02:57 pm
WEITEK and ICECAP announce 33MHz version of Abacus 4167

Weitek Corporation and ICECAP Technologies announced today that users of 50MHz Intel 80486-based personal computers can realize a 50% performance increase for numeric-intensive applications by combining the 33 MHz version of Weitek’s Abacus 4167 math coprocessor with the ICECAP thermal management device. Superior system performance is achieved without degrading system reliability.

Using it

Applications that took advantage of the Weitek Abacus were scarce. AutoShade, Autodesk Renderman, 3-D Studio were the most prominent to use a Weitek coprocessor.
If you happen to own one, you might actually use or at least test it… so here’s the official test-suite from Weitek. It contains these tools

  • DOS TSR to update the BIOS for Abacus support (if missing)
  • A test tool wich checks for an Abacus presence using INT 11h BIOS calls.
  • A diagnose tool
  • 2 demos: A side-by-side Mandelbrot benchmark (yay!) and rendering a phong shaded beach ball
  • Abacus macros for using it natively in your cool assembly code

When I find the time to pull out my Weitek-PC, I’ll post pictures of the demos… feel free to comment if you feel I’ve been missing out something.