Friday, May 30, 2008

A Tale of Two Developments, Microsoft versus Open Source

Long ago I wrote that I would relate my experience converting a fledgling system from Windows/IIS/SQL Server/.NET to an open source stack, but I never did. I've told this story so often that I feel as though I've already told it here, but I haven't, so here it is.

Back in the 1990s I created an application that banks used to check their customers' names against a special list of terrorists, narco-traffickers and sanctioned governments kept by the Treasury Department. This list, the list of Specially Designated Nationals (SDNs), is one of the weapons the Office of Foreign Assets Control (OFAC) has at its disposal for waging economic warfare, both offensive and defensive.

Briefly, the government updates the list whenever necessary and publishes it as text files on a website. The files contain related lists of information, such as addresses and aliases. Critically, the names have been translated from other languages and transliterated from other alphabets, so a customer's name may be spelled differently from the matching list entry, and searching presents some challenges.
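To make the transliteration problem concrete, here is a minimal Python sketch of one common approach: fold names to a plain form, then compare them approximately. It is an illustration under assumptions, not the application's actual method. The function names, the 0.85 threshold and the sample names are hypothetical, and `difflib.SequenceMatcher` stands in for whatever matching the real system did.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize_name(name: str) -> str:
    """Fold a name to a comparable form: strip accents, casefold,
    and collapse punctuation and extra whitespace."""
    # Decompose accented characters (e.g. "é" -> "e" + combining accent),
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace anything that is not a letter or digit with a space.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in stripped.casefold())
    # Collapse runs of whitespace to single spaces.
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two names after normalization."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def screen(customer: str, list_names: list[str],
           threshold: float = 0.85) -> list[tuple[str, float]]:
    """Return list entries whose similarity to the customer name meets
    the (hypothetical) threshold, best match first."""
    scored = [(entry, similarity(customer, entry)) for entry in list_names]
    hits = [(entry, score) for entry, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

With this folding, `"José Álvarez-Núñez"` and `"JOSE ALVAREZ NUNEZ"` normalize to the same string and score 1.0. A real screening system would need much more than this, such as alias tables and handling of reordered name parts.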

Early on, this application was a simple desktop application located in a bank's compliance office. It was written in VB with an Access database. A commercial FTP tool was included to handle the transfer, and VB code downloaded the text files, parsed them, created the database codes, handled the transliteration issues, pumped the data into the data tables, cleaned up and presented a user interface. Access provided reports (crude by their nature), user authentication and logging.
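The post doesn't say what the "database codes" were. One well-known scheme for matching names that may be spelled several ways is American Soundex, which gives similar-sounding names the same four-character key. The sketch below assumes Soundex purely for illustration; the application may have used something else entirely. It only codes ASCII letters, so accented input should be folded to ASCII first.

```python
# American Soundex digit groups; vowels, y, h and w get no digit.
SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(name: str) -> str:
    """Return the four-character American Soundex key for a name,
    or an empty string if the name has no ASCII letters."""
    letters = [ch for ch in name.lower() if ch.isascii() and ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = [first.upper()]
    # The first letter's own code still suppresses an identical code after it.
    prev = SOUNDEX_CODES.get(first, "")
    for ch in letters[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        if ch not in "hw":
            # Vowels reset prev to "", so equal codes separated by a vowel
            # are both kept; h and w are transparent and leave prev alone.
            prev = code
        if len(result) == 4:
            break
    # Pad short keys with zeros, e.g. "Lee" -> "L000".
    return "".join(result).ljust(4, "0")
```

`soundex("Robert")` and `soundex("Rupert")` both yield `R163`, so a lookup on the code finds either spelling.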

This solution was put together pretty quickly but left much to be desired. Access was notoriously easy to corrupt, difficult to secure and hard to transport. If you've ever programmed text manipulation in VBA or VB, you know what a painful and resource-intensive task that can be. As a binary hairball requiring a purchased license, this solution could not necessarily be shared freely. Also, it required a particular environment: Windows and Office.

To extend the reach of the application, we decided to make it web-based. Naturally, knowing no better, we chose an all-Microsoft software stack: Windows, IIS, SQL Server and, brand new at the time, .NET driving glittering ASP-delivered content to the obviously superior IE browser. Further, just to demonstrate how great the solution was, we would provide the data interactions through SOAP calls for interoperability.

Microsoft's .NET was quite amorphous at the time, more a strange brand than a development platform. As a development platform it was interesting, though new to me and the folks in my shop, and we were too busy with other things to take on the experiment of rebuilding a solution on a new platform. Instead, I enlisted some colleagues who were interested in seeing how it would all work out. These fellow Microsofties had written with me and spoken at conferences alongside me, and they already had a couple of years of .NET under their belts as beta testers, early adopters and evangelists for the platform.

What could go wrong?

That all depends upon your standards and expectations.

In keeping with Microsoft's recommendation at the time that every significant service should run on a separate physical machine, the new web-enabled prototype straddled three different servers. One hosted the database, one ran the web server and another handled the FTP monitoring, downloading, parsing and logging. That's three boxes taking up rack space, drinking electricity and demanding administration, repairs, patches and licenses. If I remember correctly, IIS wasn't free back then; you had to buy it or buy a special version of Windows Server to get it included, and SQL Server certainly wasn't free. Moving from a desktop platform to a server platform still didn't provide adequate, programmable FTP capabilities, so we had to purchase that too. Did I mention the cost of Visual Studio .NET? Adding to the brittleness of the solution, there was no useful built-in logging capability and the Windows scheduler was known to be broken, so we had to get a third-party application to schedule events like checking the government's FTP site for updates. What kind of server doesn't have a functioning scheduling ability? I think Windows still lacks an independent, reliable clock.

Following Microsoft's standards, and in light of the considerable hype from Microsoft about its offerings and .NET, my expectations were that this would be a robust application providing gazelle-like performance and the kind of reliability that would let me sleep through the night. Based upon the cost of this solution and the obvious horsepower being brought to bear on it, I thought these were reasonable expectations.

I was wrong.

Microsoft's suggestion to run only one significant service per server wasn't intended to improve stability or performance; it was intended to sell licenses. They don't really care about stability. Each of the three servers was as likely to fail as any of the others, just at different times. Having three boxes running Windows in production meant constant crashes, hang-ups, compromises, memory bust-ups, disk corruptions and reboots. Just about every problem that came up with this infrastructure required a hands-on fix, because remote administration of Windows was practically unheard of at the time. Instead of sleeping through the night, it was as if I had incontinent triplets in cribs in the next room.

This was obviously the price to pay for performance, though, right? After all, Formula One race cars are fussy, but you wouldn't race the Grand Prix in anything else, would you? I wouldn't have minded a little grease under my fingernails if I'd had performance I could be proud of, but I didn't. Windows and .NET handled strings so inefficiently, made so many unnecessary interprocess calls within and across the servers, and gobbled up so many cycles and so much memory that the performance was abominable. A simple search to see whether a customer's name appeared on the SDN list took 7 seconds round trip.

Read this slowly:

1 Mississippi

2 Mississippi

3 Mississippi

4 Mississippi

5 Mississippi

6 Mississippi

7 Mississippi

A human being flipping through an alphabetized list of the SDNs could almost match this performance. The solution's only efficiencies were in maintaining the list, keeping a record of all the searches and ensuring that the first-aid kit didn't run out of band-aids for paper cuts. All these benefits came from non-Windows software; natively, Windows provided none of these capabilities. Other than that, the system was a worthless and expensive failure.

I left the project behind and moved on to other things, among them the development of a new product, Agelis, and the search for a different default database to use with it. As you may recall from my earlier posts, that search introduced me to open source software and defenestrated me. As my search through Unix-based, open source software carried me away, I was getting resistance from some of my staff to my new view of the world. They wanted to see something concrete: proof that open source software could actually deliver. They were still picking crow out of their teeth after following me blindly into the morass of .NET, and their position was that this newfangled, web-based SOAP stuff was just a fad and essentially untenable.

What they wanted me to prove was that an open source operating system, database, programming languages, web server, and development and administrative tools would work better than the Microsoft stack they knew.

Since I've already discussed FreeBSD and PostgreSQL, you can see that the operating system and database are covered. Naturally, FTP comes built in on an operating system designed to work with other machines, and Apache is the standard web server everywhere, not just on Unix. Also, unlike Windows, FreeBSD comes loaded with language interpreters and compilers, so Perl, PHP, Python, C/C++ and Java, among others, are readily available.

One of the real performance bottlenecks in the Microsoft stack was the SOAP service; .NET was simply not up to the task. Creating a SOAP service to provide XML output from a database query should be a simple thing. Unix handles text as a matter of course and has been optimized over the decades to do so, and SOAP and XML are all about text. The only task before me was to find a really fast SOAP server. There were several choices; in the end I chose the gSOAP library and toolkit.
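For a sense of how little a SOAP response for a name search actually involves, here is a minimal Python sketch that runs a query and wraps the rows in a SOAP 1.1 envelope. It is not gSOAP and not the product's service. The `sdn` table, its `name`, `program` and `name_code` columns, and the `SearchResponse`/`Match` element names are hypothetical, and SQLite stands in for PostgreSQL.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace; serialize it with a "soap" prefix.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("soap", SOAP_ENV)


def search_response(conn: sqlite3.Connection, code: str) -> bytes:
    """Look up list entries by name code and return a SOAP envelope
    (UTF-8 bytes) with one Match element per row.

    Table, column and element names are hypothetical."""
    rows = conn.execute(
        "SELECT name, program FROM sdn WHERE name_code = ? ORDER BY name",
        (code,),
    ).fetchall()
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    response = ET.SubElement(body, "SearchResponse")
    for name, program in rows:
        match = ET.SubElement(response, "Match")
        ET.SubElement(match, "Name").text = name
        ET.SubElement(match, "Program").text = program
    # xml_declaration requires Python 3.8 or later.
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)
```

Calling `search_response(conn, "D250")` against a table containing one row coded `D250` returns a document whose `soap:Body` holds a single `Match` element. A real service would also need a WSDL, request parsing and fault handling.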

The new stack was FreeBSD, PostgreSQL, Apache, Perl, JavaScript and a SOAP server built with gSOAP. FreeBSD's own logging facility handled logging, and cron took care of all the scheduling. Since each of these tools was so lightweight, I ran them all on the same server for the prototype, with processing power and memory to spare. There were no memory leaks, no race conditions, no memory-gobbling, oh-my-gd-we're-going-to-have-to-reboot-again moments. Ever. Everything that happened on the system happened because we told it to, and there were no mysterious processes running and absolutely no black magic.

Everything about this solution was better than the Microsoft version. First, it flew. The Microsoft version took 7 seconds to accept a search criterion, transform that criterion into a code, search the database, create the outgoing XML, deliver it to the web server and have the page rendered and sent to the browser. And it took three machines running at the same time to make that happen.

The open source implementation, running on a single machine, did all of this using a small fraction of the system resources and returned results almost too quickly to measure. In production, we have been able to conduct millions of searches a minute without raising any concerns on the main server. I can administer every aspect of this solution from an SSH client on my phone.

And the software cost absolutely nothing. Zero. All the money I saved by not paying Microsoft I was able to sink into hardware, hosting and marketing instead.

What started out as an exercise to demonstrate the viability of a particular software stack became a profitable product. The product was always a compelling one, but the platform it needed changed. In an ever more competitive software environment, it's important that our raw material be as cheap as possible and of high quality, too. Open source provides both. I've seen first hand how a product idea can be killed by Microsoft's lousy technology, and also by its lousy, expensive business model, which charges for something when a better alternative is available for less, or for nothing.