So here are my thoughts about ParStream based on researching their product on the internet only. I have not used the product, so I am simply assuming it lives up to all claims. As an analytics user and a BI-DW practitioner I sincerely hope that ParStream succeeds.
I'm a GPU believer
I'm a long-time believer in the importance of utilising GPUs for challenging database problems. I wrote a post in July 2009 about using GPUs for databases and implored database vendors to move in that direction: "Why GPUs matter for DW/BI" (http://joeharris76.blogspot.com/2009/07/why-gpus-matter-for-dwbi.html). Here's the key quote - "There's a new world coming. It has a lot of cores. It will require new approaches. That world is accessible today through GPUs. Database vendors who move in this direction now will gain market share and momentum. Those who think they can wait on Intel and 'traditional' CPUs to 'catch up' may live to regret it."
On the right track
I think ParStream is *fundamentally* on the right track with a GPU-accelerated analytic database. The ParStream presentation from Mike Hummel (http://www.youtube.com/watch?v=knicXkXd9hQ) talks about a query that took 12 minutes on Oracle taking just a few *milliseconds* on ParStream. If that is even half right, the potential to shake up the industry and radically raise the bar on database performance is very exciting.
Reminiscent of Netezza
I remember the first time I used Netezza back in 2004. I had just taken a new role and my new company had recently installed a first-generation Netezza appliance. In my previous job we had an Oracle data warehouse that was updated *weekly* and contained roughly 100 million rows. Queries commonly took *hours* to return. The Netezza machine held just under 1 *billion* rows. I ran the following query: "SELECT month, COUNT(*), SUM(call_value) FROM cdr GROUP BY month;". It came back in 15 seconds! I was blown away.
A fast database changes the game
When you have a very fast analytic database it totally changes the game. You can ask more questions, ask more complex questions and ask them more often. Analytics requires a lot of trial and error, and removing the time spent waiting on the database opens up a new spectrum of possibilities. For example, Netezza enabled me to reprice _every_ call in our database against _every_ one of our competitors' tariffs (i.e. an 'explosive' operation: 50 mil records in => 800 mil records out) and then calculate the best *possible* price for each customer on any tariff. I used that information to benchmark my company on "value for money" and to understand the hidden drivers for customer churn.
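To make the 'explosive' repricing concrete, here is a minimal Python sketch of the idea: price every call on every tariff (a cross join), total the cost per customer per tariff, then keep the cheapest tariff for each customer. It is an illustration only, not the SQL I actually ran on Netezza. The field names and the tariff structure (a connection fee plus a per-minute rate, in whole cents) are assumptions made up for the example.

```python
from itertools import product


def reprice(calls, tariffs):
    """Price every call on every tariff: len(calls) * len(tariffs) rows out."""
    return [
        {
            "customer": call["customer"],
            "tariff": tariff["name"],
            # Hypothetical pricing rule: connection fee + per-minute rate (cents).
            "cost": tariff["connect_fee"] + tariff["per_minute"] * call["minutes"],
        }
        for call, tariff in product(calls, tariffs)
    ]


def best_tariff_per_customer(priced):
    """Total each customer's cost on each tariff and keep the cheapest one."""
    totals = {}
    for row in priced:
        key = (row["customer"], row["tariff"])
        totals[key] = totals.get(key, 0) + row["cost"]

    best = {}
    for (customer, tariff), total in totals.items():
        if customer not in best or total < best[customer][1]:
            best[customer] = (tariff, total)
    return best


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    calls = [
        {"customer": "alice", "minutes": 10},
        {"customer": "alice", "minutes": 5},
        {"customer": "bob", "minutes": 1},
    ]
    tariffs = [
        {"name": "A", "connect_fee": 0, "per_minute": 10},
        {"name": "B", "connect_fee": 20, "per_minute": 5},
    ]
    priced = reprice(calls, tariffs)
    print(len(priced), "priced rows from", len(calls), "calls")
    print(best_tariff_per_customer(priced))
```

The output row count is simply calls multiplied by tariffs, which is why the intermediate result balloons so quickly. On a slow database that intermediate step alone is enough to make the whole analysis impractical.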
ParStream appliance strategy:
So, given that background, let's look at the positioning of ParStream, the potential problems they may face, and the opportunities they need to pursue.
ParStream is not Netezza
I've positively compared ParStream to Netezza above so you might expect me to applaud ParStream for offering an appliance. Sadly not; Netezza's appliance success was due to unique factors that ParStream cannot replicate. Netezza had to use custom hardware because they use a custom FPGA chip. Customers were (and are) nervous about investing heavily in such hardware; however, Netezza goes to great lengths to reassure them, providing service guarantees, plenty of spare parts and using commodity components wherever possible (power supplies, disks, host server, etc.). We must also remember that most customers looking at Netezza were using very large servers (or server clusters) and required *very many* disks to get reasonable I/O performance for their databases. Netezza was actually reducing complexity for those customers.
The world has changed going into 2011
ParStream cannot replicate those market conditions. The world has changed considerably going into 2011 and different factors need to be emphasised. ParStream relies on Nvidia GPUs that are widely available and installed on commodity interconnects (e.g. PCIe). Moreover, there are high quality server offerings available in 2 form factors that make the appliance strategy more of a liability than an asset. First, Nvidia (and others) sell 1U rack mounted 'servers' that contain 4 GPUs and connect to a 'host' server via a PCIe card. Second, Supermicro (and others) sell 4U 'super' servers that contain 2 Intel Xeons and 4 GPUs in a pre-integrated package. The ParStream appliance may well be superior to these offerings in some key way; however, such advantages will be quickly wiped out as the server manufacturers continuously refresh their product lines.
Focus on the database software business
ParStream should focus on the database software business, where they have a huge advantage, not the server business, where they have huge disadvantages. You should read this article if you have any further doubts: "The Power of Commodity Hardware" (http://www.svadventure.com/svadventure/2009/01/the-power-of-commodity-hardware.html). Key quotes: "Customers love commodity hardware.", "Competing with HP, IBM, and Dell is dumb.", "Commodity hardware is much more capital efficient". Also consider the fates of Kickfire and Dataupia, which foundered on a database appliance strategy, and ParAccel, which is going strong after initially offering an appliance and quickly moving to emphasise software-only.
Position GPUs as a new commodity
ParStream must position GPUs and GPU acceleration as a new commodity. Explain that GPUs are an essential part of all serious supercomputers and the technology is being embraced by everyone; Intel with Larrabee, AMD with Fusion, etc. Emphasise the option to add 'commodity' 4-GPU 'pizza box' servers alongside a customer's existing Xeon/Opteron servers and, using ParStream, make huge performance gains. Talk to Dell customers about using a single Dell PowerEdge C410x GPU chassis (http://www.dell.com/us/en/enterprise/servers/poweredge-c410x/pd.aspx) to accelerate an entire rack of "standard" servers running ParStream. The message must be clear: ParStream runs on commodity hardware; you may not have purchased GPU hardware before but you can get exactly what ParStream needs from your preferred vendor.
One final point here: ParStream needs to make Windows support a priority. This is probably not going to be fun, technically speaking, but Windows support will be important for the markets that ParStream should target (which will have to be another post, sadly).
UPDATE - I followed this post up with:
An overview of the analytic database market, a simple segmentation of the main analytic database vendors, and a summary of the key opportunities I see in the analytic database market (esp. for ParStream and RainStor).