Clustrix devises new file data management system to better support highly transactional web sites

"web companies shouldn't have to spend time on the infrastructure, they should spend all their time trying to make their site better!" Paul Mikesell, CEO and co-founder of Clustrix

a surfeit of data requires that new tools be invented

Data overflow has never been such a mind-boggling problem. New social media websites with hordes of users and millions of connections, as well as e-commerce behemoths with vast amounts of product, user and user-preference data, are causing database sizes to reach unprecedented levels. The result is overwhelming: scaling issues are staggering, incremental online expansion is daunting, the impact on fault tolerance and availability is dire, and that is without even mentioning TCO (total cost of ownership) or ease of data management. Let's take an example with a direct impact on users: if you wanted to launch a query on Facebook across all of your friends' friends, you wouldn't be able to do it because the data is partitioned across many databases. These major pain points in the market were the starting point for the creation of Clustrix, a new venture co-founded by Sergei Tsarev and Paul Mikesell. On June 4, at the end of our trip to Silicon Valley, we were greeted by Mikesell (left in our photo) and Daniel Liddle (right) at the Clustrix office, in the heart of San Francisco.

Clustrix has been in existence for three and a half years and has spent much of that time building its technology; it is funded by Sequoia Capital. The company grew out of the scalability issues Mikesell saw at his previous start-up, Isilon. The dot-com boom of the late 1990s put pressure on application servers and increased the need for storage, hence the building of massively performant database systems. This is where the need to create Clustrix arose.

doing away with 'sharding' makes TCO 14 times lower

Clustrix has built a system which allows you to scale from 1 node (or server) to hundreds of nodes (or servers) within one single database. The system is delivered and sold as a combined hardware and software appliance. So far, "sharding" has been the only way to solve data management scalability issues, but Clustrix now offers an alternative. "Sharding" is the application-level partitioning of data across isolated databases, whereas Clustrix offers a single-instance scalable database. "Sharding is not just costly, it is also very risky," Mikesell points out. Going back to our Facebook example (see above), Clustrix makes that kind of query possible because the data is kept in a single database. To quote Paul Mikesell, Clustrix makes it possible for you to "move the query to the data, not the data to the query, and Clustrix Sierra has taken this simple concept and driven it to the logical conclusion with great benefit". The impact on TCO is also huge, and the Clustrix website lets you check this directly online.
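
To make the difference concrete, here is a minimal sketch in Python of a "friends of friends" lookup, first against data sharded across isolated databases and then against a single database. It is only an illustration: the `friends` table, the sample data, the modulo-based `shard_for` rule and the use of in-memory SQLite are all assumptions made for this example and say nothing about how Clustrix or Facebook actually store or query data.

```python
import sqlite3

# Hypothetical sample data: (user_id, friend_id) pairs.
FRIENDSHIPS = [(1, 2), (1, 3), (2, 4), (3, 5), (3, 4)]
NUM_SHARDS = 2  # assumed number of isolated databases


def make_db(rows):
    """Create an in-memory database holding the given friendship rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE friends (user_id INTEGER, friend_id INTEGER)")
    db.executemany("INSERT INTO friends VALUES (?, ?)", rows)
    return db


def shard_for(user_id):
    """Assumed partitioning rule: a user's rows live on shard user_id % N."""
    return user_id % NUM_SHARDS


def friends_of_friends_sharded(shards, user_id):
    """The application must query several shards and merge the results itself."""
    direct = [row[0] for row in shards[shard_for(user_id)].execute(
        "SELECT friend_id FROM friends WHERE user_id = ?", (user_id,))]
    result = set()
    for friend in direct:  # one extra round trip per friend
        rows = shards[shard_for(friend)].execute(
            "SELECT friend_id FROM friends WHERE user_id = ?", (friend,))
        result.update(row[0] for row in rows)
    result.discard(user_id)
    return result


def friends_of_friends_single(db, user_id):
    """With one database, the same question is a single SQL join."""
    rows = db.execute(
        "SELECT DISTINCT f2.friend_id "
        "FROM friends AS f1 JOIN friends AS f2 ON f2.user_id = f1.friend_id "
        "WHERE f1.user_id = ? AND f2.friend_id != ?", (user_id, user_id))
    return {row[0] for row in rows}


# Each shard only holds the rows of the users assigned to it.
shards = [make_db([r for r in FRIENDSHIPS if shard_for(r[0]) == i])
          for i in range(NUM_SHARDS)]
single = make_db(FRIENDSHIPS)

print(friends_of_friends_sharded(shards, 1))
print(friends_of_friends_single(single, 1))
```

Both functions return the same set of users, but the sharded version leaves the join logic (and its round trips) to the application, which is the burden the article describes.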

A typical example with 20 nodes brings the total cost of ownership over 5 years from approximately $3.5m down to $250k, i.e. 14 times less than the traditional method.
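
The arithmetic behind that figure can be checked in a few lines of Python. The two totals are the approximate 5-year figures quoted above; the function and variable names are only illustrative.

```python
def tco_comparison(traditional_tco, clustrix_tco):
    """Return (ratio, absolute saving) between two total costs of ownership."""
    ratio = traditional_tco / clustrix_tco
    saving = traditional_tco - clustrix_tco
    return ratio, saving


# Approximate 5-year figures for the 20-node example quoted in the article.
ratio, saving = tco_comparison(3_500_000, 250_000)
print(f"ratio: {ratio:.0f}x, saving: ${saving:,}")
```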

caveats, limitations and "knocking the next bowling pin"

The Clustrix solution only works with applications developed with MySQL, but it does so seamlessly according to the Clustrix representatives. There may be limits on the number of nodes which can be deployed: the company hasn't deployed more than 20 nodes at a time so far, "although there should be no problem about deploying up to 100 nodes," declares Paul Mikesell. When asked whether the choice of SQL alone isn't a bet on the future, the Clustrix representatives respond that "MySQL is just an interface layer, and [they] made this decision because this market is big but [they] could expand to another language. We are trying to remain focused. There is no limitation for the future, we can 'knock the next bowling pin' if needs be," adds Paul Mikesell.

going out of stealth mode and catching like wildfire

Photo sharing, social media, e-commerce, gaming, dating, travel and advertising are among the most pertinent sectors for the Clustrix solution. The common characteristics are high-growth sites, a mixed read/write workload, relational and non-batch queries, and no analytics. The solution is geared only towards transactional workloads. There are multiple paying customers, and large-scale deployments are ongoing. The company is just about to come out of stealth mode and announce the names of its customers in the US and EMEA (mostly in the UK and France). Mikesell describes how the new solution is catching like wildfire: "A great number of DBAs (database administrators) want to get their hands and exposure on the system because it's 'too good to be true' and when they realize that it could 'actually work' then they all want to be part of the evaluation program but we had to restrict subscriptions to this evaluation program". The technology is protected by individual patents which are being filed but nothing has been granted yet, need for IP will probably .

a limited upfront investment

$80k is the basic investment, and the solution starts being interesting for clients at the point where one SQL database is no longer sufficient and "sharding" has to take place. "Even implementing sharding is a pain because you have to stop the server and there's downtime for the server," Liddle points out. "Besides, sharding resources are few and far between and they tend to be very expensive".

the interview

The following interview with Paul Mikesell, recorded on location, sums up his vision.

Yann Gourvennec

I specialize in information systems, high-tech marketing and Web marketing. I am the author of and a contributor to numerous books, and the CEO of Visionary Marketing. As such, I contribute regularly to this blog, for the Orange Business Services account, on cloud computing and cloud storage topics.