One of the challenges of writing highly scalable web applications lies in managing data related to an individual client's session state, tracking globally changing variables such as inventory counts, and sharing data between multiple components of a concurrent application.
Last June, Microsoft announced the first Community Technology Preview of Velocity, a distributed caching technology designed to provide an intermediate data cache independent of a back-end database for these needs.
"Between Tangosol and Microsoft Velocity, we are seeing strong validation for caching APIs," Bill Bain, founder and CEO of ScaleOut Software, said. "We believe that distributed caching will crystallize as a storage medium the same way that network file systems have."
This is not the first distributed cache for .NET application development. ScaleOut Software, Alachisoft and Appistry have been working on similar technology for years in the .NET space. Meanwhile, other companies such as GigaSpaces, GemStone, and Tangosol (recently bought by Oracle) have offered similar capabilities in the Java sector. Sam Charington, vice president of product management and marketing at Appistry, noted that the databases underlying Google Apps' Big Table and Amazon's EC2 SimpleDB provide many of the basic functions of distributed caching architectures.
Microsoft's entry is important, but some argue that its offering is incomplete. Bain said that Velocity lacks many important components of competing distributed caching tools such as ScaleOut State Server (SOSS). For example, SOSS works with versions dating back to .NET 1.1, whereas Velocity is compatible only with .NET 3.5.
Other features that separate SOSS from Velocity include:
- integrated high availability for stored data, which is required for mission-critical applications
- a self-configuring, self-healing and fully peer-to-peer architecture
- optional, scalable and quorum-based updating of stored objects
- asynchronous "push" notifications of cache events using a scalable, highly available delivery mechanism
- optional asynchronous replication between remote data centers to support disaster recovery scenarios
- parallel query of cached objects based on identifying tags
Appistry Fabric Accessible Memory (FAM) provides the data middleware infrastructure for the company's Enterprise Application Framework cloud development platform. According to Charington, "FAM ensures the reliability of that data so that if one machine goes away or a virtual machine crashes, the data is always available in multiple places. It operates similarly to RAID with multiple parity schemes so that it can survive a single machine's failure."
Bain said that some of the basic ideas behind distributed caching go back to parallel computing work in the 1980s and 1990s. This work took on more importance with the growth of the Web, and the need to scale a cluster of Web servers to dynamically meet the needs of millions of end users. Bain helped develop one of the first Internet Protocol load balancers to redirect incoming traffic to one of many .NET servers, which was ultimately sold to Microsoft as the Network Load Balancer tool.
The challenge is that this basic technology is not capable of storing the state of an individual client across Web servers. If an individual server becomes overloaded or crashes, it also takes out the session state of all of the clients that had been using it. With the rise of AJAX and MVC, more web services are becoming dependent on this state data.
One way to overcome this challenge is to store state data in the back-end database, which, Bain explained, is better optimized for long-term storage. Often, this is viewed as an expensive use of the database because session data is read and written too frequently. Therefore, he noted, "There is a need for a middleware layer that will tie together the servers that allows data to be placed in the distributed cache, which is globally scalable. Distributed caches are designed to hold relatively fast changing data with a short lifetime and with more updates than you would see in a traditional database."
Charington added, "Over the years, we have seen developers overuse their relational database for storing intermediate state information, when they should be using more agile constructs like distributed caches."
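To make the idea of short-lived intermediate state concrete, here is a minimal single-process sketch in Python. It is not the API of Velocity, SOSS, or FAM; the class name `ExpiringSessionStore`, its methods, and the `ttl_seconds` parameter are all hypothetical, chosen only to illustrate session data that expires on its own instead of accumulating in a relational database.

```python
import time


class ExpiringSessionStore:
    """Hypothetical in-memory store for short-lived session state."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}  # session_id -> (expires_at, state)

    def save(self, session_id, state):
        # Every write refreshes the lifetime of the session.
        self._entries[session_id] = (self.clock() + self.ttl_seconds, state)

    def load(self, session_id):
        entry = self._entries.get(session_id)
        if entry is None:
            return None
        expires_at, state = entry
        if self.clock() >= expires_at:
            # Expired state is dropped rather than kept for long-term storage.
            del self._entries[session_id]
            return None
        return state
```

A real distributed cache would spread these entries across servers; the sketch only shows the lifetime policy that distinguishes cache data from database records.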
The three key features of a distributed cache include:
- Global accessibility – if you put an object in the cache, the middleware layer will eliminate server affinity.
- Scalable performance – the performance needs to grow with the number of participating servers so that it does not become a bottleneck.
- High availability – it needs to remain available even after the loss of an individual server.
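The first and third properties can be illustrated with a toy model. The following Python sketch is an assumption-laden simplification, not how any of the products named here is implemented: each "node" is a dictionary in one process, keys are assigned to nodes by hashing, and each key is copied to a second node. The names `ToyDistributedCache`, `put`, `get`, and `fail` are hypothetical.

```python
import hashlib


class ToyDistributedCache:
    """In-process model of a partitioned cache with replicated keys."""

    def __init__(self, node_names, replicas=2):
        self.node_names = list(node_names)
        self.replicas = replicas
        # Each "node" is just a dict; a real product would run these on separate hosts.
        self.nodes = {name: {} for name in self.node_names}
        self.alive = set(self.node_names)

    def owners(self, key):
        # Hash the key to pick a starting node, then use the following nodes as replicas.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.node_names)
        count = min(self.replicas, len(self.node_names))
        return [self.node_names[(start + i) % len(self.node_names)] for i in range(count)]

    def put(self, key, value):
        written = 0
        for name in self.owners(key):
            if name in self.alive:
                self.nodes[name][key] = value
                written += 1
        if written == 0:
            raise RuntimeError("no live owner for key")

    def get(self, key):
        # Any caller can read any key: there is no affinity to one web server.
        for name in self.owners(key):
            if name in self.alive and key in self.nodes[name]:
                return self.nodes[name][key]
        return None

    def fail(self, name):
        # Simulate a crash: the node and everything it held disappear.
        self.alive.discard(name)
        self.nodes[name].clear()


cache = ToyDistributedCache(["node-a", "node-b", "node-c"])
cache.put("session:42", {"cart": ["book"]})
cache.fail(cache.owners("session:42")[0])  # lose the primary copy
print(cache.get("session:42"))  # prints {'cart': ['book']} from the surviving replica
```

The sketch does not re-replicate data after a failure or rebalance keys when nodes join, both of which a production cache would need in order to stay available and to scale with the number of servers.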
According to Bain, the current work on distributed caches closes the loop on the kind of research they were doing in the late 1980s with parallel computing. The key difference is that today, the distributed caches are designed to be highly available. In the '80s and '90s there was no way to maintain the state of an individual node in a parallel computer. If a hardware or software bug affected one element in a massive array of computers, it would halt the entire application, forcing the sysadmin to correct the problem and rerun the calculation from the start.
Distributed caching capabilities can help accelerate development time for high-performance applications in arenas like finance and scientific computing, which require significant communication between multiple computations, he continued. Traditionally, developers had to write code for passing messages between all of the different nodes involved in computation. By abstracting all of these details into the cloud, the developer can write the application to simply hand the data off to the global memory system.
Although he admits that distributed caching does not provide the same level of performance as can be achieved with highly optimized mechanical or fluid flow simulation tools, it dramatically improves the development of new applications. He explained, "You will not get the optimal performance if you write the application to manually move bytes between nodes, but you improve development. It's similar to writing a Windows application in Visual Studio compared to assembly language. It is a very useful tool for creating a high-level abstraction of memory, especially in applications with a fast turnaround time."
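The contrast with hand-written message passing can be sketched as follows. This Python example is only an analogy under stated assumptions: threads stand in for compute nodes, and a lock-protected dictionary (the hypothetical `SharedStore`) stands in for the global memory system. Each worker publishes its partial result under a key, and no worker needs to know which peer will consume it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedStore:
    """Stand-in for a global cache: a lock-protected dict shared by all workers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data.get(key)


def worker(store, worker_id, chunk):
    # Publish the partial result under a well-known key
    # instead of sending a message to a specific peer.
    store.put(f"partial:{worker_id}", sum(chunk))


def run(chunks):
    # Assumes chunks is a non-empty list of lists of numbers.
    store = SharedStore()
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        futures = [pool.submit(worker, store, i, chunk) for i, chunk in enumerate(chunks)]
        for future in futures:
            future.result()  # surface any worker exception
    # The coordinator reads results by key, with no knowledge of which worker ran where.
    return sum(store.get(f"partial:{i}") for i in range(len(chunks)))


print(run([[1, 2], [3, 4], [5]]))  # prints 15
```

The application code contains no send or receive calls; that is the development convenience being described, and it comes at the performance cost Bain acknowledges.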