SmartCache Data Distribution Tool

Source: Blackstone Computing
SmartCache is a data distribution tool for compute farms that enables scalable, reliable data access. It uses a unique peer-to-peer distribution technology to make data highly available across the entire compute farm.

Primary Application
SmartCache was developed to enable data-intensive applications on compute farm architectures. It specifically addresses the I/O bottlenecks that arise when data are distributed from a single file server or from multiple file servers. It is directly applicable to genomic sequence analysis, a class of data-intensive applications.

SmartCache works as an enhancement to load management systems, adding scheduling policies based on data locality. Data are replicated and managed across the compute farm according to demand, with the aim of making the best use of the available computational resources.
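SmartCache's actual replication policy is not described in this datasheet. As a hedged illustration of what demand-based replication can mean, the sketch below assumes a simple demand-proportional rule: each cached copy is assumed to serve a fixed number of requests per hour (`per_replica_capacity`), and the number of copies is clamped between a minimum and a maximum. The function names, the demand metric, and the limits are all hypothetical.

```python
import math


def target_replicas(requests_per_hour: float, per_replica_capacity: float,
                    min_replicas: int = 1, max_replicas: int = 64) -> int:
    """Hypothetical policy: enough copies to absorb demand, clamped to [min, max]."""
    if per_replica_capacity <= 0:
        raise ValueError("per_replica_capacity must be positive")
    needed = math.ceil(requests_per_hour / per_replica_capacity)
    return max(min_replicas, min(max_replicas, needed))


def plan_replication(demand: dict[str, float], current: dict[str, int],
                     per_replica_capacity: float,
                     max_replicas: int = 64) -> dict[str, int]:
    """Return, per dataset, how many copies to add (positive) or retire (negative).

    Datasets already at their target replica count are omitted from the plan.
    """
    plan: dict[str, int] = {}
    for name, rate in demand.items():
        target = target_replicas(rate, per_replica_capacity,
                                 max_replicas=max_replicas)
        delta = target - current.get(name, 0)
        if delta != 0:
            plan[name] = delta
    return plan


if __name__ == "__main__":
    demand = {"nr": 250.0, "pdb": 50.0, "est": 80.0}  # requests per hour (illustrative)
    current = {"nr": 1, "pdb": 4, "est": 1}           # copies currently cached
    print(plan_replication(demand, current, per_replica_capacity=100.0))
    # {'nr': 2, 'pdb': -3}
```

With `per_replica_capacity=100.0`, a dataset requested 250 times per hour is raised to three copies, while one requested 50 times per hour is trimmed back to one. A proportional rule is only one option; a real system would likely also weigh dataset size and the free cache space on each node.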

A SmartCache-enabled compute farm delivers higher job throughput, lower load on file servers, more reliable job completion, and better scalability.

Operation
SmartCache is easy to use and has minimal impact on users. Datasets are registered with SmartCache using command-line or web-based tools. Jobs are submitted with the SmartCache job submission tool, which is 100% compatible with the load management system's own submission utility.

Data are automatically distributed and replicated across the compute farm according to demand. SmartCache also supports manual management of data caching on compute nodes. SmartCache recommends to the load management system where each job should be scheduled based on data locality, giving preference to nodes that already hold the data in core memory.
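The locality preference described above can be sketched as a ranking function. This is a minimal illustration under stated assumptions, not SmartCache's implementation: nodes holding the dataset in core memory rank first, then nodes with a copy on local disk, then nodes without a copy, with ties broken by current load. `CacheTier`, `NodeState`, `recommend_hosts`, and the load metric are hypothetical names, and how the resulting host list would be handed to LSF or Grid Engine is not covered by this document.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class CacheTier(IntEnum):
    NONE = 0    # no local copy; data must come from a peer or file server
    DISK = 1    # copy cached on the node's local disk
    MEMORY = 2  # copy cached in core memory (most preferred)


@dataclass
class NodeState:
    hostname: str
    load: float  # hypothetical load metric, e.g. run-queue length
    cached: dict[str, CacheTier] = field(default_factory=dict)

    def tier_for(self, dataset: str) -> CacheTier:
        return self.cached.get(dataset, CacheTier.NONE)


def recommend_hosts(nodes: list[NodeState], dataset: str,
                    limit: int = 3) -> list[str]:
    """Rank hosts: best cache tier first, then lowest load, then hostname."""
    ranked = sorted(nodes, key=lambda n: (-n.tier_for(dataset), n.load, n.hostname))
    return [n.hostname for n in ranked[:limit]]


if __name__ == "__main__":
    nodes = [
        NodeState("n1", 0.5),
        NodeState("n2", 2.0, {"nr": CacheTier.MEMORY}),
        NodeState("n3", 0.1, {"nr": CacheTier.DISK}),
        NodeState("n4", 1.0, {"nr": CacheTier.MEMORY}),
    ]
    print(recommend_hosts(nodes, "nr"))  # ['n4', 'n2', 'n3']
```

Here `n4` and `n2` (memory copies) come before `n3` (disk copy), and `n4` wins the tie on lower load. Sorting on a tuple key keeps the policy easy to extend with further tie-breakers.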

Detailed Features

  • Enhances standard load management scheduling policies with data-directed scheduling.
  • Automatically maintains the update, cleanup, and integrity of local caches without user or administrator intervention.
  • Self-organizes frequently accessed datasets, automatically replicating them to make the best use of all compute resources.
  • Provides detailed data access auditing for generating utilization reports.
  • Does not rely on the Unix Network File System (NFS) for file transfers.
  • Offers compute-farm-wide cache invalidation for dataset version management.
  • Automatically detects available GigaNet low-latency, high-bandwidth networking for data distribution; no special configuration is required.
  • Provides both command-line and web-based user interfaces.
  • Fast, lightweight, and low impact.
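Farm-wide cache invalidation and automatic integrity checking can be illustrated together. The sketch below assumes a central registry that keeps a version number per dataset; invalidating a dataset bumps its version, and each node's periodic sweep drops any cached copy whose version is stale, whose dataset is no longer registered, or whose SHA-256 checksum no longer matches its contents. `DatasetRegistry`, `NodeCache`, and `sweep` are hypothetical; the datasheet does not describe SmartCache's actual mechanism.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DatasetRegistry:
    """Hypothetical central record of each registered dataset's current version."""
    versions: dict[str, int] = field(default_factory=dict)

    def register(self, name: str) -> None:
        self.versions.setdefault(name, 1)

    def invalidate(self, name: str) -> int:
        """Farm-wide invalidation: bump the version so all cached copies go stale."""
        self.versions[name] += 1  # raises KeyError for unregistered datasets
        return self.versions[name]


@dataclass
class CacheEntry:
    version: int
    data: bytes
    sha256: str  # checksum recorded when the copy was stored


@dataclass
class NodeCache:
    """Hypothetical per-node local cache."""
    entries: dict[str, CacheEntry] = field(default_factory=dict)

    def store(self, name: str, version: int, data: bytes) -> None:
        self.entries[name] = CacheEntry(version, data, hashlib.sha256(data).hexdigest())

    def sweep(self, registry: DatasetRegistry) -> list[str]:
        """Drop stale, unregistered, or corrupt entries; return the dropped names."""
        dropped = []
        for name, entry in list(self.entries.items()):
            current = registry.versions.get(name)
            corrupt = hashlib.sha256(entry.data).hexdigest() != entry.sha256
            if current is None or entry.version != current or corrupt:
                del self.entries[name]
                dropped.append(name)
        return dropped
```

Version numbers keep invalidation cheap: the registry updates a single integer, and each node discovers its stale copies on its next sweep rather than being contacted individually.
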

Benefits
  • Efficient and effective network utilization, allowing large compute clusters to use commodity off-the-shelf network hardware.
  • Enables data-intensive applications on the compute farm architecture.

System Requirements
  • Load management software: Platform LSF or Sun Grid Engine.
  • Unix environment: Sun Solaris, Linux, or Compaq Tru64.
  • Web server with CGI capability (optional).

Blackstone Computing, 100 Grove Street, Worcester, MA 01605. Tel: 508-793-2162; Fax: 508-793-2972.