SmartCache Data Distribution Tool
Primary Application
SmartCache was invented to enable data-intensive applications on compute farm architectures. SmartCache specifically addresses the various sources of I/O bottlenecks incurred in single or multiple file server based architectures for data distribution. It is directly applicable to the data-intensive class of genomic sequence analysis applications.
SmartCache works as an enhancement to load management systems, providing scheduling policies based on data locality. Data are replicated and managed across the compute farm according to demand, so as to make the best use of computational resources.
The net benefits of a SmartCache-enabled compute farm include improved job throughput, reduced load on file servers, increased job completion reliability, and improved scalability.
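The demand-based replication described above can be illustrated with a short sketch. It is not SmartCache's actual algorithm, which this sheet does not describe; the function name `plan_replicas`, the access-log format, and the proportional allocation rule are all assumptions made for illustration.

```python
from collections import Counter


def plan_replicas(access_log, num_nodes, min_replicas=1):
    """Return {dataset: replica_count}, proportional to observed demand.

    Hypothetical policy: a dataset's share of all accesses decides how many
    of the farm's nodes should hold a copy, bounded by min_replicas and
    the number of nodes.
    """
    demand = Counter(access_log)  # accesses per dataset
    total = sum(demand.values())
    plan = {}
    for dataset, hits in demand.items():
        wanted = round(hits * num_nodes / total)
        plan[dataset] = max(min_replicas, min(num_nodes, wanted))
    return plan


# Hypothetical access log: one entry per job that read the dataset.
log = ["genbank"] * 6 + ["swissprot"] * 3 + ["pdb"]
plan = plan_replicas(log, num_nodes=10)
print(plan)
```

Under this assumed policy, the most frequently read dataset ends up on the most nodes, which spreads jobs that need it across the farm instead of queuing them behind a single file server.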
Operation
SmartCache is easy to use and has minimal impact on users. Datasets are registered with SmartCache using the command-line or web-based tools. Jobs are submitted using the SmartCache job submission tool, which is 100% compatible with the load management submission utility.
Data are automatically distributed and replicated across the compute farm based upon demand. Additionally, SmartCache offers manual management of data caching across compute nodes. SmartCache recommends to the load management system where to schedule each job based upon data locality, giving preference to nodes where the data are cached in core memory.
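As a rough illustration of locality-based recommendations, the sketch below ranks nodes for a job that needs one dataset: nodes holding the data in core memory come first, then nodes with a local disk copy, then all others. The `Node` class, the three locality tiers, and `recommend_nodes` are hypothetical names, not part of SmartCache or of any load management system's interface.

```python
from dataclasses import dataclass, field

# Assumed locality tiers, best first (lower value sorts earlier).
IN_MEMORY, ON_DISK, NOT_CACHED = 0, 1, 2


@dataclass
class Node:
    name: str
    in_memory: set = field(default_factory=set)  # datasets cached in core memory
    on_disk: set = field(default_factory=set)    # datasets cached on local disk


def locality(node, dataset):
    """Classify how close the dataset is to the node."""
    if dataset in node.in_memory:
        return IN_MEMORY
    if dataset in node.on_disk:
        return ON_DISK
    return NOT_CACHED


def recommend_nodes(nodes, dataset):
    """Return node names ordered best-first; ties keep their input order."""
    ranked = sorted(nodes, key=lambda n: locality(n, dataset))
    return [n.name for n in ranked]


nodes = [
    Node("n1"),
    Node("n2", on_disk={"genbank"}),
    Node("n3", in_memory={"genbank"}, on_disk={"genbank"}),
]
print(recommend_nodes(nodes, "genbank"))  # ['n3', 'n2', 'n1']
```

A real scheduler would combine such a ranking with its own load and queue policies; the sketch shows only the data-locality preference.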
Detailed Features
- Enhances standard load management scheduling policies with data-directed scheduling.
- Automatically handles the update, cleanup, and integrity of local caches without user or administrator intervention.
- Self-organizing: frequently accessed datasets are automatically replicated to make the best use of all compute resources.
- Provides detailed data access auditing for generating utilization reports.
- Does not rely upon the Unix Network File System (NFS) for file transfers.
- Offers compute-farm-wide cache invalidation for dataset version management.
- Auto-detects the availability of GigaNet low-latency, high-bandwidth networking for data distribution; no special configuration required.
- Offers both command-line and web-based user interfaces.
- Fast, lightweight, low impact.
- Efficient and effective network utilization, enabling the use of commodity off-the-shelf network hardware on large compute clusters.
- Enables data intensive applications on the compute farm architecture.
Requirements
- Load management software: Platform's LSF or Sun's GridEngine.
- Unix environment: Sun Solaris, Linux, Compaq Tru64.
- Web server with CGI capability (optional).
Blackstone Computing, 100 Grove Street, Worcester, MA 01605. Tel: 508-793-2162; Fax: 508-793-2972.