A CORBA-Java Framework for Drug Discovery
Michael C. Dickson, NetGenics Inc.
Introduction
Why Use CORBA?
Why Use Java?
Benefits of a Framework
Using CORBA for Bioinformatics
The Synergy CORBA-Java Framework
Summary
Glossary
Introduction
Current drug development efforts benefit from a wealth of available new technologies developed to accelerate the drug discovery process. However, the rate of this technological development has created complex data integration needs. Pharmaceutical and biotechnology companies increasingly are turning to new computer technologies to meet these needs. In particular, these companies have recognized the appeal of a single, flexible software framework that can rapidly be extended to incorporate a variety of data and analysis tools.
Individually, both CORBA (Common Object Request Broker Architecture) and Java offer advantages as the building-block technologies for such a system. Properly combined in a software framework, CORBA and Java can create a system that can be easily adapted, expanded and maintained. This paper explains CORBA and Java fundamentals, describes their advantages in a framework for pharmaceutical informatics, and illustrates the construction of a CORBA-Java framework for bioinformatics.
Why Use CORBA?
The CORBA standard was created to answer the need for interoperability among the rapidly proliferating number of hardware and software products available today. CORBA allows applications to communicate with one another regardless of where they are located or who has designed them. This makes it the logical architecture choice for the needs of a highly variable and rapidly changing field like drug discovery. The CORBA specification describes a computer architecture that is object-oriented, distributed, language-independent, and hardware-neutral. Let us define these terms:
- Object-oriented. CORBA is object-oriented, making it easy to build components that can be interchanged and reused.
- Distributed. CORBA's distributed architecture overcomes the inherent inflexibility of traditional client-server systems. To communicate in a traditional client-server environment, the client and server must be intimately familiar with one another. In a CORBA scheme, an object request broker (ORB) sits between the client and server and marshals all communication between them. This decoupling of the client and server enables rapid integration of new and pre-existing components.
- Language-independent. CORBA enables objects to interact through a language-independent specification known as the interface definition language (IDL). IDL defines interfaces to data sources and analysis objects, not implementations. As a result, the code that implements these interfaces can change without affecting its users. For example, IDL can be used to specify the interface to a wrapper (translator code) procedure for an algorithm such as BLAST. The implementation behind that interface can be altered over time to incorporate updates to the algorithm or the source data, without impact on the user of the BLAST interface. This is made possible by the "contract" the interface defines between the BLAST algorithm (actually the wrapper procedure) and its user. As a second illustration of such a contract, an interface might specify that a request submitted as a locus name returns a string containing a DNA sequence; the source of the sequence string is irrelevant to the caller.
Figure 1
- Hardware-neutral. Using an ORB, a client can transparently invoke a method on a server object regardless of its location, operating system or programming language. In so doing, the ORB provides interoperability between applications on different machines and seamlessly interconnects multiple objects and sub-systems (services).
In the context of software design, CORBA lets programmers choose the most appropriate operating system, execution environment and programming language to use for each component of a system under construction. More importantly, the ORB allows integration of pre-existing components. In an ORB-based solution, legacy components can be integrated by implementing an IDL interface of wrapper procedures that translate between the CORBA objects and the legacy program interfaces. Using wrappers, pre-existing proprietary, in-house software can be incorporated into the larger system without rewriting the incumbent application code.
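The interface-as-contract idea can be sketched in plain Java. This is only an illustration under stated assumptions: a real system would declare the contract in IDL and generate language bindings from it, whereas here a Java interface stands in for the IDL. All names (SequenceLookup, InMemoryLookup, LegacySequenceStore, LegacyLookupWrapper, the locus "LOCUS1" and the "LOCUS|sequence" record format) are hypothetical.

```java
import java.util.Map;

// Hypothetical stand-in for an IDL contract: locus name in, DNA string out.
interface SequenceLookup {
    String sequenceFor(String locusName);
}

// One implementation: an in-memory table.
class InMemoryLookup implements SequenceLookup {
    private final Map<String, String> table;

    InMemoryLookup(Map<String, String> table) {
        this.table = table;
    }

    @Override
    public String sequenceFor(String locusName) {
        String sequence = table.get(locusName);
        if (sequence == null) {
            throw new IllegalArgumentException("Unknown locus: " + locusName);
        }
        return sequence;
    }
}

// Stand-in for a pre-existing program with its own calling convention.
// It returns "LOCUS|sequence" records in lower case, or null if not found.
class LegacySequenceStore {
    String fetchRecord(String key) {
        return "LOCUS1".equals(key) ? "LOCUS1|atggcctaa" : null;
    }
}

// Wrapper: translates between the contract and the legacy interface,
// so the legacy code itself is not rewritten.
class LegacyLookupWrapper implements SequenceLookup {
    private final LegacySequenceStore legacy = new LegacySequenceStore();

    @Override
    public String sequenceFor(String locusName) {
        String record = legacy.fetchRecord(locusName);
        if (record == null) {
            throw new IllegalArgumentException("Unknown locus: " + locusName);
        }
        // Strip the "LOCUS|" prefix and normalize case.
        return record.substring(record.indexOf('|') + 1).toUpperCase();
    }
}

class ContractDemo {
    // The caller depends only on the interface, never on an implementation.
    static int sequenceLength(SequenceLookup lookup, String locus) {
        return lookup.sequenceFor(locus).length();
    }

    public static void main(String[] args) {
        SequenceLookup inMemory = new InMemoryLookup(Map.of("LOCUS1", "ATGGCCTAA"));
        SequenceLookup wrapped = new LegacyLookupWrapper();
        System.out.println(sequenceLength(inMemory, "LOCUS1"));
        System.out.println(sequenceLength(wrapped, "LOCUS1"));
    }
}
```

Either implementation can be handed to sequenceLength without changing the caller, which is the property the contract is meant to guarantee.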
Why Use Java?
Java is an object-oriented programming language that supports graphically rich, dynamic programming. Like HTML, Java can deliver a zero-footprint client, but it also allows the creation of software that "feels" like a real application. The Java "virtual machine" ensures that this application is platform-independent, i.e., that the same Java program runs on Windows, Macintosh, and Unix workstations. For the programmer, Java facilitates the creation of implementations because it is multi-threaded, garbage-collected and allows object classes to be loaded dynamically.
Java and IDL are conceptually similar, so Java-IDL bindings are easy to understand and create. This makes Java an ideal client companion for a CORBA-compliant ORB. CORBA and Java are powerful technologies, but to use them to their fullest potential, it is necessary to use them together in a framework.
Benefits of a Framework
One of the key challenges in the development of software systems is planning for change. Open systems must be flexible in that they must easily adapt to new and changing requirements. The best way to support this requirement is to build systems from reusable components conforming to a "plug-in" architecture. The functionality of an open system can then be changed or extended by substituting or plugging in new components.
Software reuse has shifted from the reuse of single components (procedures, functions, classes) to the use of whole abstract system designs or architectures. A software system that may be reused at this level for creating complete applications is called a "framework." The idea is that it should be relatively easy to introduce specific functionality within a certain domain by employing the elements of the framework software.
Figure 2
A CORBA-based framework defines the interaction between objects and services (consumers and providers), independent of location. A well-designed interface between objects and services will provide just enough information about each object to facilitate communication, but will still provide sufficient flexibility to accommodate changes in the implementation of each object without the need to redesign the interface.
This point illustrates one of the goals of a CORBA-Java framework, which is to "maximize ignorance." This means that each object within the framework is ignorant of the implementation and location of other objects (data types and services) in the system, but understands the general "rules of engagement" for interacting with these other objects.
For example, a DNA sequence object need not know about the details of specific analysis services, like BLAST and Primer 3, with which it may interact. It needs only know that there are objects in its universe called analyses; the specific parameters for interacting with these analyses can be discovered at run time. In this way, the specific nature of the analyses can be variable within the framework.
Similarly, the framework client should ideally be ignorant of the data types that are to be presented. The client need only know that data act according to a generic data object or specification already defined. In this manner, the specific data types need not be hard-coded into a client view and can be delivered on an as-needed basis. This provides maximum flexibility and is important in a field like bioinformatics where data types need to change rapidly.
The framework enables the invocation of available data types through the instantiation of objects, and the client learns of these object classes dynamically (at run time), via a CORBA method call (request for information) on a server object. This list of available object classes can therefore be varied and delivered to the client on a "just-in-time" basis.
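The run-time discovery described above can be sketched with Java's dynamic class loading. This is a minimal sketch, not the Synergy implementation: the catalog below is a local object standing in for a CORBA call on a server object, and the names DataObject, DnaSequenceData, ProteinStructureData, TypeCatalog and GenericClient are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Generic specification the client is written against (hypothetical).
interface DataObject {
    String typeName();
    String summary();
}

class DnaSequenceData implements DataObject {
    @Override public String typeName() { return "DNA sequence"; }
    @Override public String summary() { return "string of nucleotides"; }
}

class ProteinStructureData implements DataObject {
    @Override public String typeName() { return "Protein structure"; }
    @Override public String summary() { return "set of atomic coordinates"; }
}

// Stand-in for the server object. In a CORBA system the list of class
// names would arrive through a remote method call.
class TypeCatalog {
    List<String> availableClassNames() {
        return List.of(DnaSequenceData.class.getName(),
                       ProteinStructureData.class.getName());
    }
}

class GenericClient {
    // The client names no concrete data type; it loads whatever the
    // catalog advertises and treats each instance as a DataObject.
    static List<DataObject> loadAll(TypeCatalog catalog) {
        List<DataObject> loaded = new ArrayList<>();
        for (String className : catalog.availableClassNames()) {
            try {
                Class<?> cls = Class.forName(className);
                loaded.add((DataObject) cls.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot load " + className, e);
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        for (DataObject data : loadAll(new TypeCatalog())) {
            System.out.println(data.typeName() + ": " + data.summary());
        }
    }
}
```

Adding a new data type then means adding a class and advertising its name in the catalog; GenericClient itself is not edited.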
Working in concert in a framework, CORBA and Java allow client-server systems to be built that can deliver a graphically rich user interface, capable of accessing multiple functions across a distributed computing network.
Using CORBA for Bioinformatics
Used in a pharmaceutical research environment, a CORBA-Java framework can reduce the time bioinformaticists must devote to routine administrative and support functions. The framework's universal client, for example, enables scientists to do more of their own work by eliminating the need to manage file transfers.
The framework's ORB and wrapper technologies can assist the bioinformaticist by:
- eliminating the time-consuming task of reformatting databases each time a new version is released;
- minimizing the time spent maintaining complicated programs that analyze drug discovery data and databases; and
- permitting easy reuse of existing programs, scripts, and algorithms.
The following example illustrates the use of NetGenics' Synergy discovery software suite, built on a CORBA-Java framework, to search for open reading frames (ORFs) in DNA sequences. As mentioned before, the key to a successful CORBA framework lies in the development of an interface using IDL. The IDL for an ORF service would include information regarding suitable input parameters, interface type definitions, and suitable output attributes. Because there is no information that intimately links these input and output parameters with a particular search algorithm, any suitable ORF algorithm may be called with this IDL.
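To make the three parts of such an interface concrete (input parameters, type definitions, output attributes), here is a hedged Java sketch. It is not the Synergy IDL, which this paper does not reproduce; OrfService, OrfHit and SimpleOrfFinder are hypothetical names, and the finder is a deliberately naive forward-strand scan included only so that the interface has something behind it.

```java
import java.util.ArrayList;
import java.util.List;

// Type definition for the output: one ORF, as half-open coordinates.
final class OrfHit {
    final int start; // 0-based index of the first base of the start codon
    final int end;   // index one past the last base of the stop codon

    OrfHit(int start, int end) {
        this.start = start;
        this.end = end;
    }

    int length() {
        return end - start;
    }
}

// The contract: inputs and outputs only, no mention of any algorithm.
interface OrfService {
    List<OrfHit> findOrfs(String dna, int minLengthBases);
}

// One possible implementation: scan the three forward reading frames
// for ATG followed by the first in-frame stop codon.
class SimpleOrfFinder implements OrfService {
    private static boolean isStop(String codon) {
        return codon.equals("TAA") || codon.equals("TAG") || codon.equals("TGA");
    }

    @Override
    public List<OrfHit> findOrfs(String dna, int minLengthBases) {
        String s = dna.toUpperCase();
        List<OrfHit> hits = new ArrayList<>();
        for (int frame = 0; frame < 3; frame++) {
            int start = -1; // -1 means no open start codon in this frame
            for (int i = frame; i + 3 <= s.length(); i += 3) {
                String codon = s.substring(i, i + 3);
                if (start < 0 && codon.equals("ATG")) {
                    start = i;
                } else if (start >= 0 && isStop(codon)) {
                    if (i + 3 - start >= minLengthBases) {
                        hits.add(new OrfHit(start, i + 3));
                    }
                    start = -1;
                }
            }
        }
        return hits;
    }
}

class OrfDemo {
    public static void main(String[] args) {
        OrfService service = new SimpleOrfFinder();
        for (OrfHit hit : service.findOrfs("ccATGAAATAGcc", 9)) {
            System.out.println(hit.start + ".." + hit.end + " (" + hit.length() + " bases)");
        }
    }
}
```

Because callers see only OrfService, a different ORF algorithm could replace SimpleOrfFinder without any change on the calling side.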
The Synergy CORBA-Java Framework
The simplest use of CORBA connects one client with a single server, as shown in Figure 3. There are at least two major problems with this solution:
- The client holds the data, which makes data sharing difficult and data loss easy.
- The client "knows" too much about the server, including what computer it runs on, what services are available, and the interface to those services. Thus, any change to the server requires changes to the client, leading to maintenance nightmares in a rapidly changing system.
Figure 3
In a more flexible CORBA implementation, the connection between the service and the client is made indirectly. This indirection facilitates the addition of new services in the future. Clients and services have no knowledge of each other; instead, each talks to the project manager, which is able to communicate with both.
Figure 4
The project manager is also capable of managing the attributes of data objects and provides only the necessary views of those data objects to the client. For example, DNA sequence objects are managed by the project manager, and the view of the DNA object is displayed on the client. In this way, the client is freed from the task of data management.
The client and its services are separated one step further by the analysis manager. The analysis manager is a directory of available services categorized by function, much like a Yellow Pages directory. The analysis manager learns about new services dynamically: Services register with the analysis manager, informing it about the data types upon which they operate. The analysis manager delivers this information to the project manager (and the client) at run time. This architecture allows the project manager to concentrate on data management, and the analysis manager to concentrate on the analysis of this data.
The use of this analysis manager allows the addition of a second service, in this case BLAST, to supplement the functionality of the framework without any changes being made to the analysis manager, project manager, or client. The framework has separated the client sufficiently from the services so that the client is independent of changes to services. It is this property of frameworks that allows NetGenics to extend Synergy to provide a solution tailored to the specific needs of each customer site.
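The registration scheme can be sketched as a small in-process model. This is an assumption-laden illustration rather than Synergy's design: the real components communicate through an ORB, and every name here (AnalysisService, StubService, AnalysisManager, ProjectManager and their methods) is hypothetical. The services are stubs and do not run BLAST or an ORF finder.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical contract that every analysis service implements.
interface AnalysisService {
    String name();
    String inputType(); // data type the service operates on, e.g. "DNA"
    String run(String input);
}

// Stub service: the function body stands in for a real analysis engine.
class StubService implements AnalysisService {
    private final String name;
    private final String inputType;
    private final Function<String, String> body;

    StubService(String name, String inputType, Function<String, String> body) {
        this.name = name;
        this.inputType = inputType;
        this.body = body;
    }

    @Override public String name() { return name; }
    @Override public String inputType() { return inputType; }
    @Override public String run(String input) { return body.apply(input); }
}

// Directory of services, keyed by the data type they accept.
class AnalysisManager {
    private final Map<String, List<AnalysisService>> byType = new LinkedHashMap<>();

    void register(AnalysisService service) {
        byType.computeIfAbsent(service.inputType(), k -> new ArrayList<>()).add(service);
    }

    List<String> serviceNamesFor(String dataType) {
        List<String> names = new ArrayList<>();
        for (AnalysisService s : byType.getOrDefault(dataType, List.of())) {
            names.add(s.name());
        }
        return names;
    }

    AnalysisService lookup(String dataType, String name) {
        for (AnalysisService s : byType.getOrDefault(dataType, List.of())) {
            if (s.name().equals(name)) {
                return s;
            }
        }
        throw new IllegalArgumentException("No service " + name + " for " + dataType);
    }
}

// Holds the data and is the only component the client talks to.
class ProjectManager {
    private final AnalysisManager analyses;
    private final Map<String, String> dnaById = new LinkedHashMap<>();

    ProjectManager(AnalysisManager analyses) {
        this.analyses = analyses;
    }

    void storeDna(String id, String sequence) {
        dnaById.put(id, sequence);
    }

    List<String> analysesFor(String id) {
        if (!dnaById.containsKey(id)) {
            return new ArrayList<>();
        }
        return analyses.serviceNamesFor("DNA");
    }

    String analyze(String id, String serviceName) {
        return analyses.lookup("DNA", serviceName).run(dnaById.get(id));
    }
}

class RegistryDemo {
    public static void main(String[] args) {
        AnalysisManager analysisManager = new AnalysisManager();
        ProjectManager projectManager = new ProjectManager(analysisManager);
        projectManager.storeDna("seq1", "ATGAAATAG");

        analysisManager.register(new StubService("ORF finder", "DNA", s -> "ORF stub"));
        System.out.println(projectManager.analysesFor("seq1"));

        // A second service appears; no manager or client code changes.
        analysisManager.register(
            new StubService("BLAST", "DNA", s -> "BLAST stub: " + s.length()));
        System.out.println(projectManager.analysesFor("seq1"));
        System.out.println(projectManager.analyze("seq1", "BLAST"));
    }
}
```

The point of the sketch is the second register call: the new service becomes visible through the project manager without any edit to AnalysisManager, ProjectManager, or the calling code.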
Figure 5
The analysis manager gives the system freedom to perform more complex manipulations of data. For example, it is easy to create a service that acts as a client of two existing services, performing an automatic BLAST search on every ORF discovered in a DNA sequence. Again, this service is delivered transparently to the client through the project manager by the analysis manager. In fact, the client is sufficiently ignorant of the service implementations that new services can be added while a client session is open; as soon as a new service registers with the analysis manager, it is available to the client.
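A composite service of this kind can be sketched as an object that holds references to two other services and chains them. The sketch assumes hypothetical single-method contracts (OrfLocator, HomologySearch); the regular-expression ORF locator and the lambda used as a search stub are placeholders, not BLAST or any Synergy component.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical contracts for the two existing services.
interface OrfLocator {
    List<String> orfsIn(String dna);
}

interface HomologySearch {
    String search(String query);
}

// Placeholder ORF locator: ATG, then as few codons as possible, then a stop.
// Expects upper-case input and reports non-overlapping forward-strand matches.
class RegexOrfLocator implements OrfLocator {
    private static final Pattern ORF =
        Pattern.compile("ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)");

    @Override
    public List<String> orfsIn(String dna) {
        List<String> found = new ArrayList<>();
        Matcher m = ORF.matcher(dna);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }
}

// The composite: a service that is itself a client of two services.
class BlastOrfService {
    private final OrfLocator orfs;
    private final HomologySearch search;

    BlastOrfService(OrfLocator orfs, HomologySearch search) {
        this.orfs = orfs;
        this.search = search;
    }

    // Runs the search once for every ORF found in the input sequence.
    List<String> run(String dna) {
        List<String> reports = new ArrayList<>();
        for (String orf : orfs.orfsIn(dna)) {
            reports.add(search.search(orf));
        }
        return reports;
    }
}

class CompositeDemo {
    public static void main(String[] args) {
        // The lambda is a stub standing in for a real search service.
        HomologySearch stub = query -> "stub report for " + query.length() + " bases";
        BlastOrfService service = new BlastOrfService(new RegexOrfLocator(), stub);
        for (String report : service.run("CCATGAAATAGCCATGTGA")) {
            System.out.println(report);
        }
    }
}
```

Since BlastOrfService depends only on the two contracts, either underlying service can be replaced or relocated without touching the composite.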
Figure 6
Because the components of a framework interact only through interfaces, services can operate on data that is created dynamically. For example, the BLAST-ORF service will operate on any object that implements the DNA interface, whether that object exists as a persistent object in a project, as a member of a collection, or as a transient object created by calculation. It is therefore easy to run the service on every sequence in a legacy Oracle database: the database can be modeled as a collection of DNA sequences, and the collections service allows iteration over a collection to retrieve each object.
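The following sketch shows one service applied unchanged to a stored object, a computed object, and the members of a collection. It rests on several assumptions: Dna, StoredDna, ReverseComplement, LegacyTable and GcCounter are hypothetical names, the "legacy table" is an in-memory list rather than a database, and a simple G/C count stands in for the BLAST-ORF service.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical DNA interface; services depend on nothing else.
interface Dna {
    String bases();
}

// An object held in a project.
class StoredDna implements Dna {
    private final String bases;

    StoredDna(String bases) {
        this.bases = bases;
    }

    @Override
    public String bases() {
        return bases;
    }
}

// A transient object created by calculation from another Dna.
class ReverseComplement implements Dna {
    private final Dna source;

    ReverseComplement(Dna source) {
        this.source = source;
    }

    private static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default:  return 'N';
        }
    }

    @Override
    public String bases() {
        String s = source.bases();
        StringBuilder out = new StringBuilder(s.length());
        for (int i = s.length() - 1; i >= 0; i--) {
            out.append(complement(s.charAt(i)));
        }
        return out.toString();
    }
}

// Stand-in for a legacy data source modeled as a collection of Dna.
class LegacyTable implements Iterable<Dna> {
    private final List<String> rows;

    LegacyTable(List<String> rows) {
        this.rows = rows;
    }

    @Override
    public Iterator<Dna> iterator() {
        Iterator<String> it = rows.iterator();
        return new Iterator<Dna>() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public Dna next() {
                String row = it.next();
                return () -> row; // each row is exposed through the Dna interface
            }
        };
    }
}

// Placeholder service: counts G and C bases in anything that is a Dna.
class GcCounter {
    static int gcCount(Dna dna) {
        int count = 0;
        for (char c : dna.bases().toCharArray()) {
            if (c == 'G' || c == 'C') {
                count++;
            }
        }
        return count;
    }

    static List<Integer> gcCounts(Iterable<? extends Dna> collection) {
        List<Integer> counts = new ArrayList<>();
        for (Dna dna : collection) {
            counts.add(gcCount(dna));
        }
        return counts;
    }
}

class DnaInterfaceDemo {
    public static void main(String[] args) {
        Dna stored = new StoredDna("ATGC");
        System.out.println(GcCounter.gcCount(stored));
        System.out.println(GcCounter.gcCount(new ReverseComplement(stored)));
        System.out.println(GcCounter.gcCounts(new LegacyTable(List.of("GGCC", "ATAT"))));
    }
}
```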
Figure 7
As explained earlier, the CORBA IDL specifies interfaces, not implementations. The analysis engines used to perform particular services are not essential to defining the interface, so as new algorithms become available to perform a given function, they can be substituted inside the IDL wrapper program without interruption or loss of function. This allows the users of the framework to make decisions about the best algorithm, hardware, or programming language to use for a specific implementation.
In Figure 8, the BLAST server has been moved to a multi-processor SGI machine, with no other changes necessary in the framework. The client is aware that this change has taken place only because BLAST runs faster.
Figure 8
Since the CORBA IDL specifies interfaces, not implementations, the implementation of a service can be changed without affecting any other component of the framework. For example, the ORF finder that implements the ORF-service IDL can be replaced with another gene-finding program, such as GRAIL, without any other changes to the system.
In a similar manner, a particular data object type can be defined without regard to the size or source of the data within it. New data objects that are useful to drug discovery, such as protein structures, relative expression data, or collections of small molecules, can therefore be added to the framework even though their implementations have little or nothing in common with that of DNA sequences.
Figure 9
Summary
A software framework can provide maximum flexibility for the needs of pharmaceutical research. The preceding example illustrated the key components of such a system built on a CORBA-Java foundation. Working in concert, these elements render a software framework that can easily be expanded and adapted to meet drug discovery's rapidly changing requirements.
Michael C. Dickson is the chief architect of the Synergy software framework and serves as NetGenics' Vice President, Product Development.
For more information: Michael Dickson, NetGenics, Inc., 1717 E. Ninth St., Suite 1600, Cleveland, OH 44114. Tel: 216-861-4007. Fax: 216-861-4777.
Glossary
Class
The definition of an object; the prototypical version of some object. A class provides the template, and a class is instantiated to produce an object.
Component
An object that is reusable and can be assembled with other components to form an application. Part of a methodology known as "component architecture."
CORBA (common object request broker architecture)
An open standard proposed by the Object Management Group that defines a protocol for software objects to interact with each other independent of implementation language, underlying platform, or location.
Garbage Collection
A process of reclaiming the memory of objects no longer in use. Java implements garbage collection, freeing the programmer from explicitly having to manage memory.
Framework
A foundation for software development in which the basic components of an application already exist. A well-developed framework allows the developer to "fill in the blanks" to provide his or her own functionality. A framework promotes faster development because underlying features already exist and do not need to be rewritten.
Inheritance
The process by which a new kind of object is defined by changing or refining the behavior of an existing one. This is an important concept in object-oriented programming.
Interface
A contract that specifies behaviors an object must have. An object that implements an interface must actually provide those behaviors, but is free to do so in any way desired. This means that multiple objects can implement the same interface, and none of them need to implement it the same way.
Implementation
The actual definition of an object; an object realized in code.
Invoke
To call. To instantiate. An object that is invoked has been created; it exists as part of some active process.
Java
A platform-independent, object-oriented language defined by Sun Microsystems for network programming.
Multi-threading
The process of splitting the main path of execution into two or more separate, but concurrent, paths. A feature provided by Java.
Object
The entity that gives object-oriented programming its name. An object is an encapsulated set of data and methods (actions) that operate on that data. More specifically, an object is an instance of a class, i.e., a class that has been instantiated.
ORB (object request broker)
The core of a CORBA implementation. The ORB sits between a CORBA service and its clients, and handles the passing of requests and responses between the clients and a service.
Persistence
Saving the state of an object in such a way that its state can be later restored.
Service
An object that implements an IDL interface. In Synergy, "service" is often used to refer to an object that performs an analysis.
Wrapper
A layer around some entity or object, allowing it to be used in some environment for which it may not have been designed. Thus, a wrapper often functions as an adapter or translator. Wrapping legacy code, for example, allows it to be integrated into a modern system.