Biological Database Integration - Computer Technology
Author: Deron Eriksson
Description: Integration of Data from Heterogeneous Biological Databases using CORBA and XML - The Role of Computer Technology in Bioinformatics.


Page: < 1 2

(Continued from page 1)

4.3.1. Relational Databases

Relational databases are the most predominant type of database used today. The relational database model was proposed by Dr. E.F. Codd in 1970. This model consists of a group of relations, operators to operate on those relations, and integrity constraints. A relation is a two-dimensional table that stores data, and a relational database contains one or more tables. A row in a table is known as a tuple or a record, and it represents a data set concerning a particular entity. A column represents a particular attribute of an entity, and a field occurs where a row and column intersect. A primary key is a column or group of columns that can uniquely identify each tuple in a table. It is possible to relate data in different tables through the use of foreign keys. A foreign key is a column or group of columns that refers to the primary key in another table or even the same table.

SQL (Structured Query Language) is the standard query language used to query and manage data in a database. The SELECT statement is used for retrieving data. The DML (data manipulation language) statements of SQL are INSERT, UPDATE, and DELETE, and these are used to insert, modify, and remove rows from database tables. The transaction control statements are COMMIT, SAVEPOINT, and ROLLBACK. These statements are used to govern database transactions. The DDL (data definition language) statements are ALTER, CREATE, DROP, RENAME, and TRUNCATE, and they allow for tasks such as creating and removing tables. The DCL (data control language) statements are GRANT and REVOKE, and they are responsible for controlling database access rights.

An example of a possible relational database table for holding biological nucleic acid sequence data is shown below:

Table Name: SEQUENCE_TABLE

ID DESCRIPTION ORGANISM SEQUENCE
TRBG361 …non-cyanogenic beta… Trifolium… aaacaaaccaaatatggatt…
HSIGHAF Human Ig gamma3… Homo sapiens cctggacctcctgtgcaaga…
MM12341 …mRNA for TL antigen Mus musculus atgaggatagggaccatggt…

This table is called SEQUENCE_TABLE. Each row, or tuple, in the table contains data related to a sequence entry. There are three row entries in this table. There are four attributes, or columns, in this table. The first attribute is ID, which is a unique identification number for each sequence entry in the table. The ID attribute would be the primary key of the table, since it uniquely identifies each row. The next attribute is DESCRIPTION, which offers a verbal description of the entry. The ORGANISM attribute indicates what species the sequence comes from, and the SEQUENCE attribute contains the actual sequence of nucleic acids. The ‘…’ entries in the table indicate additional characters are present but not shown.

Data in the table can easily be queried using the SQL SELECT statement, as in the following example:

SELECT ORGANISM
FROM SEQUENCE_TABLE
WHERE ID='HSIGHAF';

This SELECT statement would select the organism attribute from the table from the rows in which the ID attribute is equal to the given alphanumeric identification number listed on the WHERE line. This query would generate the following response:

ORGANISM
------------------
Homo sapiens

As another example, we could ask the database for all the IDs for sequences that begin with ‘a’:

SELECT ID
FROM SEQUENCE_TABLE
WHERE SEQUENCE LIKE 'a%';

This would generate the response:

ID
---------
TRBG361
MM12341

Databases used by the data integration system are typically queried via web servers using CGI programs or other similar technologies. A CGI query may take resemble a URL plus a query string such as:

http://www.mybiodatabase.com/cgi-bin/getorganism?MM12341

In this imaginary query, a CGI program called getorganism is executed with the query string MM12341 on the web server at mybiodatabase. The purpose of this program is to retrieve the organism specified by the ID number MM12341. If the program is a C++ program, it might use an ODBC driver to query the database using a SQL query such as:

SELECT ORGANISM
FROM SEQUENCE_TABLE
WHERE ID='MM12341';

The CGI program can obtain the result of the query and then return this result to the user, possibly formatted in HTML for viewing in a web browser.

Thus, if the database being queried is a relational database, although a query such as a call to a CGI program may not resemble a SQL query, the underlying actual access of the database may involve a SQL query.

4.3.2. Object-Oriented Databases

In some ways, the relational model is similar to the use of objects in object-oriented programming. For example, a table is similar to a class, and rows in that table are similar to objects, or instances of that class. Columns are attributes of an entity, much like data members are attributes of a class.

Although these similarities exist, relational databases are not object-oriented in a programming sense. However, object-oriented databases do exist in which objects are stored in databases. Object-oriented databases are also called object databases. ObjectStore is an example of an object-oriented database management system (OODBMS). Object-oriented databases can be used to store objects from object-oriented languages such as Java. They combine many of the powerful features of object-oriented programming and design such as inheritance with many of the benefits of databases such as transaction management and concurrency control.

The use of object-oriented databases is less prevalent than relational databases for a few reasons. To begin with, relational databases have a long, proven track record, whereas object-oriented databases are newer on the market. Relational database management systems follow the relational model and generally follow the SQL standard, whereas OODBMSs lack this standardization across different vendors. However, the ODMG standard of the Object Data Management Group may help standardize object-oriented databases across vendors so that tasks such as database migration from one vendor to another become easier. Object-oriented databases store data in more complex formats (objects) compared to relational databases. This complexity adds powerful features to the database. For instance, it allows for methods to be used in conjunction with data members in an object. However, this power is often not needed and is not worth the added complexity, since the relational model works well for many types of data and is well understood.

It should be noted that object-oriented databases can be useful in cases where complex data are needed, for instance for computer-aided design (CAD) applications and multimedia data.

4.3.3. Object-Relational Databases

Object-relational databases combine relational database technologies with object-oriented technologies. Relational databases allow for complex querying to be performed on simple data in tables. Object-oriented databases allow for simple querying to be performed on complex data types. Object-relational databases allow for complex querying to be performed on complex data types. This can be accomplished by extending relational databases by adding complex data types. Oracle 8 is an example of an object-relational database management system that has added object capabilities via cartridge modules. Oracle 8 extends the capabilities of Oracle 7, a relational database management system.

Page: < 1 2