Biological Database Integration - New Approach
Author: Deron Eriksson
Description: Integration of Data from Heterogeneous Biological Databases using CORBA and XML - New Approach to Data Integration.
(Continued from page 2)

9.2.4. Client Application

The client application in this demonstration is implemented in Visual Basic to run on the Microsoft Windows NT and Windows 2000 operating systems. Clients for this data integration system can readily be built in other languages on other platforms, since the system is accessed through the HTTP GET and POST methods. The prototype client is an executable called BioClient.exe consisting of several primary components on a Form, including the following:
The BioClient application includes several Label components not listed above. The Browser component displays individual query results, as well as messages such as greetings and diagnostic information. It allows for convenient display of XML-based and HTML-based data, including graphics. ResultTree organizes all of the client application's query results hierarchically, allowing easy perusal of acquired results. ResultTreeImages is a list of graphic icons used in ResultTree. The contents of the txtURL TextBox specify the URL of the SystemAccessServlet, and txtData lists the query string sent to the data integration system.

The tabctrl tabbed control consists of two tabs. One tab specifies the single database queries available via the client and the other specifies the multidatabase queries. The query type to be performed is chosen by selecting the relevant OptionButton on the tabbed control. Additional information, such as an accession number, is entered into the appropriate TextBox. Clicking on cmdSubmitQuery executes the specified query. The DemoCheck CheckBox allows sample values to be returned from the data integration system without performing an actual query. This makes it possible to demonstrate the client and the data integration system even when the biological database to be queried is unavailable. The BioClient application after start-up is shown in Figure 9.4.

Figure 9.4: BioClient Application

When BioClient begins, the Form_Load() procedure is called. This procedure performs various initialization routines. All query results are stored in the file system in directories specific to each type of query. This allows for convenient storage and retrieval of the query results outside of the client application.
If the query result directories do not exist in the same directory as BioClient, they are created:

    For i = 1 To numberoffolders
        If Not DirExists(App.path & "\" & folderstrings(i)) Then
            MkDir App.path & "\" & folderstrings(i)
        End If
    Next i

The query result directories are EMBLFlatFiles, EMBLXMLFiles, EMBLAgaveEntries, PubMedAbstracts, and MultiEMBLtoPubMedAbs. A welcome HTML page located in the BioClient directory is loaded by the Browser:

    Browser.Navigate App.path & "\welcome.htm"

Additionally, the ImageList of the ResultTree is set to ResultTreeImages. ResultTreeImages consists of graphic icons for a closed folder, an open folder, and a document. The folder Nodes of ResultTree are created using closed folder icons:

    With ResultTree.Nodes
        .Add , , "f" & folderstrings(1), "EMBL Flat Files", "closed"
        .Add , , "f" & folderstrings(2), "EMBL XML Files", "closed"
        .Add , , "f" & folderstrings(3), "EMBL AGAVE Entries", "closed"
        .Add , , "f" & folderstrings(4), "PubMed Abstracts", "closed"
        .Add , , "f" & folderstrings(5), "Multidatabase - EMBL Accession Number to PubMed Abstract", "closed"
    End With

After the welcome page is loaded, the Browser_DocumentComplete procedure is called. This procedure is also called after query results are retrieved by the Browser. At this stage, the procedure simply ensures that the text displayed in the Browser window is of a convenient font size.

A user specifies a query type by selecting the OptionButton for that query type. When the optEMBLflatfile OptionButton (labeled "Obtain flat file from accession #") is selected, the optEMBLflatfile_Click procedure is called. Selecting an OptionButton enables any relevant TextBox to accept text for query data entry. As seen in Figure 9.4, selecting the optEMBLflatfile OptionButton enables the txtAccNum TextBox (labeled "Accession #:") and disables the txtUID TextBox (labeled "UID:") if it is not already disabled. Selecting the other OptionButtons triggers similar actions.
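Since clients for this system can be written in other languages, the start-up step described above (creating any missing query result directories next to the application) can be sketched in Python for illustration. This is a minimal sketch, not the BioClient code: the function name ensure_result_folders and the constant RESULT_FOLDERS are hypothetical, and the folder order is assumed to match the order in which the directories are listed in the text.

```python
from pathlib import Path

# Directory names come from the text; their order here is an assumption.
RESULT_FOLDERS = [
    "EMBLFlatFiles",
    "EMBLXMLFiles",
    "EMBLAgaveEntries",
    "PubMedAbstracts",
    "MultiEMBLtoPubMedAbs",
]


def ensure_result_folders(app_path):
    """Create any missing result directories under app_path.

    Returns the names of the directories that had to be created,
    mirroring the 'If Not DirExists ... MkDir' loop in the VB code.
    """
    created = []
    for name in RESULT_FOLDERS:
        folder = Path(app_path) / name
        if not folder.is_dir():
            folder.mkdir(parents=True)
            created.append(name)
    return created
```

Calling ensure_result_folders a second time on the same directory returns an empty list, so the step is safe to run on every start-up.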
Clicking the cmdSubmitQuery button calls the cmdSubmitQuery_Click procedure, which executes the selected query using the data entered into the relevant TextBox. The query data is used to form a query string. In the example in Figure 9.4, in which txtAccNum contains the value TRBG361 and optEMBLflatfile is selected, the following query string is created:

    querytype=accessionnumbertoemblflatfile&accessionnumber=TRBG361&version=1&demo=no

This query string consists of four name-value pairs. The querytype name is given the value accessionnumbertoemblflatfile, which tells the data integration system that an EMBL accession number is being submitted and that the corresponding flat file is the desired response. The value of the accession number is passed to the system via the accessionnumber name. The version name specifies the current version of the client software. Because the data integration system may require a particular version number, this value can be raised in future releases to force users onto a newer client. The demo name specifies whether or not sample result data should be used in the response of the data integration system to the client. It allows for a demonstration of the client application and the data integration system without actually querying any biological databases.

In addition, the name of the file that will contain the query results is built from the date and time at which the query is executed and an identifier such as an accession number or citation unique identifier, as in the following example for an EMBL accession number to flat file query:

    thedate = Date
    thetime = Time
    filename = NewDateAndTime(thedate, thetime)
    filename = filename & txtAccNum.Text & ".txt"

Following this, the query data is submitted to the SystemAccessServlet. The address of the SystemAccessServlet is obtained from txtURL.
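For readers building a client in another language, the query string and result file name described above can be sketched in Python. This is an illustrative sketch under stated assumptions, not the BioClient implementation: the function names are hypothetical, the value "yes" for the demo parameter is assumed (the text only shows demo=no), and the timestamp layout is a stand-in because the output format of NewDateAndTime is not given in the text.

```python
from datetime import datetime
from urllib.parse import urlencode


def build_query_string(accession_number, demo=False, version=1):
    """Build the four name-value pairs for an accession number
    to EMBL flat file query, in the order shown in the text."""
    params = [
        ("querytype", "accessionnumbertoemblflatfile"),
        ("accessionnumber", accession_number),
        ("version", str(version)),
        # Assumption: "yes" is the counterpart of the documented "no".
        ("demo", "yes" if demo else "no"),
    ]
    return urlencode(params)


def build_result_filename(accession_number, when):
    """Combine a timestamp and the accession number into a file name.

    Assumption: the YYYYMMDD_HHMMSS layout is hypothetical; the real
    NewDateAndTime helper may format the date and time differently.
    """
    stamp = when.strftime("%Y%m%d_%H%M%S")
    return f"{stamp}{accession_number}.txt"


if __name__ == "__main__":
    print(build_query_string("TRBG361"))
    print(build_result_filename("TRBG361", datetime.now()))
```

Using urlencode rather than plain string concatenation also percent-encodes any characters in the user-supplied identifier that are not safe in a URL.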
In Figure 9.4, the data integration system is run locally, and the SystemAccessServlet is located at:

    http://localhost:8080/examples/servlet/SystemAccessServlet

The query string is appended to the servlet URL after a question mark, and the GET method is used to submit the query string to the servlet using the URLDownloadToFile function. This function saves the returned results to a file.

    URLDownloadToFile 0, txtURL.Text & "?" & QueryValues, fullpath, 0, 0

The query result is saved to a file at fullpath, which places the file into the correct result directory:

    fullpath = App.path & "\" & folderstrings(thedir) & "\" & filename

After the query results are saved, the file is loaded into the Browser:

    Browser.Navigate fullpath

The Browser_DocumentComplete procedure is then called. A document node for the new query result is added to ResultTree under the proper query type:

    parentname = "f" & folderstrings(1)
    Set thenode = ResultTree.Nodes.Add(ResultTree.Nodes.Item(parentname), _
        tvwChild, querytypeandfilename, thedate & ", " & thetime & ", " & txtAccNum.Text, "doc")
    thenode.EnsureVisible

Figure 9.5 shows the client application following execution of an EMBL accession number to flat file query. The query results are displayed in the Browser window and the resulting document is shown added to ResultTree.

Figure 9.5: Result of an EMBL Accession Number to Flat File Query

Figure 9.6 demonstrates a multidatabase query in which a PubMed abstract referenced in an EMBL entry is retrieved based on the submission of an EMBL accession number to the data integration system. The abstract has been downloaded and saved to the file system, and it is displayed in the Browser window. The query result document has been added to ResultTree under the "Multidatabase - EMBL Accession Number to PubMed Abstract" folder.
Figure 9.6: Result of an EMBL Accession Number to PubMed Abstract Query

As mentioned elsewhere, the EMBL accession number to PubMed abstract multidatabase query is performed by parsing the EMBL flat file into an XML representation using the EMBLParser. The result of an EMBL accession number query can also be saved in the data integration system's XML representation, as shown in the query result in Figure 9.7. Tags clearly label the EMBL entry data in this representation. For example, the additional accession numbers listed in the query results (X56734 and S46826) can also be used to retrieve the same EMBL entry.

Figure 9.7: Result of an EMBL Accession Number to XML Result Query

Navigation between result documents can easily be performed using ResultTree. Double-clicking on a closed folder opens that folder, displaying the documents it contains. Double-clicking on an open folder closes that folder. These actions are handled by the ResultTree_Expand and ResultTree_Collapse procedures, respectively. Clicking on a query result document in ResultTree brings up that document in the Browser window. This is accomplished by a call to ResultTree_NodeClick, in which the Browser navigates to the correct document file in the file system:

    Browser.Navigate App.path & "\" & Node.Key

Closing the client application does not remove the query result documents from the file system. These results are kept so that they can be analyzed at a later time.

(Continued on page 4)