A Layered Architecture based on Java for Internet and Intranet Information Systems
In this paper we present an architecture for building Information Systems (IS’s) that can be adapted to several runtime environments. Our architecture is structured in three layers: client, services and data. The layers are independent of one another, which makes the system more flexible and scalable. The core of the IS is implemented in Java to achieve platform and database independence. BIWE (http://www.biwe.es), one of the most popular Spanish Internet Search Directories, implements this architecture.
Since the World Wide Web started at CERN in 1991, its growth has been incredible, owing to the large amount of heterogeneous information (e.g. personal Web pages, research publications, etc.) that it can support. One of the main factors that has contributed to the growth of the Internet has been Information Systems (IS’s).
An IS is an organized combination of people, hardware, software, communication networks and data resources that collects, transforms and delivers information. The importance of handling information has forced many enterprises and organizations to use these systems to manage their Intranet and Internet services.
IS’s solve the problem of continuous information updates by generating the information dynamically, which avoids presenting obsolete information. In addition, an important characteristic of current IS’s is the ability to adapt to several runtime environments. This characteristic lets the same IS be valid for the Internet and for Intranets, and even for very dynamic environments where the number of users and the amount and type of information are continuously changing.
Here we present a multi-platform architecture for IS’s that allows the system to be adapted, yielding a flexible and scalable IS. We also present an example implementation of the architecture: a Spanish Internet Search Directory called BIWE.
In the next section we describe the objectives we want to achieve with this architecture. In Section 3 we explain the different layers of the proposed architecture, and in Section 4 we describe the implementation. In Section 5 we describe the industrial benefits of our architecture, and in Section 6 we show several configurations of the BIWE implementation. Finally, we present the conclusions drawn from the implementation of our architecture.
We believe that an IS suitable for different and changing environments should address the following basic requirements:
• Platform independence: The system should support different Operating Systems (OS) so that the IS can adapt to any kind of environment. This allows, for instance, a low-load service to run on an ordinary PC running Windows, while a high-load service runs on a high-performance UNIX server.
• Database independence: This is directly related to the previous requirement. To achieve total independence, the IS should not be restricted to the database management system used in any particular environment. Both requirements are useful when the system is purchased by third-party entities, since the system will run seamlessly in the buyer's environment (here, environment means operating system and database management system).
• Communication protocol independence: An IS should be designed to use any communication protocol with clients and any language for the user interface. The IS can then choose the most suitable user interface for each environment.
• Extensibility: The system architecture should make it easy to add new services and features. The IS should also allow its users to develop their own services (in addition to the existing ones) without knowledge of the internal workings of the system, using only the open frameworks on which all the services are based.
• Scalability: The IS should provide a layered architecture that can be distributed in different ways depending on the system requirements and the network configuration of its environment. This lets the system perform better in different environments and be prepared for changes in the system requirements (on the Internet it is common for a successful service to increase its number of users by a factor of 10 or more).
3. Description of the architecture
Nowadays the most commonly used architecture for the development of IS’s on the Internet is a three-layer architecture. The proposed architecture is built on a data layer, a services layer and a client layer. The layers can be connected through a network; they do not have to reside on the same machine. Figure 1 shows a diagram of the architecture with its different layers and interconnections.
Figure 1: The high-level architecture diagram
The data layer includes all the elements of the system that store data on a physical device. The data layer consists of a database management system, although any other storage system that can exchange information with the next layer according to the specified interfaces could be used. The tasks of this layer are to store, update and retrieve data, creating a central point for any data access. The connection with the services layer is a key issue, because achieving total independence between the two layers is very important. The independence between these layers means that the IS will be able to change the database management system (the core of the data layer) without any changes in the services layer. There are many different types of interfaces between databases and services (ODBC, JDBC, PL/SQL, etc.), but we will deal with this subject in the next section.
The main task of the services layer is to provide all the services requested by the client layer, using the data layer when necessary to access stored information. The services layer is the most complex and important part of the architecture from the point of view of independence from the other layers. The elements of this layer are a Web server, a set of services, a Data Access Framework and a Graphic Interface Framework (see Figure 2).
Figure 2: The services layer diagram
The services are provided by a Web server (the best way to offer services over the Internet or Intranets), which is the interaction point with the client layer. The Data Access Framework is the only element of this layer that interacts with the data layer, so it is the central point for any access to information stored in the database. Furthermore, this element isolates the rest of the services layer from logical changes in the database, making the database transparent to the services. The Graphic Interface Framework generates the HTML code that is sent to the client layer over the Internet. For this purpose, it is divided into two components: a basic component that generates the HTML code, and a more complex component that takes data structures from the data layer and uses the basic component to generate a graphic interface used by all services. Thanks to this design, the framework, which for an Internet IS uses HTML to generate the graphic interface, could generate any other output format suitable for another kind of system (e.g., PDF), and the rest of the services layer would not need any changes. Of course, the client layer must be able to interact using this new format.
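The separation between services and the two frameworks can be illustrated with a minimal sketch. All type and method names below (`DataAccess`, `Markup`, `ResultPage`, `SearchService`) are hypothetical, since the paper does not publish its interfaces, and an in-memory list stands in for the database so that the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical Data Access Framework: the only type that talks to the data layer. */
interface DataAccess {
    /** Returns matching rows; each row is {title, url}. */
    List<String[]> findSites(String keyword);
}

/** In-memory stand-in for a database-backed implementation (assumption for the sketch). */
class InMemoryDataAccess implements DataAccess {
    private final List<String[]> rows = new ArrayList<>();

    void add(String title, String url) {
        rows.add(new String[] {title, url});
    }

    @Override
    public List<String[]> findSites(String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<String[]> hits = new ArrayList<>();
        for (String[] row : rows) {
            if (row[0].toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(row);
            }
        }
        return hits;
    }
}

/** Basic component of the Graphic Interface Framework: emits output primitives. */
interface Markup {
    String heading(String text);
    String link(String label, String target);
}

/** HTML flavour of the basic component. */
class HtmlMarkup implements Markup {
    @Override
    public String heading(String text) {
        return "<h1>" + escape(text) + "</h1>\n";
    }

    @Override
    public String link(String label, String target) {
        return "<a href=\"" + escape(target) + "\">" + escape(label) + "</a><br>\n";
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}

/** Alternative output format; swapping it in requires no change to the services. */
class PlainTextMarkup implements Markup {
    @Override
    public String heading(String text) {
        return text + "\n";
    }

    @Override
    public String link(String label, String target) {
        return label + ": " + target + "\n";
    }
}

/** Higher-level component: turns data structures into a page via the basic component. */
class ResultPage {
    private final Markup markup;

    ResultPage(Markup markup) {
        this.markup = markup;
    }

    String render(String title, List<String[]> rows) {
        StringBuilder out = new StringBuilder(markup.heading(title));
        for (String[] row : rows) {
            out.append(markup.link(row[0], row[1]));
        }
        return out.toString();
    }
}

/** A service depends only on the two frameworks, never on the database or on HTML. */
class SearchService {
    private final DataAccess data;
    private final ResultPage page;

    SearchService(DataAccess data, ResultPage page) {
        this.data = data;
        this.page = page;
    }

    String handle(String keyword) {
        return page.render("Results for " + keyword, data.findSites(keyword));
    }
}
```

In this sketch, replacing `HtmlMarkup` with `PlainTextMarkup` changes the output format while `SearchService` stays untouched, which mirrors the independence argued for above.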
Finally, the client layer interacts with the services layer and is the interface between users and services. In our system, this layer consists basically of a Web browser that lets users see the results of their requests to the different services. This layer is present for all users, but at the same time it is the simplest layer. It becomes more useful, however, if it can execute services that come directly from the services layer over the network. This means that the client layer has a part of the services layer running on the same machine. Not all services move to this layer, only specific services that can be executed there. These services do not need any interaction with the Graphic Interface Framework, since they are just applications running on the client side, but they may need to interact with the Data Access Framework. This interaction can be done in two different ways: moving the Data Access Framework to the client layer (so that the Data Access Framework accesses the data layer through the Internet), or interacting with the Data Access Framework directly through the Internet. Figure 3 shows the first configuration for the client layer.
Figure 3: The client layer diagram
From a generic point of view, the main advantage of this architecture is the flexibility provided by the independence between the different layers. This means that the core of the IS can be located on several computers (we will see an example of this configuration in Section 6). Moreover, this flexibility also improves the robustness of the system, because services and/or data can be replicated on different computers to obtain a fault-tolerant system. More specifically, this architecture forces the services to access the data layer through the data access interface, which makes it possible to change the database in many ways without changing the services, only the Data Access Framework. In the same way, since the services use the Graphic Interface Framework to generate the graphic interface, the client layer could use a language other than HTML by changing only this framework. Another advantage of this architecture is that the data access and graphic interface components are both frameworks, so developers can design new services without knowing anything about the data and client layers. Moreover, external users could develop their own services using these frameworks, distributing the services layer over many different servers.
At this point, we summarize the main implementation decisions. Our decisions aim at achieving OS and database management system independence.
In the data layer a relational database has been used, and the independence of the database access has been achieved using the JDBC 1.0 API. For the implementation of the Internet Search Directory BIWE, we have used the Oracle 7.3.3 database management system and the Weblogic JDBC 1.0 driver for Oracle.
The services layer has been written in JAVA. The services have been mainly implemented as JAVA servlets instead of traditional CGIs. All the system administration is centralized in a JAVA applet and some administration services have been implemented as standalone JAVA applications. Now, we detail the reasons for the decisions above.
JAVA language was chosen mainly because of its platform-independence feature. Other JAVA features that were considered useful for our implementation were the following:
• JAVA is an Object-Oriented language. This allows an easy and natural implementation of our object-oriented architecture.
• Multi-threading support in JAVA is very flexible, easy to program and of good efficiency for our purposes.
• JAVA is quickly becoming the “de facto” standard for building distributed applications on the Internet. There is already a significant number of completed and ongoing efforts to enhance the development of distributed applications in JAVA.
JAVA servlets were chosen as the main way to implement front-end services because of their advantages over traditional CGIs:
• Servlets are a standard way of executing server-side JAVA programs. With them we achieve both platform independence and Web server independence (since the main Web servers on the market already provide servlet support).
• Servlets allow multi-threading. They are loaded in memory only once and launch a different thread for each request, which is more efficient than the execution of a CGI.
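The "loaded once, one thread per request" model can be sketched without a servlet container. The example below is not servlet code: a real servlet would extend the Servlet API's `HttpServlet`, whereas here a plain class (`SearchHandler`) and a tiny dispatcher (`MiniContainer`), both hypothetical, imitate the lifecycle. It also assumes a modern JDK, since it relies on `java.util.concurrent`, and a small thread pool stands in for the container's request threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a servlet: one instance shared by all request threads. */
class SearchHandler {
    // Shared state must be thread-safe because service() runs concurrently.
    private final AtomicInteger served = new AtomicInteger();

    String service(String query) {
        served.incrementAndGet();
        return "results for " + query;
    }

    int served() {
        return served.get();
    }
}

/** Hypothetical stand-in for the servlet engine. */
class MiniContainer {
    /** Dispatches every request to the same handler instance on a worker thread. */
    static List<String> dispatch(SearchHandler handler, List<String> queries) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (final String query : queries) {
                Callable<String> task = () -> handler.service(query);
                pending.add(pool.submit(task));
            }
            // Collect responses in request order.
            List<String> responses = new ArrayList<>();
            for (Future<String> f : pending) {
                responses.add(f.get());
            }
            return responses;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a request", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("request failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The contrast with a CGI is that no new process is started and no program is reloaded per request; only a thread is assigned, at the price of having to keep the handler's shared state thread-safe.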
We chose a relational database system because it provides reliable, efficient and robust access to data, along with facilities for creating secure infrastructures.
To access the relational database, JDBC has been used to achieve platform and database independence. Nevertheless, it is important to point out that, like other database-independent methods for accessing data (such as ODBC), JDBC can be slower than native database-dependent methods. So, for services where efficiency could be critical, we recommend establishing a test plan to determine whether the difference is actually relevant in that specific case. In our system, the test results showed that the difference was irrelevant, which also leads us to presume that it would make a difference in very few systems.
In any case, our object-oriented design would let us seamlessly create child classes of the currently used classes for accessing the database, in a way that those services which needed to use native interfaces to the database could do it without changing the programming interface.
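A minimal sketch of this subclassing idea follows. The class names (`CategoryStore`, `NativeCategoryStore`, `CategoryService`) and the returned data are hypothetical, and the database calls are replaced by placeholders so that the example runs without a database; only the structure is the point.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical base class: the portable access path used by default (JDBC in the paper). */
class CategoryStore {
    List<String> topCategories() {
        // Placeholder: a real implementation would run a portable SQL query here.
        return Arrays.asList("Arts", "Science", "Sports");
    }

    String accessPath() {
        return "portable";
    }
}

/** Hypothetical child class for a service where the portable path proved too slow. */
class NativeCategoryStore extends CategoryStore {
    @Override
    List<String> topCategories() {
        // Placeholder for a vendor-specific call; same signature, same result contract.
        return super.topCategories();
    }

    @Override
    String accessPath() {
        return "native";
    }
}

/** The service is written against the base class and is unaware of the access path. */
class CategoryService {
    private final CategoryStore store;

    CategoryService(CategoryStore store) {
        this.store = store;
    }

    String summary() {
        return String.join(", ", store.topCategories()) + " [" + store.accessPath() + "]";
    }
}
```

Constructing `CategoryService` with either store yields the same programming interface for the service, so the choice between portable and native access can be made per service at wiring time.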
5. Industrial benefits
An IS based on this architecture can be implemented on different Operating Systems and can also use different database management systems without any changes to the system. This means that the system can be installed in any enterprise using the hardware and software already available. At the same time, the system can start with a reduced investment and later, thanks to its distributed nature, increase its performance with a modest upgrade of the equipment.
Another important benefit of this architecture is that users (or customers) can develop their own services using the open frameworks of the IS. This implies that third-party entities can easily adapt the IS themselves to their new requirements.
All these benefits mean that the IS’s developed using this architecture are immediately exportable to third-party entities (enterprises, institutions, etc.) because of their easy adaptation to any environment and to new requirements.
6. Architecture Configurations
Two main features of our Internet Search Directory are:
• It can work with any database management system with an available JDBC driver.
• It can work with any Web server with JAVA servlets support.
These features, along with the capability of distributing the architecture layers over one or several computers, make many valid configurations possible. We now detail two of them that we have implemented and tested on our Search Directory.
6.1 Configuration I: 2 layers – 1 computer
This is the simplest configuration, where the data layer and the services layer run on the same machine. We are using a Sun Microsystems Ultra Sparc 10 with one processor running at 300 MHz and 256 Mbytes of RAM. On the software side, the OS is Solaris 2.5.1, the database management system is Oracle 7.3.3, and the Web server is SUN Java Web Server 1.2.
Figure 4: The configuration I diagram
This is the configuration currently used in the production system. Although the system answers more than a million queries per month and runs on low-cost hardware, it keeps fast response times, even under high load.
The main advantage of this configuration is:
• The reduced cost and the high performance obtained.
The disadvantages of this configuration are:
• The reduced ability of the system to grow once the maximum capacities of the computer are reached.
• The Internet Search Directory is poorly protected, since all the core layers of the information system are exposed to attacks through the Internet.
6.2 Configuration II: 1 layer – 1 computer
This configuration tries to make use of the characteristics of the proposed architecture. The main aspect of this configuration is that the data layer and the services layer are on different computers, and also on different operating systems.
In this configuration the data layer is placed on a Sun Microsystems Ultra Enterprise 3000 with two processors at 167 MHz and 256 Mbytes of RAM, running Solaris 2.5.1. On the software side, this layer uses the same database management system as in the previous configuration, Oracle 7.3.3.
The services layer is placed on a less powerful computer, a Pentium with one processor at 166 MHz and 64 Mbytes of RAM, with Linux as the operating system. In this case, the software differs from the previous configuration: we use the Apache Web server and the Jrun Servlet Engine to support all the services developed as servlets.
Both computers are in the same LAN.
Figure 5: The configuration II diagram
The purpose of this configuration is to divide the workload of the Search Directory. This is why the database management system is located on a powerful computer: the main workload of the system lies in database access, as a result of the many information requests made by users at the same time. The services layer, on the other hand, only processes the results obtained from the data layer, so its requirements are lower.
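With JDBC, moving the data layer to another machine should in principle only change the connection settings seen by the services layer. The sketch below illustrates that idea; the property keys (`db.driver`, `db.url`), the class name `DataLayerSettings`, and the driver class and URLs used in the usage note are placeholders invented for the example, not the settings used by BIWE.

```java
import java.io.IOException;
import java.io.StringReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

/** Hypothetical deployment settings for reaching the data layer. */
class DataLayerSettings {
    final String driverClass;
    final String url;

    DataLayerSettings(String driverClass, String url) {
        this.driverClass = driverClass;
        this.url = url;
    }

    /** Parses settings written in java.util.Properties syntax. */
    static DataLayerSettings parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalArgumentException("unreadable settings", e);
        }
        String driver = p.getProperty("db.driver");
        String url = p.getProperty("db.url");
        if (driver == null || url == null) {
            throw new IllegalArgumentException("db.driver and db.url are required");
        }
        return new DataLayerSettings(driver, url);
    }

    /** Opens a connection; the named driver must be on the classpath at run time. */
    Connection open(String user, String password)
            throws SQLException, ClassNotFoundException {
        Class.forName(driverClass); // explicit driver loading, as older JDBC versions required
        return DriverManager.getConnection(url, user, password);
    }
}
```

For example, a single-machine deployment might use `db.url=jdbc:example://localhost/biwe` and a two-machine deployment `db.url=jdbc:example://db-host/biwe`, with the same `db.driver` in both; the services themselves would not change.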
The advantages of this configuration are:
• The performance obtained by the Internet Search Directory is much higher.
• The system is more secure because it is possible to install a firewall that just isolates the computer with the service layer, keeping the data layer in a secure environment.
The main disadvantage of this configuration is:
• The system is not as fault tolerant as in the previous configuration, because the correct functioning of the Search Directory depends on two different computers and the connection between them.
We have presented an architecture for building IS’s that are adaptable to different environments. Our architecture is structured in three layers and aims to provide scalability, extensibility, efficiency and platform independence (both OS and database independence).
Our first working system following this architecture is BIWE, one of the most popular Spanish Internet Search Directories. BIWE has been implemented in JAVA, using JDBC 1.0 to access an Oracle 7.3.3 database. JAVA servlets have been used to build the services in the services layer.
Using this architecture and implementation we built two different configurations for BIWE. The first one is simple and low-cost, but at the same time offers very high performance. The second one uses the advantages of the architecture to build a more distributed and efficient system, obtaining a large performance improvement and showing how easily the performance of the system can be increased.
Future research on the proposed architecture will try to distribute the different layers of our IS. We will start by distributing the data layer, then the services layer, and finally both the data and services layers.