Phonetic search with a high-availability system on a Linux cluster
Bürgel Wirtschaftsinformationen GmbH & Co. KG, Hamburg
The pilot project
Bürgel, a leading provider of economic and business information in Europe, expected strong growth for its business unit 'Private Households'. Meeting the resulting demand for system performance would have required a significant upgrade of the mainframe’s computing power and storage capacity. Because such upgrades to the host involved large capital expenditures, a feasibility study was first conducted on a completely new search and selection algorithm intended to speed up the computing operations. Instead of the traditional match code search, the new algorithm used text normalization and phonetization, and all search operations were to be performed in main memory. After successful initial trials, employees of cimt AG implemented a prototype on the open-source operating system Linux.
Benefits
During subsequent design and development, the algorithm was optimized further, significantly improving search quality and speed compared with the previous method. The customer therefore decided to separate the search function and move it to a Linux-based cluster.
Moving specific application components off the host avoided the cost of additional host computing power. Thanks to the modular architecture and the data-independent algorithm, new functionality can be added with little adjustment effort, and the separation allows the mainframe to be downsized further.
The high-availability cluster provides a high-performance, flexible application base that copes with peak loads and keeps response times almost unchanged even if up to 50% of the system and network components fail.
Building the central functionality on open-source solutions reduces the cost of future adjustments and enhancements. Because the source code is portable across platforms (e.g. Linux, Unix, Windows), the cluster can also run a heterogeneous system architecture. For reasons of speed and reliability, Linux was chosen as the system’s base.
The follow-up project
Right from the beginning, a high-availability cluster with multi-redundant systems was used so that the systems could be enhanced during operation. With the pilot application now implemented on all server nodes, the cluster nodes are running at a load of 1%, a figure that will naturally rise as the number of customers grows. Given these significant improvements and the additional resources now available, all application components that still use traditional methods on the mainframe are being evaluated for migration. Prototypes have identified possible improvements in search results as well as shorter search times in the following areas:
- Searches for legal entities/enterprises.
- Combined searches for links between individual persons and enterprises.
- Validation of entered address data.
For address validation in particular, the migration cuts response times to one tenth while also improving automatic address correction. This directly benefits the quality and speed of customer service.
Technical details
Implementing these application components required no extraordinary effort. Many components developed during the pilot projects could be reused after being extended and adapted to their new functions. Update procedures ensure that the data held in main memory is kept up to date. A redundant database now supports the complex process of validating and correcting address data.

Another crucial point was designing the system and applications for high availability: regardless of load, the applications had to remain continuously accessible and fast. Based on these requirements, the system was designed as a load-balancing cluster. Application-level redundancy was complemented by backup solutions for all network components, including all cables, routers, and switches. The entire cluster infrastructure runs on two physically and logically independent networks, which allow transparent failover if components fail. Since commencing operations, the system has achieved an availability of 99.9995%.
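To put the reported availability figure in perspective, it can be converted into expected downtime per year. Assuming a 365-day year (T = 525,600 minutes) and treating A = 99.9995% as a steady-state average, the unavailable fraction is 1 − A:

$$
D = (1 - A)\,T = (1 - 0.999995) \times 525\,600\ \text{min} = 5 \times 10^{-6} \times 525\,600\ \text{min} \approx 2.63\ \text{min}
$$

That is roughly 2 minutes 38 seconds of downtime per year. Since the figure is reported for the period since go-live, this is an extrapolation rather than a measured annual result.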
Project data
Status: in operation; enhancements being implemented
Duration: about 1 year
Team size: 3 consultants, 2 customer employees
In operation since: 04/2002
Information technologies
- IBM Mainframe
- Linux
- GNU C Compiler (GCC)
- ANSI C, Perl
- Multithreading, Multiprocessing
For further information or a non-binding consultation, please contact Golo Lentz (+49 40 533 020).
