THE PROBLEMS
MONTGOMERY, Ala. - The military must jettison its long-standing concepts of information ownership and integrate its information technology systems so that its forces can fight jointly, according to two high-ranking generals.
"Our legacy systems just don't talk to each other," said Gen. Lance Smith, commander of the U.S. Joint Forces Command. "And the reason for that is somebody thought that there was data that was unique to them. We have to build a culture that is gathering that kind of information and making it available to commanders in the field."
Smith called for the military to agree on an enforceable, integrated data strategy. The military services have set up data fiefdoms to protect their information, Smith said. "We don't have good standards out there," he added. "We have to agree on a common data model, which is our responsibility."
(Reported in Federal Computer Week - 21 Aug 06)
THE BACKGROUND
Openkast has examined legacy information integration projects and come to the following conclusions:
- We can find no evidence of any major legacy integration project in defense, intelligence, law enforcement or any other area of government that has delivered its initial purpose on time and on budget. Indeed, costs invariably run significantly higher than estimated.
- We recognise that replacing all legacy systems with new solutions conforming to one data standard is not an option, whether in terms of development cost, the re-training effort required or sheer common sense.
- The use of the relational database as the standard vehicle to deliver such a solution fails because of the nature of the science behind relational technology: it demands pre-defined rigidity and planning for every access path in advance. The real world is not like that! (See the sketch after this list.)
- The cost of manpower working on legacy integration projects far exceeds any other financial aspect; it is the biggest risk, and there are no past, cost-justified successes to point to.
- Many existing legacy suppliers have no incentive to solve these problems since, in doing so, they would lose the service, licence and support revenues associated with protecting their clients' heritage systems.
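To make the rigidity point concrete, here is a minimal, hypothetical sketch using Python's standard sqlite3 module (the table and fields are invented for illustration, not taken from any real system): a relational schema fixes every column in advance, so a record carrying an unplanned field cannot even be stored without a schema change.

```python
import sqlite3

# A relational schema fixes every column (and hence every access path) up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suspect (name TEXT, dob TEXT)")

# A record that matches the planned schema stores cleanly.
conn.execute("INSERT INTO suspect VALUES (?, ?)", ("J. Smith", "1970-01-01"))

# A real-world record arrives with a field nobody planned for.
record = {"name": "A. Jones", "dob": "1965-05-05", "alias": "AJ"}
try:
    conn.execute(
        "INSERT INTO suspect (name, dob, alias) VALUES (?, ?, ?)",
        (record["name"], record["dob"], record["alias"]),
    )
except sqlite3.OperationalError as err:
    # Rejected: the column does not exist. The schema must be re-planned
    # before the data can even be stored.
    print("Rejected:", err)
```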
We identified XML as a standard for exchanging disparate information types. However, using RDBMS technology to host complex XML data structures is highly inefficient.
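The exchange side of that claim can be illustrated with a short sketch (the two records below are invented examples, not Openkast data): because XML mark-up is self-describing, records from systems with entirely different schemas can be received and traversed by the same code, with no shared table layout agreed in advance.

```python
import xml.etree.ElementTree as ET

# Two records about the same subject from different legacy systems,
# each with its own schema.
police = "<record><suspect><name>J. Smith</name><alias>JS</alias></suspect></record>"
customs = "<entry><person name='J. Smith'/><port>Dover</port></entry>"

for doc in (police, customs):
    root = ET.fromstring(doc)
    # Walk whatever structure arrives; the mark-up describes itself.
    for elem in root.iter():
        print(elem.tag, elem.attrib, (elem.text or "").strip())
```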
IBM has recently announced a development effort to add native XML database capabilities to its DB2 technology but recognises the "accepted" problems with XML data management:
- Data bloat (huge extra disk resources plus the concomitant management resource required)

Measurements have been performed on sample Openkast test data sets. The test data has been generated using a test tool that creates a stream of transaction records that represent concurrent user activity.
The total size of all the transaction records was 310Mbytes of XML text (mark-up plus content), across 500,000 records. Once stored and encoded in Openkast, the size falls to 182Mbytes, a reduction to roughly 59% of the original XML data size.
- Slow update performance of the database

The acquisition and storage rate was measured at 14 minutes for 100,000 transaction records, or 119 records per second. The host system used for the measurements was equipped with a 1.6GHz Pentium CPU and a 5400rpm disk drive. (Both headline figures are checked in the short sketch after these measurements.)
Retrieval performance depends on the query and the content. The tracker allows free-text queries to complete very quickly.
Best-case searches are those that can be completed by processing the query against the tracker alone, with no further processing required. Typically the tracker will be cached in memory after several searches, and a search will then produce a result within milliseconds on a repository of several million records.
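The headline figures above can be sanity-checked with simple arithmetic; this sketch just reproduces them from the raw numbers quoted in the measurements.

```python
# Figures quoted in the measurements above.
xml_size_mb = 310      # raw XML text, mark-up plus content
stored_size_mb = 182   # the same data once stored and encoded in Openkast
records = 500_000      # total transaction records in the test set

print(f"Stored size: {stored_size_mb / xml_size_mb:.0%} of the original XML")  # ~59%

# Update throughput: 100,000 transaction records acquired in 14 minutes.
print(f"Throughput: {100_000 / (14 * 60):.0f} records per second")  # ~119
```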
THE SOLUTION
The XML silver bullet has its problems, and we have addressed them with a technology called Openkast.
We have understood from the beginning that information integration is all about collections of databases with many schemas. Openkast therefore has only one index mode: total indexing, meaning full text plus a full-context, full-path index.
This is achieved without the disk-usage explosion and painful transaction throughput normally associated with full XML indexation.
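To show what "full text plus full path" indexing looks like in practice, here is a hypothetical sketch (our own illustration of the general technique, not Openkast's internal structure): every term is posted under the complete element path at which it occurs, so the same in-memory structure answers both free-text queries and context-qualified queries of the kind the tracker serves.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def index_record(doc_id, xml_text, index):
    """Post every term under its full element path: (path, term) -> doc ids."""
    def walk(elem, path):
        path = f"{path}/{elem.tag}"
        for term in (elem.text or "").split():
            index[(path, term.lower())].add(doc_id)
        for child in elem:
            walk(child, path)
    walk(ET.fromstring(xml_text), "")

index = defaultdict(set)
index_record(1, "<record><suspect><name>Smith</name></suspect></record>", index)
index_record(2, "<record><officer><name>Smith</name></officer></record>", index)

# Free-text query: "smith" anywhere -> both records.
print({d for (path, term), ids in index.items() if term == "smith" for d in ids})

# Context-qualified query: "smith" only as a suspect's name -> record 1 only.
print(index[("/record/suspect/name", "smith")])
```

Because the whole structure is an ordinary in-memory mapping, repeated queries hit already-resident entries, which is consistent with the millisecond response times described above once the tracker is cached.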