Digital Equipment Corporation
MULTIVENDOR CUSTOMER SERVICES
Integrated Document Imaging and Retrieval
Ontario County, New York
June 27, 1998
Table of Contents
Document Imaging Selection Criteria and Assignment
Solution Implementation Examples
Recap of Project Critical Path
Established in 1789, Ontario County encompassed all of western New York State from the Pre-emption Line to Lake Erie and from Lake Ontario to the Pennsylvania border. That area is now fourteen separate counties, and its early history is retained in the records of the Parent County.
The 209 years of activity and record keeping, for what has become a population base of approximately 93,000 persons and 1,200 county employees, have resulted in a crush of paper documentation that must be stored, indexed and retrieved. Queries come into the county offices from county employees, historians, county residents and anyone who wishes to exercise their legal rights under the Freedom of Information Laws.
State and Federal Statutes regulate the kinds of records that must be kept and for how long. The State Education Department, State Archives and Records Administration publish these requirements for counties in the "Records Retention and Disposition Schedule CO-2".
The Ontario County Records, Archives and Information Services (RAIMS) department is the agency primarily responsible for storing, maintaining and creating indexed access to the documents that must be kept by the various county agencies. RAIMS not only stores documents that must be kept in their paper form, but also performs microfilming of documents to help speed retrieval, preserve document integrity and provide storage space compression for documents that do not need to be kept in their paper form indefinitely.
Unfortunately, RAIMS does not have access to unlimited, secure and climate-controlled space in which to store documents continuously. Building space on an as-needed basis would be fiscally challenging and perhaps irresponsible. At current rates of document retention and storage, the RAIMS facility will fill before the end of calendar year 1999.
Over the past few years, RAIMS has also noted a dramatic increase in requests for record retrievals. Retrievals increased from 32,027 to 38,861 in 1997, an increase of 6,834 retrievals. Assuming an average of 7 minutes per retrieval, this added a workload of 797 hours, or 106 person days, in just one year. Extrapolation of this data indicates that the retrieval workload would exceed 300 person days by calendar year end 2000. The following table reflects the activity from 1991 to 1997:
RAIMS activity and workload
Source: 1997 Ontario County RAIMS Annual Report
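The workload figures quoted before the table can be checked with a few lines of arithmetic. The sketch below is only a worked example. The 7.5-hour working day is an assumption inferred from the quoted figures (797 hours corresponding to 106 person days); it is not a number stated in the RAIMS report.

```python
# Worked check of the retrieval workload figures quoted in the text.
PRIOR_RETRIEVALS = 32_027     # earlier annual total quoted in the text
RETRIEVALS_1997 = 38_861      # 1997 total quoted in the text
MINUTES_PER_RETRIEVAL = 7     # average handling time assumed in the text
HOURS_PER_PERSON_DAY = 7.5    # assumption: inferred, not stated in the source

additional = RETRIEVALS_1997 - PRIOR_RETRIEVALS
added_hours = additional * MINUTES_PER_RETRIEVAL / 60
added_person_days = added_hours / HOURS_PER_PERSON_DAY

print(additional)                # 6834
print(int(added_hours))          # 797
print(int(added_person_days))    # 106
```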
As a result of the regulatory burdens placed on RAIMS, the increasing population of Ontario County, the rapid depletion of storage space and the ever increasing demands placed on RAIMS staff for record management and retrieval, it is apparent that a change is needed. An electronic method of record retention and retrieval is needed to ease the burden of management, storage and retention as well as provide a faster, more efficient method of allowing persons to obtain the documentation that is of interest to them.
It is the purpose of this study to examine the state of technology that is available, to map the possibilities onto the needs of the county and to provide Ontario County with a concise set of recommendations for moving forward in solving its most pressing problems.
Defining the measures of a successful project is critical to the ongoing health and acceptance of any system installed into an enterprise. Generally, the success drivers are encapsulated into Business Rules and Technical Requirements.
Business rules define the overarching day to day processes that need to be supported or streamlined. These rules describe the bounds in which a solution must be developed for success.
Technical requirements define the minimum technical capacities and the operational and interoperability specifications.
The following describes in detail the broad scope of the requirements that a system must meet for Ontario County to consider it viable, and provides drivers toward a successful implementation.
Paramount is the requirement that any imaging or document management system put into place must not require any of the County offices to undertake a major re-engineering of its business. The goal of document imaging is to compress the amount of space required to store data and to speed document retrieval, not to reinvent county processes.
A solution must be flexible and scalable in both directions. Since many county departments do not need a full-scale imaging and document management system, it is critical that component pieces of an overall solution be available to the different agencies on an as-needed basis. Larger departments that need more robust solutions should also have at their disposal a more complete solution.
Each department must also be assured of the integrity and security of the data that it is storing online. This entails having a system that allows for data partitioning, security and audit. Not only should non-authorized users be unable to view data or images, but they should also be unaware of the existence of such data or even of the system itself.
Images must also be made available in a variety of ways to meet the needs of any particular department or user. Retrieval methods can range from high speed LAN access or a dialup connection to a RAS server or Intranet site, to CDROM publishing or local fax-back and print services.
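The partitioning, security and audit requirement can be illustrated with a small sketch. Everything in it is hypothetical (the `PartitionedStore` class, the department names and the document identifiers); it is not the interface of any product considered in this study. The point it shows is that a request for a document outside the user's partition receives exactly the same answer as a request for a document that does not exist, while every request is still written to an audit log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    department: str   # partition that owns the document
    title: str


class PartitionedStore:
    """Hypothetical store: users see only their own departments' documents."""

    def __init__(self):
        self._docs = {}
        self.audit_log = []   # (user, action, target) tuples

    def add(self, doc):
        self._docs[doc.doc_id] = doc

    def get(self, user, user_departments, doc_id):
        doc = self._docs.get(doc_id)
        # An unauthorized request gets the same answer as a request for a
        # document that does not exist, so existence is not disclosed.
        allowed = doc is not None and doc.department in user_departments
        self.audit_log.append((user, "get" if allowed else "denied", doc_id))
        return doc if allowed else None

    def search(self, user, user_departments, text):
        hits = [d for d in self._docs.values()
                if d.department in user_departments
                and text.lower() in d.title.lower()]
        self.audit_log.append((user, "search", text))
        return hits


store = PartitionedStore()
store.add(Document("T-1", "Treasury", "Journal voucher"))
store.add(Document("D-1", "DSS", "Case intake form"))

other_dept_view = store.get("jdoe", {"Treasury"}, "D-1")   # None: not disclosed
missing_view = store.get("jdoe", {"Treasury"}, "X-9")      # None: does not exist
```

In a real deployment the department list would come from the operating system or directory log-on rather than being passed in by the caller.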
Finally, it is required that end users do not have to go through extensive training to become adequately proficient using the imaging client software.
Document Imaging Selection Criteria and Assignment
With the installation of a new system, questions arise as to which documents need to be stored online and which can or should remain as paper.
A clear case of documents that should most likely remain on paper is existing records of no historical importance. Imaging activity should proceed from a point forward in time, with critical older documents of historical importance collected as time allows.
The newly generated documents can then be imaged into the system, including the existing working open files in the various departments. Some documents may not need to be saved in either form, and it is up to the individual worker to purge a file before submission to an imaging work area. Additional legal requirements may dictate that certain documents always be available in paper form. Even in this event, long term optical storage is an advantage. Optical storage is defined as storage media such as CDROM or DVD and, in certain cases, large format Optical Platter technology.
Imaging these documents can and should be done economically by in-house staff.
Solution candidates should have a technology architecture that is "open" in nature. That is to say that data storage methodologies, hardware components and software must be readily accessible and "off the shelf." It is also critical that it is a fully integrated solution, with all the component pieces working together out of the box requiring no coding to integrate.
The preferred operating platform is Windows NT Server. The ISV direction must include a strong commitment towards Windows NT.
Database components must be mainstream databases such as Oracle, Microsoft SQL Server, Sybase or Informix. Proprietary databases that are built to service the needs of the system often provide better performance, but the all-purpose databases listed above provide very acceptable service and performance. They also have architectural openness that is well understood by the technical communities. These databases also have well built and widely tested APIs that the proprietary databases cannot claim to have.
It is also highly desired that the solution software have a published API so that County IT staff can add imaging functionality to their own applications. This API must be accessible from Microsoft Visual Basic to be useful.
It will be necessary to provide image data in a variety of ways. The system should be able to publish data via CDs, as well as across WAN, dialup connections and browsers. End users also need to have the choice of how to retrieve the document: via a local printer or fax back.
Image capture must also include the ability to dump to film and track which film batch and roll each image is copied to. This integration of film indexing along with the image indexing for live retrieval is important so as not to have to manually recreate data that can be captured and recorded in an automated fashion.
Image capture stations must not be limited to a few proprietary devices. A wide variety of devices must be supported, ranging from high-speed stations to manual feed bed scanners.
Data storage should be hierarchical in nature and be able to manage and point users to the locations of offsite CDROMs holding that data as well as allow over the line data access. The system should be able to track published data to the field and be able to purge data from the system periodically.
Some departments will not need Optical storage provided to them outside of a CDROM being published to them periodically. The indexing system should be able to point the user to a CD on their premises and allow them to look up that data locally, view the image and print or fax it as well. Published CDs should also be able to be used in a fashion that is completely independent from the online imaging system.
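A hierarchical storage catalog of the kind described can be sketched as follows. This is a minimal illustration, not a vendor design: the tier names, the `catalog` layout and the location strings are all assumptions made for the example. The catalog records every copy of an image, `locate` returns the fastest copy available, and `purge_online` removes the online copies while keeping the pointer to the published CD.

```python
from enum import Enum


class Tier(Enum):
    MAGNETIC = 1      # online disk cache, fastest
    JUKEBOX = 2       # optical platter or CD in a jukebox, near-line
    OFFSITE_CD = 3    # published CD held at a department, offline


# Hypothetical catalog: image id -> list of (tier, location) copies.
catalog = {
    "IMG-0001": [(Tier.OFFSITE_CD, "CD 1998-03, Treasury office"),
                 (Tier.JUKEBOX, "jukebox A, slot 12")],
    "IMG-0002": [(Tier.OFFSITE_CD, "CD 1997-11, County Attorney")],
}


def locate(image_id):
    """Return the fastest available copy, or None if the image is unknown."""
    copies = catalog.get(image_id)
    if not copies:
        return None
    return min(copies, key=lambda copy: copy[0].value)


def purge_online(image_id):
    """Drop online and near-line copies, keeping the pointer to offsite media."""
    copies = catalog.get(image_id, [])
    catalog[image_id] = [c for c in copies if c[0] is Tier.OFFSITE_CD]
```

After `purge_online("IMG-0001")`, a lookup for that image would direct the user to the CD held at the Treasury office instead of the jukebox.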
The system should also be able to publish data to a local hard drive for easy transport and research purposes. This activity should be auditable.
A flexible solution will meet the following criteria. It will support the basic requirements of an entry-level system and allow the integration of other types of documents such as Word, Excel, WordPerfect and mainframe COLD data. It will be developed under a client/server architecture to offer scalability. Lastly, it will operate like a mainframe solution by handling large volumes and supporting traditional terminal based retrieval clients.
Security is also a primary concern. The system should have a robust security system built into it or be able to leverage the NT security model to provide log on oriented security grants and revokes over data. It should also be able to work behind a firewall and integrate into the county's security infrastructure without modification and without additional security shims being put into place.
During the course of the investigation of Ontario County's processes and imaging needs, five different departments were interviewed: RAIMS, Treasury, Department of Social Services, County Attorney and the County Clerk. The County Clerk's Office has stated publicly that it wishes to proceed on its own, separate from the efforts of the County as a whole.
Records and Archives is primarily concerned with the storage and retrieval of documents for individual departments and outside researchers. Although RAIMS does not own all of the documents, it is their steward and is expected to store and index them carefully in a way that facilitates county business.
RAIMS is running out of space and retrieval rates are putting an ever-increasing strain on staff capabilities. RAIMS needs an Imaging system that meets several basic needs.
First, a candidate solution must be able to cleanly integrate with the filming operations. In fact a single scan which sends output to film as well as optical storage is highly desired. Scanning work must be able to proceed quickly, with much of the resulting paperwork scanned being shredded quickly.
Second, end users should be able to execute their own requests for document retrieval from a remote location. This would free up RAIMS staff to keep up with the flow of paper into the system as well as convert important historical records to the optical system.
Third, the system should not be difficult to learn. Basic functions and high-speed productivity need to be easily achievable.
The Treasury is responsible for all money flow in and out of the county. It handles all AR, AP and revenue collection as well as internal business and payrolls. The Treasurer's Office handles 28 different county funds. Many of the transaction records are kept 10 years by rule. The data that the Treasury keeps is open to the public via the Freedom of Information Act, but there are proper channels that individuals must take to obtain that data.
Often, requests are made for source documents that make up a journal transaction. Currently, those source documents are indexed and sent to RAIMS for filming. A report of roll and frame for the documents is sent back to the Treasury and this data is then added to the index. When a request for a document is made, the journal entry is referenced to find the indexed location of that document. When found, the request is made to RAIMS to send over a fiche or a printed copy of the film frame needed.
The Treasury would like to simplify the record lookup and retrieval, as well as speed the process.
Department of Social Services
The County Department of Social Services (DSS) is by far the largest paper producer in the county. DSS handles a very wide variety of matters ranging from Protective Services to Food Stamps, Child Support and Medicare.
The basic unit of work handled by caseworkers is the Case. A case is associated with a person or perhaps a family. Each case may have several types of documentation, and each person may belong to several cases. For example, a person could be receiving Food Stamps and have another case in the Child Support unit.
DSS involvement in this study was initially to compress the amount of paper at the close of a case through an electronic archival process. It became readily apparent that DSS caseworkers could benefit dramatically from placing documentation online much sooner. This would allow very fast document retrieval and searching capabilities. It is also desired that a system be put into place that allows caseworkers to see what other services the recipient is receiving from the county.
The County Attorney's Office generates a great deal of documentation. It would like to be able to create a knowledge base of past matters that is searchable and readily obtainable.
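The relationship between persons and cases described for DSS is many-to-many: a case may cover several people, and a person may appear in several cases. A minimal sketch of an index that supports the cross-program view is shown below. The `CaseRegistry` class, the case numbers and the names are hypothetical and serve only to illustrate the data structure.

```python
from collections import defaultdict


class CaseRegistry:
    """Hypothetical index linking people to cases in both directions."""

    def __init__(self):
        self._people_by_case = defaultdict(set)
        self._cases_by_person = defaultdict(set)
        self._program = {}   # case id -> program name

    def open_case(self, case_id, program, people):
        self._program[case_id] = program
        for person in people:
            self._people_by_case[case_id].add(person)
            self._cases_by_person[person].add(case_id)

    def programs_for(self, person):
        """Programs in which a person has a case, sorted for stable output."""
        cases = self._cases_by_person.get(person, ())
        return sorted(self._program[c] for c in cases)


registry = CaseRegistry()
registry.open_case("FS-100", "Food Stamps", ["A. Smith", "B. Smith"])
registry.open_case("CS-200", "Child Support", ["A. Smith"])
print(registry.programs_for("A. Smith"))   # ['Child Support', 'Food Stamps']
```

Access to such a cross-program view would itself be subject to the security and audit requirements stated earlier.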
The basic unit of work at the office is the Matter. Each matter consists of many related documents as well as laws. Many matters are relatively straightforward, similar in scope and nature to what a private firm may handle, while others are unique to municipal settings.
Some matters, such as labor matters, may be related to a contract agreement. However, these contracts are not always static in Ontario County, but are "Evergreen", subject to revision and change over the term of the contract at the mutual agreement of both parties. As a result, one matter may relate to the contract as it was at one point in time, while another is affected by a change. It will be necessary to relate matters to the proper revision of an agreement rather than to the evergreen document itself.
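Relating a matter to the proper revision of an evergreen agreement amounts to finding the revision that was in force on the date relevant to the matter. The sketch below assumes each revision carries an effective date. The `Revision` record, the contract name and the dates are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Revision:
    contract: str
    number: int
    effective: date


# Hypothetical revision history of one evergreen agreement.
revisions = [
    Revision("Labor agreement", 1, date(1995, 1, 1)),
    Revision("Labor agreement", 2, date(1996, 7, 1)),
    Revision("Labor agreement", 3, date(1997, 10, 15)),
]


def revision_in_force(contract, on):
    """Return the latest revision of `contract` effective on or before `on`."""
    candidates = [r for r in revisions
                  if r.contract == contract and r.effective <= on]
    return max(candidates, key=lambda r: r.effective, default=None)


# A matter arising on 1 March 1997 is linked to revision 2, not to the
# current text of the agreement.
matter_revision = revision_in_force("Labor agreement", date(1997, 3, 1))
```

Storing the revision number with the matter, rather than a pointer to the agreement as a whole, keeps the link stable when the agreement is later amended.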
The office would also like to be able to download a number of matters to the local hard drive for work in court or off site with all search and retrieval functionality intact.
The working system will be able to image and capture a broad variety of data and allow for much faster and easier retrieval of documents. It should also be able to relate one document to another if they are in fact related. This system should provide the county with measurable savings from the first year of the installation.
Flexibility of data retrieval is key. Each department will be able to design and implement its own indexing system that meets its own unique needs.
Savings measures include, but are not limited to: Storage space reduction, quicker image capture via electronic imaging, speedy data retrieval, physical space not consumed due to archival of paper that should be destroyed and more efficient use of RAIMS staff time and energy.
Document imaging refers to the capability to take a paper-based document and transform it into a digital image for storage and retrieval. Since its inception, document imaging has evolved from a simple storage and retrieval application into electronic management of images and associated text files--document image management. The technology has become an integral part of the way organizations do business. The systems themselves have evolved from standalone, proprietary systems to image management software that runs on industry-standard platforms, typically using a client/server architecture with industry-standard personal computers (PCs) connected via a local area network (LAN) as the image workstations.
Technological developments in this market include blending with document management and COLD (Computer Output to Laser Disc) technology, Workflow and Work Management, Web-enabled solutions, and interoperability standards; support for Windows NT as a server platform; front-end image capture; forms processing; and the continued movement away from a focus on paper to electronic source documents. Images are becoming just another data type to manage within an overall document management solution. Document management software organizes electronic documents, managing content, enabling secure access, routing document-based tasks and facilitating document distribution. Document management products provide functionality for storing, locating, and retrieving information throughout the document's life cycle, i.e., from the time it is created to the time it is archived to offline storage media.
COLD is an integrated hardware and software solution that takes computer generated pages containing transaction histories, customer statements and invoices and processes, indexes, compresses and stores them on inexpensive media such as optical disk or CDROM.
In the storage arena, trends include CD jukeboxes (which record as well as read CDs) and software for recording CD-ROMs on networks. There was also significant activity in the 5.25" and large-format optical areas, as well as some intriguing emerging optical storage technologies.
An image-enabled system looks like any other computer-based document management system, in that it provides for some type of file searching capability (keyword, text retrieval, etc.), as well as management of the storage and access to documents. Many imaging systems also provide basic workflow management capabilities, such as document routing, distribution, and audit trail and status reporting. But since imaging systems are designed to handle the input and output of images--in addition to text and data--they must also include devices and software to facilitate the following:
The most common input device for document images is the scanner. Scanners translate the image on a sheet of paper or other surface into a digitized image that can be read by the computer. Usually, images are compressed and stored in a standard image file format, such as Tagged Image File Format (TIFF) with ITU-T (formerly CCITT) Group 3 or Group 4 compression, before they are entered into the system.
In the client/server approach, scanners are often attached to a dedicated scanning workstation or server. In some cases, scanning workstations operate offline; the images may be cached on magnetic disk and then later transmitted to online optical disk storage (optical disks are specialized storage devices for images), or the images are scanned to offline optical disks. Later, the optical disks are brought online.
Today, users have three choices of how they want their documents entered into the system: as a compressed digitized image, as both digitized image and text, or as both a digitized image and an image stored on microfilm. Scanners capable of both image and text scanning are equipped with OCR software, or the OCR capability is embedded in the image input software.
Though CD-ROM's 650MB holds vast amounts of text, greater capacities are needed for tomorrow's video, still images, and sound. For example, today's CD-ROMs hold only 74 minutes of video, which is not enough for a full-length film. A higher-density disk is needed to deliver high-quality video on television and sophisticated multimedia on computers. Though the driving force behind DVD is the entertainment industry--Time Warner and a handful of other studios were deeply involved in the development of the format--it also has the blessing of the major computer and operating system makers, including IBM and Microsoft, and such CD-ROM makers as Sony, Philips, Toshiba, Matsushita, and others. Current capacities for DVD writeable disks are about 7.4 GB per disk.
CD-ROM adheres to more standards than any other optical disk type. Sony and Philips, the co-developers of CD-ROM, specified strict mastering and production guidelines in a document called the Yellow Book. In 1987, the key vendors in the CD-ROM market announced standards for the arrangement of information on the disk. The High Sierra format (named for the hotel in Lake Tahoe, Nevada, where the vendors met) allows different retrieval programs to read data from the disk in the same way, regardless of the host computer or operating system. The ISO codified most of the High Sierra format in its ISO 9660 standard, which now guides CD-ROM formats.
Several derivatives of CD-ROM permit the distribution of multimedia information. Although not widely used, CD-I, or Compact Disk-Interactive, a full-fledged CD standard for data, audio, and video, is employed for interactive kiosks and training stations. It is codified in the Green Book. CD-XA, or Compact Disk-Read-Only Memory, Extended Architecture, is a variation on CD-ROM that adds color images and several levels of digital sound quality to CD-ROM and is used for multimedia. DVD-ROM was designed as the successor to CD-ROM and was standardized in 1996 by merging two rival schemes.
CD-R. The standards for CD-R, or Compact Disk Recordable, specify only disks that are created all at one time. CD-R is based upon the Orange Book standards. The first-generation recorders prior to the Orange Book allowed only "write at once," where the entire disk had to be written in one single stream or session. The disks were written strictly in realtime, which meant that a sustained data rate of at least 150 kilobytes per second (150 KB/s) had to be maintained until all data had been written to all tracks on the disk. Any interruption would render the disk unreadable.
Orange Book devices support a number of incremental write modes, allowing a disk to be written in multiple sessions. After the last writing session, the disk contents are finalized and then frozen by writing out the Table of Contents. Only then can an ordinary CD-ROM reader read the disk. Multisession disks are covered in ECMA-168/ISO 13490.
CD-R Writing and UDF. Now that the vast majority of the CD-ROM drives in the field can read multisession CDs, vendors have shifted the focus of their concerns to adapting CD-R for convenient use as a storage device. The recent success of high-capacity removable disks, such as Iomega's Jaz drive, shows there is real demand for a device that can store, back up, and interchange large capacities of data. The key is to make it easier to store and retrieve data. Early CD recorders were designed to record large blocks of data at one time, not for the continual reading and writing associated with realtime data storage. The solution is to implement a technology called variable packet writing, which updates the disk in smaller increments and does not involve opening a new session with each update. This means going outside the limits of the ISO 9660 file system, so there must be some alternative logical file support. Though Sony and some other vendors experimented with proprietary file systems in 1996, a unified logical file format called UDF now dominates. UDF, which stands for Universal Disk Format, was developed and formalized by the leading optical storage vendors under the auspices of the Optical Storage Technology Association (OSTA). It is intended for any writeable disk medium, including DVD, and ensures that the resulting file structures can be used on different media and computers.
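The practical weight of the realtime constraint is easier to see with a worked figure. The sketch below estimates how long the 150 kilobytes per second rate must be sustained to fill a disk, using the 650MB capacity cited earlier for CD-ROM. Treating a megabyte as 1,024 kilobytes is an assumption of the example.

```python
DISK_CAPACITY_MB = 650        # CD capacity quoted in the text
DATA_RATE_KB_PER_S = 150      # single-speed sustained rate quoted in the text
KB_PER_MB = 1024              # assumption: binary megabytes

seconds = DISK_CAPACITY_MB * KB_PER_MB / DATA_RATE_KB_PER_S
minutes = seconds / 60
print(round(minutes))         # 74
```

A full single-speed disk therefore requires roughly 74 minutes of uninterrupted writing, which is why any pause in the data stream was so costly on early recorders.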
After the image is scanned and verified, it must be classified or indexed. Image indexing involves the entering of character data that describes or tags an image for subsequent retrieval. This data can range from serial numbering to a lengthy, structured description. Traditionally, this data is entered manually from the keyboard. However, with the development of more sophisticated scanners, intelligent controllers, and OCR technology, manual indexing can be substantially reduced through intelligent character recognition of specific zoned areas or bar codes scanned from the document. Indexing data is stored in a database specifically designed to hold image data. What data can be used to retrieve an image is a function of the image index database. Some allow the image transaction history to be recorded. Some index databases link the image files to existing databases from other applications by pointers or common fields.
Imaging systems may not need to provide index management services if an existing database can search for documents by attributes such as an applicant's name or social security number. For example, a social security number can be assigned as a document locator number during scanning. Then, the existing database can be used, say, to search for all documents that pertain to a particular region by the social security numbers of all applicants that live within a certain state or zip code area.
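The locator-number approach can be sketched with an in-memory SQLite database standing in for the existing application database. The table names, column names and sample rows are hypothetical. The imaging side stores only the locator (here the social security number) and the image path, and all attribute searching is delegated to the existing applicant table through a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Existing application table (hypothetical layout).
    CREATE TABLE applicants (ssn TEXT PRIMARY KEY, name TEXT, state TEXT);
    -- Minimal imaging table: the SSN entered at scan time is the locator.
    CREATE TABLE images (image_id INTEGER PRIMARY KEY, ssn TEXT, path TEXT);
""")
conn.executemany("INSERT INTO applicants VALUES (?, ?, ?)", [
    ("000-00-0001", "Applicant One", "NY"),
    ("000-00-0002", "Applicant Two", "PA"),
])
conn.executemany("INSERT INTO images (ssn, path) VALUES (?, ?)", [
    ("000-00-0001", "vol1/0001.tif"),
    ("000-00-0001", "vol1/0002.tif"),
    ("000-00-0002", "vol1/0003.tif"),
])


def images_for_state(state):
    """Find image paths for every applicant living in the given state."""
    rows = conn.execute(
        "SELECT i.path FROM images AS i "
        "JOIN applicants AS a ON a.ssn = i.ssn "
        "WHERE a.state = ? ORDER BY i.path",
        (state,),
    )
    return [path for (path,) in rows]


print(images_for_state("NY"))   # ['vol1/0001.tif', 'vol1/0002.tif']
```

The same pattern would apply to any of the mainstream databases named in the requirements; SQLite is used here only because it needs no server.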
Two factors influence the choice of storage media for imaging systems: how long the documents need to be stored and how often the images are retrieved. For documents that must be stored over long periods of time, microfilm or optical storage is preferred over magnetic storage. Magnetic storage is useful in applications that require a high retrieval rate but relatively low volume. Optical storage is recommended for high-volume, high-retrieval rate environments. The choice of rewritable versus write-once (WORM) optical disks depends on the application's requirements for ensuring the preservation of the original document over time.
Physically similar to compact disks, optical disk platters come in various sizes, the most popular being 5.25" or 12". The optical disk drive uses a high-power laser beam to etch data into the disk; a low-power beam is used to read the data. A 12" disk can usually hold about 2GB of data or about 50,000 images of standard business letters, approximating 25 filing cabinets. Jukeboxes, which automatically retrieve disks from racks and load them onto drives, can hold hundreds of optical disks, which is equivalent to ten million standard business documents.
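The capacity figures in this paragraph imply some useful planning ratios, worked out below. The sketch takes "about 2GB" as two decimal gigabytes and treats each business document as a single page; both are assumptions of the example.

```python
DISK_BYTES = 2 * 10**9            # "about 2GB", taken as decimal gigabytes
IMAGES_PER_DISK = 50_000          # images per 12-inch disk, from the text
CABINETS_PER_DISK = 25            # filing cabinets per disk, from the text
JUKEBOX_DOCUMENTS = 10_000_000    # jukebox capacity in documents, from the text

bytes_per_image = DISK_BYTES / IMAGES_PER_DISK               # 40,000 bytes each
images_per_cabinet = IMAGES_PER_DISK // CABINETS_PER_DISK    # 2,000 per cabinet
disks_in_jukebox = JUKEBOX_DOCUMENTS // IMAGES_PER_DISK      # 200 disks

print(bytes_per_image, images_per_cabinet, disks_in_jukebox)
```

On these assumptions a compressed page image averages about 40 kilobytes, a filing cabinet holds about 2,000 pages, and the ten million document figure corresponds to a jukebox of about 200 disks.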
When considering the use of optical media for long-term storage, it must be remembered that optical disk storage has not been proven equal to archived microfilm for longevity. This shortcoming of optical disk storage can be overcome through periodic backups or a record management procedure that coordinates optical disk storage with paper and film. In the latter case, the image index database is used to keep track of all media.
To increase access speeds, the image index database is often stored on magnetic disk, apart from the images on optical disk. The index database may also be stored on a separate dedicated server or on a host mainframe, apart from the imaging system.
Image retrieval is primarily a character-based function. Users search on keywords to find images and related information. Ideally, system design should consist of extensions or modifications to an existing database, text search, or linking software to enable them to handle images as one more type of data.
The majority of systems on the market today are designed to accommodate document retrieval over local area networks (LANs) or wide area networks (WANs). Once retrieved, the image can be displayed, printed, or faxed and/or routed via e-mail and workflow. The Internet and corporate Intranets are also emerging as a mechanism for distributing images and work items throughout an enterprise and beyond.
Additional capabilities are also becoming popular, such as the ability to 'publish' CDs containing not only the image data, but also a database file containing the indexes pointing to the images and even the executable front-end client programs.
Most displays present images at 70 to 130 dots per inch (dpi) and allow users to zoom in for full detail. Many applications require the display of images, graphics, and character data to be integrated from existing applications. For these applications, standard workstations or PCs are recommended over image-specific hardware.
For casual use, standard PC displays with VGA terminals are adequate. For high-volume, transaction-processing applications, large (19") higher-resolution displays are recommended.
Image data decompression is usually performed at the display workstation to reduce the burden on the network. The specific decompression hardware and software used will often determine how fast an image can be retrieved from the network.
Most commercial laser printers can print images along with character data. In LAN configurations, data decompression is usually performed at the printer to reduce the burden on the network. To accommodate the large number of pixels per page (ten times more than that needed for displaying images), printers are often connected to dedicated PCs acting as print servers that can be shared by all users on the network.
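The "ten times" figure can be approximated from resolution alone, since pixel count grows with the square of the dots-per-inch value. The sketch below assumes a US letter page, a 100 dpi display (inside the 70 to 130 dpi range given earlier) and a 300 dpi laser printer. The page size and the printer resolution are assumptions, not figures from the text.

```python
PAGE_WIDTH_IN, PAGE_HEIGHT_IN = 8.5, 11   # assumption: US letter page
DISPLAY_DPI = 100                         # within the 70 to 130 dpi display range
PRINTER_DPI = 300                         # assumption: typical laser printer


def pixels(dpi):
    """Total pixels needed to render one full page at the given resolution."""
    return round(PAGE_WIDTH_IN * dpi) * round(PAGE_HEIGHT_IN * dpi)


display_pixels = pixels(DISPLAY_DPI)   # 850 x 1100
printer_pixels = pixels(PRINTER_DPI)   # 2550 x 3300
ratio = printer_pixels / display_pixels
print(ratio)                           # 9.0
```

Tripling the resolution multiplies the pixel count by nine, which is consistent with the order of magnitude quoted above.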
A growing application area for image processing to improve the speed of information distribution is fax integration. Facsimile management software controls the sending and receiving of facsimiles to and from conventional fax machines. Some image management software is capable of routing faxes throughout an organization.
Fax integration with document image management systems also provides fax applications with capabilities such as annotation, conversion for editing by OCR devices, automatic filing by keywords, and shared workgroup access. With the addition of voice telecommunications, imaging software can be coupled with a voice-response system for faxing optically stored images (e.g., in reply to common customer inquiries).
Operating Systems Platforms
Document imaging systems are now being made to run in a variety of environments. Windows NT is cited by many as a suitable imaging platform, because it runs on Intel microprocessors as well as Digital Equipment's Alpha series. SunSoft's Solaris 2.0 operating system will run on both SPARC microprocessors and PCs. These operating systems make it easier for imaging vendors and users to image-enable existing networks.
In addition, operating systems are incorporating imaging services that can be accessed by application programs. Imaging capabilities are becoming a core part of the computing and communications infrastructure. Imaging, document management, and workflow capabilities are also becoming more tightly integrated with groupware products such as Lotus Notes, Novell GroupWise and Microsoft Exchange.
Depending on the degree to which the software can be used to implement a customized system, image-enabling application solutions can be categorized as follows:
If an existing application currently uses standard software, users can check to see whether or not image I/O capabilities have been built into the application. If so, the vendor has image-enabled the software so that the package can now handle images as just one more data type. Usually, imaging functionality is integrated through an add-on module to the software that will provide the necessary image utilities, image device support, and interface to image I/O services. Users should be aware that specialized imaging hardware must still be purchased and configured to the network in order to take full advantage of the imaging I/O capability.
A variety of shrink-wrapped imaging systems are currently marketed. All are designed to specifically manage document images. While many are general-purpose systems, some include the capability to tailor the user interface to match an organization's way of doing things. These systems may also be capable of managing other document types along with images, such as word processing documents or data downloaded from mainframe applications programs.
These systems usually come with their own database and image-indexing methods (which may or may not be customizable), as well as image utilities, image device support, and the application interface to image I/O services. Workflow automation (such as document routing) may also be included. Typical examples are IBM's ImagePlus, FileNet's WorkFlo Business System, and Hyland Software's OnBase.
Because image software applications have varying requirements, these shrink-wrapped packages often come with a means to customize the basic image management runtime components of their software. Some image software systems provide their own programming language. This is generally a modified and/or extended version of the language used by the underlying database package. Others include application program interfaces that can be called from specific high-level programming languages (such as C) to obtain imaging services. In general, however, these systems are designed to function as separate imaging application solutions and offer limited integration with existing applications.
Often, users find that investing in a turnkey system is like investing in a dedicated word processing machine; it cannot do more than the one imaging application. This is often true for vendors that require users to purchase dedicated servers from them rather than allowing them to add the software to the server already in house. Even if the system includes tools for integrating imaging into existing applications, it can often be a force fit; operations have to be changed--not for greater productivity, but to adapt to software limitations. If a user finds that turnkey systems force the organization to mold operations to the software rather than the software streamlining existing operations, then "building your own" may be the solution.
Image Management Development Software
Image management development software provides ready-made imaging services that can be installed on existing LANs and made to work with existing software. It also provides a number of applications development tools for allowing new or existing programs to access those imaging services. These tools allow developers to integrate image management functionality into specific business applications programs, thereby allowing them to concentrate on building the business application rather than programming image I/O. Examples include BancTec's Plexus XDP.
Depending on the software's database independence, users can use their own database management tools or those provided by the development software to build the image index database. Other development tools provided by this type of software include the following:
The growing acceptance of Windows as the preferred graphical user interface (GUI) at the image workstation not only allows transfer of image index information between applications, but also allows the use of standard Windows development tools for interfacing user applications to imaging services provided through imaging DLLs.
While "building your own" is more expensive than installing an off-the-shelf solution, it still affords in-house developers the opportunity to focus their efforts on developing a strategic business application for imaging, not imaging I/O. However, there may be cases when organizations find that their applications require more complex image I/O functionality than is provided by the imaging services inherent in the available development software. Some may require specialized drivers for nonstandard imaging peripherals; others may require special image manipulation and viewing capabilities. For these applications, users should ask whether the vendor can provide extensions to its image management development software to meet special needs. If not, the answer may require the development of a totally proprietary system.
Proprietary Image I/O
Developing a proprietary image management system for image integration is the most expensive delivery option. However, rather than work completely from scratch, developers may find that their special needs can be satisfied by building a system based on one or more imaging I/O subsystem software packages. Developers must then build their own central server or management software using in-house database programming tools. While this option often provides the user with better performance and perhaps technically innovative solutions, these benefits must be weighed against the high cost of development. In addition, use of nonstandard peripherals must be weighed against the possibility of vendor support for these peripherals being withdrawn from the market in the short run.
Solution Implementation Examples
Best Case Example
The best possible solution is one where the entire county can benefit from a single, centralized Imaging System. This system should have the capability of handling all the Imaging, Document Management and COLD needs of the county in one completely integrated offering.
The benefits of using a centralized system are many. First, the cost of a single system is far cheaper than a multiple system installation. Having a single system also leverages costly IT resources needed to care for it. A single, centrally located system allows for easier security administration. It also homogenizes access to the data, allowing live access via LAN, Intranet, RAS or fax back service. This system would also allow for data publishing via CD.
The component pieces of the solution are broken down into the hardware and software components.
This server should be a scalable dual processor (Intel) server with expansion capability to four processors. It should be a system capable of high bandwidth data transfers to and from a connected Optical system or RAID 5 disk array. Connection to the network is 100baseT.
The Jukebox is the main long-term storage option for this system. All images are written to the CD's contained within it. It is served off of the Imaging Server and should be able to accommodate at least 300 GB of near-line storage. Latency of data retrieval is longest when a CD changer arm has to remove a CD and load one for retrieval. Using a RAID 5 disk array for live or most active data will keep latency to a minimum.
It is also required that this system be purchased with scaling in mind. It would be wise to overbuy capacity. If a three-year plan suggests a six-spindle jukebox, purchase of a ten-spindle system is recommended. Initial designs of an Imaging system usually underestimate the amount of optical storage needed over the intended term of the purchase by as much as 30% - 50%.
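The overbuy guidance above can be turned into a rough sizing calculation. The sketch below is illustrative only. It assumes that the 30% to 50% figure means the initial plan falls that far short of the eventual need, and the function name `spindles_to_buy` is hypothetical rather than part of any vendor tool.

```python
def spindles_to_buy(planned_spindles, shortfall_pct):
    """Return the spindle count to purchase.

    planned_spindles: count suggested by the initial plan.
    shortfall_pct: assumed whole-number percentage by which the plan
        underestimates the real need (an assumption, e.g. 30 means the
        plan covers only 70% of what will actually be required).
    """
    if not 0 <= shortfall_pct < 100:
        raise ValueError("shortfall_pct must be between 0 and 99")
    covered_pct = 100 - shortfall_pct
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-planned_spindles * 100 // covered_pct)
```

Under this reading, a six-spindle plan becomes 9, 10, or 12 spindles at shortfalls of 30%, 40%, and 50% respectively, which brackets the ten-spindle recommendation.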
Depending on the offerings available at the time of purchase, a high-speed jukebox with a fiber channel connection to the server is desired. Newer technology usually costs more initially, but the long-term savings are often dramatic because older equipment must be upgraded sooner.
The optical Jukebox can be plain vanilla CDROM, Proprietary CDROM format or DVD. Each of these options is valid, with DVD gaining popularity.
Array Image Cache
This is a high capacity, high-speed RAID 5 disk array. The data stored on this array would be destined for transfer to the optical system, and the often-accessed images and data less than a year old could also be kept there.
The size of this array is dependent on the amount of paper generated and captured and at what rate the county accesses that data. It is apparent that many departments do not often access data once it is stored, but others, such as the Department of Social Services and the County Attorney's office do. High repeat requesters will require this high-speed storage while offices like the County Treasurer will only require the Optical options.
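The placement rule described above (recent or frequently requested documents on the RAID array, everything else on optical) can be sketched as a small decision function. This is a minimal illustration, not OnBase behavior. The function name, the 90-day access window, and the threshold of five accesses are all assumptions chosen for the example.

```python
from datetime import date, timedelta

def storage_tier(captured_on, accesses_last_90_days, today,
                 hot_access_threshold=5):
    """Choose a storage tier for one document.

    Returns "raid_cache" for documents less than a year old or requested
    often, and "optical" otherwise. The threshold is an assumed tuning
    value, not a figure from the study.
    """
    is_recent = (today - captured_on) < timedelta(days=365)
    is_hot = accesses_last_90_days >= hot_access_threshold
    return "raid_cache" if (is_recent or is_hot) else "optical"
```

For example, a frequently requested DSS case file from three years ago would stay on the array, while an old Treasurer's record that is rarely requested would reside only on optical media.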
Backup of this array can be handled via a striped DLT array, or the images stored on the array can also be written to the optical system. Live documents that have a revision history can also be stored to the optical system and Document Management Software can handle the check-in, check-out process. Connection to the Image Server is Fiber Channel.
Application Database Server
This server exists to accept and carry out the requests and queries from the imaging client software. Since this platform runs very well as an NT Server, it is recommended that this component run Microsoft SQL Server. It should be configured to handle a fairly busy transaction load. It handles the index pointers to the image database. Connection to the network is 100baseT.
The RAS Server is installed to allow dialup access to the image database. Most likely the need for this service will be fairly light, and therefore this server can be lightly configured. It is important to note that fax back services can also be offered via this server. Off the shelf fax software along with a Brooktrout fax board will provide adequate service. The county as a whole can also use this fax service without impacting the fax back imaging service.
The recognition server is a critical piece of an efficient and robust imaging solution. This server examines predefined zones on the image and does pattern recognition and Optical Character Recognition (OCR). Many departmental forms have zones that contain the information that identifies the document. The recognition server examines those zones for the data, which can be keywords, and creates an index of that document. Poorly scanned documents or documents that cannot be recognized are sent on to the edit stations for hand indexing or rescan.
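The routing decision described above can be sketched as follows. This is a simplified illustration rather than the behavior of any particular recognition product. The input shape (a mapping of zone name to recognized text and a confidence score) and the 0.90 confidence cutoff are assumptions made for the example.

```python
def route_document(zone_results, min_confidence=0.90):
    """Decide whether a scanned form can be indexed automatically.

    zone_results: dict mapping a zone name to (text, confidence), where
        confidence is a score between 0 and 1 (hypothetical input shape).
    Returns ("indexed", index_dict) when every zone was read reliably,
    otherwise ("edit_station", {}) so an operator can index or rescan.
    """
    if not zone_results:
        return ("edit_station", {})
    index = {}
    for zone, (text, confidence) in zone_results.items():
        cleaned = text.strip()
        if not cleaned or confidence < min_confidence:
            # One unreadable zone is enough to require hand indexing.
            return ("edit_station", {})
        index[zone] = cleaned
    return ("indexed", index)
```

Sending the whole document to an edit station when any single zone fails is a conservative choice. A production system might instead ask the operator to key only the failed zone.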
Primary image scanners should be able to attain image capture rates of 120 pages per minute in duplex mode. Secondary rescan stations can be the flatbed variety.
Workstations need to be high performance graphics workstations. These should be configured with mirrored HDD's, 64 MB RAM, 21-inch high-resolution monitors, quality graphics cards, and the fastest Intel Pentium processors available. They should also run NT Workstation.
All of the imaging components listed above that comprise the imaging production area should be fully connected via a 100-megabit Ethernet network. Since production image traffic can be heavy at times, component isolation via a 100-megabit switch is critical. Additional WAN configuration work may also be needed to allow the free flow of Windows NT based network traffic on the WAN.
Since the majority of users are not connected by a robust WAN, Ontario County should consider one of the following: upgrading to a more capable WAN infrastructure (more capacity), installing additional ISDN links that could be dedicated to image traffic, or providing most access to the imaging system via dialup.
OnBase, from Hyland Software, provides a full featured document-imaging solution that is scalable and based on common standards. It is a Client Server application consisting of a database, image repository, and client tools. OnBase is sold as a modular system, giving customers the opportunity to build an entry level system and add functionality to it as needed or as budget allows. It also meets all the needs and criteria listed in the Need Analysis portion of this study. The following is a description of the suggested modules and their function in an overall solution.
OnBase has a rich set of security tools. These tools allow users to see only the sets of data that they have been authorized to see. All access to the image database is governed by a system logon, and access to image files is determined by the rights granted to that logon.
OnBase is also DMA compliant, which allows other imaging system clients to access the data managed by the OnBase database.
Datacap recognition software is used to speed the indexing of documents that are based on standardized forms. For example, many of the forms used by DSS are standard, having standard fields for input and check off. Datacap can recognize this data, and utilizing such software increases operator keystroke speed from 120 per hour to as high as 12,000 to 14,000 per hour. Datacap is also used to read batch header sheets, recognizing the barcode and character data that index each batch of images for the system.
The client software is the front end that users utilize to search, view and work with images, COLD data, Internet and Workflow. All online images and document types are treated the same way: as data objects.
Document retrieval is done in any of five ways: Retrieval Dialogue, Foldering, Customer Query, Text Search and Cross-Referencing. Once a document is retrieved, it can be annotated, printed, faxed or emailed.
The user interface is a fairly easy to use Windows based application. Menus and options are fairly intuitive for quick productivity by end users new to Windows, but powerful enough to allow sophisticated users more flexibility and control.
The Client software is a key component. This software is not a full-featured Document Management solution, but based on conversations with Ontario County staff at several locations, full-featured Document Management is not what is required at this time or for the foreseeable future. Document Management packages have a much steeper learning curve and feature options that are currently not used in Ontario County processes.
The Imaging module incorporates the ability to scan, index, archive and retrieve. This is a key component of the system. It provides the back end engine to handle the filing of images and the creation of the associated indexes.
Configuration of this software is point and click and includes an image viewer for an operator. It can handle tens of thousands of documents per day and allows for manual indexing with automatic field fill, such as entering an account number and auto filling related data already stored in the database.
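The "automatic field fill" idea above amounts to a keyed lookup: the operator types one identifying value and the remaining index fields are copied from data already on file. The sketch below illustrates the concept only. The in-memory dictionary stands in for the index database, and the field names are hypothetical.

```python
def autofill_index(account_number, known_accounts):
    """Build index fields for a document from a keyed account number.

    known_accounts: dict mapping account number to a dict of related
        fields already stored (a stand-in for the real database).
    Fields are filled automatically when the account is on file;
    otherwise only the keyed value is returned for manual completion.
    """
    index = {"account_number": account_number}
    record = known_accounts.get(account_number)
    if record is not None:
        for field, value in record.items():
            # Never overwrite the value the operator actually keyed.
            index.setdefault(field, value)
    return index
```

The benefit is fewer keystrokes per document and fewer transcription errors, since related fields are copied rather than retyped.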
The software supports TWAIN and Kofax scanner interfaces and more than 50 file formats.
COLD (Computer Output to Laser Disk) provides mainframe direct to Optical Disk capability. Currently, greenbar reports are scanned into the Microfilm system for future retrieval. COLD allows any mainframe based output to be transmitted immediately to the CDROM system.
OnBase provides this functionality along with cross-referencing, allowing simple double clicks on predefined fields of the document to bring up referenced documents.
OnBase also allows for overlay generation, simulating greenbar or other computer generated forms such as invoices and purchase orders.
OnBase COLD supports up to 50 GB per day processing of COLD data.
The Web Server component allows the creation of Intranet or Extranet capabilities. Using industry standard browsers such as Microsoft Internet Explorer or Netscape Communicator, users can search for and retrieve documents.
Extended Mail Services
Mail services allow any document stored within the OnBase system to be sent to any MAPI or VIM compliant system. Microsoft Exchange is a MAPI compliant system and would integrate nicely into an overall image delivery scheme.
Where end users do not have direct, dialup or fax access, the mail router could be used to email documents to the recipient, even over the Internet.
CD Authoring provides a way to create a CD based disk backup of any disk volume that has been defined on the OnBase system. A disk volume is the definition of a particular CD or CD's that contain only specific data. For example, the Treasury Department may have assigned to it only one or two CD's on the system at any particular time. This is transparent to the users, but is critical for storage management. CD Authoring allows existing online storage to be copied off to another set of CD's for the purpose of having an additional backup.
Export is a function of copying the OnBase images and their respective Indexes to CD's, Disk or other devices. The OnBase client still queries the online database for the whereabouts of such exported data, and can point a query to a locally held CD for retrieval.
OnBase publishing provides for a CD to be created that is a wholly self-contained image retrieval system. It contains the image data, indexes, run-time database and executables needed to complete a full search and retrieval of needed documents. Departments utilizing this method of document retrieval would not need connectivity to the OnBase system.
Implementation Time Frame
Implementation time frames are wholly dependent upon which department is involved and how fully integrated the system should be. As a rule, a great deal of planning goes into the implementation of a system before the first PC is even switched on. It is strongly recommended to integrate the system into the workflow as fully as is reasonably possible. It is always easier to scale back than it is to add scope after the initial production starts.
Full integration means to capture all paper and COLD data that may be needed in the future, and to fully index that data now even if these indexes have marginal value based on current project scope. Future additions to the project scope will immediately benefit from this upfront work when cross-referencing of data is available to use as the new project phase comes on line. The idea is to plan for a very rich data set to query against.
Based on vendor experience, expect 45 - 60 days initial setup for DSS. This does not include the upfront planning and development work expected to properly implement the system.
System Administration Requirements
Imaging systems, due to their complex nature and numerous parts, require care and feeding. It is recommended that any system installed be managed by county IT staff or an outside vendor. The skill set required for a successful implementation includes strong PC and network skills, SQL DBA, server administration, Network Operating System administration, Capacity Planning and Performance Monitoring skills and strong support and troubleshooting skills.
A qualified administrator should have advanced training and experience on the operating system platform, be an experienced network administrator, obtain formal training on the Imaging solution chosen and have advanced training or work skills in the administration and SQL programming of the database engine chosen to drive the application.
Overall administration should not be burdensome, but could be an additional work item for a dedicated LAN Manager. Generally, many of the administration tasks are associated with system monitoring and proactive management. Most of the SQL administrative tasks can be automated with Microsoft SQL Server, but some performance tuning, monitoring and administrative tasks will have to be done by the administrator.
Department of Social Services (DSS)
DSS would have the bulk of the imaging work done in the county and would most likely prove to be a heavy user of the system. It is therefore sensible to either place an imaging system in house at DSS or make a system available to it over a robust WAN link. In either case the system would appear much like the solution on the next page.
[Figure: DSS imaging system configuration (image not included)]
To be expeditious, DSS may wish to have at least one high-speed scanner and a bed scanner available on site, along with edit stations. Doing this would necessitate the installation of a high-speed WAN link back to the imaging servers. Ultimately, the best installation would be a locally installed system in its entirety. Outside access to this system for image retrieval by other departments could easily be controlled by a firewall that admits only predetermined clients.
Once DSS is online, additional capacity can be added as new departments are added to the system, and RAIMS staff can augment the imaging staff, working either onsite at the image center or at the RAIMS building. Again, it will be necessary to add WAN capacity if imaging work sites are distributed.
It is most likely too difficult to begin with the entire DSS organization. It is often a good course of action to begin with just one subsection of the organization, install the solution and get it running. This small-scale project can then move forward with more speed across the rest of DSS and the county.
County Treasurer's Office
The Treasurer's Office is concerned primarily with retrieval of the source documents that generate the journal entries. This department does not need full access to the imaging system, but it presents an excellent opportunity to utilize the CD Publishing facilities of OnBase.
CD Publishing places all data, indexes, databases and image data directly on a CD that is an entirely self-contained image retrieval system. CD's can be published to the department on a monthly or quarterly basis, depending upon how much paper data must be kept in the office at any given moment.
Additionally, the Treasurer's Office would benefit greatly from COLD technology, sending reports from the mainframes to the optical systems. It is also possible to install a few low-end bed scanners into the office to allow staff to immediately scan and index documents as they come into the office.
This office is perhaps the most straightforward to convert to a new imaging system. All documentation already has built in links to one another via account numbers. These account numbers are a ready-made index that can be instantly utilized in a powerful way by the department.
County Attorney's Office
The County Attorney's Office could also benefit from imaging. Benefits are numerous, ranging from having all generated documents readily available and searchable to being able to download all work product and ancillary documents to a laptop PC for easy transport and use offsite.
Additional functionality that this office would benefit from is the ability to do full text searching of its own work product. Since the suggested imaging system would be able to import documents in Word or WordPerfect format, creating a full text index of these documents is not difficult or beyond the reach of current technology.
The Attorney's office would benefit most by attaching directly to the imaging database, rather than having CD's published to them due to the dynamic nature of the work being done there.
The Attorney's Office is different from the rest of the county offices interviewed in that there is a much more pronounced need for a robust document management system as well as a case management system. There are packages available for the legal community that combine these roles, but no software currently exists that meets the full requirements of the legal environment. As a result most firms are now utilizing both systems, with a planned cutover to either of the two when functionality reaches the desired critical mass.
This system is therefore not being suggested as a document management system, but as a document image archival and retrieval system.
County Records, Archives and Information Management Services (RAIMS)
RAIMS is charged with operating the county's records archival and retrieval program. As a result of the quickening pace of document archiving and retrieval, RAIMS is rapidly reaching critical mass as to what it can handle without making large and permanent nonperforming capital expenditures.
An Imaging system provides solutions to many of the problems that RAIMS is facing. Storage space is at a premium. Reducing a document to the size of a CD solves, long term, the space problems. It also reduces the amount of extra handling a fragile and valuable historical document gets over time. Additionally, CD's and the data on them can be made available online to a wide audience via a number of delivery and request mechanisms, many of which would require absolutely no staff intervention or assistance.
It is recommended that RAIMS and DSS work together to pool resources and capabilities to co-develop an overall imaging architecture for the county rather than work individually and separately. Since RAIMS and DSS reside on the same campus, a high-speed fiber link could economically be laid between the two buildings to provide needed connectivity. The estimated cost of a 1,500-foot fiber cable run on private property is about $10K to $15K, plus associated equipment such as switches and routers.
With a fiber connection between the two sites, imaging equipment could be distributed according to need and use, including imaging servers and optical jukeboxes. It is important however to ensure high-speed access to a single image index database to all users.
Security is important, and the imaging servers can be firewalled off from the rest of the enterprise and the Internet, allowing access only to those who are specifically permitted to reach the site. Certain firewalls, such as Altavista, can be configured to pass traffic only from particular logons or IP addresses. As a result, the firewall can provide a critical layer of security between the archives and threats.
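The address-based filtering described above can be illustrated with a short allowlist check. This is a conceptual sketch using Python's standard `ipaddress` module, not the configuration syntax of Altavista or any other firewall product. The networks listed are invented private addresses used purely as placeholders.

```python
import ipaddress

# Hypothetical allowlist: one departmental subnet and one single host.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.20.0.0/16"),
    ipaddress.ip_network("192.168.5.10/32"),
]

def is_allowed(client_ip, allowed_networks=ALLOWED_NETWORKS):
    """Return True only if client_ip falls inside an allowed network.

    Anything not explicitly listed is refused (default deny).
    """
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in allowed_networks)
```

The key design point is default deny: a client that is not explicitly listed never reaches the imaging servers, which matches the intent of admitting only predetermined users.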
An initial system configuration should be large enough to accommodate a full DSS rollout. This system would initially accommodate the needs of RAIMS and DSS and provide a good foundation for future growth. The initial system would look like the diagram in the solution implementation examples. This area would then also be protected by a firewall such as Altavista.
Due to the apparent concentration of potential users and highest traffic concentrations, the system should most likely reside at the RAIMS building or at DSS. An alternate location would be at the county IT offices. One drawback to this location is the lack of WAN connectivity that would support imaging traffic back to the new campus where the heaviest usage would occur. Also, all documents sent in for imaging would have to be sent by courier to the imaging center.
The WAN infrastructure at the new campus does not support this configuration. It would be cost effective and highly desirable to install new fiber optic cabling between RAIMS and DSS. Since the land is private property, cable can be put down with a minimum amount of trouble or regulation. An FDDI fiber loop provides the best solution due to the redundant nature and scalability of these installations. The new building should also be wired into the FDDI loop for inclusion into the WAN.
Additional Fiber is also recommended between RAIMS and the downtown complex if possible. Another approach is to consider a wireless solution, which may be more cost effective over the distance covered by cable. Wireless WAN technology now offers 10-megabit speeds for routed or bridged IP and IPX traffic.
Initial Project Scope
DSS is the most active department regarding the creation of paper based data. It would make good sense to begin the imaging project there in conjunction with the efforts at RAIMS.
At DSS the project could begin with a single section of the office, such as Food Stamps. If a vendor experienced in DSS imaging projects is engaged (highly recommended) then the scope of the initial project can be greatly broadened to encompass much more of the DSS scope of services.
RAIMS should be tightly integrated into the project from the earliest kickoff and planning meetings, and should have representation at all stages of the project until completion.
County IT staff should also be present at any technology-related meetings and installations. They should also acquire full product training for administration and troubleshooting as early as is prudent. It is recommended that county IT staff assume administration duties over the system from the beginning.
As project success is achieved, the scope and pace of the project can widen and accelerate. Familiarity with the system will lessen dependence on outside vendors, but it is plausible and highly desirable to engage a qualified vendor to provide feedback on plans and progress. Scope widening would include the addition of other county departments as well as rounding out the installation at DSS.
Overall Time Frame
Project time frames are dependent upon several factors. Constraints include fiscal realities, social and organizational culture, and business processes that may have to be rethought to accommodate an imaging system.
An initial time frame for planning and implementing the above suggestion would run about 6 months. The technology installation itself is rapid, and the follow-on work to bring it into production would run 30 to 45 days. After the initial project is implemented, the entire implementation, as well as the planning process, should be fully assessed for refinement opportunities. Secondary projects and follow-on work can and most often do proceed at a much quicker pace.
A full implementation overall would most likely be achievable over three years.
Recap of Project Critical Path
Costs over a time frame of three years are difficult to estimate with accuracy. Hardware and software costs are spiraling downward, and Moore's law, as commonly paraphrased, states that every eighteen months the cost of computing halves while raw CPU power doubles. This law has held up over time and there is every indication that it will continue to do so for the foreseeable future.
Based on similar installations occurring in New York State Counties, an initial investment ranging from $500,000 to $600,000 is not unusual. This can certainly be scaled back, but it is critical that any smaller plan be able to scale up to meet the needs of a growing system without incurring additional costs due to unanticipated data migrations.
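Taking the eighteen-month halving rule above at face value, the effect on a deferred hardware purchase can be estimated with a one-line formula. The sketch below is a simplification for illustration only. It assumes a smooth exponential decline and applies only to commodity computing hardware, not to software, services, or labor.

```python
def projected_cost(cost_today, months_from_now, halving_period_months=18):
    """Estimate the future price of equivalent computing hardware.

    Assumes the price halves every halving_period_months, which is the
    rule of thumb quoted in this study, not a guaranteed rate.
    """
    return cost_today * 0.5 ** (months_from_now / halving_period_months)
```

Under this assumption, equipment priced at $100,000 today would cost about $25,000 for the same capability at the end of a three-year plan, which is one reason multi-year hardware budgets are hard to pin down.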
Over three years, the county should expect to spend between $750,000 and $1,000,000 on the overall project with 100% participation.
Return on Investment
Cost justification of a document imaging system is a topic that is either of critical interest to the organization or is totally ignored in the planning process. It seems that there is little middle ground with this issue. Depending on whom one listens to, these systems either cannot be cost justified on the basis of hard-dollar savings, or else they are the easiest systems possible to justify. Experience has demonstrated that the following points are worthy of consideration when beginning to cost justify a solution.
Foremost in the calculation is the opportunity for cost avoidance. The most immediate return is the avoided need for a new facility or hardware to house paper documents. Arguably, this is just a transfer of cost, yet over time, a new building is a very pricey proposition.
Many surveys indicate that up to 35% of knowledge workers' time is spent searching for and retrieving information. In a paper-filled office this is often exacerbated by the fact that only one person at a time can access a file and that documents are easily mislaid, misfiled or simply lost. Studies show:
The time spent in data retrieval and document handling reveals a large opportunity to lessen the amount of time and money spent in the daily paper handling routines.
Ontario County cannot expect to justify the system on the basis of staff cuts. In most cases, the total number of people involved in a process that has been image-enabled is the same after installation as before. What tends to happen is that personnel whose skills have become unneeded with the document imaging system (e.g., file clerks) either get retrained in more skillful tasks (e.g., scanning and indexing) or are replaced by individuals capable of performing these tasks.
In applications with a large customer contact component (e.g., insurance claims processing, DSS), the cost justification components include an enhanced customer service, which will lead to an increased work volume capability of those who use the system. Far less time is spent looking for information. Such "soft-dollar" savings are difficult to quantify, but the results can be significant.
In applications such as accounts payable and accounts receivable, significant benefits can be realized with regard to timeliness of payments and faster billing. By enhancing the process, the organization can often make the "float" work in its favor, adding to the cost-effectiveness of the system. Additionally, research of journal entries can be point and click. An analyst who wants to see the source documents for a line item can simply click on it, and a listing along with the associated images is made available in seconds.
If the application requires response to customer telephone inquiries, a document imaging system can often allow the customer service representative to answer the question during the initial customer-paid call, rather than having to wait for documents and initiate a callback. In a large company, the savings in long-distance telephone charges (not to mention customer service representatives' time) can be especially significant.
Cost justification is not a simple process, nor are all of the potential savings possibilities necessarily obvious. The list above is only a beginning, but those items cited have been common and important in a large number of installations.
Ontario County has a great deal to gain from an imaging system, well beyond the initial requirement merely to compress the space that paper data consumes. By adding an imaging component to the daily routine, both short-term and long-term storage needs can be dramatically reduced. Document retrieval, which is a very large hidden cost of doing business, can also be cut to a fraction of its current cost.
By applying this technology carefully, Ontario County can improve its processes, reverse the trend of ever higher expenditures for records management, and create opportunities for all departments to access and use their paper-based data in a vastly more efficient and cost-effective manner.
In the short term, six months to one year, RAIMS must implement an imaging system to avoid filling the archive area. DSS, which is an enormous generator of paper data, would do well to team with RAIMS to implement a solution quickly.
County IT should take a leadership position, working to develop an RFP that addresses not only the technical components of a solution but also the required vendor performance for development and installation of the system, along with clearly defined system performance metrics. For example, one system metric is the ability to find and retrieve any online document within two minutes.
Developing these metrics is a time-consuming and difficult process, but working through such an exercise brings into clear focus which processes need support and how they should be measured for improvement. Any RFP award should be followed by a Service Level Agreement (SLA), as part of a contract, that defines the roles and responsibilities of the vendor and those of the county. This SLA should include rewards and penalties based on the metrics developed above.
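To illustrate how a metric such as the two-minute retrieval target might be tied to SLA rewards and penalties, the following sketch computes the share of sampled retrievals that met the target and maps it to an outcome. The function names, the 99% reward threshold and the 95% penalty threshold are assumptions made for illustration only; the actual thresholds would be negotiated in the contract.

```python
def compliance_rate(retrieval_seconds, target_seconds=120.0):
    """Fraction of sampled retrievals that finished within the target time."""
    if not retrieval_seconds:
        raise ValueError("at least one sample is required")
    within = sum(1 for t in retrieval_seconds if t <= target_seconds)
    return within / len(retrieval_seconds)


def sla_outcome(rate, reward_at=0.99, penalty_below=0.95):
    """Map a compliance rate to 'reward', 'penalty' or 'neutral'.

    The default thresholds are hypothetical, not taken from any contract.
    """
    if rate >= reward_at:
        return "reward"
    if rate < penalty_below:
        return "penalty"
    return "neutral"


if __name__ == "__main__":
    # Hypothetical sample of retrieval times, in seconds.
    sample = [30, 90, 120, 150]
    rate = compliance_rate(sample)
    print(f"Compliance: {rate:.0%}, outcome: {sla_outcome(rate)}")
```

Defining the metric as a rate over a sampled period, rather than as an absolute rule for every retrieval, gives both the vendor and the county a measurable figure that can be audited each reporting period.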
Finally, work on this project should begin immediately. Storage space at RAIMS and other departments is beginning to fill up, and a project of this magnitude takes a number of months before production begins in earnest.