An enterprise database or information system can contain documents from various
sources that carry similar or identical information. The same text may appear
under different headlines, with a few changes or additions, which causes
confusion when the documents are used. For example, an enterprise database may
hold several documents that are virtually identical in content but differ in
their headlines and in minor details of the text. As a result, one specialist
may comment on document No. 1, another on document No. 2, and so on. First, this
duplicates work (why comment on the same document two or three times?). Second,
if the commentaries differ, later use of the separately annotated copies can
leave part of that information unused.
Existing technologies for finding documents with similar content address this
problem by comparing documents entering the enterprise database with those
already stored in it and identifying duplicates.
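
The comparison step can be illustrated with a minimal sketch. The text does not
say which similarity measure is used, so the choice here is an assumption: each
document is reduced to a set of overlapping word triples ("shingles"), and two
documents are compared by the Jaccard similarity of those sets. The function
names, the shingle size k=3, and the 0.5 threshold are all hypothetical and
serve only as an example.

```python
from __future__ import annotations

import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles of a text, ignoring case and punctuation."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Texts shorter than k words yield one shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def find_duplicates(incoming: str, stored: dict, threshold: float = 0.5) -> list:
    """Compare an incoming document with every stored one.

    Returns (doc_id, score) pairs whose score reaches the threshold,
    ordered from most to least similar. The threshold is an assumed value.
    """
    incoming_shingles = shingles(incoming)
    hits = []
    for doc_id, text in stored.items():
        score = jaccard(incoming_shingles, shingles(text))
        if score >= threshold:
            hits.append((doc_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Illustrative data only: two stored documents and one incoming near-duplicate.
stored_docs = {
    "doc1": "The supplier shall deliver the goods within thirty days of the order date",
    "doc2": "Minutes of the quarterly meeting of the board of directors",
}
incoming_doc = (
    "The supplier shall deliver the goods within thirty days of the order date "
    "and confirm delivery in writing"
)
matches = find_duplicates(incoming_doc, stored_docs)
```

Because the headline and small edits change only a few shingles, a lightly
modified copy still scores high against the original, while an unrelated
document scores near zero. A linear scan like this compares the incoming
document with every stored one, so a large database would likely need an
indexing step in front of it.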