Scanning a document produces its electronic image, which is a kind of digital pattern of dots: a raster. Everything is made of these dots, both the background and the image itself. By dots we do not mean round elements, as intuition might suggest, but squares into which the scanner divides the image with an imaginary grid of horizontal and vertical lines.
The resulting electronic image is stored in a file in which every dot of the image is described as a point on a plane, each with its own coordinates, color, and other attributes. In such a file, text, numbers, and all other elements of the image are recorded the same way: as graphic images composed of pixels.
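To make the idea of a raster concrete, here is a minimal sketch in Python. It is an illustration only, not the layout of any real file format: the names `raster`, `pixel_at`, and `count_color` are hypothetical, and the image is assumed to be a small grid of (R, G, B) color values.

```python
# A tiny raster, 3 pixels wide and 4 pixels high.
# Each entry is the color of one square pixel as (R, G, B).
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)

raster = [
    [WHITE, BLACK, WHITE],
    [WHITE, BLACK, WHITE],
    [WHITE, BLACK, WHITE],
    [WHITE, WHITE, WHITE],
]

def pixel_at(image, x, y):
    """Return the color of the pixel in column x, row y (both counted from 0)."""
    return image[y][x]

def count_color(image, color):
    """Count how many pixels of the image have the given color."""
    return sum(row.count(color) for row in image)
```

The dark vertical stroke in the middle column might be part of a letter or part of a drawing. The raster itself does not record which, and that is why recognition is needed.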
In Windows, you can recognize a raster graphics file by its extension, the last three letters after the dot in the file name: *.tif, *.png, *.bmp, *.jpg, and so on.
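The extension check can be sketched in a few lines of Python. The function name `is_raster_file` is hypothetical, and the set contains only the four extensions named above. A real list would be longer, and an extension is only a hint about what the file actually contains.

```python
from pathlib import Path

# Only the extensions mentioned in the text; real lists are longer.
RASTER_EXTENSIONS = {".tif", ".png", ".bmp", ".jpg"}

def is_raster_file(name):
    """Guess from the file name alone whether it is a raster image."""
    # suffix is the part from the last dot on; compare case-insensitively.
    return Path(name).suffix.lower() in RASTER_EXTENSIONS
```

For example, `is_raster_file("scan.TIF")` is true, while `is_raster_file("report.doc")` is false.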
By applying its algorithms, OCR software converts a page from a set of graphic images into text characters and saves it in a specified format, such as Word or Excel. The appearance of the original document is preserved: its formatting, tables, graphics, and so on are kept. In addition to XLS and DOC, the program supports all commonly used text and graphic formats, spreadsheet formats, the formats of Internet Explorer, and the previously mentioned Adobe PDF. For better recognition, first clean up your scanned document.
Modern OCR programs use quite sophisticated algorithms that give very good results. For clean, high-contrast images set in standard, widely used fonts, recognition accuracy usually reaches 99.5-99.8%. Although such accuracy seems very high, the number of errors still cannot be considered negligible. If we assume that a standard page holds 1800 characters, then at a recognition rate of 99.5% each page will contain, on average, nine incorrectly recognized characters. Of course, the result varies widely depending on many factors.
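The arithmetic behind the nine errors per page is easy to check. The short Python sketch below uses a hypothetical helper, `expected_errors`, and assumes that errors are spread evenly across the page, so it gives only an average.

```python
def expected_errors(chars_per_page, accuracy_percent):
    """Average number of misrecognized characters on one page."""
    error_percent = 100 - accuracy_percent
    return chars_per_page * error_percent / 100

# 1800 characters at 99.5% accuracy: 1800 * 0.5 / 100 = 9 errors.
low_accuracy = expected_errors(1800, 99.5)

# At the upper end of the range, 99.8%, the average is about 3.6 errors.
high_accuracy = expected_errors(1800, 99.8)
```

Even at the upper end of the quoted range, a page still averages more than three wrong characters, so proofreading the recognized text remains necessary.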