Introducing a new optical character recognition system.

OCRs (for Optical Character Recognition System) is an optical character recognition system developed as part of the EPITA 2nd year project. It is powered by a neural network that can be trained to recognize a large number of characters, such as those of the Unicode table. The software extracts text from an image (photograph, digitized text, scanned document) by segmenting the image (separating the different elements that make it up) and then using the neural network to identify each character.

A super cool Optical Character Recognition System.

HELE (for SaraH-PierrE-PauL-NepheliE) is a group formed at EPITA to work on a project: building its own version of an OCR. This optical character recognition system is developed as part of the EPITA second year project and is powered by a neural network able to learn many different characters, such as those of the Unicode table.

The software extracts text from an image, whether or not it is a photograph, by segmenting the page into the individual letters it contains and feeding them to the neural network, which matches each one with the expected character.

REQUIREMENTS

HELE's OCR requires the following C libraries:

SDL
GTK+

INSTALLATION

Before installing, make sure that you meet all the requirements. Please refer to the Requirements section.

Download the archive from GitHub.
Unzip the archive (using 7-Zip or WinZip, for example).
From the OCR/ directory, open a terminal and run the following commands:

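To build and launch the graphical interface:

$ make

$ ./OCR

To build the command-line version:

$ make cli

To train the neural network (replace epoch with the number of training epochs):

$ ./OCR train epoch

To recognize the text in an image (replace path with the path to the image file):

$ ./OCR path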

HOW IT WORKS

The OCR processes an image in the following steps:

Loading the image
Removing colors (grayscale conversion, then black and white binarization)
Preprocessing
Detection of text blocks
Character detection
Recognition of detected characters
Reconstruction of the text
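The color removal step can be sketched as follows. This is a minimal illustration, not the project's code: it assumes the image is already decoded into an interleaved 8-bit RGB buffer (pixels from an SDL surface would first have to be read out into such a buffer), uses the standard BT.601 luma weights for grayscale, and applies a single global threshold. The function names rgb_to_gray and binarize_rgb are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts one RGB pixel to an 8-bit gray level
 * using the ITU-R BT.601 luma weights (0.299, 0.587, 0.114). */
static uint8_t rgb_to_gray(uint8_t r, uint8_t g, uint8_t b)
{
    /* Integer form of 0.299 R + 0.587 G + 0.114 B, rounded to nearest.
     * The maximum value is (1000 * 255 + 500) / 1000 = 255, so it fits. */
    return (uint8_t)((299u * r + 587u * g + 114u * b + 500u) / 1000u);
}

/* Hypothetical helper: converts an interleaved RGB buffer (3 bytes per
 * pixel) into a binary image where out[i] = 1 marks a dark (ink) pixel
 * and out[i] = 0 a light (paper) pixel. */
static void binarize_rgb(const uint8_t *rgb, size_t n_pixels,
                         uint8_t threshold, uint8_t *out)
{
    assert(rgb != NULL && out != NULL);
    for (size_t i = 0; i < n_pixels; i++) {
        uint8_t gray = rgb_to_gray(rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2]);
        out[i] = (gray < threshold) ? 1 : 0;
    }
}
```

For example, with a threshold of 128 a white pixel maps to 0 (paper) and a black pixel to 1 (ink). A fixed threshold is the simplest choice; pages with uneven lighting usually call for an automatically chosen threshold (for instance Otsu's method) or a local, adaptive one.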

Text blocks must be divided into individual characters, because the neural network recognizes one character at a time.
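One common way to perform this division is projection-profile segmentation, sketched below under the assumption that the image is already binarized (1 = ink, 0 = paper) and stored row by row. Summing the ink pixels of each column inside a text line gives a profile whose blank columns separate characters; the same find_spans function applied to a row profile would separate text lines. Both function names are hypothetical, and the project may use a different segmentation method.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Column projection: profile[x] = number of ink pixels in column x,
 * counted over rows [top, bottom) of a binary image of width w. */
static void column_profile(const uint8_t *img, size_t w,
                           size_t top, size_t bottom, unsigned *profile)
{
    for (size_t x = 0; x < w; x++) {
        profile[x] = 0;
        for (size_t y = top; y < bottom; y++)
            profile[x] += img[y * w + x];
    }
}

/* Splits a projection profile into maximal runs of non-zero entries.
 * Each run [starts[k], ends[k]) is one segment: a character when applied
 * to a column profile, a text line when applied to a row profile.
 * Returns the number of segments found (at most max_spans). */
static size_t find_spans(const unsigned *profile, size_t n,
                         size_t *starts, size_t *ends, size_t max_spans)
{
    assert(profile != NULL && starts != NULL && ends != NULL);
    size_t count = 0;
    size_t i = 0;
    while (i < n && count < max_spans) {
        while (i < n && profile[i] == 0)   /* skip the blank gap */
            i++;
        if (i == n)
            break;
        starts[count] = i;
        while (i < n && profile[i] != 0)   /* consume the inked run */
            i++;
        ends[count] = i;
        count++;
    }
    return count;
}
```

On a line containing two glyphs separated by blank columns, find_spans returns two spans, one per character, which can then be cropped and passed to the network. This approach assumes that characters do not touch; touching or overlapping glyphs need extra handling.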

Character recognition is the central part of the OCR. It requires a learning phase during which the neural network learns to recognize the different characters.
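To illustrate what a learning phase looks like, the sketch below trains a single-layer multiclass perceptron, a deliberately simpler model than the project's neural network, on hypothetical 3x3 glyphs. On each misclassified sample it moves the correct class's weights toward the input and the wrongly predicted class's weights away from it, and it stops after the first epoch without errors. The sizes N_IN and N_OUT, the Perceptron type and the predict and train functions are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define N_IN 9   /* hypothetical 3x3 glyph, flattened row by row */
#define N_OUT 3  /* hypothetical alphabet of three characters */

/* One weight vector and one bias per output class. */
typedef struct {
    double w[N_OUT][N_IN];
    double b[N_OUT];
} Perceptron;

/* Forward pass: returns the class with the highest weighted sum. */
static int predict(const Perceptron *p, const double *x)
{
    int best = 0;
    double best_score = 0.0;
    for (int c = 0; c < N_OUT; c++) {
        double s = p->b[c];
        for (int i = 0; i < N_IN; i++)
            s += p->w[c][i] * x[i];
        if (c == 0 || s > best_score) {
            best = c;
            best_score = s;
        }
    }
    return best;
}

/* Learning phase: on each mistake, pull the correct class toward the
 * sample and push the wrongly predicted class away from it. Stops after
 * an epoch with no error, or after max_epochs. Returns epochs run. */
static int train(Perceptron *p, const double x[][N_IN], const int *labels,
                 size_t n_samples, int max_epochs, double lr)
{
    assert(lr > 0.0);
    for (int epoch = 1; epoch <= max_epochs; epoch++) {
        int errors = 0;
        for (size_t k = 0; k < n_samples; k++) {
            int guess = predict(p, x[k]);
            int truth = labels[k];
            if (guess == truth)
                continue;
            errors++;
            for (int i = 0; i < N_IN; i++) {
                p->w[truth][i] += lr * x[k][i];
                p->w[guess][i] -= lr * x[k][i];
            }
            p->b[truth] += lr;
            p->b[guess] -= lr;
        }
        if (errors == 0)
            return epoch;
    }
    return max_epochs;
}
```

For instance, with the three glyphs "|", "-" and "/" as training samples and all weights starting at zero, training stops as soon as every glyph is classified correctly. A real OCR network would use larger inputs, hidden layers and backpropagation, but the overall loop (forward pass, error, weight update, repeated for each epoch) is similar.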

To launch the learning phase, please refer to the Installation section.

CREDITS

EPITA students behind the code: Unikarah, Lyrianna, Pierre and Paul

GET IN TOUCH

We would be glad to receive your reviews and updates!

Address
86 Bd. Marius Vivier Merle
69003 Lyon
France
Phone
04 84 34 02 61