OCR: Handwritten image identification

Avik Das
1 min read · Jan 7, 2020

Description:

You have probably struggled to read someone's handwriting at some point, whether on a doctor's prescription or a friend's assignment. The consequences can be real: an assignment submitted late, or time lost looking for a chemist who can decipher the prescription. In this talk, we will focus on how Python and data science can be used to recognize handwritten digits and characters, easing the pain of reading untidy handwriting.

Before we begin, here are the steps needed to detect handwritten digits:

  1. Create a database of handwritten digits.
  2. For each handwritten digit in the database, extract HOG features and train a Linear SVM.
  3. Use the classifier trained in step 2 to predict digits.
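
The three steps above can be sketched in a few lines. This is a minimal illustration, not the exact code from the talk: it assumes scikit-image's `hog` for feature extraction and scikit-learn's `LinearSVC` as the Linear SVM, and it uses scikit-learn's small bundled 8×8 digits set (`load_digits`) as a stand-in for MNIST so that it runs without a download. The helper names `extract_hog` and `train_and_score` are made up for this example.

```python
import numpy as np
from skimage.feature import hog
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC


def extract_hog(images, pixels_per_cell=(4, 4)):
    """Step 2a: compute one HOG feature vector per grayscale image."""
    return np.array([
        hog(img, orientations=9, pixels_per_cell=pixels_per_cell,
            cells_per_block=(1, 1))
        for img in images
    ])


def train_and_score(images, labels, pixels_per_cell=(4, 4)):
    """Steps 2b and 3: train a Linear SVM on HOG features, then predict."""
    features = extract_hog(images, pixels_per_cell)
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.25, random_state=0)
    clf = LinearSVC(max_iter=10000)
    clf.fit(X_train, y_train)
    # score() predicts on the held-out digits and returns the accuracy
    return clf, clf.score(X_test, y_test)


if __name__ == "__main__":
    # Step 1 (stand-in): the bundled 8x8 digits set, NOT the real MNIST data
    digits = load_digits()
    clf, accuracy = train_and_score(digits.images, digits.target)
    print(f"held-out accuracy: {accuracy:.3f}")
```

For the 28×28 MNIST images, a larger cell size such as `pixels_per_cell=(14, 14)` could be passed instead; the best value is something to experiment with.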

MNIST database of handwritten digits

The first step is to create a database of handwritten digits. Rather than building a new one, we will use the popular MNIST database, a set of 70,000 handwritten-digit samples, each a 28×28 grayscale image. We will use the sklearn.datasets package to download the MNIST database from mldata.org. This package makes it convenient to work with toy databases; see the sklearn.datasets documentation for details.
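
Loading the data might look like the sketch below. One caveat: as far as I know, mldata.org has since gone offline and recent scikit-learn releases no longer ship the `fetch_mldata` loader, so this example swaps in `fetch_openml` with the `mnist_784` dataset from OpenML. That substitution, and the helper names `to_images` and `load_mnist`, are assumptions of this sketch rather than part of the original talk.

```python
import numpy as np
from sklearn.datasets import fetch_openml


def to_images(flat_pixels, side=28):
    """Reshape rows of side*side pixel values into (n, side, side) images."""
    return np.asarray(flat_pixels).reshape(-1, side, side)


def load_mnist():
    """Download MNIST from OpenML (needs network; cached after the first call)."""
    X, y = fetch_openml("mnist_784", version=1, as_frame=False,
                        return_X_y=True)
    # Labels arrive as strings, so convert them to integers
    return to_images(X), y.astype(int)


# Usage (requires a network connection on the first run):
#   images, labels = load_mnist()
#   print(images.shape, labels.shape)
```

Each row of the downloaded matrix is a flattened image of 784 pixel values, which is why the reshape to 28×28 is needed before computing image features such as HOG.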
