Scanning Images with Python

Eman Diab
Jul 12, 2020

Hello,

I was working with an API that extracts text from images, and I noticed something important: the API only gives good results when I send it a scanned image. This simply means that if I take a photo with my mobile camera and send it directly to the API, I won’t get good results.

The API was simple and its accuracy was fine given a scanned photo, so I decided to look into how to scan an image in Python. After some digging and watching many tutorials, I realized that each tutorial had one piece of the puzzle and that I would have to bring them together to get what I wanted.

I am still a beginner in image processing, but I’ll try to make this as simple and clear as I can. I hope you enjoy the journey.

First, let’s install the necessary packages in our PyCharm project. You can use whatever editor you like; I prefer PyCharm because it creates a virtual environment that lets you do whatever you want inside it without affecting the main Python installation.

Installing Packages:

1- opencv-contrib-python. You can use the opencv-python package instead, but I prefer the contrib package because it gives access to the extra modules developers have created.

2- scikit-image, to handle some image processing tasks.

3- numpy, to handle matrices.

4- imutils, to make image processing functions easier to use.

You can install all of these packages with pip, which is already available in the venv of a PyCharm project. That’s it; now our environment is clean and ready to go.

Edge Detection:

We will briefly explain how the code works to get a sense of what is happening.

First we read our image, then we compute the ratio of the image’s original height to a height of 500 pixels. After that we resize the image to that height using the imutils library; this step aims to speed up processing and make edge detection more accurate. And here comes the interesting part.

We convert our image from color to grayscale:

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

Then we apply a bilateral filter, which deserves a short explanation.

A bilateral filter is used to smooth images and reduce noise while preserving edges. Ordinary blurring smooths everything, whether it is noise or an actual edge; to overcome this problem we use the non-linear bilateral filter.

gray = cv2.bilateralFilter(gray, 11, 17, 17)

Now let’s smooth the edges a little with medianBlur:

gray = cv2.medianBlur(gray, 5)

Now we should be able to detect our edges using the Canny algorithm. You can read more about it here; it is quite interesting.

edged = cv2.Canny(gray, 30, 400)

Finding Contours:

Before looking at the code, we need to picture the idea. We now have every edge in the image, and we have a very important advantage: we are scanning a piece of paper, which will usually have the shape of a rectangle. So what we know so far is that we are looking for a rectangular shape with four corner points and four edges. The document to be scanned is usually the largest object in the image, which means the largest contour has the highest probability of being the document we are scanning.

In OpenCV, finding contours means finding a white object on a black background, so it is important to apply a threshold or Canny edge detection before coming this far.

The cv.findContours() function takes three arguments: the first is the source image, the second is the contour retrieval mode, and the third is the contour approximation method. It outputs the contours and the hierarchy. Contours is a Python list of all the contours in the image, and each individual contour is a NumPy array of (x, y) coordinates of the boundary points of the object, as the OpenCV documentation puts it.

We sort the contours in descending order of size so that the largest contours, which are the ones we are interested in, come first.

Then we loop through the contours and approximate each one, specifying that it is a closed shape by passing True to cv2.arcLength.

If the approximation has four points, we use cv2.boundingRect to get the top-left corner and the width and height of our rectangle.

Then we save all the rectangles and widths we find and pass them to findLargestCountours to pick the largest one; as we said, the largest rectangle should be our document.

Check

If the number of points is less than 2, we failed to find a rectangle, so we print that. Also, if the widths are not the same, our shape is not a rectangle, so we print a mismatch message.

Apply Transform and show the results

Now we take our largest screen contour and reshape it to 4×2, meaning four points with an (x, y) pair each. Then we order the points so they describe the rectangle consistently, using the order_points function.
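For readers who have not seen an order_points helper before, here is one common way to write it, using only NumPy. This is a sketch of the usual sum-and-difference trick and may differ in detail from the function used in the full code: the top-left corner has the smallest x + y, the bottom-right the largest, the top-right the smallest y - x, and the bottom-left the largest y - x.

```python
import numpy as np


def order_points(points):
    """Order 4 (x, y) points as top-left, top-right, bottom-right, bottom-left."""
    points = np.asarray(points, dtype="float32").reshape(4, 2)
    ordered = np.zeros((4, 2), dtype="float32")

    sums = points.sum(axis=1)                 # x + y for each point
    ordered[0] = points[np.argmin(sums)]      # top-left
    ordered[2] = points[np.argmax(sums)]      # bottom-right

    diffs = np.diff(points, axis=1).ravel()   # y - x for each point
    ordered[1] = points[np.argmin(diffs)]     # top-right
    ordered[3] = points[np.argmax(diffs)]     # bottom-left
    return ordered


# A contour in the (4, 1, 2) shape that OpenCV returns, in scrambled order.
screen = np.array([[[310, 420]], [[60, 80]], [[60, 420]], [[310, 80]]])
print(order_points(screen))
# [[ 60.  80.]
#  [310.  80.]
#  [310. 420.]
#  [ 60. 420.]]
```

A fixed corner order matters because the perspective transform pairs each source corner with a destination corner by position in the array.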

Next we apply a four-point transformation. This simply gives you a top-down look at your document, as if you were flying over it and seeing it from directly above (a “bird’s-eye view”). You can find more about that here.

Then we convert the image to grayscale and apply a threshold to get the black-and-white scanned look.

The point of the ratio is that it lets us get back to the size of the original image. Notice that we have been working on an image only 500 pixels high, while our original image is 500 × ratio pixels high.
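As a small worked example of what the ratio does, assume the original photo is 2000 pixels high, so the ratio is 2000 / 500 = 4. The corner coordinates below are made-up values found on the 500-pixel image; multiplying them by the ratio gives the same corners on the full-size photo.

```python
import numpy as np

# Assumed sizes for illustration: a 2000-pixel-high photo resized to 500.
ratio = 2000 / 500.0

# Corners found on the resized image, in OpenCV's (4, 1, 2) contour shape.
screen = np.array([[[60, 80]], [[310, 80]], [[310, 420]], [[60, 420]]])

# Scale the corners back up to the coordinates of the original image.
corners_full_size = screen.reshape(4, 2).astype("float32") * ratio
print(corners_full_size)
# [[ 240.  320.]
#  [1240.  320.]
#  [1240. 1680.]
#  [ 240. 1680.]]
```

This way the perspective transform can run on the original photo, so the scanned result keeps the full resolution.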

Then we show the results and write them to disk for later use.

Here is how the results should look: the original image on the left, the scanned one on the right.

Awesome! Looks like we actually did it.

Thanks for following along. The full code will be available on my GitHub here.

These resources were really helpful in writing and explaining this tutorial:

1 — source 1

2 — source 2

3 — source 3
