RustyStriker/computer_vision_project


Final project for the computer vision course. Given an image containing text, together with the location and content of that text, the program needs to distinguish between 7 fonts.


Rusty Striker 5a51510209 todo list		2024-01-20 21:37:07 +02:00
fonts	add the fonts	2024-01-12 14:46:42 +02:00
classify.py	implement avg color function	2024-01-19 15:51:35 +02:00
rasterizer.py	rasterizer using PIL	2024-01-12 14:39:30 +02:00
README.md	readme and stuff	2024-01-19 13:37:22 +02:00
TODO.md	todo list	2024-01-20 21:37:07 +02:00

README.md

The ALGORITHM!!!

The general idea is to create a filter (mask) for each of the candidate fonts and try to match each one against the given letter.
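The matching step could look roughly like the minimal Python sketch below. It assumes that each font mask has already been rasterized at the same size as the letter crop. `classify_letter` and `ScoreFn` are hypothetical names, not the project's actual code. The scoring function is passed in as a parameter because the real score is defined in the Scoring system section.

```python
from typing import Callable, Dict

import numpy as np

# Hypothetical type: takes (letter_crop, font_mask) and returns a float,
# where a higher value means a better match.
ScoreFn = Callable[[np.ndarray, np.ndarray], float]


def classify_letter(letter: np.ndarray, masks: Dict[str, np.ndarray], score: ScoreFn) -> str:
    """Return the name of the font whose mask scores highest against the letter."""
    if not masks:
        raise ValueError("need at least one font mask")
    scores = {}
    for font_name, mask in masks.items():
        # Assumption: masks are pre-rendered at the same size as the crop.
        if mask.shape != letter.shape:
            raise ValueError(f"mask for {font_name!r} has shape {mask.shape}, expected {letter.shape}")
        scores[font_name] = score(letter, mask)
    # The font with the highest score wins.
    return max(scores, key=scores.get)
```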

Scoring system

The score of each font is based on the average color (acolor) underneath that font's mask, so each mask may yield a different acolor.
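One way to read "average color underneath the mask" is as a mask-weighted mean, so that anti-aliased mask pixels count only partially. The sketch below assumes that interpretation. It also assumes NumPy arrays and a mask with the same height and width as the image. `average_color` is a hypothetical helper and not necessarily what `classify.py` implements.

```python
import numpy as np


def average_color(image: np.ndarray, mask: np.ndarray):
    """Mask-weighted mean color of `image` under `mask` (assumed interpretation).

    image: (H, W) grayscale or (H, W, C) color array.
    mask:  (H, W) array with values in 0..1, where 1 is the font.
    Returns a scalar for grayscale input or a length-C array for color input.
    """
    mask = mask.astype(np.float64)
    total_weight = mask.sum()
    if total_weight == 0:
        raise ValueError("mask is empty; average color is undefined")
    # Broadcast the 2D mask over the color channels when needed.
    weights = mask if image.ndim == 2 else mask[..., np.newaxis]
    return (image * weights).sum(axis=(0, 1)) / total_weight
```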

After obtaining the acolor for a mask, the font's score is calculated as the sum of the individual pixel scores.

For a given pixel (po in the original image and pm in the mask, at the same position), its score is calculated as follows:

$$S_p = \lvert p_o - \mathrm{acolor} \rvert \times (0.5 - p_m)$$

It is assumed that the font mask has values in the range 0..1 and is rendered as white-on-black text (so 1 is where the font is).
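The minimal NumPy sketch below computes the total score, the sum of S_p over all pixels. It makes two assumptions. First, acolor is taken to be the mask-weighted mean of the image. Second, for color images the per-pixel difference |po - acolor| is taken to be the Euclidean distance in color space. `font_score` is a hypothetical name.

```python
import numpy as np


def font_score(image: np.ndarray, mask: np.ndarray) -> float:
    """Sum of S_p = |po - acolor| * (0.5 - pm) over all pixels; higher is better.

    image: (H, W) grayscale or (H, W, C) color array.
    mask:  (H, W) array with values in 0..1, where 1 is the font.
    """
    image = image.astype(np.float64)
    mask = mask.astype(np.float64)
    if mask.sum() == 0:
        raise ValueError("mask is empty")
    # Assumption: acolor is the mask-weighted mean color.
    weights = mask if image.ndim == 2 else mask[..., np.newaxis]
    acolor = (image * weights).sum(axis=(0, 1)) / mask.sum()
    if image.ndim == 2:
        distance = np.abs(image - acolor)
    else:
        # Assumption: Euclidean distance in color space for color images.
        distance = np.linalg.norm(image - acolor, axis=-1)
    return float((distance * (0.5 - mask)).sum())
```

For example, take a 3x3 image of a white vertical bar on black. A mask with the same vertical bar scores 3.0, while a horizontal-bar mask scores only 2/3.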

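This score takes into account color variation where the letter should be, while also accounting for the fact that the background should be a different color. Where pm is close to 1, the weight (0.5 - pm) is negative, so font pixels that stray from acolor lower the score. Where pm is close to 0, the weight is positive, so background pixels that differ from acolor raise it.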
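A worked example with made-up grayscale values in 0..1 and acolor = 0.9 shows this sign behavior. The three pixels are a font pixel, a background pixel and a half-covered edge pixel:

$$
\begin{aligned}
p_o = 0.8,\ p_m = 1:&\quad S_p = \lvert 0.8 - 0.9 \rvert \times (0.5 - 1) = 0.1 \times (-0.5) = -0.05 \\
p_o = 0.1,\ p_m = 0:&\quad S_p = \lvert 0.1 - 0.9 \rvert \times (0.5 - 0) = 0.8 \times 0.5 = 0.4 \\
p_o = 0.5,\ p_m = 0.5:&\quad S_p = \lvert 0.5 - 0.9 \rvert \times (0.5 - 0.5) = 0
\end{aligned}
$$

Note that any pixel with pm = 0.5, such as an anti-aliased edge, gets a weight of 0 and does not affect the score at all.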

Potential improvements

Some potential improvements would be:

Only consider pixels in the font and their outline. This might help, since pixels far away from the letter would be ignored, but assuming good bounding boxes it probably won't give much better results (if any). It also raises the question of which pixels should be considered, as both the text and the mask are anti-aliased (and thus have "weak" pixels).
Increase the area around the font. This can make sure we are not looking too far inward. It probably shouldn't matter much, though, since we are classifying from a predefined set of fonts rather than searching freely. Any useful information missed should therefore have little effect (e.g. all scores may be 0.1 lower, but the correct font should still be picked).
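Both improvements can be sketched in a few lines of NumPy, with some assumptions. To restrict scoring to the font and its outline, the sketch thresholds the mask to decide which anti-aliased pixels count as font and then grows that region by a few pixels. The `threshold=0.5` and `radius=1` defaults are arbitrary guesses. To enlarge the area around the font, the mask is padded with background (0) pixels. The letter crop would then have to be re-cut from the original image, with its bounding box widened by the same margin (not shown here). All function names are hypothetical, and the sketch handles grayscale images only.

```python
import numpy as np


def outline_region(mask: np.ndarray, threshold: float = 0.5, radius: int = 1) -> np.ndarray:
    """Boolean map of font pixels (mask >= threshold), grown by `radius` pixels.

    Uses a square dilation built from shifted slices, so only NumPy is needed.
    """
    core = mask >= threshold
    h, w = core.shape
    padded = np.pad(core, radius, mode="constant", constant_values=False)
    region = np.zeros_like(core)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            region |= padded[radius + dy : radius + dy + h, radius + dx : radius + dx + w]
    return region


def restricted_score(image: np.ndarray, mask: np.ndarray,
                     threshold: float = 0.5, radius: int = 1) -> float:
    """Same per-pixel score as the README formula, summed only inside the outline region."""
    if mask.sum() == 0:
        raise ValueError("mask is empty")
    acolor = (image * mask).sum() / mask.sum()  # assumed mask-weighted mean
    per_pixel = np.abs(image - acolor) * (0.5 - mask)
    return float(per_pixel[outline_region(mask, threshold, radius)].sum())


def pad_mask(mask: np.ndarray, margin: int) -> np.ndarray:
    """Add `margin` rows/columns of background (0) around the mask."""
    return np.pad(mask, margin, mode="constant", constant_values=0.0)
```

For a single-pixel letter in a 5x5 crop, the full score sums all 24 background pixels, whereas `restricted_score` only sums the 8 pixels around the letter.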