Final project for the computer vision course. Given an image containing text, along with the location and content of that text, the program needs to distinguish between 7 fonts.
fonts add the fonts 2024-01-12 14:46:42 +02:00
classify.py try different scoring approach and attempt an image construction based on the avg colors of the image 2024-01-29 19:50:02 +02:00
rasterizer.py fix some fonts not rendering fully(getting cropped) 2024-01-26 15:27:10 +02:00
README.md it seems i am missing something maybe? 2024-01-29 19:45:11 +02:00
TODO.md todo list 2024-01-20 21:37:07 +02:00

The ALGORITHM!!!

The general idea is to make a filter/mask of the given letter in each of the candidate fonts, and attempt to match each mask to the letter in the image.
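
As a rough illustration of that matching loop, here is a minimal sketch. It assumes the masks have already been rendered (one per font, same size as the letter crop) and that a higher score means a better match. The names `classify_letter`, `font_masks` and `score_fn` are hypothetical and not taken from `classify.py`.

```python
from typing import Callable, Dict

import numpy as np


def classify_letter(
    letter_img: np.ndarray,
    font_masks: Dict[str, np.ndarray],
    score_fn: Callable[[np.ndarray, np.ndarray], float],
) -> str:
    """Return the name of the font whose mask scores highest on the letter.

    Assumption: score_fn(image, mask) returns a float, higher is better.
    """
    if not font_masks:
        raise ValueError("font_masks must contain at least one font")
    # Score the same letter crop against every candidate font mask.
    scores = {name: score_fn(letter_img, mask) for name, mask in font_masks.items()}
    # Pick the font with the maximum score.
    return max(scores, key=scores.get)
```

Passing the scoring function in as a parameter makes it easy to swap scoring approaches without touching the loop.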

Scoring system

Each font's score is based on the average color (acolor) of the image pixels underneath that font's mask (acolor may differ from mask to mask).

After obtaining the acolor for a mask, the score is calculated as the sum of the per-pixel scores.
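
One possible way to compute acolor is a mask-weighted mean, sketched below. The variance here is an assumption: it is taken as the mask-weighted mean squared distance of each pixel's color from acolor, which may not match what `classify.py` does. The function name `masked_color_stats` is hypothetical.

```python
import numpy as np


def masked_color_stats(image: np.ndarray, mask: np.ndarray):
    """Return (acolor, v) for the pixels under a mask.

    image: float array of shape (H, W, C).
    mask:  float array of shape (H, W) with values in [0, 1].
    acolor: mask-weighted mean color, shape (C,).
    v: mask-weighted mean squared distance to acolor (assumed definition).
    """
    total = mask.sum()
    if total == 0:
        raise ValueError("mask is empty, average color is undefined")
    # Weight each pixel's color by its mask value, then normalise.
    acolor = (image * mask[..., None]).sum(axis=(0, 1)) / total
    # Squared color distance of every pixel from the average color.
    sq_dist = ((image - acolor) ** 2).sum(axis=-1)
    v = float((sq_dist * mask).sum() / total)
    return acolor, v
```

Using the mask values as weights means anti-aliased edge pixels contribute only partially to the average.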

For a given pixel (po in the original image and pm at the same position in the mask), its score is calculated as follows, where v is the color variance:

||po - acolor|| - ||v - acolor||
S_p = (|po - acolor| - v) × (0.5 - pm)

The font mask is assumed to have values in the range 0..1 and to be rendered as 'white on black' text (so 1 is where the font is).

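This score takes into account color variation where the letter should be, and also the fact that the background should be a different color. Inside the letter (pm close to 1) the factor (0.5 - pm) is negative, so a pixel contributes positively only when its distance from acolor is below v. In the background (pm close to 0) the factor is positive, so a pixel contributes positively only when its distance from acolor exceeds v.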
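
The per-pixel formula and its sum can be sketched with NumPy as follows. This is only an illustration of the formula as written above, assuming the color distance is the Euclidean norm over channels. The function name `font_score` is hypothetical, and acolor and v are passed in so that any definition of them can be tried.

```python
import numpy as np


def font_score(image: np.ndarray, mask: np.ndarray, acolor: np.ndarray, v: float) -> float:
    """Sum of S_p = (|po - acolor| - v) * (0.5 - pm) over all pixels.

    image:  float array of shape (H, W, C).
    mask:   float array of shape (H, W), 1 where the font is, 0 elsewhere.
    acolor: average color under the mask, shape (C,).
    v:      color variance (how it is measured is left to the caller).
    """
    # Per-pixel distance between the pixel color and the average color.
    dist = np.linalg.norm(image - acolor, axis=-1)
    # Negative weight inside the glyph, positive weight in the background.
    pixel_scores = (dist - v) * (0.5 - mask)
    return float(pixel_scores.sum())
```

For example, a white pixel on a black background scored with a mask that covers exactly that pixel gets a higher total than the same image scored with a mask placed on a background pixel.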

I seem to be missing something in the original idea, as some fonts get a better score on incorrect guesses with bigger color variance, and others get the smallest color variance on some other fonts.

Potential improvements

Some potential improvements would be:

  • Only consider pixels in the font and its outline. This might be helpful, as it would mean we don't care about pixels that are too far away, but assuming good bounding boxes, it probably won't give much better results (if any). Additionally, it raises the question of which pixels should be considered, as both the text and the mask are anti-aliased (thus having "weak" pixels).
  • Increase the area around the font. This can make sure we are not looking too far inwards, although it shouldn't matter, since we are classifying from a predefined set rather than searching at random; the potentially good information missed shouldn't matter that much (i.e. all scores will be 0.1 lower but the correct font will still be picked).
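
The first idea (font pixels plus their outline) could be tried with a small dilation of the thresholded mask, as in the sketch below. The `threshold` and `radius` parameters are assumptions: they are one possible answer to the question of which "weak" anti-aliased pixels count, not a recommendation. The function names are hypothetical, and the dilation uses a square window implemented with NumPy only.

```python
import numpy as np


def dilate(binary: np.ndarray, radius: int) -> np.ndarray:
    """Binary dilation with a (2*radius+1) square window, NumPy only."""
    h, w = binary.shape
    padded = np.pad(binary, radius, mode="constant", constant_values=False)
    out = np.zeros_like(binary)
    # OR together every shifted copy of the mask inside the window.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def region_of_interest(mask: np.ndarray, radius: int = 2, threshold: float = 0.5) -> np.ndarray:
    """Boolean region covering the glyph and an outline band around it.

    mask: float array in [0, 1]; pixels >= threshold count as part of the glyph.
    """
    return dilate(mask >= threshold, radius)
```

The resulting boolean array can be used to index the per-pixel scores, so that only pixels inside the region are summed.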