Our brains are wired to detect familiar shapes by noticing features such as lines, curves, and angles, and comparing them against stored memories.
Optical Character Recognition (OCR) is a technology that allows machines to interpret text from images, scanned documents, or photos.
When given an image that contains text, many OCR systems use a convolutional neural network (CNN) to pick out the shapes of the characters.
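A CNN is built mainly from two kinds of layers, convolution and pooling, which are usually repeated several times. Convolution layers slide small filters across the image to detect features such as lines, curves, and edges, and pooling layers shrink the resulting feature maps into more compact sizes. After these layers comes the flattening layer, which turns the feature maps into a single list of numbers that fully connected layers use to make the final prediction.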
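The sketch below walks one tiny image through convolution, pooling, and flattening in plain Python so the arithmetic is visible. The 6x6 image, the hand-picked edge kernel, and the helper names `convolve2d`, `max_pool2d`, and `flatten` are illustrative assumptions; in a real CNN the kernel values are learned during training rather than chosen by hand, and a framework like PyTorch does this work for you.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products.

    Like most deep-learning libraries, this does not flip the kernel
    (strictly speaking, it is cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + a][j + b] for a in range(size) for b in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


def flatten(feature_map):
    """Turn a 2D feature map into one long list of numbers."""
    return [value for row in feature_map for value in row]


# A 6x6 image: dark (0) in the first two columns, bright (1) from column 2 onward.
image = [[0, 0, 1, 1, 1, 1] for _ in range(6)]

# A hand-picked 3x3 kernel that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

features = convolve2d(image, vertical_edge)  # 4x4 feature map
pooled = max_pool2d(features)                # 2x2 after pooling
vector = flatten(pooled)                     # 4 numbers for the dense layers

print(features[0])  # [3, 3, 0, 0]
print(pooled)       # [[3, 0], [3, 0]]
print(vector)       # [3, 0, 3, 0]
```

The large values appear only where the kernel straddles the edge, and pooling keeps that signal while halving each side of the feature map.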
To build our CNN, we can use a popular framework such as PyTorch or TensorFlow, together with the MNIST dataset, which contains 70,000 small (28x28-pixel) grayscale images of handwritten digits from 0 to 9 that we can use to train and test the model.
We start by importing the libraries that load the dataset and help us build the CNN. We then define our convolutional neural network.
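When defining the network, one detail that often causes errors is the size of the flattened vector, because the first fully connected layer must know exactly how many numbers it receives. The sketch below traces a 28x28 MNIST image through a hypothetical stack of two convolution layers (5x5 kernels, padding 2) and two 2x2 max-pooling layers; the layer names, kernel sizes, and channel counts are assumptions chosen for illustration, not a required design.

```python
def output_size(size, kernel, stride=1, padding=0):
    """Side length of a square feature map after a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical layer stack for 28x28 grayscale MNIST digits:
# (name, kernel size, stride, padding, output channels or None for pooling)
layers = [
    ("conv1", 5, 1, 2, 16),
    ("pool1", 2, 2, 0, None),
    ("conv2", 5, 1, 2, 32),
    ("pool2", 2, 2, 0, None),
]

size, channels = 28, 1  # one grayscale channel in the input image
for name, kernel, stride, padding, out_channels in layers:
    size = output_size(size, kernel, stride, padding)
    if out_channels is not None:  # pooling keeps the channel count unchanged
        channels = out_channels
    print(f"{name}: {channels} x {size} x {size}")

flattened = channels * size * size
print("flattened features:", flattened)  # 32 * 7 * 7 = 1568
```

In PyTorch, that last number is what the first fully connected layer after flattening would expect, for example `nn.Linear(1568, 10)` for the ten digit classes.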
PyTorch's built-in CrossEntropyLoss measures the error, also called the loss, in the neural network's guesses: how far each prediction is from the correct answer. When the network is wrong, this error tells it how to adjust so that its later guesses move closer to the right answer.
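To see what this loss computes, here is cross-entropy for a single prediction written out in plain Python: the raw scores are turned into probabilities with softmax, and the loss is the negative log of the probability given to the correct class. The helper names and the three-class scores are hypothetical; a real MNIST model outputs ten scores per image.

```python
import math


def softmax(scores):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, target):
    """Loss = -log(probability the model gave to the correct class)."""
    probs = softmax(scores)
    return -math.log(probs[target])


# Hypothetical raw scores for classes 0, 1, 2.
confident_right = [4.0, 0.5, 0.1]  # strongly favours class 0
confident_wrong = [0.1, 0.5, 4.0]  # strongly favours class 2

true_label = 0
loss_right = cross_entropy(confident_right, true_label)
loss_wrong = cross_entropy(confident_wrong, true_label)
print(f"confident and right: {loss_right:.3f}")  # small loss
print(f"confident and wrong: {loss_wrong:.3f}")  # large loss
```

PyTorch's `nn.CrossEntropyLoss` does the same job on whole batches: it takes the raw scores (not probabilities) and the correct class indices, applies the softmax-and-log step internally, and by default averages the loss over the batch.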
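We can also choose the number of epochs (full passes through the training data) and the learning rate (how big a step the model takes each time it updates). More epochs usually improve accuracy at the cost of longer training. A smaller learning rate makes steadier but slower progress, while one that is too large can make training unstable.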
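The effect of these two settings is easy to see on a problem much smaller than MNIST. The sketch below fits a single weight w in y = w * x to three made-up points with gradient descent and compares a few learning rates and epoch counts; the data, the one-weight model, and the `train` helper are all assumptions for illustration.

```python
def train(learning_rate, epochs):
    """Fit y = w * x to tiny data with plain gradient descent; return w.

    Each epoch makes one update using all three points (full-batch
    gradient descent) and the mean squared error loss.
    """
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # the true answer is w = 2
    w = 0.0
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= learning_rate * grad  # step downhill, scaled by the learning rate
    return w


for lr, epochs in [(0.01, 20), (0.01, 200), (0.05, 20), (0.25, 20)]:
    print(f"lr={lr}, epochs={epochs}: w = {train(lr, epochs):.4f}")
```

With these numbers, the smallest learning rate is still noticeably short of 2 after 20 epochs but gets there after 200, the middle rate converges within 20 epochs, and the largest rate overshoots further on every step and diverges. Real networks behave less neatly, but the same trade-off applies.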
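An optimizer, such as stochastic gradient descent (SGD) or Adam, updates the model's parameters using the loss: it works out how each parameter affects the loss (its gradient) and nudges the parameter in the direction that lowers the loss.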
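For plain SGD, the simplest optimizer, the update for each parameter θ is shown below, where η is the learning rate and ∂L/∂θ is the gradient of the loss with respect to that parameter. Optimizers such as Adam build on this rule by adapting the step size for each parameter. The numbers in the worked step are made up for illustration.

```latex
\theta \leftarrow \theta - \eta \, \frac{\partial L}{\partial \theta}

% Worked step (illustrative numbers): \theta = 0.5,\ \eta = 0.1,\ \partial L / \partial \theta = 2.0
\theta \leftarrow 0.5 - 0.1 \times 2.0 = 0.3
```

In PyTorch, `torch.optim.SGD(model.parameters(), lr=0.1)` applies this rule to every parameter when `optimizer.step()` is called, after `loss.backward()` has computed the gradients.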
The author of this article is a 15-year-old high school student who is fascinated by the power of AI and its impact on our society.