Creating a custom OCR with Pytorch (2023)

  • Configure the data
  • Definition of our model
  • The CTC loss
  • The training loop
  • Putting everything together
    • Defining the hyperparameters
  • Evaluation and testing
  • Conclusion

In this tutorial I will walk you through the basic code needed to create a simple OCR. OCR, as you may know, stands for Optical Character Recognition or, loosely speaking, text recognition. Text recognition is one of the classic problems of computer vision and is still relevant today. One of its most important applications is the digitization of old manuscripts. Physical copies of books and manuscripts degrade: over time, printed characters begin to fade. A simple way to preserve these documents is to make a digital copy and store it in the cloud or on a local hard drive, which keeps them available. Scanning the document or taking a photo seems like a viable alternative, but if you want to search, retrieve, or edit the document, OCR is the way to go. It can also be used in forensics, for tasks related to handwriting recognition.

Okay, now that I've given you enough motivation for why OCR matters, let me show you how to build one. You can find the IPython notebook and the other dependencies in the repo, so if you want to run the code right away, do a quick git clone.

This blog assumes that you are familiar with Python and the PyTorch deep learning framework. If not, I recommend going through PyTorch's 60 Minute Blitz page. It provides a good introduction to the PyTorch framework. For more details, you should take the fast.ai course.

To get you started, I'll list some of the essential packages you need to create your first OCR. As mentioned before, we will work with PyTorch (version 1.5 here), as it is one of the most efficient deep learning libraries available. The other packages are as follows:

  • Matplotlib
  • tqdm
  • textdistance
  • lmdb

You can install them via pip or conda. I also provide a requirements.txt, which you can find in my GitHub repository. Just run pip install -r requirements.txt and you're good to go.

Configure the data

We start our project by importing the required libraries. But first we need data. You are free to use any data you like (as long as it is document related), and you may need to write your own data loader to do so. However, to keep things simple, we'll use a small package called trdg, a synthetic image generator for OCR. All relevant information about this package can be found in its GitHub repository. You can generate printed and handwritten text images and add various types of noise and degradation to them. In this project I used trdg to generate images of printed words in a single font. You can use any font you like: just download a .ttf file for your font and, when generating the word images, be sure to pass it as your font file via the -ft parameter.

You can generate the word pictures for training with the following commands:

trdg -i words.txt -c 20000 --output_dir data/train -ft your/fontfile

Here, -c refers to the number of word images you wish to generate. The words.txt file contains our input word vocabulary, while --output_dir and -ft refer to the output directory and the font file respectively. Likewise, you can generate test word images to assess your OCR's performance. Make sure, however, that the words used for training and testing are mutually exclusive.
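One way to keep the training and test vocabularies disjoint is to split the word list before calling trdg. The snippet below is a minimal sketch and not part of the original project: the function name split_vocabulary, the 10% test fraction, and the seed are assumptions chosen for illustration.

```python
import random

def split_vocabulary(words, test_fraction=0.1, seed=0):
    """Split a word list into disjoint train and test vocabularies."""
    unique = sorted(set(words))          # drop duplicates, fix the order
    rng = random.Random(seed)            # local RNG keeps the split reproducible
    rng.shuffle(unique)
    n_test = max(1, int(len(unique) * test_fraction))
    return unique[n_test:], unique[:n_test]

train_words, test_words = split_vocabulary(
    ["apple", "banana", "cherry", "date", "elder", "fig", "grape", "apple"])
# Each list could then be written to its own file and passed to trdg via -i.
```

Because the split happens on the set of unique words, no word can end up in both files.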

Now that we've generated the word images, let's display some of them using matplotlib.
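The original notebook leaves this display step open, so here is one possible sketch. The helper names pick_sample_paths and show_samples, the grid layout, and the data/train folder are assumptions. matplotlib and Pillow are imported inside the plotting function so that the path helper can be used on its own.

```python
import os
import random

def pick_sample_paths(folder, n=8, seed=0):
    """Return up to n image paths chosen at random from folder."""
    names = sorted(f for f in os.listdir(folder)
                   if f.lower().endswith((".jpg", ".jpeg", ".png")))
    random.Random(seed).shuffle(names)
    return [os.path.join(folder, f) for f in names[:n]]

def show_samples(folder, n=8, columns=2):
    """Plot a grid of word images; each file name is used as the title."""
    # Imported lazily so the path helper above works without these packages.
    import matplotlib.pyplot as plt
    from PIL import Image

    paths = pick_sample_paths(folder, n)
    if not paths:
        return
    rows = (len(paths) + columns - 1) // columns
    fig = plt.figure(figsize=(4 * columns, 1.5 * rows))
    for i, path in enumerate(paths, start=1):
        ax = fig.add_subplot(rows, columns, i)
        ax.imshow(Image.open(path), cmap="gray")
        ax.set_title(os.path.basename(path))
        ax.axis("off")
    plt.show()

# Example call (assumes the output folder used in the trdg command):
# show_samples("data/train")
```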

Let's start by importing the libraries that we would need to create our OCR

    import os
    import sys
    import pdb
    import six
    import random
    import lmdb
    from PIL import Image
    import numpy as np
    import math
    from collections import OrderedDict
    from itertools import chain
    import logging

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torch.utils.data import Dataset
    from torch.utils.data import sampler
    import torchvision.transforms as transforms
    from torch.optim.lr_scheduler import CosineAnnealingLR, StepLR
    from torch.nn.utils.clip_grad import clip_grad_norm_
    from torch.utils.data import random_split

    # Project-local helpers from the accompanying repository
    from src.utils.utils import AverageMeter, Eval, OCRLabelConverter
    from src.utils.utils import EarlyStopping, gmkdir
    from src.optim.optimizer import STLR
    from src.utils.utils import gaussian
    from tqdm import *

Next, let's create our data pipeline. We do this by inheriting from the PyTorch Dataset class. The Dataset class has a few methods that we need to implement, namely the __len__ and __getitem__ methods. The __len__ method returns the number of items in our dataset, while __getitem__ returns the data item for the index passed to it. For more information about the PyTorch Dataset class, see the official PyTorch documentation page.

You'll notice that we first convert each image to grayscale and then to a tensor. We then normalize the images so that our input data falls within the range [-1, 1]. We put all these transformations in a list and pass it to the transforms.Compose function provided by PyTorch (torchvision). The transforms.Compose function applies each transformation in the given order.
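To see why these settings give a range of [-1, 1], the arithmetic can be traced for single pixels. This is a plain-Python sketch of what the two transforms do to one value, not the torchvision code itself; the helper names are made up for illustration.

```python
def to_unit_range(pixel):
    """Mimic ToTensor for one 8-bit pixel: scale 0..255 to 0.0..1.0."""
    return pixel / 255.0

def normalize(value, mean=0.5, std=0.5):
    """Mimic Normalize((0.5,), (0.5,)): subtract the mean, divide by the std."""
    return (value - mean) / std

for pixel in (0, 128, 255):
    print(pixel, "->", round(normalize(to_unit_range(pixel)), 3))
```

Black pixels land at -1, white pixels at 1, and mid-gray close to 0.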

    class SynthDataset(Dataset):
        def __init__(self, opt):
            super(SynthDataset, self).__init__()
            self.path = os.path.join(opt['path'], opt['imgdir'])
            self.images = os.listdir(self.path)
            self.nSamples = len(self.images)
            f = lambda x: os.path.join(self.path, x)
            self.imagepaths = list(map(f, self.images))
            # Grayscale -> tensor in [0, 1] -> normalized to [-1, 1]
            transform_list = [transforms.Grayscale(1),
                              transforms.ToTensor(),
                              transforms.Normalize((0.5,), (0.5,))]
            self.transform = transforms.Compose(transform_list)
            self.collate_fn = SynthCollator()

        def __len__(self):
            return self.nSamples

        def __getitem__(self, index):
            # Valid indices run from 0 to len(self) - 1
            assert index < len(self), 'index range error'
            imagepath = self.imagepaths[index]
            imagefile = os.path.basename(imagepath)
            img = Image.open(imagepath)
            if self.transform is not None:
                img = self.transform(img)
            item = {'img': img, 'idx': index}
            # The label is the part of the file name before the first underscore
            item['label'] = imagefile.split('_')[0]
            return item
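The dataset above takes each label from the image's file name: everything before the first underscore. The small sketch below isolates that step. The function name label_from_path and the example paths are assumptions for illustration.

```python
import os

def label_from_path(image_path):
    """Return the text before the first underscore in the file name."""
    image_file = os.path.basename(image_path)
    return image_file.split("_")[0]

print(label_from_path("data/train/hello_42.jpg"))
```

One consequence of this convention is that a word containing an underscore would be cut short, so such words are best kept out of the vocabulary.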

Since we will train our model with mini-batch gradient descent, it is important that every image in the batch has the same shape and size. For this we define the SynthCollator class, which first finds the image with the maximum width in the batch and then pads the remaining images to that width. You might be wondering why we don't worry about the height: when generating the images with the trdg package, I fixed the height at 32 pixels.

    class SynthCollator(object):
        def __call__(self, batch):
            width = [item['img'].shape[2] for item in batch]
            indexes = [item['idx'] for item in batch]
            # Batch tensor of shape [B, C, H, max width]. The constructor name was
            # lost in the source; torch.ones (white padding after normalization)
            # is assumed here.
            imgs = torch.ones([len(batch), batch[0]['img'].shape[0],
                               batch[0]['img'].shape[1], max(width)],
                              dtype=torch.float32)
            for idx, item in enumerate(batch):
                try:
                    # Copy each image into the left part of its padded slot
                    imgs[idx, :, :, 0:item['img'].shape[2]] = item['img']
                except:
                    print(imgs.shape)
            item = {'img': imgs, 'idx': indexes}
            if 'label' in batch[0].keys():
                labels = [item['label'] for item in batch]
                item['label'] = labels
            return item
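The padding idea can also be shown without tensors. The following sketch uses nested lists in place of images. The function name pad_to_max_width and the fill value of 1.0 (white after normalization) are assumptions for illustration.

```python
def pad_to_max_width(images, fill=1.0):
    """Right-pad each image (a list of rows) to the widest image in the batch."""
    max_width = max(len(img[0]) for img in images)
    padded = []
    for img in images:
        extra = max_width - len(img[0])
        padded.append([row + [fill] * extra for row in img])
    return padded

batch = [
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],   # 2 rows x 3 columns
    [[0.7], [0.8]],                       # 2 rows x 1 column
]
padded = pad_to_max_width(batch)
```

After padding, every image has three columns, so the batch can be stacked into a single array.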

Definition of our model

Now let's define our model. We use the CNN-LSTM based architecture proposed by Shi et al. in their excellent paper An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition. The authors used it for scene text recognition and showed through extensive experiments that it achieved significant accuracy gains over all other methods existing at the time.

[Figure: architecture of the network proposed by Shi et al.]

The figure above shows the architecture used in the paper. The authors used a 7-layer convolutional network with BatchNorm and ReLU, followed by a stacked RNN consisting of two bidirectional LSTM layers. The convolutional layers act as feature extractors, while the LSTM layers act as sequence classifiers. The LSTM layers output the probability of each output class at every time step. You can find more details in their paper, and I strongly encourage you to read it for a better understanding.

The following code snippet was taken from this GitHub repository, which provides a PyTorch implementation of their code.

    class BidirectionalLSTM(nn.Module):
        def __init__(self, nIn, nHidden, nOut):
            super(BidirectionalLSTM, self).__init__()
            self.rnn = nn.LSTM(nIn, nHidden, bidirectional=True)
            self.embedding = nn.Linear(nHidden * 2, nOut)

        def forward(self, input):
            self.rnn.flatten_parameters()
            recurrent, _ = self.rnn(input)
            T, b, h = recurrent.size()
            t_rec = recurrent.view(T * b, h)
            output = self.embedding(t_rec)  # [T * b, nOut]
            output = output.view(T, b, -1)
            return output


    class CRNN(nn.Module):
        def __init__(self, opt, leakyRelu=False):
            super(CRNN, self).__init__()
            assert opt['imgH'] % 16 == 0, 'imgH has to be a multiple of 16'
            ks = [3, 3, 3, 3, 3, 3, 2]   # kernel sizes
            ps = [1, 1, 1, 1, 1, 1, 0]   # paddings
            ss = [1, 1, 1, 1, 1, 1, 1]   # strides
            nm = [64, 128, 256, 256, 512, 512, 512]   # output channels
            cnn = nn.Sequential()

            def convRelu(i, batchNormalization=False):
                nIn = opt['nChannels'] if i == 0 else nm[i - 1]
                nOut = nm[i]
                cnn.add_module('conv{0}'.format(i),
                               nn.Conv2d(nIn, nOut, ks[i], ss[i], ps[i]))
                if batchNormalization:
                    cnn.add_module('batchnorm{0}'.format(i), nn.BatchNorm2d(nOut))
                if leakyRelu:
                    cnn.add_module('relu{0}'.format(i),
                                   nn.LeakyReLU(0.2, inplace=True))
                else:
                    cnn.add_module('relu{0}'.format(i), nn.ReLU(True))

            # Shape comments are channels x height x width for a 32 x W input
            convRelu(0)
            cnn.add_module('pooling{0}'.format(0), nn.MaxPool2d(2, 2))  # 64 x 16 x W/2
            convRelu(1)
            cnn.add_module('pooling{0}'.format(1), nn.MaxPool2d(2, 2))  # 128 x 8 x W/4
            convRelu(2, True)
            convRelu(3)
            cnn.add_module('pooling{0}'.format(2),
                           nn.MaxPool2d((2, 2), (2, 1), (0, 1)))  # 256 x 4 x (W/4 + 1)
            convRelu(4, True)
            convRelu(5)
            cnn.add_module('pooling{0}'.format(3),
                           nn.MaxPool2d((2, 2), (2, 1), (0, 1)))  # 512 x 2 x (W/4 + 2)
            convRelu(6, True)  # 512 x 1 x (W/4 + 1)
            self.cnn = cnn
            self.rnn = nn.Sequential(
                BidirectionalLSTM(opt['nHidden'] * 2, opt['nHidden'], opt['nHidden']),
                BidirectionalLSTM(opt['nHidden'], opt['nHidden'], opt['nClasses']))

        def forward(self, input):
            # conv features
            conv = self.cnn(input)
            b, c, h, w = conv.size()
            assert h == 1, "the height of conv must be 1"
            conv = conv.squeeze(2)
            conv = conv.permute(2, 0, 1)  # [w, b, c]
            # rnn features
            output = self.rnn(conv)
            output = output.transpose(1, 0)  # [T, b, h] to [b, T, h]
            return output
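To check how many time steps the CNN hands to the LSTM layers, the kernel, stride, and padding values from the listing can be plugged into the standard output-size formula. This is a plain-Python sketch with made-up helper names. It tracks only height and width, not the channel count.

```python
def out_size(size, kernel, stride, padding):
    """Output length of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

def crnn_feature_size(height, width):
    """Trace (height, width) through the seven conv layers and four poolings."""
    h, w = height, width
    conv_kernels = [3, 3, 3, 3, 3, 3, 2]
    conv_paddings = [1, 1, 1, 1, 1, 1, 0]
    # Pooling applied after conv layers 0, 1, 3 and 5:
    # (kernel_h, kernel_w), (stride_h, stride_w), (pad_h, pad_w)
    pools = {
        0: ((2, 2), (2, 2), (0, 0)),
        1: ((2, 2), (2, 2), (0, 0)),
        3: ((2, 2), (2, 1), (0, 1)),
        5: ((2, 2), (2, 1), (0, 1)),
    }
    for i in range(7):
        h = out_size(h, conv_kernels[i], 1, conv_paddings[i])
        w = out_size(w, conv_kernels[i], 1, conv_paddings[i])
        if i in pools:
            (kh, kw), (sh, sw), (ph, pw) = pools[i]
            h = out_size(h, kh, sh, ph)
            w = out_size(w, kw, sw, pw)
    return h, w

print(crnn_feature_size(32, 100))
```

For a 32 x 100 input the feature map has height 1 and width 26, so the recurrent layers see a sequence of 26 time steps. The height of 1 is what the assertion in the forward pass relies on.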

The CTC loss

Cool, now that we have our data pipeline and model ready, it's time to define our loss function, which in our case is the CTC loss. We will use PyTorch's excellent CTC implementation. CTC stands for Connectionist Temporal Classification and was proposed by Alex Graves et al. in the paper Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks.

Honestly, the above work was a game changer for many sequence-based tasks such as speech and text recognition. In all sequence-based tasks, it is important that the input and the output labels are properly aligned. Proper alignment allows an efficient loss computation between the network predictions and the expected output. In segmentation-based approaches, that is, when the input word or line has been segmented into its constituent characters, there is a direct one-to-one mapping between the segmented character images and the output labels. However, as you can imagine, obtaining these segmentations for every character can be a very tedious and time-consuming task. Therefore, CTC-based transcription layers have become the de facto choice for OCR and speech recognition engines, as they allow the loss to be computed without an explicit mapping between input and output. The CTC layer takes the output of the LSTMs and computes a score over all possible alignments of the target labels. The OCR is then trained to predict a sequence that maximizes the sum of all these scores.
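The phrase "sum over all possible alignments" can be made concrete with a tiny brute-force example. This is an illustrative sketch only. Real implementations, including PyTorch's, use dynamic programming rather than enumerating every path, and the function names here are made up.

```python
from itertools import product

BLANK = 0

def collapse(path):
    """CTC collapse: merge repeated symbols, then remove blanks."""
    merged = [s for i, s in enumerate(path) if i == 0 or s != path[i - 1]]
    return tuple(s for s in merged if s != BLANK)

def ctc_probability(probs, target):
    """Sum the probability of every alignment that collapses to target.

    probs[t][k] is the probability of class k at time step t.
    """
    num_classes = len(probs[0])
    total = 0.0
    for path in product(range(num_classes), repeat=len(probs)):
        if collapse(path) == tuple(target):
            p = 1.0
            for t, k in enumerate(path):
                p *= probs[t][k]
            total += p
    return total

# Two time steps, two classes (0 = blank, 1 = 'a'), uniform probabilities.
probs = [[0.5, 0.5], [0.5, 0.5]]
p_a = ctc_probability(probs, [1])   # paths (0,1), (1,0), (1,1) all collapse to 'a'
```

Three of the four equally likely paths collapse to "a", so its probability is 0.75. The CTC loss is the negative logarithm of this quantity.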

If you want more complete details about the CTC layer, I suggest you check out the following blogs and lecture videos

    class CustomCTCLoss(torch.nn.Module):
        # T x B x H => Softmax on dimension 2
        def __init__(self, dim=2):
            super().__init__()
            self.dim = dim
            self.ctc_loss = torch.nn.CTCLoss(reduction='mean', zero_infinity=True)

        def forward(self, logits, labels, prediction_sizes, target_sizes):
            loss = self.ctc_loss(logits, labels, prediction_sizes, target_sizes)
            loss = self.sanitize(loss)
            return self.debug(loss, logits, labels, prediction_sizes, target_sizes)

        def sanitize(self, loss):
            # Replace an infinite or NaN loss with zero. math.isinf is used because
            # comparing abs(loss - inf) against a tolerance never detects infinity.
            if math.isinf(loss.item()):
                return torch.zeros_like(loss)
            if math.isnan(loss.item()):
                return torch.zeros_like(loss)
            return loss

        def debug(self, loss, logits, labels, prediction_sizes, target_sizes):
            if math.isnan(loss.item()):
                print("Loss:", loss)
                print("logits:", logits)
                print("labels:", labels)
                print("prediction_sizes:", prediction_sizes)
                print("target_sizes:", target_sizes)
                raise Exception("NaN loss obtained. But why?")
            return loss

The training loop

The code snippet above creates a wrapper around PyTorch's CTC loss function. Basically, it computes the loss and passes it through an additional method called debug, which checks for cases where the loss becomes NaN.

Shout out to Jerin Phillip for this code.

So far we have defined all the important components we need to create our OCR: the data pipeline, our model, and the loss function. Now it's time to talk about our training loop. The code below might look a bit complicated, but it provides a nice abstraction that is quite intuitive and easy to use. It is based on PyTorch Lightning's boilerplate template, with a few modifications of my own. :P

I'll give you a basic overview of what it does. Feel free to inspect each method with the Python debugger. The OCRTrainer class takes the training and validation data. It also takes the loss function, the optimizer, and the number of epochs for which to train the model. The train and validation dataloader methods return the data loaders for the training and validation data. The _run_batch method runs a forward pass on a batch of image-label pairs and returns the loss as well as the character and word accuracy. Then we have the step functions (training_step and step), which perform backpropagation, compute the gradients, and update the parameters for each batch of data. Finally, the train_end and validation_end methods compute the average loss and accuracy over all batches once a single epoch completes.

The methods defined are quite simple and I hope you get the hang of it quickly.

    class OCRTrainer(object):
        def __init__(self, opt):
            super(OCRTrainer, self).__init__()
            self.data_train = opt['data_train']
            self.data_val = opt['data_val']
            self.model = opt['model']
            self.criterion = opt['criterion']
            self.optimizer = opt['optimizer']
            self.schedule = opt['schedule']
            self.converter = OCRLabelConverter(opt['alphabet'])
            self.evaluator = Eval()
            print('Scheduling is {}'.format(self.schedule))
            self.scheduler = CosineAnnealingLR(self.optimizer, T_max=opt['epochs'])
            self.batch_size = opt['batch_size']
            self.count = opt['epoch']
            self.epochs = opt['epochs']
            self.cuda = opt['cuda']
            self.collate_fn = opt['collate_fn']
            self.init_meters()

        def init_meters(self):
            self.avgTrainLoss = AverageMeter("Train loss")
            self.avgTrainCharAccuracy = AverageMeter("Train Character Accuracy")
            self.avgTrainWordAccuracy = AverageMeter("Train Word Accuracy")
            self.avgValLoss = AverageMeter("Validation loss")
            self.avgValCharAccuracy = AverageMeter("Validation Character Accuracy")
            self.avgValWordAccuracy = AverageMeter("Validation Word Accuracy")

        def forward(self, x):
            logits = self.model(x)
            return logits.transpose(1, 0)

        def loss_fn(self, logits, targets, pred_sizes, target_sizes):
            loss = self.criterion(logits, targets, pred_sizes, target_sizes)
            return loss

        def step(self):
            # Clip the gradient norm before the optimizer update
            self.max_grad_norm = 0.05
            clip_grad_norm_(self.model.parameters(), self.max_grad_norm)
            self.optimizer.step()

        def schedule_lr(self):
            if self.schedule:
                self.scheduler.step()

        def _run_batch(self, batch, report_accuracy=False, validation=False):
            input_, targets = batch['img'], batch['label']
            targets, lengths = self.converter.encode(targets)
            logits = self.forward(input_)
            logits = logits.contiguous().cpu()
            logits = torch.nn.functional.log_softmax(logits, 2)
            T, B, H = logits.size()
            # Every sample in the batch has the same number of time steps T
            pred_sizes = torch.LongTensor([T for i in range(B)])
            targets = targets.view(-1).contiguous()
            loss = self.loss_fn(logits, targets, pred_sizes, lengths)
            ca, wa = None, None
            if report_accuracy:
                probs, preds = logits.max(2)
                preds = preds.transpose(1, 0).contiguous().view(-1)
                sim_preds = self.converter.decode(preds.data, pred_sizes.data, raw=False)
                ca = np.mean((list(map(self.evaluator.char_accuracy, list(zip(sim_preds, batch['label']))))))
                wa = np.mean((list(map(self.evaluator.word_accuracy, list(zip(sim_preds, batch['label']))))))
            return loss, ca, wa

        def run_epoch(self, validation=False):
            if not validation:
                loader = self.train_dataloader()
                pbar = tqdm(loader, desc='Epoch: [%d]/[%d] Training' % (self.count, self.epochs), leave=True)
                self.model.train()
            else:
                loader = self.val_dataloader()
                pbar = tqdm(loader, desc='Validating', leave=True)
                self.model.eval()
            outputs = []
            for batch_nb, batch in enumerate(pbar):
                if not validation:
                    output = self.training_step(batch)
                else:
                    output = self.validation_step(batch)
                pbar.set_postfix(output)
                outputs.append(output)
            self.schedule_lr()
            if not validation:
                result = self.train_end(outputs)
            else:
                result = self.validation_end(outputs)
            return result

        def training_step(self, batch):
            loss, ca, wa = self._run_batch(batch, report_accuracy=True)
            self.optimizer.zero_grad()
            loss.backward()
            self.step()
            output = OrderedDict({
                'loss': abs(loss.item()),
                'train_ca': ca.item(),
                'train_wa': wa.item()
            })
            return output

        def validation_step(self, batch):
            loss, ca, wa = self._run_batch(batch, report_accuracy=True, validation=True)
            output = OrderedDict({
                'val_loss': abs(loss.item()),
                'val_ca': ca.item(),
                'val_wa': wa.item()
            })
            return output

        def train_dataloader(self):
            loader = torch.utils.data.DataLoader(self.data_train,
                                                 batch_size=self.batch_size,
                                                 collate_fn=self.collate_fn,
                                                 shuffle=True)
            return loader

        def val_dataloader(self):
            loader = torch.utils.data.DataLoader(self.data_val,
                                                 batch_size=self.batch_size,
                                                 collate_fn=self.collate_fn)
            return loader

        def train_end(self, outputs):
            for output in outputs:
                self.avgTrainLoss.add(output['loss'])
                self.avgTrainCharAccuracy.add(output['train_ca'])
                self.avgTrainWordAccuracy.add(output['train_wa'])
            train_loss_mean = abs(self.avgTrainLoss.compute())
            train_ca_mean = self.avgTrainCharAccuracy.compute()
            train_wa_mean = self.avgTrainWordAccuracy.compute()
            result = {'train_loss': train_loss_mean, 'train_ca': train_ca_mean,
                      'train_wa': train_wa_mean}
            return result

        def validation_end(self, outputs):
            for output in outputs:
                self.avgValLoss.add(output['val_loss'])
                self.avgValCharAccuracy.add(output['val_ca'])
                self.avgValWordAccuracy.add(output['val_wa'])
            val_loss_mean = abs(self.avgValLoss.compute())
            val_ca_mean = self.avgValCharAccuracy.compute()
            val_wa_mean = self.avgValWordAccuracy.compute()
            result = {'val_loss': val_loss_mean, 'val_ca': val_ca_mean,
                      'val_wa': val_wa_mean}
            return result
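The per-epoch averaging done by the meter objects in the trainer can be pictured with a few lines of plain Python. The author's own meter lives in the project's utils file and is not shown in this post, so the class below is a hypothetical stand-in. The name RunningAverage and its details are assumptions.

```python
class RunningAverage:
    """Accumulate values batch by batch and report their mean."""

    def __init__(self, name):
        self.name = name
        self.total = 0.0
        self.count = 0

    def add(self, value):
        self.total += value
        self.count += 1

    def compute(self):
        # Return 0.0 for an empty meter instead of dividing by zero.
        return self.total / self.count if self.count else 0.0

meter = RunningAverage("Train loss")
for batch_loss in (0.9, 0.7, 0.5):
    meter.add(batch_loss)
epoch_loss = meter.compute()
```

Note that a meter like this keeps accumulating across epochs unless it is reset, so it reports a running mean over everything it has seen.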

Putting everything together

Finally, we have the Learner class. It implements a couple more methods such as save and load. It also tracks the losses and writes them to a CSV file, which is useful when we want to analyze the behavior of our training and validation loops. Its fit method initializes our OCRTrainer module with the necessary hyperparameters and then runs the training loop.

In addition to these, we have several helpers such as OCRLabelConverter, Eval, and AverageMeter. I'm not including them in this notebook; I've written them in a separate utils file (the src.utils.utils module imported earlier) and import them from there. If you want to take a look, you can poke around in that file. All the required documentation is included in the file itself.
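Since the label converter is not shown in the post, here is a rough idea of what the encoding step might look like. This is a hypothetical sketch, not the author's OCRLabelConverter. It assumes class ID 0 is reserved for the CTC blank and that a batch of words is flattened into one list of IDs plus a list of lengths, which is the form the trainer passes to the loss.

```python
class SimpleLabelEncoder:
    """Map characters to integer class IDs, reserving 0 for the CTC blank."""

    def __init__(self, alphabet):
        self.char_to_id = {ch: i + 1 for i, ch in enumerate(alphabet)}
        self.id_to_char = {i: ch for ch, i in self.char_to_id.items()}

    def encode(self, words):
        """Return one flat list of IDs plus the length of each word."""
        flat = [self.char_to_id[ch] for word in words for ch in word]
        lengths = [len(word) for word in words]
        return flat, lengths

encoder = SimpleLabelEncoder("abc")
ids, lengths = encoder.encode(["cab", "ba"])
```

The lengths are what allow the loss function to cut the flat target list back into individual words.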

    class Learner(object):
        def __init__(self, model, optimizer, savepath=None, resume=False):
            self.model = model
            self.optimizer = optimizer
            self.savepath = os.path.join(savepath, 'best.ckpt')
            self.cuda = torch.cuda.is_available()
            self.cuda_count = torch.cuda.device_count()
            if self.cuda:
                self.model = self.model.cuda()
            self.epoch = 0
            if self.cuda_count > 1:
                print("Let's use", torch.cuda.device_count(), "GPUs!")
                self.model = nn.DataParallel(self.model)
            self.best_score = None
            if resume and os.path.exists(self.savepath):
                self.checkpoint = torch.load(self.savepath)
                self.epoch = self.checkpoint['epoch']
                self.best_score = self.checkpoint['best']
                self.load()
            else:
                print('checkpoint does not exist')

        def fit(self, opt):
            opt['cuda'] = self.cuda
            opt['model'] = self.model
            opt['optimizer'] = self.optimizer
            # One CSV line of losses and accuracies is logged per epoch
            logging.basicConfig(filename="%s/%s.csv" % (opt['log_dir'], opt['name']),
                                level=logging.INFO)
            # EarlyStopping is a project helper; the best_score keyword is reconstructed
            self.saver = EarlyStopping(self.savepath, patience=15, verbose=True,
                                       best_score=self.best_score)
            opt['epoch'] = self.epoch
            trainer = OCRTrainer(opt)
            for epoch in range(opt['epoch'], opt['epochs']):
                train_result = trainer.run_epoch()
                val_result = trainer.run_epoch(validation=True)
                trainer.count = epoch
                info = '%d, %.6f, %.6f, %.6f, %.6f, %.6f, %.6f' % (
                    epoch, train_result['train_loss'], val_result['val_loss'],
                    train_result['train_ca'], val_result['val_ca'],
                    train_result['train_wa'], val_result['val_wa'])
                logging.info(info)
                self.val_loss = val_result['val_loss']
                print(self.val_loss)
                if self.savepath:
                    self.save(epoch)
                if self.saver.early_stop:
                    print("Early stopping")
                    break

        def load(self):
            print('Loading checkpoint at {} trained for {} epochs'.format(
                self.savepath, self.checkpoint['epoch']))
            self.model.load_state_dict(self.checkpoint['state_dict'])
            if 'opt_state_dict' in self.checkpoint.keys():
                print('Loading optimizer')
                self.optimizer.load_state_dict(self.checkpoint['opt_state_dict'])

        def save(self, epoch):
            self.saver(self.val_loss, epoch, self.model, self.optimizer)
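The early-stopping helper used in the fit method also comes from the project's utils file and is not listed here. The sketch below shows the general patience-based idea in plain Python. The class name PatienceStopper and its interface are assumptions and will differ from the author's helper, which also saves the checkpoint.

```python
class PatienceStopper:
    """Signal a stop when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=15):
        self.patience = patience
        self.best_loss = None
        self.bad_epochs = 0
        self.early_stop = False

    def __call__(self, val_loss):
        """Return True when this epoch is the best so far (a checkpoint should be saved)."""
        if self.best_loss is None or val_loss < self.best_loss:
            self.best_loss = val_loss
            self.bad_epochs = 0
            return True
        self.bad_epochs += 1
        if self.bad_epochs >= self.patience:
            self.early_stop = True
        return False

stopper = PatienceStopper(patience=2)
improved = [stopper(loss) for loss in (1.0, 0.8, 0.9, 0.85)]
```

With a patience of 2, two consecutive epochs without a new best loss are enough to set the early_stop flag.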

Defining the hyperparameters

We've come a long way now, and there's just one more hurdle to clear before we start training our model. We begin by defining our vocabulary, i.e. the set of characters that serve as the output classes for our model. We define a suitable name for this experiment, which also serves as the name of the folder where checkpoints and log files are stored. We also define hyperparameters such as the batch size, learning rate, image height, and number of channels.

Next, we initialize our dataset class and split the data into training and validation sets. Then we initialize our model and the CTC loss, and finally we call the learner.fit function.

Once training is over, we can find the saved model in the checkpoints/name folder. We can load the model and evaluate its performance on test data, or fine-tune it on some other data.

    alphabet = """Only thewigsofrcvdampbkuq.$A-210xT5'MDL,RYHJ"ISPWENj&BC93VGFKz();#:!7U64Q8?+*ZX/%"""
    args = {
        'name': 'exp1',
        'path': 'data',
        'imgdir': 'train',
        'imgH': 32,
        'nChannels': 1,
        'nHidden': 256,
        'nClasses': len(alphabet),
        'lr': 0.001,
        'epochs': 4,
        'batch_size': 32,
        'save_dir': 'checkpoints',
        'log_dir': 'logs',
        'resume': False,
        'cuda': False,
        'schedule': False
    }

    data = SynthDataset(args)
    args['collate_fn'] = SynthCollator()
    # 80/20 split into training and validation sets
    train_split = int(0.8 * len(data))
    val_split = len(data) - train_split
    args['data_train'], args['data_val'] = random_split(data, (train_split, val_split))
    print('Traindata size:{}\nVal data size:{}'.format(
        len(args['data_train']), len(args['data_val'])))
    args['alphabet'] = alphabet
    model = CRNN(args)
    args['criterion'] = CustomCTCLoss()
    savepath = os.path.join(args['save_dir'], args['name'])
    gmkdir(savepath)
    gmkdir(args['log_dir'])
    optimizer = torch.optim.Adam(model.parameters(), lr=args['lr'])
    learner = Learner(model, optimizer, savepath=savepath, resume=args['resume'])
    learner.fit(args)

Evaluation and testing

Once our model is trained, we can evaluate its performance on the test data. I wrote a separate function, get_accuracy, which takes the trained model and the test data and runs a forward pass that gives us the logits. Once we have the logits, we take the argmax at each time step, which we treat as our predicted class. Finally, we perform a decode operation that converts the predicted class IDs into their respective characters. We compare the predicted string with its corresponding ground truth, which gives us the accuracy. We do this for all images in our test data and compute the mean accuracy.
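The argmax-then-decode step can be sketched in plain Python as follows. This is the standard greedy CTC decoding rule (merge repeated predictions, then drop blanks) written for illustration. The function name, the blank ID of 0, and the ID-to-character mapping are assumptions and may not match the project's converter.

```python
def greedy_decode(scores, alphabet, blank=0):
    """Argmax per time step, merge repeats, drop blanks, map IDs to characters.

    scores[t][k] is the score of class k at time step t; class IDs 1..N map
    to alphabet[0..N-1] and ID `blank` is the CTC blank.
    """
    best = [max(range(len(step)), key=step.__getitem__) for step in scores]
    chars = []
    previous = blank
    for k in best:
        if k != blank and k != previous:
            chars.append(alphabet[k - 1])
        previous = k
    return "".join(chars)

scores = [
    [0.1, 0.8, 0.1],    # 'a'
    [0.1, 0.7, 0.2],    # 'a' again, merged with the previous step
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.7, 0.1],    # 'a' after a blank, kept as a new character
    [0.1, 0.2, 0.7],    # 'b'
]
text = greedy_decode(scores, "ab")
```

The blank between the second and third "a" is what lets the decoder output a doubled letter, so this example decodes to "aab".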

We also display 20 random images of our test data with their corresponding predicted label using matplotlib library

    import matplotlib.pyplot as plt
    from torchvision.utils import make_grid

    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    def get_accuracy(args):
        loader = torch.utils.data.DataLoader(args['data'],
                                             batch_size=args['batch_size'],
                                             collate_fn=args['collate_fn'])
        model = args['model']
        model.eval()
        converter = OCRLabelConverter(args['alphabet'])
        evaluator = Eval()
        labels, predictions, images = [], [], []
        for iteration, batch in enumerate(tqdm(loader)):
            input_, targets = batch['img'].to(device), batch['label']
            # Keep CPU copies of the images for plotting
            images.extend(input_.squeeze().detach().cpu())
            labels.extend(targets)
            targets, lengths = converter.encode(targets)
            logits = model(input_).transpose(1, 0)
            logits = torch.nn.functional.log_softmax(logits, 2)
            logits = logits.contiguous().cpu()
            T, B, H = logits.size()
            pred_sizes = torch.LongTensor([T for i in range(B)])
            # Greedy decoding: most likely class at each time step
            probs, pos = logits.max(2)
            pos = pos.transpose(1, 0).contiguous().view(-1)
            sim_preds = converter.decode(pos.data, pred_sizes.data, raw=False)
            predictions.extend(sim_preds)

        # Plot a 5 x 4 grid of test images titled with their predictions
        fig = plt.figure(figsize=(8, 8))
        columns = 4
        rows = 5
        for i in range(1, columns * rows + 1):
            img = images[i]
            img = (img - img.min()) / (img.max() - img.min())
            img = np.array(img * 255.0, dtype=np.uint8)
            fig.add_subplot(rows, columns, i)
            plt.title(predictions[i])
            plt.axis('off')
            plt.imshow(img)
        plt.show()
        ca = np.mean((list(map(evaluator.char_accuracy, list(zip(predictions, labels))))))
        wa = np.mean((list(map(evaluator.word_accuracy_line, list(zip(predictions, labels))))))
        return ca, wa

    args['imgdir'] = 'test'
    args['data'] = SynthDataset(args)
    resume_file = os.path.join(args['save_dir'], args['name'], 'best.ckpt')
    if os.path.isfile(resume_file):
        print('Loading model %s' % resume_file)
        checkpoint = torch.load(resume_file)
        model.load_state_dict(checkpoint['state_dict'])
        args['model'] = model
        ca, wa = get_accuracy(args)
        print("Character Accuracy: %.2f\nWord Accuracy: %.2f" % (ca, wa))
    else:
        print("=> no checkpoint found at '{}'".format(resume_file))
        print('Exiting')
    Loading model checkpoints/exp1/best.ckpt
    100%|██████████| 2/2 [00:00<00:00, 2.25it/s]

[Figure: sample test images with their predicted labels]

Character Accuracy: 98.89 Word Accuracy: 98.03
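For readers curious how figures like these can be computed, here is one common way to define the two metrics. The project's own evaluator is not listed in this post, so the definitions below are assumptions: character accuracy as one minus the normalized edit distance, and word accuracy as exact match. They return fractions, which would be multiplied by 100 to get percentages.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def char_accuracy(prediction, target):
    """1 minus the edit distance normalised by the longer string."""
    longest = max(len(prediction), len(target))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(prediction, target) / longest

def word_accuracy(prediction, target):
    """1.0 for an exact match, 0.0 otherwise."""
    return 1.0 if prediction == target else 0.0

pairs = [("hello", "hello"), ("wor1d", "world")]
mean_ca = sum(char_accuracy(p, t) for p, t in pairs) / len(pairs)
mean_wa = sum(word_accuracy(p, t) for p, t in pairs) / len(pairs)
```

A single wrong character costs the whole word under word accuracy but only a fraction under character accuracy, which is why the character figure is usually the higher of the two.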


Conclusion

In this blog we saw how to create an OCR from scratch using PyTorch. To do so, we defined three basic modules: the data module, the model, and a custom loss function. We then wrapped these modules in an OCRTrainer class that handles the forward and backward passes as well as the accuracy computations. We also defined our Learner class, which contains the fit method that initializes the trainer class, starts training the OCR model, and saves it. Finally, we tested our model on a held-out test set and evaluated its performance in terms of character and word accuracy.


How do I make Tesseract OCR more accurate? ›

For better accuracy images are scaled at least 300 DPI(Dots Per Inch). Keeping DPI lower than 200 will give unclear and incomprehensible results while keeping the DPI above 600 will unnecessarily increase the size of the output file without improving the quality of the file.

How accurate is Tesseract OCR? ›

The method was evaluated using Tesseract and compared to ABBYY FineReader and HANWANG OCR. The following results are presented for Tesseract: the original set of samples achieves a precision of 0.907 and 0.901 recall rate, while the preprocessed set leads to a precision of 0.929 and a recall of 0.928. Thompson et al.

Is there a better OCR than Tesseract? ›

The 7 best OCR software are Nanonets, ReadIRIS, ABBYY FineReader, Kofax OmniPage, Adobe Acrobat Pro DC, Tesseract, and SimpleOCR.

How can I improve my OCR results? ›

5 Ways to Improve OCR Accuracy
  1. Good Quality of Source Images. Before using OCR, make sure you can read the images with your own eyes. ...
  2. Right Size of Images. ...
  3. Remove Noise / Denoise. ...
  4. Increase Image Contrast. ...
  5. De-skew Original Source.

How to build an OCR in Python? ›

Python OCR is a technology that recognizes and pulls out text in images like scanned documents and photos using Python. It can be completed using the open-source OCR engine Tesseract. We can do this in Python using a few lines of code. One of the most common OCR tools that are used is the Tesseract.

Which deep learning model is best for OCR? ›

CNNs are one of the best techniques to use for deep learning OCR for the step of text detection. Convolution layers are commonly used for image classification tasks due to their efficiency in feature extraction. They allow detecting the meaningful edges in an image and (on a higher level) shapes and complex objects.

How to build OCR engine in Python from scratch? ›

As you can see it contains some text that we can't directly copy and paste. We will have to convert this text in the image into an editable text. This is where OCR comes into the play.
Implementing OCR
  1. tesseract-ocr.
  2. libtesseract-dev.
  3. pytesseract.
  4. OpenCV.
  5. pandas.
  6. numpy.
  7. matplotlib.
Feb 28, 2021

Which programming language is best for OCR? ›

OCR Tools and Libraries

Not only is Python an easy (and forgiving) language to code in, but it's also used by many computer vision and deep learning practitioners, lending itself nicely to OCR.

What is the best OCR algorithm? ›

Popular answers (1) The tesseract algorithm is available on Google Code, and is one of the best open source OCR out there. I have attached the link. Join ResearchGate to ask questions, get input, and advance your work.

Does OCR use CPU or GPU? ›

You might maximize memory size, but if it takes too long to write the images to memory, it's never utilized. OCR is a CPU HOG. It will take 99% of any single thread when it is running, so putting energy into a more powerful CPU with more threads is not a bad idea.

Is there anything better than Tesseract? ›

TensorFlow, OpenCV, Google Cloud Vision API, Amazon Rekognition, and Tesseract. js are the most popular alternatives and competitors to Tesseract OCR.

What is the limitation of Tesseract? ›

Limitations of Tesseract OCR

If the image is noisy and the separation of foreground and background is not significant, it can generate errors. It does not recognize handwritten text. It introduces garbage characters in some cases. It cannot recognize text of languages that are out of its scope.

Which OCR is best in Python? ›

Pytesseract or Python-tesseract is an OCR tool for python that also serves as a wrapper for the Tesseract-OCR Engine. It can read and recognize text in images and is commonly used in python ocr image to text use cases.

What is the most accurate OCR open source? ›

Tesseract is the most acclaimed open-source OCR engine and was initially developed by Hewlett-Packard. It is free software released under the Apache license, and its development was sponsored by Google starting in 2006. The Tesseract OCR engine is considered one of the most accurate open-source systems freely available.

What is the most accurate OCR free? ›

List of the Top Free OCR Software:
  • #1) Nanonets.
  • #2) Adobe Acrobat.
  • #3) Filestack Capture.
  • #4) ABBYY Cloud Reader.
  • #5) OmniPage Ultimate.
  • #6) OnlineOCR.
  • #7) Cisdem pdf converter.
  • #8) Easy Screen OCR.

What is one of the disadvantages of using OCR? ›

6 Glaring Limitations of OCR for Identity Verification
  • Structuring the Data Involves More than Just OCR. ...
  • OCR Must Combine with Image Rectification. ...
  • IDs with Colored Backgrounds Can Be Problematic for OCR. ...
  • Glare and Blur Can Cause Mistakes. ...
  • Webcams are a Challenge for Traditional OCR. ...
  • OCR May Be Challenged by Some ID Subtypes.

Why is OCR so difficult? ›

Complex documents

Pages with a lot of design features — including elements as simple as colored backgrounds — can make it difficult for OCRs to recognize characters.

What is the failure rate of OCR? ›

The accuracy of the conversion is important, and most OCR software provides 98 to 99 percent accuracy, measured at the page level. This means that in a page of 1,000 characters, 980 to 990 characters will be accurate, which corresponds to a failure rate of 1 to 2 percent. In most cases, this level of accuracy is acceptable.

What image format is best for OCR? ›

We always recommend feeding the OCR engine images saved with the following specifications:
  • High resolution (300 DPI is good).
  • Saved as 1-bit (black and white) mode.
  • Saved in a lossless format, such as LZW TIFF or CCITT Group 4 TIFF.

What is the difference between OpenCV and Tesseract? ›

OpenCV is a library for CV, used to analyze and process images in general. Tesseract is a library for OCR, which is a specialized subset of CV that's dedicated to extracting text from images.

How does Python calculate OCR accuracy? ›

Measuring OCR accuracy is done by taking the output of an OCR run for an image and comparing it to the original version of the same text. You can then either count how many characters were detected correctly (character level accuracy), or count how many words were recognized correctly (word level accuracy).
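The comparison described above can be sketched in plain Python. This is a minimal sketch, assuming accuracy is defined as one minus the edit distance divided by the length of the reference text; that definition is a common convention, not something the text above prescribes, and the function names are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_item != hyp_item)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def char_accuracy(reference, hypothesis):
    """Character-level accuracy: 1 - (character errors / reference length)."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    errors = edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


def word_accuracy(reference, hypothesis):
    """Word-level accuracy: 1 - (word errors / number of reference words)."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        return 1.0 if not hyp_words else 0.0
    errors = edit_distance(ref_words, hyp_words)
    return max(0.0, 1.0 - errors / len(ref_words))


# Assumed example: ground truth versus an OCR output with two misread characters.
ground_truth = "the quick brown fox"
ocr_output = "the qu1ck brown f0x"
print(f"character accuracy: {char_accuracy(ground_truth, ocr_output):.3f}")
print(f"word accuracy: {word_accuracy(ground_truth, ocr_output):.3f}")
```

Two wrong characters cost little at the character level but spoil two of the four words, which is why the two metrics are usually reported together.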

Which algorithm is used in Tesseract OCR? ›

The line finding algorithm is one of the few parts of Tesseract that has previously been published [3]. The line finding algorithm is designed so that a skewed page can be recognized without having to de-skew, thus saving loss of image quality.

Can TensorFlow be used for OCR? ›

Yes. A TensorFlow Lite reference app demonstrates how to do OCR. It combines a text detection model and a text recognition model into an OCR pipeline that recognizes text characters.

Which neural network is used in OCR? ›

The OCR can be implemented by using Convolutional Neural Network (CNN), which is a popular deep neural network architecture.

What is the biggest deep learning model? ›

GPT-3's deep learning neural network is a model with 175 billion parameters. To put things into scale, the largest trained language model before GPT-3 was Microsoft's Turing Natural Language Generation (NLG) model, which had 17 billion parameters.

Is Google Tesseract free? ›

Tesseract is an optical character recognition engine for various operating systems. It is free software, released under the Apache License.

Is NLP used for OCR? ›

Natural Language Processing and OCR are a particularly powerful combination. NLP enriches the OCR process, allowing relevant information to be extracted from documents, which is often the main purpose of using OCR in the first place.

How do you make an object recognition software in Python? ›

Steps to install the requirements:
  1. Run the following command in the terminal to install OpenCV: pip install opencv-python
  2. Run the following command in the terminal to install Matplotlib: pip install matplotlib
  3. Download the Haar cascade file and the image used in the code as a zip file.

Is OCR based on deep learning? ›

OCR, or optical character recognition, is one of the earliest computer vision tasks to be addressed, since in some respects it does not require deep learning. OCR implementations therefore existed well before the deep learning boom of 2012, and some even date back to 1914 (!).

Is OCR AI or ML? ›

Optical character recognition (OCR) is based on machine learning (ML) and computer vision. Machine learning (ML) is a subfield of artificial intelligence (AI).

Which language program is most difficult to write in computer? ›

Malbolge. Malbolge is the toughest programming language as it took at least two years to write the first Malbolge program.

Is Tesseract good for OCR? ›

While Tesseract is known as one of the most accurate free OCR engines available today, it has numerous limitations that dramatically affect its performance, that is, its ability to correctly recognize characters in a scan or image.

What is the difference between Keras-OCR and EasyOCR? ›

Keras-OCR is an image-specific OCR tool, suited to text that sits inside an image with irregular fonts and colors. EasyOCR is a lightweight model that performs well on receipts and PDF conversion. It gives more accurate results on organized text such as PDF files, receipts, and bills.

Does OCR use machine learning? ›

OCR is a Machine Learning and Computer Vision Task

Modern machine learning algorithms make the text recognition process more advanced and provide a higher level of recognition accuracy for most fonts, regardless of input data formats.

Can OCR detect images? ›

Optical Character Recognition (OCR)

The Google Cloud Vision API can detect and extract text from images. Two annotation features support optical character recognition (OCR): TEXT_DETECTION detects and extracts text from any image, and DOCUMENT_TEXT_DETECTION is optimized for dense text and documents.

What is Tesseract OCR Python? ›

Pytesseract, or Python-tesseract, is an Optical Character Recognition (OCR) tool for Python. It reads and recognizes the text in images, license plates, and so on, by calling the Tesseract engine on a given image.

Why is it impossible to build a tesseract? ›

Is it possible to make a physical model of a tesseract? No, that's not possible because we live in a 3D world, we cannot construct a 4D model of an object. One way around this is a shadow. 3D things cast 2D shadows, so it stands to reason that while we can't make a 4D object like a Tesseract, we can make a 3D shadow.

How many languages can tesseract detect? ›

The Tesseract OCR engine supports multiple languages. To detect characters from a specific language, the language needs to be specified when the OCR engine is created. In the tool this answer describes, English, German, Spanish, French, and Italian come embedded, so they do not require additional parameters.

What is so special about the tesseract? ›

The Space Stone allows the person who holds it to control the fabric of space and teleport anywhere in the universe. A blue cube called the Tesseract was built to contain the stone. The Tesseract spent much of its life on Asgard before it was brought to Earth for safekeeping.

How do I improve Tesseract OCR accuracy in Python? ›

To improve OCR accuracy on an unclear image:
  1. Fix the DPI if needed; 300 DPI is the minimum.
  2. Fix the text size (e.g. 12 pt should be fine).
  3. Straighten the text lines (de-skew and de-warp the text).
  4. Even out the illumination of the image (e.g. no dark regions).
  5. Convert the image to grayscale.
  6. Binarize the grayscale image and de-noise it.
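Steps 5 and 6 of the list above can be illustrated without any imaging library. This is a minimal sketch, assuming the image is already loaded as rows of (R, G, B) tuples. It uses Otsu's method to choose the threshold, which is one common choice and not necessarily what the original answer had in mind. De-noising is not covered, and the function names are hypothetical.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples to rows of 0-255 luminance values."""
    return [
        [min(255, round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in rgb_image
    ]


def otsu_threshold(gray_image):
    """Pick the threshold that maximizes between-class variance (Otsu's method)."""
    histogram = [0] * 256
    for row in gray_image:
        for value in row:
            histogram[value] += 1
    total = sum(histogram)
    total_sum = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    background_count, background_sum = 0, 0
    for level in range(256):
        background_count += histogram[level]
        if background_count == 0:
            continue  # no pixels at or below this level yet
        foreground_count = total - background_count
        if foreground_count == 0:
            break  # every pixel is already in the background class
        background_sum += level * histogram[level]
        mean_background = background_sum / background_count
        mean_foreground = (total_sum - background_sum) / foreground_count
        variance = (background_count * foreground_count
                    * (mean_background - mean_foreground) ** 2)
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold


def binarize(gray_image, threshold):
    """Map pixels above the threshold to 255 (white) and the rest to 0 (black)."""
    return [[255 if value > threshold else 0 for value in row] for row in gray_image]


# Assumed toy input: dark "ink" pixels on a light "paper" background.
ink, paper = (30, 30, 30), (220, 220, 220)
image = [[ink, paper], [paper, ink]]
gray = to_grayscale(image)
print(binarize(gray, otsu_threshold(gray)))
```

In a real pipeline the same two steps are usually done with OpenCV, which is already in the package list: cv2.cvtColor for the grayscale conversion and cv2.threshold with the cv2.THRESH_OTSU flag for binarization.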

Which Python certification is valuable? ›

The CEPP certification represents the most highly advanced level of certified Python knowledge available at present. The CEPP status is awarded to those individuals who complete the OpenEDG Python Institute General Programming certification program in its entirety.

Does Google use Tesseract? ›

Tesseract is the most prominent open-source OCR engine. Originally developed by Hewlett-Packard, it is now sponsored by Google.

What resolution is best for Tesseract? ›

Tesseract works best on images with a resolution of at least 300 DPI, so it may be beneficial to resize images. For more information, see the Tesseract FAQ. "Willus Dotkom" ran an interesting test of optimal image resolution and suggests an optimal height for capital letters in pixels.

What is the best image format for Tesseract OCR? ›

File Input Formats

Tesseract will only take image files for input. These include: TIFF (preferred)

How can image recognition accuracy be improved? ›

How to Improve the Accuracy of Your Image Recognition Models
  1. Get More Data. Deep learning models are only as powerful as the data you bring in. ...
  2. Add More Layers. ...
  3. Change Your Image Size. ...
  4. Increase Epochs. ...
  5. Decrease Colour Channels. ...
  6. Transfer Learning.

What is the best resolution for OCR? ›

The recommended resolution for scanning documents for optimal OCR accuracy is 300 dots per inch (dpi). However, if the text font size is particularly small (less than 10pt), a dpi of 400-600 may be best.
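The link between font size and scan resolution is simple arithmetic: a typographic point is 1/72 of an inch, so the nominal height of a font in pixels is its point size divided by 72 and multiplied by the DPI. The sketch below is an illustration under that assumption. The helper names are hypothetical, and the nominal font height is the full em size, not the height of a capital letter.

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 inch


def text_height_px(font_size_pt, dpi):
    """Nominal font height in pixels when a page is scanned at the given DPI."""
    return font_size_pt / POINTS_PER_INCH * dpi


def upscale_factor(current_dpi, target_dpi=300):
    """Factor by which to enlarge an image to reach the target DPI (never shrinks)."""
    return max(1.0, target_dpi / current_dpi)


# 12 pt text scanned at 300 DPI versus 8 pt text at 300 and 450 DPI.
for font_size, dpi in [(12, 300), (8, 300), (8, 450)]:
    print(f"{font_size} pt at {dpi} DPI: {text_height_px(font_size, dpi):.1f} px")

# A 150 DPI scan would need to be enlarged by this factor to reach 300 DPI.
print(upscale_factor(150))
```

Under these assumptions, 8 pt text needs roughly 450 DPI to reach the pixel height that 12 pt text has at 300 DPI, which is consistent with the 400-600 DPI recommendation for small fonts.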

Can Tesseract run on GPU? ›

Using Tesseract with OpenCL. Normally Tesseract works with OpenCL Installable Client Drivers (ICD). It tests for available OpenCL drivers at runtime, so a Tesseract binary can work with different GPU hardware on different computers. All you have to do is install the OpenCL driver for your GPU hardware.

Is Tesseract OCR deep learning? ›

Since version 4.0, Tesseract supports deep learning based OCR that is significantly more accurate. The OCR engine itself is built on a Long Short-Term Memory (LSTM) network, a kind of Recurrent Neural Network (RNN).

Is Tesseract and Tesseract OCR same? ›

What is Tesseract? Tesseract is an open-source OCR Engine that extracts printed or written text from images. It was originally developed by Hewlett-Packard, and development was later taken over by Google. This is why it is now known as “Google Tesseract OCR”.

Does OCR need GPU? ›

Not necessarily. With EasyOCR, for example, you do not need to do image preprocessing yourself, because it is done automatically. By default, EasyOCR uses the GPU for computing, which increases its OCR speed, and this requires a GPU-accelerated environment. If you want to use CPU mode, which is slower than Tesseract, you need to set gpu=False.

Which algorithm is best for image recognition? ›

The CNN is a powerful algorithm for image processing, and CNNs are currently the best algorithms we have for the automated processing of images. Many companies use them for tasks such as identifying the objects in an image. The input images are typically represented as combinations of RGB values.

Does increasing batch size increase accuracy? ›

Not necessarily. One reported finding is that higher batch sizes lead to lower asymptotic test accuracy. MNIST is an easy dataset to train on: a base MLP model at batch size 64 can reach 100% train accuracy and 98% test accuracy.

What is the best image recognition? ›

What Is the Best Image Recognition Software?
  • Meltwater Image Search: A Game-Changer in Social Media Monitoring.
  • Google Reverse Image Search: No Words Needed.
  • Clarifai: Data, Data, Data.
  • Imagga: Organize Away.
  • Amazon Rekognition: Make a Move.
  • WhatFontIs: Upload a Font & Let AI Do the Work.
  • Anyline: Scanning on the Go.

Does dpi matter with OCR? ›

The recommended resolution for best scanning results for OCR accuracy is 300 dots per inch (dpi). Brightness settings that are too high or too low can have negative effects on the accuracy of your image. A brightness of 50% is recommended. The straightness of the initial scan can affect OCR quality.


1. Demo | Numberplate OCR | PyTorch
2. Captcha recognition using PyTorch (Convolutional-RNN + CTC Loss)
(Abhishek Thakur)
3. AI Project: Building your own Optical Character Recognition System
(Anantharaman Narayana)
4. Nanonets - How to Train your own OCR Model
(Sarthak Jain)
5. Keras OCR - Reading Text from Images and Custom Models using Python
6. TensorFlow Step-by-Step Captcha solving tutorial with custom OCR model
(Python Lessons)

Author: Gregorio Kreiger

Last Updated: 02/10/2023
