Sensornye sistemy [Sensory Systems], 2020, V. 34, No. 3, pp. 217-225

Analysis of a stopping method for text recognition in video stream using an extended result model with per-character alternatives

K. B. Bulatov 1,2, B. I. Savelyev 1,2,*, V. V. Arlazarov 1,2, N. V. Fedotova 2

1 Federal Research Center “Computer Science and Control” of RAS
117312 Moscow, 60-letiya Oktyabrya avenue 9, Russia

2 Smart Engines Service LLC
121205 Moscow, Skolkovo innovation center, Nobel st. 7, 132, Russia

* E-mail: bsaveliev@smartengines.ru

Received 7.04.2020
Revised 22.04.2020
Accepted for publication 29.04.2020


In document analysis and recognition with mobile capture devices, and in object recognition in a video stream, an important problem is deciding when the capturing process should stop. Efficient stopping affects not only the total time spent on recognition and data entry but also the expected accuracy of the result. This paper extends the stopping method based on modelling the next integrated recognition result so that it can be used with a string recognition result model with per-character alternatives. The stopping method and notes on its extension are described, and an experimental evaluation is performed on the open datasets MIDV-500 and MIDV-2019. The method is compared with previously published methods based on clustering of input observations. The results indicate that the stopping method based on modelling the next integrated result achieves higher accuracy, even when compared with the best achievable configuration of the competing methods. However, the required computations are significant, and further research should target optimizing its implementation.
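As a rough illustration of the idea of stopping by modelling the next integrated result, the Python sketch below may help. It is not the authors' implementation: it assumes fixed-length strings (so no alignment step), integrates frames by simple averaging of per-character alternatives, measures change with a mean total-variation distance rather than a Levenshtein-type metric, and models the unseen frame as each already observed frame in turn. All names (`Frame`, `integrate`, `distance`, `expected_next_distance`, `should_stop`) and the threshold rule are hypothetical.

```python
from typing import Dict, List

# One frame's recognition result: for each character position, a map from a
# candidate character to its membership estimate (assumed to sum to 1).
Frame = List[Dict[str, float]]


def integrate(frames: List[Frame]) -> Frame:
    """Combine per-frame results by averaging the alternatives at each position.

    Simplifying assumption: every frame has the same length, so no alignment
    is needed. Requires at least one frame.
    """
    n = len(frames)
    combined: Frame = []
    for pos in range(len(frames[0])):
        cell: Dict[str, float] = {}
        for frame in frames:
            for ch, weight in frame[pos].items():
                cell[ch] = cell.get(ch, 0.0) + weight / n
        combined.append(cell)
    return combined


def distance(a: Frame, b: Frame) -> float:
    """Mean total-variation distance between per-position distributions, in [0, 1]."""
    total = 0.0
    for cell_a, cell_b in zip(a, b):
        chars = set(cell_a) | set(cell_b)
        total += 0.5 * sum(abs(cell_a.get(c, 0.0) - cell_b.get(c, 0.0)) for c in chars)
    return total / len(a)


def expected_next_distance(frames: List[Frame]) -> float:
    """Estimate how far the integrated result would move after one more frame,
    treating each already observed frame as a candidate for the next one."""
    current = integrate(frames)
    moves = [distance(current, integrate(frames + [candidate])) for candidate in frames]
    return sum(moves) / len(moves)


def should_stop(frames: List[Frame], threshold: float) -> bool:
    """Stop capturing once the modelled change drops to the (assumed) threshold."""
    return expected_next_distance(frames) <= threshold
```

Note that each decision re-integrates the result once per observed frame, which hints at why the abstract describes the required computations as significant.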

Key words: recognition in video stream, mobile OCR, stopping rules, decision making, mobile document recognition, anytime algorithms

DOI: 10.31857/S0235009220030026

No supplementary materials.