Comparison of optical character recognition software

This comparison of optical character recognition software includes:

  • OCR engines, which perform the actual character identification
  • Layout analysis software, which divides scanned documents into zones suitable for OCR
  • Graphical interfaces to one or more OCR engines
  • Software development kits that are used to add OCR capabilities to other software (e.g. forms processing applications, document imaging management systems, e-discovery systems, records management solutions)
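Each layout analysis product in the table uses its own, often undocumented, method. The following minimal sketch shows only the general idea. It splits a binarized page into horizontal text bands using a horizontal projection profile, which is one step of the classic recursive X-Y cut approach. The page representation (a list of rows of 0/1 pixels) and the function name `horizontal_zones` are illustrative assumptions, not any listed product's API.

```python
from typing import List, Tuple

Zone = Tuple[int, int]  # (top row, bottom row), bottom exclusive


def horizontal_zones(page: List[List[int]], min_ink: int = 1) -> List[Zone]:
    """Split a binarized page (1 = ink, 0 = background) into horizontal bands.

    A band is a maximal run of consecutive rows that each contain at least
    `min_ink` ink pixels; blank rows between bands act as separators.
    """
    zones: List[Zone] = []
    start = None  # first row of the band currently being scanned, if any
    for y, row in enumerate(page):
        has_ink = sum(row) >= min_ink  # horizontal projection profile value
        if has_ink and start is None:
            start = y  # entering a band
        elif not has_ink and start is not None:
            zones.append((start, y))  # leaving a band
            start = None
    if start is not None:  # a band that runs to the bottom edge
        zones.append((start, len(page)))
    return zones


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
        [1, 1, 1, 1],
    ]
    print(horizontal_zones(page))  # [(1, 3), (4, 5)]
```

To get a full X-Y cut, apply the same test to the columns inside each band (a vertical profile) and recurse. Practical tools also correct skew and suppress noise before segmenting.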
Name | Founded year | Latest stable version | Release year | License | Online | Windows | Mac OS X | Linux | BSD | Programming language | SDK | Languages | Fonts | Output formats | Notes
Tesseract | 1985 | 3.05.02 | 2018 | Apache | No | Yes | Yes | Yes | Yes | C++, C | Yes | 100+[1] | Any printed font | Text, hOCR,[2] PDF, others with different user interfaces[3] or the API | Created by Hewlett-Packard; under further development by Google[4]
Readiris | 1986 | 16 | ? | Proprietary | ? | Yes | Yes | ? | ? | ? | Yes | 100+[5] | ? | ? | Owned by Canon
CIB OCR[6] | 2011 | 2.08.00 | 2018 | Freeware | Yes[7] | Yes | Yes | Yes | Yes | C++, Java, Python, Objective-C | Yes | German, English, Spanish, Russian, Chinese, Japanese, Italian, French | Any printed font | Text, hOCR, PDF | Supports more than 160 input formats
Screenworm | 2013 | 1.0 | 2014 | Proprietary | No | No | Yes | No | No | Objective-C++ | No | 57 | ? | TXT | Product of Funchip; uses the Tesseract OCR engine
ExperVision[8] TypeReader & RTK | 1987 | 7.1.170.1125 | 2010 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 21 | 26 | 18 | Has a mobile and embedded system version for iOS, Android, etc.
AliusDoc AD-SCI[9] | 2005 | 2.1 | 2015 | Proprietary | No | Yes | No | No | No | VB.NET | For extensions | All ASCII-compatible languages | ? | XML, plain text, any other format through SDK extensions | Minimal need for post-sale professional services; works with structured, semi-structured, and unstructured documents
ABBYY FineReader | 1989 | 14 | 2017-01-25 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 192[10] | ? | DOC, DOCX, XLS, XLSX, PPTX, RTF, PDF, HTML, CSV, TXT, ODT, DjVu, EPUB, FB2[11] | ABBYY also supplies SDKs for embedded and mobile devices. Professional, Corporate and Site License editions for Windows; Express Edition for Mac.[12]
E-aksharayan | 2010 |  |  |  |  | Yes | No | Yes | No |  |  | 14 |  | RTF, TXT, BRL | 
Asprise OCR SDK | 1998 | 15 | 2015 | Proprietary | Yes | Yes | Yes | Yes | Yes | Java, C#, VB.NET, C/C++/Delphi | Yes | 20+[13] | ? | Plain text, searchable PDF, XML[14] | Java, C#, VB.NET and C/C++/Delphi SDKs for OCR and barcode recognition on Windows, Linux, Mac OS X and Unix[15]
Nicomsoft OCR SDK | 1999 | 5.5 | 2015 | Proprietary | No | Yes | No | Yes | No | C#, VB.NET, C++, Delphi, Java | Yes | 25+[16] | ? | Searchable PDF, text, RTF | C#, VB.NET, C++, Delphi and Java OCR tool for Windows and Linux[17]
AnyDoc Software | 1989 | ? | ? | Proprietary | No | Yes | No | No | No | VBScript | ? | ? | ? |  | Works with structured, semi-structured, and unstructured documents
LEADTOOLS[18] | 1990[19] | 19.0 | 2014 | Proprietary | Yes | Yes | Yes | Yes | No | C/C++, .NET, Objective-C, Java, JavaScript | Yes | 56[20] | Any printed font | PDF, PDF/A, DOC, DOCX, XLS, XPS, RTF, HTML, ANSI text, Unicode text, CSV[21] | Supports Latin, Asian, Arabic, and MICR character sets.[18] For full-page, zonal, and form image processing. Includes OCR, barcode, OMR and forms recognition.[22] Supports ICR (handwritten text recognition).[23]
CuneiForm | 1996 | 1.1 | 2011-04-19 | BSD variant | No | Yes | Yes | Yes | Yes | C/C++ | Yes | 28 | Any printed font | HTML, hOCR, native, RTF, TeX, TXT[24] | Enterprise-class system; can preserve text formatting and recognize complex tables of arbitrary structure
OCR.space | 2015 | 3.02 | 2017 | GPL | Yes | Yes | No | No | No | C# | Yes | 23 | Any printed font | TXT | Windows desktop software, Windows Store application and online web app; converts scanned documents to editable text using OCR
SimpleOCR | 2002 | 3.5 | 2008 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? |  | 
Dynamsoft OCR SDK | 2003 | 8.2 | 2012 | Proprietary | Yes | Yes | No | No | No | C/C++ | Yes | 40+[25] | ? | PDF, TXT | 
OmniPage | 1970s | 19.2 | 2015 | Proprietary | Yes | Yes | Yes | Yes | No | C/C++, C#[26] | Yes | 125[27] | Machine-printed and hand-printed fonts | DOC/DOCX, XLS/XLSX, PPTX, RTF, PDF, PDF/A, searchable PDF, HTML, text, XML, ePUB, MP3 | Product of Nuance Communications
Microsoft Office OneNote 2007 | 2011 | ? | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? |  | 
FreeOCR | ? | 4.2 | August 2012 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? |  | [28]
gImageReader[29] | 2009 | 3.2.99 | 2017-07 | GPL | No | Yes | Yes | Yes | No | C++ | ? | 100+ | Any printed font | TXT, PDF, hOCR | Uses the Tesseract OCR engine
GOCR | 2000 | 0.50 | 2013 | GPL | Yes[30] | Yes | Yes | Yes | Yes | C | ? | 20+ | ? |  | 
Ocrad | ? | 0.25[31] | 2015-04-16 | GPL | Yes | Yes | Yes | Yes | Yes | C++ | Yes | Latin alphabet | ? |  | Command line
SmartScore | 1991 | 10.5.8 | 2015-07 | Proprietary | No | Yes | Yes | No | No | ? | ? | ? | ? |  | For musical scores
Microsoft Office Document Imaging | ? | Office 2007 | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? |  | Uses OmniPage
OCR.net | 2016 | ? | 2016 | Proprietary | Yes | No | No | No | No | Java, C++, PHP, Objective-C | No | 100+ | ? | TXT, searchable PDF | Online service powered by PDF OCR X for conversions
PDF OCR X | 2008 | 2.0.22 | 2016 | Proprietary | No | Yes | Yes | No | No | Java, C++, Objective-C | No | 100+ | ? | TXT, searchable PDF | Drag-and-drop UI
Puma.NET | ? | ? | 2009-10-29 | BSD | No | Yes | No | No | No | C# | Yes | 28 | Any printed font |  | .NET OCR SDK based on Cognitive Technologies' CuneiForm recognition engine; wraps the Puma COM server and provides a simplified API for .NET applications
ReadSoft | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? |  | Scans, captures and classifies business documents such as invoices, forms and purchase orders, integrated with business processes
Scantron | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? |  | Working with localized interfaces requires support for the corresponding language
OCRFeeder | 2009-03 | 0.8.1 | 2014-12-22 | GPL | No | No | No | Yes | No | Python | ? | ? | ? |  | Features a full user interface and a command-line tool for automatic operation. Has its own segmentation algorithm but uses system-wide OCR engines such as Tesseract or Ocrad
OCRopus | 2007 | 1.3.3 | 2017-12-16 | Apache | No | No | Yes | Yes | Yes | Python | ? | All languages using Latin script (other languages can be trained) | Normal Latin script and Fraktur (other scripts can be trained) | TXT, hOCR[32], PDF[33] | Pluggable framework under active development; used for Google Books
MathOCR | 2014 | 0.0.3 | 2015 | GPL | No | Yes | Yes | Yes | Yes | Java | ? | ? | ? | HTML, LaTeX | Features mathematical formula recognition and logical layout analysis; can use OCR engines such as Tesseract or Ocrad as a back end
MeOCR | 2012 | 1.0.0 | 2012 | Freeware | No | Yes | No | No | No | C/C++/C# | Yes | 28 | Any printed font | HTML, hOCR, native, RTF, TeX, TXT | Windows application that converts scanned documents to editable text using OCR and exports them to Microsoft Word with one click. Features a full user interface and a .NET interface library[34] for developers.
Yunmai OCR SDK | 2002 | 1.0 | 2013 | Proprietary | Yes | Yes | Yes | Yes | Yes | Java, C++, C, Object Pascal, Objective-C | Yes | 14 | Any printed font | TXT, PDF | Particularly suited to recognizing Chinese characters[35]
Anyline SDK | 2013[36] | 3.5.1[37] | 2016[37] | Free for non-commercial use[38] | No | No* | No* | No* | No* | Java (Android), Objective-C and Swift (iOS), C# (Windows Phone, Xamarin), JavaScript (Cordova)[39] | Yes[38] | 2 (German, English) | Any printed font; trainable[40] | Plain text, verification image | *Customizable mobile OCR SDK for Android, iOS, Windows Phone and smart glasses (Google Glass, Epson Moverio,...)
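Several tools in the table list hOCR as an output format: Tesseract, CIB OCR, CuneiForm, gImageReader, OCRopus and MeOCR. hOCR embeds recognition results in HTML. Elements carry classes such as `ocr_line` and `ocrx_word`, and a `title` attribute holds properties such as `bbox x0 y0 x1 y1`.

The following sketch uses only Python's standard `html.parser` to extract word text and bounding boxes from such output. It makes several assumptions:

* The markup is well formed.
* Word elements contain no void elements such as `<br>`.
* Other properties, such as the engine-specific `x_wconf` confidence value, can be ignored.

The helper names `parse_bbox`, `HocrWordParser` and `words_from_hocr` are hypothetical and are not part of any listed tool.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in page pixels


def parse_bbox(title: str) -> Optional[Box]:
    """Return the bbox property from an hOCR title attribute, if present."""
    # Properties are separated by ';', e.g. "bbox 10 10 50 30; x_wconf 96"
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            x0, y0, x1, y1 = (int(p) for p in parts[1:])
            return (x0, y0, x1, y1)
    return None


class HocrWordParser(HTMLParser):
    """Collect (text, bbox) pairs for elements whose class includes 'ocrx_word'."""

    def __init__(self) -> None:
        super().__init__()
        self.words: List[Tuple[str, Optional[Box]]] = []
        self._depth = 0  # > 0 while inside a word element (counts nested tags)
        self._box: Optional[Box] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            self._depth += 1  # child tag inside a word, e.g. <strong>
            return
        attr = dict(attrs)
        if "ocrx_word" in (attr.get("class") or "").split():
            self._depth = 1
            self._box = parse_bbox(attr.get("title") or "")
            self._text = []

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:  # closed the word element itself
                self.words.append(("".join(self._text).strip(), self._box))

    def handle_data(self, data):
        if self._depth:  # keep only text that lies inside a word element
            self._text.append(data)


def words_from_hocr(markup: str) -> List[Tuple[str, Optional[Box]]]:
    parser = HocrWordParser()
    parser.feed(markup)
    parser.close()
    return parser.words


if __name__ == "__main__":
    sample = """<div class="ocr_page" title="bbox 0 0 200 100">
<span class="ocr_line" title="bbox 10 10 120 30">
<span class="ocrx_word" title="bbox 10 10 50 30; x_wconf 96">Hello</span>
<span class="ocrx_word" title="bbox 60 10 120 30"><strong>world</strong></span>
</span></div>"""
    print(words_from_hocr(sample))
    # [('Hello', (10, 10, 50, 30)), ('world', (60, 10, 120, 30))]
```

Because the parser tracks nesting depth rather than tag names, a word stays intact when the engine wraps it in formatting tags such as `<strong>`.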

References

  1. Based on the count of language training files for version 3.04, available on the download page.
  2. Usage explained in the Tesseract Readme and FAQ
  3. Such as ODF with OCRFeeder
  4. "GitHub - tesseract-ocr/tesseract: Tesseract Open Source OCR Engine (main repository)". Retrieved 2018-01-17.
  5. http://www.irislink.com/EN-GB/c1462/Readiris-16-for-Windows---OCR-Software.aspx
  6. "CIB OCR". cib.de. 2018-10-01. Retrieved 2018-10-01.
  7. "CIB doXiview". cib.de. 2018-10-01. Retrieved 2018-10-01.
  8. "OpenRTK – ExperVision OCR SDK | OCR Software, OCR SDK & Toolkit, OCR Service – ExperVision OCR". Expervision.com. Retrieved 2013-09-12.
  9. "AliusDoc AD-SCI". AliusDoc.com. Retrieved 2015-10-16.
  10. "ABBYY FineReader 14: Technical Specifications". Finereader.abbyy.com. Retrieved 2017-02-23.
  11. "ABBYY FineReader 11: Technical Specifications". Finereader.abbyy.com. Retrieved 2013-09-12.
  12. "Top OCR Software". Ocrworld.com. 2010-03-30. Retrieved 2013-09-12.
  13. "Asprise OCR SDK Features". asprise.com. Retrieved 2014-06-21.
  14. "Asprise Java OCR Library Features". asprise.com. Retrieved 2014-06-21.
  15. "Asprise Java, C#/VB.NET OCR API". asprise.com. 2015-11-19. Retrieved 2015-11-19.
  16. "Nicomsoft OCR SDK Features". nicomsoft.com. Retrieved 2015-01-08.
  17. "Nicomsoft OCR, C#/VB.NET OCR API". nicomsoft.com. 2015-01-08. Retrieved 2015-01-08.
  18. "Ocr Sdk". Leadtools. Retrieved 2013-09-12.
  19. "LEAD Technologies, Inc. Corporate Information". Leadtools.com. Retrieved 2013-09-12.
  20. "Ocr Sdk". Leadtools. Retrieved 2013-09-12.
  21. "OCR SDK Output Formats". Leadtools. Retrieved 2013-09-12.
  22. "LEADTOOLS Recognition Imaging Developer Toolkit". Leadtools.com. Retrieved 2013-09-12.
  23. "Icr Sdk". Leadtools. Retrieved 2013-09-12.
  24. Debian manual page for Cuneiform for Linux version 1.1.0
  25. "OCR SDK Language Packages Download". Dynamsoft.com. Retrieved 2013-09-12.
  26. "OmniPage CSDK - OCR Document Capture Toolkit | Document Imaging & OCR". Nuance. Retrieved 2013-09-12.
  27. "OmniPage Standard Document Conversion". Nuance. Retrieved 2014-02-25.
  28. "Free OCR Software - Optical Character Recognition Software for Windows import from PDF and Twain Scanners". Paperfile.net. Retrieved 2013-09-12.
  29. "gImageReader". github.com. Retrieved 2018-03-25.
  30. "GOCR". Jocr.sourceforge.net. Retrieved 2013-09-12.
  31. Diaz, Antonio (2015-04-16). "GNU Ocrad 0.25 released" (Mailing list). info-gnu.
  32. OCRopus includes the ocropus-hocr tool which produces hOCR from the recognition results.
  33. In combination with the hocr-tools
  34. "MeOCR .NET Library".
  35. "List of Yunmai OCR SDKs". yunmai.com. Retrieved 2015-07-12.
  36. "Company | Anyline". Anyline. 2016-06-30. Retrieved 2016-06-30.
  37. "Release Notes Archives - ANYLINE". ANYLINE. Retrieved 2016-06-30.
  38. "anyline". npm. Retrieved 2016-06-30.
  39. "API Reference". documentation.anyline.io. Retrieved 2016-06-30.
  40. "Fonts | Anyline". Anyline. 2016-06-30. Retrieved 2016-06-30.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.