The DjvuOCR v2.2 beta program. The cvhtml2 utility



README for cvhtml2

Version 2.0

- Added the '-j' option

- Improved the processing of lines that end with a hyphenated word


Version 1.0

I use dtSearch to build CDs with full-text search. Because dtSearch does not recognize DJVU files, I wrote a utility that converts the OCR layer file into an HTML file containing the recognized text. This HTML file can be stored in a ZIP file together with the book (dtSearch can search inside ZIP files), which lets you keep a large DJVU collection with full-text search. When dtSearch finds a match inside a ZIP file, open the corresponding DJVU file; a suitable naming convention makes the pairing obvious, for example:

myfile.djvu
myfile.djvu.zip
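
The naming convention above can be automated. The following is only a minimal Python sketch, not part of cvhtml2: the function names pack_ocr_html and djvu_for_hit are hypothetical, and it assumes the ZIP holds just the HTML text file and sits next to the book.

```python
import zipfile
from pathlib import Path


def pack_ocr_html(djvu_path, html_path):
    """Store the HTML text file in <book>.djvu.zip next to the book (hypothetical helper)."""
    djvu_path = Path(djvu_path)
    # myfile.djvu -> myfile.djvu.zip, following the convention shown above
    zip_path = djvu_path.with_name(djvu_path.name + ".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Keep only the file name inside the archive, not the directory path
        zf.write(html_path, arcname=Path(html_path).name)
    return zip_path


def djvu_for_hit(zip_path):
    """Map a search hit in myfile.djvu.zip back to myfile.djvu (hypothetical helper)."""
    zip_path = Path(zip_path)
    return zip_path.with_name(zip_path.name[: -len(".zip")])
```

With this layout, a hit reported inside myfile.djvu.zip points directly at myfile.djvu in the same folder.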

Usage:

cvthtml [-j] <in_file> <out_file>

    -j - joins lines that appear to belong to the same paragraph (i.e. removes the CR/LF at the end of lines that do not end with
          a punctuation mark)

    in_file - a text file produced by FRFGrab.EXE or extracted from a DJVU file using the command

djvused -e output-txt Myfile.djvu > ocrfile.txt

                    Note: check the end of ocrfile.txt for any error messages from djvused.exe.

    out_file - the resulting HTML file in UTF-8 encoding. This file can be viewed directly in a web browser.
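
To illustrate what the '-j' option is described as doing, here is a rough Python sketch of paragraph joining. It is not the code used by the utility: the function name join_paragraph_lines, the default set of punctuation marks, and the rule for re-joining a word hyphenated at the end of a line are all assumptions made for the example.

```python
def join_paragraph_lines(text, end_marks=".!?:;"):
    """Join lines that do not end with a punctuation mark.

    end_marks is an assumed set of punctuation marks; the real utility may
    use a different one.
    """
    out = []
    carry = ""  # text collected so far for the current paragraph
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            # A blank line is treated as a paragraph break
            if carry:
                out.append(carry)
                carry = ""
            out.append("")
            continue
        if carry.endswith("-"):
            # Assumption: a trailing hyphen marks a word split across lines
            carry = carry[:-1] + line.lstrip()
        elif carry:
            carry = carry + " " + line.lstrip()
        else:
            carry = line
        if carry[-1] in end_marks:
            # The line ends with punctuation, so the paragraph is closed
            out.append(carry)
            carry = ""
    if carry:
        out.append(carry)
    return "\n".join(out)


if __name__ == "__main__":
    sample = "This is a sen-\ntence that spans\nthree lines.\nNext one."
    print(join_paragraph_lines(sample))
```

A simple rule like this also merges genuine hyphens (for example in "full-text" broken after the hyphen), so a real implementation would likely need a dictionary check or a similar heuristic.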


Author: gencho (djvuocr [at] mail2world.com)

Prepared by: monday2000.

March 9, 2007

E-mail: monday2000 [at] yandex.ru
