Ever wondered what a DjVu file is?
The DjVu file format usually uses the extension .djvu. Its development began in 1996 as an alternative to the Adobe PDF format. Thanks to advanced compression algorithms, DjVu is optimized for scanned documents that contain both pictures and text. In most cases DjVu files are more compact than comparable PDFs.
That was the short answer to your question. Most likely, however, you arrived at this page because you recently came across a DjVu file and wonder how to open it. The first part of this article therefore covers the tools and strategies for opening a DjVu file. If you are looking for deeper insight into DjVu, you are also in the right place: the second part of this article provides historical and technical background on this interesting file format, so stay tuned!
How to Open a DjVu file?
|Internet media type||image/vnd.djvu, image/x-djvu|
|Developed by||AT&T Labs – Research|
- 1. To convert any DjVu file to PDF, you can use our DjVu to PDF converter directly.
- 2. If, on the other hand, you prefer to install software to view your DjVu files directly, you might be interested in our up-to-date list of DjVu readers.
Where did the DjVu file format come from?
The DjVu format was first released in 1998 as an alternative to the PDF format. It was developed at AT&T Labs, whose Bell Labs heritage includes ground-breaking inventions such as the transistor. The main contributors to the development of DjVu were Yann LeCun, Léon Bottou, Patrick Haffner, and Paul G. Howard. The leading idea was to create a file format optimized for scanned documents that contain both pictures and text, and a key requirement was that the new format perform better than PDF for this kind of document. A key advantage of DjVu is its small file size, which is why it is frequently used to distribute scanned documents on the web. DjVu is an open file format, which means that both open source and proprietary software can use it free of charge. The DjVu format usually uses the extension .djvu, or sometimes just .djv.
Why are DjVu files special?
DjVu files use advanced compression technologies that compress about 5 to 10 times better than JPEG and TIFF. A scanned color page at a resolution of 300 DPI, with a raw file size of about 25 MB, can easily be compressed to only about 100 kB using DjVu. Any DjVu file can be equipped with a text layer to make it searchable. Such searchable DjVu files behave very much like PDF documents.
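To get a feeling for these numbers, the short calculation below estimates the raw size of a scanned page and the resulting compression ratio. It is only a back-of-the-envelope sketch: the page size (US Letter, 8.5 x 11 inches) and the 24-bit color depth are assumptions chosen for illustration, and the 100 kB target is simply the figure quoted above.

```python
# Assumed page geometry: US Letter, scanned in 24-bit color.
WIDTH_IN, HEIGHT_IN = 8.5, 11.0
DPI = 300
BYTES_PER_PIXEL = 3  # one byte each for red, green, and blue


def raw_scan_bytes(width_in, height_in, dpi, bytes_per_pixel):
    """Return the size in bytes of an uncompressed scan."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


raw = raw_scan_bytes(WIDTH_IN, HEIGHT_IN, DPI, BYTES_PER_PIXEL)
djvu = 100_000  # the roughly 100 kB quoted in the text

print(f"uncompressed: {raw / 1e6:.1f} MB")  # uncompressed: 25.2 MB
print(f"ratio: {raw / djvu:.0f}:1")         # ratio: 252:1
```

Under these assumptions the raw scan comes out at roughly 25 MB, so a 100 kB DjVu file corresponds to a compression ratio of about 250 to 1.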
A key to this excellent compression is so-called multi-scale bicolor clustering, which allows a foreground/background mask separation that is far more general than standard text/image segmentation. Combined with a set of soft pattern matching algorithms, the bi-level compression used by DjVu (called JB2, a close relative of JBIG2) beats JBIG1, long the standard for bi-level images, by a factor of two. The principle behind this encoding is the following: first, the method identifies nearly identical shapes on the page, such as multiple occurrences of a particular character in a given font, style, and size. Then it compresses the bitmap of each unique shape separately and encodes the locations where each shape appears on the page. In this way similar shapes are compressed only once instead of many times, which explains the file size advantage DjVu files usually show.
Further key components of the compression technique used by DjVu are a multi-scale successive projections algorithm and the so-called ZP-coder, an adaptive binary arithmetic coder.
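The shape-matching principle described above can be illustrated with a small sketch. This is not the actual DjVu encoder: it matches only exactly identical bitmaps, whereas the real method uses soft pattern matching to catch nearly identical shapes, and it skips the entropy coding step entirely. The function names and the tiny "glyph" bitmaps are made up for illustration.

```python
def encode_page(glyphs):
    """Split a page into unique shapes plus placements.

    glyphs is a list of (bitmap, x, y) tuples, where bitmap is a tuple of
    row strings. Returns (shapes, placements).
    """
    shapes = []      # each unique bitmap is stored only once
    index_of = {}    # bitmap -> its position in shapes
    placements = []  # (shape_index, x, y) for every occurrence
    for bitmap, x, y in glyphs:
        if bitmap not in index_of:
            index_of[bitmap] = len(shapes)
            shapes.append(bitmap)
        placements.append((index_of[bitmap], x, y))
    return shapes, placements


def decode_page(shapes, placements):
    """Rebuild the original glyph list from shapes and placements."""
    return [(shapes[i], x, y) for i, x, y in placements]


# Two hypothetical 3x4 glyph bitmaps; "A" occurs three times on the page.
A = ("010", "101", "111", "101")
B = ("110", "101", "110", "111")
page = [(A, 0, 0), (B, 4, 0), (A, 8, 0), (A, 12, 0)]

shapes, placements = encode_page(page)
print(len(shapes))   # 2
print(placements)    # [(0, 0, 0), (1, 4, 0), (0, 8, 0), (0, 12, 0)]
```

The page contains four glyphs, but only two bitmaps need to be stored. On a real scanned page with thousands of characters in the same font, this kind of sharing is where most of the savings come from.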
DjVu files with hidden OCR Layers
Up to now one could think of a DjVu file as a loose collection of raster images that contain no searchable text and are therefore difficult to handle. So do we have to accept that PDF is simply the handier format? Of course not! The main difference between DjVu and PDF is that DjVu is a raster image format while PDF is a scalable vector format, and the authors of DjVu found a smart workaround: in order to make DjVu files searchable, so that they behave very much like PDFs, they added a hidden OCR layer to the definition of the file format. This is a very economical way of providing searchable text while keeping a strict separation between the visual appearance of the document and the content that the reader can search. Most DjVu files circulating on the web contain such a text layer. This trick even makes it easy to copy and paste text from any DjVu file equipped with such a layer, just as one is used to with PDFs.
Licensing and Adoption of DjVu
An open-source implementation of DjVu, named "DjVuLibre", is available under the GNU General Public License. The rights to the commercial encoding software, however, have been transferred between several companies over the years, including AT&T Corporation, LizardTech, and Celartem, among others. PDF is used far more frequently than DjVu, even though some experts consider DjVu the better format for scanned documents because of its superior compression algorithms. Nevertheless, DjVu has reached a considerable level of acceptance thanks to this open-source licensing.
Because DjVu was developed at the height of the digitization wave, when many books were being scanned, many scanned documents and books across the web still use DjVu. Furthermore, in 2002 the Internet Archive, which provides millions of scanned public-domain books through its Million Book Project, decided to support DjVu alongside PDF.
The technical file specifications of DjVu
DjVu was originally derived from the Interchange File Format (IFF), which is based on hierarchically organized chunks. In a DjVu file, the IFF structure is preceded by a 4-byte AT&T magic number. This identifier is followed by a marker indicating whether the file is a single-page (DJVU) or a multi-page (DJVM) document. Going into more detail here would go beyond the scope of this article. If you want to create DjVu files yourself, you can use a PDF to DjVu converter. Another important specification is the internet MIME type for DjVu, which is image/vnd.djvu or image/x-djvu. The current version of DjVu is version 26, which was released more than 10 years ago.
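As a rough illustration of this header layout, the sketch below inspects the first 16 bytes of a file and reports whether it looks like a single-page or a multi-page DjVu document. The function name is made up for this example. The `FORM` chunk identifier and the 4-byte big-endian length field between the magic number and the DJVU/DJVM marker follow the usual IFF conventions and are not spelled out in this article, so treat them as assumptions and check them against the DjVu specification before relying on them.

```python
import struct


def djvu_kind(header: bytes):
    """Classify a file from its first 16 bytes.

    Returns "single-page", "multi-page", or None if the bytes do not
    look like a DjVu header.
    """
    if len(header) < 16 or header[:4] != b"AT&T":
        return None  # the 4-byte magic number is missing
    # Assumed IFF layout: chunk id, big-endian chunk length, form type.
    chunk_id, _length, form_type = struct.unpack(">4sI4s", header[4:16])
    if chunk_id != b"FORM":
        return None
    return {b"DJVU": "single-page", b"DJVM": "multi-page"}.get(form_type)


# Synthetic headers built by hand, not taken from real files.
single = b"AT&T" + b"FORM" + struct.pack(">I", 4) + b"DJVU"
multi = b"AT&T" + b"FORM" + struct.pack(">I", 4) + b"DJVM"

print(djvu_kind(single))       # single-page
print(djvu_kind(multi))        # multi-page
print(djvu_kind(b"%PDF-1.7"))  # None
```

To check a real file, one could read its first 16 bytes with `open(path, "rb").read(16)` and pass them to the function.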
The Future of DjVu
Although there has not been much progress in the development of DjVu in recent years, the number of DjVu files being produced has lately been increasing again. This may be due to the most compelling strength of DjVu files, namely their remarkably compact size. At a time when the web is increasingly accessed from mobile devices and bandwidth is still a cost factor, the superior compression of DjVu may help to save time and money. Meanwhile, there are also apps that can display DjVu files on smartphones and tablets. Of course this is pure speculation, but the future of DjVu might be brighter than its past. In fact, the name DjVu carries a hidden message, since it is inspired by the French expression déjà vu [deʒaˈvy], which means something like “already seen”. Now that you are familiar with DjVu, you can be quite sure that you will see it again very soon, as long as you keep moving around the web.