I'm experimenting, but perhaps someone else has done it before and could offer a word of advice:
I'm making PDFs out of scans of printed pages. The pages are mostly text and line drawings, black on white (although there is some gray shading). I can make PDFs, no problem, but they are fairly big for what they contain. I'd like to minimize them, yet keep them readable. I've tried a few things so far, but I'm still not sure what works best:
- I cropped most of the unneeded margins
- I downsampled the scans to about 1000 pixels across the A4 page width
- I have converted 24-bit colour scans to 8-bit grayscale
- I have stretched the contrast so that the paper is featureless white and the fonts and lines are mostly black, not gray
- I tried saving the files as PNG rather than JPG before printing them to PDF, but the resulting PDFs are still quite big
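The contrast stretch in the list above is essentially a "levels" adjustment. Here is a minimal pure-Python sketch on one row of 8-bit gray values (0 = black, 255 = white). The black and white points used (64 and 220) are arbitrary example values, not settings from this thread:

```python
def stretch_contrast(pixels, black_point, white_point):
    """Linearly map [black_point, white_point] onto [0, 255], clipping the rest.

    Values at or below black_point become pure black and values at or above
    white_point become pure white, so faint paper texture disappears.
    """
    if not 0 <= black_point < white_point <= 255:
        raise ValueError("need 0 <= black_point < white_point <= 255")
    span = white_point - black_point
    out = []
    for p in pixels:
        if p <= black_point:
            out.append(0)
        elif p >= white_point:
            out.append(255)
        else:
            out.append(round((p - black_point) * 255 / span))
    return out

# One scanned row: grayish paper (~225-235) crossed by a dark-gray stroke (~40-60).
row = [230, 228, 235, 60, 45, 40, 225, 232]
print(stretch_contrast(row, black_point=64, white_point=220))
```

Mid-tones between the two points are kept as grays, which matters for the shaded areas mentioned above.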
Is there some other trick that could help to make the PDFs smaller?
I should also mention that after a PDF is built from such scanned images, I OCR it (so that the PDF can be searched for words), but that doesn't reduce the file size (it actually increases it slightly, by the size of the new text layer), because the underlying scanned images are still there. Since most of each page is white, I was hoping the pages could be compressed much more dramatically than they are now.
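Why a mostly white page can shrink much further once it is truly black and white: the standard-library sketch below packs a synthetic page into 1 bit per pixel and deflates both versions with zlib (Deflate is also the algorithm behind PDF's FlateDecode filter). The page layout, its 1000 × 1414 size and the threshold of 128 are made-up assumptions for illustration. Real scans with noise will compress less well.

```python
import zlib

WIDTH, HEIGHT = 1000, 1414  # hypothetical page: ~1000 px wide, A4 aspect ratio

# Crude stand-ins for scanned content: one pattern for "text" rows, one for blank paper.
TEXT_ROW = bytes(
    0 if 80 <= x < WIDTH - 80 and (x // 7) % 3 == 0 else 255
    for x in range(WIDTH)
)
WHITE_ROW = bytes([255]) * WIDTH

def synthetic_page():
    """8-bit grayscale page: white background with bands of text-like rows."""
    return [TEXT_ROW if 100 <= y < HEIGHT - 100 and y % 40 < 12 else WHITE_ROW
            for y in range(HEIGHT)]

def pack_1bit(rows, threshold=128):
    """Pack 8-bit rows into 1 bit per pixel (MSB first, 1 = white).

    Each row is padded to a whole number of bytes.
    """
    packed = bytearray()
    for row in rows:
        for start in range(0, len(row), 8):
            byte = 0
            for i, value in enumerate(row[start:start + 8]):
                if value >= threshold:
                    byte |= 0x80 >> i
            packed.append(byte)
    return bytes(packed)

page = synthetic_page()
gray = b"".join(page)
mono = pack_1bit(page)
print("8-bit raw:", len(gray), "deflated:", len(zlib.compress(gray, 9)))
print("1-bit raw:", len(mono), "deflated:", len(zlib.compress(mono, 9)))
```

The 1-bit data is exactly one eighth of the 8-bit data before compression.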
Cheers!
Minimizing PDF file size - scanned text
Maciej Tomczak
Phototramp.com
Re: Minimizing PDF file size - scanned text
Maciej,
What size do you get now, and what size do you hope to achieve?
I have done some scanning and PDF-printing of manuals and old documents, and for a single black-and-white page I'm roughly in the range of 100 to 300 kB.
Of course, that depends greatly on what the page contains.
Honestly, I have never put any thought or effort into reducing the size of the files.
I usually make them at 150 dpi, so an A4 page 210 mm wide (roughly 8.3 inches) comes out at about 1240 pixels, which is in the same range as yours.
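The pixel width follows directly from the page width and the scan resolution (1 inch = 25.4 mm). The A4 height of 297 mm is included here for completeness:

```latex
\begin{aligned}
w &= \frac{210\,\text{mm}}{25.4\,\text{mm/in}} \times 150\,\text{px/in} \approx 8.27\,\text{in} \times 150\,\text{px/in} \approx 1240\,\text{px} \\
h &= \frac{297\,\text{mm}}{25.4\,\text{mm/in}} \times 150\,\text{px/in} \approx 11.69\,\text{in} \times 150\,\text{px/in} \approx 1754\,\text{px}
\end{aligned}
```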
Dieter Mayr
Re: Minimizing PDF file size - scanned text
During PDF creation, images are resized and recompressed by the printer driver, so it does not matter much how small the images are when you start. The current Adobe PDF driver has a Paper/Quality tab under Print Properties; clicking the Advanced button there lets you select different Print Quality settings. Reducing the print dpi should reduce the file size considerably. Some older PDF printer drivers used to have separate quality sliders for images and text.
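Pixel count, and with it the raw image data, scales with the square of the resolution, so halving the dpi leaves about a quarter of the pixels. A minimal sketch of a 2x2 box-filter downsample in pure Python, assuming an A4 page at 150 dpi (1240 × 1754 pixels). Actual printer drivers may use other resampling methods (bicubic, for example):

```python
def downsample_2x(rows):
    """Halve width and height by averaging each 2x2 block (a simple box filter).

    rows: list of equal-length lists of 8-bit gray values. An odd trailing
    row or column is dropped for simplicity.
    """
    out = []
    for y in range(0, len(rows) - 1, 2):
        top, bottom = rows[y], rows[y + 1]
        out.append([
            # +2 rounds the average of four samples to the nearest integer
            (top[x] + top[x + 1] + bottom[x] + bottom[x + 1] + 2) // 4
            for x in range(0, len(top) - 1, 2)
        ])
    return out

page = [[255] * 1240 for _ in range(1754)]  # blank A4 page at 150 dpi
half = downsample_2x(page)                  # roughly 75 dpi
print(len(half[0]), len(half))
print(1240 * 1754 / (len(half[0]) * len(half)))
```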
Jonathan Sachs
Digital Light & Color
Re: Minimizing PDF file size - scanned text
Thanks.
I tried a few different PDF printer drivers, and it looks like most of them have separate settings for three types of images: colour, grayscale and monochrome (which I assume correspond to 24, 8 and 2 bit colour depths?). Downsampling and compression levels can often be adjusted for each image type separately.
Now, when PWP Album prints to such a printer, what image type does it actually send? Does it always send a 24-bit image regardless of the bit depth of the images going into the Album?
The practical question is: if I convert a 24-bit image to either 8 bit or 2 bit in PWP, make an Album, and then print it to a PDF printer driver, which of the three image type settings in the driver should I adjust?
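For a sense of scale, here is the uncompressed size of one A4 page at 150 dpi (1240 × 1754 pixels) in each of the three driver image types. This sketch takes monochrome as 1 bit per pixel and pads each row to a whole byte; both are assumptions about how the data is laid out, and real drivers will then compress it:

```python
BITS_PER_PIXEL = {"colour": 24, "grayscale": 8, "monochrome": 1}

def raw_page_bytes(width_px, height_px, bits_per_pixel):
    """Uncompressed size of one page image, with each row padded to a whole byte."""
    bytes_per_row = (width_px * bits_per_pixel + 7) // 8
    return bytes_per_row * height_px

for kind, bpp in BITS_PER_PIXEL.items():
    print(kind, raw_page_bytes(1240, 1754, bpp))
```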
Cheers!
Maciej Tomczak
Re: Minimizing PDF file size - scanned text
Actually, monochrome (i.e. only black or white) is 1 bit per pixel.
Album sends 24-bit, 8-bit and 1-bit images to the printer driver as is. It converts 48-bit images to 24 bits and 16-bit images to 8 bits just before printing. (Of course the conversion happens within the print stream; the original image is not affected.)
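One common way to reduce 16 bits per channel to 8 is to rescale each sample with rounding (a plain right shift by 8 is another). This is only an illustrative sketch; the post does not say which method Album uses:

```python
def to_8bit(value16):
    """Scale a 16-bit sample (0-65535) to 8 bits (0-255), rounding to nearest."""
    return (value16 * 255 + 32767) // 65535

def rgb48_to_rgb24(pixel):
    """Convert one 48-bit RGB pixel (three 16-bit channels) to 24-bit RGB."""
    return tuple(to_8bit(c) for c in pixel)

print(rgb48_to_rgb24((65535, 32768, 0)))
```

The same per-sample conversion applies to a 16-bit grayscale image, which has a single channel.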
Incidentally, one should not assume text is monochrome: antialiasing uses grayscale gradations to make curves appear smooth.
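To see why, note that an antialiased edge is a short gray ramp between paper and ink. Forcing it to 1 bit, as a monochrome conversion does, collapses the ramp into a hard step. A minimal sketch; the pixel values and the cutoff of 128 are made-up examples:

```python
def threshold(pixels, cutoff=128):
    """Reduce 8-bit gray to pure black (0) or white (255), i.e. a 1-bit conversion."""
    return [0 if p < cutoff else 255 for p in pixels]

# A row crossing the antialiased edge of a glyph: paper, a gray ramp, then ink.
edge = [255, 255, 224, 160, 96, 32, 0, 0]
print(sorted(set(edge)))  # [0, 32, 96, 160, 224, 255]
print(threshold(edge))    # [255, 255, 255, 255, 0, 0, 0, 0]
```

Six gray levels become two. That loss is why small monochrome text can look jagged unless the resolution is high enough.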
Kiril
Kiril Sinkel
Digital Light & Color