Minimizing PDF file size - scanned text
Posted: March 18th, 2014, 2:48 am
I'm experimenting, but perhaps someone else has done it before and could offer a word of advice:
I'm making PDFs out of scans of printed pages. The pages are mostly text and line drawings, black on white (although there are some gray shadings). I can make the PDFs, no problem, but they are fairly big for what they contain. I'd like to minimize them, yet keep them readable. I've tried a few things so far, but I'm still not sure what works best:
- I cropped most of the unneeded margins
- I downsampled the scans to about 1000 pixels per A4 page width
- I have converted 24 bit colour scans to 8 bit grayscale
- I have stretched the contrast so that the paper is featureless white and the fonts and lines are mostly black, not gray
- I tried to compress files as PNG rather than JPG and then print them to PDF, but the resulting PDFs are still quite big
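To make the steps above concrete, here is a minimal Python sketch of the per-pixel operations involved (grayscale conversion, contrast stretch, and the downsampling target size). It is only an illustration and not the tool actually used for the scans: the function names, the BT.601 luma weights, and the black/white thresholds of 60 and 200 are all assumptions chosen for the example.

```python
def to_gray(rgb):
    """Convert one 24-bit (r, g, b) pixel to an 8-bit gray value.

    Uses the common ITU-R BT.601 luma weights (an assumption; an image
    editor may use a different formula).
    """
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def stretch_contrast(gray, black_point=60, white_point=200):
    """Clip and rescale a gray value so the paper becomes pure white.

    Values at or below black_point become 0, values at or above
    white_point become 255, and everything between is scaled linearly.
    The two thresholds are hypothetical and would be tuned per scan.
    """
    if gray <= black_point:
        return 0
    if gray >= white_point:
        return 255
    return round((gray - black_point) * 255 / (white_point - black_point))


def target_size(width, height, target_width=1000):
    """Return (width, height) after downsampling to target_width pixels,
    keeping the aspect ratio."""
    scale = target_width / width
    return target_width, round(height * scale)


if __name__ == "__main__":
    # A slightly yellowish "paper" pixel and a dark gray "ink" pixel.
    paper = stretch_contrast(to_gray((235, 230, 215)))
    ink = stretch_contrast(to_gray((50, 50, 55)))
    print("paper ->", paper, " ink ->", ink)
    # An A4 page scanned at 300 dpi is roughly 2480 x 3508 pixels.
    print("downsampled size:", target_size(2480, 3508))
```

After the stretch, paper pixels all share the value 255 and ink pixels the value 0, which is what gives a lossless compressor long uniform runs to work with.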
Is there some other trick that could help to make the PDFs smaller?
I may also mention that after a PDF is constructed from such scanned images, I OCR it (in order to be able to search the PDF for words), but that doesn't change the file size (actually it increases it slightly by the size of the new text layer), because the underlying scanned images are still there. But since most of the pages are white, I was hoping that the pages could be compressed more dramatically than they presently are.
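As a rough illustration of why a featureless white background matters for lossless compression, the sketch below builds two synthetic 1000 x 1414 8-bit grayscale pages and compresses them with zlib, which implements DEFLATE, the same algorithm PNG uses. The pages are not real scans: the striped "text" pattern, the noise amplitude of 8 gray levels, and all names are assumptions made for the example, and a real PNG encoder also applies row filters that this sketch skips.

```python
import random
import zlib

WIDTH, HEIGHT = 1000, 1414  # about 1000 pixels per A4 page width


def synthetic_page(noise=0, seed=0):
    """Build a fake 8-bit grayscale page as raw bytes.

    Black striped bands stand in for lines of text. With noise > 0 the
    "paper" pixels vary between 255 - noise and 255, imitating a scan
    whose background was not stretched to pure white.
    """
    rng = random.Random(seed)
    page = bytearray()
    for y in range(HEIGHT):
        in_text_band = (y % 40) < 12
        for x in range(WIDTH):
            if in_text_band and 100 <= x < 900 and (x // 6) % 2 == 0:
                page.append(0)  # ink
            elif noise:
                page.append(255 - rng.randrange(noise + 1))  # grainy paper
            else:
                page.append(255)  # featureless white paper
    return bytes(page)


def compressed_size(data):
    """Size in bytes after DEFLATE compression at the highest level."""
    return len(zlib.compress(data, 9))


raw_size = WIDTH * HEIGHT  # one byte per pixel, uncompressed
clean_size = compressed_size(synthetic_page(noise=0))
noisy_size = compressed_size(synthetic_page(noise=8))

print("raw bytes:       ", raw_size)
print("clean background:", clean_size)
print("noisy background:", noisy_size)
```

On these synthetic pages the clean background compresses far better than the noisy one, so any remaining paper grain in the real scans is one plausible reason the PNG-based PDFs stay large.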
Cheers!