Downsizing PNGs to PDF in Pillow

My question is about tuning image quality & filesize in a PNG to PDF conversion.

I'm moving a reveal.js slide deck from my site into PDF. For a host of reasons I've ended up scraping the slides, which produced 115 PNG files. Lotta slides. Pillow is handling the processing: 115 PNGs into a single PDF.

Out of the box the final PDF comes in at about 10 MB.

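import os
from PIL import Image

# cwd, pngs (the list of PNG filenames) and pdf.fullpath are set elsewhere in the script.
pages = []
for png in pngs:
    img = Image.open(os.path.join(cwd, "scrapes", png))
    # Drop any alpha channel so each page is saved as plain RGB.
    pages.append(img.convert('RGB'))

# The first image starts the PDF; the rest are appended as extra pages.
pages[0].save(pdf.fullpath, 'PDF', resolution=100, save_all=True, append_images=pages[1:])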

I've been through some of the Pillow docs and a few how-tos. I have been toying with three levers in my efforts to shrink the PDF while keeping it readable.

  • downsizing dimensions using resize()
  • setting the resample= parameter in resize()
  • changing the resolution= parameter in the call to save()
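
A quick check on the third lever, as a sketch rather than a definitive answer: as far as I can tell, Pillow's PDF writer uses resolution= only to compute the page dimensions (pixels * 72 / resolution, in points) and writes the same pixel data either way, so it should barely move the file size. The synthetic slide and the pdf_bytes helper below are made up for illustration.

```python
import io
from PIL import Image, ImageDraw

def pdf_bytes(img, resolution):
    """Save one image as a single-page PDF in memory; return the size in bytes."""
    buf = io.BytesIO()
    img.save(buf, "PDF", resolution=resolution)
    return len(buf.getvalue())

# Hypothetical stand-in for one scraped slide.
slide = Image.new("RGB", (800, 600), "white")
ImageDraw.Draw(slide).rectangle((100, 100, 700, 500), fill="navy")

size_at_72 = pdf_bytes(slide, 72)
size_at_300 = pdf_bytes(slide, 300)
print("resolution=72: ", size_at_72, "bytes")
print("resolution=300:", size_at_300, "bytes")
```

If the two numbers come out nearly identical on real slides too, resolution= is a page-geometry setting and the real savings have to come from the pixel dimensions or the compression.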

A sample of what I'm up to as of now:

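import os
from PIL import Image

RESIZE_FACTOR = 0.50
RESAMPLE_ALGO = Image.Resampling.LANCZOS
PDF_RES = 100

# cwd, pngs (the list of PNG filenames) and pdf.fullpath are set elsewhere in the script.
pages = []
for png in pngs:
    img = Image.open(os.path.join(cwd, "scrapes", png))
    # Convert before resizing: Pillow falls back to nearest-neighbour
    # when asked to resize palette ('P') or 1-bit images.
    rgb = img.convert('RGB')
    w, h = rgb.size
    new_size = (int(w * RESIZE_FACTOR), int(h * RESIZE_FACTOR))
    pages.append(rgb.resize(new_size, resample=RESAMPLE_ALGO))

# The first image starts the PDF; the rest are appended as extra pages.
pages[0].save(pdf.fullpath, 'PDF', resolution=PDF_RES, save_all=True, append_images=pages[1:])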

My results aren't great.

This process feels more an art than a science, making this question a bit soft. I guess I'm hoping for leads or general guidelines. Also,

  • are there other levers I should explore, besides the resizing and resampling I'm already doing?
  • are there other tools besides Pillow?
  • or should I go back to the PNG capture stage and look into any pyautogui settings before I do any more slogging in this processing stage?

Many Thanks,


