How to add text to a PDF using Python?

How can I add text to an existing PDF using Python? I am looking for a solution that works on both Windows and Linux, but if necessary, Linux-only will suffice. I have considered using pypdf and ReportLab, but neither allows for editing an existing PDF. What are the best options for this task, and what additional modules do I need to install?


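Hey, I’ve worked a lot with PDFs, and for adding text to an existing PDF in Python, a combination of ReportLab and PyPDF2 works really well (both install with pip: pip install reportlab PyPDF2). The idea is to draw the text on a blank page and stamp that page onto the original. Here’s a straightforward approach: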

  1. Use PdfReader (named PdfFileReader before PyPDF2 3.0) to read your existing PDF (referred to as input).
  2. Generate a new PDF containing the text you want to add with ReportLab and write it to an in-memory bytes buffer (io.BytesIO).
  3. Load this buffer with PdfReader (let’s call it text).
  4. Use PdfWriter (formerly PdfFileWriter) to create a new output PDF object (output).
  5. Merge text.pages[0] onto the pages you want to modify with .merge_page() (formerly .mergePage()).
  6. Finally, add the modified pages to the output document with .add_page() (formerly .addPage()) and save it.

Here’s a snippet to get started:

# PyPDF2 >= 3.0 names; older releases call these PdfFileReader/PdfFileWriter,
# getPage(), mergePage() and addPage()
from PyPDF2 import PdfReader, PdfWriter
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
import io

# Draw the new text on a blank page held in memory
packet = io.BytesIO()
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(100, 750, "This is the added text!")  # x, y in points from the bottom-left corner
can.save()
packet.seek(0)

# Read the generated overlay and set up the output
new_pdf = PdfReader(packet)
output_pdf = PdfWriter()

# Stamp the overlay onto every page; keep the source file open until the
# output is written, because page content is read lazily
with open("existing_pdf.pdf", "rb") as source:
    existing_pdf = PdfReader(source)
    for page in existing_pdf.pages:
        page.merge_page(new_pdf.pages[0])
        output_pdf.add_page(page)

    # Save the result
    with open("output.pdf", "wb") as output_stream:
        output_pdf.write(output_stream)

This method is great for basic text additions, but I’d love to hear other ideas for tackling this problem.
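
One detail the snippet leaves implicit is where (100, 750) actually lands. ReportLab measures from the bottom-left corner in points (1/72 inch), and a letter page is 612 x 792 points, so y=750 sits 42 points below the top edge. Here is a minimal sketch of helpers for thinking in "distance from the top" instead; the names inches and from_top are made up for illustration and are not part of ReportLab or PyPDF2:

```python
# Hypothetical helpers, not part of ReportLab or PyPDF2.
POINTS_PER_INCH = 72.0
LETTER = (8.5 * POINTS_PER_INCH, 11 * POINTS_PER_INCH)  # 612 x 792 points


def inches(value):
    """Convert inches to PDF points."""
    return value * POINTS_PER_INCH


def from_top(distance_from_top, page_height=LETTER[1]):
    """Turn a distance below the top edge into a bottom-left-origin y value."""
    return page_height - distance_from_top


# The y=750 used in the snippet is 42 points below the top of a letter page.
print(from_top(42))  # 750.0
```

With these you could write can.drawString(inches(1), from_top(inches(1)), "...") to start the text one inch in from the left and one inch down from the top.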

Hey @dimplesaini.230, that’s a solid method! If you’re exploring alternatives for editing PDFs in Python, I’ve been using pdfrw lately (pip install pdfrw). It handles merging PDFs and overlaying text with very little code. Here’s how you can do it:

  1. Use ReportLab to create the text you want to overlay.
  2. Read the existing PDF with PdfReader.
  3. Overlay the new text on the desired pages using PageMerge.
  4. Save the result with PdfWriter.

This approach is lightweight, and the overlay lands wherever you drew it on the ReportLab canvas, so you can target a specific spot on the page. Here’s a quick example:

from pdfrw import PdfReader, PdfWriter, PageMerge
from reportlab.pdfgen import canvas
from io import BytesIO

# Create the text overlay
packet = BytesIO()
can = canvas.Canvas(packet)
can.drawString(100, 750, "This is the new text!")
can.save()
packet.seek(0)

# Read the existing and generated PDFs
existing_pdf = PdfReader('existing_pdf.pdf')
new_pdf = PdfReader(packet)
writer = PdfWriter()

# Merge the new text with each page
for page in existing_pdf.pages:
    PageMerge(page).add(new_pdf.pages[0]).render()
    writer.addpage(page)

# Write the output
writer.write('output.pdf')

I’ve noticed this approach gives a bit more control over positioning and rendering text. Let me know how it compares with your experience!
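
Both loops so far stamp every page, while the steps talk about "the desired pages". One way to narrow it down, sketched under the assumption that you want to pass pages as a 1-based string like "1,3-5": the parse_page_spec helper below is hypothetical (it is not part of pdfrw or PyPDF2) and just produces zero-based indices you can check inside the loop.

```python
def parse_page_spec(spec, page_count):
    """Expand a 1-based spec such as "1,3-5" into sorted zero-based indices."""
    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        first, _, last = part.partition("-")
        start = int(first)
        end = int(last) if last else start
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"page range {part!r} is outside 1..{page_count}")
        selected.update(range(start - 1, end))
    return sorted(selected)


targets = set(parse_page_spec("1,3-5", page_count=6))
# Inside the pdfrw loop, guard the merge with the page index, for example:
#   for index, page in enumerate(existing_pdf.pages):
#       if index in targets:
#           PageMerge(page).add(new_pdf.pages[0]).render()
#       writer.addpage(page)
print(sorted(targets))  # [0, 2, 3, 4]
```

Pages outside the selection are still added to the writer, just without the overlay, so the output keeps the original page count.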

Both your approaches are awesome, but for anyone who wants a simpler and more direct way to edit PDF files in Python, I highly recommend PyMuPDF (imported as fitz; pip install PyMuPDF). It’s a fast library with built-in tools for inserting text without needing to create separate PDF overlays.

Here’s a super simple example:

import fitz  # PyMuPDF

# Open the existing PDF
pdf_document = fitz.open("existing_pdf.pdf")

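# Add text to the first page; unlike ReportLab, the point is measured
# from the top-left corner of the page (in points)
page = pdf_document[0]
page.insert_text((100, 100), "This is the new text!", fontsize=12, color=(0, 0, 0))

# Save the modified PDF under a new name and release the file
pdf_document.save("output.pdf")
pdf_document.close()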

This method is particularly helpful if you’re dealing with small edits like annotations or adding labels to a PDF. Plus, you can easily customize font size, color, and positioning without additional libraries.
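
As an example of the "adding labels" case, here is a rough sketch that writes "Page N of M" near the bottom-left corner of every page. The function names footer_point and add_page_labels are my own, not PyMuPDF API, and the sketch assumes unrotated pages; the only PyMuPDF calls used are fitz.open, page.rect, page.insert_text, save and close.

```python
def footer_point(page_height, margin=36):
    """Return (x, y) near the bottom-left corner, measured from the top-left
    corner as PyMuPDF expects. The margin is in points (36 = half an inch)."""
    return (margin, page_height - margin)


def add_page_labels(src_path, dst_path):
    """Write 'Page N of M' on each page of src_path and save to dst_path."""
    import fitz  # PyMuPDF; imported here so footer_point has no dependency

    doc = fitz.open(src_path)
    try:
        total = len(doc)
        for number, page in enumerate(doc, start=1):
            point = footer_point(page.rect.height)
            page.insert_text(point, f"Page {number} of {total}",
                             fontsize=10, color=(0, 0, 0))
        doc.save(dst_path)
    finally:
        doc.close()


# On a letter page (792 points tall) the label baseline sits at y=756.
print(footer_point(792))  # (36, 756)
```

You would call it as add_page_labels("existing_pdf.pdf", "output.pdf"); saving to a different file name than the input keeps the original untouched.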

What do you think? It’s great to see how many ways Python can handle PDF editing!