Missing details while redacting #3375

@isri-ram

Description of the bug

I'm trying to redact a particular piece of text. The text is redacted, but nearby details are deleted as well.
```python
import re

import fitz  # PyMuPDF


class Redactor:
    """Redact the text that follows "Lot:" on every page of a PDF."""

    def __init__(self, path):
        self.path = path

    # static methods work independently of any class instance
    @staticmethod
    def get_sensitive_data(lines):
        """Yield the text that follows "Lot:" in each matching line."""
        for line in lines:
            # match the regex against each line once and reuse the result
            match = re.search(r"Lot:(.+)", line, re.IGNORECASE)
            if match:
                yield match.group(1)

    def redaction(self):
        """Main redactor code."""
        # open the PDF
        doc = fitz.open(self.path)

        # iterate through the pages
        for page in doc:
            # _wrapContents is needed for fixing alignment issues
            # with rect boxes in some cases
            # page._wrapContents()

            # collect the strings that follow "Lot:" on this page
            sensitive = self.get_sensitive_data(page.get_text("text").split("\n"))
            for data in sensitive:
                # one rectangle per occurrence of the string on the page
                for area in page.search_for(data):
                    # mark the area for redaction with a black fill
                    page.add_redact_annot(area, fill=(0, 0, 0))

            # apply the redactions
            page.apply_redactions()

        # save to a new PDF
        doc.save("redacted.pdf")
        print("Successfully redacted")


path = "../sample files/Property_Information_Report.v7.pdf"
redactor = Redactor(path)
redactor.redaction()
```

How to reproduce the bug

Any help would be appreciated.
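
One possible explanation, offered as an assumption rather than a confirmed diagnosis for this PDF: as far as I understand, `apply_redactions()` removes any text whose bounding box overlaps a redaction rectangle, and the rectangles returned by `search_for()` can be tall enough to touch the lines above and below. A minimal sketch of a workaround is to shrink each rectangle vertically before adding the annotation. The helper name `shrink_vertically` and the `fraction` default are hypothetical, not part of PyMuPDF.

```python
def shrink_vertically(rect, fraction=0.1):
    """Return (x0, y0, x1, y1) with top and bottom moved inward.

    `rect` is any 4-item sequence (x0, y0, x1, y1); a fitz.Rect should
    unpack the same way. `fraction` is the share of the height trimmed
    from EACH of the top and bottom edges (hypothetical default).
    """
    x0, y0, x1, y1 = rect
    margin = (y1 - y0) * fraction
    return (x0, y0 + margin, x1, y1 - margin)


# Hypothetical use inside the loop from the report (requires PyMuPDF):
#     for area in page.search_for(data):
#         page.add_redact_annot(shrink_vertically(area), fill=(0, 0, 0))
```

Whether this helps depends on how tightly the lines are spaced in the document, so the fraction may need tuning.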

PyMuPDF version

1.24.0

Operating system

Windows

Python version

3.12

Labels

not a bug (not a bug / user error / unable to reproduce)