Amazon’s New System for De-identifying Medical Images


Amazon recently introduced a system that identifies protected health information (PHI) in medical images and automatically redacts it so that patients can no longer be identified from the images.

Medical images typically contain patients’ PHI, such as names, birth dates, and related details, which appears as plain text in the images. Medical images may be used for research, but patient authorization must be obtained first. The alternative is to remove all identifying PHI from the images. Doing this manually is expensive and time-consuming, especially when a large number of images must be de-identified.

Amazon’s Rekognition machine-learning service makes it easier to identify the plain text in images and extract it. The extracted text is then passed to Amazon Comprehend Medical to determine whether it contains any PHI. Python code then redacts the PHI in the images quickly. The system works with images in PNG, JPEG, and DICOM formats.
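The detect-then-classify flow described above can be sketched in Python roughly as follows. This is a minimal illustration, not Amazon's published code: the helper names (detect_lines, find_phi_boxes, to_pixels) are hypothetical, and the clients are assumed to be boto3 clients created with boto3.client("rekognition") and boto3.client("comprehendmedical"). Only the detect_text and detect_phi calls and their response fields come from the AWS APIs.

```python
def detect_lines(image_bytes, rekognition):
    """Return (text, bounding_box) for each line of text Rekognition finds."""
    response = rekognition.detect_text(Image={"Bytes": image_bytes})
    return [
        (d["DetectedText"], d["Geometry"]["BoundingBox"])
        for d in response["TextDetections"]
        if d["Type"] == "LINE"  # skip the per-word duplicates
    ]


def find_phi_boxes(lines, comprehend_medical):
    """Return the bounding boxes of lines that overlap a detected PHI entity."""
    # Join the lines into one string, remembering each line's character span.
    spans, parts, offset = [], [], 0
    for text, box in lines:
        spans.append((offset, offset + len(text), box))
        parts.append(text)
        offset += len(text) + 1  # +1 for the newline separator
    if not parts:
        return []  # nothing to classify

    entities = comprehend_medical.detect_phi(Text="\n".join(parts))["Entities"]

    # A line is flagged when any PHI entity's character range overlaps it.
    return [
        box
        for start, end, box in spans
        if any(e["BeginOffset"] < end and e["EndOffset"] > start for e in entities)
    ]


def to_pixels(box, width, height):
    """Convert a relative Rekognition box to (left, top, right, bottom) pixels."""
    left = int(box["Left"] * width)
    top = int(box["Top"] * height)
    right = left + int(box["Width"] * width)
    bottom = top + int(box["Height"] * height)
    return (left, top, right, bottom)
```

The pixel rectangles returned by to_pixels could then be filled with a solid color using any imaging library to produce the redacted image. DICOM files would first need their pixel data converted to a format Rekognition accepts, which this sketch does not cover.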

For each entity it identifies, the system returns a confidence score that reflects how certain the identification is, and this score is the basis for deciding whether the entity is treated as PHI. The user can set the confidence threshold anywhere from 0.00 to 1.00. A threshold of 0.00 means that all text identified in the images is redacted.
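The effect of the threshold can be illustrated with a short Python sketch. It assumes entities shaped like the items Amazon Comprehend Medical returns, each carrying a "Score" between 0 and 1. The function name and the choice of a greater-than-or-equal comparison are assumptions made for illustration.

```python
def entities_to_redact(entities, threshold=0.0):
    """Keep the entities whose confidence score meets the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.00 and 1.00")
    # With threshold 0.0 every detected entity passes the filter.
    return [e for e in entities if e["Score"] >= threshold]


# Example input: two detected entities with different confidence scores.
detected = [
    {"Text": "John Doe", "Type": "NAME", "Score": 0.92},
    {"Text": "Ward 7", "Type": "ADDRESS", "Score": 0.35},
]
strict = entities_to_redact(detected, threshold=0.5)
everything = entities_to_redact(detected, threshold=0.0)
```

Here strict keeps only the high-confidence name, while everything keeps both entities. A lower threshold redacts more aggressively at the cost of blacking out some text that is not PHI.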

According to Amazon, the system helps healthcare organizations de-identify large numbers of images quickly, efficiently, and affordably, and it can be used for batch processing. Furthermore, a Lambda function can be set up to automatically redact the PHI from new images as they are uploaded to an Amazon S3 bucket.
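A Lambda function triggered by S3 uploads might look roughly like the Python sketch below. The event layout (Records, s3.bucket.name, s3.object.key) follows the standard S3 notification format. The redact_image function is a hypothetical placeholder for the detection and redaction steps, not part of any AWS library.

```python
from urllib.parse import unquote_plus


def uploaded_objects(event):
    """Extract (bucket, key) pairs from an S3 event notification."""
    return [
        # Object keys arrive URL-encoded in S3 events, so decode them.
        (r["s3"]["bucket"]["name"], unquote_plus(r["s3"]["object"]["key"]))
        for r in event.get("Records", [])
    ]


def redact_image(bucket, key):
    """Hypothetical placeholder: download, redact, and store one image."""
    # A real implementation would call Rekognition and Comprehend Medical here.
    return {"bucket": bucket, "key": key, "status": "redacted"}


def lambda_handler(event, context):
    """Entry point invoked by Lambda for each S3 upload notification."""
    results = [redact_image(b, k) for b, k in uploaded_objects(event)]
    return {"processed": results}
```

Writing the redacted output to a different bucket or prefix than the one that triggers the function avoids the function invoking itself on its own output.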