How to Harness AI/ML to Analyze Vast Customer Datasets While Adhering to Data Privacy Laws

Applying AI/ML to the vast customer data collected by biopharmaceutical companies, healthcare facilities, pharmacies, and others in the healthcare industry can yield deep, actionable insights. These insights can enable biopharmaceutical companies to bring the latest product information to physicians, engage with patients to improve compliance with treatment regimens and outcomes, and improve healthcare efficiency.

In this digital age, many biopharmaceutical companies engage healthcare providers (HCPs) through email and store the resulting contact information in their CRM systems. This information includes the time, date, subject lines (short or long, informative or succinct), nature of the email text, and HCP responses, including whether and when the email was opened, whether the links were clicked for more information, the type of information or follow-ups requested, and whether the HCP returned to open emails after a period of disengagement.

Without AI/ML, it would be very difficult to analyze such rich, complexly structured data at the HCP level. With AI/ML, we can extract the patterns of HCPs’ responsiveness to various types of emails and discover when and how we should seek to engage with them. For example, through an AI/ML algorithm we developed to analyze such data for a biopharmaceutical company, we found that some physicians responded well to emails sent to them on Friday afternoons. While this may not be intuitive, further research confirmed that many physicians were busy caring for patients during the week and opened emails from biopharma companies only on the weekend.
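As a rough illustration of the kind of pattern described above, the sketch below computes each HCP's open rate by send slot (day of week and morning or afternoon). It is a simple descriptive aggregation, not the algorithm the authors developed. The log layout, the HCP identifiers, and the helper names (send_slot, open_rates, best_slot) are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical email log: (hcp_id, send time, whether the email was opened).
EMAIL_LOG = [
    ("hcp_001", "2023-03-03T15:30", True),   # Friday afternoon
    ("hcp_001", "2023-03-07T09:00", False),  # Tuesday morning
    ("hcp_001", "2023-03-10T16:00", True),   # Friday afternoon
    ("hcp_001", "2023-03-15T10:00", False),  # Wednesday morning
    ("hcp_002", "2023-03-06T08:30", True),   # Monday morning
    ("hcp_002", "2023-03-10T15:00", False),  # Friday afternoon
]


def send_slot(timestamp: str) -> str:
    """Map a send time to a coarse slot such as 'Fri-PM'."""
    sent = datetime.fromisoformat(timestamp)
    half = "AM" if sent.hour < 12 else "PM"
    return f"{DAYS[sent.weekday()]}-{half}"


def open_rates(log):
    """Return {hcp_id: {slot: open rate}} from the raw log."""
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [opened, sent]
    for hcp_id, timestamp, opened in log:
        cell = counts[hcp_id][send_slot(timestamp)]
        cell[0] += int(opened)
        cell[1] += 1
    return {
        hcp_id: {slot: opened / sent for slot, (opened, sent) in slots.items()}
        for hcp_id, slots in counts.items()
    }


def best_slot(rates_for_hcp):
    """Pick the slot with the highest open rate for one HCP."""
    return max(rates_for_hcp, key=rates_for_hcp.get)


if __name__ == "__main__":
    for hcp_id, rates in open_rates(EMAIL_LOG).items():
        print(hcp_id, best_slot(rates), rates)
```

A production model would also need to account for sample size, subject-line and content features, and changes in behavior over time; this sketch only shows the per-HCP aggregation step.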

Adhering to Data Privacy Regulations

In some countries, physician-level prescription data is available, but it must be accessed through a third party. In this situation, biopharmaceutical companies could work with the third party to develop and train the AI/ML algorithms to analyze the data at physician level, and then send insights to the biopharmaceutical company at group level while ensuring the group size complies with domestic privacy regulations. For example, some countries require the group to include at least five physicians when sharing physician-level information. Alternatively, the third party that has access to the physician-level data could use AI/ML to generate synthetic data to mimic the characteristics of real physician-level data and send the synthetic data to the biopharmaceutical companies to train the AI/ML algorithm and extract the insights.
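The group-level sharing rule can be sketched as a small-cell suppression step that the third party runs before releasing anything. The example below assumes the five-physician minimum mentioned above; the record layout, the grouping by region, and the function name group_summary are hypothetical, and the actual threshold and grouping rules depend on the jurisdiction.

```python
from collections import defaultdict

# Example threshold taken from the text; the real value varies by country.
MIN_GROUP_SIZE = 5

# Hypothetical physician-level records held by the third party:
# (physician_id, region, monthly prescription count).
RECORDS = [
    ("p01", "North", 12), ("p02", "North", 8), ("p03", "North", 15),
    ("p04", "North", 10), ("p05", "North", 5),
    ("p06", "South", 20), ("p07", "South", 18), ("p08", "South", 7),
]


def group_summary(records, min_size=MIN_GROUP_SIZE):
    """Aggregate to group level and suppress groups below the minimum size."""
    groups = defaultdict(dict)
    for physician_id, region, prescriptions in records:
        groups[region][physician_id] = prescriptions

    summary = {}
    for region, by_physician in groups.items():
        n = len(by_physician)  # number of distinct physicians in the group
        if n < min_size:
            summary[region] = None  # suppressed: too few physicians to share
        else:
            summary[region] = {
                "physicians": n,
                "mean_prescriptions": sum(by_physician.values()) / n,
            }
    return summary


if __name__ == "__main__":
    print(group_summary(RECORDS))
```

Only the returned summary would leave the third party's environment; the physician-level records stay behind.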

Under contemporary privacy laws, patient-level data, i.e., personal health information (also referred to as “protected health information”), can be used under certain conditions. One condition is if the consent of the patient has been obtained for that use. In some cases, obtaining consent is not practical. If we want to build an AI/ML model from a large hospital database spanning a decade, going back and obtaining retroactive consent from all of the patients is not realistic. Some patients may not be reachable because they have moved or died, and contacting remaining patients and explaining to them the purpose of the study would be challenging, time-consuming, and expensive. It is also likely that not all patients will consent, and convincing evidence shows that consenters and non-consenters differ in a systematic way in their characteristics, resulting in a consenting subset that will be a biased sample of the dataset.

De-identification of Data

Another way to address this issue is to de-identify the dataset. De-identification is the process of making the personal health information nonpersonal. Because it is no longer personal information, it can be used in AI/ML projects without having to obtain consent. While this is a huge benefit of de-identification, the de-identification process itself must be performed well to justify such an exemption from the obligations of privacy regulations. The act of de-identification itself is generally considered to be a legitimate use of information, and therefore does not require consent.

Performing de-identification well means that generally accepted methodologies for assessing the risk of re-identification can demonstrate that the resulting risk is below commonly used thresholds, or below levels for which there are strong precedents of acceptability. Another key consideration is the context in which the AI/ML projects will run and the security and privacy controls that are in place. The stronger these controls are, the lower the risk of re-identification. Three important considerations when de-identifying data for an AI/ML project are data utility, transparency, and ethics. The first is a technical issue, and the latter two pertain to governance.
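One common way to put a number on re-identification risk is to group records that share the same quasi-identifier values (an equivalence class) and treat the risk for each record as one divided by the size of its class. The sketch below is a minimal illustration of that idea. The choice of quasi-identifiers, the sample records, and the threshold value are assumptions for the example, and the calculation does not account for the security and privacy controls discussed above.

```python
from collections import Counter

# Assumed quasi-identifiers; a real assessment would choose these carefully.
QUASI_IDENTIFIERS = ("age_band", "sex", "region")

# Illustrative threshold only; acceptable values depend on context and precedent.
RISK_THRESHOLD = 0.2

# Hypothetical de-identified patient records.
RECORDS = [
    {"age_band": "40-49", "sex": "F", "region": "North"},
    {"age_band": "40-49", "sex": "F", "region": "North"},
    {"age_band": "40-49", "sex": "F", "region": "North"},
    {"age_band": "50-59", "sex": "M", "region": "South"},
    {"age_band": "50-59", "sex": "M", "region": "South"},
    {"age_band": "60-69", "sex": "M", "region": "South"},
]


def class_sizes(records, quasi_identifiers=QUASI_IDENTIFIERS):
    """Count how many records share each combination of quasi-identifiers."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def reidentification_risk(records, quasi_identifiers=QUASI_IDENTIFIERS):
    """Return (maximum risk, average risk) across all records."""
    sizes = class_sizes(records, quasi_identifiers)
    max_risk = 1 / min(sizes.values())   # driven by the smallest class
    avg_risk = len(sizes) / len(records)  # mean of 1/size over all records
    return max_risk, avg_risk


if __name__ == "__main__":
    max_risk, avg_risk = reidentification_risk(RECORDS)
    print(f"max risk {max_risk:.2f}, average risk {avg_risk:.2f}")
    print("below threshold:", max_risk <= RISK_THRESHOLD)
```

Here the single patient in the 60-69 band forms a class of one, so the maximum risk is 1.0 and the dataset would need further transformation before it met the illustrative threshold.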

Whenever data are transformed, the data’s utility is affected. Many transformation choices can be applied to a dataset, and those choices are determined by the algorithms that are used. De-identification algorithms matter. Good algorithms will apply sufficient transformations but still retain high data utility.
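The trade-off between transformation and utility can be seen in a toy example that generalizes exact ages into bands. Wider bands produce larger groups of indistinguishable records but move each value further from the truth. The ages, the band widths, and the information-loss measure (mean distance from the band midpoint) are assumptions chosen for illustration, not a standard prescribed by any regulation.

```python
from collections import Counter

# Hypothetical exact patient ages before de-identification.
AGES = [41, 43, 47, 52, 58, 55]


def generalize(age, width):
    """Replace an exact age with an inclusive band of the given width."""
    low = (age // width) * width
    return (low, low + width - 1)


def smallest_class(ages, width):
    """Size of the smallest group of records sharing the same band."""
    return min(Counter(generalize(a, width) for a in ages).values())


def information_loss(ages, width):
    """Mean absolute gap between each age and the midpoint of its band."""
    total = 0.0
    for age in ages:
        low, high = generalize(age, width)
        total += abs(age - (low + high) / 2)
    return total / len(ages)


if __name__ == "__main__":
    for width in (1, 5, 10):
        print(
            f"width {width:>2}: smallest class {smallest_class(AGES, width)}, "
            f"information loss {information_loss(AGES, width):.2f}"
        )
```

With these sample ages, 10-year bands raise the smallest class from one record to three, at the cost of a larger average distortion than 5-year bands. A good de-identification algorithm searches for the least distorting transformation that still meets the risk target.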

It is important to be transparent about data uses and what kinds of decisions will be made from AI/ML models. Even though consent cannot be obtained in advance, the fact that patient data is being used for secondary purposes should be communicated to the patient population in a reasonable manner.

The final consideration is the need to ensure that the AI/ML models are developed and used in an ethical and nondiscriminatory manner, that is, in a way that does not disadvantage a particular group of patients. Typically, an ethics review process is put in place to oversee model development and use. The review process does not have to be complex but should be proportionate.

The Ultimate Benefit to Patients

When these aspects are addressed well, AI/ML can be applied to analyze longitudinal patient data and extract valuable and actionable insights. AI/ML can identify patterns in the patient journey to uncover patient populations that may benefit from certain products. AI/ML can identify an undiagnosed patient with a rare disease and predict a next line of treatment for patients who do not respond to a current regimen. AI/ML can compare patients’ characteristics, compliance, and responses to different treatment regimens and identify the treatment options that may improve patient care. The list goes on.

AI/ML has enormous potential to improve patient care and healthcare efficiency. Biopharmaceutical companies and their partners must take extra steps to ensure that their secondary uses of the data are consistent with current privacy regulations, which includes implementing appropriate technical and administrative controls and possibly de-identification. Finally, when extracting insights using AI/ML, care must be taken to ensure that the insights are consistent with the ethical expectations for the uses of data (e.g., minimizing bias), and that these AI/ML-extracted insights are used in a socially responsible way and have a positive impact on patients and society.

  • Dr. Yilian Yuan

    Dr. Yilian Yuan is the Sr. Vice President of Global Data Science and Advanced Analytics at IQVIA. She leads a global team of data scientists and advanced analytics experts, combining the local market knowledge with advanced AI/ML and analytics skills, deep healthcare data expertise, and industry knowledge, to help clients address a broad range of business and industry challenges. Dr. Yuan has an extensive background in predictive analytics and machine learning methods and their applications in the pharmaceutical industry. She also has a wide range of experience analyzing real-world patient longitudinal data to provide actionable insights for improving patient care and for pharma clients to improve business performance.

  • Dr. Khaled El-Emam

    Dr. Khaled El-Emam is the Founder at Privacy Analytics, an IQVIA company that develops solutions for the anonymization of health information. He is also a professor at the University of Ottawa, where he runs a health informatics research lab focusing on data protection technologies to facilitate the sharing and secondary uses of health data.

