Intro

The in-house company designs and manages physical and digital infrastructure and, at the same time, the aggregated purchasing cycle of regional public administration bodies.

The company wanted to make better use of the health data at its disposal by developing a risk model for the onset of serious diseases in specific categories of patients. To carry out the project, DVC provided the company with technological and legal support in defining a process for reusing the available datasets in compliance with current legislation, including through data synthesis solutions.

Challenge

The company's objective was to reuse datasets of health data to develop a machine learning model that estimates the risk of onset of serious diseases. Current privacy regulations make this type of data particularly difficult to reuse. In addition, conventional anonymization techniques tend to degrade some statistical properties of the original dataset, which makes it harder to develop reliable predictive models.
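The loss of statistical properties can be illustrated with a minimal sketch. It assumes a hypothetical dataset of two invented columns (an age and a correlated biomarker) and uses random noise perturbation as a stand-in for a conventional anonymization technique; none of this reflects the company's actual data or process.

```python
import math
import random


def correlation(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(42)

# Hypothetical "real" records: age and a biomarker that depends on age.
ages = [rng.gauss(60, 10) for _ in range(5000)]
markers = [0.5 * age + rng.gauss(0, 5) for age in ages]

# Assumed anonymization step: perturb each age with random noise.
noisy_ages = [age + rng.gauss(0, 10) for age in ages]

corr_original = correlation(ages, markers)
corr_perturbed = correlation(noisy_ages, markers)

print(f"correlation before perturbation: {corr_original:.2f}")
print(f"correlation after perturbation:  {corr_perturbed:.2f}")
```

The perturbed column is harder to link back to an individual, but its relationship with the biomarker is weaker, and that relationship is exactly the kind of signal a predictive model relies on.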

Solution

DVC supported the company on two fronts:

  1. Technological: providing the technical solution needed to anonymize the available health data while preserving the statistical properties of the original dataset. This solution is data synthesis, a technique that uses AI to generate artificial data from a real dataset while keeping its statistical properties unchanged;
  2. Legal: supporting the company in fulfilling the obligations that current legislation places on the processing of health data (e.g. drafting the DPIA and the related preliminary assessment).
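The idea behind data synthesis can be sketched in a few lines. The example below is only an illustration under stated assumptions: it replaces the AI-based generator with a simple two-variable Gaussian model, and the columns (age and biomarker) are invented. It is not DVC's solution, and fitting such a model does not by itself guarantee anonymity.

```python
import math
import random


def fit_gaussian(rows):
    """Estimate means, variances and covariance of two numeric columns."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in rows) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in rows) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in rows) / (n - 1)
    return mean_x, mean_y, var_x, var_y, cov_xy


def sample_synthetic(params, n, rng):
    """Draw n artificial rows from the fitted model."""
    mean_x, mean_y, var_x, var_y, cov_xy = params
    # Cholesky factor of the 2x2 covariance matrix.
    l11 = math.sqrt(var_x)
    l21 = cov_xy / l11
    l22 = math.sqrt(max(var_y - l21 ** 2, 0.0))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append((mean_x + l11 * z1, mean_y + l21 * z1 + l22 * z2))
    return rows


def correlation(rows):
    """Pearson correlation between the two columns."""
    _, _, var_x, var_y, cov_xy = fit_gaussian(rows)
    return cov_xy / math.sqrt(var_x * var_y)


rng = random.Random(0)

# Hypothetical "real" records: age and a biomarker that depends on age.
real = []
for _ in range(5000):
    age = rng.gauss(60, 10)
    real.append((age, 0.5 * age + rng.gauss(0, 5)))

# No synthetic row is a copy of a real one; only the fitted parameters are reused.
synthetic = sample_synthetic(fit_gaussian(real), 5000, rng)

print(f"correlation in real data:      {correlation(real):.2f}")
print(f"correlation in synthetic data: {correlation(synthetic):.2f}")
```

The two correlations should be close, which is the property the text describes: the artificial records carry the same aggregate relationships as the originals. Production generators model far richer structure than a single covariance, and their privacy guarantees have to be assessed separately.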

Results

The company has carried out the first phase of the pilot project, in which a previously anonymized dataset was synthesized in order to evaluate the quality of the technological solution. The pilot yielded positive results: validation by a third party demonstrated the statistical reliability of the proposed synthesis techniques. In the second phase of the project, the technology will be applied to a real dataset containing health data in order to train increasingly reliable and better-performing machine learning models.

Are you ready to turn your data into value for your business?