Privacy-preserving artificial intelligence: training on encrypted data


In the era of Artificial Intelligence (AI) and big data, predictive models have become an essential tool across industries including healthcare, finance and genomics. These models rely heavily on the processing of sensitive information, making data privacy a critical concern. The key challenge lies in maximizing data utility without compromising the confidentiality and integrity of the information involved. Achieving this balance is essential for the continued advancement and acceptance of AI technologies.
Machine Learning Tech Lead at Zama.
Collaboration and open source
Creating a robust dataset for training machine learning models presents significant challenges. For instance, while AI technologies such as ChatGPT have thrived by gathering vast amounts of data available on the internet, healthcare data cannot be compiled this freely due to privacy concerns. Constructing a healthcare dataset involves integrating data from multiple sources, including doctors and hospitals, often across borders.
The healthcare sector is emphasized due to its societal importance, yet the principles apply broadly. For example, even a smartphone autocorrect feature, which personalizes predictions based on user data, must navigate similar privacy issues. The finance sector also encounters obstacles in data sharing due to its competitive nature.
Thus, collaboration emerges as a crucial element for safely harnessing AI’s potential within our societies. However, an often overlooked aspect is the actual execution environment of AI and the underlying hardware that powers it. Today’s advanced AI models require robust hardware, including extensive CPU/GPU resources, substantial amounts of RAM and even more specialized technologies such as TPUs, ASICs and FPGAs. At the same time, users increasingly expect friendly interfaces with straightforward APIs, which in practice means running models on infrastructure operated by someone else. This scenario highlights the importance of developing solutions that enable AI to operate on third-party platforms without sacrificing privacy, and the need for open-source tools that facilitate these privacy-preserving technologies.
Privacy solutions to train machine learning models
To address the privacy challenges in AI, several sophisticated solutions have been developed, each focusing on specific needs and scenarios.
Federated Learning (FL) allows for the training of machine learning models across multiple decentralized devices or servers, each holding local data samples, without actually exchanging the data. Similarly, Secure Multi-party Computation (MPC) enables multiple parties to jointly compute a function over their inputs while keeping those inputs private, ensuring that sensitive data does not leave its original environment.
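As a rough illustration of the FL idea described above, the following sketch implements a simplified form of federated averaging in plain Python. The one-parameter model, the client datasets, the learning rate and all function names are hypothetical choices made for illustration only; real deployments rely on dedicated frameworks and usually add further protection for the shared updates.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # one (x, y) observation held by a client


def local_update(w: float, data: List[Sample], lr: float = 0.01, epochs: int = 5) -> float:
    """Run a few gradient-descent steps for the model y ~ w * x on one client's data."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(w_global: float, clients: List[List[Sample]], rounds: int = 50) -> float:
    """Each round, clients train locally and only their weights reach the server."""
    for _ in range(rounds):
        updates = [(local_update(w_global, data), len(data)) for data in clients]
        total = sum(n for _, n in updates)
        # The server averages the weights, weighted by local dataset size.
        w_global = sum(w * n for w, n in updates) / total
    return w_global


# Hypothetical data: two clients whose private samples both follow y = 3 * x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0), (5.0, 15.0)],
]
w = federated_average(0.0, clients)
```

Note that the raw samples never leave `local_update`; only the scalar weight is shared, which is the property FL is built around.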
Another set of solutions focuses on manipulating data to maintain privacy while still allowing for useful analysis. Differential Privacy (DP) introduces noise to data in a way that protects individual identities but still provides accurate aggregate information. Data Anonymization (DA) removes personally identifiable information from datasets, ensuring some anonymity and mitigating the risk of data breaches.
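To make the DP idea more concrete, here is a minimal sketch of the Laplace mechanism applied to a counting query. The dataset, the `epsilon` value and the function names are assumptions chosen for illustration; this is not a production-grade implementation, which would also need to track a privacy budget across queries.

```python
import random
from typing import Callable, Iterable


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values: Iterable[int], predicate: Callable[[int], bool],
                  epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the values matching the predicate."""
    true_count = sum(1 for v in values if predicate(v))
    # Adding or removing one person changes a count by at most 1 (sensitivity 1),
    # so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical ages; the query asks how many people are 50 or older.
ages = [34, 51, 29, 62, 45, 70, 38, 55]
rng = random.Random(0)
noisy = private_count(ages, lambda a: a >= 50, epsilon=0.5, rng=rng)
```

A smaller `epsilon` means more noise and stronger privacy; a larger one gives more accurate answers with weaker guarantees.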
Finally, Homomorphic Encryption (HE) allows operations to be performed directly on encrypted data, generating an encrypted result that, when decrypted, matches the result of the same operations performed on the plaintext.
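The property described above can be demonstrated with a toy version of the Paillier cryptosystem, which supports addition on ciphertexts. This is only a sketch under strong assumptions: Paillier is additively homomorphic rather than fully homomorphic, the primes are far too small to be secure, and the function names are hypothetical. It is not the scheme used by any particular FHE library.

```python
import math
import secrets


def keygen(p: int = 1009, q: int = 1013):
    """Build a toy key pair from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Encrypt m (0 <= m < n) as g^m * r^n mod n^2 with a fresh random r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, priv, c: int) -> int:
    """Recover m as L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying two ciphertexts yields an encryption of the sum of the plaintexts."""
    return (c1 * c2) % (n * n)


n, priv = keygen()
c_sum = add_encrypted(n, encrypt(n, 12), encrypt(n, 30))
total = decrypt(n, priv, c_sum)  # the server never saw 12 or 30 in the clear
```

The party holding only `n` can compute `c_sum` without being able to read either input; only the holder of `priv` can decrypt the result.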
The perfect fit
Each of these privacy solutions has its own set of advantages and trade-offs. FL, for instance, still requires participants to communicate with a third-party server, and what they send can potentially leak some information about the underlying data. MPC operates on cryptographic principles that are robust in theory but can create significant bandwidth demands in practice.
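To give a feel for where the bandwidth cost of MPC comes from, the sketch below computes a joint sum using additive secret sharing, one of the simplest MPC building blocks, and counts the share messages exchanged. The modulus, the inputs and the function names are assumptions for illustration; real protocols are more involved and also handle multiplication and malicious parties.

```python
import secrets
from typing import List, Tuple

MODULUS = 2**61 - 1  # all arithmetic is done modulo this value


def share(secret: int, n_parties: int) -> List[int]:
    """Split a secret into n random-looking shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares


def secure_sum(inputs: List[int]) -> Tuple[int, int]:
    """Return the sum of all inputs and the number of share messages sent."""
    n = len(inputs)
    # Row i holds the shares that party i hands out, one per party.
    distributed = [share(x, n) for x in inputs]
    # Every party sends one share to each of the other n - 1 parties.
    messages = n * (n - 1)
    # Party j adds up the shares it received; a single share reveals nothing.
    partial_sums = [sum(distributed[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(partial_sums) % MODULUS, messages


# Hypothetical private inputs held by three parties.
joint_total, sent = secure_sum([120, 75, 310])
```

Even for a plain sum, the number of messages grows quadratically with the number of parties, which hints at why communication becomes the bottleneck for larger computations.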
DP involves a manual setup where noise is strategically added to the data. This setup limits the types of operations that can be performed on the data, as the noise needs to be carefully balanced to protect privacy while retaining data utility. DA, while widely used, often provides the least privacy protection. Since anonymization typically occurs on a third-party server, there is a risk that cross-referencing can expose the hidden entities within the dataset.
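The cross-referencing risk mentioned for DA can be sketched as a simple linkage attack: a dataset with names removed is joined against a second, public dataset on shared attributes. All records, field names and the choice of quasi-identifiers below are invented for illustration.

```python
from typing import Dict, List, Tuple

QUASI_IDENTIFIERS: Tuple[str, ...] = ("zip", "birth_year", "sex")


def link(anonymized: List[dict], public: List[dict],
         keys: Tuple[str, ...] = QUASI_IDENTIFIERS) -> Dict[str, str]:
    """Re-identify anonymized rows whose quasi-identifiers match exactly one public record."""
    index: Dict[tuple, List[str]] = {}
    for person in public:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches: Dict[str, str] = {}
    for row in anonymized:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match exposes the individual
            matches[names[0]] = row["diagnosis"]
    return matches


# Hypothetical "anonymized" medical records: names removed, other fields kept.
anonymized = [
    {"zip": "75001", "birth_year": 1984, "sex": "F", "diagnosis": "asthma"},
    {"zip": "75002", "birth_year": 1990, "sex": "M", "diagnosis": "diabetes"},
]
# Hypothetical public register containing the same attributes plus names.
public = [
    {"name": "Alice Example", "zip": "75001", "birth_year": 1984, "sex": "F"},
    {"name": "Bob Example", "zip": "75002", "birth_year": 1990, "sex": "M"},
    {"name": "Carl Example", "zip": "75002", "birth_year": 1990, "sex": "M"},
]
reidentified = link(anonymized, public)
```

Here the first record is re-identified because its combination of attributes is unique, while the second stays ambiguous because two people share the same values.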
HE, and specifically Fully Homomorphic Encryption (FHE), stands out by allowing computations on encrypted data that closely mimic those performed on plaintext. This capability makes FHE highly compatible with existing systems, and open-source libraries and compilers such as Concrete ML, designed to give developers easy-to-use tools for building applications, make it straightforward to implement. The major drawback at the moment is that computation on encrypted data is slower than on plaintext, which can impact performance.
While all of the solutions and technologies discussed here encourage collaboration and joint efforts, the stronger data privacy protection offered by FHE can drive innovation and move us toward a scenario in which people no longer have to give up control of their personal data in order to enjoy services and products.
This article was produced as part of TechRadarPro’s Expert Insights channel where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro