Privacy-preserving artificial intelligence: training on encrypted data
In the era of Artificial Intelligence (AI) and big data, predictive models have become an essential tool across industries such as healthcare, finance and genomics. These models rely heavily on processing sensitive information, which makes data privacy a critical concern. The key challenge lies in maximizing data utility without compromising the confidentiality and integrity of the information involved. Achieving this balance is essential for the continued advancement and acceptance of AI technologies.
Machine Learning Tech Lead at Zama.
Collaboration and open source
Creating a robust dataset for training machine learning models presents significant challenges. For instance, while AI technologies such as ChatGPT have thrived by gathering vast amounts of data available on the internet, healthcare data cannot be compiled as freely because of privacy concerns. Constructing a healthcare dataset involves integrating data from multiple sources, such as doctors and hospitals, often across borders.
The healthcare sector is emphasized due to its societal importance, yet the principles apply broadly. For example, even a smartphone autocorrect feature, which personalizes predictions based on user data, must navigate similar privacy issues. The finance sector also encounters obstacles in data sharing due to its competitive nature.
Thus, collaboration emerges as a crucial element for safely harnessing AI’s potential within our societies. However, an often overlooked aspect is the actual execution environment of AI and the underlying hardware that powers it. Today’s advanced AI models require robust hardware, including extensive CPU/GPU resources, substantial amounts of RAM and even more specialized technologies such as TPUs, ASICs, and FPGAs. At the same time, user-friendly interfaces with straightforward APIs are gaining popularity, which means models increasingly run on infrastructure the user does not control. This scenario highlights the importance of developing solutions that enable AI to operate on third-party platforms without sacrificing privacy, and the need for open-source tools that facilitate these privacy-preserving technologies.
Privacy solutions to train machine learning models
To address the privacy challenges in AI, several sophisticated solutions have been developed, each focusing on specific needs and scenarios.
Federated Learning (FL) allows for the training of machine learning models across multiple decentralized devices or servers, each holding local data samples, without actually exchanging the data. Similarly, Secure Multi-party Computation (MPC) enables multiple parties to jointly compute a function over their inputs while keeping those inputs private, ensuring that sensitive data does not leave its original environment.
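To make the federated idea concrete, here is a minimal sketch of federated averaging on a toy linear model. Everything in it is an illustrative assumption rather than a method described in this article: the function names, the learning rate, the number of rounds and the two hypothetical client datasets. The point it shows is that only model weights travel to the aggregator, never the raw data points.

```python
def local_update(weights, data, lr=0.02, epochs=20):
    """Gradient descent for y = w*x + b on one client's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Combine client models, weighting each by its number of samples."""
    total = sum(client_sizes)
    w = sum(cw[0] * s for cw, s in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * s for cw, s in zip(client_weights, client_sizes)) / total
    return (w, b)


def train(clients, rounds=200):
    """Server loop: broadcast the model, collect updates, average them."""
    global_weights = (0.0, 0.0)
    sizes = [len(data) for data in clients]
    for _ in range(rounds):
        # Each client trains locally; only the resulting weights are shared.
        updates = [local_update(global_weights, data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights


# Hypothetical private datasets, both generated from y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]
w, b = train(clients)
print(f"w = {w:.3f}, b = {b:.3f}")  # values close to 2 and 1
```

In a real deployment the clients would be separate devices or institutions, and the weight updates themselves would usually need further protection, since they can still reveal information about the underlying data.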
Another set of solutions focuses on manipulating data to maintain privacy while still allowing for useful analysis. Differential Privacy (DP) introduces noise to data in a way that protects individual identities but still provides accurate aggregate information. Data Anonymization (DA) removes personally identifiable information from datasets, ensuring some anonymity and mitigating the risk of data breaches.
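As an illustration of how noise can protect individuals while keeping aggregates useful, the sketch below applies the Laplace mechanism, one standard way to achieve differential privacy, to a simple counting query. The dataset, the function names and the epsilon values are assumptions chosen for the example, not details taken from this article.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records that satisfy the predicate.

    Adding or removing one person changes a count by at most 1,
    so the sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical patient ages; the query asks how many are 50 or older.
ages = [34, 51, 29, 62, 45, 38, 70, 55]
rng = random.Random()

# Smaller epsilon means more noise and stronger privacy.
for epsilon in (0.1, 1.0, 10.0):
    noisy = private_count(ages, lambda age: age >= 50, epsilon, rng)
    print(f"epsilon = {epsilon}: noisy count = {noisy:.2f}")
```

The true answer here is 4. Running the loop a few times shows the trade-off directly: with a small epsilon the reported count can be far from 4, while a large epsilon gives an answer close to it at the cost of weaker privacy.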
Finally, Homomorphic Encryption (HE) allows operations to be performed directly on encrypted data, generating an encrypted result that, when decrypted, matches the result of the same operations performed on the plaintext.
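The property can be demonstrated with a toy version of the Paillier cryptosystem, which is homomorphic for addition only. This is a sketch under stated assumptions: the scheme is chosen here purely for illustration and is not one the article names, the primes are far too small to be secure, and the function names are hypothetical.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy key generation. Real keys use primes of 1024 bits or more."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse, requires Python 3.8+
    return n, (lam, mu)


def encrypt(n, m, rng=random):
    """Encrypt integer m (0 <= m < n) under public key n, with g = n + 1."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = private_key
    l_value = (pow(c, lam, n * n) - 1) // n
    return (l_value * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


n, private_key = keygen()
c_sum = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
print(decrypt(n, private_key, c_sum))  # prints 42
```

The party computing `add_encrypted` never sees 20, 22 or 42. Only the holder of the private key can read the result.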
The perfect fit
Each of these privacy solutions has its own set of advantages and trade-offs. FL, for instance, still requires communication with a third-party server, typically to exchange model updates, which can potentially lead to some data leakage. MPC operates on cryptographic principles that are robust in theory but can create significant bandwidth demands in practice.
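The bandwidth point about MPC can be sketched with additive secret sharing, one common building block for secure computation. The example below is an assumption-laden illustration rather than a protocol from this article: the modulus, the function names and the input values are hypothetical, and all parties are simulated inside a single process. It computes a joint sum and counts how many shares have to be sent.

```python
import random

PRIME = 2_147_483_647  # field modulus (2**31 - 1), an illustrative choice


def share(secret, n_parties, rng):
    """Split a secret into n random shares that sum to it modulo PRIME."""
    shares = [rng.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares


def secure_sum(private_inputs, rng):
    """Simulate a joint sum in which no party reveals its own input.

    Returns the sum and the number of share messages exchanged.
    """
    n = len(private_inputs)
    # all_shares[i][j] is the share that party i sends to party j.
    all_shares = [share(x, n, rng) for x in private_inputs]
    # Every party sends one share to each of the other n - 1 parties.
    messages_sent = n * (n - 1)
    # Each party adds up what it received and publishes only that total.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    return sum(partial_sums) % PRIME, messages_sent


# Hypothetical private values held by three institutions.
total, messages = secure_sum([120, 75, 310], random.Random())
print(total, messages)  # prints 505 6
```

Even for a single addition, the number of share messages grows quadratically with the number of parties, which hints at why richer computations such as model training can become bandwidth-heavy.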
DP involves a manual setup where noise is strategically added to the data. This setup limits the types of operations that can be performed on the data, as the noise needs to be carefully balanced to protect privacy while retaining data utility. DA, while widely used, often provides the least privacy protection. Since anonymization typically occurs on a third-party server, there is a risk that cross-referencing can expose the hidden entities within the dataset.
HE, and specifically Fully Homomorphic Encryption (FHE), stands out by allowing computations on encrypted data that closely mimic those performed on plaintext. This capability makes FHE highly compatible with existing systems and straightforward to implement, thanks to accessible open-source libraries and compilers such as Concrete ML, which are designed to give developers easy-to-use tools for building applications. The major drawback at the moment is that computation on encrypted data is slower than on plaintext.
While all the solutions and technologies discussed here encourage collaboration and joint efforts, FHE, with its stronger protection for data privacy, can drive innovation and enable a scenario in which people no longer have to trade away their personal data in order to enjoy services and products.
This article was produced as part of TechRadarPro’s Expert Insights channel where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro