The CLEANUP Project

Welcome to the public webpage of the CLEANUP project! CLEANUP is a four-year research project funded by the Research Council of Norway. The goal of CLEANUP is to develop new machine learning methods to automatically anonymise (or at least strongly de-identify) text documents containing personal data, such as electronic health records, court rulings or chat-based interactions with customers.

The project brings together a consortium of researchers from machine learning, natural language processing, computational privacy, statistical modelling, health informatics and IT law. In addition, partners from the Norwegian public and private sector (covering the fields of insurance, welfare, healthcare and legal publishing) contribute to the project with their data and domain knowledge.

Oh, and if you were wondering what CLEANUP stands for: it's "Machine Learning for the Anonymisation of Unstructured Personal Data" (yes, we were a bit creative with the acronym).

News:

[2023-10-22]	Our latest work, Neural Text Sanitization with Privacy Risk Indicators: An Empirical Analysis (currently in submission), is now available on arXiv! This journal paper offers an in-depth analysis of our sanitization approach and proposes several privacy risk indicators.
[2022-11-01]	Our paper, Neural Text Sanitization with Explicit Measures of Privacy Risk, has been accepted as a long paper at AACL! The paper presents a novel approach to text sanitization based on estimates of disclosure risk, which allows us to directly control the trade-off between privacy protection and data utility.
[2022-05-01]	We present a new, carefully curated dataset for privacy-enhancing NLP: the Text Anonymization Benchmark (TAB). See our paper recently published in Computational Linguistics for details.
[2021-09-10]	One of our master's students, Torbjørn Dahl, is working on reference resolution on de-identified texts in collaboration with Lovdata.
[2021-05-06]	Our position paper on text anonymisation has been accepted to ACL 2021, one of the top-tier conferences within NLP. See the current version here.
[2020-11-01]	Our PhD research fellow Anthi Papadopoulou has just started her PhD on neural models for text anonymisation. Welcome on board!
[2020-04-30]	The official website of the CLEANUP project is now up and running!
[2020-02-01]	The CLEANUP project has now officially started!