The CLEANUP Project

Welcome to the public webpage of the CLEANUP project! CLEANUP is a four-year research project funded by the Research Council of Norway. The goal of CLEANUP is to develop new machine learning methods to automatically anonymise (or at least strongly de-identify) text documents containing personal data, such as electronic health records, court rulings or chat-based interactions with customers.

The project brings together a consortium of researchers from machine learning, natural language processing, computational privacy, statistical modelling, health informatics and IT law. In addition, partners from the Norwegian public and private sector (covering the fields of insurance, welfare, healthcare and legal publishing) contribute to the project with their data and domain knowledge.

Oh, and if you were wondering what CLEANUP stands for: it's "Machine Learning for the Anonymisation of Unstructured Personal Data" (yes, we were a bit creative with the acronym).



We are currently working on releasing a richly annotated dataset for text anonymisation based on cases from the European Court of Human Rights.


One of our master's students, Torbjørn Dahl, is working on reference resolution in de-identified texts in collaboration with Lovdata.


Our position paper on text anonymisation has been accepted to ACL 2021, one of the top-tier conferences in NLP. See the current version here.


Our PhD research fellow Anthi Papadopoulou has just started her PhD on neural models for text anonymisation. Welcome on board!


The official website of the CLEANUP project is now up and running!


The CLEANUP project has now officially started!