Experts are working on new privacy tools to protect government databases. 

CSIRO’s Data61 (the digital specialist arm of Australia’s national science agency), the NSW Government and the Australian Computer Society (ACS) are working on a new privacy tool that can assess the risks to an individual’s data within any dataset, so that targeted protection mechanisms can be put in place.

Such assessments are currently undertaken by data and privacy experts, who will soon be able to rely on computer models to validate their work.

Known as the Personal Information Factor (PIF) tool, the software uses a data analytics algorithm to identify the risks that sensitive, de-identified and personal information within a dataset can be re-identified and matched to its owner.

The early version of the tool is already being used by the NSW Government to analyse datasets tracking the spread of COVID-19 in the state since March 2020 and apply appropriate levels of protection before this data is released as open data.

Dr Ian Oppermann is the NSW Government’s Chief Data Scientist.

“There’s no other piece of software like the PIF tool,” Dr Oppermann says.

“It was developed through a long and very collaborative process involving many state, Commonwealth and industry colleagues. CSIRO’s Data61 really brought it to life and made it usable.

“Every day, it helps us analyse the security and privacy risks of releasing de-identified datasets of people infected with COVID-19 in NSW and the testing cases for COVID-19, allowing us to minimise the re-identification risk before releasing to the public.”

Dr Oppermann says COVID-19 has amplified public awareness of the need for data privacy.

“Given the very strong community interest in growing COVID-19 cases, we needed to release critical and timely information at a fine-grained level detailing when and where COVID-19 cases were identified,” Dr Oppermann says.

“This also included information such as the likely cause of infection and, earlier in the pandemic, the age range of people confirmed to be infected.

“We wanted the data to be as detailed and granular as possible, but we also needed to protect the privacy and identity of the individuals associated with those datasets.”

Project lead researcher and Senior Research Scientist at CSIRO’s Data61, Dr Sushmita Ruj, says it is a powerful tool.

“Having studied other privacy metrics, the team concluded a one-size-fits-all approach to estimating the re-identification risks of unique applications of data can be significantly improved upon,” Dr Ruj says.

“The evolving approach to a PIF takes a tailored approach to each dataset by considering various attack scenarios used to [re-identify] information.

“The tool then assigns a PIF score to each set.”

If the PIF is higher than a desired threshold, the program makes recommendations on how to design a more secure and safe framework to certify the dataset is safe to be publicly released.
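The score-and-threshold step can be illustrated with a minimal sketch. This is not the PIF algorithm, which is not described here. As a stand-in, it assumes a simple uniqueness measure: each record's risk is 1 divided by the number of records sharing the same combination of attributes, and the dataset's score is the highest such risk. The function names, field names, sample records and threshold are all hypothetical.

```python
from collections import Counter

def reidentification_risk(records, quasi_identifiers):
    """Per-record risk: 1 / (number of records sharing the same attribute values).

    Illustrative uniqueness measure only; NOT the actual PIF calculation.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    return [1.0 / class_sizes[k] for k in keys]

def assess(records, quasi_identifiers, threshold):
    """Score the dataset by its riskiest record and compare with a threshold."""
    risks = reidentification_risk(records, quasi_identifiers)
    score = max(risks, default=0.0)
    return {"score": score, "release": score <= threshold}

# Hypothetical records; field names and values are made up for illustration.
records = [
    {"age_range": "30-39", "postcode": "2000"},
    {"age_range": "30-39", "postcode": "2000"},
    {"age_range": "70-79", "postcode": "2650"},
]

result = assess(records, ["age_range", "postcode"], threshold=0.5)
print(result)
```

In this sketch the third record is unique on age range and postcode, so the dataset scores above the threshold and would be held back until that record is better protected, for example by using broader age ranges or coarser locations.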

The PIF tool is also being used to examine other datasets before public release, in areas such as domestic violence data collected during the COVID-19 lockdown and public transport usage.

The tool will continue to be developed and is expected to be made available for wider public use by June 2022.