Sunday, May 29

Data we have made available online, such as a tweet or a Facebook post, can be harmful.

  • Olav Lysne

    Director of SimulaMet and Professor at OsloMet

  • Inga Strümke

    NTNU and SimulaMet

  • Michael Riegler

    SimulaMet and the University of Tromsø

A great deal of insight can lie in even small amounts of data about a person, according to the authors of this post.

Data you have voluntarily shared may reveal information you did not intend to share.

This is a debate post. The opinions expressed in the text are the writers' own.

The Ministry of Justice has proposed in a consultation note that the Police Security Service (PST) should be able to store data from open sources on the Internet for 15 years. This is data we have made available online, such as a tweet, a comment in a comment section or a Facebook post.

Even if the data is published voluntarily, it can have a significant potential for harm once one takes into account the power that lies in modern data analysis.

Basically, we are faced with a classic privacy dilemma.

On the one hand, it is frightening if a secret service can hold on to all your online activity going back 15 years. We know little about the effect this will have on democratically desirable expression of opinion. But research shows that surveillance has a chilling effect on public discourse.

On the other hand, there is Norway's need for protection. We know that espionage, influence operations and terrorist operations are all carried out in Norway. PST is tasked with handling this. The Ministry of Justice argues that such collection and storage is necessary for PST to be able to do its job.

Olav Lysne (from left), Inga Strümke and Michael Riegler.

Insight from small amounts of data

We will not take a position on whether storage is a proportionate response to the challenges PST faces. Instead, we use the consultation note as an example of a widespread underestimation in privacy discussions: open data has a surprisingly large potential for harm.

Modern data analysis, including machine learning and artificial intelligence, is continuously increasing this potential.

An example illustrates how much insight can lie in even small amounts of data.

Suppose we know that a person is a woman, 30 years old, childless, has been on sick leave for a week and has just stopped using snus. Now consider the likelihood that the person is pregnant.

We have listed five individual facts, each of which increases the probability slightly. With about 20 such facts, one could statistically estimate this with great certainty.
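The arithmetic behind this claim can be sketched as a Bayesian update in odds form. This is only an illustration under stated assumptions: the base rate and the likelihood ratio below are invented, not taken from any study, and the facts are assumed to be independent of each other, which rarely holds exactly in real data. The function name `posterior_probability` is our own.

```python
import math


def posterior_probability(prior, likelihood_ratios):
    """Combine a prior probability with independent pieces of evidence.

    Each likelihood ratio is P(fact | conclusion true) / P(fact | conclusion false).
    Under the independence assumption, the log-odds simply add up.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    # Convert the accumulated log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical numbers, chosen for illustration only.
PRIOR = 0.02    # assumed base rate of the conclusion in the population
WEAK_LR = 1.6   # assumed: each fact is 1.6 times as common when the conclusion is true

five_facts = posterior_probability(PRIOR, [WEAK_LR] * 5)
twenty_facts = posterior_probability(PRIOR, [WEAK_LR] * 20)

print(f"after 5 weak facts:  {five_facts:.3f}")
print(f"after 20 weak facts: {twenty_facts:.3f}")
```

With these made-up inputs, five weak facts leave the conclusion well short of certain, while twenty facts of the same strength push the estimate above 99 percent. No single fact is revealing; the accumulation is.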

Even more interesting is that this is inferred from data that has nothing to do with pregnancy and that can easily be found in open sources.

Without knowing it, and without consenting to it, this person may have openly posted information that, after analysis, reveals something she wanted to keep to herself.

Estimate mental health status

Artificial intelligence has taken such analytical capabilities light years further.

The field is still developing, and its full potential is unknown. But we already see the possibilities: mental health status can be estimated using data from social media. Photographs can be used to estimate sexual orientation and political orientation.

We do not know what will be possible in the future, but we can state that data you have voluntarily shared will be able to reveal information you did not intend to share.

Control of the analysis methods

Existing privacy legislation deals with what data explicitly contains. Only to a small extent has it taken into account the potential of modern analysis methods to find information we do not know the data contains.

The consultation note from the Ministry of Justice suffers from the same weakness: attention is paid to controlling the data sources. The examples above show that control of the analysis methods PST can use is just as important.

The consequence should be regulation of which analyses PST may perform on stored data from open sources, so that it cannot freely extract personal information that the person has not shared. The potential for harm in analysis methods must be assessed independently.

Olav Lysne is a member of the EOS committee, but writes here only in his capacity as professor of informatics. The EOS Committee’s view on whether PST should be able to store openly available information is stated in the Committee’s consultation statement.
