The corona crisis has us all firmly in its grip. Doctors are fighting for lives in hospitals, and governments around the world are sending their populations home, bringing social life and the economy to a standstill. At the same time, researchers are pressing ahead with potential vaccines, while societies wonder what medium- and long-term coping strategies might look like.
Protect user data as best as possible – and leave it on your own smartphone
© shutterstock / trismegist
In this search for solutions and courses of action, decision-makers and representatives of society and business also look to technology as a beacon of hope. To prevent unnecessary suffering, we have to contain Covid-19 as quickly as possible and consider all the resources available to us. Medical options such as the development of drugs and vaccines take a lot of time, and time is something we do not currently have.
Technology as a beacon of hope
Measures such as social distancing are showing initial success. However, they raise the question of whether they can be sustained for as long as is epidemiologically necessary, given the psychological, social and economic costs. The key question is therefore not whether it is ethically justifiable to use technical possibilities to overcome the crisis, but rather whether we can ethically justify not using them. We therefore have an ethical responsibility to use all the options we have, including and in particular approaches such as big data and artificial intelligence.
Here, however, follows a big "but", as EU Commissioner Věra Jourová put it succinctly in this post: "Even in such an exceptional situation, the principles of data protection must be respected". Even if we now have to look for pragmatic technical solutions quickly, we must not slide from one global crisis into the next. Numerous public figures and experts, such as whistleblower Edward Snowden, the renowned historian Yuval Harari and computer science professor Michael Huth, have already laid this out in detail.
Technology flanked by social aspects
Even if data analysis and artificial intelligence are the great hope of the corona crisis, one thing must be made clear: technology cannot be left to its own devices. Various approaches that are already in use worldwide illustrate the problem. In one case, compulsory apps collect data to classify citizens according to their potential infection status. In another, a Google-affiliated offering links access to testing facilities in California to data sharing. In a third, a state repurposes tracking technologies from its domestic intelligence service.
Rather, the technological applications must be flanked by political, social and legal provisions: which data may be collected, how long and where it may be stored, and for what purposes it may be used. Will the use of the application and its data end when the corona pandemic ends? How voluntary or compulsory is participation for citizens?
Technological possibilities for more data protection
There does not have to be a trade-off between data protection and health, even if in public debate it is often portrayed as almost insurmountable. The good news is that there are already proven methods for implementing data protection technologically. And Europe in particular has enormous potential here thanks to intensive research into privacy technologies.
There are mainly three common methods of reconciling data usage and data protection: anonymization, homomorphic encryption and edge computing (especially federated learning). All three approaches differ in terms of their degree of technological maturity, their scalability and the degree of possible data protection.
The most popular method for data anonymization is what is known as differential privacy. Privacy is meant to be preserved by altering the original data, whether by deleting personal data or by adding artificial data (so-called noise). The advantage: low complexity, a correspondingly high degree of maturity and a wide range of possible applications.
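The noise-adding idea can be illustrated with a minimal sketch. It assumes the Laplace mechanism, which is one common way to achieve differential privacy for counting queries; the records, the `private_count` helper and the chosen epsilon are hypothetical and only serve as illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution."""
    # The difference of two i.i.d. exponential variables is Laplace-distributed.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records that satisfy `predicate`.

    A counting query has sensitivity 1: adding or removing one person
    changes the result by at most 1. Laplace noise with scale 1/epsilon
    therefore yields epsilon-differential privacy for this single query.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical records; in practice these would never be published directly.
records = [
    {"district": "A", "infected": True},
    {"district": "A", "infected": False},
    {"district": "B", "infected": True},
    {"district": "B", "infected": True},
]

rng = random.Random(0)  # seeded only to make the sketch reproducible
noisy = private_count(records, lambda r: r["infected"], epsilon=1.0, rng=rng)
print(f"noisy number of infected people: {noisy:.2f}")
```

A smaller epsilon means more noise and stronger protection, but also less accurate statistics. Choosing that trade-off is a policy decision as much as a technical one.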
However, anonymizing data carries the risk that individuals can be re-identified. This can often be done with relatively simple methods, for example by linking differently anonymized databases with one another. A prominent example a few years ago was the de-anonymization of Netflix users using the Internet Movie Database. Differential privacy, although often used, therefore offers a lower level of data protection than other methods.
In many cases, anonymization can serve as a good bridging technology. However, one should also act very carefully here and always maintain a high standard. De-anonymization methods are also constantly improving, so that even the best anonymizations are often ineffective within a few years.
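How little it can take to link two databases is shown in the following sketch. Both data sets, all names and the `link` helper are invented for illustration; the only assumption is that an "anonymized" data set still contains quasi-identifiers (here postcode, year of birth and sex) that also appear in a public source.

```python
# Hypothetical "anonymized" health records: names removed, quasi-identifiers kept.
health_records = [
    {"zip": "10115", "birth_year": 1980, "sex": "f", "result": "positive"},
    {"zip": "10117", "birth_year": 1975, "sex": "m", "result": "negative"},
    {"zip": "10119", "birth_year": 1990, "sex": "f", "result": "positive"},
]

# Hypothetical public data set (for example, profiles from a public website).
public_profiles = [
    {"name": "Alice Example", "zip": "10115", "birth_year": 1980, "sex": "f"},
    {"name": "Bob Example", "zip": "10117", "birth_year": 1975, "sex": "m"},
    {"name": "Carol Example", "zip": "10119", "birth_year": 1990, "sex": "f"},
    {"name": "Dora Example", "zip": "10119", "birth_year": 1990, "sex": "f"},
]


def link(anonymized, public, keys):
    """Re-identify anonymized rows whose quasi-identifiers match exactly one person."""
    names_by_key = {}
    for person in public:
        key = tuple(person[k] for k in keys)
        names_by_key.setdefault(key, []).append(person["name"])

    reidentified = []
    for row in anonymized:
        candidates = names_by_key.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match reveals the person behind the row
            reidentified.append((candidates[0], row["result"]))
    return reidentified


matches = link(health_records, public_profiles, keys=("zip", "birth_year", "sex"))
print(matches)
```

In this toy data, two of the three records can be assigned to a name. Only the third stays ambiguous, because two people share the same combination of attributes.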
Candidate number two is so-called homomorphic encryption. Data is encrypted on the end devices (e.g. smartphones) in such a way that it can still be analyzed but not decrypted. The data can then be stored and analyzed centrally in an external cloud.
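The principle of computing on encrypted data can be sketched with a toy version of the Paillier cryptosystem. Note the assumptions: Paillier is only additively homomorphic (it supports sums, not arbitrary analyses), the primes below are far too small for real use, and the function names are chosen for this illustration only. The sketch requires Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import random

# Toy key material: real deployments use primes of 1024 bits or more.
P, Q = 1009, 1013
N = P * Q                     # public modulus
N_SQ = N * N
PHI = (P - 1) * (Q - 1)       # private key part
MU = pow(PHI, -1, N)          # modular inverse of PHI modulo N (private)


def encrypt(message, rng=random):
    """Encrypt an integer 0 <= message < N with the public key N."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    # With generator g = N + 1, g**message mod N^2 equals 1 + message * N.
    return ((1 + message * N) * pow(r, N, N_SQ)) % N_SQ


def decrypt(ciphertext):
    """Decrypt with the private values PHI and MU."""
    x = pow(ciphertext, PHI, N_SQ)
    return ((x - 1) // N * MU) % N


def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N_SQ


# Hypothetical scenario: three phones report a contact count in encrypted form.
reports = [encrypt(3), encrypt(5), encrypt(2)]
encrypted_total = reports[0]
for ciphertext in reports[1:]:
    encrypted_total = add_encrypted(encrypted_total, ciphertext)

# The server never saw 3, 5 or 2; only the key holder can read the sum.
print(decrypt(encrypted_total))
```

The server in this scenario only ever handles ciphertexts. Whoever holds the private key can read the total, but not the individual contributions, as long as only the aggregated ciphertext is decrypted.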
However, the computing resources required for calculations on homomorphically encrypted data are enormous. With large amounts of data or more complex calculations, the limits of current technical feasibility are quickly exceeded. In addition, cryptographers disagree about how well privacy can really be protected. Nevertheless, this method raises high hopes worldwide, and large sums are being invested in the research and development of scalable variants. It will be some time before this method can be used widely, and it will probably be used in parallel or in combination with the third alternative, edge computing.
Edge Computing (especially Federated Learning)
The third approach takes a different route. While anonymization and homomorphic encryption change the data in a certain way, with edge computing the data remains with the user. Instead of bringing the data to the algorithm, the algorithm is simply brought to the data.
The Pan-European Privacy Preserving Proximity Tracing (PEPP-PT) initiative initially pursued such a decentralized approach: algorithmic systems are meant to help fight corona effectively through peer-to-peer analyses directly on smartphones and, according to the project's own statement, protect user privacy at the same time. However, there are currently internal discussions about the further direction of the project and the question of whether a decentralized or centralized approach should be chosen.
Federated learning goes one step further than edge computing. Small AI models are trained on the individual end devices. In a second step, all of these AI models are merged, while the actual user data remains on the devices. The result is a far more sophisticated AI than one trained solely on centrally held data.
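The two steps, local training and merging, can be sketched as follows. This is a minimal illustration assuming federated averaging with a one-parameter linear model; the device data, the learning rate and the function names are hypothetical, and real systems add safeguards such as secure aggregation.

```python
def local_update(weight, data, learning_rate=0.01, epochs=5):
    """Train on one device with plain gradient descent on squared error.

    The raw (x, y) pairs never leave this function; only the updated
    weight is returned to the server.
    """
    for _ in range(epochs):
        gradient = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= learning_rate * gradient
    return weight


def federated_round(global_weight, devices):
    """Merge the locally trained weights, weighted by each device's sample count."""
    total_samples = sum(len(data) for data in devices)
    updates = [(local_update(global_weight, data), len(data)) for data in devices]
    return sum(weight * count for weight, count in updates) / total_samples


# Hypothetical local data sets on three phones, all roughly following y = 2 * x.
devices = [
    [(1.0, 2.1), (2.0, 3.9)],
    [(3.0, 6.2), (4.0, 7.8), (5.0, 10.1)],
    [(1.5, 3.0)],
]

global_weight = 0.0
for _ in range(50):
    global_weight = federated_round(global_weight, devices)

print(f"learned weight: {global_weight:.3f}")
```

The server only ever sees the model weights returned by `local_update`, never the underlying measurements.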
Edge computing in general and federated learning in particular maintain privacy and are highly usable and scalable. In addition, large amounts of data no longer have to be sent and the method can also work asynchronously.
Another advantage is indirectly greater protection against attacks and data manipulation: instead of a single central system, possible attackers would have to target a large number of end devices at the same time.
The main disadvantage so far is the greater development effort. In addition, the method cannot be used in every case – for example, if user data is already collected centrally.
Health and tracking data are extremely sensitive. The data collected to combat the corona pandemic does not only provide information on bodily functions and contacts with (potentially) infected people. Analyses also make it possible to draw conclusions about the psyche and all areas of individuals' lives and, in the worst case, to exploit this information.
It is therefore imperative to act cautiously despite the emergency. Europe can and should draw on its enormous potential for research and development in data protection.
The decentralized approach of the aforementioned PEPP-PT seems to offer an interesting possibility that deserves to be watched. The basic calculations are carried out directly on the individual smartphones, while constantly changing pseudonymous IDs are generated for each user. This is intended to enable users to find out whether they have had contact with an infected person while protecting their privacy. So far, however, these calculations are based only on the users' tracking data.
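How rotating pseudonymous IDs and on-device matching might fit together is sketched below. This is not the PEPP-PT specification. It is a simplified illustration under the assumption that each phone derives its changing IDs from a secret key and that an infected user voluntarily publishes that key; the `Phone` class and all of its methods are hypothetical.

```python
import hashlib
import hmac
import secrets


def ephemeral_id(secret_key, time_slot):
    """Derive the pseudonymous ID for one time slot from a device secret."""
    message = time_slot.to_bytes(8, "big")
    return hmac.new(secret_key, message, hashlib.sha256).digest()[:16]


class Phone:
    def __init__(self):
        self.secret_key = secrets.token_bytes(32)  # never leaves the device by default
        self.heard_ids = set()                     # IDs received from nearby phones

    def broadcast(self, time_slot):
        """ID sent to nearby phones during this slot; it changes every slot."""
        return ephemeral_id(self.secret_key, time_slot)

    def hear(self, received_id):
        self.heard_ids.add(received_id)

    def was_exposed(self, published_key, time_slots):
        """Recompute an infected user's IDs locally and compare with what was heard."""
        return any(
            ephemeral_id(published_key, slot) in self.heard_ids
            for slot in time_slots
        )


alice, bob, carol = Phone(), Phone(), Phone()
bob.hear(alice.broadcast(time_slot=5))    # Bob was near Alice during slot 5

# Alice tests positive and publishes her key; matching happens on each phone.
slots = range(10)
print(bob.was_exposed(alice.secret_key, slots))    # True
print(carol.was_exposed(alice.secret_key, slots))  # False
```

Because the broadcast ID changes with every time slot, bystanders cannot easily follow a phone over time, and the comparison runs on the device instead of on a central server.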
Either way, we should keep an eye on the PEPP-PT project
In theory, combining this with health data, for example, opens up additional uses of AI, provided data protection is maintained. Using federated learning, highly precise AI models could be trained directly on users' devices. Users could then receive warnings about their state of health without the data ever leaving the end device, and get medical support from a digital doctor with confidentiality built in.