Privacy in Europe – GDPR vs. information freedom? – WS 09 2020
Does GDPR have an overreaching effect on information freedom?
The goal of our discussion is to highlight some areas where the GDPR seems to limit justified processing of personal data. We will discuss issues of freedom of information, the WHOIS registry and decentralized peer-to-peer applications. We will also look at the burden on innovation in countries outside the European Economic Area where the GDPR does not apply directly and that have no adequacy decision. These countries suffer from serious limitations on participation in the digital economy due to third-country rules that limit the transfer of personal data.
The format will include some interactive discussions followed by comments from field experts. The key is to give a variety of views and opinions on a wide spectrum of data protection regulations.
- Civil society and its role in information freedom and digital apps usage (Robert Guerra, ICANN)
- Harmonizing DNS Registration Data Policies with GDPR and other privacy regimes (Steve Crocker)
Presentation: A Framework for Harmonizing DNS Registration Data Policies
- Sub-topics: Artificial intelligence (speaker tbd)
- GDPR difficulties for new EU members and for non-EU Council of Europe (CoE) members (Lilijana Pecova)
Links to relevant websites, declarations, books, documents. Please note we cannot offer web space, so only links to external resources are possible. Example for an external link: Website of EuroDIG
Until 27 April 2020.
Please provide name and institution for all people you list here.
- Marina Shentsova
Organising Team (Org Team) List them here as they sign up.
- Eva Christina Andersson
- Amali De Silva-Mitchell
- Susan Marukhyan
- Federica Casarosa
- Raphael Beauregard-Lacroix
- Zoey Barthelemy
- Aleksandra Ivanković
- Steve Crocker
- Anastasiia Korotun
- Robert Guerra
- Steve Crocker
- Lilijana Pecova
- Robert Guerra
- Andreas Maier
The moderator is the facilitator of the session at the event. Moderators are responsible for including the audience and encouraging a lively interaction among all session attendees. Please make sure the moderator takes a neutral role and can balance between all speakers. Please provide a short CV of the moderator of your session on the Wiki or link to another source.
Trained remote moderators will be assigned on the spot by the EuroDIG secretariat to each session.
- Cedric Amon, Geneva Internet Platform
Current discussion, conference calls, schedules and minutes
See the discussion tab on the upper left side of this page. Please use this page to publish:
- dates for virtual meetings or coordination calls
- short summary of calls or email exchange
Please be as open and transparent as possible in order to allow others to get involved and contact you. Use the wiki not only as the place to publish results but also to summarize the discussion process.
- Privacy regulations such as the GDPR are based on the assumption that the rule of law is respected. However, laws are not applied in the same way across borders and so policies must be harmonised across jurisdictions.
- There are important tensions between freedom of information and privacy protection arising from compliance with the GDPR and other privacy regulations. A fresh look at registration models is needed to enable access to data and information that abides by privacy rules and enables sustainable sharing of information.
- The GDPR protects data to varying degrees depending on the area of application. There is a need for additional safeguards for medical data.
Find an independent report of the session from the Geneva Internet Platform Digital Watch Observatory at https://dig.watch/resources/privacy-europe-gdpr-vs-information-freedom.
Provided by: Caption First, Inc., P.O. Box 3066, Monument, CO 80132, Phone: +001-719-481-9835, www.captionfirst.com
This text, document, or file is based on live transcription. Communication Access Realtime Translation (CART), captioning, and/or live transcription are provided in order to facilitate communication accessibility and may not be a totally verbatim record of the proceedings. This text, document, or file is not to be distributed or used in any way that may violate copyright law.
>> NADIA TJAHJA: So, hello, everyone. My name is Nadia Tjahja and I am your studio host. And joining me in the studio is Al (inaudible).
>> Hi. I am happy to see you again and let’s start with the discussion.
>> NADIA TJAHJA: EuroDIG is all about dialogue. We want to make sure everyone has a chance to participate, and we want a little bit of organization so that we can keep doing this. When you enter the room, you can see your name in the participant list. Please write your full name so we know exactly who we’re talking to. You can do this by clicking on "More" next to your name, where you can rename yourself. Please raise your hand to ask a question and the remote moderator will give you the floor. If you need any assistance, we can help you and we will unmute you. When you have the floor, please switch on your video so we know who we’re talking to, and then we can have a great discussion. Without further ado, I would like to give the floor to the moderators of Workshop 9, Privacy in Europe: GDPR vs. information freedom. Geneva Macro Labs, you have the floor.
>> MARINA SHENTSOVA: Hello. It’s nice to meet everyone. We are about to start our session, WS9, and I’m the focal point of the session, Marina Shentsova. The topic of our session is Privacy in Europe: GDPR vs. information freedom. We have four speakers who will give presentations on various topics, starting from data protection in (inaudible) the EU and the United States (inaudible) the GDPR. So I would like to give the floor to our second moderator. He will give a short introduction as well, and then we’ll continue with the presentations. Thank you.
>> JÖRN ERBGUTH: Thank you, Marina. I would like to open the session with a quote from a study done by the European Parliament. It describes a European internet: a European firewalled cloud internet as a first digital ecosystem in Europe based on data and innovation. It would drive competition and set standards, similar to what has happened in China in the past 20 years. Its foundations would be a European cloud, transparency and data protection. So it proposes a (inaudible) in order to boost data protection, but, of course, it does not foster information freedom. These are some frightening thoughts that I would like to put in front of our four exciting talks and our very exciting discussion coming up. To introduce the first speaker, please, Marina.
>> MARINA SHENTSOVA: Yes. I would like to introduce our first speaker, Robert Guerra, who is a member of (inaudible) and also head of Privaterra. Thank you.
>> ROBERT GUERRA: Hello, good morning. I really thank all of you for inviting me to this panel today. As mentioned earlier, I’m going to focus my comments on civil society and NGOs with regard to issues of information freedom, particularly issues related to freedom of expression and human rights online. How does that apply to the GDPR, and how does it apply to this session?
In my short presentation, I’m going to go through a couple of different things, and then I really look forward to the discussion later, so that we are all on the same page. It is really important to understand how the internet has evolved over time. What I will mention is that the values we’re seeing really depend on the moment in time we’re in. I will talk about some issues related to openness and some of the challenges that we’ve seen with regard to GDPR application, and then some issues for the other speakers to address. Next slide, please.
So, as we all know and as I mentioned earlier, the internet has gone through a variety of different phases. The first one is more or less from when it was created until the early 2000s. A lot of governments ignored the internet. It grew and extended from one country to another, but legislation and controls were scarce. There were discussions around the Declaration of the Independence of Cyberspace, the idea back then that anything on the internet would be independent from the rest of the countries. Governments ignored the internet and information flowed. It wasn’t used by as many people as today, but it was there. The early 2000s, 2000 through 2005, are referred to as phase 2, "access denied", when certain governments started to block websites. They didn’t manipulate content; websites simply would not be available. Can we go to the next slide, please.
We get to a third phase, which starts to become a bit more problematic. Instead of outright censorship, information starts to be shaped. There are the first real reports of content being shaped, with troll farms appearing in parts of the world that try to shape the discourse. This is very common today, but it was already seen in the early 2000s, and at the time we see the emergence of cyberattacks directed against countries. We see that a lot now, but it really started back then. Stage 4, or the 4th phase, was "access contested". We see not only the emergence of fake news in other parts of the world, something prevalent today, but also governments coming online and starting to contest the open, decentralized nature of the internet. Norms and standards start being proposed by governments that try to change the fundamental nature of the open architecture of the internet. Next slide.
We then get to the phase where we find ourselves now, which begins with the reaction to the Edward Snowden revelations in 2013. We see a real recognition of the extent of surveillance taking place online, and a reaction to it: an empowerment of initiatives that try to remove governmental control of key aspects of the internet. We see the initiative in 2014. We see the IANA stewardship transition, which removed one of the authorities that the U.S. government held over ICANN. But we also see, in reaction to the Snowden revelations, the discussions taking place at the European Parliament really catalyzing and moving forward. With the recognition of the privacy issues that were taking place, what had been a discussion that was kind of parked at the European Parliament really took off, enabling the GDPR not only to pass the Parliament but to become the powerful, regional and key regulation that it is today. But in 2016, we also see the issue of social media manipulation, fake news and elections, and the election of the U.S. President that we have today.
And then stage 6, I would say, is national sovereignty on the internet and information sovereignty. It was already happening before in China, Russia and Iran, and it is now starting to be discussed in the U.S.: walled-off internets, being able to shut down the internet and, as was mentioned just now, perhaps also Europeans (inaudible) themselves off as well.
We are now also in the era of COVID‑19, which brings real physical firewalls between countries, and an era of economic uncertainty. Next slide.
Next slide, please. So what are some of the issues? Having gone through a little bit of the history, and knowing that there are other speakers as well: what are some of the challenges with GDPR implementation? We’ll talk more about this later, but it assumes that the rule of law will work and that there will be the right balance between human rights and privacy. But what happens when that doesn’t take place? Next slide.
There are two examples. We now have cases, in 2018 in Romania and in 2020 in Hungary, where investigative journalists were working on exposing corruption by a ruling party or exposing the wealth of individuals in the country, and the GDPR was then used to try to shut down and censor information that was online. This parallels a terrible issue we had with China: had we known in November or earlier that there was an infectious virus spreading in China, we would have been able to act earlier. So we realize there are some issues. Before we proceed to a situation where we impose controls in Europe or cut ourselves off by creating a firewall, we have to recognize that the balance isn’t there yet, and we have to address how we can make sure that freedom of expression and human rights are maximized online. So with that, and recognizing the time and that there are other speakers, I’ll stop now and maybe take questions for discussion later. Over to the next speaker. Thank you.
>> Thank you, Robert. Do we already have some questions?
>> MODERATOR: At the moment, we don’t have any questions.
>> ROBERT GUERRA: If I may, I am seeing a question in the chat. I want to make clear that I am speaking in my own personal capacity, representing Privaterra and the work I’ve been doing with civil society, and not speaking on behalf of ICANN or anything like that. I am a member of ICANN’s Security and Stability Advisory Committee, but I am not speaking in that capacity.
>> MODERATOR: Let me ask you one question. Do you think the danger of having a Chinese-style internet in Europe is real?
>> ROBERT GUERRA: As you mentioned earlier, I think it’s very scary, because for people who have been working on the Chinese firewall, it is a vision of the internet that is controlled, where pre‑posting censorship takes place. So we’re talking about a very different type of internet, with pervasive surveillance and information controls and a lack of privacy. If Europeans want to be seen as a model of best practices, going down the Chinese route would be the wrong way to go. Now, is it possible? In an era of nationalism, I think it’s there, so we need a discussion to recognize what that means and maybe find a third way. I would rather promote privacy at the global level than restrict privacy to Europe and not do so internationally.
>> MODERATOR: Thank you. We’ll have you available for the discussion later on as well, so please prepare your questions.
Next we will focus a bit more on one issue where GDPR has created a lot of trouble. Please, Marina, introduce our next speaker.
>> MARINA SHENTSOVA: Our second speaker is professor Steve Crocker. He is a co‑founder and developed the protocols that laid the foundation for the internet. He will present on the topic of harmonizing DNS registration data policies with the GDPR and other privacy regimes. So, Steve, you are welcome to start your presentation.
>> STEVE CROCKER: Thank you very much, Marina, and hello, everyone. It’s a pleasure to follow Robert, who laid out a very nice history of the evolution of controls on the internet. Since I left the ICANN board two and a half years ago, I’ve been focusing on a very specific issue that I spent a lot of time on when I was involved with ICANN: the WHOIS system, now referred to as DNS registration data. It’s a very old system, and there have been repeated attempts to harmonize and rationalize the tradeoffs between accuracy and privacy and control of who gets what information. The GDPR forced an abrupt and crude curtailment of access, which pushed things in the direction of privacy. That makes privacy advocates very happy, but other groups, like law enforcement and intellectual property interests, not so happy. As I said, I have been watching this for a very long time. It’s not a new problem, not even in terms of a couple of years; the problem goes back decades. And it’s been a bit of a frustration. So I decided to spend some time taking a much deeper, fresh look at the subject, and gathered together, quietly and privately, other people who are interested in the same issues and concerns. We have been building a framework for expressing desired as well as actual policies. I will take you a bit through that. I mainly want to share the nature of the work and not go into all the details. Those will come later, but given the amount of time and the audience here, I’ll just give the idea of what we’re doing as opposed to all the details. Next slide, please.
So here is a picture of what the registration data looks like. Over on the far right, you have registrants, symbolized by RT. They go to a registrar and provide a mixture of DNS records, contact information and account information. Some of that goes up to the registry, which then makes the DNS available on the public internet for users, symbolized in the top left.
And then you have people who want to know something about the registrations: various requesters. Prior to the GDPR, it is estimated that there were on the order of 5 billion queries per month just for registration data, across the more than 400 million registrations in all 2,000-plus top-level domains. Next slide, please.
In a new framework for expressing policies, we take privacy as a given, as fundamental, and therefore any information collected should have a rationale. All access to that data should fit into an explicit model of who is getting access, and everyone should have agreed that that’s the proper thing. The GDPR is very important. It has made a very big impact, in part because of the very strong fines associated with violating it. But it is in fact just one privacy regime. There are others around the world and more to come. That’s the bad news. The good news, I think, is that the underlying principles are not all that complicated, and one (inaudible) tried to satisfy them in the spirit of it as opposed to litigating every single detail. However, in looking closely at how all this works, we have come to understand that there is not just one policy in force at any given time. There is a multiplicity of policies, some of which are hierarchical, because you have governments, ICANN for the area that it controls, and registries, all of which have their own policies. So a given registrar may have to satisfy not only its own policy but also the policies of the other bodies that it is beholden to. You may have different policies for different classes of registrants. That’s one area of complexity. The other is that a lot of operations cross national borders, and it’s not so easy to fix these policies within a given national border. So, echoing what Robert said, building these firewalls around regions is likely to create problems, not just solve them. So we should proceed a bit more delicately. Next slide.
I mentioned that we have a little group. It’s working in an ad hoc fashion, focusing on the subject and not trying to assert any authority or adhere to any other principles except serious analysis. We’re close to being able to go public, and when we do, the watchword will be that people have to evaluate what we’re providing on its own merits, as opposed to credentials or authority we don’t have.
The project has three parts. We have been spending our time on the first part, which is building and socializing a model and framework. Once we unveil this, we’ll try to facilitate, not control, but be helpful in policy discussions. And then maybe eventually the WHOIS data, the Registration Data Directory Service, will have some positive aspects as a platform for other things to be built on, but that’s far down the road. Next slide, please.
I mentioned that there’s a multiplicity of policies. The yellow commentaries on the slide represent the multiplicity of policies in a hierarchical fashion. Not shown are the interactions in a horizontal fashion: competing and contrasting policies in different regions, across different registries and so forth. Next slide.
Actually, back up for a second, if you will. I’m sorry. One key thing in the current era, relatively new, is the idea of controlling who gets access to the information. In the bottom left portion of the slide, you see a representation of clearing houses and associations that mediate who can request what information, and what information they can request. That process implicitly involves various kinds of governance. What is going to emerge, and has not yet clearly emerged, is a whole set of governance mechanisms, some formal and some less formal. We’ll be creating bureaucracies rather more than we might have expected. I don’t know any way around that. I think the right thing is to recognize it, grab hold of it and manage ourselves in a proper fashion. Next slide.
Very briefly, and I won’t spend a lot of time on this: our model is that there is a public common dictionary of data elements, and a given collection policy describes which data is collected, whether providing it is optional or required, and a labeling in a couple of dimensions: the level of validation and the sensitivity. Sensitivity covers the usual distinction of public versus private, but we have come to understand that there are degrees of privacy. Then there are mechanisms, usually referred to as clearing houses, that bring together requesters on one side and the collectors and holders of information on the other side, and have rules about who can have access to what. Further, requesters typically make standardized requests, not ad hoc requests where each one is different. So a way of expressing what those standard templates look like is another very important part of what we’ve been doing. Next slide, please.
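The model described above (a common dictionary of data elements, a collection policy marking each element as required or optional, and labels for validation level and sensitivity) can be pictured as a small data structure. This is a hypothetical sketch, not the group’s actual framework: the element names, the three-level validation and sensitivity scales, and the helper functions are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    # Hypothetical ordered scale; the talk only says privacy comes in degrees.
    PUBLIC = 0
    RESTRICTED = 1
    PRIVATE = 2


class Validation(IntEnum):
    # Hypothetical levels of how thoroughly a value has been checked.
    NONE = 0
    SYNTAX_CHECKED = 1
    VERIFIED = 2


@dataclass(frozen=True)
class ElementPolicy:
    required: bool            # must the registrant provide it?
    validation: Validation    # label dimension 1
    sensitivity: Sensitivity  # label dimension 2


# Public common dictionary of data elements (names are illustrative only).
DATA_DICTIONARY = frozenset({
    "domain_name", "registrant_name", "registrant_email",
    "registrant_phone", "tech_contact_email",
})

# One hypothetical collection policy: which elements are collected, with labels.
EXAMPLE_POLICY = {
    "domain_name": ElementPolicy(True, Validation.SYNTAX_CHECKED, Sensitivity.PUBLIC),
    "registrant_name": ElementPolicy(True, Validation.NONE, Sensitivity.PRIVATE),
    "registrant_email": ElementPolicy(True, Validation.VERIFIED, Sensitivity.RESTRICTED),
    "tech_contact_email": ElementPolicy(False, Validation.NONE, Sensitivity.RESTRICTED),
}


def check_policy(policy):
    """A policy may only refer to elements defined in the common dictionary."""
    unknown = set(policy) - DATA_DICTIONARY
    if unknown:
        raise ValueError(f"elements not in dictionary: {sorted(unknown)}")


def missing_required(policy, record):
    """Required elements that a registration record does not supply."""
    return sorted(n for n, p in policy.items() if p.required and n not in record)


def uncollected(policy, record):
    """Elements present in a record that the policy gives no rationale to collect."""
    return sorted(set(record) - set(policy))
```

An ordered integer scale is just one simple way to capture the point that privacy comes in degrees rather than as a single public/private switch; `uncollected` reflects the principle that anything collected should have a rationale in the policy.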
This goes into a little bit more detail. I show this not so much to take you through all the details, but just to give you an idea of what’s involved. Next slide.
Let’s see. Yes. So here’s what a query looks like. A requester sends in a query with two parts: identification of who the requester is and what their credentials are, and then a list of the categories of information they expect to get, what level of sensitivity they’re authorized for, and so forth. Then the query goes through an acceptance process. We’re spending no time at all on that, because almost everybody else is focused on that area. But we’re very focused on the details of which data elements are returned for a successful request. If it’s an unsuccessful request, you don’t get anything back, of course. This slide shows the areas we’re focused on. Next slide, please.
This is just a quick look at what the data dictionary looks like under the covers, on the left in pale yellow, and what the collection and labeling policy looks like, in the green area on the right. Next slide.
And this is a representation of what queries look like. The idea is that the elements in orange are pre‑determined, and a given request is completed by filling in the areas flagged in blue; processing proceeds from there. All of this is a conceptual framework. I want to emphasize that. It’s not an implementation or specification. It is not anything that can be imposed. It is a way of characterizing a policy. There’s no guarantee that everything that can be expressed in this model can be implemented directly or easily. We’re not limiting ourselves to particular known implementation paths. We’re much more concerned with matching the dialogue of what people want. That leads to the real possibility that people can express something, and even agree that they want it, and then there may be difficulty translating it into an implementation. A particular example is a request that says: please show me all of the registrations that mention the following person or the following e‑mail address. That implies a search across the database, as opposed to just digging in and getting the information related to a particular registered domain name. That is easy to express and not so easy to implement. But we include it as one of the things that we heard, and rather than taking the position that we won’t allow you to say it, we defer the issue of implementation and stay focused on making sure it’s possible to say whatever you want. In that vein, we want to make sure that people can say what they are thinking about and not just what they have agreed to, so that they can compare competing proposed policies, compare policies up and down a hierarchy, et cetera. Next slide, please. And we’re done. Thank you.
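The query flow described in this talk (a pre-determined template, requester identification and credentials, requested categories, an authorized sensitivity level, and a response limited to what that authorization allows) can be sketched as follows. This is a minimal illustration under stated assumptions, not the group’s framework: the element names, categories, sensitivity scale and the simple credential lookup are hypothetical, and the `automatic` flag models the automatic-versus-manual processing Steve Crocker mentions in the Q&A.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    # Hypothetical ordered scale of privacy degrees.
    PUBLIC = 0
    RESTRICTED = 1
    PRIVATE = 2


# Hypothetical labels taken from a collection policy: element -> sensitivity.
LABELS = {
    "domain_name": Sensitivity.PUBLIC,
    "registrar": Sensitivity.PUBLIC,
    "registrant_email": Sensitivity.RESTRICTED,
    "registrant_name": Sensitivity.PRIVATE,
}

# Hypothetical information categories a request can ask for.
CATEGORIES = {
    "registration": ("domain_name", "registrar"),
    "contact": ("registrant_name", "registrant_email"),
}


@dataclass(frozen=True)
class QueryTemplate:
    """Pre-determined part of a standard request, agreed in advance."""
    name: str
    categories: tuple
    max_sensitivity: Sensitivity
    automatic: bool = True  # False: each request goes to a human reviewer


@dataclass(frozen=True)
class Query:
    """A request completed by filling in the blanks of a template."""
    template: QueryTemplate
    requester_id: str
    credential: str
    domain: str


def process(query, registrations, trusted):
    """Return (status, elements); a rejected request gets nothing back."""
    # Placeholder acceptance step; the talk leaves real vetting to others.
    if trusted.get(query.requester_id) != query.credential:
        return "rejected", {}
    if not query.template.automatic:
        return "manual_review", {}
    record = registrations.get(query.domain, {})
    released = {}
    for category in query.template.categories:
        for element in CATEGORIES[category]:
            # Release only collected elements within the authorized sensitivity.
            if element in record and LABELS[element] <= query.template.max_sensitivity:
                released[element] = record[element]
    return "ok", released
```

For a requester authorized up to `RESTRICTED`, a request for both categories returns the domain name, registrar and e‑mail address but withholds the `PRIVATE` registrant name; keeping the template separate from the filled-in query mirrors the "orange versus blue" split on the slide.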
>> MODERATOR: Thank you, Steve, for this quick run through the painful process of GDPR compliance for WHOIS and DNS lookups. I see we already have some comments. A question from (inaudible). Could we give her the floor?
>> STEVE CROCKER: While you’re waiting for her, let me answer a question: is there a human element that reviews the requests that come in to ensure they meet the legal requirements for disclosure? Yes. One of the elements in requests that I didn’t dwell on as I showed it to you is that certain requests are expected to be processed automatically and other requests may have a manual review process, which implies that they may be rejected if the review comes out negative. So what we’ve done is provide a way of saying: if everyone agrees that a certain kind of request meets all of the requirements and can be processed automatically, then we can mark it as such. If not, then it’s subject to manual review, and we would leave it at that. So that’s one of the elements in structuring requests.
>> MODERATOR: Okay. We have a show of hands. Elizabeth Behsudi raised her hand. Can we have (inaudible) stage?
>> That is why we could not unmute her. Elizabeth, you have the floor.
>> ELIZABETH: Hi. Elizabeth with the (inaudible). We address, at a more macro level, this very issue, which is the tension between the cross-border internet and national jurisdiction. Thanks for a really great presentation, Steve. My question is somewhat related to Jim’s question and concerns the clearing house model. I would imagine that you would have those clearing houses function as accreditation bodies, and probably one barrier to entry, or one concern that operators will have, is whether those accreditation bodies are properly vetting the requests and, in the event that something falls through the cracks and information is disclosed that should not have been, whether those bodies will make the operator whole. Thanks.
>> STEVE CROCKER: Yeah. There are a number of those kinds of issues. As I said, this is going to give rise to a number of governance mechanisms, and they will have their hands full with these kinds of questions. The basic idea from our point of view is that participation in a particular clearing house involves trust on both sides, both requesters and collectors, that the rules that have been agreed to are being followed. When they’re not, it’s really an internal discipline issue. But I can easily imagine, and fully expect, that a given collecting body might be a member of more than one clearing house and that a given requester might be a member of more than one clearing house. The current focus, as best I can tell, is mostly on the contracted parties, the gTLDs. But that’s only half of the top-level domains, and it doesn’t include the numbers community either. So I’m expecting a wide variety of clearing houses, some of which are at the low end: collecting bodies might have business arrangements, and they might not want to use the same terminology, but it fits into the same framework. Associated with each of these, there are going to be fairly detailed questions, emerging and evolving over time with experience: how do you establish trust, what discipline is imposed, how do you measure, how do you detect violations? Just to nail that point: with law enforcement agencies, the first reaction is that if they come and ask for something and they have the proper paperwork, you have to give it to them. But the next piece of that dialogue almost always is: well, that’s if it’s your country. What happens if it’s from a law enforcement agency in some other part of the world? So I’m not envisioning that all law enforcement agencies will have access to everything everywhere, at least not in the immediate future. It’s a much more distributed, nuanced and variable set of arrangements. Thank you.
>> MODERATOR: Not a question, but a comment. When you look at land registries, access is regulated in quite diverse ways. In Germany, you need to have a legitimate reason to look up the land registry, whereas in Denmark it is completely open. It’s public. Everybody can see who owns what piece of land, and they can even see the mortgages on it. We know Denmark is part of the EU and the GDPR applies, so it is possible to have public registries. Similarly, for a website, you need to put the name of the company or some responsible person on the website. So we have some legal requirements to make this kind of information publicly available, and the example of Denmark shows that it’s completely fine under the GDPR to have this information publicly available to everybody worldwide. The land registry of Denmark is (inaudible) just because there is a law saying this land registry is public. Couldn’t this have been an easy solution for the (inaudible) as well?
>> STEVE CROCKER: Well, it’s easy for each country that wants to pass such a law. But what happens if one country says we’ll follow the Danish model and another country says we’ll follow the German model with respect to control of the information? As I’ve said, our focus is on providing tools that make it easy to express these details and differences with precision and clarity, in order to bring to the surface the kinds of questions you’re asking and help focus discussions, as opposed to having them go on endlessly in a less detailed way without reaching any conclusions. I would expect it to help partition the problems so that the easy problems can be disposed of, and then you can spend time over the years wrestling with the knottier issues.
>> MODERATOR: I would propose to switch over to the next speaker and then we have a final round of discussion with all the speakers. So we are now approaching the topic of data used for artificial intelligence. Marina, could you please introduce our next speaker?
>> MARINA SHENTSOVA: Our next speaker is professor Andreas Maier. He is the head of the Pattern Recognition Lab and a member of the European Time Machine consortium. As mentioned, his topic is data for artificial intelligence. So please, Andreas, you are given the floor.
>> ANDREAS MAIER: I want to talk about the effects of machine learning on medical data, and about the fact that we need lots of data in order to build better systems. I also want to talk a bit about (inaudible). If you go to the next slide, you can see a small teaser. We work quite a lot with networks, and you could say they have been around for many years, but if you look at the users on the next slide (you have to click a couple of times), you can see this is not only big companies like Google, Microsoft and IBM, of course, but also Netflix, Xerox and other companies. There are two more coming up. They all invest in this technology, particularly the big tech companies, but Daimler and others in autonomous driving are also interested in these techniques. The GDPR is highly relevant when you apply for the (inaudible) and acquire data from pedestrians and so on. If you go to the next slide, you can see one reason why companies are very much interested in gathering data: they can build better systems. With deep learning methods, you can see the progress from 2010 to 2015 on the ImageNet challenge, where you assign labels from up to 1,000 classes to images taken from the internet. The error rate was almost 30% in 2010. Through developments in the field, access to large amounts of data and, actually, graphics cards as powerful ingredients, you can demonstrate that this task of image recognition, assigning a label to an image, can today be performed almost as well as, or as well as, by a human labeler. That is approximately a 5% error rate on that data set, and this error rate was reached in 2015. So we can now perform image recognition and image processing with accuracy close to human performance. This is not only true for images. Speech recognition has also improved dramatically, and we are now talking about recognition rates of 99.7% in automatic speech recognition.
If you go to the next slide, you can see that the systems employed here are getting bigger and bigger; the deep networks get even deeper. In the top row, you see an older example, published in 1998. Over the years, there has been a breakthrough, and we now have much deeper networks. This is a 27-layer model from 2014 that was one of the winners of the challenge, and today we can construct networks with more than 100, even up to 1,000 layers, using special techniques. So these networks are extremely complex. They have millions of free parameters that have to be determined, and in order to do so, you need data. If you go to the next slide? In the medical field, we have quite a problem acquiring this data, because the GDPR and the rules on patient health records pose high obstacles to building large data collections, and there are, of course, reasons for that. If you go to the next slide, you can see that we have the same effect with medical data: if you have access to more data, you can build much better systems. This is shown here for (inaudible) slide processing, for detecting mitoses, that is, cells undergoing division. If you only have high-power field images, you get rather poor accuracy, but if you go towards the complete data set, which has more than 20,000 mitoses (and these are really rare), you do much better. In total, more than 200 cells have been annotated by hand in this data set in order to achieve this performance. We have (inaudible) of over 80%, which is already in the ballpark of the human annotators.
Now, you see that big data is a huge issue: we need lots of data to build good systems. If you go to the next slide, you will see that data quality is key. Suppose you have large amounts of data that you take from somewhere, such as patient records, as was done in this example. I am sorry for the (inaudible) large number of numbers, but this is from the NIH. It has more than 100,000 chest X-rays, and the data is available for download. We had a look into this data and asked three radiologists to annotate a small subset of the data set and give labels for the pathologies. You can see that for each row. On average, we had to remove more than 30%, and up to 60%, of the data in order to get consistent labels. If data is difficult to assess, there is also a large amount of uncertainty in it. If you don’t look into this uncertainty correctly and identify the borderline cases, you cannot build a very good automatic system either. So data quality is absolutely key. Just taking data from somewhere and uploading it to the internet will not allow people to build good systems, and so far no one could build a system on par with the human experts, because for this data set we don’t know what the benchmark human performance would be. So you need very extensive evaluations if you go ahead.
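The speaker does not spell out the exact rule used to drop cases. A minimal Python sketch of consensus filtering follows. It assumes that a case is kept only when all three readers agree, which is a hypothetical rule; a majority vote is a common alternative. The case IDs and labels are invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional


def consensus_label(votes: List[str], min_agree: int = 3) -> Optional[str]:
    """Return the most common label if at least `min_agree` readers chose it."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agree else None


def filter_by_consensus(cases: Dict[str, List[str]], min_agree: int = 3) -> Dict[str, str]:
    """Keep only the cases whose reader votes reach the agreement threshold."""
    kept = {}
    for case_id, votes in cases.items():
        label = consensus_label(votes, min_agree)
        if label is not None:
            kept[case_id] = label
    return kept


if __name__ == "__main__":
    # Hypothetical labels from three readers per image.
    cases = {
        "img1": ["effusion", "effusion", "effusion"],
        "img2": ["effusion", "normal", "effusion"],
        "img3": ["nodule", "normal", "mass"],
    }
    kept = filter_by_consensus(cases)
    removed = 1 - len(kept) / len(cases)
    print(kept, f"removed {removed:.0%}")
```

Lowering `min_agree` to 2 would also keep `img2`. Where to set the threshold is the trade-off between data quantity and label quality that the speaker describes.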
So what I want to show are some ways to get patient data, and so far I’m aware of three different approaches. If you go to the next slide, you see that the first one is a government-controlled central database. Here is an example from Denmark: you can explore the Danish health data. Many different health data records are available for research, and you can do research with them. But this requires pooling the data in central repositories, and while I think this is a really nice approach to get a lot of data, I have some doubts with respect to the data quality. You really want to make sure you understand how such a database was generated in order to make good use of the data. You can also see this with the (inaudible) data that is being shared right now: many countries, for example, decide in very different ways whether to label a case as a Corona case or not. If you take the data and try to make predictions from it, you get different answers depending on how the data was actually acquired.
A second option would be, of course, data donation, and I must admit that I myself am involved in Medical Data Donors. This is GDPR compliant: we ask for consent after the examination. You go to the hospital and have an examination, the data is generated, and once you have your final outcome, you are asked whether you want to donate your data; then it is uploaded to our servers. We take care of the data processing and the annotation and then create systems. This also has drawbacks, because you potentially need annotation, and you need permission to share the data with the annotators. So there are quite a few efforts you have to make in order to make this work. And, of course, you need consent in addition, because we are talking about image data here. You also need an agreement from the hospital, because they hold the image rights to all of the data sets. So we don’t just have to ask the patient; we also need the hospital’s agreement. This is a very difficult part, and it requires a lot of effort to generate a data set. The third option, which works for some tasks but of course not for all of them, is to find auxiliary data sources. We look at different cells, and here we actually use data that was collected from animals. This is a canine scan where we were looking into a tumor of a dog. Here we can do annotations and share the data, because no actual patient information is disclosed if you show this data. The key problem here is, of course, whether you are able to do the transfer, because the pathologies may be similar, but the cell sizes may be different. You somehow have to find a way to learn from these large data sources that don’t have the data protection problem, and to actually make use of them for the assessment of human disease.
So you see there are many problems, and I can’t present one final solution to them. Sometimes the GDPR may be an obstacle, and there are ways around it, but all of the approaches have their own difficulties. This already brings me to the end of my presentation, so I would like to thank you. Can you go to the next slide?
I would like to thank you for your attention. If you find this interesting, if you want to engage in offline discussions, you can also find me on social media. I will share the slides here in the chat later on. I am looking forward to the discussion. If you have any questions, I would be happy to take them.
>> MODERATOR: Thank you, Andrea. I will ask the remote moderator to take over.
>> MODERATOR: Thanks a lot. It is now time for questions. We encourage you to participate fully, and after this meeting, we can discuss further in the forum. I have collected a question from Jim. His question is: Is there a human element that reviews the requests that come in, to ensure that they meet the legal requirements for disclosure?
>> STEVE CROCKER: I think that question was directed at me, and I think I answered it at the time. The answer is that requests can be marked either as processed automatically or as requiring manual review. That’s an agreed-upon attribute set at the time the request template is set up. So the answer is yes: we have a way of including that as a requirement in the specification of requests.
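The idea of flagging each request type once, when its template is agreed, can be sketched as follows. This is a hypothetical Python illustration, not the data model of the framework Steve Crocker presented. The class names, the `route` function and the example template names are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Handling(Enum):
    AUTOMATIC = "automatic"
    MANUAL_REVIEW = "manual_review"


@dataclass(frozen=True)
class RequestTemplate:
    name: str
    handling: Handling  # fixed when the template is agreed upon


def route(template: RequestTemplate) -> str:
    """Decide how a request created from this template is handled."""
    if template.handling is Handling.MANUAL_REVIEW:
        return "queue for human review"
    return "process automatically"


if __name__ == "__main__":
    # Hypothetical template names.
    templates = [
        RequestTemplate("abuse-contact-lookup", Handling.AUTOMATIC),
        RequestTemplate("law-enforcement-disclosure", Handling.MANUAL_REVIEW),
    ]
    for t in templates:
        print(t.name, "->", route(t))
```

Because the attribute is part of the template rather than of each individual request, the decision about human review is made once, when the parties agree on the template.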
>> MODERATOR: Okay. Great. Thanks, Steve. We do have another question. His question is: How is data such as brain waves managed?
>> Brain waves?
>> MODERATOR: That’s his question.
>> ANDREAS MAIER: Let me think about that. Take, for example, a brain scan. That is already a huge problem, because if you have the surface of the brain, you can actually identify the person from the structure alone. From the brain surface itself, you can derive a fingerprint that is almost as good for identifying a person as a facial scan. This is also related to the structure of the head: the skull is very rigid.
Now, with brain waves, one may argue whether this is patient data or not, if you just have the brain waves without any context. The key question here, and I think for all medical data, is whether you are able to identify a person from the data set or not. We did some experiments looking into this, unfortunately not with brain waves. We know that brain surfaces can identify persons, unless you have identical twins. We have difficulties with abdominal scans: every time you stand up and lie down again, your organs shift around, so identification is not as successful. We also have some evidence in chest X-ray data: you can kind of guess who the person is, but the probability that you can re-identify somebody from a chest X-ray image is not very high. But this only means that the method we have been using was not able to identify the person at a high rate. It doesn’t mean that methods developed in the future won’t solve the problem much better. So this is also a problem here. Of course, we want to make sure that the person cannot be identified; otherwise, it is really patient data, and you have to follow the GDPR guidelines.
>> MODERATOR: Thank you for answering this question. I also received a question from (inaudible) from the Council of Europe. His question is: With medical data, can you really anonymize the data and at the same time leave enough data to process? The third question is: what about data such as bone marrow? That’s quite specific.
>> ANDREAS MAIER: Let me start with the first question: how unique is the data? I already hinted at that. For some data, we can tell you how well we can break anonymity with state-of-the-art methods; for other data, we have not succeeded yet. If you have data that essentially reveals the identity, for example speech, you can very easily find out from the speech signal which person has been speaking. So speech (inaudible), as do head scans or facial scans. In these cases, you have to make sure you work in a GDPR-compliant way, so that you obey data protection, because you can’t anonymize the data sets. There might be ways of anonymizing data using random transformations or pseudonyms, which you apply to a volume or an image such that you cannot identify the person from the changed image.
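For record identifiers, one common way to create pseudonyms is a keyed hash. The Python sketch below is an illustrative assumption, not the method used in the speaker's lab. It only covers identifiers and does nothing about identifying content in the data itself (a face, a brain surface, a voice), which is the harder problem described here. Under the GDPR, pseudonymised data are still personal data (Recital 26); only truly anonymous data fall outside its scope.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Map a patient ID to a stable pseudonym using HMAC-SHA256.

    The same ID and key always yield the same pseudonym, so records of one
    patient can still be linked; without the key, the mapping cannot simply
    be recomputed from a list of known IDs.
    """
    mac = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:16]  # truncated for readability


if __name__ == "__main__":
    key = b"hypothetical-key-kept-by-the-hospital"
    print(pseudonymize("patient-0042", key))
```

Whoever holds the key can re-link the pseudonyms to the original IDs. That is exactly why such data remain personal data rather than anonymous data.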
Now, the main question that is also being asked is: if you introduce anonymization, will you spoil the actual results? Will a method trying to detect the disease pick up traces of the method that was used to anonymize the data? This is, of course, a huge risk, so you want to make sure that such methods don’t spoil the subsequent processing. And the last question: bone marrow. This depends on what data you are actually acquiring from the bone marrow, on what you are sampling. If you are sampling a biopsy and then doing image analysis of the bone marrow, I think it’s not very likely that you can identify the person. Although, if you look at cell structures, for example, we have seen in some preliminary experiments that you may be able to identify a certain type of cancer and a particular kind of cancerous growth and link it back to that specific tumor. So you may potentially identify certain tumors and certain types of cancer, which then, of course, identifies a certain person from image data. So this is all a huge problem, and you have to be very careful with these data sets. By the way, another approach that I haven’t shown in the slides here is federated learning. In federated learning, you leave all the data in the hospital: you only send the model to the hospital, train it there so that the data never leaves, and pull back the trained model. But this also has problems, because trained models and networks pick up a lot of things, and there are inversion techniques that try to recover certain samples from the training data. So this may also be problematic in terms of data protection. I don’t have a final answer yet.
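The federated learning loop described here can be sketched in a few lines. This is a minimal Python illustration of federated averaging: each site trains locally, and only the model weights are sent back and averaged. It assumes a toy one-parameter linear model and invented data for two hypothetical hospitals. It does not implement any defence against the inversion attacks mentioned in the answer.

```python
from typing import List, Tuple

# Hypothetical toy setting: each hospital holds (x, y) pairs for a 1-D linear
# model y ≈ w * x. Only the weight w is exchanged; raw data stays on site.
Dataset = List[Tuple[float, float]]


def local_train(w: float, data: Dataset, lr: float = 0.01, epochs: int = 20) -> float:
    """Run plain gradient descent on the mean squared error, inside the hospital."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w: float, hospitals: List[Dataset]) -> float:
    """Send w to every site, train there, and average the returned weights,
    weighted by local sample count (the federated averaging rule)."""
    total = sum(len(d) for d in hospitals)
    return sum(local_train(w, d) * len(d) for d in hospitals) / total


if __name__ == "__main__":
    # Invented data: both sites follow y = 3x.
    site_a = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
    site_b = [(0.5, 1.5), (4.0, 12.0)]
    w = 0.0
    for _ in range(10):
        w = federated_round(w, [site_a, site_b])
    print(round(w, 3))  # converges towards 3.0
```

Only `w` crosses the hospital boundary in each round. As the speaker notes, however, a trained model can still leak information about its training samples.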
>> MODERATOR: Thank you for answering this question. Then we have one more question, from Robert Guerra. His question is: What is the error rate of AI processing of radiology and pathology images? Can it be reduced by mandating that higher-quality images be produced at the source?
>> ANDREAS MAIER: Image quality is a huge problem. I talked about label quality; the other issue is data quality, and this is indeed true. Take the acquisition process: for example, (inaudible) would take the biopsy sample, slice it up, stain it and then scan it. The entire process of generating the data has huge degrees of variation, and there are good reasons to believe that staining procedures, which differ between hospitals, are a limiting factor for sharing automatic diagnosis systems between sites. There are big companies working on automatic scanning systems, and they are also working on automatic staining systems that apply the dyes and put in the correct colors in a standardized way. The same is true for radiological images, because you can adjust a lot of parameters with respect to image quality. These are typically tuned to a particular physician and a particular diagnosis; all of the parameters are set for the specific purpose of that image, in a way that the interventionalist or the person making the diagnosis is happy with that particular image quality. This is also a limiting factor for sharing AI solutions, because they have to learn to cope with these large variations. We can see in several applications that AI solutions achieve results that are on par with human experts, but depending on the difficulty of the diagnosis and how clearly it can be made, the rates also go down. I don’t want to generalize, but as a rule of thumb, things that are difficult for a medical doctor to diagnose are, with a high degree of likelihood, also difficult for an AI system. I think AI will be successful first in bread-and-butter tasks, tasks that are done massively every day, and these tasks are likely to be (inaudible) first for AI solutions.
>> MODERATOR: Thank you very much for this answer, too. I’m closing the queue for the moment and will collect questions again afterwards. Now I’m giving the floor back to Yern.
>> Thank you for the Q&A, which raised some very important points; we will continue the Q&A after the next speaker. We have seen how important data is for AI. I imagine some minority data might be missing, so a diagnosis system might miss out on the specifics of a minority group, and we can see the tension arising between information freedom and data protection. Now we come to another issue: Europe is not just the EU. There are also countries outside the EU that have to cope with specific provisions of the GDPR, because transfers to third countries that do not have an equivalent level of protection face a large barrier under the GDPR. To see what these countries do to cope with these issues, please, Marina, introduce our next speaker.
>> MARINA SHENTSOVA: Last but not least is Lilijana Pecova from North Macedonia. She will not be making a presentation but will introduce a topic for discussion. The topic is GDPR difficulties of new EU members along with non‑EU CoE members. You are welcome to start.
>> LILIJANA PECOVA: Hi, everyone. Allow me briefly to say a few things on data (inaudible) in a developing region. You mentioned non‑EU members and the cases we had in the previous period. As non‑EU members but Council of Europe members, on a policy level we are working with both instruments. Recently, we adopted a new law that is compliant with the GDPR and signed the protocol of the Council of Europe. The same was done by Serbia and other neighboring countries in the region. On a policy level, we are following what EU members are doing, but on the enforcement and implementation level, I have had many difficulties explaining the extraterritorial effect, the exchange of data and, of course, the transparency and accountability principles of the GDPR. So far, and I’m sorry to say this, we are only speaking about privacy policies, cookies and consent, at a very basic level. So what I’m doing now is actually creating an enabling environment for understanding what the GDPR really means.
So far, I know of specific technical and organizational efforts to introduce sectoral measures and provisions to implement GDPR requirements. I can say that the financial and insurance sectors in the country are much closer to GDPR compliance, but region-wise we failed in medical data protection. As we saw, medical data are really important in this very difficult period for all of us, and the region, really a developing region, did not respond very efficiently with regard to the GDPR. The decision of the government of Montenegro, with the consent of the data protection (inaudible), to provide the public with a list of people who were in isolation was bizarre. Not only did it encourage lynching and hate speech, it also gave the public the role of the prosecutor, which was actually an (inaudible) for some of the media in our country to say that we need to make (inaudible) publicly available for public health awareness. So we had a map with information on the people who were infected, by location and by city; we had information on gender and age, but, thank God, we did not have personal data.
COVID‑19 changed the rules of the game, but it also gave us, as the civil society sector, the chance to present the GDPR as an EU regulation, the Council of Europe instruments and all of the links you provided in the chat about the guidelines and how to protect data. It also put forward several issues and concerns. The first is legitimacy and legality, for example the ministry of health in Serbia publishing health data of persons who died from COVID, or medical conversations recorded by nurses in a hospital in Bosnia. The second is profiling, as put forward by epidemiologists doing the profiling and mapping of potential patients. The third is cross-border cooperation in sharing medical data. I would put here a note of concern, or a challenge: how do we actually react and respond to further cross-border data sharing and cross-border cooperation in sharing medical data ahead of the summer vacations between North Macedonia and Greece, Greece being an EU member country?
Awareness of data breach notification policies is a must. So is access to one’s own data, balanced against the right of the public to know. Even today, I’m having a harsh discussion on social media about information published on one woman who gave birth and was infected with the coronavirus, and about balancing, with the media, the public’s right to know who the person is. There is no separate privacy law in our legislation, and I’ve always discussed this at the level of privacy protection. Privacy impact assessments are rarely done. So the question I’m asking myself as well is: do we need a privacy framework as a legal framework in this region in order to improve understanding? I would also refer to the quote of the EU (inaudible). There is only one thing I am happy about in this quote: data protection is described as a principle, not as a legal instrument. Is data protection a democratic value, or is it a business model taken out of the GDPR? So where do we put the red lines, how do we strike the balance, and how do we push for data privacy standards that are not only, as I said, a GDPR business model?
And on the last presentation: medical data are a major concern in this period. You mentioned, Andreas, consent for individual collection or donation of data, but what about consent for data sets when the data is extracted from someone as a member of a specific minority group? What would the procedure be? I really need, you know, to double-check the GDPR requirements on this issue. I will stop here to open the floor for discussion, because I think we have raised a lot of questions and dilemmas. Thank you.
>> MODERATOR: Thank you very much. I already see that we have a couple of questions for you, but please also feel free to ask questions to the other three speakers. I hand over to (inaudible) to moderate the Q&A. Thank you.
>> MODERATOR: Thank you very much. Yes, indeed, we have received some questions, and I will start with a question from Robert Guerra. His question is: Has there been any case where inadequately protected medical and/or genetic data has been used to discriminate against individuals, for example for insurance purposes?
>> LILIJANA PECOVA: For insurance purposes, no, not as far as I know. But to discriminate against individuals and put them in (inaudible), yes. Not only in my country, North Macedonia; it actually happened in Serbia, and the discrimination even extended to ethnic minority groups. As I mentioned, and I’m already seeing the next comment, on compliance with the GDPR in Armenia: policies are not sufficient to cure the situation. That was my question as well, Mike, because we are adopting the laws and the parliament is doing its job, but the implementation and enforcement level is far behind what we can call a success. I also urge the EU member states to be very clear and to make consistent decisions on how the GDPR is implemented, because we have seen a lot of inconsistencies in several provisions and measures, and you have mentioned a couple of them in the previous presentations. As a non‑EU member country that is a member of the Council of Europe, we are following both instruments, but we would much rather go forward with the EU regulation. Consistency in implementing the GDPR provisions would be a good way forward.
>> MODERATOR: Thanks a lot for answering this question. Our next question is from Amali De Silva-Mitchell. Her question is: What can be done to support medical collaboration? Who is willing to pick up that question?
>> ANDREAS MAIER: I think we have to collaborate. By medical collaboration, do you mean a multi-center study? Is that what you are referring to?
>> MODERATOR: I’m asking Amali to unmute herself and to clarify this question.
>> Hi, there.
>> pick up that question.
>> Yeah. Just a very general question. We’re in COVID and all of us are just –
>> (inaudible) clarify medical collaboration.
>> MODERATOR: I hear there’s a delay on the line. I will ask Amali just to speak and ignore all the background sounds.
>> You’re unmuted. You need to unmute yourself, Amali.
>> Yes. Thank you, yes. There is a delay. This is a very general question, because all of us, as concerned citizens, want, you know, some rapid solution to COVID‑19, and I’m just wondering if there are any special agreements that can be put in place, for instance, that will sort of enable (no sound) –
>> Yes, sorry. I guess the delay is too much to have a good conversation on this line. So I’m asking if there are any other questions. Amali, could you write a clarification in the chat? At the moment, I don’t have any questions in the queue. If people do have questions, please raise your hand to support some interaction.
>> Maybe I can add one question I was trying to raise before we had the last presentation. When it comes to medical data, if somebody is left out of the medical data and we train AI, it means that this person or this group cannot be taken into account for training the medical system. So, um, what do you think about this issue: if certain groups do not want to consent, they might be discriminated against by AI systems that are trained with the data. Do you think this is an issue? Do you see a possibility to approach that? Maybe we could have this for Andreas and for Robert.
>> ANDREAS MAIER: Of course this is an issue. It also occurs if you go to a traditional medical doctor in your local hospital, and not just (inaudible) with minorities. If you have a very rare disease, there is a certain likelihood that your physician has never seen it and doesn’t recognize it. There is a chance that AI could potentially pick up such a rare disease. But what is more likely is that in the near future we will all use AI to facilitate difficult tasks, and, of course, you have to make sure that you have appropriate sampling of the persons it is used on. If you have only trained on certain ethnicities, you can also only validate for that specific group. You always have this problem with all of these machine learning approaches: all of the statements you make will only be adequate if you have representative sampling in your training data set. If the sampling is not representative, we cannot guarantee appropriate predictions for those outside the sample. And if you train a big system and you have not sampled according to the population structure, then you may run into problems, of course.
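A simple first check for the sampling problem described here is to compare each group's share of the training set with its share of the target population. The Python sketch below assumes that a group label is available for every training case. It uses a hypothetical threshold: a group is flagged when its training share falls below half its population share. The group names and numbers are invented.

```python
from collections import Counter
from typing import Dict, List


def underrepresented_groups(
    train_groups: List[str],
    population_share: Dict[str, float],
    tolerance: float = 0.5,
) -> List[str]:
    """Return groups whose share of the training set is below
    `tolerance` times their share of the target population."""
    counts = Counter(train_groups)
    n = len(train_groups)
    flagged = []
    for group, pop_share in population_share.items():
        train_share = counts.get(group, 0) / n
        if train_share < tolerance * pop_share:
            flagged.append(group)
    return sorted(flagged)


if __name__ == "__main__":
    # Invented example: 100 training cases, three hypothetical groups.
    train = ["A"] * 90 + ["B"] * 10
    population = {"A": 0.65, "B": 0.25, "C": 0.10}
    print(underrepresented_groups(train, population))  # ['B', 'C']
```

Such a check only detects gaps; it cannot fill them. It also requires recording group membership, and data revealing ethnic origin is itself a special category of personal data under Article 9 of the GDPR.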
>> MODERATOR: Thank you for that answer, Andreas. I have received the question from Amali. Her question is: How can agreements support rapid collaboration for COVID‑19?
>> ANDREAS MAIER: This is a legal question rather than a technical one. To be honest, I think what is being done in the COVID‑19 case is pretty good, because we see public repositories emerging that provide population data. But what you also see in the repositories is that every country has a different interpretation, for example, of what a Corona casualty is. Some say it is somebody who tested positive and died; other countries count a casualty only if the person died of Corona. So again, data quality is an issue.
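The effect of different case definitions on shared statistics can be illustrated with a toy example. The records and field names below are hypothetical. The two counting functions correspond to the two definitions mentioned in the answer, and applying them to the same records gives different totals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    tested_positive: bool
    died: bool
    covid_cause_of_death: bool  # hypothetical field: death certified as caused by COVID-19


def deaths_with_positive_test(records: List[Record]) -> int:
    """Definition A: anyone who tested positive and died."""
    return sum(r.tested_positive and r.died for r in records)


def deaths_from_covid(records: List[Record]) -> int:
    """Definition B: only deaths certified as caused by COVID-19."""
    return sum(r.died and r.covid_cause_of_death for r in records)


if __name__ == "__main__":
    # Invented records for four people.
    records = [
        Record(tested_positive=True, died=True, covid_cause_of_death=True),
        Record(tested_positive=True, died=True, covid_cause_of_death=False),
        Record(tested_positive=True, died=False, covid_cause_of_death=False),
        Record(tested_positive=False, died=True, covid_cause_of_death=False),
    ]
    print(deaths_with_positive_test(records), deaths_from_covid(records))  # 2 1
```

A model trained on pooled figures from countries using different definitions would be learning from inconsistent labels, which is the data quality problem raised in the answer.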
>> MODERATOR: I wanted to follow up on your question as well. This raises two things. In addition to the data quality point Andreas raised, you mentioned the issue of false positives. For diagnostic tests, if the tests aren’t done the same way, there can be different error rates, and this has been seen in the U.S. and other places: if the test isn’t the right type but the test results are still shared, then what does that mean? Is that a positive elsewhere? Hearing your comments really made me link back to something I mentioned in my presentation. If in Hungary somebody who made it onto the Forbes list of the top 10 richest people in Hungary was worried and used the GDPR to get themselves off the list, what is a person going to do whose medical data is perhaps used in a data set, or whose pregnancy data or other data mistakenly goes out there, and how do we protect it? I think we have a worry, and I think it’s been mentioned in the chat as well, that there may need to be additional safeguards for medical and genetic data. As genetic data about the genome and diseases gets used a lot more, there may need to be additional safeguards, and I know different institutions are working on that as well, because once the data is out there, it’s out there. I don’t see how, you know, your genetic testing can be removed once it’s out there. People are complaining about their tax records being online, as in the case of the Nordic countries, and those can’t be taken down later; so what do you do when your medical data gets out? Going back to your earlier comment, that may be the reason why Europe wants to close in and protect its data and citizens, but I worry it may not be protecting the data from commercialization and other (inaudible) as well. We haven’t had a discussion about secondary uses of the data. I worry because the UK is leaving, and they have been sharing NHS data with commercial sources as well.
If those sources then collaborate internationally, it is really difficult to remove your data from the data sets. I think, you know, medical issues create great challenges for the GDPR, and the issue is one of balancing rights. It shows that data protection still has a long way to go, in my view.
>> Okay. Normal data will lose its protection when the person dies; data protection ends (inaudible) the person. But genetic data also concerns your (inaudible) and their (inaudible), so genetic data remains risky for a longer time. That’s the reason why we have to take care of genetic data. I think we are approaching the end of our session, and we will have the Geneva Internet Platform give the summary of this session.
>> Hello, everyone. I am with the Geneva Internet Platform, and I’m providing the key messages from the session. I want to thank everyone, all the panelists and everyone in the discussions, for a great session.
I have summed up the first point as follows. If we can have the next slide, please. Okay.
The first one: privacy regulation such as the GDPR assumes that the rule of law is respected. However, law is not applied in the same way across borders.
The second point: there are important tensions between data gathering or information freedom and privacy protection, arising from compliance with the GDPR and other privacy regulation. A fresh look at registration models is needed to enable access to information in a way that abides by privacy rules and enables sustainable sharing of information.
And finally, a last one, which I just added: the GDPR protects data in its application, but there is a need for additional safeguards for medical data. That would be it for me. Thank you very much.
>> Maybe we should add something about information freedom: that we should find a balance between information freedom and data protection, and that we are aware that there are risks in both directions.
>> Absolutely. These messages will be up for commenting on the wiki, so we can make this modification afterwards.
>> Thank you very much. Thank you, Marina, for moderating this session, thank you to all speakers for speaking, thank you to the remote moderator and, of course, thank you, Cedric, for your fine conclusion of this session.
>> Thank you very much. It was very interesting, and I feel we have learned a lot across different sectors and fields. I would like to ask the remote moderator to give the final notes.
>> Thank you, Nadia. Thank you to all speakers and participants for your active participation. The chat was extremely active, and I hope it continues in the EuroDIG forum.
>> Thank you for the participation and I forgot to mention thank you very much and have a nice continuation of the EuroDIG.
>> Thank you very much.