AI in medicine: who watches the algorithms?
George Milner considers the troubling sides of machine learning in clinical practice
Artificial intelligence (AI) has the potential to transform medical practice on an unprecedented scale. As with any new technology introduced into clinical practice, some cautionary notes are needed, without detracting from the enormous potential of the effective, fair and safe use of AI-powered technologies in healthcare.
While a strength of AI is its ability to analyse vast amounts of data and integrate them for decision-making, this is also the root of a potential weakness. The old adage of “garbage in, garbage out” applies. Machine learning algorithms are only as good as the data on which they are built. Biased data sets, or those formed in the context of outdated practices, risk perpetuating such approaches. Any incorrect or inconsistent labelling within these data sets, for example a miscategorised X-ray of a patient with lung cancer, will hamper the accuracy of AI and risks being replicated.
Furthermore, racial or gender underrepresentation in original data sets may lead to AI performing poorly for certain patient groups. A 2014 study into racial disparities in cancer mortality in the US concluded that a lack of diversity among study participants played a significant role in poorer cancer outcomes for African Americans. Concerns have also been raised about disparities in the accuracy with which melanomas, a form of skin cancer, are identified in individuals of different skin colours. Underrepresentation can result from systematic exclusion from trials and sampling, but also from patterns in who consents to their data being used. As well as leaving such groups out of the benefits of AI, algorithms have the potential to actively harm patients who are not represented in their development.
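One practical safeguard is to report a model’s performance separately for each patient group rather than as a single headline figure. The sketch below is purely illustrative: the data, the group labels and the scikit-learn classifier are invented, not drawn from any real clinical system, but it shows how an aggregate accuracy can hide a much weaker result for an underrepresented group.

```python
# Minimal sketch (invented data): auditing a classifier's accuracy per group.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Majority group A (900 patients) and underrepresented group B (100 patients).
# The relationship between the feature and the disease differs between groups.
n_a, n_b = 900, 100
x_a = rng.normal(0.0, 1.0, n_a)
y_a = (x_a > 0).astype(int)      # group A: disease when the feature is high
x_b = rng.normal(0.0, 1.0, n_b)
y_b = (x_b < 0).astype(int)      # group B: the opposite pattern

X = np.concatenate([x_a, x_b]).reshape(-1, 1)
y = np.concatenate([y_a, y_b])
group = np.array(["A"] * n_a + ["B"] * n_b)

model = LogisticRegression().fit(X, y)
pred = model.predict(X)

print("overall accuracy:", accuracy_score(y, pred))   # looks respectable
for g in ("A", "B"):
    mask = group == g
    print(f"group {g} accuracy:", accuracy_score(y[mask], pred[mask]))
# Group B fares far worse: the model has simply learnt the majority pattern.
```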
Even accurate software risks losing that accuracy when applied outside the context in which it was developed. Such ‘out-of-sample’ inputs risk going undetected by clinicians and generating harmful actions, which is why clear guidance is needed on the clinical cases to which a particular algorithm should be applied. Similar issues arise when the statistical properties of a disease change over time and no longer reflect the data that AI algorithms were trained on. They can also surface when new data are introduced in real time to bolster an algorithm’s performance, a process called adaptive learning, but those data are contextually different or carry new associations. It is therefore important that the performance of AI is independently scrutinised over time to validate claims of efficacy; otherwise it may be difficult to differentiate useful clinical tools from woefully inaccurate ones.
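Some of this drift can be caught automatically by comparing incoming data against what the model was trained on before its output is trusted. The sketch below is one possible check, not any particular vendor’s safeguard: it assumes a single illustrative feature (patient age) and an arbitrary significance threshold, and uses a standard two-sample Kolmogorov–Smirnov test from SciPy.

```python
# Minimal sketch (illustrative only): flagging inputs whose distribution has
# drifted away from the training data before relying on the model's output.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)

# Feature values seen during training, e.g. patient age in a trial population.
train_ages = rng.normal(55, 8, 5000)

# New patients arriving in deployment: a noticeably younger population.
new_ages = rng.normal(35, 8, 200)

# A small p-value suggests the new data no longer look like the training data.
stat, p_value = ks_2samp(train_ages, new_ages)
if p_value < 0.01:
    print(f"Warning: possible distribution shift (KS={stat:.2f}, p={p_value:.1e}).")
    print("Predictions for this cohort should be reviewed by a clinician.")
else:
    print("Incoming data resemble the training distribution.")
```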
Exacerbating these shortcomings is automation bias. Decision support systems for healthcare professionals can lead to complacency in the face of incorrect advice. The hazards of handing over complete control to software without sufficient human oversight are exemplified by the deadly crashes involving Boeing’s 737 Max planes, in which an automated stabilisation system was heavily implicated.
AI can also lack transparency. This feature, known as black box AI, can obstruct the identification of flaws such as biases in automated decision-making. A famous warning to this effect comes from a story of American researchers developing tank-detection software. Supposedly, it transpired that all of the photos containing tanks were taken on a sunny day, in contrast to the cloudy conditions in the group without tanks; the software had simply learnt to distinguish the photos by the weather. The story illustrates how irrelevant but predictive correlations in training sets can misinform the behaviour of algorithms in real life, and how a lack of transparency makes such flaws hard to spot. In the same way, if data were collected from a control group at a different time of day to a group with a condition, an algorithm could learn to distinguish between them on that basis rather than on relevant clinical information. Nevertheless, much like clinicians, some algorithms are able to identify which aspects of the input data they give weight to in reaching their conclusions.
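The tank story can be reproduced in miniature. In the sketch below, with entirely invented data, an irrelevant ‘brightness’ feature (standing in for sunny weather) happens to track the label in the training set; the model leans on it, then fails once that coincidence disappears in deployment.

```python
# Minimal sketch (invented data): a model latching onto an irrelevant but
# predictive feature, in the spirit of the 'sunny tanks' story.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 1000

# True signal: a subtle clinical feature that only weakly separates the classes.
y_train = rng.integers(0, 2, n)
clinical = y_train * 0.5 + rng.normal(0, 1.0, n)

# Confound: in the training set, 'brightness' almost perfectly tracks the label.
brightness = y_train + rng.normal(0, 0.1, n)

X_train = np.column_stack([clinical, brightness])
model = LogisticRegression().fit(X_train, y_train)
print("coefficients [clinical, brightness]:", model.coef_.round(2))

# In deployment the 'weather' is unrelated to the label and accuracy collapses.
y_test = rng.integers(0, 2, n)
clinical_t = y_test * 0.5 + rng.normal(0, 1.0, n)
brightness_t = rng.normal(0.5, 0.5, n)     # no longer tied to the label
X_test = np.column_stack([clinical_t, brightness_t])
print("training accuracy:", model.score(X_train, y_train))
print("deployment accuracy:", model.score(X_test, y_test))
```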
Unlike clinicians, however, many machine learning systems give no indication of how confident they are in their answer, nor are they always trained to balance the consequences of success and failure. A human clinician with a cautious approach to identifying treatable tumours may underperform on overall accuracy, yet outperform AI on patient outcomes by missing fewer crucial cases. For AI to optimise decision-making, it must be told which outcomes to prioritise. Putting all decisions in the hands of AI could ignore human considerations, prioritising crude survival rates over quality of life, given the available data and programmed goals. It could prioritise recipients of organ donation over patients in comas if programmed to judge a treatment decision against very restricted criteria. And what about geographical and socioeconomic factors?
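One way of encoding that cautious clinician’s priorities is to tell the model explicitly that missing a tumour costs far more than a false alarm, for instance by lowering its decision threshold. The numbers, prevalence and thresholds below are invented for illustration; this is a generic sketch, not a clinical protocol.

```python
# Minimal sketch (invented numbers): trading raw accuracy for fewer missed
# cases by lowering the decision threshold on a tumour classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(3)
n = 2000

# 5% of scans contain a tumour; a single noisy risk feature.
y = (rng.random(n) < 0.05).astype(int)
X = (y * 1.5 + rng.normal(0, 1.0, n)).reshape(-1, 1)

model = LogisticRegression().fit(X, y)
proba = model.predict_proba(X)[:, 1]

for threshold in (0.5, 0.1):          # default versus cautious threshold
    pred = (proba >= threshold).astype(int)
    print(f"threshold {threshold}: "
          f"accuracy={accuracy_score(y, pred):.2f}, "
          f"sensitivity={recall_score(y, pred):.2f}")
# The cautious threshold misses fewer tumours (higher sensitivity)
# even though its headline accuracy is lower.
```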
Another issue arises when medical practice guidelines are altered, a relatively frequent occurrence as best practice is updated continually in light of new information. While rules-based clinical decision-making can be updated fairly swiftly, updating vast AI data sets is far more difficult and expensive. Creating and updating data sets brings concerns about data privacy too. It was ruled in 2017, for example, that the Royal Free London NHS Foundation Trust had breached UK data protection law in sharing 1.6m patients’ data with DeepMind, a UK AI start-up acquired by Google in 2014 whose health work was later folded into Google Health. The more data are anonymised by stripping out detail, however, the less useful they become for research, and even anonymised patient data can in theory be cross-referenced to allow identification. Security concerns extend beyond safeguarding sensitive data, as AI software must also be protected from hacking and manipulation.
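The re-identification risk is easy to demonstrate in principle: if an ‘anonymised’ record still carries quasi-identifiers such as birth year and postcode district, it can be joined back to a public listing. Every name, field and record in the sketch below is invented.

```python
# Minimal sketch (entirely invented records): re-identifying 'anonymised'
# data by cross-referencing quasi-identifiers with a public source.
anonymised = [  # hospital data with names removed but quasi-identifiers kept
    {"birth_year": 1956, "postcode_district": "CB2", "diagnosis": "diabetes"},
    {"birth_year": 1990, "postcode_district": "CB1", "diagnosis": "asthma"},
]
public_register = [  # e.g. an open electoral roll or social-media listing
    {"name": "A. Patient", "birth_year": 1956, "postcode_district": "CB2"},
    {"name": "B. Resident", "birth_year": 1990, "postcode_district": "CB1"},
]

for record in anonymised:
    matches = [p for p in public_register
               if p["birth_year"] == record["birth_year"]
               and p["postcode_district"] == record["postcode_district"]]
    if len(matches) == 1:  # a unique match re-identifies the patient
        print(f'{matches[0]["name"]} -> {record["diagnosis"]}')
```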
Ultimately, an unsavoury fact is that the use of AI in medicine will disadvantage some patients and advantage others. As with all healthcare decisions, the potential for harm must be weighed against the potential for good in each context in which AI is applied. All medical practice should be continually critiqued and refined, and AI in healthcare is no exception. A European Commission report lists seven requirements that AI should meet, including transparency, technical robustness and safety, and human agency and oversight. Nevertheless, AI doesn’t have to be perfect to be used in medicine: an unrealistically high bar for safety risks denying vast numbers of people its many potential health benefits.