• Research article
  • Open access
  • Published: 18 January 2021

Exploring the impact of Artificial Intelligence and robots on higher education through literature-based design fictions

  • A. M. Cox, ORCID: orcid.org/0000-0002-2587-245X

International Journal of Educational Technology in Higher Education, volume 18, Article number: 3 (2021)


Abstract

Artificial Intelligence (AI) and robotics are likely to have a significant long-term impact on higher education (HE). The scope of this impact is hard to grasp, partly because the literature is siloed and partly because the meaning of the concepts themselves is shifting. Developments are also surrounded by controversy over what is technically possible, what is practical to implement and what is desirable, pedagogically or for the good of society. Design fictions that vividly imagine future scenarios of AI or robotics in use offer a means both to explain and to query the technological possibilities. The paper describes the use of a wide-ranging narrative literature review to develop eight such design fictions that capture the range of potential uses of AI and robots in learning, administration and research. They prompt wider discussion by instantiating issues such as how these technologies might enable the teaching of higher-order skills or change staff roles, as well as exploring their impact on human agency and the nature of datafication.

Introduction

The potential of Artificial Intelligence (AI) and robots to reshape our future has attracted vast interest among the public, government and academia in the last few years. As in every other sector of life, higher education (HE) will be affected, perhaps in a profound way (Bates et al., 2020 ; DeMartini and Benussi, 2017 ). HE will have to adapt to educate people to operate in a new economy and potentially for a different way of life. AI and robotics are also likely to change how education itself works, altering what learning is like, the role of teachers and researchers, and how universities work as institutions.

However, the potential changes in HE are hard to grasp for a number of reasons. One reason is that impact is, as Clay ( 2018 ) puts it, “wide and deep” yet the research literature discussing it is siloed. AI and robotics for education are separate literatures, for example. AI for education, learning analytics (LA) and educational data mining also remain somewhat separate fields. Applications to HE research as opposed to learning, such as the robot scientist concept or text and data mining (TDM), are also usually discussed separately. Thus if we wish to grasp the potential impact of AI and robots on HE holistically we need to extend our vision across the breadth of these diverse literatures.

A further reason the potential implications of AI and robots for HE are hard to grasp is that AI is not a single technology but an idea or aspiration for how computers could participate in human decision making. Faith in how best to achieve this has shifted across different technologies over time, as have concepts of learning (Roll and Wylie, 2016). Also, because AI and robotics are ideas that have been pursued over many decades, some applications are quite mature: impacts have already happened. Equally, there are potential applications that are still being developed and many that are only just beginning to be imagined. So, confusingly from a temporal perspective, uses of AI and robots in HE are past, present and future.

Although the picture is hard to grasp fully, wider understanding and debate are important, because AI and robotics pose a range of pedagogic, practical, ethical and social justice challenges. A large body of educational literature explores the challenges of implementing new technologies in the classroom as a change management issue (e.g. as synthesised by Reid, 2014). Introducing AI and robots will not be a smooth process; it will bring its own challenges and ironies. There is also a strong tradition in the educational literature of critical responses to technology in HE. These typically focus on issues such as the potential of technology to dehumanise the learning experience, and they are often driven by fear of commercialisation or of neo-liberal ideologies wrapped up in technology. Similar arguments are developing around AI and robotics, with a particularly strong concentration of critique around the datafication of HE. Thus the questions around the use of AI and robots are as much about what we should do as about what is possible (Selwyn, 2019a). Yet according to a recent literature review most current research about AI in learning is from computer science and seems to neglect both pedagogy and ethics (Zawacki-Richter et al., 2019). Research on AIEd has also long been recognised to have a WEIRD (western, educated, industrialized, rich and democratic) bias (Blanchard, 2015).

One device to make the use of AI and robots more graspable is fiction, with its ability to help us imagine alternative worlds. Science fiction has already had a powerful influence on creating collective imaginaries of technology and so in shaping the future (Dourish and Bell, 2014 ). Science fiction has had a fascination with AI and robots, presumably because they enhance or replace defining human attributes: the mind and the body. To harness the power of fiction for the critical imagination, a growing body of work within Human Computer Interaction (HCI) studies adopts the use of speculative or critical narratives to destabilise assumptions through “design fictions” (Blythe 2017 ): “a conflation of design, science fact, and science fiction” (Bleecker, 2009 : 6). They can be used to pose critical questions about the impact of technology on society and to actively engage wider publics in how technology is designed. This is a promising route for making the impact of AI and robotics on HE easier to grasp. In this context, the purpose of this paper is to describe the development of a collection of design fictions to widen the debate about the potential impact of AI and robots on HE, based on a wide-ranging narrative literature review. First, the paper will explain more fully the design fiction method.

Method: design fictions

Many types of fiction are used for thinking about the future. In strategic planning and in futures studies, scenarios—essentially fictional narratives—are used to encapsulate contrasting possible futures (Amer et al., 2013; Inayatullah, 2008). Stakeholders then use these collaboratively to make choices about preferred directions. On a more practical level, in information systems design, traditional design scenarios are short narratives that picture the use of a planned system and explain how it could solve existing problems. As Carroll (1999) argues, such scenarios are also essentially stories or fictions, reflecting the fact that system design is inherently a creative process (Blythe, 2017). They are often used to involve stakeholders in systems design. The benefit is that the fictional scenario prompts reflection outside the constraints of trying to produce something that simply works (Carroll, 1999). But they tend to represent a system being used entirely as intended (Nathan et al., 2007), and they typically include only immediate stakeholders and immediate contexts of use, rather than considering the wider societal impacts of pervasive use of the technology. A growing body of work in the study of HCI refashions these narratives:

Design fiction is about creative provocation, raising questions, innovation, and exploration. (Bleecker, 2009 : 7).

Design fictions create a speculative space in which to raise questions about whether a particular technology is desirable, the socio-cultural assumptions built into technologies, the potential for different technologies to make different worlds, our relation to technology in general, and indeed our role in making the future happen.

Design fictions exist on a spectrum between speculative and critical. Speculative fictions are exploratory. More radical, critical fictions ask fundamental questions about the organisation of society and are rooted in traditions of critical design (Dunne and Raby, 2001). By definition they challenge technical solutionism: the way that technologies are often built to solve problems that do not necessarily exist, or ignore the contextual issues that might affect their success (Blythe et al., 2016).

Design fictions can be used in research in a number of ways, where:

Fictions are the output themselves, as in this paper.

Fictions (or an artefact such as a video based on them) are used to elicit research data, e.g. through interviews or focus groups (Lyckvi et al., 2018).

Fictions are co-created with the public as part of a process of raising awareness (e.g. Tsekleves et al. 2017 ).

For a study of the potential impact of AI and robots on HE, design fictions are a particularly suitable method. They are already used by some authors working in the field such as Pinkwart ( 2016 ), Luckin and Holmes ( 2017 ) and Selwyn et al. ( 2020 ). As a research tool, design fictions can encapsulate key issues in a short, accessible form. Critically, they have the potential to change the scope of the debate, by shifting attention away from the existing literature and its focus on developing and testing specific AI applications (Zawacki-Richter et al., 2019 ) to weighing up more or less desirable directions of travel for society. They can be used to pose critical questions that are not being asked by developers because of the WEIRD bias in the research community itself (Blanchard, 2015 ), to shift focus onto ethical and social justice issues, and also raise doubts based on practical obstacles to their widespread adoption. Fictions engage readers imaginatively and on an affective level. Furthermore, because they are explicitly fictions readers can challenge their assumptions, even get involved in actively rewriting them.

Design fictions are often individual texts. But collections of fictions create potential for reading against each other, further prompting thoughts about alternative futures. In a similar way, in future studies, scenarios are often generated around four or more alternatives, each premised on different assumptions (Inayatullah, 2008 ). This avoids the tendency towards a utopian/ dystopian dualism found in some use of fiction (Rummel et al., 2016 ; Pinkwart 2016 ). Thus in this study the aim was to produce a collection of contrasting fictions that surface the range of debates revolving around the application of AI and robotics to HE.

The process of producing fictions is not easy to render transparent.

In this study the foundation for the fictions was a wide-ranging narrative review of the literature (Templier and Paré, 2015). The purpose of the review was to generate a picture of the pedagogic, social, ethical and implementation issues raised by the latest trends in the application of AI and robots to teaching, research and administrative functions in HE, as a foundation for narratives which could instantiate the issues in fictional form. We know from previous systematic reviews that these types of issues are neglected, at least in the literature on AIEd (Zawacki-Richter et al., 2019). So the chief novelty of the review lay in (a) focusing on social, ethical, pedagogic and management implications, (b) encompassing both AI and robotics as related aspects of automation, and (c) seeking to be inclusive across the full range of functions of HE, including impacts on learning, but also on research and scholarly communications, as well as administrative functions and estates management (the smart campus).

In order to gather references for the review, systematic searches of the ERIC database were conducted for relevant terms such as “AI or Artificial Intelligence”, “conversational agent” and “AIED”. Items were selected which either primarily addressed non-technical issues or which themselves contained substantial literature reviews that could be used to gain a picture of the most recent applications. This systematic search was combined with snowballing (also known as pearl growing), using references by and to highly relevant matches to find other relevant material. While typically underreported in systematic reviews, this method has been shown to be highly effective in retrieving additional relevant items (Badampudi et al., 2015). Some grey literature was included because there are a large number of reports by governmental organisations summarizing the social implications of AI and robots. Because many issues relating to datafication are foreshadowed in the literature on learning analytics, this topic was also included. In addition, some general literature on AI and robots, while not directly referencing education, was deemed to be relevant, particularly as it was recognised that education might be a late adopter and so impacts would be felt through wider social changes rather than directly through educational applications. Literature reviews which suggested trends in current technologies were included, but items which were detailed reports of the development of technologies were excluded. Items published before 2016 also tended to be excluded, because the concern was with the latest wave of AI and robots. As a result of these searches, on the order of 500 items were consulted, with around 200 deemed to be of high relevance. As such there is no claim that this was an “exhaustive” review; rather it should be seen as complementing existing systematic reviews by serving a different purpose. The review also identified a number of existing fictions in the literature that could then be rewritten to fit the needs of the study, such as to apply to HE, to make them more concise or to add new elements (fictions 1, 3, 4).

As an imaginative act, writing fictions is not reducible to a completely transparent method, although some aspects can be described (Lyckvi et al., 2018). Auger (2013) suggests some techniques for creating effective critical designs, such as placing something uncanny or unexpected against a backdrop of mundane normality, and creating a sense of verisimilitude (perhaps achieved through mixing fact and fiction). Fiction 6, for example, exploits the mundane feel of committee meeting minutes to help us imagine the debates that would occur among university leaders implementing AI. A common strategy is to take a central counterfactual premise to its logical conclusion, asking “what if?” For example, fiction 7 takes existing strategies of gathering data and using chatbots to act on it to their logical conclusion: a comprehensive system of data surveillance. Another technique used here was to exploit particular genres of writing, as in fiction 5, where a style of writing drawn from marketing and PR reminds us of the role of EdTech companies in producing AI and robots.

Table 1 offers a summary of the eight fictions produced through this process. The fictions explore the potential of AI and robots in different areas of university activity: learning, administration and research (Table 1, column 5). They seek to represent some different types of technology (column 2). Some are rather futuristic; most seem feasible today or in the very near future (column 3). The full text of the fictions and supporting material can be downloaded from the University of Sheffield data repository, ORDA, and used under a CC-BY-SA licence ( https://doi.org/10.35542/osf.io/s2jc8 ). The following sections describe each fiction in turn, showing how it relates to the literature and surfaces relevant issues. Table 2 below summarises the issues raised.

In the following sections each of the eight fictions is described, set in the context of the literature review material that shaped their construction.

AI and robots in learning: Fiction 1, “AIDan, the teaching assistant”

Much of the literature around AI in learning focuses on tools that directly teach students (Baker and Smith, 2019 ; Holmes et al., 2019 ; Zawacki-Richter et al., 2019 ). This includes classes of systems such as:

Intelligent tutoring systems (ITS), which teach course content step by step, taking an approach personalised to the individual. Holmes et al. (2019) distinguish types of Intelligent Tutoring System based on whether they adopt a linear, dialogic or more exploratory model.

One emerging area of adaptivity is using sensors to detect the emotional and physical state of the learner, recognising the embodied and affective aspects of learning (Luckin, et al., 2016 ); a further link is being made to how virtual and augmented reality can be used to make the experience more engaging and authentic (Holmes et al., 2019 ).

Automatic writing evaluation (AWE) tools, which assess and offer feedback on writing style (rather than content), such as learnandwrite, Grammarly and Turnitin’s Revision Assistant (Strobl et al., 2019; Hussein et al., 2019; Hockly, 2019).

Conversational agents (also known as chatbots or virtual assistants), which are AI tools designed to converse with humans (Winkler and Söllner, 2018).

The adaptive pedagogical agent, which is an “anthropomorphic virtual character used in an online learning environment to serve instructional purposes” (Martha and Santoso, 2019).

Many of these technologies are rather mature, such as AWE and ITS. However, there is also a wide range of different types of system within each category; conversational agents, for example, can be designed for short or long term interaction, and could act as tutors, engage in language practice, answer questions, promote reflection or act as co-learners. They could be based on text or spoken interaction (Følstad et al., 2019; Wellnhammer et al., 2020).

Much of this literature reflects the development of AI technologies and their evaluation against other forms of teaching. However, according to a recent review it is primarily written by computer scientists, mostly from a technical point of view, with relatively little connection to pedagogy or ethics (Zawacki-Richter et al., 2019). In contrast, some authors, such as Luckin and Holmes, seek to move beyond the rather narrow development and evaluation of tools to envisioning how AI can address the grand challenges of learning in the twenty-first century (Luckin et al., 2016; Holmes et al., 2019; Woolf et al., 2013). According to this vision, many of the inefficiencies and injustices of the current global education system can be addressed by applying AI.

To surface such discussion around what is possible, fiction 1 is based loosely on a narrative published by Luckin and Holmes (2017) themselves. In their paper, they imagine a school classroom ten years into the future from the time of writing, where a teacher is working with an AI teaching assistant. Built into their fiction are the key features of their vision of AI (Luckin et al., 2016); thus emphasis is given to:

AI designed to support teachers rather than replacing them;

Personalisation of learning experiences through adaptivity;

Replacement of one-off assessment by continuous monitoring of performance (Luckin, 2017 );

The monitoring of haptic data to adjust learning material to students’ emotional and physical state in real time;

The potential of AI to support learning twenty-first century skills, such as collaborative skills;

Teachers developing skills in data analysis as part of their role;

Students (and parents) as well as teachers having access to data about their learning.

While Luckin and Holmes ( 2017 ) acknowledge that the vision of AI sounds a “bit big brother” it is, as one would expect, essentially an optimistic piece in which all the key technologies they envisage are brought together to improve learning in a broad sense. The fiction developed here retains most of these elements, but reimagined for an HE context, and with a number of other changes:

Reference is also made to rooting teaching in learning science, one of the arguments for AI that Luckin makes in a number of places (e.g. Luckin et al., 2016).

Students developing a long term relationship with the AI, often seen as a desirable aspect of providing AI as a lifelong learning partner (Woolf et al., 2013).

Of course, the more sceptical reader may be troubled by some aspects of this vision, including the potential effects of continuously monitoring performance as a form of surveillance. The emphasis on personalization of learning through AI has been increasingly questioned (Selwyn, 2019a ).

The following excerpt gives a flavour of the fiction:

Actually, I partly picked this Uni because I knew they had AI like AIDan which teach you on principles based in learning science. And exams are a thing of the past! AIDan continuously updates my profile and uses this to measure what I have learned. I have set tutorials with AIDan to analyse data on my performance. Jane often talks me through my learning data as well. I work with him planning things like my module choices too. Some of my data goes to people in the department (like my personal tutor) to student and campus services and the library to help personalise their services.

Social robots in learning: Fiction 2, “Footbotball”

Luckin and Holmes ( 2017 ) see AI as instantiated by sensors and cameras built into the classroom furniture. Their AI does not seem to have a physical form, though it does have a human name. But there is also a literature around educational robots: a type of social robot for learning.

a physical robot, in the same space as the student. It has an intelligence that can support learning tasks and students learn by interacting with it through suitable semiotic systems (Catlin et al., 2018 ).

There is some evidence that learning is better when the learner interacts with a physical entity rather than a purely virtual agent, and it may be particularly beneficial where what is learned involves embodiment (Belpaeme et al., 2018). Fiction 2 offers an imaginative account of what learning alongside robots might be like, in the context of university sport rather than within the curriculum. The protagonist describes how he is benefiting from using university facilities to participate in an imaginary sport, footbotball.

Maybe it’s a bit weird to say, but it’s about developing mutual understanding and… respect. Like the bots can sense your feelings too and chip in with a word just to pick you up if you make a mistake. And you have to develop an awareness of their needs too. Know when is the right time to say something to them to influence them in the right direction. When you watch the best teams they are always like talking to each other. But also just moving together, keeping eyes on and moving as a unit.

The protagonist in fiction 2 describes the high-level and employability skills he is learning from a sporting application of robotics. This also reminds us that the widespread use of AI and robots in wider social contexts may be a key driver for adoption in HE.

Fiction 3: “Criticalbot in conversation”

Both fictions 1 and 2 are glossy science fictions, with a strongly futuristic feel, and, as in traditional design scenarios, the technology seems to be used as the designer intended. In contrast, the third fiction is inspired by Bayne’s (2015) description of Teacherbot, a chatbot developed to support a MOOC on elearning and digital cultures. Teacherbot does not masquerade as human. Students on the course are told what it is, and its interactions are clumsy enough to reveal this anyway. Yet Bayne’s (2015) argument is that it offers positive non-human affordances. Students seem to learn from it, partly by reflecting on its place in their learning. Thus fiction 3, echoing the sample transcripts between Teacherbot and real students quoted in Bayne (2015), is a fictional dialogue between a conversational agent and a group of students working on an assignment (itself on the topic of bias in AI). Criticalbot, as its name suggests, is imagined being used to teach students to be more critical, e.g. by prompting them to think harder about how to read an academic text, raising questions about the authority of authors, and prompting discussion around quotes from a key text.

Student 1: @CriticalBot we have been set the task to research bias in AIED. What do you know about bias in AIED?

CriticalBot: Blanchard is recommended in the reading list.

Student 2: I think he means rtfm.

Student 1: Blanchard?

CriticalBot: Sorry. I should have typed: Blanchard, E. G. (2015). Socio-cultural imbalances in AIED research: Investigations, implications and opportunities. International Journal of Artificial Intelligence in Education, 25(2), 204–228. No one’s perfect.

Student 1: @CriticalBot What is the main point he makes?

CriticalBot: Nice try. Try reading the abstract. Abstracts summarise key points. It may also be useful to read the conclusion section to grasp the author’s main argument before deciding on whether to undertake a closer reading.

As the quotation from the fiction illustrates, echoing Bayne (2015), the conversation in fiction 3 is not necessarily smooth; misunderstandings and conflicts occur. The fiction brings into view the less compliant vision of the student who might wish to game the system, a potential problem with AI which is apparent in the literature on AWE (Hussein et al., 2019). This fiction encapsulates an important alternative potential imaginary of AI, as a simple, low-tech intervention. At the same time, in being designed to promote critical thinking, it can also be seen as teaching a key, high-level skill. This challenges us to ask whether an AI can truly do that, and how.

The intelligent campus: Fiction 4, “The intelligent campus app”

The AIEd literature, with its emphasis on the direct application of AI to learning, accounts for a large part of the literature about AI in higher education, but not all of it. Another rather separate literature exists around the smart or intelligent campus (e.g. JISC, 2018; Min-Allah and Alrashed, 2020; Dong et al., 2020). This is the application of the Internet of Things, and increasingly AI, to the management of the campus environment. It is often oriented towards estates management, such as monitoring room usage and controlling lighting and heating. But it also encompasses support for wayfinding, attendance monitoring, and ultimately the student experience, so it presents an interesting contrast to the AIEd literature.

The fourth fiction is adapted from a report, each section of which is introduced by quotes from an imaginary day in the life of a student, Leda, who reflects on the benefits of intelligent/smart campus technologies for her learning experience (JISC, 2018). The emphasis in the report is on:

Data driven support of wayfinding and time management;

Integration of smart campus with smart city features (e.g. bus and traffic news);

Attendance monitoring and delivery of learning resources;

The student also muses about the ethics of the AI. She is presented as a little ambivalent about the monitoring technologies and, as in Luckin and Holmes (2017), it is referred to in her own words as potentially “a bit big brother” (JISC, 2018: 9). But ultimately she concludes that the smart campus improves her experience as a student. In this narrative, unlike in the Luckin and Holmes (2017) fiction, the AI is much more in the background and lacks a strong personality. It is a different sort of optimistic vision, geared towards convenience rather than excellence. There is much less of a futuristic feel; indeed, not only does the technology to deliver many of the services described exist, it is already available and in use—though perhaps not integrated within one application.

Sitting on the bus I look at the plan for the day suggested in the University app. A couple of timetabled classes; a group work meeting; and there is a reminder about that R205 essay I have been putting off. There is quite a big slot this morning when the App suggests I could be in the library planning the essay – as well as doing the prep work for one of the classes it has reminded me about. It is predicting that the library is going to be very busy after 11AM anyway, so I decide to go straight there.

The fiction seeks to bring out more about the idea of “nudging” to change behaviours, a concept often linked to AI and whose ethics are queried by Selwyn (2019a). The issue of how AI and robots might impact the agency of the learner recurs across the first four fictions.

AI and robotics in research: Fiction 5, “The Research Management Suite™”

So far in this paper most of the focus has been on the application of AI and robotics to learning. AI also has applications in university research, but this is an area far less commonly considered than learning and teaching. Only 1% of CIOs responding to a Gartner survey of HEIs had deployed AI for research, compared to 27% for institutional analytics and 10% for adaptive learning (Lowendahl and Williams, 2018). Some AI could be used directly in research, not just to perform analytical tasks but to generate hypotheses to be tested (Jones et al., 2019). The “robot scientist”, being tireless and able to work in a precise way, could carry out many experiments and increase reproducibility (King et al., 2009; Sparkes et al., 2010). It might have the potential to make significant discoveries independently, perhaps by simply exploiting its tirelessness to test every possible hypothesis rather than using intuition to select promising ones (Kitano, 2016).

Another direct application of AI to research is text and data mining (TDM). Given the vast rate of academic publishing there is growing need to mine published literature to offer summaries to researchers or even to develop and test hypotheses (McDonald and Kelly, 2012 ). Advances in translation also offer potential to make the literature in other languages more accessible, with important benefits.

Developments in publishing give us a further insight into how AI might be applied in the research domain. Publishers are investing heavily in AI (Gabriel, 2019 ). One probable landmark was that in 2019, Springer published the first “machine generated research book” (Schoenenberger, 2019 : v): a literature review of research on Lithium-Ion batteries, written entirely automatically. This does not suggest the end of the academic author, Springer suggest, but does imply changing roles (Schoenenberger, 2019 ). AI is being applied to many aspects of the publication process: to identify peer reviewers (Price and Flach, 2017 ), to assist review by checking statistics, to summarise open peer reviews, to check for plagiarism or for the fabrication of data (Heaven, 2018 ), to assist copy editing, to suggest keywords and to summarise and translate text. Other tools claim to predict the future citation of articles (Thelwall, 2019 ). Data about academics, their patterns of collaboration and citation through scientometrics are currently based primarily on structured bibliographic data. The cutting edge is the application of text mining techniques to further analyse research methods, collaboration patterns, and so forth (Atanassova et al., 2019 ). This implies a potential revolution in the management and evaluation of research. It will be relevant to ask what responsible research metrics are in this context (Wilsdon, 2015 ).

Instantiating these developments, the fifth fiction revolves around a university licensing the “Research Management Suite™”, a set of imaginary proprietary tools offering institutional-level support to its researchers to increase, and perhaps measure, their productivity. A flavour of the fiction can be gleaned from this excerpt:

Academic Mentor ™ is our premium meta analysis service. Drawing on historic career data from across the disciplines, it identifies potential career pathways to inform your choices in your research strategy. By identifying structural holes in research fields it enables you to position your own research within emerging research activity, so maximising your visibility and contribution. Mining data from funder strategy, the latest publications, preprints and news sources it identifies emergent interdisciplinary fields, matching your research skills and interests to the complex dynamics of the changing research landscape.

This fiction prompts questions about the nature of the researcher’s role and ultimately about what research is. At what point does the AI become a co-author, because it is making a substantive intellectual contribution to writing a research output, making a creative leap or even securing funding? Given the centrality of research to academic identity, this may feel even more challenging than the teaching-related scenarios. This fiction also recognises the important role of EdTech companies in how AI reaches HE, partly because of the high cost of AI development. The reader is also prompted to wonder how the technology might disrupt the HE landscape if those investing in it were ambitious newer institutions keen to rise in university league tables.

Tackling pragmatic barriers: Fiction 6, “Verbatim minutes of University AI project steering committee: AI implementation phase 3”

A very large literature around technologies in HE in general focuses on the challenges of implementing them as a change management problem. Reid (2014), for example, seeks to develop a model of the differing factors that block the smooth implementation of learning technologies in the classroom, such as problems with access to the technology and project management challenges, as well as issues around teacher identity. Echoing these arguments, Tsai et al.’s (2017, 2019) work captures why, for all the hype around it, learning analytics has not yet found extensive practical application in HE. Given that AI requires intensive use of data, we can argue by extension that the same barriers will probably apply to AI. Specifically, Tsai et al. (2017, 2019) identify barriers in terms of technical, financial and other resource demands, ethics and privacy issues, failures of leadership, a failure to involve all stakeholders (students in particular) in development, a focus on technical issues and neglect of pedagogy, insufficient staff training, and a lack of evidence demonstrating the impact on learning. There are hints of similar types of challenge around the implementation of administration-focused applications (Nurshatayeva et al., 2020) and TDM (FutureTDM, 2016).

Reflecting these thoughts, the sixth fiction is an extract from an imaginary committee meeting, in which senior university managers discuss the challenges they are facing in implementing AI. It seeks to surface issues around teacher identity, disciplinary differences and resource pressures that might shape the extensive implementation of AI in practice.

Faculty of Humanities Director: But I think there is a pedagogic issue here. With the greatest of respect to Engineering, this approach to teaching simply does not fit our subject. You cannot debate a poem or a philosophical treatise with a machine.

Faculty of Engineering Director: The pilot project also showed improved student satisfaction. Data also showed better student performance. Less drop outs.

Faculty of Humanities Director: Maybe that’s because…

Vice Chancellor: All areas where Faculty of Humanities has historically had a strategic issue.

Faculty of Engineering Director: The impact on employability has also been fantastic, in terms of employers starting to recognise the value of our degrees now fluency with automation is part of our graduate attributes statement.

Faculty of Humanities Director: I see the benefits, I really do. But you have to remember you are taking on deep seated assumptions within the disciplinary culture of Humanities at this university. Staff are already under pressure with student numbers not to mention in terms of producing world class research! I am not sure how far this can be pushed. I wouldn’t want to see more industrial action.

Learning analytics and datafication: Fiction 7, “Dashboards”

Given the strong relation between “big data” and AI, the claimed benefits of LA and the controversies that already surround it are relevant to AI too (Selwyn, 2019a). The main argument for LA is that they give teachers, and learners themselves, information with which to improve learning processes. Advocates talk of an obligation to act. LA can also be used administratively, for admissions decisions and for ensuring retention. Chatbots are now being used to assist applicants through complex admissions processes or to maintain contact to ensure retention, and they appear to offer a cheap and effective alternative (Page and Gehlbach, 2017; Nurshatayeva et al., 2020). Gathering more data about HE also promotes public accountability.

However, data use in AI does raise many issues. The greater the dependence on data or data-driven AI, the greater the security issues associated with the technology. Another inevitable concern is with legality and the need to abide by appropriate privacy legislation, such as the GDPR in Europe. Linked to this are privacy issues, implying consent, the right to control the use of one’s data and the right to withdraw (Fjeld et al., 2020). Yet a recent study by Jones et al. (2020) found that students knew little about how LA was being used in their institution and did not remember consenting to their data being used. These would all be recognised as issues by most AI projects.

However, critiques of AI in learning increasingly centre on the datafication of education (Jarke and Breiter, 2019; Williamson and Eynon, 2020; Selwyn, 2019a; Kwet and Prinsloo, 2020). A data-driven educational system has the potential to be used or experienced as a surveillance system. “What can be accomplished with data is usually a euphemism for what can be accomplished with surveillance” (Kwet and Prinsloo, 2020: 512). Not only might individual freedoms be threatened by institutions or commercial providers undertaking surveillance of student and teaching staff behaviour, there is also a chilling effect simply from the fear of being watched (Kwet and Prinsloo, 2020). Students become mere data points, as surveillance becomes intensified and normalised (Manolev et al., 2019). While access to their own learning data could be empowering for students, techniques such as nudging, intended to influence people without their knowledge, undermine human agency (Selwyn, 2019b). Loss of human agency is one of the fears revolving around AI and robots.

Further, a key issue with AI is that although predictions can be accurate or useful, it is often quite unclear how they were produced. Because AI “learns” from data, even the designers do not fully understand how the results were arrived at, so they are certainly hard to explain to the public. The result is a lack of transparency, and so of accountability, leading to deresponsibilisation.

Much of the current debate around big data and AI revolves around bias, created by using training data that does not represent the whole population, reinforced by the lack of diversity among designers of the systems. If data is based on existing behaviour, this is likely to reproduce existing patterns of disadvantage in society, unless AI design takes into account social context—but datafication is driven by standardisation. Focussing on technology diverts attention from the real causes of achievement gaps in social structures, it could be argued (Macgilchrist, 2019 ). While often promoted as a means of empowering learners and their teachers, mass personalisation of education redistributes power away from local decision making (Jarke and Breiter, 2019 ; Zeide, 2017 ). In the context of AIEd there is potential for assumptions about what should be taught to show very strong cultural bias, in the same way that critics have already argued that plagiarism detection systems impose culturally specific notions of authorship and are marketed in a way to reinforce crude ethnic stereotypes (Canzonetta and Kannan, 2016 ).

Datafication also produces performativity: the tendency of institutions (and teachers and students) to shift their behaviour towards whatever scores well against the metric, in a league table mentality. Yet what is measured is often a proxy for learning, or reductive of what learning in its full sense is, critics argue (Selwyn, 2019b). The potential impact is to turn HE further into a marketplace (Williamson, 2019). It is evident that AI developments are often partly a marketing exercise (Lacity et al., 2017). Edtech companies play a dominant role in developing AI (Williamson and Eynon, 2020). Selwyn (2019a) worries that those running education will be seduced by glittering promises of techno-solutionism when the technology does not really work. The UK government has invested heavily in gathering more data about HE in order to promote the reform of HE in the direction of marketisation and student choice (Williamson and Eynon, 2020). Learning data could also increasingly become a commodity itself, further reinforcing the commercialisation of HE.

Thus fiction 7 explores the potential to gather data about learning on a huge scale, make predictions based on it and take actions, either by conveying information to humans or through chatbots. In the fiction the protagonist explains an imaginary institution-level system that is making data-driven decisions about applicants and current students.

Then here we monitor live progress of current students within their courses. We can dip down into attendance, learning environment use, library use, and of course module level performance and satisfaction plus the extra-curricula data. Really low-level stuff some of it. It’s pretty much all there, monitored in real time. We are really hot on transition detection and monitoring. The chatbots are used just to check in on students, see they are ok, nudge things along, gather more data. Sometimes you just stop and look at it ticking away and think “wow!”. That all gets crunched by the system. All the time we feed the predictives down into departmental dashboards, where they pick up the intervention work. Individual teaching staff have access via smart speaker. Meanwhile, we monitor the trend lines up here.

In the fiction, the benefit of being able to monitor and address attainment gaps is emphasised. The protagonist’s description of projects being worked on suggests competing drivers behind such developments, including meeting government targets, saving costs and the potential to make money by reselling educational data.

Infrastructure: Fiction 8, “Minnie—the AI admin assistant”

A further dimension to the controversy around AI is its environmental cost and the societal impact of the wider infrastructures needed to support it. Brevini (2020) points out that training a common AI model used in linguistics can create the equivalent of five times the lifetime emissions of an average US car. This foregrounds the often unremarked environmental impact of big data and AI. It also prompts us to ask questions about the infrastructure required for AI. Crawford and Joler’s (2018) brilliant Anatomy of an AI System reveals that what makes possible the functioning of a physically rather unassuming AI device like the Amazon Echo is a vast global infrastructure based on mass human labour, complex logistics chains and polluting industry.

The first part of fiction 8 describes a personal assistant based on voice recognition, like Siri, which answers all sorts of administrative questions. The protagonist expresses some unease with how the system works, reflecting the points made by Rummel et al. (2016) about how systems fail if, despite their potential sophistication, they lack nuance and flexibility in their application. There is also a sense of alienation (Griffiths, 2015). The second part of the fiction extends this sense of unease into a wider perspective on the usually invisible, but very material, infrastructure which AI requires, as captured in Crawford and Joler (2018). In addition, imagery is drawn from Maughan’s (2016) work, in which he travels backwards up the supply chain for consumer electronics, from the surreal landscape of hi-tech docks, through different types of factories, to a huge polluted lake created by mining operations for rare earth elements in China. This perspective queries all the other fictions, with their focus on using technologies or even campus infrastructure, by widening the vision to encompass the global infrastructures that are required to make AI possible.

The vast effort of global logistics to bring together countless components to build the devices through which we interact with AI. Lorries queuing at the container port as another ship comes in to dock. Workers making computer components in hi-tech factories in East Asia. All dressed in the same blue overalls and facemasks, two hundred workers queue patiently waiting to be scan searched as they leave work at the end of the shift. Exploitative mining extracting non-renewable, scarce minerals for computer components, polluting the environment and (it is suspected) reducing the life expectancy of local people. Pipes churn out a clayey sludge into a vast lake.

Conclusion: using the fictions together

As we have seen, each of the fictions seeks to open up different positive visions or dimensions of debate around AI (summarised in Table 2 below). All implicitly ask questions about the nature of human agency in relation to AI systems and robots, be that empowerment through access to learning data (fiction 1), the power to play against the system (fiction 3), or the hidden effects of nudging (fiction 4) and the reinforcement of social inequalities. Many raise questions about the changing role of staff or the skills required to operate in this environment. They are written in a way that seeks to avoid taking sides, e.g. not always undercutting a utopian view or simply presenting a dark dystopia. Each contains elements that might be inspirational or a cause of controversy. Specifically, they can be read together to suggest tensions between different possible futures. In particular, fictions 7 and 8, and the commercial aspects implied by the presentation of fiction 5, reveal aspects of AI largely invisible in the glossy, strongly positive images of fictions 1 and 2, or the deceptive mundanity of fiction 3. It is also anticipated that the fictions will be read “against the grain” by readers wishing to question what the future is likely to be, or should be, like. This is one of the affordances of their being fictions.

The most important contribution of the paper is the wide-ranging narrative literature review emphasising the social, ethical, pedagogic and management issues raised by automation through AI and robots for HE as a whole. On the basis of the understanding gained from the literature review, a secondary contribution is the development of a collection of eight accessible, repurposable design fictions that prompt debate about the potential role of AI and robots in HE. This prompts us to notice common challenges, such as those around commodification and the changing role of data. It encompasses work written by developers, by those with more visionary views, by those who see the challenges as primarily pragmatic, and by those coming from much more critical perspectives.

The fictions are intended to be used to explore staff and student responses, employing the fictions in data collection to elicit views. They could also be used in teaching to prompt debate among students, perhaps setting them the task of writing new fictions (Rapp, 2020). Students of education could use them to explore the potential impact of AI on educational institutions and to discuss the role of technologies in educational change more generally. The fictions could be used in teaching students of computer science, data science, HCI and information systems in courses about computer ethics, social responsibility and sustainable computing, as well as those directly dealing with AI. They could also be used in media studies and communications, e.g. to compare them with other future imaginaries in science fiction or to design multimedia creations inspired by such fictions. They might also be used in management studies as a case study of strategizing around AI in a particular industry.

While there is an advantage in seeking to encompass the issues within a small collection of engaging fictions that in total run to less than 5000 words, it must be acknowledged that not every issue is reflected. For example, the different ways that AI and robots might be used in teaching different disciplines, such as languages, computer science or history, are not included. The many ways that robots might be used in background functions, or might themselves play the role of learner, also require further exploration. Most of the fictions are located in a fairly near future, but there is also potential to develop much more futuristic fictions. These gaps leave room for the development of more fictions.

The paper has explained the rationale and process of writing design fictions. To the growing literature around design fictions, the paper seeks to contribute by emphasising the use of design fictions as collections, exploiting different narratives, styles and genres of writing to set up intertextual reflections that help us ask questions about technologies in the widest sense.

Availability of data and materials

Data from the project is available from the University of Sheffield repository, ORDA. https://doi.org/10.35542/osf.io/s2jc8 .

References

Amer, M., Daim, T., & Jetter, A. (2013). A review of scenario planning. Futures, 46, 23–40.


Atanassova, I., Bertin, M., & Mayr, P. (2019). Editorial: mining scientific papers: NLP-enhanced bibliometrics. Frontiers in Research Metrics and Analytics . https://doi.org/10.3389/frma.2019.00002 .

Auger, J. (2013). Speculative design: Crafting the speculation. Digital Creativity, 24 (1), 11–35.

Badampudi, D., Wohlin, C., & Petersen, K. (2015). Experiences from using snowballing and database searches in systematic literature studies. In Proceedings of the 19th International Conference on Evaluation and Assessment in Software Engineering (pp. 1–10).

Baker, T., Smith, L., & Anissa, N. (2019). Educ-AI-tion rebooted? Exploring the future of artificial intelligence in schools and colleges. NESTA. https://www.nesta.org.uk/report/education-rebooted/ .

Bates, T., Cobo, C., Mariño, O., & Wheeler, S. (2020). Can artificial intelligence transform higher education? International Journal of Educational Technology in Higher Education . https://doi.org/10.1186/s41239-020-00218-x .

Bayne, S. (2015). Teacherbot: interventions in automated teaching. Teaching in Higher Education, 20 (4), 455–467.

Belpaeme, T., Kennedy, J., Ramachandran, A., Scassellati, B., & Tanaka, F. (2018). Social robots for education: A review. https://doi.org/10.1126/scirobotics.aat5954 .

Blanchard, E. G. (2015). Socio-cultural imbalances in AIED research: Investigations, implications and opportunities. International Journal of Artificial Intelligence in Education, 25 (2), 204–228.

Bleecker, J. (2009). Design fiction: A short essay on design, science, fact and fiction. Near Future Lab.

Blythe, M. (2017). Research fiction: storytelling, plot and design. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (pp. 5400–5411).

Blythe, M., Andersen, K., Clarke, R., & Wright, P. (2016). Anti-solutionist strategies: Seriously silly design fiction. Conference on Human Factors in Computing Systems - Proceedings (pp. 4968–4978). Association for Computing Machinery.

Brevini, B. (2020). Black boxes, not green: Mythologizing artificial intelligence and omitting the environment. Big Data & Society, 7 (2), 2053951720935141.

Canzonetta, J., & Kannan, V. (2016). Globalizing plagiarism & writing assessment: a case study of Turnitin. The Journal of Writing Assessment , 9(2). http://journalofwritingassessment.org/article.php?article=104 .

Carroll, J. M. (1999) Five reasons for scenario-based design. In Proceedings of the 32nd Annual Hawaii International Conference on Systems Sciences . HICSS-32. Abstracts and CD-ROM of Full Papers, Maui, HI, USA, 1999, pp. 11. https://doi.org/10.1109/HICSS.1999.772890 .

Catlin, D., Kandlhofer, M., & Holmquist, S. (2018). EduRobot Taxonomy a provisional schema for classifying educational robots. 9th International Conference on Robotics in Education 2018, Qwara, Malta.

Clay, J. (2018). The challenge of the intelligent library. Keynote at What does your eResources data really tell you? 27th February, CILIP.

Crawford, K., & Joler, V. (2018) Anatomy of an AI system , https://anatomyof.ai/ .

Darby, E., Whicher, A., & Swiatek, A. (2017). Co-designing design fictions: a new approach for debating and priming future healthcare technologies and services. Archives of design research. Health Services Research, 30 (2), 2.


Demartini, C., & Benussi, L. (2017). Do Web 4.0 and Industry 4.0 Imply Education X.0? IT Pro , 4–7.

Dong, Z. Y., Zhang, Y., Yip, C., Swift, S., & Beswick, K. (2020). Smart campus: Definition, framework, technologies, and services. IET Smart Cities, 2 (1), 43–54.

Dourish, P., & Bell, G. (2014). “resistance is futile”: Reading science fiction alongside ubiquitous computing. Personal and Ubiquitous Computing, 18 (4), 769–778.

Dunne, A., & Raby, F. (2001). Design noir: The secret life of electronic objects . New York: Springer Science & Business Media.

Fjeld, J., Achten, N., Hilligoss, H., Nagy, A., & Srikumar, M. (2020). Principled artificial intelligence: Mapping consensus in ethical and rights-based approaches to principles for AI. SSRN Electronic Journal . https://doi.org/10.2139/ssrn.3518482 .

Følstad, A., Skjuve, M., & Brandtzaeg, P. (2019). Different chatbots for different purposes: Towards a typology of chatbots to understand interaction design. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 11551 LNCS , pp. 145–156. Springer Verlag.

Future TDM. (2016). Baseline report of policies and barriers of TDM in Europe. https://project.futuretdm.eu/wp-content/uploads/2017/05/FutureTDM_D3.3-Baseline-Report-of-Policies-and-Barriers-of-TDM-in-Europe.pdf .

Gabriel, A. (2019). Artificial intelligence in scholarly communications: An elsevier case study. Information Services & Use, 39 (4), 319–333.

Griffiths, D. (2015). Visions of the future, horizon report . LACE project. http://www.laceproject.eu/visions-of-the-future-of-learning-analytics/ .

Heaven, D. (2018). The age of AI peer reviews. Nature, 563, 609–610.

Hockly, N. (2019). Automated writing evaluation. ELT Journal, 73 (1), 82–88.

Holmes, W., Bialik, M., & Fadel, C. (2019). Artificial Intelligence in Education. Boston, MA: The Center for Curriculum Redesign.

Hussein, M., Hassan, H., & Nassef, M. (2019). Automated language essay scoring systems: A literature review. PeerJ Computer Science . https://doi.org/10.7717/peerj-cs.208 .

Inayatullah, S. (2008). Six pillars: Futures thinking for transforming. foresight, 10 (1), 4–21.

Jarke, J., & Breiter, A. (2019). Editorial: the datafication of education. Learning, Media and Technology, 44 (1), 1–6.

JISC. (2019). The intelligent campus guide. Using data to make smarter use of your university or college estate . https://www.jisc.ac.uk/rd/projects/intelligent-campus .

Jones, E., Kalantery, N., & Glover, B. (2019). Research 4.0 Interim Report. Demos.

Jones, K. (2019). “Just because you can doesn’t mean you should”: Practitioner perceptions of learning analytics ethics. Portal, 19 (3), 407–428.

Jones, K., Asher, A., Goben, A., Perry, M., Salo, D., Briney, K., & Robertshaw, M. (2020). “We’re being tracked at all times”: Student perspectives of their privacy in relation to learning analytics in higher education. Journal of the Association for Information Science and Technology . https://doi.org/10.1002/asi.24358 .

King, R. D., Rowland, J., Oliver, S. G., Young, M., Aubrey, W., Byrne, E., et al. (2009). The automation of science. Science, 324 (5923), 85–89.

Kitano, H. (2016). Artificial intelligence to win the nobel prize and beyond: Creating the engine for scientific discovery. AI Magazine, 37 (1), 39–49.

Kwet, M., & Prinsloo, P. (2020). The ‘smart’ classroom: a new frontier in the age of the smart university. Teaching in Higher Education, 25 (4), 510–526.

Lacity, M., Scheepers, R., Willcocks, L. & Craig, A. (2017). Reimagining the University at Deakin: An IBM Watson Automation Journey . The Outsourcing Unit Working Research Paper Series.

Lowendahl, J.-M., & Williams, K. (2018). 5 Best Practices for Artificial Intelligence in Higher Education. Gartner. Research note.

Luckin, R. (2017). Towards artificial intelligence-based assessment systems. Nature Human Behaviour, 1 (3), 1–3.

Luckin, R., & Holmes, W. (2017). A.I. is the new T.A. in the classroom. https://howwegettonext.com/a-i-is-the-new-t-a-in-the-classroom-dedbe5b99e9e .

Luckin, R., Holmes, W., Griffiths, M., & Pearson, L. (2016). Intelligence unleashed an argument for AI in Education. Pearson. https://www.pearson.com/content/dam/one-dot-com/one-dot-com/global/Files/about-pearson/innovation/open-ideas/Intelligence-Unleashed-v15-Web.pdf .

Lyckvi, S., Wu, Y., Huusko, M., & Roto, V. (2018). Eagons, exoskeletons and ecologies: On expressing and embodying fictions as workshop tasks. ACM International Conference Proceeding Series (pp. 754–770). Association for Computing Machinery.

Macgilchrist, F. (2019). Cruel optimism in edtech: When the digital data practices of educational technology providers inadvertently hinder educational equity. Learning, Media and Technology, 44 (1), 77–86.

Manolev, J., Sullivan, A., & Slee, R. (2019). The datafication of discipline: ClassDojo, surveillance and a performative classroom culture. Learning, Media and Technology, 44 (1), 36–51.

Martha, A. S. D., & Santoso, H. B. (2019). The design and impact of the pedagogical agent: A systematic literature review. Journal of Educators Online, 16 (1), n1.

Maughan, T. (2016). The hidden network that keeps the world running. https://datasociety.net/library/the-hidden-network-that-keeps-the-world-running/ .

McDonald, D., & Kelly, U. (2012). The value and benefits of text mining . England: HEFCE.

Min-Allah, N., & Alrashed, S. (2020). Smart campus—A sketch. Sustainable Cities and Society . https://doi.org/10.1016/j.scs.2020.102231 .

Nathan, L. P., Klasnja, P. V., & Friedman, B. (2007). Value scenarios: a technique for envisioning systemic effects of new technologies. In CHI'07 extended abstracts on human factors in computing systems (pp. 2585–2590).

Nurshatayeva, A., Page, L. C., White, C. C., & Gehlbach, H. (2020). Proactive student support using artificially intelligent conversational chatbots: The importance of targeting the technology. EdWorking paper, Annenberg University https://www.edworkingpapers.com/sites/default/files/ai20-208.pdf .

Page, L., & Gehlbach, H. (2017). How an artificially intelligent virtual assistant helps students navigate the road to college. AERA Open . https://doi.org/10.1177/2332858417749220 .

Pinkwart, N. (2016). Another 25 years of AIED? Challenges and opportunities for intelligent educational technologies of the future. International journal of artificial intelligence in education, 26 (2), 771–783.

Price, S., & Flach, P. (2017). Computational support for academic peer review: A perspective from artificial intelligence. Communications of the ACM, 60 (3), 70–79.

Rapp, A. (2020). Design fictions for learning: A method for supporting students in reflecting on technology in human–computer interaction courses. Computers & Education, 145, 103725.

Reid, P. (2014). Categories for barriers to adoption of instructional technologies. Education and Information Technologies, 19 (2), 383–407.

Renz, A., & Hilbig, R. (2020). Prerequisites for artificial intelligence in further education: Identification of drivers, barriers, and business models of educational technology companies. International Journal of Educational Technology in Higher Education . https://doi.org/10.1186/s41239-020-00193-3 .

Roll, I., & Wylie, R. (2016). Evolution and Revolution in Artificial Intelligence in Education. International Journal of Artificial Intelligence in Education, 26 (2), 582–599.

Rummel, N., Walker, E., & Aleven, V. (2016). Different futures of adaptive collaborative learning support. International Journal of Artificial Intelligence in Education, 26 (2), 784–795.

Schoenenberger, H. (2019). Preface. In H. Schoenenberger (Ed.), Lithium-ion batteries a machine-generated summary of current research (v–xxiii) . Berlin: Springer.

Selwyn, N. (2019a). Should robots replace teachers? AI and the future of education . New Jersey: Wiley.

Selwyn, N. (2019b). What’s the problem with learning analytics? Journal of Learning Analytics, 6 (3), 11–19.

Selwyn, N., Pangrazio, L., Nemorin, S., & Perrotta, C. (2020). What might the school of 2030 be like? An exercise in social science fiction. Learning, Media and Technology, 45 (1), 90–106.

Sparkes, A., Aubrey, W., Byrne, E., Clare, A., Khan, M. N., Liakata, M., et al. (2010). Towards robot scientists for autonomous scientific discovery. Automated Experimentation, 2 (1), 1.

Strobl, C., Ailhaud, E., Benetos, K., Devitt, A., Kruse, O., Proske, A., & Rapp, C. (2019). Digital support for academic writing: A review of technologies and pedagogies. Computers and Education, 131, 33–48.

Templier, M., & Paré, G. (2015). A framework for guiding and evaluating literature reviews. Communications of the Association for Information Systems, 37 (1), 6.

Thelwall, M. (2019). Artificial intelligence, automation and peer review . Bristol: JISC.

Tsai, Y., & Gasevic, D. (2017). Learning analytics in higher education—Challenges and policies: A review of eight learning analytics policies. ACM International Conference Proceeding Series (pp. 233–242). Association for Computing Machinery.

Tsai, Y. S., Poquet, O., Gašević, D., Dawson, S., & Pardo, A. (2019). Complexity leadership in learning analytics: Drivers, challenges and opportunities. British Journal of Educational Technology, 50 (6), 2839–2854.

Tsekleves, E., Darby, A., Whicher, A., & Swiatek, P. (2017). Co-designing design fictions: A new approach for debating and priming future healthcare technologies and services. Archives of Design Research, 30 (2), 5–21.

Wellnhammer, N., Dolata, M., Steigler, S., & Schwabe, G. (2020). Studying with the help of digital tutors: Design aspects of conversational agents that influence the learning process. Proceedings of the 53rd Hawaii International Conference on System Sciences , (pp. 146–155).

Williamson, B. (2019). Policy networks, performance metrics and platform markets: Charting the expanding data infrastructure of higher education. British Journal of Educational Technology, 50 (6), 2794–2809.

Williamson, B., & Eynon, R. (2020). Historical threads, missing links, and future directions in AI in education. Learning, Media and Technology. https://doi.org/10.1080/17439884.2020.1798995 .

Wilsdon, J. (2015). The metric tide: Independent review of the role of metrics in research assessment and management . Sage.

Winkler, R. & Söllner, M. (2018). Unleashing the potential of chatbots in education: A state-of-the-art analysis. In: Academy of Management Annual Meeting (AOM). Chicago, USA.

Woolf, B. P., Lane, H. C., Chaudhri, V. K., & Kolodner, J. L. (2013). AI grand challenges for education. AI Magazine, 34 (4), 66–84.

Zawacki-Richter, O., Marín, V., Bond, M., & Gouverneur, F. (2019). Systematic review of research on artificial intelligence applications in higher education—where are the educators? International Journal of Educational Technology in Higher Education . https://doi.org/10.1186/s41239-019-0171-0 .

Zeide, E. (2017). The structural consequences of big data-driven education. Big Data, 5 (2), 164–172.


Acknowledgements

Not applicable.

Funding

The project was funded by the Society for Research into Higher Education, Research Scoping Award SA1906.

Author information

Authors and Affiliations

Information School, The University of Sheffield, Level 2, Regent Court, 211 Portobello, Sheffield, S1 4DP, UK


Contributions

AC conceived and wrote the entire article. All authors read and approved the final manuscript.

Corresponding author

Correspondence to A. M. Cox .

Ethics declarations

Competing interests

The author declares that he has no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Cox, A.M. Exploring the impact of Artificial Intelligence and robots on higher education through literature-based design fictions. Int J Educ Technol High Educ 18 , 3 (2021). https://doi.org/10.1186/s41239-020-00237-8


Received : 04 September 2020

Accepted : 24 November 2020

Published : 18 January 2021

DOI : https://doi.org/10.1186/s41239-020-00237-8


Keywords

  • Artificial Intelligence
  • Social robots
  • Learning analytics
  • Design fiction


  • Position paper
  • Open access
  • Published: 28 January 2022

Human-centered AI and robotics

  • Stephane Doncieux   ORCID: orcid.org/0000-0003-1541-054X 1 ,
  • Raja Chatila 1 ,
  • Sirko Straube 2 &
  • Frank Kirchner 2 , 3  

AI Perspectives volume  4 , Article number:  1 ( 2022 ) Cite this article

8974 Accesses

4 Citations

13 Altmetric

Metrics details

Robotics has a special place in AI because robots are connected to the real world, and robots increasingly appear in humans' everyday environments, from the home to industry. Apart from cases where robots are expected to replace humans entirely, humans will largely benefit from real interactions with such robots. This is true not only for complex interaction scenarios such as robots serving as guides, companions or members of a team, but also for more predefined functions such as the autonomous transport of people or goods. More and more, robots need suitable interfaces to interact with humans in a way that humans feel comfortable with and that takes into account the need for a certain transparency about the actions taken. The paper describes the requirements and the state of the art for human-centered robotics research and development, including verbal and non-verbal interaction, understanding and learning from each other, as well as the ethical questions that have to be dealt with if robots are to be included in our everyday environment, influencing human life and societies.

Introduction

Already 30 years ago, people learned in school that the automation of facilities was replacing human workers, but over time people also recognized that work profiles were changing and that new types of work were being created by this development, so that the effect was a transformation of industry rather than a mere replacement of work. Now we see that AI systems are becoming increasingly powerful in many domains that were initially solvable only with human intelligence and cognition, which has started this debate anew. Examples of AI beating human experts at Chess [1] or Go [2], for instance, cause significant enthusiasm and, at the same time, concern about where societies are heading when robotics and AI are widely used. On closer inspection, however, we see that although the performance of AI in such selected domains may outrun that of humans, the mechanisms and algorithms applied do not necessarily resemble human intelligence and methodology, and may not involve any kind of cognition at all. In addition, AI algorithms are application specific and their transfer to other domains is not straightforward [3].

Robots using AI mark an advancement from pure automation systems to intelligent agents in the environment that can work not only in isolated factory areas, but also in unstructured or natural environments as well as in direct interaction with humans. The application areas of robots are thus highly diverse, and robots might influence our everyday life in the future in many ways. Even without direct contact with a human being, robots are expected to support human ambitions, e.g. for surface exploration, or for the installation, inspection and maintenance of infrastructure in our oceans [4, 5] or in space [6–8]. Everywhere, the field of robotics is an integrator for AI technology, since complex robots need to be capable in many ways: they have the ability to act and thus have a physical impact on their environment. Robots therefore create opportunities for collaboration and empowerment that are more diverse than what a computer-only AI system can offer. A robot can speak or show pictures through an embedded screen, but it can also make gestures or physically interact with humans [9], opening up many possible interactions for a wide variety of applications. Interactions that can benefit children with autism [10, 11] or the elderly [12] have been demonstrated with robots that are called social robots [13, 14], as they put a strong emphasis on the robot's social skills. Mechanical skills are also important for empowering humans, for instance through collaborative work in teams involving both robots and humans [15, 16]. Such robots are called cobots: collaborative robots that share the physical space of a human operator and can help to achieve a task by handling the tools or parts to assemble. Cobots can thus help the operator achieve a task with greater precision while limiting the trauma associated with repetitive motions, excessive loads or awkward postures [17]. Similar robots can be used in other contexts, for instance in rehabilitation [18, 19].

If humans and robots are to work together in such a close way, humans need a certain trust in the technology and an impression of understanding what the robot is doing and why. Providing robots with the ability to communicate and interact naturally with humans would minimize the adaptation required on the human side. Making this a requirement, so that humans can actually work and interact with robots in the same environment, complements the view of human-centered AI as a technology designed for the collaboration and empowerment of humans [20].

After examining the specificity of robotics from an AI point of view in the next section, we discuss the requirements of human-centered robotics and, in the light of current research on these topics, examine the following questions: How can a robot interact with humans? How can it understand and learn from a human? How can the human understand the robot? And finally, what ethical issues does this raise?

AI and robotics

A robot is a physical agent that is connected to the real world through its sensors and effectors [21]. It perceives the environment and uses this information to decide what action to apply at a particular moment (Fig. 1). These interactions of an autonomous robot with its environment are not mediated by humans: sensor data flows shape perceptions, which are directed to the decision or planning system after some processing, but without any human intervention. Likewise, when an autonomous robot selects an action to apply, it sends the corresponding orders directly to its motors without going through any human-mediated process. Its actions have an impact on the environment and influence future perceptions. This direct relation of the robot with the real world raises many challenges for AI and takes robotics away from the fields in which AI has had its major recent successes.

figure 1

A typical AI system interacts with a human user (search engine, recommendation tool, translation engine...). The human user issues the request, the result is intended to be perceived by him or her, and there is in general no other connection to the real world. The system is thus not active in the real world; only the human is. A robotic system is active: it directly interacts with its environment through its perceptions and actions. Humans may be part of the environment, but are otherwise not involved in the robot's control loop, at least for autonomous robots
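
To make the closed perception-decision-action loop described above concrete, here is a minimal sketch in Python. The sensor and motor interfaces are placeholders invented for illustration; a real robot would replace them with its own drivers and run the loop until shutdown.

```python
import time

def read_sensors():
    """Placeholder: return raw sensor data (e.g. a range reading in metres)."""
    return {"front_distance": 1.2}  # dummy value

def decide(perception):
    """Very simple reactive policy: turn away if an obstacle is close, else go forward."""
    if perception["front_distance"] < 0.5:
        return {"linear_velocity": 0.0, "angular_velocity": 0.6}
    return {"linear_velocity": 0.3, "angular_velocity": 0.0}

def send_to_motors(command):
    """Placeholder: forward the command to the robot's motor controllers."""
    print(command)

# Closed perception-decision-action loop: no human in the loop.
for _ in range(10):              # a real robot would loop until shut down
    perception = read_sensors()  # world -> robot
    command = decide(perception) # decision / planning layer
    send_to_motors(command)      # robot -> world
    time.sleep(0.1)              # control period (10 Hz here)
```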

When it was first coined in 1956 at the Dartmouth College workshop, AI was defined as the problem of “making a machine behave in ways that would be called intelligent if a human were so behaving” [22]. This definition has evolved over time; a now traditional definition states that “AI refers to machines or agents that are capable of observing their environment, learning, and based on the knowledge and experience gained, taking intelligent action or proposing decisions” [23]. This view of AI includes many of the impressive applications that have appeared since Watson's victory at the Jeopardy! quiz show in 2011, from recommendation tools and image recognition to machine translation software. These major successes of AI rely mainly on learning algorithms, and in particular on deep learning algorithms. Their results heavily depend on the data they are fed with. The fact that the design of the dataset is critical for the returned results was clearly demonstrated by Tay, the learning chatbot launched in 2016 by Microsoft, which tweeted racist, sexist and anti-Semitic messages after less than 24 h of interaction with users [24]. Likewise, despite impressive results in natural language processing, as demonstrated by Watson's success at the Jeopardy! show, this system has had trouble being useful for applications in oncology, where medical records are frequently ambiguous and contain subtle indications that are clear to a doctor but not straightforward for Watson's algorithms to extract [25]. The "intelligence" of these algorithms thus again depends heavily on the datasets used for learning, which should be complete, unambiguous and fair. They are external to the system and need to be carefully prepared.

Typically, AI systems receive data in the form of images or texts generated or selected by humans and send their result directly to the human user. Contrary to robots, such AI systems are not directly connected to the real world and critically depend on humans at different levels. Building autonomous robots is thus part of a more restrictive definition of AI based on the whole intelligent agent design problem: “an intelligent agent is a system that acts intelligently: What it does is appropriate for its circumstances and its goal, it is flexible to changing environments and changing goals, it learns from experience, and it makes appropriate choices given perceptual limitations and finite computation” [26].

The need to face the whole agent problem makes robotics challenging for AI, but robotics also raises other challenges. A robot is in a closed-loop interaction with its environment: any error at some point may be amplified over time or create oscillations, calling for methods that ensure stability, at least asymptotically. A robot moves in a continuous environment, most of the time with either fewer degrees of freedom than required (underactuated systems, like cars) or more degrees of freedom than required (redundant systems, like humanoid robots). Both conditions require the development of special strategies to make the system act in an appropriate way. Likewise, the robot relies on its own sensors to make a decision, potentially leading to partial observability. Sensors and actuators may also be a source of errors because of noise or failures. These issues can be abstracted away so that AI can focus on high-level decisions, but doing so limits the capabilities reachable by the robot, as building the low-level control part of the robot requires making decisions in advance about what the robot can do and how it can achieve it: does it need position control, velocity control, force control or impedance control (controlling both force and position)? Does it need slow but accurate control or fast but rough control? For a multi-purpose robot like a humanoid robot, deciding this a priori limits what the robot can achieve, and considering control and planning or decision making in a unified framework opens the possibility to better coordinate the tasks the robot has to achieve [27, 28].
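
As a hedged illustration of the control choices mentioned above, the sketch below implements a one-dimensional impedance-style law, in which a commanded force is computed from position and velocity errors through a virtual spring and damper; the gains and setpoints are purely illustrative.

```python
def impedance_force(x, v, x_des, v_des, stiffness=200.0, damping=20.0):
    """Virtual spring-damper: F = K * (x_des - x) + D * (v_des - v).
    Low stiffness yields a compliant, safer joint; high stiffness a precise one."""
    return stiffness * (x_des - x) + damping * (v_des - v)

# Example: a joint slightly away from its setpoint and moving towards it.
f = impedance_force(x=0.10, v=0.05, x_des=0.12, v_des=0.0)
print(f"commanded force: {f:.2f} N")
```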

In the meantime, robotics also creates unique opportunities for AI. A robot has a body, and this embodiment produces alternative possibilities for solving the problems it is facing. Morphological computation is the ability of materials to take over some of the processes normally attributed to control and computation [29]. It may drastically simplify complex tasks. Grasping with rigid grippers requires, for instance, determining where to put the fingers and what effort to exert on the object. The same task with granular jamming grippers, or any other gripper made of soft and compliant materials, is much simpler: the gripper basically just has to be activated, without any particular computation [30]. Embodiment may also help to deal with one of the most important problems in AI: symbol grounding [31]. Approaches like Watson rely on a huge text dataset in which the relevant relations between symbols are expected to be explicitly described. An alternative is to let the robot experience such relations through interactions with the environment and the observation of their consequences. Pushing an object and observing what has moved clearly shows object boundaries without the need for a large database of similar objects; this is called interactive perception [32]. Many concepts are easier to understand when interaction can be taken into account: a chair can be characterised by the ability to sit on it, so if the system can experience what sitting means, it can guess whether an object is a chair or not without needing a dataset of labelled images containing similar chairs. This is the notion of affordance, which associates perception, action and effect [33]: a chair is sittable, a button pushable, an object graspable, etc.
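
The affordance idea of associating perception, action and effect can be made concrete with a tiny data structure; the entries below are invented examples, not a claim about any particular system.

```python
from dataclasses import dataclass

@dataclass
class Affordance:
    """An affordance links what is perceived, what can be done, and what follows."""
    percept: str   # perceptual signature, e.g. "flat horizontal surface ~45 cm high"
    action: str    # action the robot can attempt, e.g. "sit"
    effect: str    # expected outcome, e.g. "body supported"

learned = [
    Affordance("flat horizontal surface ~45 cm high", "sit", "body supported"),
    Affordance("small protruding element", "push", "element moves, state changes"),
    Affordance("graspable handle-sized part", "grasp", "object held"),
]

# A 'chair' need not match a database of chair images: anything whose percept
# supports the 'sit' action with the expected effect is sittable.
def is_sittable(percept):
    return any(a.percept == percept and a.action == "sit" for a in learned)

print(is_sittable("flat horizontal surface ~45 cm high"))  # True
```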

Robots are a challenge for AI, but also an opportunity to build an artificial intelligence that is embodied in the real world and thus close to the conditions that allowed the emergence of human intelligence. Robots have another specificity: humans are explicitly out of the interaction loop between the robot and its environment. The gap between robots and humans is thus larger than for other AI systems. Current robots on the market are designed for simple tasks with limited or even no interaction (e.g. vacuum cleaning). This situation can be overcome only if the goal of a human-centered robotic assistant is properly addressed, because the robot has to reach a certain level of universality to be perceived as an interaction partner. One component alone, such as speech recognition, is not enough to satisfy the needs of proper interaction.

Requirements of human-centered AI and robotics

All humans are different. While they share some common behaviours, each human has their own specificities, which may further change over time. A human-centered robot should deal with this in order to properly collaborate with humans and empower them. It should therefore be robust and adaptive to unknown and changing conditions. Every robot is engaged in an interaction with its environment that can be perturbed in different ways: a walking robot may slip on the ground, a flying one may experience wind gusts. Adaptation has thus been a core objective since the advent of robotics and in all of its fields, from control to mechanics or planning. All fields of robotics aim at a robot that can ultimately deal with the changes it is confronted with, but these changes are, in general, known to the robot designer, who has anticipated strategies to deal with them. With these strategies one tries to build methods that can, to some extent, deal with perturbations and changes.

Crafting the robot's environment and simplifying its task is a straightforward way to control the variability the robot can be subject to. The application of this principle to industry has led to the large-scale deployment of robots integrated into production lines built explicitly to make their work as simple as possible. A new application of robotics has developed rapidly since the 2000s: autonomous vacuum cleaners. These robots are not locked up in cages, as they move around in uncontrolled environments, but despite the efforts deployed by engineers they may still run into trouble in certain situations [34]. When a problem occurs, the user has to discover where it comes from and change their own home, or the way the robot is used, so that the situation does not occur again. Adaptation is thus on the human user's side. Human-centered robotics aims at building robots that can collaborate with humans and empower them. They should therefore, first of all, not be a burden for their human collaborators and should exhibit a high level of autonomy [35].

The more variable the tasks and the environments in which they must be fulfilled, the more difficult it is to anticipate all the situations that may occur. Human-centered robots are supposed to be in contact with humans and thus experience their everyday environment, which is extremely diverse. Current robots clearly have trouble reacting appropriately to situations that have not been taken into account by their designer. When an unexpected situation occurs and results in a robot failure, a human-centered robot is expected to, at least, avoid repeating this failure indefinitely. This implies an ability to exploit its experience to improve its behaviour: a human-centered robot needs to possess a learning ability. Learning is the ability to exploit experience to improve the behaviour of a machine [36]. Robotics represents a challenge for all learning algorithms, including deep learning [37]. Reinforcement learning algorithms aim at discovering the behaviour of an agent from a reward that tells it whether it behaves well or not: from an indication of what to do, they search for how to do it. Reinforcement learning is thus a powerful tool to make robots more versatile and less dependent on their initial skills, but it is notoriously difficult in robotics [38]. One of the main reasons is that a robot is in a continuous environment, with continuous actions, in a context that is in general partially observable and subject to noise and uncertainty. A robot that successfully learns to achieve a task owes a significant part of its success to the appropriate design of the state and action spaces that learning relies on. Different kinds of algorithms exist to explore the possible behaviours and keep the ones that maximise the reward [39], but for all of them, the larger the state and action spaces, the more difficult the discovery of appropriate behaviours. At the same time, a small state and action space limits the robot's abilities. A human-centered robot is expected to be versatile, so it is important to avoid overly strong limitations of its capabilities. A solution is to build robots with an open-ended learning ability [40, 41], that is, with the ability to build their own state and action spaces on the fly [42]. The perception of their environment can be structured by their interaction capability (Fig. 2). The skills they need can be built on the basis of an exploration of possible behaviours. In a process inspired by child development [43], this search process can be guided by intrinsic motivations, which can replace the task-oriented reward used in reinforcement learning, allowing the robot to bootstrap the acquisition of world models and motor skills [44]. This adaptation capability is important to make robots able to deal with the variability of human behaviours and environments and to put adaptation on the robot's side instead of the human's side, but it is not enough to make robots human-centered.

figure 2

A PR2 robot engaged in an interactive perception experiment to learn a segmentation of its visual scene [ 93 , 94 ]. The interaction of the robot with its surrounding environment provides data to learn to discriminate objects that can be moved by the robot from the background (Copyright: Sorbonne Université)
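
One common way of implementing the intrinsic motivation mentioned above is to reward the agent for the error of its own forward model, so that poorly understood situations become "interesting" and drive exploration. The sketch below shows such a curiosity bonus added to an (optional) task reward; it is only one possible formulation, not the specific method of the works cited above.

```python
import numpy as np

rng = np.random.default_rng(0)

# A crude forward model: predicts the next sensory state from [state, action].
W = rng.normal(scale=0.1, size=(4, 6))   # 4-dim next state from a 6-dim input

def predict_next(state, action):
    return W @ np.concatenate([state, action])

def intrinsic_reward(state, action, next_state):
    """Curiosity bonus: the larger the forward-model error, the more 'interesting'
    the experience, pushing exploration towards poorly understood situations."""
    error = next_state - predict_next(state, action)
    return float(np.linalg.norm(error))

# Total reward = task reward (possibly zero or absent) + curiosity bonus.
state, action = rng.normal(size=4), rng.normal(size=2)
next_state = rng.normal(size=4)
r_total = 0.0 + intrinsic_reward(state, action, next_state)
print(f"intrinsic reward: {r_total:.3f}")
```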

The main reason is that humans play a marginal role in this process, if any. A human-centered robot needs to have or develop human-specific skills. To do so, it first needs to be able to interact with humans. This can be done in different ways, which are introduced, together with the challenges they raise, in the “Humans in the loop” section. It also needs to understand humans; the “Understanding humans and human intentions” section discusses this topic. Based on this understanding, robots may have to adapt their behaviour. Humans are used to transmitting their knowledge and skills to other humans: they can teach, explain or show the knowledge they want to convey. Providing a robot with particular knowledge is done through programming, a process that requires strong expertise. A human-centered robot needs to provide other means of knowledge transmission; it needs to be able to learn from humans, see the “Learning from humans” section for a discussion of this topic. Last but not least, humans need to understand what robots know and what they can and cannot do. This is not straightforward, in particular in the context of the current trend in AI that mostly relies on black-box machine learning algorithms [45]. The “Making robots understandable for humans” section examines this topic in a robotics context.

Humans in the loop

The body of literature about the interaction of humans with computers and robots is huge and contains metrics [46, 47], taxonomies [48] and other kinds of descriptions and classifications trying to establish criteria for the possible scenarios. Often a certain aspect is the focus, such as safety [49]. Still, a structured and coherent view has not been established, so it remains difficult to compare approaches directly within a universal concept [50]. Despite this ongoing discussion, we take a more fundamental view in the following to describe what is actually possible. A human has three possibilities to interact with robots: physical interaction, verbal interaction and non-verbal interaction. Each of these interaction modalities has its own features and complexities and creates its own requirements.

Physical interaction

As a robot has a physical body, any of its movements is likely to create a physical interaction with a human. It may not be voluntary, for instance if the robot hits a human that it has not perceived, but physical interaction is also used on purpose, for instance when gestures are the main target. Physical interaction between humans and robots has gained much attention over the past years since significant advancements have been made in two main areas of robotics. On the one hand, new mechanical designs of robotic systems integrate compliant materials as well as compliant elements like springs. On the other hand, on the control side, it has become possible to effectively control compliant structures thanks to the increased computational power of embedded microcontrollers. Another reason is the availability of new, smaller and yet very powerful sensor elements to measure the forces applied to the mechanical structures. This has led to the implementation of control algorithms that can react extremely rapidly to external forces applied to the mechanical structure. A good overview of the full range of applications and of the advancements made in recent years can be found in [51].

These advancements were mandatory for the safe use of robotic systems in direct contact with human beings in highly integrated interaction scenarios like rehabilitation. Rehabilitation opens up enormous possibilities for the immediate restoration of mobility and thus quality of life (see, e.g., the scene with an exoskeleton and a wheelchair depicted in Fig. 3), while at the same time stimulating the human neuronal structures through sensory influx. Furthermore, the above-mentioned machine learning methods, especially in their deep learning form, are suitable for observing and even predicting accompanying neural processes in the human brain [52]. By observing the human electro-encephalogram, it becomes possible to predict the so-called lateralized readiness potential (LRP), which reflects the preparation of deliberate limb movements by certain brain regions, up to 200 ms before the actual movement occurs. This potential still occurs in people even after lesions or strokes and can be predicted by AI methods. In experimental studies, the prediction of an LRP was used to actually perform the intended human movement via an exoskeleton. By predicting the intended movement at an early stage and controlling the exoskeleton mechanics in time, the human experiences the intended movement as being consciously performed by him- or herself.

figure 3

An upper-body exoskeleton integrated into a wheelchair can support patients in doing everyday tasks as well as the overall rehabilitation process. (Copyright: DFKI GmbH)
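
Predicting movement intention from the EEG is usually framed as classifying short signal epochs. The sketch below trains a linear classifier on synthetic epochs to which an LRP-like drift has been added; it only illustrates the general pipeline and is not the system described in [52].

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)

# Synthetic 'EEG' epochs: 200 trials x 8 channels x 50 samples.
# Movement-preparation trials get a slow negative drift on two 'motor' channels.
epochs = rng.normal(size=(200, 8, 50))
labels = np.repeat([0, 1], 100)                     # 0 = rest, 1 = preparation
drift = -np.linspace(0.0, 1.0, 50)                  # slow negativity before movement
epochs[labels == 1, :2, :] += drift

# Simple features: mean amplitude per channel in the late part of each epoch.
features = epochs[:, :, 30:].mean(axis=2)           # shape (200, 8)

clf = LinearDiscriminantAnalysis().fit(features[::2], labels[::2])  # train on half
accuracy = clf.score(features[1::2], labels[1::2])                  # test on the rest
print(f"held-out accuracy: {accuracy:.2f}")
```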

As appealing and promising as such scenarios sound, it is necessary to consider the implications of having an 'intelligent' robot acting in direct contact with humans. There are several aspects that need to be considered and that pose challenges in several ways [53]. To start with, we need to consider the mechanical design and the kinematic structure in a much deeper way than we would have to in other domains. First of all, there is the issue of the safety of the human: in no way can the robot be allowed to harm its human interaction partner. Therefore safety is usually considered on three different levels:

On the level of mechanical design, we must ensure that compliant mechanisms are used that absorb the energy of potential impacts with an object or a human. This can be done in several ways, for instance by integrating spring-like elements in the actuators that work in series with a motor/gear setting. This usually allows the spring to absorb any impact energy, but on the other hand it decreases the stiffness of the system, which is a problem when it comes to very precise control with repeatable motions, even under load.

On the second level, the control loops can be used to implement what is basically an electronic spring. This is done by measuring the forces and torques on the motor and by controlling the actuators based on these values instead of the position signal only. Control based on position ensures very stiff and extremely precise and repeatable system performance, while torque control is somewhat less precise. It further requires a nested control approach that combines position and torque control in order to reach the desired joint position while at the same time respecting the torque limits set by the extra control loop. Overall, the effect is similar to that of a mechanical spring, as the robot will immediately retract (or stop advancing) as soon as external forces are measured and torque limits are violated. Even though this sounds like a pure control problem in which AI technologies are not required, the problem quickly becomes NP-hard if the robot has many degrees of freedom, like e.g. a humanoid robot. In these cases, deep neural network strategies are used to find approximations to the optimal control scheme [54]. Yet there are cases where even higher levels of cognitive AI approaches are required, namely where limiting the joint torques conflicts with the stability of the robot's standing or walking behaviour, for instance, or where the torque limits must be deliberately surpassed, e.g. when the robot needs to drill a hole in a wall. In this case some joints need to be extremely stiff in order to provide enough resistance to penetrate the wall with the drill. These cases require higher levels of spatio-temporal planning and reasoning to correctly predict the context and to adjust the low-level control parameters accordingly and temporarily.

On the level of environmental observation, there are several techniques that use external sensors such as cameras, laser range finders and other kinds of sensors to monitor the environment of the robot and to intervene in the robot's control scheme as soon as a person enters the work cell of the robotic system. Several AI technologies are used to predict the intentions of the person entering the robot's environment and can be used to modify the robot's behaviour in an adequate way: instead of a full stop as soon as anything enters the area, a progressive approach decreases the robot's movement speed as the person comes closer (a minimal sketch of such speed scaling is given after this list). In most well-defined scenarios these approaches can be implemented with static rule-based reasoning. However, imagine a scenario where a robot and a human being are working together to build cars. In this situation there will always be close encounters between the robot and the human, and most of them are wanted and required. There might even be cases where the human and the robot actually get into physical contact, for instance when handing over a tool. Classical reasoning and planning approaches have huge difficulties in adequately representing such situations [55]. What is needed instead is an even deeper approach that actually makes the robot understand the intentions of the human partner [56].
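
As announced above, here is a minimal sketch of the progressive speed reduction on the environmental-observation level: the robot's speed is scaled with the measured distance to a person. The thresholds are illustrative assumptions; real values would come from a safety assessment.

```python
def speed_scale(distance_to_human, stop_dist=0.5, slow_dist=2.0):
    """Progressive speed scaling from external monitoring (e.g. a laser scanner):
    full speed beyond slow_dist, linear slow-down in between, full stop below stop_dist."""
    if distance_to_human <= stop_dist:
        return 0.0
    if distance_to_human >= slow_dist:
        return 1.0
    return (distance_to_human - stop_dist) / (slow_dist - stop_dist)

for d in (3.0, 1.5, 0.8, 0.4):
    print(f"human at {d:.1f} m -> speed factor {speed_scale(d):.2f}")
```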

Verbal interaction

“Go forward”, “turn left”, “go to the break room”: it is very convenient to give orders to robots using natural language, in particular when robot users are not experts or are physically impaired [57]. Besides sending orders to the robot (human-to-robot interaction), a robot could answer questions or ask for help (robot-to-human interaction) or engage in a conversation (two-way communication) [58]. Verbal interaction thus has many different applications in robotics and, contrary to physical interaction, it does not create strong safety requirements. A human cannot be physically harmed through verbal interaction, except if it makes the robot act in a way that is dangerous for the human, but in this case the danger still comes from the physical interaction, not from the verbal interaction that initiated it.

Although a lot of progress has been made on natural language processing, robotics creates specific challenges. A robot has a body. Robots are thus expected to understand spatial (and possibly temporal) relations and to connect the symbols they are manipulating to their sensorimotor flow [59]. This is a situated interaction. Giving a robot an order such as “go through the door” is expected to make the robot move to the particular door that is in its vicinity. There is a need to connect words to the robot's own sensorimotor flow: each robot has specific sensors and effectors, and this needs to be taken into account. If the robot only needs to understand a limited number of known words, this can be hand-crafted [57]. It can also rely on deep learning methods [60], but language is not static: it dynamically evolves through social interaction, as illustrated by the appearance of new words (in 2019, 2700 words were added to the Oxford English Dictionary). Furthermore, the same language may be used in different ways in distant parts of the world. French as spoken in Quebec, for instance, has some specificities that distinguish it from the French spoken in France. A human-centered robot needs to be able to adapt the language it uses to its interlocutor. This raises many different challenges [61], including symbol grounding, which is one of the main long-standing AI challenges [31]. Using words requires knowing their meaning. This meaning can be guessed from a semantic network, but as the interaction is situated, at least some of the words will need to be associated with raw data from the sensorimotor flow; for instance, the door in the "go through the door" order needs to be identified and found in the robot's environment. This is the grounding problem.
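
Grounding an order such as “go through the door” means linking the word “door” to a percept in the robot's current surroundings. A minimal sketch, under the assumption that a perception module already labels detected objects with their positions:

```python
import math

def ground_referent(word, detections, robot_pose=(0.0, 0.0)):
    """Resolve a word such as 'door' to the closest detected object with that label.
    `detections` is assumed to come from a perception module: (label, x, y) tuples."""
    candidates = [(x, y) for label, x, y in detections if label == word]
    if not candidates:
        return None  # the symbol cannot be grounded in the current scene
    return min(candidates,
               key=lambda p: math.hypot(p[0] - robot_pose[0], p[1] - robot_pose[1]))

scene = [("door", 2.0, 1.0), ("door", 6.0, -3.0), ("chair", 1.0, 0.5)]
print(ground_referent("door", scene))   # -> (2.0, 1.0), the door in the robot's vicinity
```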

The seminal work of Steels on language games [62, 63] shows how robots can actually engage in a process that converges to a shared vocabulary of grounded words. When the set of symbols is closed and known beforehand, symbol grounding is no longer a challenge, but it still is if the robot has to build the set autonomously [64]. To differentiate it from the grounding of a fixed set of symbols, this has been named symbol emergence [65, 66]. The term symbol has different definitions. In symbolic AI, a symbol is basically a pointer to a name, a value and possibly other properties, such as a function definition. A symbol carries a semantics that is different for the human and for the robot, but enables them to partially share the same ground. In the context of language study, the definition of a symbol is different. Semiotics, the study of signs that mediate communication, defines it as a triadic relationship between an object, a sign and an interpretant. This is not a static relationship but a process: the interpretant is the effect of a sign on its receiver, and it is thus a process relating the sign to the object. The dynamics of this process can be seen in our ability to give names to objects on the fly (whether they are known or not). Although much progress has been made recently on these topics [58, 66], building a robot with this capability remains a challenge.

Non-verbal interaction

The embodiment of robots creates opportunities to communicate with humans by means other than language. This is an important issue, as multiple nonverbal communication modalities exist between humans and are estimated to carry a significant part of the meaning communicated between them. Nonverbal cues have been shown, for instance, to help children learn new words from robots [67]. Adding nonverbal interaction abilities to robots thus opens the perspective of building robots that can better engage with humans [68], i.e. social robots [13]. Nonverbal interaction may support verbal communication, as in lip-syncing or other intertwined motor actions such as head nods [69], and may have a significant impact on humans [70], as observed through their behavioural response, task performance, emotion recognition and response, as well as cognitive framing, that is, the perspective humans adopt, in particular on the robot they interact with.

Different kinds of nonverbal communication exist. The ones that incorporate robot movements are kinesics, proxemics, haptics and chronemics. Kinesics relies on body movements, positioning, facial expressions and gestures; most robotics-related research on the topic focuses on arm gestures, body and head movements, eye gaze and facial expressions. Proxemics is about the perception and use of space in the context of communication, including the notions of social distance and personal space. Haptics is concerned with the sense of touch, and chronemics with the experience of time. Saunderson and Nejat have reviewed robotics research on these different topics [70].

Besides explicit non-verbal communication means, the appearance of a robot has been shown to impact the way humans perceive it and engage in a human-robot interaction [71, 72]. It has been shown, for instance, that a humanlike shape influences nonverbal behaviours towards a robot, such as delay of response, distance [73] or embarrassment [74]. Anthropomorphic robots significantly draw the attention of the public and thus create high expectations in different service robotics applications, but the way they are perceived and accepted is a complex function involving multiple factors, including user culture, context and quality of the interaction, or even the degree of human likeness [75]. The impact of this last point, in particular, is not trivial. Mori proposed the uncanny valley theory to model this relation [76, 77]. In this model, the emotional response improves as the robot's appearance gets more humanlike, but a sudden drop appears beyond a certain level: robots that look like humans but still show noticeable differences can create a feeling of eeriness, resulting in discomfort and rejection. This effect disappears when the robot's appearance gets close enough to that of humans. The empirical validation of this model is difficult: some experiments seem to validate it [78], while others lead to contradictory results [79]. For more details, see the reviews by Fink [80] or Złotowski et al. [81].

Understanding humans and human intentions

There are situations in which robots operate in isolation, such as in manufacturing lines for welding or painting, or in deep sea or planetary exploration. Such situations are dangerous for humans, and the robot's task is provided to it through pre-programming (e.g. welding) or teleprogramming (e.g., a location to reach on a remote planet). However, in many robotic application areas, be it manufacturing or services, robots and humans are starting to interact with each other more and more, in different ways. The key characteristics making these interactions so challenging are the following:

Sharing space, for navigation or for reaching for objects to manipulate

Deciding on joint actions that are going to be executed by both the robot and the human

Coordination of actions over time and space

Achieving joint actions physically

These characteristics lead to many different scientific questions and issues. For example, sharing space requires geometric reasoning, motion planning and control capabilities [82]. Deciding on joint actions [83] requires a mutual representation of human capabilities by the robot and vice versa, e.g., is the human (resp. robot) capable of holding a given object? It also requires a Theory of Mind on the part of the robot and of the human: what are the robot's representations and what are the human's representations of a given situation? What is the human (resp. robot) expected to do in this situation?

The third characteristic, coordination of action, requires, in addition to what has been mentioned above, signal exchanges between human and robot to ensure that each is indeed engaged in and committed to the task being executed. For example, gaze detection through eye trackers makes it possible to formulate hypotheses about the human's visual focus. The robot in turn has to provide equivalent information to the human, since the human usually cannot determine the robot's visual focus by merely observing its sensors. It therefore becomes necessary for the robot to signal explicitly what its focus is or what its intentions are (see the “Making robots understandable for humans” section).

When it comes to physical interaction, robot and human are not only in close proximity, but they also exchange physical signals such as forces. Consider for example a robot and a human moving a table together. Force feedback makes it possible to distribute the load correctly between them and to coordinate their actions. In the case of physical interaction, another important aspect is ensuring human safety, which puts constraints on robot design and control. Compliance and haptic feedback become key (see the “Physical interaction” section).

In all these interaction scenarios, the robot must already have all the autonomous capacities for decision-making and task supervision. Indeed, the robot must be able to plan its own actions to achieve a common goal with the human, taking into account its model of the human and of the human's intentions.

Take the simple example of a human handing an object to the robot. The common goal is that, in the final state, the robot is holding the object, whereas in the initial state the human is holding it. The goal must be shared right from the beginning of the interaction, for example through an explicit order given by the human. Alternatively, the robot might be able to determine the common goal by observing the human's behaviour, which requires the ability to deduce human intentions from actions, posture, gestures (e.g., deictic gestures) or facial expressions. This can only be a probabilistic reasoning capacity, given the uncertainties of observation and of the prior hypotheses. The robot must then plan its actions according to its human model, and this can only be a probabilistic planning process, e.g. using Markovian processes, because of the inherent uncertainties of the observations (and therefore of the robot's beliefs) and of action execution. Robot task supervision must also ensure that the human is acting in accordance with the plan, by observing actions and posture.
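
The probabilistic reasoning about intentions mentioned above can be illustrated with a single Bayes update over a small set of candidate intentions given an observed cue; the intention set and likelihood values below are illustrative assumptions, not taken from the cited works.

```python
# Candidate intentions and the robot's prior belief over them.
belief = {"hand_over_object": 1 / 3, "point_at_object": 1 / 3, "no_interaction": 1 / 3}

# Illustrative observation model: P(observed cue | intention).
likelihood = {
    "arm_extended_towards_robot": {"hand_over_object": 0.70,
                                   "point_at_object": 0.25,
                                   "no_interaction": 0.05},
}

def update_belief(belief, cue):
    """One Bayes-filter step: posterior(i) proportional to P(cue | i) * prior(i)."""
    posterior = {i: likelihood[cue][i] * p for i, p in belief.items()}
    total = sum(posterior.values())
    return {i: p / total for i, p in posterior.items()}

belief = update_belief(belief, "arm_extended_towards_robot")
print(belief)   # the 'hand_over_object' hypothesis now dominates
```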

Another essential dimension of complex interactions is communication through dialogue. The robot can start such a dialogue, for example, when it detects that some information is needed to complete its model or to reduce its uncertainties. Formulating the correct questions requires the robot to have a capacity for self-assessment of its own belief state.

Learning from humans

Using the human as a teacher to train robotic systems has been around for some time [84]. Many cases and scenarios, like the hybrid team scenario (see the example depicted in Fig. 4) where humans and robots build cars together acting as a team, are too complex to be completely modelled. Consequently, it is difficult or impossible to devise exact procedures and rule-based action execution schemes in advance. One example here is the task of having a robot pack a pair of shoes into a shoebox [85]. Even a task that sounds as simple as this proved impossible to model completely. Therefore, a learning-by-demonstration method was applied, in which a human demonstrator teaches the robot the task. In such cases learning, or, said differently, a step-wise approximation and improvement of the optimal control strategy, is the most straightforward option available. In situations where enough a priori data is available, this can be done offline and the robotic system can be trained to achieve a certain task. However, in many cases data is not available, and online strategies are therefore needed to acquire the desired skill. The learning-by-demonstration approach can already be implemented quite successfully by, e.g., recording data from human demonstrators instrumented with reflectors for image-capturing devices, and then feeding skeleton representations of the human movements as sample trajectories into the learning system, which in turn uses, e.g., reinforcement learning techniques to generate appropriate trajectories. This approach usually leads to quite usable policies on the side of the robotic system, yet in many cases, when applied in a realistic task scenario, it turns out that “quite good” is not good enough and online optimization has to be performed. Here it turns out to be advantageous to include approaches like those discussed in the previous section on understanding human intentions or states of mind.

figure 4

Examples of humans, robots and other AI agents working in hybrid teams. Depending on the application and scenario, robots can be configured as stationary or mobile systems, up to complex systems with a humanoid appearance. (Copyright: Uwe Völkner/Fotoagentur FOX)
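
Learning from demonstration often starts by fitting a policy to recorded state-action pairs from the demonstrator (behaviour cloning), which can then be refined online. A minimal sketch on synthetic data, not the shoe-packing system cited above:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)

# Synthetic demonstrations: states (e.g. skeleton or end-effector features) and the
# actions a human demonstrator took in those states.
states = rng.uniform(-1.0, 1.0, size=(500, 6))
actions = np.tanh(states @ rng.normal(size=(6, 2)))   # unknown demonstrator 'policy'

# Behaviour cloning: regress actions on states.
policy = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
policy.fit(states, actions)

# The cloned policy can now propose an action for a new situation,
# and be refined further online (e.g. with reinforcement learning).
new_state = rng.uniform(-1.0, 1.0, size=(1, 6))
print(policy.predict(new_state))
```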

Using this general idea, it was possible to improve online the performance of an already trained robot by taking a signal generated by the human brain at a subconscious level and feeding it back to the robot as a reinforcement signal [56]. The signal is the so-called error potential, an event-related potential (ERP) generated by brain areas when a mismatch between expected and actual input occurs. In many real-world situations such a signal is produced, e.g., when a human observes another human performing a movement in an obviously wrong way in the correct context, or performing the correct movement in the wrong context. The beauty of this signal is that it is generated at a subconscious level, before the human is actively aware of it. This is important for two reasons:

By the time the human becomes aware of the signal, it has already been analyzed and modulated by other brain regions. This means that a cognitive classification of the subconscious signal has taken place, which dissociates it from the original signal.

The second reason why it is important that the signal occurs before evaluation by other brain areas is that it does not have to be externalized, e.g. by verbalization. Imagine a hybrid team scenario where the human in the team has to explicitly verbalize each error that he or she observes in the performance of the robot. First, the above-mentioned dissociation process would blur the verbalized feedback to the robot; more importantly, the human would probably not verbalize each and every error due to fatigue, and information valuable for the interaction would be lost.

To summarize, learning can either use external information, such as receiving commands or watching humans demonstrate a task, or implicit signals during interaction, such as the evaluation of facial expressions or the use of brain signals like certain ERPs to provide feedback. The latter of course uses information from the human interaction partner that is not directly controlled by the human and not per se voluntarily given. This raises ethical and legal questions that have to be addressed before using this as a standard procedure for interaction (see also the “Ethical questions” section), underlining the fact that human-centered AI and robotics ultimately require the involvement of disciplines from the social sciences. At the same time, we have outlined that making use of such information can be highly beneficial for fluent and intuitive interaction and learning.
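
One way to exploit a detected error potential is to treat it as a negative reward that lowers the value of the movement variant the robot has just executed. A minimal tabular sketch with the EEG-based detector stubbed out; it illustrates the idea rather than the actual system of [56]:

```python
import random

actions = ["variant_A", "variant_B", "variant_C"]   # ways of executing a movement
value = {a: 0.0 for a in actions}                   # robot's estimate of each variant
alpha = 0.2                                         # learning rate

def errp_detected(action):
    """Stub for an EEG classifier: returns True when the observing human's brain
    signals a mismatch (here, variant_C is the 'wrong-looking' execution)."""
    return action == "variant_C"

for _ in range(200):
    a = max(actions, key=lambda x: value[x] + random.uniform(0, 0.5))  # explore a bit
    reward = -1.0 if errp_detected(a) else 0.0      # implicit, subconscious feedback
    value[a] += alpha * (reward - value[a])         # simple value update

print(value)   # variant_C ends up devalued without any explicit verbal correction
```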

Making robots understandable for humans

In the “Understanding humans and human intentions” section, it was discussed how the robot can better understand humans and how this can be achieved to some extent. It is rather straightforward to equip the robot with the necessary sensors and software to detect humans and to interpret gestures, postures and movements, as well as to detect their gaze and infer some intentions. Even if this does not capture the whole complexity of human behaviour, these capacities can capture enough of human intentions and actions to enable task sharing and cooperation. Equally important in an interaction, however, is the opposite question: how can the human better understand the robot's intentions and actions?

In most scenarios, we can safely assume that the human does have some a priori knowledge about the framework of action that the robot is equipped with. That is to say, the human can infer some of the physical capabilities and limitations of the system from its appearance (e.g., a legged robot vs. a wheeled robot), but not its power, e.g., can the robot jump or climb a given slope? Even if the human has some general idea of the spectrum of robot sensing possibilities, it is not clear whether the robot's perceptive capabilities and their limits can be completely and precisely understood. This is partly because it is difficult for humans to understand the capabilities and limitations of sensors that they do not have, e.g., infrared sensors or laser rangefinders providing point clouds. It is fundamentally impossible for a human being to understand the information processing going on in robot systems with multi-level hierarchies, from low-level control of single joints to higher levels of control involving deep neural networks and finally to top-level planning and reasoning processes, all of which interact with each other and influence each other's output. This is extremely difficult even for trained computer science experts and robot designers. It represents a complete field of research dealing with how to manage the algorithmic complexity that occurs in structurally complex robotic systems acting in dynamic environments. Indeed, the design of robot control or cognitive architectures is an open research area and still a big challenge for AI-based robotics [86].

Attempts to approach the problem of humans understanding robots have been made in several directions. One is for the robot to verbally explain its actions [ 16 ]: the robot tells the human (or writes on a screen) what it is doing and why a specific action is carried out. The human can also ask the robot to explain its action(s), and the robot answers verbally, with computer-animated graphics, or with icons on a screen mounted on the robot. The hope behind such approaches is that the need for explanations deliberately uttered by the robot, as well as the human's need to ask for them, will decrease over time as the human learns and builds understanding. This is difficult to assess, since long-term studies have so far not been carried out, partly because appropriate robots are not yet available. One assumption we can safely make, however, is that having to answer explicitly or to listen to explanations will not be appreciated in practical situations, and repetitive explanatory utterances from the robot will quickly annoy humans.
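As a rough illustration of the verbal-explanation approach, the sketch below keeps a justification next to every executed action so the robot can answer a "why" question. The structure and names are invented for illustration and do not reflect the specific system described in [ 16 ].

```python
# Sketch of a robot that can answer "why" questions about its own actions by
# storing a human-readable justification alongside every executed step.
from dataclasses import dataclass
from typing import List

@dataclass
class ExecutedAction:
    name: str
    goal: str        # the higher-level goal this action serves
    reason: str      # human-readable justification

class ExplainableRobot:
    def __init__(self):
        self.history: List[ExecutedAction] = []

    def act(self, name: str, goal: str, reason: str):
        self.history.append(ExecutedAction(name, goal, reason))
        # ... trigger the actual actuation here ...

    def explain_last(self) -> str:
        if not self.history:
            return "I have not done anything yet."
        a = self.history[-1]
        return f"I performed '{a.name}' because {a.reason}, in order to {a.goal}."

robot = ExplainableRobot()
robot.act("move_to_shelf", goal="fetch the requested part",
          reason="the part is stored on shelf B")
print(robot.explain_last())
```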

Therefore it is necessary to think about more subtle strategies to communicate the robot's internal states and intentions to the human counterpart, e.g., its current goals, its knowledge about the world, its intended motions, its acknowledgement of a command, or its requests for an action by the human. Examples of such approaches are mimics and gestures: robots equipped with faces, either generated on a computer screen or formed by actuated motors under the artificial skin of a robotic head (if such devices are deemed acceptable, see the “ Ethical questions ” section), can produce facial expressions that convey some information about the internal state of the robot. These approaches have been applied successfully in, e.g., home and elderly care scenarios. However, the internal states externalized here are rather simple ones, meant to stimulate actions on the human side, as in the pet robot Paro.

However, in well-known scenarios such as manufacturing settings, it should be possible to define a fixed set of interaction signals, composed of gestures (including deictic gestures), facial expressions or simply graphical patterns, that externalize internal robot states to human partners. Such a model of communication can be seen as a first step towards a more general common alphabet [ 87 ] as the basis for a language between humans and robots. It is likely that such a common language will be developed, or more likely will emerge, from growing numbers of human-robot interaction scenarios in real-world applications, as a result of best-practice experiences.
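As a toy illustration of such a fixed signal set, the sketch below maps a handful of assumed internal robot states to agreed external signals. The states and signal names are placeholders, not a standardized alphabet.

```python
# Sketch of a fixed "common alphabet" for a manufacturing cell: each internal
# robot state is mapped to one agreed external signal (light pattern plus a
# gesture or on-screen icon). All values are illustrative placeholders.
from enum import Enum, auto

class RobotState(Enum):
    WAITING_FOR_PART = auto()
    GRASP_STABLE = auto()
    NEEDS_HUMAN_ACTION = auto()
    FAULT = auto()

SIGNAL_ALPHABET = {
    RobotState.WAITING_FOR_PART:   {"light": "blue_steady",  "gesture": "open_hand"},
    RobotState.GRASP_STABLE:       {"light": "green_steady", "gesture": "thumbs_up_icon"},
    RobotState.NEEDS_HUMAN_ACTION: {"light": "yellow_blink", "gesture": "point_at_station"},
    RobotState.FAULT:              {"light": "red_blink",    "gesture": "stop_icon"},
}

def externalize(state: RobotState) -> dict:
    """Return the agreed external signal for an internal state."""
    return SIGNAL_ALPHABET[state]

print(externalize(RobotState.GRASP_STABLE))
```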

The corresponding challenges on the robotic side certainly go beyond what was described earlier, such as the soft and compliant joints used for safety reasons. It will be necessary to develop soft, intelligent skin to cover the mechanical robot structure, used not only as an interface for expressions (in the case of facial skin) but also as a powerful sensor on other parts of the robot body, improving and extending the range of physical interactions with humans [ 88 ]. A simple, familiar example: in a task performed by two humans, one partner often lightly pushes or touches the other on the shoulder or arm to communicate, for instance, that a stable grip has been achieved, as if to say ’okay, I’ve got it, you can let go...’. This could also be transmitted verbally, but humans can picture the internal states of their human counterparts because we share the same kinematic structure and disposition; it is therefore not necessary to speak, and a simple touch suffices to transmit a complex state of affairs. The interaction of humans with robots equipped with such advanced skin technologies can be expected to be a starting point for a common language. Physical interaction will thus enable new forms of non-physical interaction, and the increased possibilities for non-physical interaction will very likely in turn stimulate new possibilities for physical interaction. In summary, it will be an interesting voyage if intelligent and structurally competent robotic systems do indeed become available as partners in everyday life. As with all other technologies, the human designer will shape the technology, but at the same time the technology will shape the human, both as a user and as a designer of that technology.

Ethical questions

Several issues raise ethical questions about robotic technologies considered as interaction partners for humans [ 89 ]. To list but a few:

Transformation of work in situations where humans and robots interact. Depending on how the interaction is designed, it might impose constraints on the human instead of making the robot adapt to the human and carry the burden of the interaction. For example, the human may be given the more dexterous tasks, such as grasping, which become repetitive and wearing when the speed of the robot performing the simpler tasks dictates the pace.

Mass surveillance and privacy issues when personal or domestic robots collect information about their users and households, or when self-driving cars permanently collect data on their users and their environments.

Affective bonds and attachment to personal robots, especially those made to detect and express emotions.

Human transformation and augmentation through exoskeletons or prosthetic devices.

Human identity and the status of robots in society (e.g., legal personality), especially for android robots mimicking humans in appearance, language and behavior.

Sexbots: sexual devices that can be designed in ways that degrade the image of women, or made to look like children.

Autonomous weapon systems, which do not, so to speak, "interact" with humans, but which are endowed with recognition capacities to target humans.

If we speak about ethics in the context of robots and AI technologies, what we fundamentally mean is that we want to make sure that this technology is designed and used for the good of mankind and not for the bad. The first problem is obviously how to define good and bad. There are obvious answers, such as that a robot should not harm a person. No question, but what about a surgical robot that needs to inject a vaccine into a person's arm with a syringe, physically injuring her at that moment, but for her benefit? How can we make the distinction between these cases in a formal way? This is the core of the problem.

If we speak about ethics and how to design ethical deliberation into technical systems, so that the robot's decision-making or control system behaves for "the good", we are fundamentally required to come up with a formalization of ethics. In some form or other we would have to express in logic and numerical values what is ethical and what is not. In our understanding this will not be possible in a general form, because human ethical judgment and moral thinking are not amenable to algorithmic processing and computation. For example, how would we define algorithmically a principle of respect for human dignity? The concept of dignity itself is complex and has several moral and legal interpretations.

Ethical deliberation cannot be reduced to computing and comparing utilities, as we often see in publications on ethical dilemmas for self-driving cars, for example. The car could only make computations based on data acquired by its sensors, but the ethical choices would actually have been made already by the designers. Even deciding that passengers can customise ethical choices, or letting the system learn [ 90 ], for example in simulations, which values to optimize, is a negation of what ethical deliberation is. Indeed, this would amount to deciding a priori about a situation yet to come, or to basing ethical deliberation on statistics of past actions.

We will of course be able to formalize ethical guidelines (for the designers) for robot design and control when concrete, well-specified domains are considered. We could, for example, easily solve the syringe problem by building a surgical robot that is used and operated only in hospitals and has a clearly defined set of tasks to fulfill in, say, the vaccination department of the hospital. This then becomes a matter of safety design, similar to any other technical device. But what about a household service robot designed to clean the floors and wash the dishes? Wouldn't we also want this robot to be able to perform first aid, e.g. if a person in the household suffers from diabetes and needs insulin injections from time to time? Cases can be constructed that lead to the conclusion that a complete and full formalization of ethics is impossible.

Adopting a responsible approach or a value-based design procedure [ 91 ] can help conceive robots and AI systems for which ethical issues are actually resolved by the human designers and manufacturers beforehand, during specification, development and manufacturing. The robot itself will not be endowed with moral judgment, but we will have to make sure that humans abstain from misusing the technology.

But more profound questions arise when it comes to the last three issues listed above. For example, building android, human-like robots can be considered a scientific research topic, or a practical solution to facilitate human-robot interaction. However, the confusion provoked by this identification of humans with machines calls for a reflection on the nature of human identity as compared to machines, one that addresses all aspects and consequences of such technical achievements.

A reflection grounded in philosophical, societal and legal considerations is necessary, beyond scholarly studies alone, to address the impact of these technologies on society. Indeed, numerous initiatives and expert groups have already issued ethics recommendations on the development and use of AI and robotics systems, including the European High-Level Expert Group on AI (HLEG-AI), the IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems, the UNESCO COMEST, and the OECD (see [ 92 ] for a comprehensive overview). An example of commonly accepted ethics recommendations is the set of seven "requirements for trustworthy AI" issued by the HLEG-AI in 2019 (see footnote 2):

“Human agency and oversight”: AI systems should be subject to human oversight and should support humans in their autonomy and decision-making.

“Technical Robustness and Safety” should be provided: systems should be reliable and stable, even in situations with uncertainty, and resilient against manipulation from outside.

“Privacy and Data Governance” should be guaranteed during the lifecycle with data access controlled and managed, and data quality provided.

“Transparency”: Data and processes should be well documented so that the cause of errors can be traced. Systems should be explainable to the user at a level appropriate for understanding the decisions the system makes.

“Diversity, Non-Discrimination and Fairness” should be ensured by controlling for biases that could lead to discriminatory results. Access to AI should be granted to all people.

“Societal and Environmental Well-Being”: The use of AI should be for the benefit of society and the natural environment. Violation of democratic processes should be prevented.

“Accountability” should be provided such that AI systems can be assessed and audited. Negative impacts should be minimised or erased.

However, open issues remain, mostly related to how to translate principles into practice, as well as topics subject to hard debate such as robot legal personality, advocated by some to address liability issues. Furthermore, when considering specific use cases, tensions between several requirements can arise that will have to be addressed specifically.

Most AI systems are tools for which humans play a critical role, either at the input of the system, whose behavior the system analyses, or at the output, where the system gives them information they need. Robotics is different in that it develops physical systems that can perceive and act in the real world without human mediation, at least for autonomous robots. Building human-centered robots requires putting humans back into the loop and providing the system with the ability to interact with humans, to understand them and learn from them, while ensuring that humans also understand what the robots can and cannot do. It also raises the many ethical questions listed and discussed above. Human-centered AI and robotics thus create many different challenges and require the integration of a wide spectrum of technologies. They also highlight that robots assisting humans are not only a technological challenge in many respects, but rather a socio-technological transformation of our societies. In particular, how this technology is used and made accessible are important topics involving actors concerned with social processes, public awareness, and political and legal decisions.

Availability of data and materials

Not applicable.

https://public.oed.com/updates/

https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai

Campbell M, Hoane Jr AJ, Hsu F. -h. (2002) Deep blue. Artificial intelligence 134(1-2):57–83.


Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, et al. (2016) Mastering the game of go with deep neural networks and tree search. Nature 529(7587):484–489.


Torrey L, Shavlik J (2010) Transfer learning In: Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques, 242–264.. IGI global, Hershey.


Yuh J, West M (2001) Underwater robotics. Adv Robot 15(5):609–639. https://doi.org/10.1163/156855301317033595 .

Kirchner F, Straube S, Kühn D, Hoyer N (2020) AI Technology for Underwater Robots. Springer, Cham.


Yoshida K (2009) Achievements in space robotics. IEEE Robot Autom Mag 16(4):20–28. https://doi.org/10.1109/MRA.2009.934818 .

Yoshida K, Wilcox B (2008) Space robots In: Springer handbook of robotics, 1031–1063.. Springer, Berlin.

Yangsheng X, Kanade T (1993) Space Robotics: Dynamics and Control. Springer.

Goodrich MA, Schultz AC (2008) Human-robot Interaction: a Survey. Now Publishers Inc.

Ricks DJ, Colton MB (2010) Trends and considerations in robot-assisted autism therapy In: 2010 IEEE International Conference on Robotics and Automation, 4354–4359, Anchorage.

Boucenna S, Narzisi A, Tilmont E, Muratori F, Pioggia G, Cohen D, Chetouani M (2014) Interactive technologies for autistic children: A review. Cogn Comput 6(4):722–740.

Shishehgar M, Kerr D, Blake J (2018) A systematic review of research into how robotic technology can help older people. Smart Health 7:1–18.

Breazeal C, Dautenhahn K, Kanda T (2016) Social robotics In: Springer Handbook of Robotics, 1935–1972.. Springer, Berlin.

Sheridan TB (2020) A review of recent research in social robotics. Curr Opin Psychol 36:7–12.

Schwartz T, Feld M, Bürckert C, Dimitrov S, Folz J, Hutter D, Hevesi P, Kiefer B, Krieger H, Lüth C, Mronga D, Pirkl G, Röfer T, Spieldenner T, Wirkus M, Zinnikus I, Straube S (2016) Hybrid teams of humans, robots, and virtual agents in a production setting In: 2016 12th International Conference on Intelligent Environments (IE), 234–237.. IOS Press, Amsterdam.

Schwartz T, Zinnikus I, Krieger H-U, Bürckert C, Folz J, Kiefer B, Hevesi P, Lüth C, Pirkl G, Spieldenner T, Schmitz N, Wirkus M, Straube S (2016) Hybrid teams: Flexible collaboration between humans, robots and virtual agents. In: Klusch M, Unland R, Shehory O, Pokahr A, Ahrndt S (eds)Multiagent System Technologies, 131–146.. Springer, Cham.

Peshkin M, Colgate JE (1999) Cobots. Ind Robot Int J 26(5):335–341.

Maciejasz P, Eschweiler J, Gerlach-Hahn K, Jansen-Troy A, Leonhardt S (2014) A survey on robotic devices for upper limb rehabilitation. J Neuroeng Rehabil 11(1):3.

Kumar S, Wöhrle H, Trampler M, Simnofske M, Peters H, Mallwitz M, Kirchner EA, Kirchner F (2019) Modular design and decentralized control of the recupera exoskeleton for stroke rehabilitation. Appl Sci 9(4). https://doi.org/10.3390/app9040626 .

Nowak A, Lukowicz P, Horodecki P (2018) Assessing artificial intelligence for humanity: Will ai be the our biggest ever advance? or the biggest threat [opinion]. IEEE Technol Soc Mag 37(4):26–34.

Siciliano B, Khatib O (2016) Springer Handbook of Robotics. Springer, Berlin.


McCarthy J, Minsky ML, Rochester N, Shannon CE (2006) A proposal for the dartmouth summer research project on artificial intelligence, august 31, 1955. AI Mag 27(4):12–12.


Annoni A, Benczur P, Bertoldi P, Delipetrev B, De Prato G, Feijoo C, Macias EF, Gutierrez EG, Portela MI, Junklewitz H, et al. (2018) Artificial intelligence: A european perspective. Technical report, Joint Research Centre (Seville site).

Wolf MJ, Miller KW, Grodzinsky FS (2017) Why we should have seen that coming: comments on microsoft’s tay “experiment,” and wider implications. ORBIT J 1(2):1–12.

Strickland E (2019) Ibm watson, heal thyself: How ibm overpromised and underdelivered on ai health care. IEEE Spectr 56(4):24–31.


Poole D, Mackworth A, Goebel R (1998) Computational intelligence.

Salini J, Padois V, Bidaud P (2011) Synthesis of complex humanoid whole-body behavior: A focus on sequencing and tasks transitions In: 2011 IEEE International Conference on Robotics and Automation, 1283–1290, Shanghai.

Hayet J-B, Esteves C, Arechavaleta G, Stasse O, Yoshida E (2012) Humanoid locomotion planning for visually guided tasks. Int J Humanoid Robotics 9(02):1250009.

Pfeifer R, Gómez G (2009) Morphological computation–connecting brain, body, and environment In: Creating Brain-like Intelligence, 66–83.. Springer, Berlin.

Shintake J, Cacucciolo V, Floreano D, Shea H (2018) Soft robotic grippers. Adv Mater 30(29):1707035.

Harnad S (1990) The symbol grounding problem. Physica D Nonlinear Phenom 42(1-3):335–346.

Bohg J, Hausman K, Sankaran B, Brock O, Kragic D, Schaal S, Sukhatme GS (2017) Interactive perception: Leveraging action in perception and perception in action. IEEE Trans Robot 33(6):1273–1291.

Jamone L, Ugur E, Cangelosi A, Fadiga L, Bernardino A, Piater J, Santos-Victor J (2016) Affordances in psychology, neuroscience, and robotics: A survey. IEEE Trans Cogn Dev Syst 10(1):4–25.

Vaussard F, Fink J, Bauwens V, Rétornaz P, Hamel D, Dillenbourg P, Mondada F (2014) Lessons learned from robotic vacuum cleaners entering the home ecosystem. Robot Auton Syst 62(3):376–391.

Kaufman K, Ziakas E, Catanzariti M, Stoppa G, Burkhard R, Schulze H, Tanner A (2020) Social robots: Development and evaluation of a human-centered application scenario In: Human Interaction and Emerging Technologies: Proceedings of the 1st International Conference on Human Interaction and Emerging Technologies (IHIET 2019), August 22-24, 2019, Nice, France, vol. 1018, 3–9.. Springer Nature, Berlin.

Jordan MI, Mitchell TM (2015) Machine learning: Trends, perspectives, and prospects. Science 349(6245):255–260.


Sünderhauf N, Brock O, Scheirer W, Hadsell R, Fox D, Leitner J, Upcroft B, Abbeel P, Burgard W, Milford M, et al. (2018) The limits and potentials of deep learning for robotics. Int J Robot Res 37(4-5):405–420.

Kober J, Bagnell JA, Peters J (2013) Reinforcement learning in robotics: A survey. Int J Robot Res 32(11):1238–1274.

Sigaud O, Stulp F (2019) Policy search in continuous action domains: an overview. Neural Netw 113:28–40.

Doncieux S, Filliat D, Díaz-Rodríguez N, Hospedales T, Duro R, Coninx A, Roijers DM, Girard B, Perrin N, Sigaud O (2018) Open-ended learning: a conceptual framework based on representational redescription. Front Neurorobotics 12:59.

Doncieux S, Bredeche N, Goff LL, Girard B, Coninx A, Sigaud O, Khamassi M, Díaz-Rodríguez N, Filliat D, Hospedales T, et al. (2020) Dream architecture: a developmental approach to open-ended learning in robotics. arXiv preprint arXiv:2005.06223.

Lesort T, Díaz-Rodríguez N, Goudou J-F, Filliat D (2018) State representation learning for control: An overview. Neural Netw 108:379–392.

Cangelosi A, Schlesinger M (2015) Developmental Robotics: From Babies to Robots. MIT press.

Santucci VG, Oudeyer P-Y, Barto A, Baldassarre G (2020) Intrinsically motivated open-ended learning in autonomous robots. Front Neurorobotics 13:115.

Hagras H (2018) Toward human-understandable, explainable ai. Computer 51(9):28–36.

Steinfeld A, Fong T, Kaber D, Lewis M, Scholtz J, Schultz A, Goodrich M (2006) Common metrics for human-robot interaction In: Proceedings of the 1st ACM SIGCHI/SIGART Conference on Human-Robot Interaction, HRI ’06, 33–40.. Association for Computing Machinery, New York. https://doi.org/10.1145/1121241.1121249 .

Murphy R, Schreckenghost D (2013) Survey of metrics for human-robot interaction In: Proceedings of the 8th ACM/IEEE International Conference on Human-Robot Interaction, HRI ’13, 197–198.. IEEE Press.

Yanco HA, Drury J (2004) Classifying human-robot interaction: an updated taxonomy In: 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No.04CH37583), 2841–28463. https://doi.org/10.1109/ICSMC.2004.1400763 .

Pervez A, Ryu J (2008) Safe physical human robot interaction–past, present and future. J Mech Sci Technol 22:469–483.

Onnasch L, Roesler E (2021) A taxonomy to structure and analyze human–robot interaction. Int J Soc Robot 13(4):833–849.

Haddadin S, Croft E (2016) Physical Human–Robot Interaction. In: Siciliano B Khatib O (eds)Springer Handbook of Robotics, 1835–1874.. Springer, Cham. https://doi.org/10.1007/978-3-319-32552-169 .

Gutzeit L, Otto M, Kirchner EA (2016) Simple and robust automatic detection and recognition of human movement patterns in tasks of different complexity In: Physiological Computing Systems, 39–57.. Springer, Berlin.

Kirchner EA, Fairclough SH, Kirchner F (2019) Embedded multimodal interfaces in robotics: applications, future trends, and societal implications In: The Handbook of Multimodal-Multisensor Interfaces: Language Processing, Software, Commercialization, and Emerging Directions-Volume 3, 523–576.

Haarnoja T, Ha S, Zhou A, Tan J, Tucker G, Levine S (2018) Learning to walk via deep reinforcement learning. arXiv preprint arXiv:1812.11103:1–10.

Tsarouchi P, Makris S, Chryssolouris G (2016) Human–robot interaction review and challenges on task planning and programming. Int J Comput Integr Manuf 29(8):916–931. https://doi.org/10.1080/0951192X.2015.1130251 .

Kim S, Kirchner E, Stefes A, Kirchner F (2017) Intrinsic interactive reinforcement learning–using error-related potentials for real world human-robot interaction. Sci Rep 7.

Williams T, Scheutz M (2017) The state-of-the-art in autonomous wheelchairs controlled through natural language: A survey. Robot Auton Syst 96:171–183.

Tellex S, Gopalan N, Kress-Gazit H, Matuszek C (2020) Robots that use language. Annu Rev Control Robot Auton Syst 3:25–55.

Landsiedel C, Rieser V, Walter M, Wollherr D (2017) A review of spatial reasoning and interaction for real-world robotics. Adv Robot 31(5):222–242.

Mei H, Bansal M, Walter MR (2016) Listen, attend, and walk: neural mapping of navigational instructions to action sequences In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2772–2778.

Taniguchi T, Mochihashi D, Nagai T, Uchida S, Inoue N, Kobayashi I, Nakamura T, Hagiwara Y, Iwahashi N, Inamura T (2019) Survey on frontiers of language and robotics. Adv Robot 33(15-16):700–730.

Steels L (2001) Language games for autonomous robots. IEEE Intell Syst 16(5):16–22.

Steels L (2015) The Talking Heads Experiment: Origins of Words and Meanings, vol. 1. Language Science Press.

Steels L (2008) The symbol grounding problem has been solved. so what’s next. Symbols Embodiment Debates Meaning Cogn:223–244.

Taniguchi T, Nagai T, Nakamura T, Iwahashi N, Ogata T, Asoh H (2016) Symbol emergence in robotics: a survey. Adv Robot 30(11-12):706–728.

Taniguchi T, Ugur E, Hoffmann M, Jamone L, Nagai T, Rosman B, Matsuka T, Iwahashi N, Oztop E, Piater J, et al. (2018) Symbol emergence in cognitive developmental systems: a survey. IEEE Trans Cogn Dev Syst 11(4):494–516.

Westlund JMK, Dickens L, Jeong S, Harris PL, DeSteno D, Breazeal CL (2017) Children use non-verbal cues to learn new words from robots as well as people. Int J Child-Computer Interact 13:1–9.

Anzalone SM, Boucenna S, Ivaldi S, Chetouani M (2015) Evaluating the engagement with social robots. Int J Soc Robot 7(4):465–478.

Mavridis N (2015) A review of verbal and non-verbal human–robot interactive communication. Robot Auton Syst 63:22–35.

Saunderson S, Nejat G (2019) How robots influence humans: A survey of nonverbal communication in social human–robot interaction. Int J Soci Robot 11(4):575–608.

Mathur MB, Reichling DB (2009) An uncanny game of trust: social trustworthiness of robots inferred from subtle anthropomorphic facial cues In: 2009 4th ACM/IEEE International Conference on Human-Robot Interaction (HRI), 313–314.. IEEE.

Natarajan M, Gombolay M (2020) Effects of anthropomorphism and accountability on trust in human robot interaction In: Proceedings of the 2020 ACM/IEEE International Conference on Human-Robot Interaction, 33–42.

Kanda T, Miyashita T, Osada T, Haikawa Y, Ishiguro H (2008) Analysis of humanoid appearances in human–robot interaction. IEEE Trans Robot 24(3):725–735.

Bartneck C, Bleeker T, Bun J, Fens P, Riet L (2010) The influence of robot anthropomorphism on the feelings of embarrassment when interacting with robots. Paladyn 1(2):109–115.

Murphy J, Gretzel U, Pesonen J (2019) Marketing robot services in hospitality and tourism: the role of anthropomorphism. J Travel Tourism Mark 36(7):784–795.

Mori M (1970) Bukimi no tani [The uncanny valley]. Energy 7:33–35.

Mori M, MacDorman KF, Kageki N (2012) The uncanny valley [from the field]. IEEE Robot Autom Mag 19(2):98–100.

De Visser EJ, Monfort SS, McKendrick R, Smith MA, McKnight PE, Krueger F, Parasuraman R (2016) Almost human: Anthropomorphism increases trust resilience in cognitive agents. J Exp Psychol Appl 22(3):331.

Bartneck C, Kanda T, Ishiguro H, Hagita N (2009) My robotic doppelgänger-a critical look at the uncanny valley In: RO-MAN 2009-The 18th IEEE International Symposium on Robot and Human Interactive Communication, 269–276.. IEEE.

Fink J (2012) Anthropomorphism and human likeness in the design of robots and human-robot interaction In: International Conference on Social Robotics, 199–208.. Springer.

Złotowski J, Proudfoot D, Yogeeswaran K, Bartneck C (2015) Anthropomorphism: opportunities and challenges in human–robot interaction. Int J Soc Robot 7(3):347–360.

Khambhaita H, Alami R (2020) Viewing robot navigation in human environment as a cooperative activity. In: Amato NM, Hager G, Thomas S, Torres-Torriti M (eds)Robotics Research, 285–300.. Springer, Cham.

Khamassi M, Girard B, Clodic A, Sandra D, Renaudo E, Pacherie E, Alami R, Chatila R (2016) Integration of action, joint action and learning in robot cognitive architectures. Intellectica-La revue de l’Association pour la Recherche sur les sciences de la Cognition (ARCo) 2016(65):169–203.

Billard AG, Calinon S, Dillmann R (2016) Learning from Humans(Siciliano B, Khatib O, eds.). Springer, Cham.

Gracia L, Pérez-Vidal C, Mronga D, Paco J, Azorin J-M, Gea J (2017) Robotic manipulation for the shoe-packaging process. Int J Adv Manuf. Technol. 92:1053–1067.

Chatila R, Renaudo E, Andries M, Chavez-Garcia R-O, Luce-Vayrac P, Gottstein R, Alami R, Clodic A, Devin S, Girard B, Khamassi M (2018) Toward self-aware robots. Front Robot AI 5:88. https://doi.org/10.3389/frobt.2018.00088 .

de Gea Fernández J, Mronga D, Günther M, Knobloch T, Wirkus M, Schröer M, Trampler M, Stiene S, Kirchner E, Bargsten V, Bänziger T, Teiwes J, Krüger T, Kirchner F (2017) Multimodal sensor-based whole-body control for human–robot collaboration in industrial settings. Robot Auton Syst 94:102–119. https://doi.org/10.1016/j.robot.2017.04.007 .

Aggarwal A, Kampmann P (2012) Tactile sensors based object recognition and 6d pose estimation In: ICIRA.. Springer, Berlin.

Veruggio G, Operto F, Bekey G (2016) Roboethics: Social and Ethical Implications(Siciliano B, Khatib O, eds.). Springer, Cham.

Iacca G, Lagioia F, Loreggia A, Sartor G (2020) A genetic approach to the ethical knob In: Legal Knowledge and Information Systems. JURIX 2020: The Thirty-third Annual Conference, Brno, Czech Republic, December 9–11, 2020, 103–112.. IOS Press BV, 2020, 334.

Dignum V (2019) Responsible Artificial Intelligence: How to Develop and Use AI in a Responsible Way. Springer, Berlin.

Jobin A, Ienca M, Vayena E (2019) The global landscape of ai ethics guidelines. Nat Mach Intell 1(9):389–399. https://doi.org/10.1038/s42256-019-0088-2 .

Goff LKL, Mukhtar G, Coninx A, Doncieux S (2019) Bootstrapping robotic ecological perception from a limited set of hypotheses through interactive perception. arXiv preprint arXiv:1901.10968.

Goff LKL, Yaakoubi O, Coninx A, Doncieux S (2019) Building an affordances map with interactive perception. arXiv preprint arXiv:1903.04413.


The project has received funding from the European Union’s Horizon 2020 research and innovation programme Project HumanE-AI-Net under grant agreement No 952026.

Author information

Authors and Affiliations

Institute of Intelligent Systems and Robotics (ISIR), Sorbonne Université, CNRS, Paris, France

Stephane Doncieux & Raja Chatila

Robotics Innovation Center, DFKI GmbH (German Research Center for Artificial Intelligence), Bremen, DE, Germany

Sirko Straube & Frank Kirchner

Faculty of Mathematics and Computer Science, Robotics Group, University of Bremen, Bremen, DE, Germany

Frank Kirchner


Contributions

All authors have contributed to the text and approved the final manuscript.

Corresponding author

Correspondence to Stephane Doncieux .

Ethics declarations

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Doncieux, S., Chatila, R., Straube, S. et al. Human-centered AI and robotics. AI Perspect 4 , 1 (2022). https://doi.org/10.1186/s42467-021-00014-x


Received : 02 June 2021

Accepted : 27 October 2021

Published : 28 January 2022

DOI : https://doi.org/10.1186/s42467-021-00014-x


Keywords: Human-centered; Human robot interaction



A review of artificial intelligence and robotics in transformed health ecosystems


  • 1 Institute for Medical Information, Bern University of Applied Sciences, Bern, Switzerland
  • 2 Object Management Group, Needham, MA, United States

Health care is shifting toward becoming proactive according to the concept of P5 medicine–a predictive, personalized, preventive, participatory and precision discipline. This patient-centered care heavily leverages the latest technologies of artificial intelligence (AI) and robotics that support diagnosis, decision making and treatment. In this paper, we present the role of AI and robotic systems in this evolution, including example use cases. We categorize systems along multiple dimensions such as the type of system, the degree of autonomy, the care setting where the systems are applied, and the application area. These technologies have already achieved notable results in the prediction of sepsis or cardiovascular risk, the monitoring of vital parameters in intensive care units, or in the form of home care robots. Yet while much research is conducted around AI and robotics in health care, adoption in real-world care settings remains limited. To remove adoption barriers, we need to address issues such as safety, security, privacy and ethical principles; detect and eliminate bias that could result in harmful or unfair clinical decisions; and build trust in and societal acceptance of AI.

The Need for AI and Robotics in Transformed Health Ecosystems

“Artificial intelligence (AI) is the term used to describe the use of computers and technology to simulate intelligent behavior and critical thinking comparable to a human being” ( 1 ). Machine learning enables AI applications to improve their algorithms automatically (i.e., without being explicitly programmed to do so) through experience gained from cognitive inputs or from data. AI solutions provide data and knowledge to be used by humans or other technologies. The possibility of machines behaving in such a way was originally raised by Alan Turing and further explored starting in the 1950s. Medical expert systems such as MYCIN, designed in the 1970s for medical consultations ( 2 ), were internationally recognized as a revolution supporting the development of AI in medicine. However, clinical acceptance was not very high. Similar disappointments across multiple domains led to the so-called “AI winter,” in part because rule-based systems do not allow the discovery of unknown relationships and in part because of the limitations in computing power at the time. Since then, computational power has increased enormously.

Over the centuries, we have improved our knowledge of the structure and function of the human body, starting with organs, tissues, cells and sub-cellular components, and have since advanced it to the molecular and sub-molecular level, including protein-coding genes, DNA sequences, non-coding RNA, etc., and their effects and behavior in the human body. This has resulted in a continuously improving understanding of the biology of diseases and disease progression ( 3 ). Nowadays, biomedical research and clinical practice are struggling with the size and complexity of the data produced by sequencing technologies, and with how to derive new diagnoses and treatments from it. Experimental results, often hidden in clinical data warehouses, must be aggregated, analyzed, and exploited to derive new, detailed and data-driven knowledge of diseases and enable better decision making.

New tools based on AI have been developed to predict disease recurrence and progression ( 4 ) or response to treatment; and robotics, often categorized as a branch of AI, plays an increasing role in patient care. In a medical context, AI means, for example, imitating the decision-making processes of health professionals ( 1 ). In contrast to AI, which generates data, robotics provides tangible outcomes or performs physical tasks. AI and robotics use knowledge and patient data for various tasks such as diagnosis; planning of surgeries; monitoring of patients' physical and mental wellness; and basic physical interventions to improve patient independence during physical or mental deterioration. We review concrete realizations in a later section of this paper.

These advances are causing a revolution in health care, enabling it to become proactive as called for by the concept of P5 medicine–a predictive, personalized, preventive, participatory and precision discipline ( 5 ). AI can help interpret personal health information together with other data to stratify diseases and to predict, stop or treat their progression.

In this paper, we describe the impact of AI and robotics on P5 medicine and introduce example use cases. We then discuss challenges faced by these developments. We conclude with recommendations to help AI and robotics transform health ecosystems. We extensively refer to appropriate literature for details on the underlying methods and technologies. Note that we concentrate on applications in the care setting and will not address in more detail the systems used for the education of professionals, logistics, or related to facility management–even though there are clearly important applications of AI in these areas.

Classification of AI and Robotic Systems in Medicine

We can classify the landscape of AI and robotic systems in health care according to different dimensions ( Figure 1 ): use, task, technology. Within the “use” dimension, we can further distinguish the application area or the care setting. The “task” dimension is characterized by the system's degree of autonomy. Finally, regarding the “technology” dimension, we consider the degree of intrusion into a patient and the type of system. Clearly, this is a simplification and aggregation: AI algorithms as such will not be located in a patient etc.

Figure 1. Categorization of systems based on AI and robotics in health care.
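The dimensions of Figure 1 can be thought of as fields of a simple record. The sketch below illustrates this with a few invented example systems; the category values are illustrative, not a standardized taxonomy.

```python
# Sketch of the classification dimensions as a data structure: one record per
# system, with the dimensions discussed in this section. All example systems
# and category values are invented for illustration.
from dataclasses import dataclass

@dataclass
class HealthAISystem:
    name: str
    system_type: str       # "virtual", "physical" or "hybrid"
    autonomy: str          # "assistive" ... "autonomous"
    intrusion: str         # "inside body", "on body", "outside body"
    care_setting: str      # e.g., "home", "inpatient hospital"
    application_area: str  # e.g., "surgery", "rehabilitation", "telemedicine"

examples = [
    HealthAISystem("sepsis early-warning model", "virtual", "assistive",
                   "outside body", "inpatient hospital", "prediction"),
    HealthAISystem("rehabilitation exoskeleton", "physical", "assistive",
                   "on body", "rehabilitation clinic", "rehabilitation"),
    HealthAISystem("drug-delivery microrobot", "hybrid", "autonomous",
                   "inside body", "inpatient hospital", "targeted therapy"),
]
for s in examples:
    print(s)
```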

Classification Based on Type of System

We can distinguish two types of such systems: virtual and physical ( 6 ).

• Virtual systems (relating to AI systems) range from applications such as electronic health record (EHR) systems, or text and data mining applications, to systems supporting treatment decisions.

• Physical systems relate to robotics and include robots that assist in performing surgeries, smart prostheses for handicapped people, and physical aids for elderly care.

There can also be hybrid systems combining AI with robotics, such as social robots that interact with users or microrobots that deliver drugs inside the body.

All these systems build on the enabling technologies of data and algorithms (see Figure 2 ). For example, a robotic system may collect data from different sensors–visual, physical, auditory or chemical. The robot's processor manipulates, analyzes, and interprets the data. Actuators enable the robot to perform different functions, including visual, physical, auditory or chemical responses.

Figure 2. Types of AI-based systems and enabling technologies.

Two kinds of data are required: data that captures the knowledge and experience gained by the system during diagnosis and treatment, usually through machine learning; and individual patient data, which AI can assess and analyze to derive recommendations. Data can be obtained from physical sensors (wearable, non-wearable), from biosensors ( 7 ), or from other information systems such as an EHR application. From the collected data, digital biomarkers can be derived that AI can analyze and interpret ( 8 ).

AI-specific algorithms and methods allow data analysis, reasoning, and prediction. AI consists of a growing number of subfields such as machine learning (supervised, unsupervised, and reinforcement learning), machine vision, natural language processing (NLP) and more. NLP enables computers to process and understand natural language (written or spoken). Machine vision or computer vision extracts information from images. An authoritative taxonomy of AI does not exist yet, although several standards bodies have started addressing this task.

AI methodologies can be divided into knowledge-based AI and data-driven AI ( 9 ); a toy contrast of the two is sketched after the list below.

• Knowledge-based AI models human knowledge by asking experts for relevant concepts and knowledge they use to solve problems. This knowledge is then formalized in software ( 9 ). This is the form of AI closest to the original expert systems of the 1970s.

• Data-driven AI starts from large amounts of data, which are typically processed by machine learning methods to learn patterns that can be used for prediction. Virtual or augmented reality and other types of visualizations can be used to present and explore data, which helps understand relations among data items that are relevant for diagnosis ( 10 ).
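The following toy sketch contrasts the two approaches on the same trivially small task: a hand-written expert rule versus a model fitted to synthetic labeled data. Thresholds, data and model choice are illustrative only.

```python
# Knowledge-based vs. data-driven AI on the same toy task (flagging a fever).

# Knowledge-based: the rule is elicited from experts and written down directly.
def fever_rule(temp_c: float) -> bool:
    return temp_c >= 38.0

# Data-driven: the decision boundary is learned from labeled examples.
from sklearn.linear_model import LogisticRegression

temps = [[36.5], [36.9], [37.2], [38.1], [38.6], [39.3]]   # synthetic data
labels = [0, 0, 0, 1, 1, 1]                                # 1 = fever
model = LogisticRegression(max_iter=1000).fit(temps, labels)

print(fever_rule(38.2), model.predict([[38.2]])[0])
```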

To more fully exploit the knowledge captured in computerized models, the concept of digital twin has gained traction in the medical field ( 11 ). The terms “digital patient model,” “virtual physiological human,” or “digital phenotype” designate the same idea. A digital twin is a virtual model fed by information coming from wearables ( 12 ), omics, and patient records. Simulation, AI and robotics can then be applied to the digital twin to learn about the disease progression, to understand drug responses, or to plan surgery, before intervening on the actual patient or organ, effecting a significant digital transformation of the health ecosystems. Virtual organs (e.g., a digital heart) are an application of this concept ( 13 ). A digital twin can be customized to an individual patient, thus improving diagnosis.

Regardless of the specific kind of AI, there are some requirements that all AI and robotic systems must meet. They must be:

• Adaptive . Transformed health ecosystems evolve rapidly, especially since according to P5 principles they adapt treatment and diagnosis to individual patients.

• Context-aware . They must infer the current activity state of the user and the characteristics of the environment in order to manage information content and distribution.

• Interoperable . A system must be able to exchange data and knowledge with other ones ( 14 ). This requires common semantics between systems, which is the object of standard terminologies, taxonomies or ontologies such as SNOMED CT. NLP can also help with interoperability ( 15 ).

Classification Based on Degree of Autonomy

AI and robotic systems can be grouped along an assistive-to-autonomous axis ( Figure 3 ). Assistive systems augment the capabilities of their user by aggregating and analyzing data, performing concrete tasks under human supervision [for example, a semiautonomous ultrasound scanner ( 17 )], or learning how to perform tasks from a health professional's demonstrations. For example, a robot may learn from a physiotherapist how to guide a patient through repetitive rehabilitation exercises ( 18 ).

Figure 3. Levels of autonomy of robotic and AI systems [following models proposed by ( 16 )].

Autonomous systems respond to real-world conditions, make decisions, and perform actions with minimal or no interaction with a human ( 19 ). They may be encountered in a clinical setting (autonomous implanted devices), in support functions to provide assistance 1 (carrying things around in a facility), or in automating non-physical work, such as a digital receptionist handling patient check-in ( 20 ).

Classification Based on Application Area

The diversity of users of AI and robotics in health care implies an equally broad range of application areas described below.

Robotics and AI for Surgery

Robotics-assisted surgery, “the use of a mechanical device to assist surgery in place of a human-being or in a human-like way” ( 21 ) is rapidly impacting many common general surgical procedures, especially minimally invasive surgery. Three types of robotic systems are used in surgery:

• Active systems undertake pre-programmed tasks while remaining under the control of the operating surgeon;

• Semi-active systems allow a surgeon to complement the system's pre-programmed component;

• Master–slave systems lack any autonomous elements; they entirely depend on a surgeon's activity. In laparoscopic surgery or in teleoperation, the surgeon's hand movements are transmitted to surgical instruments, which reproduce them.

Surgeons can also be supported by navigation systems, which localize positions in space and help answer a surgeon's anatomical orientation questions. Real-time tracking of markers, realized in modern surgical navigation systems using a stereoscopic camera emitting infrared light, can determine the 3D position of prominent structures ( 22 ).

Robotics and AI for Rehabilitation

Various AI and robotic systems support rehabilitation tasks such as monitoring, risk prevention, or treatment ( 23 ). For example, fall detection systems ( 24 ) use smart sensors placed within an environment or in a wearable device, and automatically alert medical staff, emergency services, or family members if assistance is required. AI allows these systems to learn the normal behavioral patterns and characteristics of individuals over time. Moreover, systems can assess environmental risks, such as household lights that are off or proximity to fall hazards (e.g., stairwells). Physical systems can provide physical assistance (e.g., lifting items, opening doors), monitoring, and therapeutic social functions ( 25 ). Robotic rehabilitation applications can provide both physical and cognitive support to individuals by monitoring physiological progress and promoting social interaction. Robots can support patients in recovering motions after a stroke using exoskeletons ( 26 ), or recovering or supplementing lost function ( 27 ). Beyond directly supporting patients, robots can also assist caregivers. An overview on home-based rehabilitation robots is given by Akbari et al. ( 28 ). Virtual reality and augmented reality allow patients to become immersed within and interact with a 3D model of a real or imaginary world, allowing them to practice specific tasks ( 29 ). This has been used for motor function training, recovery after a stroke ( 30 ) and in pain management ( 31 ).

Robotics and AI for Telemedicine

Systems supporting telemedicine cover, among other things, triage, diagnosis, non-surgical treatment, surgical treatment, consultation, monitoring, and the provision of specialty care ( 32 ).

• Medical triage assesses current symptoms, signs, and test results to determine the severity of a patient's condition and the treatment priority. An increasing number of mobile health applications based on AI are used for diagnosis or treatment optimization ( 33 ).

• Smart mobile and wearable devices can be integrated into “smart homes” using Internet-of-Things (IoT) technologies. They can collect patient and contextual data, assist individuals with everyday functioning, monitor progress toward individualized care and rehabilitation goals, issue reminders, and alert care providers if assistance is required.

• Telemedicine for specialty care includes additional tools to track mood and behavior (e.g., pain diaries). AI-based chatbots can mitigate social isolation in home care environments 2 by offering companionship and emotional support to users, noting if they are not sleeping well, are in pain or are depressed, which could indicate a more complex mental condition ( 34 ).

• Beyond this, there are physical systems that can deliver specialty care: Robot DE NIRO can interact naturally, reliably, and safely with humans, autonomously navigate through environments on command, intelligently retrieve or move objects ( 35 ).

Robotics and AI for Prediction and Precision Medicine

Precision medicine considers the individual patients, their genomic variations as well as contributing factors (age, gender, ethnicity, etc.), and tailors interventions accordingly ( 8 ). Digital health applications can also incorporate data such as emotional state, activity, food intake, etc. Given the amount and complexity of data this requires, AI can learn from comprehensive datasets to predict risks and identify the optimal treatment strategy ( 36 ). Clinical decision support systems (CDSS) that integrate AI can provide differential diagnoses, recognize early warning signs of patient morbidity or mortality, or identify abnormalities in radiological images or laboratory test results ( 37 ). They can increase patient safety, for example by reducing medication or prescription errors or adverse events and can increase care consistency and efficiency ( 38 ). They can support clinical management by ensuring adherence to the clinical guidelines or automating administrative functions such as clinical and diagnostic encoding ( 39 ), patient triage or ordering of procedures ( 37 ).
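As a schematic illustration of how such a CDSS component might be structured, the sketch below trains a model on synthetic records and raises an alert when a new patient's predicted risk exceeds a threshold. Features, data, labels and the threshold are invented for illustration and carry no clinical meaning.

```python
# Minimal sketch of a CDSS-style risk model: train on synthetic historical
# records, then score a new patient and raise an alert above a threshold.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# columns: age, systolic BP, heart rate, creatinine (all synthetic)
X = rng.normal(loc=[65, 120, 80, 1.0], scale=[10, 15, 12, 0.3], size=(500, 4))
y = (X[:, 2] > 90).astype(int) & (X[:, 3] > 1.2).astype(int)  # synthetic label

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

new_patient = np.array([[72, 135, 97, 1.4]])
risk = model.predict_proba(new_patient)[0, 1]
if risk > 0.5:
    print(f"Deterioration risk {risk:.2f}: notify clinician for review")
```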

AI and Agents for Management and Support Tasks

NLP applications, such as voice transcription, have proved helpful for clinical note-taking ( 40 ), compiling electronic health records, and automatically generating medical reports from patient-doctor conversations or diagnostic reports ( 41 ). AI algorithms can help retrieve context-relevant patient data. Concept-based information retrieval can improve search accuracy and retrieval speed ( 42 ). AI algorithms can improve the use and allocation of hospital resources by predicting patients' length of stay ( 43 ) or risk of re-admission ( 44 ).

Classification Based on Degree of Intrusion Into a Patient

Robotic systems can be used inside the body, on the body or outside the body. Those applied inside the body include microrobots ( 45 ), surgical robots and interventional robots. Microrobots are sub-millimeter untethered devices that can be propelled for example by chemical reactions ( 46 ), or physical fields ( 47 ). They can move unimpeded through the body and perform tasks such as targeted therapy (localized delivery of drugs) ( 48 ).

Microrobots can assist in physical surgery, for example by drilling through a blood clot or by opening up obstructions in the urinary tract to restore normal flow ( 49 ). They can provide directed local tissue heating to destroy cancer cells ( 50 ). They can be implanted to provide continuous remote monitoring and early awareness of an emerging disease.

Robotic prostheses, orthoses and exoskeletons are examples of robotic systems worn on the body. Exoskeletons are wearable robotic systems that are tightly physically coupled with a human body to provide assistance or enhance the wearer's physical capabilities ( 51 ). While they have often been developed for applications outside of health care, they can help workers with physically demanding tasks such as moving patients ( 52 ) or assist people with muscle weakness or movement disorders. Wearable technology can also be used to measure and transmit data about vital signs or physical activity ( 19 ).

Robotic systems applied outside the body can help avoid direct contact when treating patients with infectious diseases ( 53 ), assist in surgery (as already mentioned), including remote surgical procedures that leverage augmented reality ( 54 ) or assist providers when moving patients ( 55 ).

Classification Based on Care Setting

Another dimension of AI and robotics is the duration of their use, which directly correlates with the location of use. Both can significantly influence the requirements, design, and technology components of the solution. In a longer-term care setting, robotics can be used in a patient's home (e.g., for monitoring of vital signs) or for treatment in a nursing home. Shorter-term care settings include inpatient hospitals, palliative care facilities or inpatient psychiatric facilities. Example applications are listed in Table 1 .

Table 1. Classification by care setting.

Sample Realizations

Having seen how to classify AI and robotic systems in health care, we now turn to recent concrete achievements that illustrate their practical application. This list is by no means exhaustive, but it shows that we are no longer purely at the research or experimentation stage: the technology is starting to bear fruit in a very concrete way, that is, by improving outcomes, even if sometimes only in the context of clinical trials prior to regulatory approval for general use.

Sepsis Onset Prediction

Sepsis was recently identified as the leading cause of death worldwide, surpassing even cancer or cardiovascular diseases. 3 And while timely diagnosis and treatment are difficult in other care settings as well, it is also the leading cause of death in hospitals in the United States (Sepsis Fact Sheet 4 ). A key reason is the difficulty of recognizing precursor symptoms early enough to initiate effective treatment. Early onset prediction therefore promises to save millions of lives each year. Here are four such projects (a simplified sketch of the underlying idea follows the list):

• Bayesian Health 5 , a startup founded by a researcher at Johns Hopkins University, applied its model to a test population of hospital patients and correctly identified 82% of the 9,800 patients who later developed sepsis.

• Dascena, a California startup, has been testing its software on large cohorts of patients since 2017, achieving significant improvements in outcomes ( 63 ).

• Patchd 6 uses wearable devices and deep learning to predict sepsis in high-risk patients. Early studies have shown that this technology can predict sepsis 8 h earlier, and more accurately, than under existing standards of care.

• A team of researchers from Singapore developed a system that combines clinical measures (structured data) with physician notes (unstructured data), resulting in improved early detection while reducing false positives ( 64 ).
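The sketch below illustrates, in a deliberately simplified form, the kind of windowed vital-sign features such systems work from. The data, window length and alert rule are invented and stand in for the proprietary models of the projects above.

```python
# Sketch of the time-series side of sepsis early warning: summarize the last
# few hours of vital signs into window features that a model (or here, a
# crude rule standing in for one) can act on. All values are synthetic.
import numpy as np

def window_features(heart_rate, resp_rate, temperature):
    """Summarize recent vital-sign samples (e.g., one per 15 min) into features."""
    return {
        "hr_mean": float(np.mean(heart_rate)),
        "hr_trend": float(np.polyfit(range(len(heart_rate)), heart_rate, 1)[0]),
        "rr_max": float(np.max(resp_rate)),
        "temp_last": float(temperature[-1]),
    }

def simple_alert(f):
    # Crude stand-in for a learned model: rising heart rate plus tachypnea.
    return f["hr_trend"] > 0.5 and f["rr_max"] > 22

hr = [82, 85, 88, 92, 96, 101, 107, 112]      # synthetic 2-hour window
rr = [16, 17, 18, 20, 22, 24, 25, 26]
temp = [37.1, 37.3, 37.6, 37.9, 38.2, 38.4, 38.6, 38.8]

f = window_features(hr, rr, temp)
print(f, "-> alert" if simple_alert(f) else "-> no alert")
```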

Monitoring Systems in the Intensive Care Unit

For patients in an ICU, the paradox is that large amounts of data are collected, displayed on monitors, and used to trigger alarms, but these various data streams are rarely used together, nor can doctors or nurses effectively observe all the data from all the patients all the time.

This is an area where much has been written, but most available information points to studies that have not resulted in actual deployments. A survey paper alluded in particular to the challenge of achieving effective collaboration between ICU staff and automated processes ( 65 ).

In one application example, machine learning helps resolve the asynchrony between a mechanical ventilator and the patient's own breathing reflexes, which can cause distress and complicate recovery ( 66 ).

Tumor Detection From Image Analysis

This is another area where research has provided evidence of the efficacy of AI, generally not employed alone but rather as an advisor to a medical professional, yet there are few actual deployments at scale.

These applications differ based on the location of the tumors, and therefore on the imaging techniques used to observe them. AI makes the interpretation of the images more reliable, generally by pointing radiologists to areas they might otherwise overlook.

• In a study performed in Korea, AI appeared to improve the recognition of lung cancer in chest X-rays ( 67 ). AI by itself performed better than unaided radiologists, and the improvement was greater when AI was used as an aid by radiologists. Note however that the sample size was fairly small.

• Several successive efforts aimed to use AI to classify dermoscopic images to discriminate between benign nevi and melanoma ( 68 ).

AI for COVID-19 Detection

The rapid and tragic emergence of COVID-19, and its continued evolution at the time of this writing, have mobilized many researchers, including the AI community. This domain naturally divides into two areas, diagnosis and treatment.

An example of AI applied to COVID-19 diagnosis is based on the early observation that the persistent cough that is one of the common symptoms of the disease “sounds different” from the cough caused by other ailments, such as the common cold. The MIT Opensigma project 7 has “crowdsourced” sound recordings of coughs from many people, most of whom do not have the disease while some know that they have it or had it. Several similar projects have been conducted elsewhere ( 69 ).
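As a rough sketch of the generic pipeline behind such cough-screening efforts (not the method of any particular project), the example below extracts spectral features from short audio clips and fits a simple classifier. The audio is random noise standing in for real recordings, and the labels are placeholders.

```python
# Generic cough-classification pipeline sketch: clip -> spectral features
# (here MFCCs) -> classifier. Data and labels are synthetic placeholders.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def clip_features(audio, sr=16000):
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)           # one 13-dim vector per clip

rng = np.random.default_rng(0)
clips = [rng.standard_normal(16000).astype(np.float32) for _ in range(20)]
labels = [i % 2 for i in range(20)]    # placeholder 0/1 labels

X = np.stack([clip_features(c) for c in clips])
model = LogisticRegression(max_iter=1000).fit(X, labels)
print(model.predict(X[:3]))
```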

Another effort used AI to read computed tomography images to provide a rapid COVID-19 test, reportedly achieving over 90% accuracy in 15 s ( 70 ). Curiously, after this news was widely circulated in February-March 2020, nothing more was said for several months. Six months later, a blog post 8 from the University of Virginia radiology and medical department asserted that "CT scans and X-rays have a limited role in diagnosing coronavirus." The approach pioneered in China may have been the right solution at a specific point in time (many cases concentrated in a small geographical area, requiring a massive detection effort before other rapid tests were available), thus outweighing the drawbacks related to equipment cost and patient exposure to radiation.

Patient Triage and Symptom Checkers

While the word triage immediately evokes urgent decisions about what interventions to perform on acutely ill patients or accident victims, it can also be applied to remote patient assistance (e.g., telehealth applications), especially in areas underserved by medical staff and facilities.

In an emergency care setting, where triage decisions can result in the survival or death of a person, there is a natural reluctance to entrust such decisions to machines. However, AI as a predictor of outcomes could serve as an assistant to an emergency technician or doctor. A 2017 study of emergency room triage of patients with acute abdominal pain only showed an “acceptable level of accuracy” ( 71 ), but more recently, the Mayo Clinic introduced an AI-based “digital triage platform” from Diagnostic Robotics 9 to “perform clinical intake of patients and suggest diagnoses and hospital risk scores.” These solutions can now be delivered by a website or a smartphone app, and have evolved from decision trees designed by doctors to incorporate AI.

Cardiovascular Risk Prediction

Google Research announced in 2018 that it had achieved "prediction of cardiovascular risk factors from retinal fundus photographs via deep learning" with a level of accuracy similar to traditional methods such as blood tests for cholesterol levels ( 72 ). The novelty lies in the use of a neural network to analyze the retinal image, gaining predictive power at the expense of explainability.

In practice, the future of such a solution is unclear: certain risk factors could be assessed from the retinal scan, but those are often factors that can be measured directly anyway, such as blood pressure.

Gait Analysis

Many physiological and neurological factors affect how someone walks, given the complex interactions between the sense of touch, the brain, the nervous system, and the muscles involved. Certain conditions, in particular Parkinson's disease, have been shown to affect a person's gait, causing visible symptoms that can help diagnose the disease or measure its progress. Even if an abnormal gait results from another cause, an accurate analysis can help assess the risk of falls in elderly patients.

Compared to other applications in this section, gait analysis has been practiced for a longer time (over a century) and has progressed incrementally as new motion capture methods (film, video, infrared cameras) were developed. In terms of knowledge representation, see for example the work done at MIT twenty years ago ( 73 ). Computer vision, combined with AI, can considerably improve gait analysis compared to a physician's simple observation. Companies such as Exer 10 offer solutions that physical therapists can use to assess patients, or that can help monitor and improve a home exercise program. This is an area where technology has already been deployed at scale: there are more than 60 clinical and research gait labs 11 in the U.S. alone.
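As a minimal, hypothetical illustration of the computational step, the sketch below derives simple gait metrics (step count, step time, cadence) from the vertical trajectory of one ankle keypoint, the kind of signal a video pose-estimation pipeline would produce. The synthetic sine-wave signal and thresholds are stand-ins.

```python
# Minimal sketch of deriving gait metrics from a pose-estimation output: given the
# vertical position of one ankle over time, estimate step events and cadence.
import numpy as np

fps = 30.0                                         # assumed video frame rate
t = np.arange(0, 10, 1 / fps)                      # ten seconds of walking
ankle_height = 0.05 * np.sin(2 * np.pi * 0.9 * t)  # synthetic ~0.9 steps per second

# A step event is a local maximum of ankle height above a small threshold.
is_peak = (ankle_height[1:-1] > ankle_height[:-2]) & (ankle_height[1:-1] > ankle_height[2:])
step_frames = np.where(is_peak & (ankle_height[1:-1] > 0.02))[0] + 1

step_times = np.diff(step_frames) / fps            # seconds between consecutive steps
print("steps detected:", len(step_frames))
print("mean step time (s):", step_times.mean())
print("cadence (steps/min):", 60 / step_times.mean())
```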

Home Care Robots

Robots that provide assistance to elderly or sick persons have been the focus of research and development for several decades, particularly in Japan due to the country's large aging population with above-average longevity. “Elder care robots” can be deployed at home (with cost being an obvious issue for many customers) or in senior care environments ( 74 ), where they will help alleviate a severe shortage of nurses and specialized workers, which cannot be easily addressed through the hiring of foreign help given the language barrier.

The types of robots used in such settings are proliferating. They range from robots that help patients move or exercise, to robots that help with common tasks such as opening the front door to a visitor or bringing a cup of tea, to robots that provide psychological comfort and even some form of conversation. PARO, for instance, is a robotic baby seal developed to provide treatment to patients with dementia ( 75 ).

Biomechatronics

Biomechatronics combines biology, mechanical engineering, and electronics to design assistive devices that interpret inputs from sensors and send commands to actuators, with both sensors and actuators attached in some manner to the body. The sensors, actuators, control system, and the human subject together form a closed-loop control system.

Biomechatronic applications live at the boundary of prosthetics and robotics, for example to help amputees achieve close-to-normal motion of a prosthetic limb. This work has been demonstrated for many years, with impressive results, at the MIT Media Lab under Prof. Hugh Herr. 12 However, those applications have rarely left the lab environment due to the device cost. That cost could be lowered by production in large quantities, but coverage by health insurance companies or agencies is likely to remain problematic.
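To make the closed-loop idea concrete, here is a minimal sketch of a proportional-derivative control loop for a single prosthetic joint: a sensor reading of the joint angle is compared with a target, and the resulting torque command drives highly simplified joint dynamics. The gains, dynamics, and target angle are illustrative assumptions, not a real device controller.

```python
# Minimal sketch of a closed-loop (sensor -> controller -> actuator) joint controller.
# Gains, dynamics, and the target trajectory are illustrative only.
kp, kd = 4.0, 0.8           # controller gains (assumed)
dt = 0.01                   # 100 Hz control loop
angle, velocity = 0.0, 0.0  # current joint state (radians, rad/s)
target = 0.6                # desired joint angle from the gait planner (assumed)

prev_error = target - angle
for step in range(1000):
    error = target - angle                                # sensor reading vs. desired angle
    torque = kp * error + kd * (error - prev_error) / dt  # actuator command
    prev_error = error
    # Very simplified joint dynamics: torque changes velocity, velocity changes angle.
    velocity += (torque - 0.5 * velocity) * dt
    angle += velocity * dt

print(f"final angle after 10 s: {angle:.3f} rad (target {target} rad)")
```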

Mapping of Use Cases to Classification

Table 2 shows a mapping of the above use cases to the classification introduced in the first section of this paper.


Table 2. Mapping of use cases to our classification.

Adoption Challenges to AI and Robotics in Health Care

While the range of opportunities, and the achievements to date, of robotics and AI are impressive as seen above, multiple issues impede their deployment and acceptance in daily practice.

Issues related to trust, security, privacy and ethics are prevalent across all aspects of health care, and many are discussed elsewhere in this issue. We will therefore only briefly mention those challenges that are unique to AI and robotics.

Resistance to Technology

Health care professionals may ignore or resist new technologies for multiple reasons, including actual or perceived threats to professional status and autonomy ( 76 ), privacy concerns ( 77 ) or the unresolved legal and ethical questions of responsibility ( 78 ). The issues of worker displacement by robots are just as acute in health care as in other domains. Today, while surgery robots operate increasingly autonomously, humans still perform many tasks and play an essential role in determining the robot's course of operation (e.g., for selecting the process parameters or for the positioning of the patient) ( 79 ). This allocation of responsibilities is bound to evolve.

Transparency and Explainability

Explainability is "a characteristic of an AI-driven system allowing a person to reconstruct why a certain AI came up with the presented prediction" ( 80 ). In contrast to rule-based systems, AI-based predictions often cannot be explained in a human-intelligible manner, which can hide errors or bias (the "black box problem" of machine learning). The explainability of AI models is an ongoing research area. When information on the reasons for an AI-based decision is missing, physicians cannot judge the reliability of the advice and there is a risk to patient safety.
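One family of post-hoc explanation techniques estimates how much each input feature contributes to a model's predictions. The sketch below uses permutation importance on a synthetic stand-in model and dataset; it is an illustrative example of the general idea, not the method used by any particular clinical system.

```python
# Minimal sketch of permutation importance: shuffle one input feature at a time and
# measure how much the model's score drops. Model and data are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 4))                 # e.g. four lab/vital-sign features
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
for i, score in enumerate(result.importances_mean):
    print(f"feature {i}: importance {score:.3f}")
```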

Responsibility, Accountability and Liability

Who is responsible when the AI or robot makes mistakes or creates harm in patients? Is it the programmer, manufacturer, end user, the AI/robotic system itself, the provider of the training dataset, or something (or someone) else? The answer depends on the system's degree of autonomy. The European Parliament's 2017 Resolution on AI ( 81 ) assigns legal responsibility for an action of an AI or robotic system to a human actor, which may be its owner, developer, manufacturer or operator.

Data Protection

Machine learning requires access to large quantities of data regarding patients as well as healthy people. This raises issues regarding the ownership of data, protection against theft, compliance with regulations such as HIPAA in the U.S. ( 82 ) or GDPR for European citizens ( 83 ), and what level of anonymization of data is necessary and possible. Regarding the last point, AI models could have unintended consequences, and the evolution of science itself could make patient re-identification possible in the future.

Data Quality and Integration

Currently, the reliability and quality of data received from sensors and digital health devices remain uncertain ( 84 )–a fact that future research and development must address. Datasets in medicine are naturally imperfect (due to noise, errors in documentation, incompleteness, differences in documentation granularities, etc.), hence it is impossible to develop error-free machine learning models ( 80 ). Furthermore, without a way to quickly and reliably integrate the various data sources for analysis, there is lost potential for fast diagnosis by AI algorithms.

Safety and Security

Introducing AI and robotics into the delivery of health care is likely to create new risks and safety issues. Those will exist even under normal functioning circumstances, when they may be due to design, programming or configuration errors, or improper data preparation ( 85 ).

These issues only get worse when considering the probability of cyberattacks:

• Patient data may be exposed or stolen, perhaps by scammers who want to exploit it for profit.

• Security vulnerabilities in robots that interact directly with patients may cause malfunctions that physically threaten the patient or professional. The robot may cause harm directly, or indirectly by giving a surgeon incorrect feedback. In case of unexpected robot behavior, it may be unclear to the user whether the robot is functioning properly or is under attack ( 86 ).

The EU Commission recently drafted a legal framework 13 addressing the risks of AI (not only in health care) in order to improve the safety of and trust in AI. The framework distinguishes four levels of risks: unacceptable risk, high risk, limited risk and minimal risk. AI systems with unacceptable risks will be prohibited, while high-risk ones will have to meet strict obligations before release (e.g., risk assessment and mitigation, traceability of results). Limited-risk applications such as chatbots (which can be used in telemedicine) will require "labeling" so that users are made aware that they are interacting with an AI-powered system.

Bias and Fairness

While P5 medicine aims at considering multiple factors–ethnicity, gender, socio-economic background, education, etc.–to come up with individualized care, current implementations of AI often demonstrate potential biases toward certain groups of the patient population. The training datasets may have under-represented those groups, or important features may be distributed differently across groups–for example, cardiovascular disease or Parkinson's disease progress differently in men and women ( 87 ), so the corresponding features will vary. These causes result in undesirable bias and "unintended or unnecessary discrimination" of subgroups ( 88 ).

On the flip side, careful implementations of AI could explicitly consider differences in gender, ethnicity, and other factors to achieve more effective treatments for patients belonging to those groups. This can be considered "desirable bias" that counteracts the undesirable kind ( 89 ) and gets us closer to the goals of P5 medicine.
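A common first step toward detecting such bias is simply to report the same performance metric separately for each subgroup. The sketch below does this for a synthetic set of predictions in which one group is deliberately simulated as being served worse; the group names, labels, and error rates are placeholders.

```python
# Minimal sketch of a per-group performance audit to surface undesirable bias.
# Predictions, labels, and group assignments are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(3)
groups = rng.choice(["group_a", "group_b"], size=1000)   # e.g. sex or ethnicity
y_true = rng.integers(0, 2, size=1000)
# Simulate a model that is systematically worse for group_b.
noise = np.where(groups == "group_b", 0.35, 0.10)
flip = rng.random(1000) < noise
y_pred = np.where(flip, 1 - y_true, y_true)

for g in ["group_a", "group_b"]:
    mask = groups == g
    accuracy = (y_pred[mask] == y_true[mask]).mean()
    print(f"{g}: accuracy {accuracy:.2f} on {mask.sum()} patients")
```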

Trust–An Evolving Relationship

The relationship between patients and medical professionals has evolved over time, and AI is likely to impact it by inserting itself into the picture (see Figure 4 ). Even where AI and robotics perform well, human oversight remains essential. Robots and AI algorithms operate logically, but health care often requires acting empathically. If doctors become intelligent users of AI, they may retain the trust associated with their role, but most patients, who have a limited understanding of the technologies involved, would have much difficulty in trusting AI ( 90 ). Conversely, reliable and accurate diagnosis, beneficial treatment, and appropriate use of AI and robotics by the physician can strengthen the patient's trust ( 91 ).


Figure 4. Physician-patient-AI relationship.

This assumes of course that the designers of those systems adhere to established guidelines for trustworthy AI in the first place, which includes such requirements as creating systems that are lawful, ethical, and robust ( 92 , 93 ).

AI and Robotics for Transformed Health Care–A Converging Path

We can summarize the previous sections as follows:

1. There are many types of AI applications and robotic systems, which can be introduced in many aspects of health care.

2. AI's ability to digest and process enormous amounts of data, and derive conclusions that are not obvious to a human, holds the promise of more personalized and predictive care–key goals of P5 medicine.

3. There have been, over the last few years, a number of proof-of-concept and pilot projects that have exhibited promising results for diagnosis, treatment, and health maintenance. They have not yet been deployed at scale–in part because of the time it takes to fully evaluate their efficacy and safety.

4. There is a rather daunting list of challenges to address, most of which are not purely technical–the key one being demonstrating that the systems are effective and safe enough to warrant the confidence of both the practitioners and their patients.

Based on this analysis, what is the roadmap to success for these technologies, and how will they succeed in contributing to the future of health care? Figure 5 depicts the convergent approaches that need to be developed to ensure safe and productive adoption, in line with the P5 medicine principles.


Figure 5. Roadmap for transformed health care.

First, AI technology is currently undergoing a remarkable revival and being applied to many domains. Health applications will both benefit from and contribute to further advances. In areas such as image classification or natural language understanding, both of which have obvious utility in health care, the rate of progress is remarkable. Today's AI techniques may seem obsolete in ten years.

Second, the more technical challenges of AI–such as privacy, explainability, or fairness–are being worked on, both in the research community and in the legislative and regulatory world. Standard procedures for assessing the efficacy and safety of systems will be needed, but in reality, this is not a new concept: it is what has been developed over the years to approve new medicines. We need to be consistent and apply the same hard-headed validation processes to the new technologies.

Third, it should be clear from our exploration of this subject that education, of patients as well as of professionals, is key to the societal acceptance of the role that AI and robotics will be called upon to play. Every invention or innovation, from the steam engine to the telephone to the computer, has gone through this process. Practitioners must learn enough about how AI models and robotics work to build a "working relationship" with those tools and build trust in them–just as their predecessors learned to trust what they saw on an X-ray or CT scan. Patients, for their part, need to understand what AI and robotics can or cannot do, how the physician will remain in the loop when appropriate, and what data is being collected about them in the process. We will have a responsibility to ensure that complex systems that patients do not sufficiently understand cannot be misused against them, whether accidentally or deliberately.

Fourth, health care is also a business, involving financial transactions between patients, providers, and insurers (public or private, depending on the country). New cost and reimbursement models will need to be developed, especially given that when AI is used to assist professionals, not replace them, the cost of the system is additive to the human cost of assessing the data and reviewing the system's recommendations.

Fifth and last, clinical pathways have to be adapted and new role models for physicians have to be built. Clinical paths can already differ and make it harder to provide continuity of care to a patient who moves across care delivery systems that have different capabilities. This issue is being addressed by the BPM+ Health Community 14 using the business process, case management and decision modeling standards of the Object Management Group (OMG). The issue will become more complex by integrating AI and robotics: every doctor has similar training and a stethoscope, but not every doctor or hospital will have the same sensors, AI programs, or robots.

Eventually, the convergence of these approaches will help to build a complete digital patient model–a digital twin of each specific human being – generated out of all the data gathered from general practitioners, hospitals, laboratories, mHealth apps, and wearable sensors, along the entire life of the patient. At that point, AI will be able to support superior, fully personal and predictive medicine, while robotics will automate or support many aspects of treatment and care.

Data Availability Statement

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.

Author Contributions

KD came up with the classification of AI and robotic systems. CB identified concrete application examples. Both authors contributed equally, identified adoption challenges, and developed recommendations for future work. Both authors contributed to the article and approved the submitted version.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

1. ^ https://cmte.ieee.org/futuredirections/2019/07/21/autonomous-systems-in-healthcare/

2. ^ https://emag.medicalexpo.com/ai-powered-chatbots-to-help-against-self-isolation-during-covid-19/

3. ^ https://www.med.ubc.ca/news/sepsis-leading-cause-of-death-worldwide/

4. ^ https://www.sepsis.org/wp-content/uploads/2017/05/Sepsis-Fact-Sheet-2018.pdf

5. ^ https://medcitynews.com/2021/07/johns-hopkins-spinoff-looking-to-build-better-risk-prediction-tools-emerges-with-15m/

6. ^ https://www.patchdmedical.com/

7. ^ https://hisigma.mit.edu

8. ^ https://blog.radiology.virginia.edu/covid-19-and-imaging/

9. ^ https://hitinfrastructure.com/news/diagnostic-robotics-mayo-clinic-bring-triage-platform-to-patients

10. ^ https://www.exer.ai

11. ^ https://www.gcmas.org/map

12. ^ https://www.media.mit.edu/groups/biomechatronics/overview/

13. ^ https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai

14. ^ https://www.bpm-plus.org/

1. Amisha, Malik P, Pathania M, Rathaur VK. Overview of artificial intelligence in medicine. J Fam Med Prim Care. (2019) 8:2328–31. doi: 10.4103/jfmpc.jfmpc_440_19

2. van Melle W, Shortliffe EH, Buchanan BG. EMYCIN: a knowledge engineer's tool for constructing rule-based expert systems. In: Buchanan BG, Shortliffe EH, editors. Rule-Based Expert Systems . Reading, MA: Addison-Wesley Publishing Company (1984). p. 302–13.

3. Tursz T, Andre F, Lazar V, Lacroix L, Soria J-C. Implications of personalized medicine—perspective from a cancer center. Nat Rev Clin Oncol. (2011) 8:177–83. doi: 10.1038/nrclinonc.2010.222

4. van't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AAM, Mao M, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature. (2002) 415:530–6. doi: 10.1038/415530a

5. Auffray C, Charron D, Hood L. Predictive, preventive, personalized and participatory medicine: back to the future. Genome Med. (2010) 2:57. doi: 10.1186/gm178

6. Hamet P, Tremblay J. Artificial intelligence in medicine. Metabolism. (2017) 69:S36–40. doi: 10.1016/j.metabol.2017.01.011

7. Kim J, Campbell AS, de Ávila BE-F, Wang J. Wearable biosensors for healthcare monitoring. Nat Biotechnol. (2019) 37:389–406. doi: 10.1038/s41587-019-0045-y

8. Nam KH, Kim DH, Choi BK, Han IH. Internet of things, digital biomarker, and artificial intelligence in spine: current and future perspectives. Neurospine. (2019) 16:705–11. doi: 10.14245/ns.1938388.194

9. Steels L, Lopez de, Mantaras R. The Barcelona declaration for the proper development and usage of artificial intelligence in Europe. AI Commun. (2018) 31:485–94. doi: 10.3233/AIC-180607

10. Olshannikova E, Ometov A, Koucheryavy Y, Olsson T. Visualizing big data with augmented and virtual reality: challenges and research agenda. J Big Data. (2015) 2:22. doi: 10.1186/s40537-015-0031-2

11. Björnsson B, Borrebaeck C, Elander N, Gasslander T, Gawel DR, Gustafsson M, et al. Digital twins to personalize medicine. Genome Med. (2019) 12:4. doi: 10.1186/s13073-019-0701-3

12. Bates M. Health care chatbots are here to help. IEEE Pulse. (2019) 10:12–4. doi: 10.1109/MPULS.2019.2911816

13. Corral-Acero J, Margara F, Marciniak M, Rodero C, Loncaric F, Feng Y, et al. The “Digital Twin” to enable the vision of precision cardiology. Eur Heart J. (2020) 41:4556–64. doi: 10.1093/eurheartj/ehaa159

14. Montani S, Striani M. Artificial intelligence in clinical decision support: a focused literature survey. Yearb Med Inform. (2019) 28:120–7. doi: 10.1055/s-0039-1677911

15. Oemig F, Blobel B. Natural language processing supporting interoperability in healthcare. In: Biemann C, Mehler A, editors. Text Mining. Cham: Springer International Publishing (2014). p. 137–56. (Theory and Applications of Natural Language Processing). doi: 10.1007/978-3-319-12655-5_7

16. Bitterman DS, Aerts HJWL, Mak RH. Approaching autonomy in medical artificial intelligence. Lancet Digit Health. (2020) 2:e447–9. doi: 10.1016/S2589-7500(20)30187-4

17. Carriere J, Fong J, Meyer T, Sloboda R, Husain S, Usmani N, et al. An Admittance-Controlled Robotic Assistant for Semi-Autonomous Breast Ultrasound Scanning. In: 2019 International Symposium on Medical Robotics (ISMR). Atlanta, GA: IEEE (2019). p. 1–7. doi: 10.1109/ISMR.2019.8710206

18. Tao R, Ocampo R, Fong J, Soleymani A, Tavakoli M. Modeling and emulating a physiotherapist's role in robot-assisted rehabilitation. Adv Intell Syst. (2020) 2:1900181. doi: 10.1002/aisy.201900181

19. Tavakoli M, Carriere J, Torabi A. Robotics, smart wearable technologies, and autonomous intelligent systems for healthcare during the COVID-19 pandemic: an analysis of the state of the art and future vision. Adv Intell Syst. (2020) 2:2000071. doi: 10.1002/aisy.202000071

20. Ahn HS, Yep W, Lim J, Ahn BK, Johanson DL, Hwang EJ, et al. Hospital receptionist robot v2: design for enhancing verbal interaction with social skills. In: 2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN). New Delhi: IEEE (2019). p. 1–6. doi: 10.1109/RO-MAN46459.2019.8956300

21. Lane T. A short history of robotic surgery. Ann R Coll Surg Engl . (2018) 100:5–7. doi: 10.1308/rcsann.supp1.5

22. Mezger U, Jendrewski C, Bartels M. Navigation in surgery. Langenbecks Arch Surg. (2013) 398:501–14. doi: 10.1007/s00423-013-1059-4

23. Luxton DD, June JD, Sano A, Bickmore T. Intelligent mobile, wearable, and ambient technologies for behavioral health care. In: Artificial Intelligence in Behavioral and Mental Health Care . Elsevier (2016). p. 137–62. Available online at: https://linkinghub.elsevier.com/retrieve/pii/B9780124202481000064

24. Casilari E, Oviedo-Jiménez MA. Automatic fall detection system based on the combined use of a smartphone and a smartwatch. PLoS ONE. (2015) 10:e0140929. doi: 10.1371/journal.pone.0140929

25. Sriram KNV, Palaniswamy S. Mobile robot assistance for disabled and senior citizens using hand gestures. In: 2019 International Conference on Power Electronics Applications and Technology in Present Energy Scenario (PETPES) . Mangalore: IEEE (2019). p. 1–6. doi: 10.1109/PETPES47060.2019.9003821

26. Nibras N, Liu C, Mottet D, Wang C, Reinkensmeyer D, Remy-Neris O, et al. Dissociating sensorimotor recovery and compensation during exoskeleton training following stroke. Front Hum Neurosci. (2021) 15:645021. doi: 10.3389/fnhum.2021.645021

27. Maciejasz P, Eschweiler J, Gerlach-Hahn K, Jansen-Troy A, Leonhardt S. A survey on robotic devices for upper limb rehabilitation. J NeuroEngineering Rehabil. (2014) 11:3. doi: 10.1186/1743-0003-11-3

28. Akbari A, Haghverd F, Behbahani S. Robotic home-based rehabilitation systems design: from a literature review to a conceptual framework for community-based remote therapy during COVID-19 pandemic. Front Robot AI. (2021) 8:612331. doi: 10.3389/frobt.2021.612331

29. Howard MC. A meta-analysis and systematic literature review of virtual reality rehabilitation programs. Comput Hum Behav. (2017) 70:317–27. doi: 10.1016/j.chb.2017.01.013

30. Gorman C, Gustafsson L. The use of augmented reality for rehabilitation after stroke: a narrative review. Disabil Rehabil Assist Technol . (2020) 17:409–17. doi: 10.1080/17483107.2020.1791264

31. Li A, Montaño Z, Chen VJ, Gold JI. Virtual reality and pain management: current trends and future directions. Pain Manag. (2011) 1:147–57. doi: 10.2217/pmt.10.15

32. Tulu B, Chatterjee S, Laxminarayan S. A taxonomy of telemedicine efforts with respect to applications, infrastructure, delivery tools, type of setting and purpose. In: Proceedings of the 38th Annual Hawaii International Conference on System Sciences . Big Island, HI: IEEE (2005). p. 147.

33. Lai L, Wittbold KA, Dadabhoy FZ, Sato R, Landman AB, Schwamm LH, et al. Digital triage: novel strategies for population health management in response to the COVID-19 pandemic. Healthc Amst Neth. (2020) 8:100493. doi: 10.1016/j.hjdsi.2020.100493

34. Valtolina S, Marchionna M. Design of a chatbot to assist the elderly. In: Fogli D, Tetteroo D, Barricelli BR, Borsci S, Markopoulos P, Papadopoulos GA, Editors. End-User Development . Cham: Springer International Publishing (2021). p. 153–68. (Lecture Notes in Computer Science; Bd. 12724).

35. Falck F, Doshi S, Tormento M, Nersisyan G, Smuts N, Lingi J, et al. Robot DE NIRO: a human-centered, autonomous, mobile research platform for cognitively-enhanced manipulation. Front Robot AI. (2020) 7:66. doi: 10.3389/frobt.2020.00066

36. Bohr A, Memarzadeh K, editors. The rise of artificial intelligence in healthcare applications. In: Artificial Intelligence in Healthcare. Oxford: Elsevier (2020). p. 25–60. doi: 10.1016/B978-0-12-818438-7.00002-2

37. Sutton RT, Pincock D, Baumgart DC, Sadowski DC, Fedorak RN, Kroeker KI. An overview of clinical decision support systems: benefits, risks, and strategies for success. NPJ Digit Med. (2020) 3:17. doi: 10.1038/s41746-020-0221-y

38. Saddler N, Harvey G, Jessa K, Rosenfield D. Clinical decision support systems: opportunities in pediatric patient safety. Curr Treat Options Pediatr. (2020) 6:325–35. doi: 10.1007/s40746-020-00206-3

39. Deng H, Wu Q, Qin B, Chow SSM, Domingo-Ferrer J, Shi W. Tracing and revoking leaked credentials: accountability in leaking sensitive outsourced data. In: Proceedings of the 9th ACM Symposium on Information, Computer and Communications Security . New York, NY: Association for Computing Machinery (2014). p. 425–34. (ASIA CCS'14). doi: 10.1145/2590296.2590342

40. Leventhal R. How Natural Language Processing is Helping to Revitalize Physician Documentation . Cleveland, OH: Healthc Inform (2017). Vol. 34, p. 8–13.

41. Gu Q, Nie C, Zou R, Chen W, Zheng C, Zhu D, et al. Automatic generation of electromyogram diagnosis report. In: 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) . Seoul: IEEE (2020). p. 1645–50.

42. Jain V, Wason R, Chatterjee JM, Le D-N, editor. Ontology-Based Information Retrieval For Healthcare Systems. 1 st ed . Wiley-Scrivener (2020). doi: 10.1002/9781119641391

43. Awad A, Bader–El–Den M, McNicholas J. Patient length of stay and mortality prediction: a survey. Health Serv Manage Res. (2017) 30:105–20. doi: 10.1177/0951484817696212

44. Mahajan SM, Mahajan A, Nguyen C, Bui J, Abbott BT, Osborne TF. Predictive models for identifying risk of readmission after index hospitalization for hip arthroplasty: a systematic review. J Orthop. (2020) 22:73–85. doi: 10.1016/j.jor.2020.03.045

45. Ceylan H, Yasa IC, Kilic U, Hu W, Sitti M. Translational prospects of untethered medical microrobots. Prog Biomed Eng. (2019) 1:012002. doi: 10.1088/2516-1091/ab22d5

46. Sánchez S, Soler L, Katuri J. Chemically powered micro- and nanomotors. Angew Chem Int Ed Engl. (2015) 54:1414–44. doi: 10.1002/anie.201406096

47. Schuerle S, Soleimany AP, Yeh T, Anand GM, Häberli M, Fleming HE, et al. Synthetic and living micropropellers for convection-enhanced nanoparticle transport. Sci Adv. (2019) 5:eaav4803. doi: 10.1126/sciadv.aav4803

48. Erkoc P, Yasa IC, Ceylan H, Yasa O, Alapan Y, Sitti M. Mobile microrobots for active therapeutic delivery. Adv Ther. (2019) 2:1800064. doi: 10.1002/adtp.201800064

49. Yu C, Kim J, Choi H, Choi J, Jeong S, Cha K, et al. Novel electromagnetic actuation system for three-dimensional locomotion and drilling of intravascular microrobot. Sens Actuators Phys. (2010) 161:297–304. doi: 10.1016/j.sna.2010.04.037

50. Chang D, Lim M, Goos JACM, Qiao R, Ng YY, Mansfeld FM, et al. Biologically Targeted magnetic hyperthermia: potential and limitations. Front Pharmacol. (2018) 9:831. doi: 10.3389/fphar.2018.00831

51. Phan GH. Artificial intelligence in rehabilitation evaluation based robotic exoskeletons: a review. EEO. (2021) 20:6203–11. doi: 10.1007/978-981-16-9551-3_6

52. Hwang J, Kumar Yerriboina VN, Ari H, Kim JH. Effects of passive back-support exoskeletons on physical demands and usability during patient transfer tasks. Appl Ergon. (2021) 93:103373. doi: 10.1016/j.apergo.2021.103373

53. Hager G, Kumar V, Murphy R, Rus D, Taylor R. The Role of Robotics in Infectious Disease Crises. ArXiv201009909 Cs (2020).

54. Walker ME, Hedayati H, Szafir D. Robot teleoperation with augmented reality virtual surrogates. In: 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI) . Daegu: IEEE (2019). p. 202–10. doi: 10.1109/HRI.2019.8673306

55. Ding M, Matsubara T, Funaki Y, Ikeura R, Mukai T, Ogasawara T. Generation of comfortable lifting motion for a human transfer assistant robot. Int J Intell Robot Appl. (2017) 1:74–85. doi: 10.1007/s41315-016-0009-z

56. Mohebali D, Kittleson MM. Remote monitoring in heart failure: current and emerging technologies in the context of the pandemic. Heart. (2021) 107:366–72. doi: 10.1136/heartjnl-2020-318062

57. Blasco R, Marco Á, Casas R, Cirujano D, Picking R. A smart kitchen for ambient assisted living. Sensors. (2014) 14:1629–53. doi: 10.3390/s140101629

58. Valentí Soler M, Agüera-Ortiz L, Olazarán Rodríguez J, Mendoza Rebolledo C, Pérez Muñoz A, Rodríguez Pérez I, et al. Social robots in advanced dementia. Front Aging Neurosci. (2015) 7:133. doi: 10.3389/fnagi.2015.00133

59. Bickmore TW, Mitchell SE, Jack BW, Paasche-Orlow MK, Pfeifer LM, O'Donnell J. Response to a relational agent by hospital patients with depressive symptoms. Interact Comput. (2010) 22:289–98. doi: 10.1016/j.intcom.2009.12.001

60. Chatzimina M, Koumakis L, Marias K, Tsiknakis M. Employing conversational agents in palliative care: a feasibility study and preliminary assessment. In: 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE) . Athens: IEEE (2019). p. 489–96. doi: 10.1109/BIBE.2019.00095

61. Cecula P, Yu J, Dawoodbhoy FM, Delaney J, Tan J, Peacock I, et al. Applications of artificial intelligence to improve patient flow on mental health inpatient units - narrative literature review. Heliyon. (2021) 7:e06626. doi: 10.1016/j.heliyon.2021.e06626

62. Riek LD. Healthcare robotics. Comm ACM. (2017) 60:68–78. doi: 10.1145/3127874

63. Burdick H, Pino E, Gabel-Comeau D, McCoy A, Gu C, Roberts J, et al. Effect of a sepsis prediction algorithm on patient mortality, length of stay and readmission: a prospective multicentre clinical outcomes evaluation of real-world patient data from US hospitals. BMJ Health Care Inform. (2020) 27:e100109. doi: 10.1136/bmjhci-2019-100109

64. Goh KH, Wang L, Yeow AYK, Poh H, Li K, Yeow JJL, et al. Artificial intelligence in sepsis early prediction and diagnosis using unstructured data in healthcare. Nat Commun. (2021) 12:711. doi: 10.1038/s41467-021-20910-4

65. Uckun S. Intelligent systems in patient monitoring and therapy management. a survey of research projects. Int J Clin Monit Comput. (1994) 11:241–53. doi: 10.1007/BF01139876

66. Gholami B, Haddad WM, Bailey JM. AI in the ICU: in the intensive care unit, artificial intelligence can keep watch. IEEE Spectr. (2018) 55:31–5. doi: 10.1109/MSPEC.2018.8482421

67. Nam JG, Hwang EJ, Kim DS, Yoo S-J, Choi H, Goo JM, et al. Undetected lung cancer at posteroanterior chest radiography: potential role of a deep learning–based detection algorithm. Radiol Cardiothorac Imaging. (2020) 2:e190222. doi: 10.1148/ryct.2020190222

68. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. (2017) 542:115–8. doi: 10.1038/nature21056

69. Scudellari M. AI Recognizes COVID-19 in the Sound of a Cough . Available online at: https://spectrum.ieee.org/the-human-os/artificial-intelligence/medical-ai/ai-recognizes-covid-19-in-the-sound-of-a-cough (accessed November 4, 2020).

70. Ai T, Yang Z, Hou H, Zhan C, Chen C, Lv W, et al. Correlation of chest CT and RT-PCR testing for coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases. Radiology. (2020) 296:E32–40. doi: 10.1148/radiol.2020200642

71. Farahmand S, Shabestari O, Pakrah M, Hossein-Nejad H, Arbab M, Bagheri-Hariri S. Artificial intelligence-based triage for patients with acute abdominal pain in emergency department; a diagnostic accuracy study. Adv J Emerg Med. (2017) 1:e5. doi: 10.22114/AJEM.v1i1.11

72. Poplin R, Varadarajan AV, Blumer K, Liu Y, McConnell MV, Corrado GS, et al. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat Biomed Eng. (2018) 2:158–64. doi: 10.1038/s41551-018-0195-0

73. Lee L. Gait analysis for classification . (Bd. Thesis Ph. D.)–Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science (2002). Available online at: http://hdl.handle.net/1721.1/8116

74. Foster M. Aging Japan: Robots May Have Role in Future of Elder Care . Healthcare & Pharma. Available online at: https://www.reuters.com/article/us-japan-ageing-robots-widerimage-idUSKBN1H33AB (accessed March 28, 2018).

75. Pu L, Moyle W, Jones C. How people with dementia perceive a therapeutic robot called PARO in relation to their pain and mood: a qualitative study. J Clin Nurs. (2020) 29:437–46. doi: 10.1111/jocn.15104

76. Walter Z, Lopez MS. Physician acceptance of information technologies: role of perceived threat to professional autonomy. Decis Support Syst. (2008) 46:206–15. doi: 10.1016/j.dss.2008.06.004

77. Price WN, Cohen IG. Privacy in the age of medical big data. Nat Med. (2019) 25:37–43. doi: 10.1038/s41591-018-0272-7

78. Lamanna C, Byrne L. Should artificial intelligence augment medical decision making? the case for an autonomy algorithm. AMA J Ethics. (2018) 20:E902–910. doi: 10.1001/amajethics.2018.902

79. Fosch-Villaronga E, Drukarch H. On Healthcare Robots . Leiden: Leiden University (2021). Available online at: https://arxiv.org/ftp/arxiv/papers/2106/2106.03468.pdf

80. The Precise4Q consortium, Amann J, Blasimme A, Vayena E, Frey D, Madai VI. Explainability for artificial intelligence in healthcare: a multidisciplinary perspective. BMC Med Inform Decis Mak. (2020) 20:310. doi: 10.1186/s12911-020-01332-6

81. European Parliament. Resolution with Recommendations to the Commission on Civil Law Rules on Robotics (2015/2103(INL)). (2017). Available online at: http://www.europarl.europa.eu/

82. Mercuri RT. The HIPAA-potamus in health care data security. Comm ACM. (2004) 47:25–8. doi: 10.1145/1005817.1005840

83. Marelli L, Lievevrouw E, Van Hoyweghen I. Fit for purpose? the GDPR and the governance of European digital health. Policy Stud. (2020) 41:447–67. doi: 10.1080/01442872.2020.1724929

84. Poitras I, Dupuis F, Bielmann M, Campeau-Lecours A, Mercier C, Bouyer L, et al. Validity and reliability of wearable sensors for joint angle estimation: a systematic review. Sensors. (2019) 19:1555. doi: 10.3390/s19071555

85. Macrae C. Governing the safety of artificial intelligence in healthcare. BMJ Qual Saf. (2019) 28:495–8. doi: 10.1136/bmjqs-2019-009484

86. Fosch-Villaronga E, Mahler T. Cybersecurity, safety and robots: strengthening the link between cybersecurity and safety in the context of care robots. Comput Law Secur Rev. (2021) 41:105528. doi: 10.1016/j.clsr.2021.105528

87. Miller IN, Cronin-Golomb A. Gender differences in Parkinson's disease: clinical characteristics and cognition. Mov Disord Off J Mov Disord Soc. (2010) 25:2695–703. doi: 10.1002/mds.23388

88. Cirillo D, Catuara-Solarz S, Morey C, Guney E, Subirats L, Mellino S, et al. Sex and gender differences and biases in artificial intelligence for biomedicine and healthcare. Npj Digit Med. (2020) 3:81. doi: 10.1038/s41746-020-0288-5

89. Wolff RF, Moons KGM, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med. (2019) 170:51–8. doi: 10.7326/M18-1376

90. LaRosa E, Danks D. Impacts on trust of healthcare AI. In: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society . New Orleans, LA: ACM (2018). p. 210–5. doi: 10.1145/3278721.3278771

91. Lee D, Yoon SN. Application of artificial intelligence-based technologies in the healthcare industry: opportunities and challenges. Int J Environ Res Public Health. (2021) 18:271. doi: 10.3390/ijerph18010271

92. Smuha NA. Ethics guidelines for trustworthy AI. Comput Law Rev Int. (2019) 20:97–106. doi: 10.9785/cri-2019-200402

93. Grinbaum A, Chatila R, Devillers L, Ganascia J-G, Tessier C, Dauchet M. Ethics in robotics research: CERNA mission and context. IEEE Robot Autom Mag. (2017) 24:139–45. doi: 10.1109/MRA.2016.2611586

Keywords: artificial intelligence, robotics, healthcare, personalized medicine, P5 medicine

Citation: Denecke K and Baudoin CR (2022) A Review of Artificial Intelligence and Robotics in Transformed Health Ecosystems. Front. Med. 9:795957. doi: 10.3389/fmed.2022.795957

Received: 15 October 2021; Accepted: 15 June 2022; Published: 06 July 2022.

Copyright © 2022 Denecke and Baudoin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Kerstin Denecke, kerstin.denecke@bfh.ch

This article is part of the Research Topic

Managing Healthcare Transformation Towards P5 Medicine


Artificial Intelligence With Robotics in Healthcare: A Narrative Review of Its Viability in India

1 Medical School, Jawaharlal Nehru Medical College, Datta Meghe Institute of Medical Sciences, Wardha, IND

Ashish Anjankar

2 Biochemistry, Jawaharlal Nehru Medical College, Datta Meghe Institute of Medical Sciences, Wardha, IND

This short review focuses on the emerging role of artificial intelligence (AI) with robotics in the healthcare sector. It may have particular utility for India, which has limited access to healthcare providers for a large and growing population and limited health resources in rural India. AI works by combining enormous amounts of data with fast and complex algorithms, which permits the software to adapt quickly to patterns in the data. It has the potential to affect most facets of the health system, ranging from discovery to prediction and prevention. The use of AI with robotics in the healthcare sector has shown a remarkable rising trend in the past few years. Functions like assistance with surgery, streamlining hospital logistics, and conducting routine checkups are some of the tasks that may be managed with great efficiency using artificial intelligence in urban and rural hospitals across the country. AI in the healthcare sector is advantageous in terms of ensuring exclusive patient care, safe working conditions in which healthcare providers are at a lower risk of getting infected, and well-organized operational tasks. As the healthcare segment is globally recognized as one of the biggest and most dynamic industries, it tends to expedite development through modernization and original approaches. The future of this lucrative industry points toward a great revolution aiming to create intelligent machines that work and respond like human beings. The future perspective of AI and robotics in the healthcare sector encompasses the care of elderly people, drug discovery, diagnosis of deadly diseases, a boost in clinical trials, remote patient monitoring, prediction of epidemic outbreaks, etc. However, the viability of using robotics in healthcare may be questionable in terms of expenditure, skilled workforce, and the conventional mindset of people. The biggest challenge is the replication of these technologies in smaller towns and rural areas so that these facilities may reach a larger segment of the country's population. This review aims to examine the adaptability and viability of these new technologies in the Indian scenario and identify the major challenges.

Introduction and background

The healthcare sector in India is far from providing universal health coverage to the entire population and lags behind many developing and even some least-developed countries in terms of health indicators. In addition, there are large disparities among states in achieving the desired health outcomes, as well as in establishing a sound information system. The adoption of the National Health Policy of India in 2017 has largely facilitated bridging the gap among the various stakeholders of national healthcare through a digital corridor. The policy recognizes the significant role of technology in healthcare delivery and advocates setting up a National Digital Health Authority (NDHA) to regulate, develop, and deploy digital health within the field of care. The National Institution for Transforming India (NITI) Aayog, after being authorized by the Government of India to draft a National Strategy on Artificial Intelligence (AI), in 2018 identified five sectors that would benefit the most from AI, of which healthcare is one [ 1 ].

The application of AI in healthcare may be classified into four broad categories: expressive, analytical, prognostic, and prescriptive. The gap created by a lack of skilled healthcare professionals can only be bridged by enhancing the use of AI in the health sector. Common health issues can easily be diagnosed with the help of AI, thus reducing the workload of expert health professionals as well as the cost of treatment in India [ 2 ]. It is envisaged that by the year 2035, AI could enhance the economy of India by adding 957 billion USD to it (Accenture, 2017) [ 2 ]. AI will also prove to be a medium for reducing economic disparity in the country. A report on the TCS global survey (TCS, 2017) projects that the jobs displaced by AI could be offset by new jobs created in upcoming AI-integrated healthcare projects [ 2 ].

The healthcare setup in India is, in fact, far from perfect. It is deficient in terms of the availability of doctors, nurses, medical technicians, and healthcare facilities needed to attend to the community. The number of qualified doctors is insufficient for the rapidly growing needs of the Indian healthcare system. At the same time, these doctors are concentrated in urban areas, and there is a huge gap in medical personnel in rural areas compared to urban settings. Approximately 74% of graduate doctors in India work in urban areas, which cater to only about one-fourth of the population [ 3 ]. Because of this maldistribution of resources, each doctor serves around 19,000 people [ 4 ]. India will need 2.3 million doctors by 2030 to reach the minimum doctor-patient ratio of 1:1000 recommended by the World Health Organization. The early ideas of a few dozen healthcare startups have the potential to boost Indian healthcare systems in the future and to reduce the burden on the healthcare system.

Recently, the coronavirus disease 2019 (COVID-19) pandemic posed a great challenge to the healthcare sector, creating a huge demand for equipment, medicines, AI-based applications, and robotics. Many reputed hospitals all over the world switched over to AI and robotic procedures during the COVID-19 pandemic for functions like disinfection and screening of patients and employees at the entry point. Measures such as remotely supervised surgeries, distance education, telemedicine, and video conferencing with doctors were used during the pandemic. The experience gained during the pandemic has substantially increased readiness to use robotics in the healthcare sector [ 5 ].

AI's major forms of relevance in healthcare are as follows: 1. Machine learning: The use and development of computer systems that are able to learn and adapt without explicit instructions, analyzing and drawing inferences from patterns in data; 2. Natural language processing: A specialized branch of AI focused on the interpretation and manipulation of human-generated written or spoken data; 3. Robotic process automation: An automation technology that uses software to mimic the back-office tasks of human workers, such as extracting data, filling in forms, moving files, etc.

In addition, AI also supports the healthcare system in diagnosis and treatment applications, patient engagement and adherence, and administrative applications [ 6 ]. AI not only simplifies the work of doctors, nurses, and other healthcare workers but also saves an ample amount of time. The adoption of digital solutions for the prevention, diagnosis, and cure of various ailments is therefore a wise route for India to pursue its aim of providing health for all.

Research methodology

The present study was conducted between April and June 2022. Databases like PubMed and Google Scholar were mainly used to search the literature; Scopus and Web of Science were excluded. Most of the research publications considered were from 2013 to 2022. Research papers related to the use of robotics and artificial intelligence in healthcare were studied thoroughly, with special emphasis on viability in the Indian scenario. The relevant search terms used were artificial intelligence, robotics, healthcare, India, etc. Exploring the required information was difficult, as meager data is available regarding the use of robotics in the Indian healthcare sector, a gap that deserves greater attention from researchers.

Functioning of robotics in healthcare

The working of robotics in healthcare builds on AI applications like machine learning and deep learning. AI works by combining vast amounts of data with fast and intelligent algorithms, which permits the software to adapt quickly to patterns in the data. AI execution is essentially program-driven: the designed program encodes the basic information about how the system is to work. The data is fed into web platforms such as the "cloud," which can store massive amounts of data and information to be used through the internet. There are immense possibilities for developing the healthcare sector through the use of AI in the future [ 7 ].

The main objective of AI is to solve problems by gathering and analyzing the information provided by programs and sensors. Another goal is to learn and respond in uncommon situations by trying alternative approaches and remembering which alternative succeeded so that it can be reused in similar situations. AI thus aims to create proficient arrangements that can learn, think, and suggest the best possible options to users, working toward machines with enough intelligence to perform tasks much as human beings do [ 8 ].

Artificial intelligence has the potential to affect most facets of the health system, ranging from discovery to forecasting and prevention. Although the adoption of new technologies lags well behind their emergence, all healthcare professionals need to be trained uniformly to adopt them, including techniques like robotic process automation, natural language processing, machine learning, etc. [ 9 ]. The interplay between artificial intelligence, machine learning, and deep learning ultimately leads to the working of robotics in healthcare, which can be seen in Figure 1.

Figure 1. The interplay between artificial intelligence, machine learning, and deep learning in the working of robotics in healthcare.

Use of robotics in healthcare

Assistance in Surgery

The application of robotics in surgery was first imagined in 1967, but it was just a dream for about 30 years until the United States defense department set up research organizations that gradually developed the first surgical robot designed to conduct different types of tasks. Initially, these robots were used during wars on the battlefields [ 10 ]. 

Today the most rapidly growing field for the application of robotics in healthcare is surgery. It aims to enhance the capabilities of humans and overcome human limitations in the field of surgery [ 11 ]. In India, the first urologic robot, named da Vinci S, was set up at the All India Institute of Medical Sciences, New Delhi in 2006. This initiation was followed by an exceptional expansion of robotic surgery in the country. By July 2019 there were 66 centers and more than 500 skilled robotic surgeons in India who had successfully performed more than 12,800 surgeries with the assistance of robots [ 12 ]. This rapid expansion shows that the future of robotic surgery in India is very bright. The introduction of the da Vinci Surgical System is one of the biggest inventions in surgery [ 8 ]. The use of high-definition computer vision enables surgeons to get detailed information about the inner condition of the patient, which enhances their performance during surgery [ 13 ].

For many years, engineers and medical researchers have been trying to invent ways in which robotics can be used in surgery, as it offers advantages like mechanical accuracy, consistency, and the ability to work in unsafe surroundings [ 14 ]. In the past few years, surgeries assisted by robots have played a significant role in boosting the Indian healthcare system. Reports show that hundreds of robotic surgeons are positioned at different hospitals in India. Surgeries performed with the help of robotics are thought to be better than conventional methods due to their precision, shorter recovery periods, and less pain and blood loss. These kinds of surgeries are also preferred because they save traveling and boarding costs [ 15 ].

Robotic surgery has successfully addressed the limitations of laparoscopic surgery, a big leap toward minimal-access surgery. As almost all surgeries may be performed with robotic assistance in the future, a realistic training approach will be required to enhance the skills of surgeons, reshaping trainees' learning curve through exposure to new methods like robotic surgical simulators and robotic telementoring [ 16 ]. The role of robotics is increasingly becoming crucial in surgeon training. For example, virtual reality simulators provide realistic situations and real training experiences to trainees. Practicing procedures becomes easy within the virtual environment [ 17 ].

Surgical robots are widely used in over a million surgical procedures across various departments of the healthcare sector. AI helps the surgeon by providing timely warnings and appropriate suggestions during the procedure. Deep-learning models trained on surgical data help identify the surgical approach best suited to the patient [ 18 ]. Robotics also helps extend the reach of experts, who are often concentrated in big cities and unavailable to patients residing in small towns and rural areas.

Support to Healthcare Workers

In addition to assistance in the operating room, robots are also useful in clinics and outpatient departments to enhance patient care. For example, robots were used to screen suspected patients at the entrance of health facilities during the COVID-19 pandemic. Automation and robots can also be seen in research laboratories, where they carry out many manual and repetitive tasks so that scientists can focus on more demanding work and move faster toward discoveries. Remedial treatment after strokes, paralysis, traumatic brain injuries, etc. can be supported with the help of therapeutic robots. These robots can monitor patients as they perform prescribed exercises and measure range of motion in various positions more accurately than the human eye. Social robots can also be used to interact with patients and encourage them [ 19 ].

Logistic Arrangements

Medical robots efficiently streamline workflows and reduce risk, which makes them feasible for many purposes. For example, robots can clean and organize patients' rooms autonomously, thus lowering the risk of interpersonal contact in infectious disease wards; for such cleaning purposes, human support robots (HSR) are used [ 20 ]. Medicine-identifier software enables robots to help distribute medicines to patients in hospitals. With this kind of support, hospital staff can devote more time to direct patient care.

Advantages of using robotics in healthcare

Exclusive Patient Care

Socially assistive robots (SARs) are the result of developments in AI combined with physically assistive technologies. SARs are emotionally intelligent machines that enable exclusive patient care, as they can communicate with patients across a range that allows them to respond emotionally; the responses include interaction, communication, companionship, and emotional attachment [12]. Judicious use of robotics in the healthcare system ensures excellent patient care, well-run processes in medical settings, and a secure atmosphere for patients and medical professionals. The chances of human error and negligence are greatly reduced with the use of automated robots in healthcare. The health and social care sector is being redefined by the invention and continuous development of SARs [12].

Protected Working Conditions

The roles of nurses, ward attendants, receptionists, and other healthcare workers can be performed by robots. Different types of robots, such as (i) receptionist robots, (ii) medical servers, and (iii) nurse robots, are capable of performing these roles very efficiently [15]. Automated mobile robots (AMRs) are used in many health facilities to distribute medical supplies and linen, collect data and information about patients, and serve food and water to patients, keeping medical professionals safe from pathogen exposure and thus preventing the spread of infections. These robots were therefore used intensively during the recent COVID-19 pandemic. According to Podpora et al., hospitality robots like Wegree and Pepper (the latter developed by SoftBank Robotics in Japan) were among the most used robots during the pandemic, as they helped control the rate at which the disease spread [15]. During the COVID-19 pandemic, excellent work was done on pandemic preparedness, screening, contact tracing, disinfection, and enforcing quarantine and social distancing. The Arogya Setu app, developed by the National Informatics Centre and the Information Technology Ministry, proved to be a boon in the management of the COVID-19 pandemic. Social robots are also used for strenuous work like lifting heavy beds or transferring patients, thus reducing the physical strain on healthcare workers.

 Organized Operational Tasks

Automated mobile robots (AMRs) take over routine tasks, decrease the physical burden on health workers, and ensure that more precise procedures are followed. These robots can address staff shortages, keep track of records, and place orders on time. They ensure that medicines and other equipment are available as and when needed. Rooms can be quickly cleaned, sanitized, and made ready on time for incoming patients by automated robots, freeing health professionals to perform other important patient-related jobs. Robots can also be used to diagnose different diseases using artificial intelligence. Radiologist robots, equipped with computational imaging capabilities, make diagnoses with the help of AI through deep learning. These robots are also used for imaging procedures such as MRIs and X-rays and so are of great advantage to healthcare workers, protecting them from the harmful radiation used in some of these procedures [15].

Future perspective

The healthcare segment is globally recognized as one of the largest and most dynamic industries. It aims to expedite development through modernization and original approaches. Previously, this sector relied upon manual processes, which required more time and were prone to human error. The latest advances in machine learning have brought a revolution to the health sector, aiming to create intelligent machines that work and respond like actual persons [8]. Although the application of AI and robotics in the healthcare sector is still in its infancy, the future seems very bright in terms of acceptability and viability [21]. The fields most likely to adopt AI and robotics in healthcare quickly are as follows:

Care for Elderly People

It is predicted that the global population of elderly people will double by 2050. Socially assistive robot technology may emerge as a solution to this growing demand. The major factors that increase loneliness among older people living alone are home ownership, marital status, poor health, and a lack of people to support them. A study conducted by Abdi et al. revealed that social robots play a crucial role in the healthcare of elderly people [22]. Although many participants of the study were hesitant to accept robots taking care of them, they were equally apprehensive about having humans as caretakers. Many participants accepted that humanoid robots are programmed with positive human qualities and are therefore more reliable than humans. The role of robots in caring for elderly people may prove to be a milestone in the present scenario, where the number of elderly people in India is increasing due to improved health services and there is an apparent gap between the demand for and supply of trained professionals in hospitals to meet this surging need [22].

Mental commit robots are being developed for the therapy of elderly patients in hospitals. These robots can provide psychological, physiological, and social benefits to human beings through physical contact, and the mood of elderly people has been observed to improve with this input [23]. Several studies are underway to explore ways of expanding the capabilities of social robots to improve their communication with human beings. The physical appearance of the robot largely influences its acceptability by elderly people. Positive results have been seen in older adults suffering from dementia when they were provided with companion animal robots. Studies demonstrate that companion animal robots of appropriate size, weight, and shape can provide cognitive stimulation to elderly people with dementia [24]. Animal robots such as the seal robot PARO, developed by Japan's National Institute of Advanced Industrial Science and Technology (AIST), have proven quite advantageous for improving the cognitive abilities and sleeping patterns of older adults [25].

Drug Discovery

One of the major areas where the use of AI can prove to be a boon is drug discovery. It takes about 14 years and an average of 2.6 billion dollars for a new drug to reach the market through conventional procedures, whereas the same can be achieved using AI in considerably less time. The 2015 outbreak of the Ebola virus in West Africa and some European countries was controlled with the application of AI, which helped to discover an appropriate drug in a very short time and prevented the outbreak from becoming a global pandemic [8]. In addition, it has been shown that clinical trials of newly discovered drugs take far less time to conduct with AI [8]. AI can also be used to distinguish cardiotoxic from non-cardiotoxic anticancer drugs. It is capable of identifying probable antibiotics from a list of thousands of molecules and can thus serve as a medium for discovering new antibiotics. These algorithms are also being used to identify molecules with the potential to combat antimicrobial resistance. Studies are underway to explore the role of AI in combating fast-growing antibiotic resistance [26].

AI in Diagnosis

Reports suggest that about 80,000 people die every year due to misdiagnosed illnesses. Heavy caseloads and incomplete patient details have led to severe mistakes in the past. As AI is resistant to these errors, it is capable of predicting and diagnosing diseases at a faster pace [27]. The use of AI is being extensively explored in the detection of cancer, where early detection and prediction are very important. Many companies are using AI-supported tools for diagnosing and detecting different kinds of cancer [28].

Boost in Clinical Trials

Previously, the process of clinical trials was very slow and success rates were poor. Before the year 2000, only 13.8% of drug candidates completed all three stages of clinical trials [29]. The use of AI has reduced cycle times and has also had a positive impact on production costs and outcomes. AI helps ensure the continuous flow of clinical trial data as well as its coding, storage, and management. Patient details saved in the computer can be analyzed and the lessons learned applied to future trials, saving time and cost [30]. AI also works efficiently to observe patients consistently and share the data across different computers. The self-learning capacity of AI enhances the accuracy of the trial and foresees the chances of dropouts [31].

Consultation in Digital Mode

The idea of digital consultation is to reduce hospital visits for minor ailments, which can easily be treated at home with the guidance of a medical professional. Several apps use AI to collect information from patients through a questionnaire and then facilitate a consultation with a medical practitioner [32]. In the future, digital consultation through AI will be the most viable and efficient way to treat common diseases. It would also help people find good doctors near their homes with the help of AI and internet hospitals.

Remote Patient Monitoring

The concept of remote patient monitoring has evolved very quickly with the application of AI sensors and advanced predictive analysis. Apart from personal sensors and devices for monitoring health, such as glucometers and blood pressure monitors, more advanced systems are now emerging, such as smart implants and smart prosthetics, which are used for post-operative rehabilitation to avoid complications after surgery. Smart implants help monitor the patient's condition, such as movement and muscle strength, which are important parameters for assessing the rate of recovery. Sensors implanted within the muscles or nerves are quite helpful in providing consistent information about the patient's healing process.

In recent times, many new forms of patient monitoring have emerged, such as digital pills, nanorobots, and smart fabrics. These monitoring tools are used to ensure regular medication, wound management, and the management of cardiac diseases by keeping track of patients' emotional, physiological, and mental status [33]. It is estimated that by 2025, AI-based monitoring tools and other wearables will be used by 50% of the population in developed countries [34]. The initial data and the details at the time of discharge are collected through cell phones with Wi-Fi or Bluetooth, stored in the cloud, and monitored constantly to avoid complications and readmissions to hospital. The review is shared with the patient, along with recommendations, through the internet [35].

AI in Nanotechnology Research

Recent advances have been made in medicine using nanotechnology. AI tools can be successfully merged with nanotechnology to understand the various events occurring in nanosystems, which can help in designing and developing drugs based on such nanosystems [36]. The field of nanomedicine has grown and continues to develop, and numerous approaches have been successfully tested to deliver curative agents in predetermined doses. This advancement has greatly helped in achieving efficient results in combination therapy [37].

Prediction of an Epidemic Outbreak

One of the most remarkable capabilities of AI in healthcare is forecasting the outbreak of an epidemic. Although it cannot control or mitigate the outbreak, it can warn us beforehand so that preparations can be made in time. It gathers, analyses, and monitors the inflow of data through machine learning and social networking sites to locate the epicenter of the epidemic. The calculation is done by generating an algorithm that combines data from news bulletins in all languages, airline ticketing, and reports related to plant and animal diseases [38]. On 30 December 2019, the AI engine BlueDot found clusters of uncommon pneumonia cases occurring around the wet and dry markets of Wuhan, China, and alerted the government and other stakeholders. This was the first warning signal of the novel COVID-19 pandemic [39]. Figure 2 depicts the various future perspectives of AI and robotics in the field of healthcare.
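The aggregation-and-alert idea behind such systems can be illustrated with a toy example. The sketch below is purely illustrative and is not BlueDot's method: it assumes a hypothetical stream of daily case-report counts (already extracted from news or surveillance feeds) and flags days whose counts spike well above the recent baseline.

```python
# Illustrative sketch only: a toy outbreak-signal detector over daily case-report
# counts aggregated from news feeds or surveillance data. Real systems such as
# BlueDot use far richer data (multilingual news, airline ticketing, animal-disease
# reports) and their own models; this only shows the flagging idea.
import numpy as np

def flag_outbreak(daily_counts, window=14, z_threshold=3.0):
    """Flag days whose report count is anomalously high versus the recent baseline."""
    counts = np.asarray(daily_counts, dtype=float)
    flags = []
    for t in range(window, len(counts)):
        baseline = counts[t - window:t]
        mean, std = baseline.mean(), baseline.std() + 1e-6  # avoid division by zero
        z = (counts[t] - mean) / std
        flags.append(z > z_threshold)            # True -> unusual spike worth review
    return [False] * window + flags

# Example: a quiet fortnight followed by a sudden cluster of unusual pneumonia reports
reports = [2, 1, 3, 2, 2, 1, 2, 3, 2, 1, 2, 2, 3, 2, 2, 18, 25]
print(flag_outbreak(reports))  # the last two days are flagged
```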


Barriers to using AI in India

Despite the innumerable benefits of employing robotics in health facilities, there are also risks of errors and mechanical failures. A single mechanical breakdown can cost a precious human life. There are thus several disadvantages of robots in the healthcare sector, especially in the Indian scenario.

High cost is the major limitation to introducing robotics in healthcare. In India, priority is given to the large burden of contagious diseases such as tuberculosis and malaria. The introduction of robotics would be an additional load on the meager healthcare budget for non-prioritized work. The cost of buying and maintaining robots is very high, and setting up a unit suitable for robotic operations involves further large expenditure.

Another drawback of present robotic systems used for healthcare applications is their narrow scope for customization. Every patient is different, and customization of healthcare service systems is therefore the need of the hour, for patients as well as healthcare professionals. The current healthcare system needs to become more flexible in providing robotic services that can be adapted to each patient's needs [40]. The use of surgical robots is practically limited to developed countries, advanced research centers, and super-specialty hospitals, putting it out of reach for the very large section of Indian society that actually needs it. Expensive robotic interventions are not feasible in small-town and village hospitals, where they are most needed due to excessive workloads and the lack of health professionals in government-owned facilities.

Studies of adverse events in robotic surgery show that several undesirable events have been recorded, including injuries and deaths due to device faults. Robots are mechanical devices and are susceptible to breakdowns and errors. Power shortages and the lack of other infrastructural facilities do not permit universal access to robotics in the Indian healthcare system. In addition, positions for medical professionals at the grassroots level are largely vacant, and the lack of a trained and skilled workforce for operating and maintaining robotics and AI systems is a challenge. The interplay between AI and computer programming has a major impact on health and care innovation, where compassionate service delivery systems are becoming increasingly important; such systems must address rapport, including emotional and moral relationships, alongside therapeutic considerations [12].

Due to its growing popularity, there is also a threat of an increase in irrational demand for robotic surgery in India, where literacy rates and awareness about health are poor. This may lead hospitals to buy robots for commercial publicity and push doctors into unethical use of robotics.

The use of robotics in healthcare also raises major medico-legal problems. Like other computers, a surgical robot may be affected by viruses and may not adhere to the surgeon's commands, leading to hazardous situations. The government has taken steps to strengthen the medical education system and the delivery of healthcare in rural areas. However, the introduction of robotics and its mechanical procedures into the Indian healthcare sector may diminish the empathy and humanitarian aspects of treatment, which are highly valued in the Indian scenario, where a large share of the population is illiterate and of low socio-economic status.

Apart from this, there are insufficient laws to address the security and privacy issues arising from data storage through artificial intelligence in the healthcare sector in India [2]. Quality training of the huge and diverse workforce involved in the use of AI and robotics in healthcare is another major challenge that needs to be addressed. More simulation-based training is required at all levels to enhance surgeons' skills in minimally invasive and robotic colorectal surgery [18].

Conclusions

Although the introduction of robots in healthcare is in its infancy, it offers many opportunities for medical professionals, especially in urban settings. The significant role of AI in areas such as drug discovery, diagnosis of diseases, digital medical consultations, robotic surgeries, remote patient monitoring, and prediction of epidemic outbreaks cannot be denied. The emerging role of robotics in the care of elderly people has been recognized and is gradually being accepted by Indian society. In the present scenario, it is difficult to imagine the implementation and monitoring of health services without AI and robotics. Many new robotic techniques are under development in the health sector that may prove more cost-effective in the future, but the quality of robotic procedures needs to be controlled through a stringent and continuous monitoring system. The use of AI and robotics in the Indian healthcare sector may prove to be a milestone in improving the present state of healthcare services; it has certainly helped bridge the gap created by the lack of skilled health professionals and the large number of vacancies for doctors, nurses, and paramedical staff. The main challenge is to reach the remote regions of the country with poor infrastructure and a lack of advanced technology. The high cost of AI and robotics stands as the major barrier to reaching disadvantaged communities, and there remains the risk of errors and mechanical failures due to improper maintenance, with potentially fatal consequences. The Indian government should support companies investing in AI and encourage public-private partnerships (PPP) in the domain of AI and health, and policy makers must address the ethical issues to expand the use of AI and robotics in healthcare. Considering the facts and practicalities, the use of robotics in India should be expanded in a phased manner, starting with reputed and well-equipped hospitals; it is viable only if used judiciously, with a standardized reporting and monitoring system in place.

The authors have declared that no competing interests exist.

06 March 2024

Why scientists trust AI too much — and what to do about it

AI-run labs have arrived — such as this one in Suzhou, China. Credit: Qilai Shen/Bloomberg/Getty

Scientists of all stripes are embracing artificial intelligence (AI) — from developing ‘self-driving’ laboratories, in which robots and algorithms work together to devise and conduct experiments, to replacing human participants in social-science experiments with bots [1].

Many downsides of AI systems have been discussed. For example, generative AI such as ChatGPT tends to make things up, or ‘hallucinate’ — and the workings of machine-learning systems are opaque.


In a Perspective article [2] published in Nature this week, social scientists say that AI systems pose a further risk: that researchers envision such tools as possessed of superhuman abilities when it comes to objectivity, productivity and understanding complex concepts. The authors argue that this puts researchers in danger of overlooking the tools’ limitations, such as the potential to narrow the focus of science or to lure users into thinking they understand a concept better than they actually do.

Scientists planning to use AI “must evaluate these risks now, while AI applications are still nascent, because they will be much more difficult to address if AI tools become deeply embedded in the research pipeline”, write co-authors Lisa Messeri, an anthropologist at Yale University in New Haven, Connecticut, and Molly Crockett, a cognitive scientist at Princeton University in New Jersey.

The peer-reviewed article is a timely and disturbing warning about what could be lost if scientists embrace AI systems without thoroughly considering such hazards. It needs to be heeded by researchers and by those who set the direction and scope of research, including funders and journal editors. There are ways to mitigate the risks. But these require that the entire scientific community views AI systems with eyes wide open.


To inform their article, Messeri and Crockett examined around 100 peer-reviewed papers, preprints, conference proceedings and books, published mainly over the past five years. From these, they put together a picture of the ways in which scientists see AI systems as enhancing human capabilities.

In one ‘vision’, which they call AI as Oracle, researchers see AI tools as able to tirelessly read and digest scientific papers, and so survey the scientific literature more exhaustively than people can. In both Oracle and another vision, called AI as Arbiter, systems are perceived as evaluating scientific findings more objectively than do people, because they are less likely to cherry-pick the literature to support a desired hypothesis or to show favouritism in peer review. In a third vision, AI as Quant, AI tools seem to surpass the limits of the human mind in analysing vast and complex data sets. In the fourth, AI as Surrogate, AI tools simulate data that are too difficult or complex to obtain.

Informed by anthropology and cognitive science, Messeri and Crockett predict risks that arise from these visions. One is the illusion of explanatory depth [3], in which people relying on another person — or, in this case, an algorithm — for knowledge have a tendency to mistake that knowledge for their own and think their understanding is deeper than it actually is.


Another risk is that research becomes skewed towards studying the kinds of thing that AI systems can test — the researchers call this the illusion of exploratory breadth. For example, in social science, the vision of AI as Surrogate could encourage experiments involving human behaviours that can be simulated by an AI — and discourage those on behaviours that cannot, such as anything that requires being embodied physically.

There’s also the illusion of objectivity, in which researchers see AI systems as representing all possible viewpoints or not having a viewpoint. In fact, these tools reflect only the viewpoints found in the data they have been trained on, and are known to adopt the biases found in those data. “There’s a risk that we forget that there are certain questions we just can’t answer about human beings using AI tools,” says Crockett. The illusion of objectivity is particularly worrying given the benefits of including diverse viewpoints in research.

Avoid the traps

If you’re a scientist planning to use AI, you can reduce these dangers through a number of strategies. One is to map your proposed use to one of the visions, and consider which traps you are most likely to fall into. Another approach is to be deliberate about how you use AI. Deploying AI tools to save time on something your team already has expertise in is less risky than using them to provide expertise you just don’t have, says Crockett.

Journal editors receiving submissions in which use of AI systems has been declared need to consider the risks posed by these visions of AI, too. So should funders reviewing grant applications, and institutions that want their researchers to use AI. Journals and funders should also keep tabs on the balance of research they are publishing and paying for — and ensure that, in the face of myriad AI possibilities, their portfolios remain broad in terms of the questions asked, the methods used and the viewpoints encompassed.

All members of the scientific community must view AI use not as inevitable for any particular task, nor as a panacea, but rather as a choice with risks and benefits that must be carefully weighed. For decades, and long before AI was a reality for most people, social scientists have studied AI. Everyone — including researchers of all kinds — must now listen.

doi: https://doi.org/10.1038/d41586-024-00639-y

1. Grossmann, I. et al. Science 380, 1108–1109 (2023).
2. Messeri, L. & Crockett, M. J. Nature 627, 49–58 (2024).
3. Rozenblit, L. & Keil, F. Cogn. Sci. 26, 521–562 (2002).



3D point cloud-based place recognition: a survey

  • Open access
  • Published: 07 March 2024
  • Volume 57, article number 83 (2024)



  • Kan Luo 1 , 2 ,
  • Hongshan Yu 2 ,
  • Xieyuanli Chen 3 ,
  • Zhengeng Yang 2 , 4 ,
  • Jingwen Wang 2 ,
  • Panfei Cheng 2 &
  • Ajmal Mian 5  

Place recognition is a fundamental topic in computer vision and robotics. It plays a crucial role in simultaneous localization and mapping (SLAM) systems to retrieve scenes from maps and identify previously visited places to correct cumulative errors. Place recognition has long been performed with images, and multiple survey papers exist that analyze image-based methods. Recently, 3D point cloud-based place recognition (3D-PCPR) has become popular due to the widespread use of LiDAR scanners in autonomous driving research. However, there is a lack of survey papers discussing 3D-PCPR methods. To bridge this gap, we present a comprehensive survey of recent progress in 3D-PCPR. Our survey covers over 180 related works, discussing their strengths and weaknesses, and identifying open problems within this domain. We categorize mainstream approaches into feature-based, projection-based, segment-based, and multimodal-based methods and present an overview of typical datasets, evaluation metrics, performance comparisons, and applications in this field. Finally, we highlight some promising research directions for future exploration in this domain.


1 Introduction

Where am I? Determining the place in a given reference database or map is still an ongoing challenge in computer vision, robotics, and autonomous driving (Masone and Caputo 2021). Place recognition is a perception-based method that can use images, 3D point clouds, and other information acquired by robots to identify previously visited places by comparing the similarity between query-frame information and the map database. Place recognition helps robots improve the accuracy of loop-closure detection by providing reference places in the environment; the initial reference places it supplies can be used to lock onto the loop-closure area. This eliminates cumulative errors and helps to achieve high-precision and reliable simultaneous localization and mapping (SLAM). This critical task has attracted significant research attention over the last few decades. Since GPS-based methods may not always be accurate and sometimes fail completely in cities with high-rise buildings and bridges, numerous research efforts are dedicated to developing solutions for image-based and 3D point cloud-based place recognition.

Image-based place recognition, also known as visual place recognition (VPR), involves providing an image of a place and recognizing whether the image corresponds to a previously visited place (Lowry et al. 2015). Since the camera is the most commonly used sensor for this purpose, conventional image feature extraction algorithms such as SIFT (Lowe 2004), SURF (Bay et al. 2006), BRIEF (Calonder et al. 2010) and ORB (Rublee et al. 2011) have been conveniently applied to VPR. Consequently, VPR has received extensive attention from researchers, and numerous advancements have been made in the past two decades (Lowry et al. 2015; Zhang et al. 2021; Masone and Caputo 2021; Barros et al. 2021; Yin et al. 2022). Lowry et al. (2015) defined the “place” concept in their survey and discussed how VPR solutions can implicitly or explicitly account for changes in the environment’s appearance.

With the advent of deep learning-based image classification methods, recent surveys (Zhang et al. 2021 ; Masone and Caputo 2021 ) have focused on their application in VPR. VPR methods can be classified into two categories, depending on the camera sensor modalities used: partial-observable camera and fully-observable camera (Yin et al. 2022 ). The partial-observable camera includes pin-hole, fish-eye, and stereo cameras. Most VPR methods and datasets are developed based on this modality (Zaffar et al. 2021 ). However, observing the same area under different perspectives is still a significant challenge in partial-observable camera-based VPR, which may result in significantly different observations of the same place. This problem is overcome by fully-observable camera systems which provide a 360-degree field of view (Scaramuzza 2014 ), and have the inherent advantage of viewpoint-invariant localization.

Although VPR has achieved great success, its performance is inevitably influenced by various environmental factors (Lai et al. 2022 ), such as lighting conditions, viewpoint variations, seasonal changes, weather conditions, etc. In contrast to image-based methods, 3D point cloud-based place recognition (3D-PCPR) methods utilize range sensors, such as LiDAR or RGB-D sensors, to acquire 3D geometric information about the surrounding environment. The obtained 3D information is then used to identify whether the place in the environment has been visited before. The use of 3D point clouds makes 3D-PCPR more robust to changes in lighting, viewpoint, seasons, and weather conditions (Uy and Lee 2018 ), enabling SLAM technology to adapt to these challenging scenarios. Driven by the recent advancements in point cloud sensor technology, there has been a surge of interest among researchers in exploring and advancing 3D-PCPR techniques. This has resulted in remarkable advancements in 3D-PCPR (Yin et al. 2018 ; Uy and Lee 2018 ; Liu et al. 2019 ; Du et al. 2020 ; Zhou et al. 2021 ; Komorowski 2021 ; Sun et al. 2020 ; Fan et al. 2020 ; Xiang et al. 2021 ; Hou et al. 2022 ; He et al. 2016 ; Kim and Kim 2018 ; Kim et al. 2021 ; Yin et al. 2020 , 2021 ; Jiang et al. 2020 ; Schaupp et al. 2019 ; Chen et al. 2021 ; Xu et al. 2021 ; Wang et al. 2020 ; Dubé et al. 2017 ; Dube et al. 2020 ; Vidanapathirana et al. 2021 ; Li et al. 2021 ; Zhu et al. 2020 ; Lu et al. 2020 ; Pan et al. 2021 ; Komorowski et al. 2021 ; Cramariuc et al. 2021 ; Yin et al. 2021 ; Chen et al. 2020a ; Ma et al. 2023 ).

In the face of such rapid advancements in 3D-PCPR techniques, there is a pressing need for a comprehensive and up-to-date survey that encompasses the broader scope of 3D data sources beyond just LiDAR sensors. While existing surveys (Wang et al. 2019; Elhousni and Huang 2020; Yin et al. 2022, 2023) have made valuable contributions, they either focus on specific aspects of 3D-PCPR or provide limited coverage of the topic. For example, Wang et al. (2019) only provide a summary of loop-closure detection methods with 3D data sources, and their discussion is confined to a restricted number of methods. Similarly, Elhousni and Huang (2020) provide a brief survey on 3D LiDAR-based localization methods, primarily centered around LiDAR-aided pose tracking for autonomous vehicles. Yin et al. (2022) conduct a general place recognition survey with a focus on real-world autonomy, offering limited coverage of 3D-PCPR methods. Even the recent survey (Yin et al. 2023) does not cover the topic comprehensively and is restricted to LiDAR-based global localization topics.

Considering the existing literature, it becomes evident that a comprehensive survey specifically dedicated to 3D-PCPR methods, encompassing a broader range of 3D data sources, while also giving insights into the limitations of existing methods and highlighting promising future directions to explore in this domain is lacking. This survey covers the gap and serves as an invaluable resource for researchers, enabling them to grasp the current state-of-the-art, identify research gaps, and drive further advancements in the rapidly evolving field of 3D-PCPR.

Our survey covers more than 180 important works related to place recognition. We mainly considered papers published in well-known journals or conferences in the fields of robotics, computer vision and artificial intelligence, such as IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), International Journal of Computer Vision (IJCV), IEEE Transactions on Robotics (T-RO), IEEE International Conference on Intelligent Robots and Systems (IROS), IEEE International Conference on Robotics and Automation (ICRA), and IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Besides these, we also considered some of the latest preprint papers on arXiv that have already gained much attention. Based on this extensive and comprehensive literature survey, we propose a novel categorization scheme that classifies 3D-PCPR methods into four distinct categories. By categorizing existing methods and examining each category in detail, our survey provides a deeper understanding of the current state of the field and identifies promising avenues for future research.

Figure 1: Four categories of 3D-PCPR methods, namely feature-based, projection-based, segment-based, and multimodal-based, as well as future research trends

The rest of this article is organized as follows: Sect. 2 briefly introduces the background of 3D point clouds, including their acquisition, development, and applications. Sections 3 to 6 respectively introduce the four categories of 3D-PCPR methods, namely feature-based, projection-based, segment-based, and multimodal-based methods, as shown in Fig. 1. Section 7 describes the main datasets, evaluation metrics and performance comparisons commonly used in 3D-PCPR. Section 8 introduces related applications of 3D-PCPR; in particular, we discuss promising future research directions for 3D-PCPR technology. Finally, we conclude our survey in Sect. 9.

2 Acquisition of 3D point cloud

A point cloud is a set of geometric points situated on the surfaces of 3D objects in Euclidean space. These points are typically captured using 3D sensors like LiDAR, laser scanners, structured light scanners, or Time-of-Flight (ToF) RGB-D cameras. A point \(\textbf{p}_i\) in a point cloud \(\mathcal{P}\) can be represented by its x, y, and z Cartesian coordinates, denoted as \(\mathcal{P}=\{\textbf{p}_i = (x_i, y_i, z_i)\}_{i=1}^{N}\), where N is the number of points in \(\mathcal{P}\). In this section, we provide a brief overview of the acquisition and applications of 3D point clouds.
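As a minimal illustration of this notation, the sketch below stores a point cloud as an N x 3 array and runs a brute-force nearest-neighbour query of the kind that underlies many descriptor and registration pipelines; the random cloud and query point are placeholders, not data from any dataset discussed in this survey.

```python
# Minimal sketch: a point cloud P = {p_i = (x_i, y_i, z_i)}, i = 1..N, stored as an
# N x 3 NumPy array, plus a brute-force nearest-neighbour query.
import numpy as np

rng = np.random.default_rng(0)
P = rng.uniform(-10.0, 10.0, size=(1000, 3))   # N = 1000 points with x, y, z coordinates

def nearest_neighbour(cloud, query):
    """Return the index and distance of the cloud point closest to `query`."""
    d = np.linalg.norm(cloud - query, axis=1)   # Euclidean distance to every point
    i = int(np.argmin(d))
    return i, float(d[i])

idx, dist = nearest_neighbour(P, np.array([0.0, 0.0, 0.0]))
print(f"closest point: index {idx}, distance {dist:.3f}")
```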

The acquisition of a 3D point cloud involves measuring the depth or range of obstacles (from the sensor) and then calculating the 3D coordinates and attributes of points in Euclidean space (Xu and Stilla 2021 ). Various sensors are available to acquire 3D point clouds. These sensors are typically grouped as being either active or passive (Lillesand et al. 2015 ) as shown in Fig.  2 .

Active sensors include structured light technology (e.g. Kinect-1; Zennaro 2014; Lun and Zhao 2015), whereby an infrared light pattern is projected onto the scene and then sensed by a camera to measure 3D distances using the principle of triangulation. Another type of sensor is the laser scanner (Wang et al. 2020; Vosselman and Maas 2010), which projects a single light stripe and scans it over the scene to generate dense point clouds, again using the triangulation principle. The ToF principle is used by LiDARs (Wandinger 2005), which transmit and scan multiple laser beams to generate sparse point clouds of a large scene. Radar sensors (Knott and Skolnik 2008) also use the ToF principle, but they transmit electromagnetic radiation and then measure the reflections returned from the target object. Another class of ToF sensors transmits a modulated light signal (usually IR) and measures the phase difference of the light reflected from various points in the scene to determine the travel time and hence their distance from the sensor. Kinect-2 (Fankhauser et al. 2015; Wasenmüller and Stricker 2016) and PrimeSense (Breuer et al. 2014; Kuan et al. 2019) sensors use this technology. Only laser-based and radar sensors work in outdoor environments due to the strong sunlight; structured light and ToF sensors are designed for indoor use only. These sensors measure the distances to obstacles to compute their 3D coordinates, resulting in a point cloud of xyz coordinates.
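To make the distance-to-coordinates step concrete, the following hedged sketch back-projects a depth image into an x-y-z point cloud under a pinhole camera model; the intrinsic parameters fx, fy, cx and cy are hypothetical values, not those of any specific sensor mentioned above.

```python
# Hedged sketch of how a ToF / structured-light depth image is typically converted
# to an x-y-z point cloud via pinhole back-projection. The intrinsics are placeholders.
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """depth: H x W array of range values in metres; returns an (H*W) x 3 point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel column and row indices
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]                   # drop pixels with no valid return

depth = np.full((480, 640), 2.0)                      # a flat wall 2 m away, for illustration
cloud = depth_to_point_cloud(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
print(cloud.shape)                                     # (307200, 3)
```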

Passive sensors, such as photogrammetry (Lillesand et al. 2015 ) and stereo (Beltran and Basañez 2014 ) cameras, capture 3D data of the environments without actively emitting any energy. These sensors typically measure the geometry structure of environments using multiple observations, estimate the depth of objects within the scene through photogrammetric approaches such as multi-view geometry, and finally generate point clouds from the 3D reconstructions. In addition to 3D coordinates, a point cloud can also contain other information such as intensity and color, and normal vectors can be calculated using the local neighborhood geometry (Chen et al. 2020a , b ). We show more details of different sensors in Table  1 .

Figure 2: Common 3D point cloud acquisition sensor types along with sample acquired point clouds

Early 2D laser scanners (Thrun 2002 ), also known as single-line LiDAR, have a single-line beam emitted by the laser source to generate low-resolution 2D planar scans. Due to its simple structure and high reliability, it has been widely studied and used in real-world robots (Hess et al. 2016 ; Kuang et al. 2023 ). However, 2D laser scanners can only perform plane scanning and generate low-resolution point cloud information, limiting its use for place recognition (Zhang and Ghosh 2000 ; Olson 2009 ; Zimmerman et al. 2023 ). The development of sensor technology has promoted point cloud sensing from 2D to 3D. Compared to 2D scans, 3D point clouds present more information for robots to better understand their surroundings. Therefore, 3D sensors have developed rapidly in the past three decades.

3D point cloud data finds applications in many fields (Guo et al. 2020 ), including computer vision, autonomous driving, robotics, remote sensing, medical treatment, cultural relic reconstruction, etc. The rest of this article will mainly discuss research and applications of 3D point clouds in place recognition.

3 Feature-based methods

We show a chronological overview of the four main categories of 3D-PCPR methods in Fig.  3 . As depicted, feature-based approaches are fundamental methods for 3D-PCPR. The main idea of these methods is to extract features from the 3D point clouds and then match such features to perform subsequent place recognition. We divide the feature-based methods into two categories: hand-crafted feature-based methods and deep learning feature-based methods.

3.1 Hand-crafted feature-based methods

Hand-crafted feature-based methods have been extensively researched for several decades, resulting in significant advancements in 3D-PCPR. These advancements have played a crucial role in driving the continuous development of this field. Magnusson et al. (2009a, 2009b) conducted early research inspired by NDT (Biber and Straßer 2003) and proposed a loop detection approach based on surface shape and orientation histograms using only 3D point cloud data. The main idea behind their method is to compute the similarity of two scans from histograms of NDT descriptors, achieving good recall rates at low false negatives (see Sect. 7.2 for a detailed description of recall and false negatives) in environments with different characteristics.

Figure 3: Chronological overview of the four main categories of 3D-PCPR methods. Each category of representative methods is marked with a different color

Steder et al. ( 2010 ) presented a robust approach for 3D place recognition using range data. Their method uses interest feature points and scores candidate transformations. Although this method produces accurate relative pose transformations between two scans and has high recognition rates, it cannot achieve real-time performance and orientation invariance. To overcome these shortcomings, they later proposed another method (Steder et al. 2011 ) using a combination of a bag-of-words and a point-feature-based estimation of relative poses, which is more efficient and rotational invariant compared to the former approach.

A loop closure detection method using small-sized signatures from 3D LiDAR data was presented by Muhammad and Lacroix (2011). This method extracts histogram-based signatures from 3D LiDAR data and uses them for loop closure detection; the features are based on histograms of local surface normals of the 3D point clouds. To design a 3D-PCPR method that recognizes complex indoor scenes and effectively handles the disturbance caused by moving objects, Zhuang et al. (2012) proposed an approach that extracts and matches Speeded Up Robust Features (SURF) (Bay et al. 2006) from bearing-angle images generated by a self-built rotating 3D laser scanner. Using both the local SURF features and global spatial features, their place recognition framework has shown validity and robustness in dynamic indoor environments.

Collier et al. ( 2012 ) presented a LiDAR-based place recognition system that can extract highly descriptive features called the variable dimensional local shape descriptors from 3D point cloud data to encode environmental features. Their system can run on a military research vehicle equipped with a highly accurate, 360-degree field of view LiDAR and detect loops regardless of the sensor’s orientation.

Bosse and Zlot ( 2013 ) presented a noteworthy method for place recognition in large 3D point cloud datasets, utilizing keypoint voting. This approach involves extracting keypoints directly from the 3D point cloud and describing them using handcrafted 3D gestalt descriptors. The keypoints then vote to their nearest points, and based on the resulting scores, loops are detected.

Röhling et al. ( 2015 ) proposed a fast histogram-based similarity measure for detecting loop closures in 3D point cloud data. Their method can avoid computationally expensive features and compute histograms from simple global statistics of the LiDAR scans. Hence, high precision and recall rates are achieved in a computationally efficient manner.

Another fast, complete 3D point cloud-based loop closure method for LiDAR odometry and mapping was proposed by Lin and Zhang (2019). They compute 2D histograms of local map patches and then use the normalized cross-correlation of the 2D histograms as the similarity metric. The method selects keyframes from the LiDAR input and the offline map and can quickly evaluate the similarity between keyframes, forming a relatively simple and practical system for place recognition. However, it is mainly designed for a small field of view and does not propose a particularly effective calculation method for relative pose estimation between keyframes.
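The histogram-comparison idea used by these methods can be sketched as follows. This is not the authors' implementation: it simply bins two scans into 2D occupancy histograms over the x-y plane and scores their similarity with zero-mean normalized cross-correlation, with synthetic point clouds standing in for real keyframes.

```python
# Hedged sketch of histogram-based loop-closure scoring: bin each scan into a 2D
# occupancy histogram over the x-y plane, then compare histograms with zero-mean
# normalized cross-correlation (NCC).
import numpy as np

def xy_histogram(cloud, bins=64, extent=50.0):
    """2D histogram of point counts over the x-y plane, normalized to unit mass."""
    h, _, _ = np.histogram2d(cloud[:, 0], cloud[:, 1],
                             bins=bins, range=[[-extent, extent], [-extent, extent]])
    return h / max(h.sum(), 1.0)

def ncc(h1, h2):
    """Zero-mean normalized cross-correlation between two equally shaped histograms."""
    a, b = h1 - h1.mean(), h2 - h2.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-12
    return float((a * b).sum() / denom)

rng = np.random.default_rng(1)
scan_a = rng.normal(0.0, 10.0, size=(5000, 3))
scan_b = scan_a + rng.normal(0.0, 0.2, size=scan_a.shape)   # a noisy revisit of the same place
scan_c = rng.uniform(-40.0, 40.0, size=(5000, 3))            # an unrelated place

print(ncc(xy_histogram(scan_a), xy_histogram(scan_b)))       # high score -> likely loop closure
print(ncc(xy_histogram(scan_a), xy_histogram(scan_c)))       # lower score
```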

3.2 Deep learning feature-based methods

The rapid advancement of technological innovations along with the proliferation of big data and the exponential enhancement of computational capabilities, has significantly propelled the widespread adoption of deep learning techniques (LeCun et al. 2015 ) in a myriad of domains. In this section, we discuss some deep learning feature-based methods in 3D-PCPR. These methods mainly learn from the raw 3D point clouds to extract features that are useful for performing subsequent place recognition.

In the absence of any prior knowledge, the task of 3D point cloud-based global localization poses a formidable challenge. To tackle this issue, Yin et al. ( 2018 ) proposed a semi-handcrafted approach that leverages siamese LocNets for representation learning from LiDAR point clouds. By employing LocNet representations in the Euclidean space, a crucial global prior map can be constructed, which helps in enhancing robustness. Nonetheless, achieving global localization in dynamic environments remains a daunting task even with their method. The inherent disorderliness of point clouds complicates the extraction of local features, making the encoding of these features into global descriptors for addressing the 3D-PCPR problem more challenging.

 Uy and Lee ( 2018 ) proposed a deep learning network called PointNetVLAD (see Fig.  4 ) to extract local features using PointNet (Qi et al. 2017 ) which are then passed to NetVLAD (Arandjelovic et al. 2016 ) to generate the final discriminative global descriptor. This work presents the first end-to-end trainable network for extracting global descriptors directly from raw 3D point clouds. However, PointNetVLAD overlooks the spatial distribution of similar local features, which is of significant importance in capturing the static structural information in expansive dynamic environments. To address this limitation, Liu et al. ( 2019 ) presented LPD-Net, which can extract more discriminative and generalizable global descriptors by employing an adaptive local feature extraction module and a graph-based neighborhood aggregation module for extracting local structures and revealing the spatial distribution of local features within large-scale point clouds.
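A highly simplified sketch of this local-feature-plus-VLAD-aggregation pattern is given below. It is not the released PointNetVLAD code: a small shared per-point MLP stands in for PointNet, and a NetVLAD-style layer with learnable cluster centres soft-assigns point features and aggregates their residuals into a single global place descriptor.

```python
# Hedged, highly simplified sketch of the PointNetVLAD idea (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyPointNetVLAD(nn.Module):
    def __init__(self, feat_dim=64, num_clusters=8):
        super().__init__()
        self.mlp = nn.Sequential(                    # shared MLP applied to every point
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.centres = nn.Parameter(torch.randn(num_clusters, feat_dim))
        self.assign = nn.Linear(feat_dim, num_clusters)   # soft-assignment logits

    def forward(self, points):                       # points: (B, N, 3)
        f = self.mlp(points)                          # (B, N, D) local features
        a = F.softmax(self.assign(f), dim=-1)         # (B, N, K) soft assignments
        # residual of every feature to every cluster centre: (B, N, K, D)
        residuals = f.unsqueeze(2) - self.centres.unsqueeze(0).unsqueeze(0)
        vlad = (a.unsqueeze(-1) * residuals).sum(dim=1)     # (B, K, D) weighted sum
        vlad = F.normalize(vlad, dim=-1).flatten(1)          # intra-normalise, flatten
        return F.normalize(vlad, dim=-1)                      # (B, K*D) global descriptor

model = TinyPointNetVLAD()
scan = torch.randn(2, 4096, 3)                        # two submaps of 4096 points each
desc = model(scan)
print(desc.shape)                                      # torch.Size([2, 512])
```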

For implementing practical vehicle platforms that possess limited computing and storage resources, the author proposed SeqLPD (Liu et al. 2019 ), a lightweight variant derived from LPD-Net. SeqLPD aims to tackle the place recognition problem by integrating deep learning-based point cloud description and a coarse-to-fine sequence matching strategy, resulting in notable improvements in loop closure detection. Despite the success of LPD-Net, it is still resource-intensive. In an effort to enhance performance while mitigating resource demands,  Hui et al. ( 2022 ) proposed EPC-Net, an efficient point cloud learning network specifically designed for 3D-PCPR tasks. EPC-Net achieves commendable performance while effectively reducing memory requirement and inference time.

Figure 4: A lightweight architecture of PointNetVLAD (from Uy and Lee (2018)). N denotes the number of input 3D points, D denotes the dimension of the learned point features, and f denotes the final global descriptor vector

 Yin et al. ( 2018 ) proposed an end-to-end framework that utilizes low-dimensional feature matching instead of geometry matching for LiDAR-based long-term place recognition. This approach combines dynamic octree mapping and place feature inference modules and the feature learning is performed in a fully unsupervised manner. In the aggregation of local features into a global descriptor, it is important to reweigh the contributions of each local point feature, thereby allocating greater attention to regions that are more relevant to the task. Drawing inspiration from this concept,  Zhang and Xiao ( 2019 ) proposed the Point Contextual Attention Network (PCAN), which leverages point context to predict the significance of individual local point features, offering an efficient means to encode the local features into a discriminative global descriptor.

Indoor place recognition represents an important yet relatively less explored area. The SpoxelNet (Chang et al. 2020 ) neural network architecture was proposed as a 3D-PCPR method tailored for crowded indoor spaces. SpoxelNet effectively encodes input voxels into global descriptor vectors. The method involves voxelizing point clouds in spherical coordinates and defining voxel occupancy through ternary values. SpoxelNet has been evaluated on diverse indoor datasets, yielding promising results for the task of place recognition.

 Du et al. ( 2020 ) proposed DH3D, the first approach that unifies global place recognition and local 6DoF pose refinement. DH3D incorporates a deep hierarchical network and utilizes NetVLAD to generate more discriminative global descriptors. However, the obtained descriptors lack rotational invariance and often exhibit shortcomings in reverse revisits. Zhou et al. ( 2021 ) introduced NDT-Transformer, a real-time and large-scale 3D-PCPR method. Taking inspiration from the success of the NDT (Biber and Straßer 2003 ) and Transformer (Vaswani et al. 2017 ) models, NDT-Transformer condenses raw point clouds through 3D NDT representation and subsequently learns global descriptors through a novel NDT-Transformer network. Notably, this approach obviates the need for handcrafted features and can serve as a crucial module in real-time SLAM systems.

Acquiring high-quality point cloud data along with ground truth registration in real-world scenarios for training place recognition models is time-consuming and resource-intensive.  Qiao et al. ( 2021 ) address this problem by proposing a novel registration-aided 3D domain adaptation network named VLPD-Net (Virtual Large-Scale Point Cloud Descriptor Network) for 3D-PCPR. Recognizing the importance of the neighborhood context of each point, the method takes into account the tactical contributions of different local features, which may vary unevenly.  Xia et al. ( 2021 ) adopted a point orientation encoding module to capture neighborhood information from various orientations. Additionally, a self-attention unit is employed to encode the spatial relationships of local features for weighted aggregation. This end-to-end architecture enables one-stage training, generating a discriminative and compact global descriptor directly from a given 3D point cloud by exploring the relationships between raw 3D point clouds and the varying importance of local features to perform large-scale 3D-PCPR tasks.

Existing PointNet-like methods primarily process unordered point clouds and may not adequately capture local geometric structures. Consequently, a large-scale 3D-PCPR method named MinkLoc3D was introduced by Komorowski ( 2021 ). MinkLoc3D leverages a sparse voxelized point cloud representation and sparse 3D convolutions to compute a discriminative 3D point cloud descriptor, as depicted in Fig.  5 . The efficacy of this method can be attributed to two key factors. Firstly, the sparse convolutional architecture effectively generates informative local features. Secondly, enhancements in the training process facilitate efficient and effective training by accommodating larger batch sizes. However, MinkLoc3D solely utilizes the geometry of 3D point clouds for place recognition. To address this limitation, the author proposed MinkLoc3D-SI (Żywanowski et al. 2021 ), which integrates both spherical representation and measurement intensities. MinkLoc3D-SI improves performance when a single 3D LiDAR scan is used. Experimental results demonstrate the superior performance of MinkLoc3D-SI on single scans from 3D LiDAR and its excellent generalization ability.
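The front end of such sparse-voxel pipelines can be illustrated with the hedged sketch below. It only shows the quantization of a raw scan into occupied voxels and a simple generalized-mean (GeM) pooling of per-voxel features; the sparse 3D convolutions that MinkLoc3D runs between these two steps (for example via MinkowskiEngine) are omitted, and the random features are placeholders.

```python
# Hedged sketch of the first and last steps of a MinkLoc3D-style pipeline:
# sparse voxel quantization of a raw scan, and GeM pooling of per-voxel features.
import numpy as np

def sparse_quantize(cloud, voxel_size=0.5):
    """Return the unique occupied voxel coordinates for a point cloud (N x 3)."""
    coords = np.floor(cloud / voxel_size).astype(np.int64)
    return np.unique(coords, axis=0)                  # one entry per occupied voxel

def gem_pool(features, p=3.0, eps=1e-6):
    """Generalized-mean pooling of per-voxel features (M x D) into one D-dim vector."""
    clipped = np.clip(features, eps, None)
    return np.power(np.power(clipped, p).mean(axis=0), 1.0 / p)

rng = np.random.default_rng(2)
cloud = rng.uniform(-25.0, 25.0, size=(20000, 3))     # a raw LiDAR scan (placeholder)
voxels = sparse_quantize(cloud)
print(voxels.shape)                                    # (num_occupied_voxels, 3)

# Stand-in per-voxel features; in practice produced by sparse 3D convolutions.
feats = rng.random((voxels.shape[0], 256))
descriptor = gem_pool(feats)
print(descriptor.shape)                                # (256,)
```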

Komorowski ( 2022 ) has recently proposed an improved 3D-PCPR method that incorporates a ranking-based loss and large batch training technique. This method employs a simple and efficient 3D convolutional feature extraction process to enhance channel attention blocks. The network architecture is an advancement over the MinkLoc3D Komorowski ( 2021 ) point cloud descriptor and surpasses the performance of most recent methods with more complex architectures.

However, many existing algorithms tend to overlook long-range contextual properties and exhibit large model sizes, thereby limiting their widespread applicability. To overcome these challenges, Fan et al. ( 2022 ) introduced SVT-Net, which is a lightweight sparse voxel transformer designed for large-scale 3D-PCPR tasks. To mitigate issues related to moving objects, size disparities among objects, and long-range contextual information,  Xu et al. ( 2021 ) proposed TransLoc3D, another large-scale 3D-PCPR method that employs adaptive receptive fields. TransLoc3D achieves impressive results across multiple datasets, including Oxford RobotCar (Maddern et al. 2017 ), USRABD dataset(including a university sector (U.S.), a residential area (R.A.), and a business district (B.D.)) (Uy and Lee 2018 ).

figure 5

An overview of the MinkLoc3D architecture (from Komorowski 2021 ). The raw 3D point cloud is quantized into a sparse, single-channel 3D tensor. Based on the extracted local features, a global point cloud descriptor is generated using Generalized-mean (GeM) pooling
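
As an illustration of the pooling step in this kind of pipeline, the sketch below implements generalized-mean (GeM) pooling over a set of local feature vectors with NumPy; the random feature matrix and the exponent p are illustrative stand-ins, not values from MinkLoc3D's trained network.

```python
import numpy as np

def gem_pooling(local_features, p=3.0, eps=1e-6):
    """Generalized-mean (GeM) pooling: aggregate N local feature vectors of dimension D
    into a single D-dimensional global descriptor. p = 1 gives average pooling and
    large p approaches max pooling."""
    clipped = np.clip(local_features, eps, None)          # assumes non-negative activations
    pooled = np.mean(clipped ** p, axis=0) ** (1.0 / p)   # shape (D,)
    return pooled / (np.linalg.norm(pooled) + eps)        # L2-normalize for retrieval

# Toy usage: 1024 local features of dimension 256 standing in for sparse-conv outputs.
features = np.abs(np.random.randn(1024, 256)).astype(np.float32)
print(gem_pooling(features).shape)  # (256,)
```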

Sun et al. ( 2020 ) proposed DAGC, which leverages dual attention and graph convolution techniques to perform 3D-PCPR. The dual attention and residual graph convolution network modules contribute to the extraction of discriminative and generalizable features for describing a point cloud. By simultaneously considering the importance of points and features, DAGC utilizes the point relationships to extract local features, which are subsequently passed through a feature fusion block and aggregated into global descriptors by a NetVLAD (Arandjelovic et al. 2016 ) module. While DAGC effectively captures the relationships between points and the discriminative power of different features when generating global descriptors, it accounts for neither the spatial relationships between local features nor the long-range dependencies among different features.

To take full advantage of the contextual semantic features of the scene and mitigate the influence of dynamic noise, such as moving cars and pedestrians,  Fan et al. ( 2020 ) proposed SRNet, a 3D scene recognition network using static graphs and dense semantic fusion. SRNet comprises Static Graph Convolution, a Spatial Attention Module, and Dense Semantic Fusion. These modules help the network learn a deep understanding of the contextual scene semantics. After obtaining naive embedded features, the final global descriptors used for recognition are aggregated by an additional NetVLAD module. Benefiting from strong local feature learning, contextual semantics understanding, and dynamic noise avoidance capabilities, combined with network flexibility, SRNet can be easily integrated into other point cloud architectures for tasks beyond place recognition.

Most existing algorithms struggle when dealing with reverse loops.  Cattaneo et al. ( 2022 ) proposed LCDNet for simultaneous deep loop closure detection and point cloud registration in LiDAR-based SLAM. LCDNet leverages a shared encoder and two heads for generating global descriptors and estimating the relative pose, enabling simultaneous identification of previously visited places and estimation of the 6-DoF relative transformation between the current scan and the map. Considering the sparsity of point clouds, Hui et al. ( 2021 ) proposed PPT-Net, a pyramid point cloud transformer network designed for large-scale place recognition. PPT-Net extracts discriminative local features to form a global descriptor. It incorporates a pyramid point transformer module, which adaptively learns the spatial relationships among different KNN neighboring points, and a pyramid VLAD module, which aggregates the multi-scale feature maps of point clouds into comprehensive global descriptors.

 Habich et al. ( 2021 ) proposed an extension of graph-based SLAM to exploit the potential of 3D laser scans for loop detection. Their method extracts global features from the point cloud and then uses a trained detector to determine the presence of a loop. Their algorithm is considered an extension of the widely used state-of-the-art RTAB-Map (Labbé and Michaud 2019 ) library. In the domain of indoor LiDAR mobile mapping, Xiang et al. ( 2021 ) proposed FastLCD, a compact and efficient method for precise loop closure detection using comprehensive descriptors such as statistics, geometry, planes, range histogram, and intensity histogram. These features are invariant to rotation and are encoded to uniquely describe each point cloud scan, making FastLCD a feasible and reliable loop closure detection algorithm.

Hou et al. ( 2022 ) introduced a novel Hierarchical Transformer for Place Recognition (HiTPR), specifically designed to address the challenges of LiDAR-based large-scale place recognition such as robustness and real-time performance. HiTPR avoids the memory-intensive and inefficient approach of global information aggregation through NetVLAD (Arandjelovic et al. 2016 ). HiTPR comprises four main components, namely point cell generation, a short-range transformer, a long-range transformer, and global descriptor aggregation, which enable it to achieve superior performance in terms of average recall.

Many existing methods fail to produce consistent descriptors for the same scene under different viewpoints, making rotation invariance crucial. Therefore, Li et al. ( 2022 ) designed RINet, an efficient 3D LiDAR-based place recognition method built on a rotation-invariant neural network, exploiting the fact that autonomous robots generally rotate only in the yaw direction. Their method combines semantic and geometric features to improve descriptiveness and employs a rotation-invariant siamese neural network to predict the similarity of descriptor pairs. To acquire more repeatable global descriptors and improve performance in 3D place recognition, Vidanapathirana et al. ( 2022 ) presented LoGG3D-Net, an end-to-end trainable locally guided global descriptor learning network for 3D-PCPR. To tackle both loop closing and relocalization, Shi et al. ( 2023 ) proposed a novel multi-head network named LCR-Net. Based on the input 3D point clouds, the method utilizes novel feature extraction and pose-aware attention mechanisms to accurately estimate the similarities and 6-DoF relative poses between pairs of LiDAR scans.

Compression techniques have become popular to store large-scale point cloud maps (Golla and Klein 2015 ; Huang and Liu 2019 ; Wiesmann et al. 2021 ). To address the problem of place recognition in a compressed point cloud map, Wiesmann et al. ( 2022 ) presented Retriever, a novel deep neural network architecture that directly operates on compressed feature representation, then uses a NetVLAD (Arandjelovic et al. 2016 ) layer to aggregate local features with an attention mechanism between local features and a latent code.

3.3 Summary

In summary, feature-based methods are the most common in 3D-PCPR. Their main idea is to apply hand-crafted or deep learning-based techniques directly to 3D point clouds to extract local or global features, perform similarity matching between the extracted query features and the features of the 3D reference map, and thereby accomplish place recognition. Hand-crafted place recognition methods have good interpretability and high computational efficiency; representative algorithms include AL3D (Magnusson et al. 2009a ), Robust PR (Steder et al. 2010 ), and Keypoint Voting PR (Bosse and Zlot 2013 ). However, they often fail to capture all the relevant features of 3D point clouds. Deep learning feature-based place recognition methods, on the other hand, have gained more popularity as they can automatically learn and characterize relevant features from raw 3D point clouds. They can be divided into local feature-based and global descriptor-based methods. Local feature-based methods, such as LPD-Net (Liu et al. 2019 ), PCAN (Zhang and Xiao 2019 ), PPT-Net (Hui et al. 2021 ), and MinkLoc3D (Komorowski 2021 ), first extract local features with PointNet-like (Qi et al. 2017 ) backbones and then aggregate them into global descriptors through VLAD-like (Arandjelovic et al. 2016 ) modules for subsequent place recognition. Global descriptor-based methods generally use Transformer-like (Vaswani et al. 2017 ) architectures to represent global features directly from the raw 3D point clouds, for example NDT-Transformer (Zhou et al. 2021 ), LCDNet (Cattaneo et al. 2022 ), and RINet (Li et al. 2022 ). Overall, deep learning feature-based place recognition methods, such as HiTPR (Hou et al. 2022 ) and MinkLoc3Dv2 (Komorowski 2022 ), have achieved advanced performance. However, the point cloud features they extract are not easy to interpret (Minh et al. 2022 ), and they require a large amount of training data and powerful computing hardware (Han et al. 2023 ). Nonetheless, feature-based place recognition methods, especially end-to-end deep learning-based 3D-PCPR methods, will remain the preferred research direction in the foreseeable future due to their state-of-the-art performance.
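
As a rough illustration of the retrieval loop shared by these feature-based methods, the sketch below encodes every reference scan into a global descriptor and answers a query by nearest-neighbour search; the descriptor function is a toy stand-in (simple per-axis statistics), not any of the cited networks, and the random scans are placeholders for real LiDAR data.

```python
import numpy as np

def toy_global_descriptor(points):
    """Stand-in for a learned extractor (e.g. PointNet-like features plus VLAD/GeM pooling):
    here we simply concatenate per-axis means and standard deviations of the scan."""
    d = np.concatenate([points.mean(axis=0), points.std(axis=0)])
    return d / (np.linalg.norm(d) + 1e-12)

def build_database(scans):
    """Encode every reference scan of the map into a global descriptor."""
    return np.stack([toy_global_descriptor(s) for s in scans])

def recognize_place(query_scan, database, top_k=3):
    """Retrieve the top-k most similar reference places by descriptor distance."""
    q = toy_global_descriptor(query_scan)
    dists = np.linalg.norm(database - q, axis=1)
    idx = np.argsort(dists)[:top_k]
    return idx, dists[idx]

# Toy usage with random scans (N x 3 arrays) standing in for LiDAR point clouds.
reference_scans = [np.random.randn(2048, 3) for _ in range(100)]
db = build_database(reference_scans)
candidates, distances = recognize_place(reference_scans[42] + 0.01 * np.random.randn(2048, 3), db)
print(candidates, distances)
```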

4 Projection based methods

Projection-based methods are another category of methods in 3D-PCPR where the main idea is to project the raw 3D point clouds to 2D planes, 2D images, or bird-eye view (BEV) information. These projections are then subsequently processed to achieve place recognition. In this section, we discuss the projection-based methods by dividing them into three categories: 2D planes based methods, 2D image-based methods, and bird-eye view-based methods.

4.1 2D Planes based

The 2D planes based 3D-PCPR methods involve an initial step of projecting the raw 3D point cloud onto a 2D representation which is then utilized for subsequent place recognition tasks. A pioneering approach in this domain is M2DP (He et al. 2016 ), which projects a 3D point cloud onto a sequence of 2D planes that capture various viewpoints of the cloud. By characterizing the point projections, M2DP extracts multiple density distributions or signatures from a single point cloud.

Scan Context (Kim and Kim 2018 ) is an egocentric spatial descriptor, which summarizes a place as a plane matrix for 3D-PCPR and offers robustness to structural changes such as dynamic objects and seasonal changes, as shown in Fig.  6 . It projects the maximum height of the point cloud in different bins to generate a 2D global descriptor. However, using only the maximum height information does not offer much invariance in the lateral direction. It also uses brute-force search which is highly inefficient. Wang et al. ( 2020 ) proposed another method called Intensity Scan Context (ISC) which codes intensity information and geometry relations for loop closure detection. It explores the intensity information properties for place recognition and to reduce the computational cost, it performs a two-stage hierarchical re-identification process including a binary-operation-based fast geometric relation retrieval and an intensity structure re-identification.
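
To make the idea concrete, the following is a simplified sketch of a Scan Context-style descriptor and a rotation-tolerant column-shift comparison; the ring/sector counts and the 80 m range are illustrative defaults rather than values prescribed by the original method, and the comparison keeps the brute-force search noted above.

```python
import numpy as np

def scan_context(points, num_rings=20, num_sectors=60, max_range=80.0):
    """Simplified Scan Context: bin points into (ring, sector) polar cells around the
    sensor and keep the maximum point height per cell as a 2D descriptor.
    Heights are assumed non-negative (e.g. z measured relative to the ground)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x) + np.pi                 # azimuth mapped to [0, 2*pi]
    keep = r < max_range
    ring = np.minimum((r[keep] / max_range * num_rings).astype(int), num_rings - 1)
    sector = np.minimum((theta[keep] / (2 * np.pi) * num_sectors).astype(int), num_sectors - 1)
    desc = np.zeros((num_rings, num_sectors), dtype=np.float32)
    np.maximum.at(desc, (ring, sector), z[keep])     # per-cell maximum height
    return desc

def scan_context_distance(a, b):
    """Rotation-tolerant comparison: try every column (yaw) shift of b and keep the
    best mean column-wise cosine distance (brute force, as in the basic formulation)."""
    best = np.inf
    for shift in range(b.shape[1]):
        shifted = np.roll(b, shift, axis=1)
        num = np.sum(a * shifted, axis=0)
        den = np.linalg.norm(a, axis=0) * np.linalg.norm(shifted, axis=0) + 1e-12
        best = min(best, float(np.mean(1.0 - num / den)))
    return best
```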

Cai and Yin ( 2021 ) introduced a robust global descriptor, known as Weighted Scan Context (WSC), for 3D-PCPR by leveraging the richer information provided by intensity data in comparison to sparse height features. WSC employs the intensity information of the points to sparsify geometric features in the height direction. Furthermore, it utilizes a hybrid distance metric that combines cosine distance and Euclidean distance to quantify the similarity between two scenes. This integration of distance metrics reduces the sensitivity typically associated with cosine distance and enhances the overall performance of the method. Due to the absence of a unified reference frame and the usage of a simplified vector instead of a complete matrix in Scan Context, Shi et al. ( 2021 ) introduced a robust global place recognition method by enhancing the Scan Context approach. Their improved Scan Context employs a three-stage matching algorithm, which effectively enhances the performance of place recognition. To further advance the rotation invariance introduced by Scan Context, Kim et al. ( 2021 ) proposed Scan Context++ (SC++), which is capable of generating a versatile descriptor that is resilient to rotation and translation. SC++ extends the capabilities of its predecessor by incorporating two sub-descriptors, enabling topological place retrieval, and facilitating 1-DOF semi-metric localization, thereby bridging the gap between topological place retrieval and metric localization. An additional benefit is that SC-like methods can easily integrate into existing LiDAR-based SLAM systems (Kim et al. 2022 ).

figure 6

A brief illustration of Scan Context algorithm (from Kim and Kim ( 2018 )). A raw 3D point cloud is encoded into scan context and a 1-dimensional vector is used for retrieving the nearest candidates. For loop detection, the retrieved candidates are compared to the query Scan Context

To recognize places by analyzing the projection of observed points along the gravity direction, Sánchez-Belenguer et al. ( 2020 ) proposed a robust global matching technique specifically designed for 3D mapping applications. By leveraging a global projection direction, the method introduces a matching algorithm that effectively compares places in a two-dimensional (2D) space and retrieves the corresponding relative three-dimensional (3D) transformations between maps.

 Yin et al. ( 2020 ) proposed SeqSphereVLAD for orientation-invariant 3D-PCPR. This method can recognize places from previous trajectories regardless of variations in viewpoint and temporal observation differences. SeqSphereVLAD achieves this by projecting a 3D point cloud onto a spherical view to extract place descriptors which are then utilized in a coarse-to-fine sequence matching module designed to enhance the accuracy of 3D-PCPR. To achieve viewpoint-invariant 3D-PCPR while simultaneously balancing matching accuracy and search efficiency, Yin et al. ( 2021 ) introduced a fast sequence-matching enhanced viewpoint-invariant 3D-PCPR framework. This framework comprises two key modules: a spherical harmonic place descriptor extraction (SphereVLAD) and fast sequence matching. By leveraging this framework, the authors aimed to emulate human-like place recognition abilities by employing a novel 3D feature learning method. The SphereVLAD module is responsible for extracting unique place descriptors using spherical harmonics, while the fast sequence matching module focuses on efficient and accurate sequence matching.

 Jiang et al. ( 2020 ) introduced LiPMatch for 3D LiDAR point cloud-based loop-closure detection and loop-closure correction. LiPMatch formulates each keyframe as a fully connected graph, where nodes represent planes. The method constructs a plane graph for each keyframe and leverages the geometric properties of the planes and their relative positions to detect loop closures. By identifying matched planes between keyframes, LiPMatch enhances the accuracy and robustness of SLAM algorithms, thereby improving the overall performance of the system.

To overcome the limitations of existing methods in terms of real-time loop recognition and full 6-DoF loop pose correction,  Cui et al. ( 2023 ) introduced BoW3D, a novel bag-of-words approach for 3D LiDAR SLAM. BoW3D addresses these challenges by leveraging the LinK3D (Cui et al. 2024 ), which is an efficient, pose-invariant, and accurate point-to-point matching method specifically designed for 3D LiDAR data. By building a bag-of-words representation based on LinK3D, BoW3D efficiently recognizes revisited loop locations while also enabling real-time correction of the full 6-DoF loop pose.

4.2 2D Images based

The 2D images based methods in 3D-PCPR first project the raw 3D point clouds to 2D images, and then use the 2D images for place recognition. A notable method in this category was proposed by  Cao et al. ( 2018 ), which accomplishes loop closure detection for SLAM. This method adopts an image model named Bearing Angle (BA) to convert 3D laser point clouds to 2D images. It then utilizes the ORB features (Rublee et al. 2011 ) extracted from BA images to perform scene matching and uses a visual Bag of Words (BoW) approach (Angeli et al. 2008 ; Gálvez-López and Tardos 2012 ) to improve the search efficiency. However, the performance of this method in large-scale unstructured environments has not been fully verified.

Cop et al. ( 2018 ) introduced DELIGHT, a highly efficient global localization descriptor that solely relies on LiDAR data without requiring robot motion information. This pioneering work leverages the LiDAR intensity image data and encodes it into a unique descriptor comprising a collection of histograms. DELIGHT stands out as the first solution that offers a near real-time approach to global localization by utilizing global intensity descriptors.  Guo et al. ( 2019 ) introduced ISHOT, a local descriptor designed to enhance robust place recognition by integrating the advantages of both geometry and appearance using LiDAR intensity image data. ISHOT combines geometric and texture information obtained from calibrated LiDAR intensity images to form a comprehensive local descriptor. The method then employs a probabilistic keypoint voting strategy for place recognition.

Kim et al. ( 2019 ) proposed a long-term LiDAR localization technique that leverages the structural information of an environment by projecting a raw point cloud to an image. They present a novel Scan Context Image descriptor for point clouds and an end-to-end CNN that effectively summarizes the unstructured point cloud into a structured form for robust long-term place recognition. Unlike existing methods such as M2DP (He et al. 2016 ) and PointNetVLAD-like (Uy and Lee 2018 ) approaches, which rely on pairwise comparisons between a query and the scans in a database, this method offers faster processing. Moreover, experimental results demonstrate consistent and robust year-round localization performance, even when trained on data from just a single day.

Schaupp et al. ( 2019 ) proposed OREOS, oriented place recognition in outdoor scenarios using LiDAR scans. Their approach involves several stages: firstly, the current raw 3D LiDAR point cloud is projected onto a 2D range image. Next, a CNN is employed to extract compact descriptors, followed by yaw estimation and local point cloud registration. To enhance performance, retrieve nearby place candidates, and estimate yaw discrepancy, the method utilizes a triplet loss function during training and incorporates a hard negative mining strategy. Cao et al. ( 2020 ) proposed a season-invariant and viewpoint-tolerant 3D-PCPR method to achieve long-term robust localization. To achieve robust place recognition across seasons, the method designs a compact cylindrical image model to project 3D point clouds to 2D images representing the prominent geometric relationships of scenes. The structure of the algorithm mainly consists of two parts: a novel cylindrical image representation of a 3D point cloud and an efficient descriptor based on contexts and layouts of the scenes. Additionally, a sequence-based temporal consistency check is introduced to handle similar scenes and local structural changes.
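
Methods such as OREOS and the cylindrical-image approach above start from a projection of the scan onto an image grid; the sketch below shows a minimal spherical (range-image) projection, with the vertical field of view and image resolution chosen as illustrative values rather than taken from any specific sensor.

```python
import numpy as np

def project_to_range_image(points, height=64, width=900, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Spherical projection of a LiDAR scan into a 2D range image.
    Rows correspond to elevation (vertical FOV), columns to azimuth."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points[:, :3], axis=1) + 1e-12
    yaw = np.arctan2(y, x)                                  # [-pi, pi]
    pitch = np.arcsin(np.clip(z / depth, -1.0, 1.0))
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    fov = fov_up - fov_down
    u = 0.5 * (1.0 - yaw / np.pi) * width                   # azimuth -> column
    v = (1.0 - (pitch - fov_down) / fov) * height            # elevation -> row
    u = np.clip(np.floor(u), 0, width - 1).astype(int)
    v = np.clip(np.floor(v), 0, height - 1).astype(int)
    image = np.zeros((height, width), dtype=np.float32)
    image[v, u] = depth                                       # keep the last hit per pixel (sketch only)
    return image
```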

OverlapNet (Chen et al. 2021 ) is a loop closing method for 3D LiDAR-based SLAM. It exploits the different cues generated from the point cloud, such as range, normal, and intensity images, and semantic data to provide overlap and relative yaw angle estimates between pairs of 3D scans. The 3D point cloud is first converted to a 2D image, and the rotation information is represented as translation of the image. This translation is estimated by a differentiable phase correlation. OverlapNet estimates an image overlap generalized to range images and provides a relative yaw angle estimate between pairs of scans. Ma et al. proposed OverlapTransformer (Ma et al. 2022 ), an efficient yaw-angle-invariant transformer network for LiDAR-based place recognition. OverlapTransformer has three modules: Range Image Encoder, Attentional Feature Transformer, and VLAD. It is a lightweight transformer network that leverages range images projected from raw 3D point clouds to achieve faster online inference. In follow-up works, Ma et al. process sequential LiDAR scans with a transformer network, named SeqOT (Ma et al. 2023 ), and multiple different views (depth view and BEV) with another transformer network, named CVTNet (Ma et al. 2023 ), for more robust and reliable long-term place recognition.

Wang et al. ( 2020 ) proposed a global 3D LiDAR point cloud descriptor to improve the speed and accuracy of loop-closure detection. Their method projects a point cloud to a binary signature image after a couple of Gabor-filtering and thresholding operations on the LiDAR-Iris image representation. Point cloud pairs are matched by calculating the Hamming distance of their corresponding binary signature images. This work is somewhat similar to Scan Context (Kim and Kim 2018 ) but differs in three main ways: Firstly, it encodes the height information as the pixel intensity of the LiDAR-Iris image. Secondly, it extracts a binary feature map from the LiDAR-Iris image for loop-closure detection. Thirdly, the loop-closure detection is rotation-invariant with respect to the LiDAR’s pose.
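
The matching step of such binary-signature methods reduces to a Hamming distance between bit maps, as sketched below; the per-image mean threshold used for binarization is a placeholder for the Gabor-filtering and thresholding pipeline described above, and the random images are stand-ins for LiDAR-Iris representations.

```python
import numpy as np

def binary_signature(image, threshold=None):
    """Binarize an image representation (e.g. a LiDAR-Iris image) into a bit map.
    A per-image mean threshold stands in for the Gabor-filtering pipeline."""
    t = image.mean() if threshold is None else threshold
    return image > t

def hamming_distance(sig_a, sig_b):
    """Fraction of differing bits between two equally sized binary signatures."""
    return np.count_nonzero(sig_a ^ sig_b) / sig_a.size

# Toy usage: compare two noisy versions of the same height image.
base = np.random.rand(80, 360)
noisy = base + 0.05 * np.random.rand(80, 360)
print(hamming_distance(binary_signature(base), binary_signature(noisy)))
```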

Leveraging high-resolution 3D LiDAR point clouds, Shan et al. ( 2021 ) proposed a method for robust, real-time place recognition. Their method extracts ORB features from the intensity images of point clouds and encodes them into bag-of-words vectors. Candidate frames are found by matching visual feature descriptors, and outliers are rejected by applying PnP and RANSAC. This method is specifically designed for LiDAR imaging and is the first to use projected LiDAR intensity images for place recognition.  Di Giammarino et al. ( 2021 ) used different datasets to investigate the practicality of applying techniques from VPR to LiDAR intensity data. Their results suggest that visual representations (such as intensity images) of places are useful for place recognition and are an effective means for determining loop closures.

Place recognition methods with higher detection accuracy usually have very large network models, whereas methods with smaller network models are often still not fast enough in practical scenarios. To address this, Ye et al. ( 2022 ) proposed FPET-Net, an efficient 3D-PCPR approach based on feature point extraction and a transformer. The method first projects the raw 3D point cloud to a range image to obtain the horizontal index and scan index of each point, and then calculates curvature values to filter feature points. A point transformer module is then developed to extract global descriptors. Finally, a feature similarity network module is used to compute the similarity between global descriptors.

 Ma et al. ( 2023 ) presented SeqOT, a transformer-based network designed for place recognition using sequential 3D LiDAR scans obtained from an onboard sensor. The method aims to effectively utilize the temporal and spatial information present in the sequential range images derived from the LiDAR data. SeqOT is an end-to-end network for long-term place recognition and uses a multiscale transformer to generate global descriptors for each LiDAR range image sequence. It finds similar places by matching the descriptor of the current query sequence with the descriptors stored in the map.

4.3 Bird-eye view based

Bird-eye view (BEV) based methods first project the raw 3D point clouds to a BEV representation and then use this information for subsequent place recognition. A prominent method in this category is DiSCO (Differentiable Scan Context with Orientation) (Xu et al. 2021 ), which can simultaneously find scans from similar places and estimate their relative orientation. The main idea of DiSCO is to convert the rotation-invariant signature to the translation-invariant frequency spectrum. It efficiently estimates the globally optimal relative orientation by projecting the 3D point cloud to a polar BEV image and reorganizing voxel values at the same height into image channels to construct a multi-layer BEV. Low overlap between input point clouds may lead to registration failures, especially in scenes where the non-overlapping regions contain similar structures. To solve this problem, and inspired by DiSCO (Xu et al. 2021 ), Li et al. ( 2023 ) presented a unified BEV model that jointly learns 3D local features and overlap estimation for simultaneous pairwise registration and loop closure detection.

figure 7

Illustration of BEVPlace modules (from Luo et al. ( 2023 )). The BEVPlace network projects point clouds to BEV images and extracts rotation-invariant global features. The position estimator module recovers geometry distances from feature space and estimates the positions of query point clouds

BVMatch (Luo et al. 2021 ) is a LiDAR-based frame-to-frame place recognition method that is able to estimate 2D relative poses. Since the ground area can be approximated as a plane, BVMatch employs a BEV image which is projected from the raw 3D point cloud as the intermediate representation and introduces the BVFT descriptor to perform matching. Leveraging the BVFT descriptors, the method unifies the 3D-PCPR task and pose estimation. However, BVMatch cannot generalize well to unseen environments. In a follow-up work, the authors proposed a rotation-invariant network called BEVPlace (Luo et al. 2023 ), as shown in Fig.  7 . It uses group convolution (Cohen and Welling 2016 ) to extract rotation-equivariant local features from BEV images, and NetVLAD (Arandjelovic et al. 2016 ) for global feature aggregation. Furthermore, BEVPlace observes that the distance between BEV features correlates with the geometric distance of point clouds. Based on the above structure, BEVPlace is able to estimate the position of the query point cloud for place recognition.
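
For concreteness, the sketch below shows the kind of BEV projection these methods start from: points are discretized on the ground plane and the per-cell point count forms a density image that can feed a feature extractor. The grid size and resolution are illustrative assumptions; polar grids (as in DiSCO) or per-height channels follow the same pattern.

```python
import numpy as np

def bev_density_image(points, grid_size=200, resolution=0.4):
    """Project a point cloud onto the ground plane and count points per cell,
    yielding a bird's-eye-view density image centred on the sensor."""
    half_extent = grid_size * resolution / 2.0
    x, y = points[:, 0], points[:, 1]
    keep = (np.abs(x) < half_extent) & (np.abs(y) < half_extent)
    col = np.clip(((x[keep] + half_extent) / resolution).astype(int), 0, grid_size - 1)
    row = np.clip(((y[keep] + half_extent) / resolution).astype(int), 0, grid_size - 1)
    image = np.zeros((grid_size, grid_size), dtype=np.float32)
    np.add.at(image, (row, col), 1.0)                       # per-cell point count
    return image / (image.max() + 1e-12)                    # normalize for downstream feature extraction
```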

4.4 Summary

To summarize, the main idea of projection-based 3D-PCPR methods is to first project the raw 3D point clouds to 2D planes, 2D images, or BEV representations, and then use the projected information for subsequent processing to achieve place recognition. Planar projections form the basis of hand-crafted descriptors such as Scan Context (Kim and Kim 2018 ) and BoW3D (Cui et al. 2023 ). Projection can also be followed by image feature extraction or a deep learning network to construct place recognition algorithms such as OREOS (Schaupp et al. 2019 ), OverlapNet (Chen et al. 2020a ), OverlapTransformer (Ma et al. 2022 ), and SeqOT (Ma et al. 2023 ). Furthermore, sequences of projected images or BEV representations (or their combination) can be used to build new place recognition algorithms such as BEVPlace (Luo et al. 2023 ). Projection-based methods have achieved great success recently; however, projecting 3D point clouds to planes, images, or BEV inevitably loses information, which can undermine place recognition accuracy. Multi-projection-based methods (Ma et al. 2023 ) can mitigate this information loss, but they increase processing time, resulting in a trade-off between accuracy and time cost.

5 Segment based methods

Segment-based methods are another popular category in 3D-PCPR. The main idea is to segment the raw point cloud and then use the post-segment features, semantic information, or graph structure for subsequent processing to realize place recognition. We divide segment-based methods into three categories: post-segment features-based, semantic-based, and graph-based methods.

5.1 Post-segment features based

Post-segment features-based methods utilize the post-segment features of the raw 3D point clouds for subsequent place recognition. A pioneering method in this category is SegMatch (Dubé et al. 2017 ) which is the first to present real-time loop-closure detection and localization in 3D point clouds (see Fig.  8 ). SegMatch first segments the raw 3D point cloud into sets of point clusters and then uses post-segment features encoded by a CNN on the clusters to find place matches. Finally, a geometric verification step is applied to turn the candidate matches into place recognition candidates. Since segmentation provides a good compromise between local and global descriptions by combining their advantages while mitigating their disadvantages, this method not only reduces the matching time but also decreases the likelihood of false matches. In a follow-up work, Dubé et al. ( 2018 ) proposed an incremental segment-based localization for 3D-PCPR, which utilizes an incremental segmentation algorithm to track the evolution of single segments. It is the first work to propose combining incremental solutions to normal estimation, segmentation, and recognition for finding global associations in 3D point clouds. It is interesting to investigate incremental updates of learning-based descriptors that can potentially gain discriminative power and reliability over time. To precisely estimate a robot’s pose in unstructured, dynamic environments,  Dube et al. ( 2020 ) also put forward SegMap, a 3D segment mapping method using Data-Driven Descriptors. SegMap decomposes the robot’s surroundings into a set of segments, each represented by a distinctive, low-dimensional learning-based descriptor. It is the first work on robot localization proposing to reuse the extracted features for reconstructing 3D environments and extracting semantic information.

figure 8

Illustration of SegMatch block (from Dubé et al. ( 2017 )). SegMatch is a modular place recognition algorithm composed of 4 main modules: point cloud segmentation, feature extraction, segment matching, and geometric verification
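
As a rough skeleton of the four modules in the figure above, the sketch below uses DBSCAN clustering as a stand-in for the segmentation step and normalized covariance eigenvalues as stand-in segment features; SegMatch's learned CNN features, ground removal, and geometric verification stage are omitted.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def segment_cloud(points, eps=0.5, min_samples=50):
    """Segmentation module: cluster a (ground-removed) cloud into object-sized segments."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    return [points[labels == k] for k in set(labels) if k != -1]  # label -1 = noise

def segment_descriptor(segment):
    """Feature extraction module (stand-in): normalized eigenvalues of the segment's
    covariance, a crude shape descriptor in place of SegMatch's learned features."""
    eigvals = np.sort(np.linalg.eigvalsh(np.cov(segment.T)))[::-1]
    return eigvals / (eigvals.sum() + 1e-12)

def match_segments(query_segments, map_segments, max_dist=0.05):
    """Segment matching module: nearest-neighbour search in descriptor space.
    A geometric verification step (omitted here) would prune these candidates."""
    map_feats = np.stack([segment_descriptor(s) for s in map_segments])
    matches = []
    for qi, qs in enumerate(query_segments):
        d = np.linalg.norm(map_feats - segment_descriptor(qs), axis=1)
        if d.min() < max_dist:
            matches.append((qi, int(d.argmin())))
    return matches
```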

Tinchev et al. ( 2018 ) proposed Natural Segmentation and Matching (NSM), an extension of SegMatch (Dubé et al. 2017 ), for place recognition in both urban and natural environments. Their method first uses a feature extraction module to extract stable and reliable object-sized segments from point clouds. Next, repeatable oriented key poses are extracted and matched with a reliable shape descriptor using Random Forests to estimate the current sensor’s position within the target map. The key poses extraction module segments and defines consistent orientated coordinate frames for object-sized segments, and the descriptor is employed to recognize different instances of the same segment. To adapt to online applications, Tinchev et al. ( 2019 ) then explored laser-based localization in both urban and natural environments and proposed a deep learning approach capable of learning meaningful descriptors directly from 3D point clouds as well as a feature space representation for the set of segmented point clouds. Their main contribution is a novel description method for segment-based 3D-PCPR, using a lightweight model that can be deployed using only a CPU.

SEED (Fan et al. 2020 ) is a segmentation-based egocentric 3D point cloud descriptor for loop closure detection. For robustness to noise and low/varying resolution, the method first obtains different segmented objects and then encodes their topological information into descriptors. The method is rotation invariant and insensitive to translation variations. However, its performance drops significantly when there are fewer objects in the scene.

Tomono ( 2020 ) proposed a method that uses geometric segments, such as planes, lines, and balls, to reduce the number of matching elements in the point cloud registration process for loop detection. Their method uses geometric constraints between the segments to achieve robustness and reduce matching element combinations, for real-time loop detection. However, when the environment lacks a sufficient number of salient objects and physical features, it struggles to find good loop hypotheses due to the lack of geometric segments. Locus (Vidanapathirana et al. 2021 ) is another 3D-PCPR method for large-scale environments. It encodes topological and temporal information related to components (obtained through segmentation) of the scene. To generate a fixed-length global descriptor, a second-order pooling along with a nonlinear transform is used to aggregate the extracted multi-level features.

Wietrzykowski and Skrzypczyński ( 2021 ) proposed an extension to the segment-based global localization method for LiDAR SLAM using descriptors learned from the visual context of the segments. This method represents one of the pioneering approaches that utilize intensity images to enhance the learned descriptors of 3D segments and investigate the learning of segment descriptions that are visible in images. The solution falls between learning to describe segments that occupy part of the image and finding the context in the description. This method is inherited from SegMap (Dube et al. 2020 ) but achieves better performance.

5.2 Semantic based

Semantic-based methods for 3D-PCPR utilize the semantic information of the segmented 3D point cloud for subsequent place recognition. For example, Zaganidis et al. ( 2019 ) presented a SLAM pipeline based on semantic-assisted NDT and PointNet++ (Qi et al. 2017 ) for place recognition and loop closure detection. Their method first segments the raw 3D point cloud and then utilizes geometric and semantic information of the environment and a single deep semantic segmentation network for registration and loop closure detection.

Semantic scan context (SSC) (Li et al. 2021 ) is a large-scale place recognition method that leverages high-level semantics features and corrects the translation between point clouds for improved accuracy. The algorithm framework mainly consists of two parts: a two-stage global semantic iterative closest point (ICP) (Besl et al. 1992 ) algorithm and a semantic scan context (SSC). Semantic segmentation is first performed on the raw 3D point cloud, and then the semantic information is used to preserve representative objects and project them into the x-y plane. The two-stage global semantic ICP is performed on the projected point cloud to obtain the 3D pose which is used to align the original point cloud and generate global descriptors. Finally, the similarity score is obtained by matching the global descriptors. Similar to most place recognition methods, the SSC method does not consider pitch and roll angles, leading to a possible failure in some extreme cases. Li et al. ( 2021 ) presented a global semantic descriptor for 3D-PCPR. To resolve ambiguous geometric features in scenes containing similar objects, their algorithm mainly relies on static semantic information such as trunks, poles, traffic signs, buildings, roads, and sidewalks. The descriptors not only record the geometrical structure of a 3D LiDAR scan but also encode the semantic distribution information.

 Yin et al. ( 2021 ) proposed PSE-Match, a viewpoint-free place recognition method with parallel semantic embedding. PSE-Match incorporates a divergence place learning network to capture different semantic attributes in parallel through the spherical harmonics. This way, the observed variance of semantic attributes is smaller than the original point cloud.

5.3 Graph based

Graph-based 3D-PCPR methods utilize the graph structure of the segmented point cloud for subsequent place recognition. Semantic graph-based place recognition (SGPR) (Kong et al. 2020 ) is a pioneering semantic graph representation and graph matching method for 3D-PCPR. Inspired by how humans perceive the environment by distinguishing scenes through semantic objects and their topological relations, SGPR applies semantic segmentation to raw 3D point clouds to obtain instances and then collects semantic and topological information to form the nodes of a semantic graph. Operating at the semantic level gives it superior robustness to environmental changes, and the method is rotation invariant since the network captures topological and semantic information from the point cloud. However, given its reliance on semantic segmentation, SGPR still suffers from bottlenecks, such as its dependence on the pre-defined semantic classes of the test dataset.
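
A minimal sketch of the graph-construction step such methods share is given below: each segmented instance becomes a node carrying its semantic label and centroid, and edges store pairwise centroid distances. The instance format and class ids are hypothetical, and the learned graph matching used by SGPR is not shown.

```python
import numpy as np

def build_semantic_graph(instances):
    """Build a fully connected semantic graph from segmented instances.
    Each instance is a dict with a 'label' (semantic class id) and 'points' (N x 3 array);
    nodes keep the class and centroid, edges store pairwise centroid distances."""
    nodes = [{"label": inst["label"], "centroid": inst["points"].mean(axis=0)} for inst in instances]
    n = len(nodes)
    edges = np.zeros((n, n), dtype=np.float32)
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(nodes[i]["centroid"] - nodes[j]["centroid"])
            edges[i, j] = edges[j, i] = d
    return nodes, edges

# Toy usage with hypothetical class ids (e.g. 0 = trunk, 1 = pole, 2 = building).
instances = [
    {"label": 0, "points": np.random.randn(100, 3) + np.array([0.0, 0.0, 0.0])},
    {"label": 1, "points": np.random.randn(100, 3) + np.array([5.0, 1.0, 0.0])},
    {"label": 2, "points": np.random.randn(100, 3) + np.array([2.0, 7.0, 0.0])},
]
nodes, edges = build_semantic_graph(instances)
print(len(nodes), edges.shape)  # 3 (3, 3)
```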

Zhu et al. ( 2020 ) proposed GOSMatch, a graph-of-semantics matching method for loop detection and 3D-PCPR. GOSMatch leverages the spatial relationship between semantics to generate descriptors and employs a coarse-to-fine strategy to efficiently search for loop closures. Once the loop closure is confirmed, GOSMatch can give an accurate 6-DOF initial pose estimate. This is the first method that leverages object-level semantic graphs to detect loop closures in 3D laser data. Instead of manually constructing the graph, Shi et al. ( 2021 ) employed a Graph Neural Network (GNN) (Waikhom and Patgiri 2022 ), an extension of graph-based data analysis methods, to facilitate keypoint matching between two point clouds; the matches are subsequently utilized for point cloud registration and place recognition (Shi et al. 2023 ). Utilizing a GNN-based approach allows for the extraction of improved point matches, leading to enhanced accuracy and robustness in pose estimation and place recognition outcomes. SA-LOAM (Li et al. 2021 ) is a semantic-aided LiDAR SLAM method with loop closure detection. It leverages a semantic-assisted ICP, including semantic matching, downsampling, and planar constraints, and integrates a semantic graph-based place recognition method in the loop closure detection module. SA-LOAM exploits semantic information to improve the accuracy of point cloud registration and designs a semantic-graph-based loop closure detection module to eliminate the accumulated error.

To leverage the spatial relations of internal structures for place recognition, Gong et al. ( 2021 ) presented a two-level framework based on a spatial relation graph. The framework first segments the 3D point cloud into multiple clusters, then extracts features from each cluster and the spatial relation descriptors between clusters to represent the 3D point cloud scene. Finally, a two-level matching model is proposed for accurately and efficiently matching the spatial relation graph. Dai et al. ( 2022 ) proposed a new place recognition method named SC-LPR, which uses spatiotemporal contextual information from LiDAR scans to increase the capacity of feature representation. A semantic graph is constructed to represent the topological geometric map, and an end-to-end network is designed to predict similarity.

5.4 Summary

To sum up, the main idea of the segment-based 3D-PCPR methods is to segment the raw 3D point cloud and then use the segmented point cloud features, semantic information, or graph structure information for place recognition. Post-segment features-based place recognition methods can reduce the number of calculations and extract more effective point cloud features, such as SegMatch (Dubé et al. 2017 ), Locus (Vidanapathirana et al. 2021 ), etc. Semantic-based place recognition methods introduce high-level semantic information after segmentation to improve the accuracy of place recognition, such as SSC (Li et al. 2021 ), PSE-Match (Yin et al. 2021 ), etc. Graph-based place recognition methods use the instance information formed after segmentation to construct the graph structure and recognize the scene by identifying object-object relationships, such as SGPR (Kong et al. 2020 ), GOSMatch (Zhu et al. 2020 ), etc. Segment-based 3D-PCPR methods have further promoted the development of place recognition algorithms. However, this category of methods relies heavily on the accuracy of point cloud segmentation and semantic recognition. Further research is required to overcome this bottleneck.

6 Multimodal based methods

In adverse conditions, place recognition with a single sensor or a single modality can become challenging. Therefore, multimodal-based methods are also popular; their main idea is to combine 3D point clouds with other data (or sensor) modalities, such as RGB images, range images, or BEV information. The combined multimodal data is then used as input for the subsequent place recognition processing. We divide multimodal-based methods into three categories: camera-LiDAR based, radar-LiDAR based, and multi-view fusion based methods.

6.1 Camera-LiDAR based

Camera-LiDAR based methods for 3D-PCPR mainly combine 3D point clouds with camera image information as input data for subsequent place recognition. LiDAR suffers from limitations such as motion distortion, degenerate environments, and limited range (since the laser may not reflect back with sufficient strength from far-off objects). On the other hand, cameras do not have these limitations but face problems associated with varying illumination, occlusions, and seasonal changes. Therefore, increasing attention has been paid to developing methods that fuse information from cameras and LiDAR sensors. For example, Żywanowski et al. ( 2020 ) compared camera-based, 3D LiDAR-based, and joint camera-LiDAR-based place recognition across different weather conditions and concluded that more research is needed on loop closures performed with multi-sensor fusion.

Xie et al. ( 2020 ) proposed a fusion algorithm that robustly captures the image and point cloud descriptors to solve the place recognition problem. In their method, point cloud descriptors are obtained with PointNetVLAD (Uy and Lee 2018 ) and image-based descriptors are extracted using ResNet50. A fully-connected layer is then employed to produce a compact global multimodal descriptor for each place. Their network finally learns an optimal metric to describe the similarity of the fused global descriptors for end-to-end place recognition.  Lu et al. ( 2020 ) proposed PIC-Net, a point cloud and image collaboration network for large-scale place recognition. PIC-Net uses spatial attention VLAD to fuse the discriminative points and pixels, and mines the complementary information between the image and point cloud. Comparative results show that PIC-Net outperforms the image-based and point cloud-based methods.  Pan et al. ( 2021 ) presented CORAL, a bi-modal descriptor place recognition method that can extract a compound global descriptor from camera and LiDAR data. It first builds an elevation image generated from the 3D point cloud as a structural representation. The elevation image is then enhanced with projected RGB image features and processed using a deep neural network. A NetVLAD layer is employed to aggregate the extracted local features.

MinkLoc++ (Komorowski et al. 2021 ) is a LiDAR and monocular image fusion method for place recognition. As shown in Fig. 9 , MinkLoc++ puts forward a discriminative multimodal descriptor based on a point cloud from a LiDAR and an image from an RGB camera. The method uses a late-fusion approach, where each modality is processed separately and the results are fused in the final part of the processing pipeline. MinkLoc++ also offers an effective solution to the dominating-modality problem, in which one modality adversely affects the discriminability of the multimodal descriptor.

figure 9

Illustration of MinkLoc++ architecture. 3D point cloud and RGB image are processed by separate networks to extract their respective descriptors which are then aggregated to produce a fused multimodal descriptor
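
The late-fusion step in the figure above can be schematized as below, where each modality's descriptor is L2-normalized, weighted, and concatenated; the weighting parameter alpha is an illustrative assumption, and MinkLoc++ itself investigates several fusion variants rather than this exact form.

```python
import numpy as np

def l2_normalize(v):
    return v / (np.linalg.norm(v) + 1e-12)

def fuse_descriptors(cloud_descriptor, image_descriptor, alpha=0.5):
    """Late fusion of unimodal descriptors: normalize each modality, weight them,
    and concatenate into a single multimodal descriptor. alpha balances the two
    modalities and is an illustrative parameter, not a value from MinkLoc++."""
    fused = np.concatenate([alpha * l2_normalize(cloud_descriptor),
                            (1.0 - alpha) * l2_normalize(image_descriptor)])
    return l2_normalize(fused)

# Toy usage with random stand-ins for the two unimodal network outputs.
fused = fuse_descriptors(np.random.randn(256), np.random.randn(256))
print(fused.shape)  # (512,)
```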

By leveraging the benefits of semantic understanding, Cramariuc et al. ( 2021 ) introduced SemSegMap, an extension of SegMap (Dube et al. 2020 ), which seamlessly combines color and semantic information from an RGB camera with LiDAR data in real-time. SemSegMap introduces novel processes for segmentation and descriptor extraction. The integration of cameras into a LiDAR-equipped platform is typically straightforward in real-world robotic applications. This method has demonstrated commendable performance and holds promising prospects for practical applications.

Many existing camera-LiDAR fusion methods simply combine the two sensors without considering their performance characteristics in different environments. To address this limitation, Lai et al. ( 2022 ) introduced AdaFusion, an adaptive weighting visual-LiDAR fusion method. AdaFusion goes beyond conventional approaches by dynamically learning the weights for both image and 3D point cloud features. By utilizing an attention branch network, AdaFusion adaptively assigns weights to the camera and LiDAR sensors based on the current environmental conditions which enhances the system’s recognition accuracy and robustness across various environments. AdaFusion represents a significant improvement in fusion techniques, enabling more effective utilization of camera and LiDAR data.

6.2 Radar-LiDAR based

Radar-LiDAR based 3D-PCPR methods combine 3D point clouds obtained from a radar and a LiDAR to perform subsequent place recognition. A notable multimodal range dataset for this line of research is MulRan (Kim et al. 2020 ), which contains radar and LiDAR data of urban environments. MulRan focuses on range sensor-based place recognition and provides 6D baseline trajectories of a vehicle as ground truth for place recognition. This dataset is expected to promote the development of radar-LiDAR based place recognition technology.

 Yin et al. ( 2021 ) introduced Radar-to-LiDAR, a heterogeneous measurement-based framework for long-term place recognition. This method retrieves query radar scans from an existing LiDAR map. Initially, the radar and LiDAR points are encoded using Scan Context (Kim and Kim 2018 ) and then a shared U-Net transforms the handcrafted features to learned representations. Applying this method on a current radar scan, a robot can recognize the revisited LiDAR submaps.

Traditional 3D-PCPR methods assume that reliable prior maps are available. Tang et al. ( 2021 ) proposed a different approach which assumes that an overhead view of the workspace is available instead. The overhead view is used as a map for radar and LiDAR based localization. Their method consists of three steps: rotation inference, image generation, and pose estimation. To compare overhead imagery with ground-range sensor data, they propose a learned metric localization method that handles modality differences. This metric is cost-effective to train and can learn in a self-supervised manner without the need for metric-accurate ground truth. Based on this idea, off-the-shelf, publicly available overhead imagery (such as Google satellite imagery) can become a ubiquitous, low-cost, and powerful localization tool when prior maps are not available or convenient.

6.3 Multi-view fusion based

Multi-view fusion based 3D-PCPR methods mainly combine 3D point clouds and multi-view fusion information for place recognition. A notable method in this category is FusionVLAD (Yin et al. 2021 ), which is a parallel fusion network structure that learns the point cloud representations from multi-view projections and embeds them into viewpoint-free low-dimensional place descriptors for efficient global recognition. This method consists of a spherical-view branch for orientation invariant feature extraction and a top-down view branch for translation insensitive feature extraction. Moreover, a parallel fusion module is designed to enhance the combination of region-wise feature connection between the two branches.

Many existing 3D-PCPR methods adopt a shared representation of the input point cloud, disregarding different views and potentially underutilizing the LiDAR sensor’s information. Ma et al. ( 2023 ) proposed CVTNet, a novel approach based on cross-view transformers, aimed at fusing range image views and Bird’s Eye View (BEV) representations derived from LiDAR data. CVTNet leverages intra-transformers to capture correlations within each view and inter-transformers to capture correlations between the two distinct views. By utilizing CVTNet, a yaw-invariant global descriptor is generated for each LiDAR point cloud in an end-to-end fashion. This descriptor enables the retrieval of previously visited places by matching descriptors between the current query scan and a pre-built database.

The task of localizing images on a large-scale point cloud map is still relatively unexplored. To address this challenge, Li et al. ( 2023 ) introduced I2P-Rec, a method designed for recognizing images on large-scale point cloud maps using BEV projections. The BEV image serves as an intermediate representation, which is fed into a CNN to extract global descriptors for matching.

6.4 Summary

In summary, multimodal-based 3D-PCPR methods aim to overcome the limitations of single-sensor or single-modality approaches by fusing point cloud information with other modalities. This fusion leverages the complementary nature of multimodal information, such as Camera-LiDAR, Radar-LiDAR, and Multi-view Fusion, among others. These methods, exemplified by MinkLoc++ (Komorowski et al. 2021 ), AdaFusion (Lai et al. 2022 ), Radar-to-LiDAR (Yin et al. 2021 ), CVTNet (Ma et al. 2023 ), and others, strive to achieve robust and adaptable place recognition in complex and dynamic environments. Whereas multimodal 3D-PCPR methods enhance the robustness of place recognition, they also require careful synchronization and calibration of sensors, which can be a challenging task. Continued development in the field of multimodal-based 3D-PCPR holds promise for further advancements and improvements in place recognition capabilities.

7 Datasets and performance

Given the emergence of numerous advanced algorithms for 3D-PCPR, conducting a comprehensive and unbiased performance evaluation and comparison of existing methods becomes crucial. In this section, we present a selection of prominent 3D-PCPR datasets and evaluation metrics commonly utilized for assessing the performance of these methods.  Additionally, we present a performance comparison of mainstream algorithms in the field of 3D-PCPR to aid readers in gaining a clearer understanding of the strengths and limitations associated with different existing approaches.

7.1 Datasets

Public datasets play a pivotal role in advancing 3D-PCPR research. Numerous 3D point cloud datasets have been utilized to evaluate the performance of place recognition algorithms, serving as benchmark baselines and providing valuable ground truth information. These datasets enable researchers worldwide to conduct their investigations without being constrained by system or data limitations. In the following, we introduce a selection of popular and representative public datasets within the field, which are listed in Table 2 . These datasets serve as valuable resources for evaluating and comparing different approaches, fostering progress and innovation in the domain of 3D-PCPR.

Ford Campus (Pandey et al. 2011 ): Ford Campus Vision and LiDAR dataset was collected by an autonomous ground vehicle testbed. The dataset consists of time-registered data from sensors mounted on the vehicle, collected while driving around the Ford Research campus and downtown Dearborn, Michigan during November-December 2009. The vehicle paths in the Ford campus dataset contain several large and small-scale loop closures, to assist in developing and testing place recognition algorithms. The dataset contains the vehicle’s ground truth pose in the local coordinate system, including the vehicle’s 3D rotation angle (roll, pitch, and yaw), 3D acceleration, 3D velocity, and timestamp.

KITTI (Geiger et al. 2013 ): KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) is one of the most popular datasets in mobile robotics, autonomous driving, and computer vision research. It consists of hours of traffic scenarios recorded with a variety of sensor modalities, including high-resolution RGB, grayscale stereo cameras, and a 3D laser scanner. 3D-PCPR methods mainly use the 3D LiDAR data. The dataset provides 11 sequences with ground truth trajectories for training (00–10) and 11 sequences without real trajectories for evaluation (11–21).

NCLT (Carlevaris-Bianco et al. 2016 ): NCLT is a large-scale, long-term autonomy dataset, including 34.9 h of logs covering 147.4 km of robot trajectory collected on the University of Michigan’s North Campus. It consists of omnidirectional imagery, 3D LiDAR, planar LiDAR, GPS, and ground-truth pose information. It has 27 sessions, each containing both indoor and outdoor environments, spaced approximately biweekly over the course of 15 months. Although the same area is repeatedly explored, the path for each session is varied, as is the time of the day for each session-from early morning to just after dusk. NCLT can facilitate long-term place recognition research in challenging environments such as moving obstacles, changing lighting, varying viewpoints, seasonal and weather changes, and long-term structural changes caused by construction projects. The dataset uses LiDAR scan matching and high-precision RTK GPS to provide ground truth robot pose.

Oxford RobotCar (Maddern et al. 2017 ): This dataset was collected by repeatedly traversing an approximately 10 km route in central Oxford, UK for over one year. It contains 100+ traversals of a consistent route, capturing the large variation in appearance and structure of a dynamic city environment over long periods of time. The dataset contains images, LiDAR, GPS, and INS ground truth data, captured in many different combinations of weather, traffic, and pedestrians, along with longer-term changes such as construction and roadworks.

USRABD (Uy and Lee 2018 ): USRABD dataset is a collection of three datasets proposed by the authors of PointNetVLAD (Uy and Lee 2018 ) for 3D-PCPR. The three datasets include a university sector (U.S.), a residential area (R.A.), and a business district (B.D.) dataset. USRABD dataset was collected using a LiDAR sensor mounted on a car. The data collection vehicle traveled through three areas of U.S., R.A., and B.D. covering a distance of 10, 8, and 5 km repeatedly at different time periods. This dataset has been used as a mainstream benchmark often together with the Oxford RobotCar dataset (Maddern et al. 2017 ). Ground truth GPS coordinates for the three datasets can be found in the corresponding csv files.

Oxford Radar RobotCar (Barnes et al. 2020 ): This dataset is a radar extension to The Oxford RobotCar dataset. It mainly utilizes a Navtech CTS350-X Millimetre-Wave FMCW radar and Dual Velodyne HDL-32E LiDARs for 280 km of driving around Oxford, UK. The dataset was gathered in January 2019 over 32 traversals of a central Oxford route and includes a variety of weather, traffic, and lighting conditions. In addition to the raw sensor recordings from all sensors, this dataset provides an updated set of calibrations, ground truth trajectories for the radar sensor as well as MATLAB and Python development tools for leveraging the data.

MulRan (Kim et al. 2020 ): MulRan is a multimodal range dataset for radar and LiDAR specifically targeting the urban environment. It focuses on the 3D-PCPR problem and provides 6D baseline trajectories of a vehicle for place recognition ground truth. MulRan captures both temporal and structural diversities for 3D place recognition research.

Haomo (Ma et al. 2022 ): This dataset was collected in urban environments of Beijing by a mobile robot built by HAOMO.AI Technology company equipped with a HESAI PandarXT 32-beam LiDAR sensor, a SENSING-SG2 wide-angle camera, and an ASENSING-INS570D RTK GNSS. There are currently five sequences: seq 1–1 and 1–2 were collected from the same route on 8th December 2021 with opposite driving directions. An additional seq 1–3 from the same route is utilized as the online query with respect to both 1–1 and 1–2 respectively to evaluate place recognition performance of forward and reverse driving. Seq 2–1 and 2–2 are collected along a much longer route from the same direction, but on different dates i.e. 28th December 2021 and 13th January 2022, respectively. The former is used as a database while the latter one is used as query. The two sequences are for evaluating the performance for large-scale long-term place recognition.

HPointLoc (Yudin et al. 2023 ): HPointLoc is a point cloud-based indoor place recognition dataset with synthetic RGB-D images. It is based on the popular Habitat (Savva et al. 2019 ) simulator from 49 photorealistic indoor scenes from the Matterport3D (Chang et al. 2017 ) dataset and contains 76,000 frames. The HPointLoc dataset is split into two parts: the validation HPointLoc-Val, which contains only one scene, and the complete HPointLoc-All dataset, containing all 49 scenes, including HPointLoc-Val. Although the dataset does not have ground truth poses, it provides an estimate of an average registration error between corresponding surface points of 1 cm or less.

Perth-WA (Ibrahim et al. 2023 ): The Perth-WA dataset was first presented in (Ibrahim et al. 2023 ) and contains 6DoF annotations for localization in a 3D point cloud map of the city of Perth in Western Australia. The 3D map is constructed using a 64-channel LiDAR and covers a 4 km \(^{2}\) region of the Perth Central Business District. The dataset scenes contain commercial structures, residential areas, food streets, complex routes, and hospital buildings, etc. Perth-WA was collected in three different two-hour sessions under day/night conditions with sunny and cloudy weather. Notably, its labels come directly from the LiDAR frames themselves: the dataset leverages the map creation process itself to extract ground truth poses and contains loop data with LiDAR frames and their ground truth pose labels in text files.

Wild-Places (Knights et al. 2023 ): Wild-Places is a challenging large-scale dataset for 3D-PCPR in unstructured, natural environments. It comprises 8 LiDAR sequences collected with a handheld sensor payload over the course of 14 months, totalling 63K undistorted LiDAR submaps with accurate 6DoF ground truth. Wild-Places contains multiple revisits and uses the Wildcat (Ramezani et al. 2022 ) SLAM system to generate accurate intra-sequence ground truth.
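Because several of the datasets above supply only positioning coordinates (e.g. GPS/UTM in csv files) rather than explicit match labels, a minimal sketch of how retrieval ground truth is typically derived from them may be useful. This is not the code of any of the cited datasets; the 25 m "correct match" radius is a common choice in PointNetVLAD-style evaluations and should be checked against the protocol being reproduced.

```python
# Sketch: a database entry counts as a true match for a query if the two
# positions lie within a fixed radius (here 25 m, an assumed threshold).
import numpy as np

def build_ground_truth(db_xy, query_xy, radius=25.0):
    """db_xy, query_xy: (N, 2) and (M, 2) planar positions in metres.
    Returns, for every query, the indices of database entries within `radius`."""
    gt = []
    for q in query_xy:
        dists = np.linalg.norm(db_xy - q, axis=1)
        gt.append(np.where(dists <= radius)[0])
    return gt

# Example with synthetic coordinates standing in for the csv contents
db_xy = np.random.rand(1000, 2) * 500.0
query_xy = np.random.rand(100, 2) * 500.0
ground_truth = build_ground_truth(db_xy, query_xy)
```

The resulting per-query index lists are what the recall-based metrics in the next subsection operate on.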

7.2 Evaluation metrics

Numerous evaluation metrics have been proposed to test the effectiveness of place recognition methods (Li et al. 2021 ; Cui et al. 2023 ; Ferrarini et al. 2020 ). Here, we introduce some of the commonly used evaluation metrics.

Precision ( P ): Precision denotes the ratio between the correct matches and the total number of predicted positive matches. It is defined as:

\(P = \frac{TP}{TP + FP}\)

where TP is the number of True Positives (i.e. correct matches), FP is the number of False Positives (i.e. incorrect matches), and FN is the number of False Negatives (i.e. matches erroneously excluded from the query results).

Recall ( R ): Recall is the proportion of real positive cases that are correctly identified as positive matches. Formally:

\(R = \frac{TP}{TP + FN}\)

Recall@ N is also commonly used; it measures the proportion of queries for which a relevant item is retrieved within the top N results. In particular, Average Recall@1 (AR@1) measures the proportion of queries whose first retrieved item is a correct match, and Average Recall@1% (AR@1%) considers the top- k matches, where k is 1% of the database size. Higher values of Recall@ N , AR@1, and AR@1% indicate better performance.
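The following is a small, self-contained sketch of how AR@1 and AR@1% can be computed from global descriptors; the function names and the synthetic data are ours rather than taken from any particular benchmark implementation.

```python
# Recall@N: fraction of queries whose top-N nearest database descriptors
# contain at least one true match (ground_truth[i] lists true indices for query i).
import numpy as np

def recall_at_n(db_desc, query_desc, ground_truth, n_values=(1,)):
    hits = {n: 0 for n in n_values}
    evaluated = 0
    for i, q in enumerate(query_desc):
        if len(ground_truth[i]) == 0:          # queries with no revisit are skipped
            continue
        evaluated += 1
        dists = np.linalg.norm(db_desc - q, axis=1)
        ranking = np.argsort(dists)            # closest database entries first
        for n in n_values:
            if np.intersect1d(ranking[:n], ground_truth[i]).size > 0:
                hits[n] += 1
    return {n: hits[n] / max(evaluated, 1) for n in n_values}

rng = np.random.default_rng(0)
db_desc = rng.random((1000, 256))
query_desc = rng.random((100, 256))
ground_truth = [rng.choice(1000, size=5, replace=False) for _ in range(100)]
top1pct = max(int(round(len(db_desc) * 0.01)), 1)    # k = 1% of the database size
print(recall_at_n(db_desc, query_desc, ground_truth, n_values=(1, top1pct)))
```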

PR -Curve: The PR -Curve is a graph with recall on the x -axis and precision on the y -axis. It shows the trade-off between precision and recall as the match-acceptance threshold is varied.

\(F_{1}\) score: The \(F_{1}\) score combines precision and recall into a single metric by taking their harmonic mean. It treats P and R as equally important and measures the overall performance of the tested system. The \(F_{1}\) score is defined as:

\(F_{1} = 2 \times \frac{P \times R}{P + R}\)

where P and R represent the Precision and Recall values, respectively.

Extended Precision ( EP ): Extended Precision (Ferrarini et al. 2020 ) provides more comprehensive insight into place recognition performance and is designed specifically for evaluating place recognition algorithms. It is defined as:

\(EP = \frac{1}{2}\left(P_{R0} + R_{P100}\right)\)

where \(P_{R0}\) is the precision at the minimum recall value, and \(R_{P100}\) is the maximum recall at 100% precision, i.e. the highest recall that can be reached without any False Positives ( FP ).
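As a worked illustration, the sketch below reads the maximum \(F_1\) score and EP off a precision-recall sweep over match scores. The scores and labels are synthetic; in practice they come from candidate loop-closure pairs and their ground-truth validity.

```python
import numpy as np

def pr_sweep(scores, labels):
    """Precision/recall at every score threshold, highest scores accepted first."""
    order = np.argsort(-scores)
    labels = labels[order].astype(float)
    tp = np.cumsum(labels)
    fp = np.cumsum(1.0 - labels)
    precision = tp / (tp + fp)
    recall = tp / max(labels.sum(), 1.0)
    return precision, recall

def f1_max_and_ep(scores, labels):
    p, r = pr_sweep(scores, labels)
    f1 = 2.0 * p * r / np.maximum(p + r, 1e-12)
    p_r0 = p[0]                                    # precision at minimum recall
    perfect = np.where(p == 1.0)[0]
    r_p100 = r[perfect].max() if perfect.size else 0.0  # max recall at 100% precision
    return f1.max(), 0.5 * (p_r0 + r_p100)

rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=500)
scores = labels * 0.5 + rng.random(500) * 0.7      # imperfectly informative scores
print(f1_max_and_ep(scores, labels))
```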

7.3 Performance comparison

We present the comparative performance of some representative 3D-PCPR algorithms on typical public datasets, including the Oxford RobotCar dataset (Maddern et al. 2017 ), the USRABD dataset (Uy and Lee 2018 ) with its University Sector (U.S.), Residential Area (R.A.), and Business District (B.D.) subsets, and the KITTI dataset (Geiger et al. 2013 ). The results are collected from the original papers (Uy and Lee 2018 ; Komorowski 2021 , 2022 ; Kong et al. 2020 ; Li et al. 2021 ; Cui et al. 2023 ; Liu et al. 2019 ; Luo et al. 2023 ; Hou et al. 2022 ; Xu et al. 2021 ; Lai et al. 2022 ).

In Table  3 , we present a performance comparison of some state-of-the-art 3D-PCPR methods (including PointNetVLAD Footnote 1 (Uy and Lee 2018 ), PCAN Footnote 2 (Zhang and Xiao 2019 ), LPD-Net Footnote 3 (Liu et al. 2019 ), EPC-Net Footnote 4 (Hui et al. 2022 ), SOE-Net Footnote 5 (Xia et al. 2021 ), HiTPR (Hou et al. 2022 ), MinkLoc3D Footnote 6 (Komorowski 2021 ), NDT-Transformer Footnote 7 (Zhou et al. 2021 ), PPT-Net Footnote 8 (Hui et al. 2021 ), SVT-Net Footnote 9 (Fan et al. 2022 ), TransLoc3D Footnote 10 (Xu et al. 2021 ), MinkLoc3Dv2 Footnote 11 (Komorowski 2022 ), OREOS (Schaupp et al. 2019 ), Scan Context Footnote 12 (Kim and Kim 2018 ), DiSCO Footnote 13 (Xu et al. 2021 ), BEVPlace Footnote 14 (Luo et al. 2023 ), PIC-Net (Lu et al. 2020 ), MinkLoc++ Footnote 15 (Komorowski et al. 2021 ), CORAL (Pan et al. 2021 ), and AdaFusion Footnote 16 (Lai et al. 2022 )), grouped into feature-based, projection-based, and multimodal-based methods. The evaluation is based on the AR@1 and AR@1% metrics and is carried out mainly on the Oxford RobotCar dataset (Maddern et al. 2017 ) and the USRABD dataset (Uy and Lee 2018 ). These results provide valuable insights into the performance of the examined methods under comparable conditions and facilitate comparison and analysis within the field of 3D-PCPR.

Table 3 provides a comprehensive overview of the advancements in the field of 3D-PCPR, highlighting the emergence of numerous state-of-the-art algorithms in recent years. Starting from the pioneering PointNetVLAD (Uy and Lee 2018 ), significant improvements have been made, as seen in LPD-Net (Liu et al. 2019 ), which exhibits enhanced performance. More recently, the MinkLoc3D series (Komorowski 2021 , 2022 ), BEVPlace (Luo et al. 2023 ), and AdaFusion (Lai et al. 2022 ) have further advanced the state of the art. These algorithms demonstrate impressive performance on standard benchmark datasets, and the field of 3D-PCPR continues to progress and evolve.

According to the categories of feature-based, projection-based, and segment-based methods, Table  4 gives a performance comparison of some state-of-the-art methods (PointNetVLAD Footnote 17 (Uy and Lee 2018 ), M2DP Footnote 18 (He et al. 2016 ), ISC Footnote 19 (Wang et al. 2020 ), LiDAR Iris Footnote 20 (Wang et al. 2020 ), Scan Context Footnote 21 (Kim and Kim 2018 ), OverlapNet Footnote 22 (Chen et al. 2021 ), BoW3D Footnote 23 (Cui et al. 2023 ), SGPR Footnote 24 (Kong et al. 2020 ), and SSC-RN Footnote 25 (Li et al. 2021 )) in terms of the maximum \(F_1\) score and Extended Precision ( \(F_1\) / EP ), along with their capability to correct the full 6-DoF loop pose, on the KITTI dataset (Geiger et al. 2013 ). The evaluation focuses on the sequences with loop closures (00, 02, 05, 06, 07, and 08), which are selected to provide a convenient and standardized evaluation protocol. The results in Table  4 offer valuable insights into the performance and effectiveness of the analyzed methods in the context of loop-pose correction on the KITTI dataset.

As can be seen from Table  4 , based on the KITTI dataset and the \(F_1\) max score and Extended Precision ( EP ) metrics, many advanced 3D-PCPR algorithms have emerged (He et al. 2016 ; Uy and Lee 2018 ; Wang et al. 2020 , 2020 ; Kong et al. 2020 ; Kim and Kim 2018 ; Chen et al. 2021 ; Li et al. 2021 ; Cui et al. 2023 ). The recently proposed BoW3D (Cui et al. 2023 ) not only achieves excellent performance (mean \(F_1\) / EP : 0.885/0.906) but can also be used to correct the full 6-DoF loop pose.

8 Applications and future trends

This section delves deeper into the downstream applications of 3D-PCPR technology and highlights anticipated trends in its future development. By exploring these applications and trends, researchers can gain a holistic overview of the potential uses and likely advancements of 3D-PCPR methods.

8.1 Applications

3D-PCPR is a key task in the navigation and localization of mobile robots, especially in large-scale, long-term, and complex scenes with closed loops. Its applications range from land to air and even to interstellar exploration (Yin et al. 2022 ).

Firstly, the major land-based applications of 3D-PCPR at scale are robotics and autonomous driving. Autonomous driving is crucial for achieving intelligent transportation in the future. Currently, most research and development vehicles for autonomous driving are equipped with high-precision LiDAR sensors, enabling real-time acquisition of 3D point cloud data of the surrounding environment. We can therefore expect 3D-PCPR to play a key role, particularly for loop closure, in Simultaneous Localization and Mapping (SLAM) (Chen et al. 2020a ; Kim et al. 2022 ) for autonomous vehicles; a minimal sketch of this coupling is given below. Additionally, in the domain of robotics, numerous smart application scenarios can benefit from 3D-PCPR, such as smart logistics distribution (Wang et al. 2019 ) and smart indoor navigation (Xiang et al. 2018 ), among others. These applications demonstrate the versatility and potential impact of 3D-PCPR in advancing a wide range of robotics-related endeavors.
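The following is a hypothetical sketch of the loop-closure detection step that a 3D-PCPR module contributes to a SLAM system: the descriptor of a new keyframe is compared against all earlier keyframe descriptors, and sufficiently similar, temporally distant ones are reported as loop-closure candidates. The thresholds and the descriptor choice are placeholders, not values from any cited method.

```python
import numpy as np

def detect_loop_closures(keyframe_descriptors, new_descriptor, new_index,
                         similarity_threshold=0.9, min_gap=50):
    """Return indices of earlier keyframes that likely show the same place."""
    closures = []
    for idx, desc in enumerate(keyframe_descriptors):
        if new_index - idx < min_gap:          # skip temporally adjacent frames
            continue
        sim = float(np.dot(desc, new_descriptor) /
                    (np.linalg.norm(desc) * np.linalg.norm(new_descriptor) + 1e-12))
        if sim >= similarity_threshold:
            closures.append(idx)
    return closures
```

Each reported candidate would then typically be verified geometrically and added as an extra constraint in the pose graph.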

In aerial settings, the widespread adoption and utilization of unmanned aerial vehicles (UAVs) equipped with high-precision LiDAR sensors have paved the way for the extensive application of place recognition technology. This technology holds great potential in various fields such as smart agriculture, aerial photography localization, rapid delivery, and even military reconnaissance (Maffra et al. 2018 ; Patel et al. 2020 ; Hongming et al. 2022 ; Aslan et al. 2022 ). With the aid of UAVs, place recognition technology can significantly contribute to enhancing aerial navigation and mapping capabilities, enabling efficient and accurate operations in these domains.

Finally, in the realm of interstellar exploration, where traditional positioning signals such as GPS or Beidou are unavailable in outer space or on other planets, autonomous localization and navigation based on 3D point clouds become critical. This technology finds practical application in well-known missions such as NASA's robotic rover Perseverance operating on Mars and CNSA's teleoperated rover Yutu-2 on the Moon (Witze 2020 ; Ding et al. 2022 ). The use of 3D-PCPR in these missions demonstrates its crucial role in enabling precise positioning, navigation, and mapping in extraterrestrial environments. As humankind ventures further into space, the reliance on 3D-PCPR for autonomous localization and navigation will continue to grow, making it an essential component of such missions and of the exploration of other worlds.

8.2 Research trends

3D-PCPR technology has seen widespread application and rapid development. Single-frame 3D-PCPR methods, exemplified by high-accuracy algorithms such as MinkLoc3Dv2 (Komorowski 2022 ), have achieved remarkable progress. However, despite these advancements, numerous challenges and open issues remain. Based on our comprehensive analysis of over 180 research works, we now outline the future research trends of 3D-PCPR. This concise overview is intended to inspire and guide future researchers in their exploration of the domain. By tackling these challenges, the accuracy, robustness, and efficiency of 3D-PCPR methods can be further enhanced, and the identified trends are poised to shape the continued evolution of 3D-PCPR as an essential technology for robotics, autonomous vehicles, aerial mapping, interstellar exploration, and beyond.

8.2.1 Based on sequence frames

Sequence-based 3D-PCPR methods leverage serialized multi-frame point clouds as input, enabling spatio-temporal feature fusion and descriptor generation. These methods surpass single-frame approaches by incorporating a broader range of information, mitigating the risk of overemphasizing intra-frame features. By employing inter-frame continuous consistency detection, sequence-based methods can capture more comprehensive and discriminative features, resulting in superior recognition performance over extended time periods. Prominent examples of sequence-based approaches, such as SeqLPD (Liu et al. 2019 ), SeqsphereVLAD (Yin et al. 2020 ), FSEPR (Yin et al. 2021 ), SeqOT (Ma et al. 2023 ), among others, have showcased inspiring and representative work in this direction. Figure  10 illustrates the structural diagram of SeqOT (Ma et al. 2023 ), a sequence-based method proposed by our team members. We anticipate that future research will yield further advancements in sequence-based 3D-PCPR, facilitating even more robust and accurate place recognition capabilities.

Fig. 10: Illustration of our SeqOT (Ma et al. 2023 ), which utilizes sequential range images projected from LiDAR sensors as input, simultaneously extracts spatial and temporal features, and generates a final sequence-augmented global descriptor
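As a deliberately simplified, hypothetical sketch of the sequence idea, the snippet below fuses per-frame global descriptors over a sliding window into one sequence-level descriptor. Methods such as SeqOT learn this fusion with transformers; plain average pooling is used here only to make the input/output shapes concrete.

```python
import numpy as np

def sequence_descriptors(frame_desc, window=5):
    """frame_desc: (T, D) per-frame descriptors -> (T - window + 1, D) fused ones."""
    T, _ = frame_desc.shape
    fused = np.stack([frame_desc[t:t + window].mean(axis=0)
                      for t in range(T - window + 1)])
    # L2-normalise so retrieval can use simple nearest-neighbour search
    return fused / np.linalg.norm(fused, axis=1, keepdims=True)

seq_desc = sequence_descriptors(np.random.rand(100, 256), window=5)
```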

8.2.2 Based on long-term learning

To address the rapid decline in robustness exhibited by many classic place recognition algorithms when there is a significant time gap between the given map data and the input query data, researchers have recently proposed long-term learning-based strategies for 3D-PCPR. Long-term 3D-PCPR aims to dynamically update the model as new data streams in, enabling continuous learning of the evolving environment. This approach also tackles the challenge of catastrophic forgetting, which involves preserving the memory of the original environment while incorporating new information.

Minimizing catastrophic forgetting is a key challenge in long-term learning place recognition. Several noteworthy approaches have emerged in this area, including 1-Day Learning 1-Year Localization (Kim et al. 2019 ), Radar-to-LiDAR (Yin et al. 2021 ), SVLPR (Cao et al. 2020 ), InCloud (Knights et al. 2022 ), CCL (Cui and Chen 2023 ), among others. These methods have undertaken meaningful and valuable explorations, yielding promising results in the context of long-term learning for 3D-PCPR.
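The sketch below illustrates, in a purely hypothetical form, the replay idea that underlies several long-term learning strategies: a small memory of past training samples is mixed into each new batch so the descriptor network keeps recognising old sessions while adapting to new data. This mirrors the spirit of methods such as InCloud or CCL but is not their actual training code.

```python
import random

class ReplayMemory:
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.samples = []
        self.seen = 0

    def add(self, sample):
        # Reservoir sampling keeps a uniform subset of everything seen so far
        self.seen += 1
        if len(self.samples) < self.capacity:
            self.samples.append(sample)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.samples[j] = sample

    def mixed_batch(self, new_batch, replay_ratio=0.5):
        # Mix stored old samples into the current batch to limit forgetting
        k = int(len(new_batch) * replay_ratio)
        replay = random.sample(self.samples, min(k, len(self.samples)))
        return list(new_batch) + replay
```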

8.2.3 Cross-modal localization

As the application scenarios for place recognition continue to expand, there are situations where the sensors used to collect the offline map data and the query data differ, or where multiple sensors are employed for data collection. In such cases, conventional 3D-PCPR methods that rely on single-modal information suffer severe performance degradation. Hence, cross-modal place recognition, which aims to integrate data from different modalities for improved performance, has emerged as a promising research direction. Several notable approaches have paved the way for future cross-modal 3D-PCPR, including PIC-Net (Lu et al. 2020 ), MinkLoc++ (Komorowski et al. 2021 ), Get to the point (Tang et al. 2021 ), AdaFusion (Lai et al. 2022 ), Text2Pos (Kolmet et al. 2022 ), (LC) \(^{2}\) (Lee et al. 2023 ), I2P-Rec (Li et al. 2023 ), and UnLoc (Ibrahim et al. 2023 ), among others. However, several challenges remain in achieving more efficient cross-modal data synchronization, calibration, and fusion, and in integrating high-dimensional semantic information with 3D point cloud data. Cross-modal place recognition still has a long way to go, and researchers need to overcome these challenges to unlock its full potential.

8.2.4 Global metric localization

Conventionally, place recognition focused only on identifying the current location within a given map. However, to enable more advanced navigation and localization tasks, there is a growing demand for methods that can estimate a precise 6-DoF (degrees of freedom) pose while recognizing the place, ultimately providing global pose localization. Some researchers have recognized this need and made significant contributions in this direction. Methods such as DH3D (Du et al. 2020 ), LCDNet (Cattaneo et al. 2022 ), BoW3D (Cui et al. 2023 ), Slice Transformer (Ibrahim et al. 2023 ), and others have emerged, offering the capability to estimate 6-DoF poses alongside place recognition. The development of these methods has accelerated the progress of 3D-PCPR towards global pose localization, and it is foreseeable that this will become a prominent research trend as the demand for precise pose estimation and global localization continues to grow in robotics and related fields.
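One common two-stage route to such global metric localization is to retrieve the best-matching map submap with a place-recognition descriptor and then refine the full 6-DoF pose by registering the query scan to it. The hedged sketch below illustrates this pattern; it is not the pipeline of any specific method above. Open3D is assumed to be available, and `retrieve_fn` stands in for an arbitrary retrieval network.

```python
import numpy as np
import open3d as o3d

def localize(query_points, submaps, retrieve_fn, max_corr_dist=1.0):
    """query_points: (N, 3) array; submaps: list of (M, 3) arrays.
    Returns the retrieved submap index and a 4x4 pose of the query in its frame."""
    idx = retrieve_fn(query_points)                      # place recognition step
    src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(query_points))
    tgt = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(submaps[idx]))
    result = o3d.pipelines.registration.registration_icp(
        src, tgt, max_corr_dist, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return idx, result.transformation                    # metric 6-DoF refinement
```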

9 Conclusion

This article presents a comprehensive survey of 3D point cloud-based place recognition (3D-PCPR) methods, aiming to provide readers with a thorough understanding of the field. The survey categorizes 3D-PCPR methods into four main categories based on the source of extracted features: feature-based, projection-based, segment-based, and multimodal-based methods, and discusses each category in detail. To enhance readers' understanding, it also introduces the common public datasets and evaluation metrics used in the field, and compares the performance of mainstream methods to highlight their relative strengths. The article further explores the technical applications and future development directions of 3D-PCPR. Importantly, this survey represents the first comprehensive overview of 3D-PCPR methods that utilize 3D point clouds from different sources. It is intended to give future researchers a comprehensive view of the field, enabling them to contribute to its further advancement.

https://github.com/mikacuy/pointnetvlad.git .

https://github.com/XLechter/PCAN.git .

https://github.com/Suoivy/LPD-net.git .

https://github.com/fpthink/EPC-Net.git .

https://github.com/Yan-Xia/SOE-Net.git .

https://github.com/jac99/MinkLoc3D.git .

https://github.com/dachengxiaocheng/NDT-Transformer.git .

https://github.com/fpthink/PPT-Net.git .

https://github.com/ZhenboSong/SVTNet.git .

https://github.com/slothfulxtx/TransLoc3D.git .

https://github.com/jac99/MinkLoc3Dv2.git .

https://github.com/irapkaist/scancontext.git .

https://github.com/MaverickPeter/DiSCO-pytorch.git .

https://github.com/zjuluolun/BEVPlace.git .

https://github.com/jac99/MinkLocMultimodal.git .

https://github.com/MetaSLAM/AdaFusion.git .

https://github.com/LiHeUA/M2DP.git .

https://github.com/wh200720041/iscloam.git .

https://github.com/BigMoWangying/LiDAR-Iris.git .

https://github.com/PRBonn/OverlapNet.git .

https://github.com/YungeCui/BoW3D.git .

https://github.com/kxhit/SG_PR.git .

https://github.com/lilin-hitcrt/SSC.git .

Angeli A, Filliat D, Doncieux S, Meyer J-A (2008) Fast and incremental method for loop-closure detection using bags of visual words. IEEE Trans Robot 24:1027–1037

Arandjelovic R, Gronat P, Torii A, Pajdla T, Sivic J (2016) NetVLAD: CNN architecture for weakly supervised place recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)

Aslan MF, Durdu A, Sabanci K, Ropelewska E, Gültekin SS (2022) A comprehensive survey of the recent studies with UAV for precision agriculture in open fields and greenhouses. Appl Sci 12:1047

Barnes D, Gadd M, Murcutt P, Newman P, Posner I (2020) The oxford radar Robotcar dataset: a radar extension to the oxford Robotcar dataset. In: IEEE International conference on robotics and automation (ICRA)

Barros T, Pereira R, Garrote L, Premebida C, Nunes UJ (2021) Place recognition survey: an update on deep learning approaches. arXiv:2106.10458

Bay H, Tuytelaars T, Van Gool L (2006) Surf: speeded up robust features. Lect Notes Comput Sci 3951:404–417

Beltran D, Basañez L (2014) A comparison between active and passive 3D vision sensors: Bumblebeexb3 and Microsoft Kinect. In: First Iberian robotics conference: advances in robotics

Besl PJ, McKay ND (1992) Method for registration of 3-d shapes. In: Sensor fusion IV: control paradigms and data structures, vol 1611, pp 586–606

Biber P, Straßer W (2003) The normal distributions transform: a new approach to laser scan matching. In: IEEE/RSJ international conference on intelligent robots and systems (IROS) (Cat. No. 03CH37453)

Bosse M, Zlot R (2013) Place recognition using keypoint voting in large 3d lidar datasets. In: IEEE international conference on robotics and automation (ICRA)

Breuer T, Bodensteiner C, Arens M (2014) Low-cost commodity depth sensor comparison and accuracy analysis. In: Electro-optical remote sensing, photonic technologies, and applications VIII; and military applications in hyperspectral imaging and high spatial resolution sensing II, pp 77–86

Cai X, Yin W (2021) Weighted scan context: global descriptor with sparse height feature for loop closure detection. In: International conference on computer, control and robotics (ICCCR)

Calonder M, Lepetit V, Strecha C, Fua P (2010) Brief: binary robust independent elementary features. In: European conference on computer vision (ECCV)

Cao F, Zhuang Y, Zhang H, Wang W (2018) Robust place recognition and loop closing in laser-based SLAM for UGVs in urban environments. IEEE Sens J 18:4242–4252

Cao F, Yan F, Wang S, Zhuang Y, Wang W (2020) Season-invariant and viewpoint-tolerant lidar place recognition in GPS-denied environments. IEEE Trans Ind Electron 68:563–574

Carlevaris-Bianco N, Ushani AK, Eustice RM (2016) University of Michigan north campus long-term vision and lidar dataset. Int J Robot Res 35:1023–1035

Cattaneo D, Vaghi M, Valada A (2022) Lcdnet: deep loop closure detection and point cloud registration for lidar slam. IEEE Trans Robot 38:2074–2093

Chang MY, Yeon S, Ryu S, Lee D (2020) Spoxelnet: spherical voxel-based deep place recognition for 3d point clouds of crowded indoor spaces. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)

Chang A, Dai A, Funkhouser T, Halber M, Niebner M, Savva M, Song S, Zeng A, Zhang Y (2017) Matterport3d: Learning from RGB-D data in indoor environments. In: International conference on 3D vision (3DV)

Chen X, Läbe T, Milioto A, Röhling T, Behley J, Stachniss C (2021) OverlapNet: a siamese network for computing LiDAR scan similarity with applications to loop closing and localization. Auton Robots 46:61–81

Chen X, Läbe T, Milioto A, Röhling T, Vysotska O, Haag A, Behley J, Stachniss C (2020) Overlapnet: loop closing for lidar-based slam. In: Proceedings of robotics: science and systems (RSS), pp 1–10

Chen X, Läbe T, Nardi L, Behley J, Stachniss C (2020) Learning an overlap-based observation model for 3D LiDAR localization. In: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (IROS)

Cohen T, Welling M (2016) Group equivariant convolutional networks. In: International conference on machine learning (ICML)

Collier J, Se S, Kotamraju V, Jasiobedzki P (2012) Real-time lidar-based place recognition using distinctive shape descriptors. In: Unmanned systems technology XIV, vol 8387, pp 271–281

Cop KP, Borges PV, Dubé R (2018) Delight: an efficient descriptor for global localisation using lidar intensities. In: IEEE international conference on robotics and automation (ICRA)

Cramariuc A, Tschopp F, Alatur N, Benz S, Falck T, Brühlmeier M, Hahn B, Nieto J, Siegwart R (2021) Semsegmap–3D segment-based semantic localization. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)

Cui Y, Chen X, Zhang Y, Dong J, Wu Q, Zhu F (2023) Bow3d: bag of words for real-time loop closing in 3d lidar slam. IEEE Robot Autom Lett 8:2828–2835

Cui Y, Zhang Y, Dong J, Sun H, Chen X, Zhu F (2024) Link3d: linear keypoints representation for 3d lidar point cloud. IEEE Robot Autom Lett. https://doi.org/10.1109/LRA.2024.3354550

Cui J, Chen X (2023) Ccl: continual contrastive learning for lidar place recognition. arXiv:2303.13952

Dai D, Wang J, Chen Z, Bao P (2022) SC-LPR: spatiotemporal context based lidar place recognition. Pattern Recognit Lett 156:160–166

Di Giammarino L, Aloise I, Stachniss C, Grisetti G (2021) Visual place recognition using lidar intensity information. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)

Ding L, Zhou R, Yuan Y, Yang H, Li J, Yu T, Liu C, Wang J, Li S, Gao H et al (2022) A 2-year locomotive exploration and scientific investigation of the lunar farside by the Yutu-2 rover. Sci Robot 7:6660

Dubé R, Gollub MG, Sommer H, Gilitschenski I, Siegwart R, Cadena C, Nieto J (2018) Incremental-segment-based localization in 3-d point clouds. IEEE Robot Autom Lett 3:1832–1839

Dube R, Cramariuc A, Dugas D, Sommer H, Dymczyk M, Nieto J, Siegwart R, Cadena C (2020) SegMap: segment-based mapping and localization using data-driven descriptors. Int J Robot Res 39:339–355

Dubé R, Dugas D, Stumm E, Nieto J, Siegwart R, Cadena C (2017) Segmatch: segment based place recognition in 3D point clouds. In: IEEE international conference on robotics and automation (ICRA)

Du J, Wang R, Cremers D (2020) Dh3d: deep hierarchical 3d descriptors for robust large-scale 6dof relocalization. In: European conference on computer vision (ECCV)

Elhousni M, Huang X (2020) A survey on 3D lidar localization for autonomous vehicles. In: IEEE intelligent vehicles symposium (IV), pp 1879–1884

Endres F, Hess J, Sturm J, Cremers D, Burgard W (2013) 3-D mapping with an RGB-D camera. IEEE Trans Robot 30:177–187

Fan Y, He Y, Tan U-X (2020) Seed: a segmentation-based egocentric 3d point cloud descriptor for loop closure detection. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)

Fankhauser P, Bloesch M, Rodriguez D, Kaestner R, Hutter M, Siegwart R (2015) Kinect v2 for mobile robot navigation: evaluation and modeling. In: International conference on advanced robotics (ICAR), pp 388–394

Fan Z, Liu H, He J, Sun Q, Du X (2020) Srnet: a 3d scene recognition network using static graph and dense semantic fusion. In: Computer graphics forum, vol 39, pp 301–311

Fan Z, Song Z, Liu H, Lu Z, He J, Du X (2022) Svt-net: super light-weight sparse voxel transformer for large scale place recognition. In: Proceedings of the AAAI conference on artificial intelligence

Ferrarini B, Waheed M, Waheed S, Ehsan S, Milford MJ, McDonald-Maier KD (2020) Exploring performance bounds of visual place recognition using extended precision. IEEE Robot Autom Lett 5:1688–1695

Gálvez-López D, Tardos JD (2012) Bags of binary words for fast place recognition in image sequences. IEEE Trans Robot 28:1188–1197

Geiger A, Lenz P, Stiller C, Urtasun R (2013) Vision meets robotics: the Kitti dataset. Int J Robot Res 32:1231–1237

Golla T, Klein R (2015) Real-time point cloud compression. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)

Gong Y, Sun F, Yuan J, Zhu W, Sun Q (2021) A two-level framework for place recognition with 3d lidar based on spatial relation graph. Pattern Recognit 120:108171

Guo J, Borges PV, Park C, Gawel A (2019) Local descriptor for robust place recognition using lidar intensity. IEEE Robot Autom Lett 4:1470–1477

Guo Y, Wang H, Hu Q, Liu H, Liu L, Bennamoun M (2020) Deep learning for 3d point clouds: a survey. IEEE Trans Pattern Anal Mach Intell 43:4338–4364

Habich T-L, Stuede M, Labbé M, Spindeldreier S (2021) Have I been here before? learning to close the loop with lidar data in graph-based slam. In: IEEE/ASME international conference on advanced intelligent mechatronics (AIM)

Han X-F, Feng Z-A, Sun S-J, Xiao G-Q (2023) 3D point cloud descriptors: state-of-the-art. Artif Intell Rev 56:12033–12083

Hess W, Kohler D, Rapp H, Andor D (2016) Real-time loop closure in 2D lidar SLAM. In: IEEE international conference on robotics and automation (ICRA)

He L, Wang X, Zhang H (2016) M2dp: a novel 3D point cloud descriptor and its application in loop closure detection. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)

Hongming S, Qun Z, Hanchen L, Zhang X, Bailing T, Lei H (2022) A distributed approach for lidar-based relative state estimation of multi-UAV in GPS-denied environments. Chin J Aeronaut 35:59–69

Hou Z, Yan Y, Xu C, Kong H (2022) Hitpr: hierarchical transformer for place recognition in point cloud. In: International conference on robotics and automation (ICRA)

Huang T, Liu Y (2019) 3d point cloud geometry compression on deep learning. In: Proceedings of the 27th ACM international conference on multimedia

Hui L, Cheng M, Xie J, Yang J, Cheng M-M (2022) Efficient 3d point cloud feature learning for large-scale place recognition. IEEE Trans Image Process 31:1258–1270

Hui L, Yang H, Cheng M, Xie J, Yang J (2021) Pyramid point cloud transformer for large-scale place recognition. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV)

Ibrahim M, Akhtar N, Anwar S, Mian A (2023) Unloc: a universal localization method for autonomous vehicles using lidar, radar and/or camera input. arXiv:2307.00741

Ibrahim M, Akhtar N, Anwar S, Wise M, Mian A (2023) Slice transformer and self-supervised learning for 6dof localization in 3d point cloud maps. arXiv:2301.08957

Ibrahim M, Akhtar N, Anwar S, Wise M, Mian A (2023) Perth-WA localization dataset in 3D point cloud maps. IEEE DataPort. https://doi.org/10.21227/s2p2-2e66

Jiang J, Wang J, Wang P, Bao P, Chen Z (2020) Lipmatch: lidar point cloud plane based loop-closure. IEEE Robot Autom Lett 5:6861–6868

Kim G, Park B, Kim A (2019) 1-day learning, 1-year localization: long-term lidar localization using scan context image. IEEE Robot Autom Lett 4:1948–1955

Kim G, Choi S, Kim A (2021) Scan context++: structural place recognition robust to rotation and lateral variations in urban environments. IEEE Trans Robot 38:1856–1874

Kim G, Kim A (2018) Scan context: egocentric spatial descriptor for place recognition within 3D point cloud map. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)

Kim G, Park YS, Cho Y, Jeong J, Kim A (2020) Mulran: multimodal range dataset for urban place recognition. In: IEEE International conference on robotics and automation (ICRA)

Kim G, Yun S, Kim J, Kim A (2022) Sc-lidar-slam: a front-end agnostic versatile lidar slam system. In: International conference on electronics, information, and communication (ICEIC)

Knights J, Moghadam P, Ramezani M, Sridharan S, Fookes C (2022) Incloud: incremental learning for point cloud place recognition. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)

Knights J, Vidanapathirana K, Ramezani M, Sridharan S, Fookes C, Moghadam P (2023) Wild-places: a large-scale dataset for lidar place recognition in unstructured natural environments. In: IEEE international conference on robotics and automation (ICRA), pp 11322–11328

Knott E, Skolnik M (2008) Radar handbook. McGraw-Hill, New York

Kolmet M, Zhou Q, Ošep A, Leal-Taixé L (2022) Text2pos: text-to-point-cloud cross-modal localization. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)

Komorowski J (2021) Minkloc3d: point cloud based large-scale place recognition. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision

Komorowski J (2022) Improving point cloud based place recognition with ranking-based loss and large batch training. In: International conference on pattern recognition (ICPR)

Komorowski J, Wysoczańska M, Trzcinski T (2021) Minkloc++: lidar and monocular image fusion for place recognition. In: International joint conference on neural networks (IJCNN)

Kong X, Yang X, Zhai G, Zhao X, Zeng X, Wang M, Liu Y, Li W, Wen F (2020) Semantic graph based place recognition for 3d point clouds. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)

Kuan YW, Ee NO, Wei LS (2019) Comparative study of intel R200, Kinect v2, and primesense RGB-D sensors performance outdoors. IEEE Sens J 19:8741–8750

Kuang H, Chen X, Guadagnino T, Zimmerman N, Behley J, Stachniss C (2023) IR-MCL: implicit representation-based online global localization. IEEE Robot Autom Lett 8:1627–1634

Labbé M, Michaud F (2019) RTAB-Map as an open-source lidar and visual simultaneous localization and mapping library for large-scale and long-term online operation. J Field Robot 36:416–446

Lai H, Yin P, Scherer S (2022) Adafusion: visual-lidar fusion with adaptive weights for place recognition. IEEE Robot Autom Lett 7:12038–12045

LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444

Lee AJ, Song S, Lim H, Lee W, Myung H (2023) (lc) \(^{2}\) : lidar-camera loop constraints for cross-modal place recognition. IEEE Robot Autom Lett 8:3589–3596

Li L, Kong X, Zhao X, Huang T, Li W, Wen F, Zhang H, Liu Y (2022) RINet: efficient 3d lidar-based place recognition using rotation invariant neural network. IEEE Robot Autom Lett 7:4321–4328

Li L, Ding W, Wen Y, Liang Y, Liu Y, Wan G (2023) A unified BEV model for joint learning of 3d local features and overlap estimation. arXiv:2302.14511

Li L, Kong X, Zhao X, Huang T, Li W, Wen F, Zhang H, Liu Y (2021) SSC: semantic scan context for large-scale place recognition. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)

Li L, Kong X, Zhao X, Li W, Wen F, Zhang H, Liu Y (2021) Sa-loam: semantic-aided lidar slam with loop closure. In: IEEE international conference on robotics and automation (ICRA)

Lillesand T, Kiefer RW, Chipman J (2015) Remote sensing and image interpretation

Lin J, Zhang F (2019) A fast, complete, point cloud based loop closure for lidar odometry and mapping. arXiv:1909.11811

Li Y, Su P, Cao M, Chen H, Jiang X, Liu Y (2021) Semantic scan context: global semantic descriptor for lidar-based place recognition. In: IEEE international conference on real-time computing and robotics (RCAR)

Liu Z, Suo C, Zhou S, Xu F, Wei H, Chen W, Wang H, Liang X, Liu Y-H (2019) Seqlpd: sequence matching enhanced loop-closure detection based on large-scale point cloud description for self-driving vehicles. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)

Liu Z, Zhou S, Suo C, Yin P, Chen W, Wang H, Li H, Liu Y-H (2019) Lpd-net: 3D point cloud learning for large-scale place recognition and environment analysis. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV)

Li Y, Zheng S, Yu Z, Yu B, Cao S-Y, Luo L, Shen H-L (2023) I2p-rec: recognizing images on large-scale point cloud maps through bird’s eye view projections. arXiv:2303.01043

Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60:91–110

Lowry S, Sünderhauf N, Newman P, Leonard JJ, Cox D, Corke P, Milford MJ (2015) Visual place recognition: a survey. IEEE Trans Robot 32:1–19

Lun R, Zhao W (2015) A survey of applications and human motion recognition with Microsoft Kinect. Int J Pattern Recognit Artif Intell 29:1555008

Luo L, Cao S-Y, Han B, Shen H-L, Li J (2021) Bvmatch: Lidar-based place recognition using bird’s-eye view images. IEEE Robot Autom Lett 6:6076–6083

Luo L, Zheng S, Li Y, Fan Y, Yu B, Cao S-Y, Li J, Shen H-L (2023) Bevplace: learning lidar-based place recognition using bird’s eye view images. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp 8700–8709

Lu Y, Yang F, Chen F, Xie D (2020) Pic-net: point cloud and image collaboration network for large-scale place recognition. arXiv:2008.00658

Ma J, Zhang J, Xu J, Ai R, Gu W, Chen X (2022) OverlapTransformer: an efficient and yaw-angle-invariant transformer network for lidar-based place recognition. IEEE Robot Autom Lett 7:6958–6965

Ma J, Xiong G, Xu J, Chen X (2023) CVTNet: a cross-view transformer network for lidar-based place recognition in autonomous driving environments. IEEE Trans Ind Inform. https://doi.org/10.1109/TII.2023.3313635

Ma J, Chen X, Xu J, Xiong G (2023) SeqOT: a spatial-temporal transformer network for place recognition using sequential lidar data. IEEE Trans Ind Electron 70:8225–8234

Maddern W, Pascoe G, Linegar C, Newman P (2017) 1 year, 1000 km: The oxford Robotcar dataset. Int J Robot Res 36:3–15

Maffra F, Chen Z, Chli M (2018) Tolerant place recognition combining 2d and 3d information for uav navigation. In: IEEE international conference on robotics and automation (ICRA)

Magnusson M, Andreasson H, Nüchter A, Lilienthal AJ (2009) Automatic appearance-based loop detection from three-dimensional laser data using the normal distributions transform. J Field Robot 26:892–914

Magnusson M, Andreasson H, Nuchter A, Lilienthal AJ (2009) Appearance-based loop detection from 3d laser data using the normal distributions transform. In: IEEE international conference on robotics and automation (ICRA)

Masone C, Caputo B (2021) A survey on deep visual place recognition. IEEE Access 9:19516–19547

Minh D, Wang HX, Li YF, Nguyen TN (2022) Explainable artificial intelligence: a comprehensive review. Artif Intell Rev 55:3503–3568

Muhammad N, Lacroix S (2011) Loop closure detection using small-sized signatures from 3d lidar data. In: IEEE International symposium on safety, security, and rescue robotics

Olson E (2009) Recognizing places using spectrally clustered local matches. Robot Auton Syst 57:1157–1172

Pandey G, McBride JR, Eustice RM (2011) Ford campus vision and lidar data set. Int J Robot Res 30:1543–1552

Pan Y, Xu X, Li W, Cui Y, Wang Y, Xiong R (2021) Coral: colored structural representation for bi-modal place recognition. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)

Patel B, Barfoot TD, Schoellig AP (2020) Visual localization with google earth images for robust global pose estimation of uavs. In: IEEE international conference on robotics and automation (ICRA)

Qi CR, Su H, Mo K, Guibas LJ (2017) Pointnet: deep learning on point sets for 3d classification and segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)

Qi CR, Yi L, Su H, Guibas LJ (2017) Pointnet++: deep hierarchical feature learning on point sets in a metric space. In: Advances in neural information processing systems, vol 30

Qiao Z, Hu H, Shi W, Chen S, Liu Z, Wang H (2021) A registration-aided domain adaptation network for 3d point cloud based place recognition. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)

Ramezani M, Khosoussi K, Catt G, Moghadam P, Williams J, Borges P, Pauling F, Kottege N (2022) Wildcat: online continuous-time 3d lidar-inertial slam. arXiv:2205.12595

Röhling T, Mack J, Schulz D (2015) A fast histogram-based similarity measure for detecting loop closures in 3-d lidar data. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)

Rublee E, Rabaud V, Konolige K, Bradski G (2011) Orb: an efficient alternative to sift or surf. In: International conference on computer vision (ICCV)

Sánchez-Belenguer C, Ceriani S, Taddei P, Wolfart E, Sequeira V (2020) Global matching of point clouds for scan registration and loop detection. Robot Auton Syst 123:103324

Savva M, Kadian A, Maksymets O, Zhao Y, Wijmans E, Jain B, Straub J, Liu J, Koltun V, Malik J, et al. (2019) Habitat: a platform for embodied AI research. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV)

Scaramuzza D (2014) Omnidirectional camera. In: Ikeuchi K (eds), Computer Vision: A Reference Guide. ISBN: 978-0-387-30771-8. Springer

Schaupp L, Bürki M, Dubé R, Siegwart R, Cadena C (2019) Oreos: oriented recognition of 3D point clouds in outdoor scenarios. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)

Shan T, Englot B, Duarte F, Ratti C, Rus D (2021) Robust place recognition using an imaging lidar. In: IEEE international conference on robotics and automation (ICRA)

Shi C, Chen X, Huang K, Xiao J, Lu H, Stachniss C (2021) Keypoint matching for point cloud registration using multiplex dynamic graph attention networks. IEEE Robot Autom Lett 6(4):8221–8228

Shi X, Chai Z, Zhou Y, Wu J, Xiong Z (2021) Global place recognition using an improved scan context for lidar-based localization system. In: IEEE/ASME international conference on advanced intelligent mechatronics (AIM)

Shi C, Chen X, Deng W, Lu H, Xiao J, Bin D (2023) RDMNet: reliable dense matching based point cloud registration for autonomous driving. In: IEEE Transactions on intelligent transportation systems

Shi C, Chen X, Xiao J, Dai B, Lu H (2023) Fast and accurate deep loop closing and relocalization for reliable lidar slam. arXiv:2309.08086

Steder B, Grisetti G, Burgard W (2010) Robust place recognition for 3d range data based on point features. In: IEEE international conference on robotics and automation (ICRA)

Steder B, Ruhnke M, Grzonka S, Burgard W (2011) Place recognition in 3d scans using a combination of bag of words and point feature based relative pose estimation. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)

Sun Q, Liu H, He J, Fan Z, Du X (2020) Dagc: employing dual attention and graph convolution for point cloud based place recognition. In: Proceedings of the 2020 international conference on multimedia retrieval

Tang TY, De Martini D, Newman P (2021) Get to the point: Learning lidar place recognition and metric localisation using overhead imagery. In: Proceedings of robotics: science and systems

Tang TY, De Martini D, Wu S, Newman P (2021) Self-supervised learning for using overhead imagery as maps in outdoor range sensor localization. Int J Robot Res 40:1488–1509

Thrun S (2002) Probabilistic robotics. Commun ACM 45:52–57

Tinchev G, Penate-Sanchez A, Fallon M (2019) Learning to see the wood for the trees: deep laser localization in urban and natural environments on a CPU. IEEE Robot Autom Lett 4:1327–1334

Tinchev G, Nobili S, Fallon M (2018) Seeing the wood for the trees: reliable localization in urban and natural environments. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)

Tomono M (2020) Loop detection for 3d lidar slam using segment-group matching. Adv Robot 34:1530–1544

Uy MA, Lee GH (2018) Pointnetvlad: deep point cloud based retrieval for large-scale place recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)

Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, vol 30

Vidanapathirana K, Moghadam P, Harwood B, Zhao M, Sridharan S, Fookes C (2021) Locus: lidar-based place recognition using spatiotemporal higher-order pooling. In: IEEE international conference on robotics and automation (ICRA)

Vidanapathirana K, Ramezani M, Moghadam P, Sridharan S, Fookes C (2022) Logg3d-net: locally guided global descriptor learning for 3d place recognition. In: International conference on robotics and automation (ICRA)

Vosselman G, Maas HG (eds) (2010) Airborne and terrestrial laser scanning

Waikhom L, Patgiri R (2022) A survey of graph neural networks in various learning paradigms: methods, applications, and challenges. Artif Intell Rev 56(7):6295–6364

Wandinger U (2005) In: Weitkamp C (ed), Introduction to lidar, pp 1–18. Springer, New York

Wang Q, Tan Y, Mei Z (2020) Computational methods of acquisition and processing of 3d point cloud data for construction applications. Arch Comput Methods Eng 27:479–499

Wang Z, Shen Y, Cai B, Saleem MT (2019) A brief review on loop closure detection with 3D point cloud. In: IEEE international conference on real-time computing and robotics (RCAR)

Wang Y, Sun Z, Xu C-Z, Sarma SE, Yang J, Kong H (2020) Lidar iris for loop-closure detection. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)

Wang H, Wang C, Xie L (2020) Intensity scan context: coding intensity and geometry relations for loop closure detection. In: IEEE international conference on robotics and automation (ICRA)

Wang W, Zhao W, Wang X, Jin Z, Li Y, Runge T (2019) A low-cost simultaneous localization and mapping algorithm for last-mile indoor delivery. In: International conference on transportation information and safety (ICTIS)

Wasenmüller O, Stricker D (2016) Comparison of Kinect v1 and v2 depth images in terms of accuracy and precision. In: Computer vision–ACCV workshops, Taipei, Taiwan, November 20-24. Revised Selected Papers, Part II 13, pp 34–45

Wiesmann L, Marcuzzi R, Stachniss C, Behley J (2022) Retriever: point cloud retrieval in compressed 3d maps. In: Proceedings of the IEEE international conference on robotics and automation (ICRA)

Wiesmann L, Milioto A, Chen X, Stachniss C, Behley J (2021) Deep compression for dense point cloud maps. IEEE Robot Autom Lett 6:2060–2067

Wietrzykowski J, Skrzypczyński P (2021) On the descriptive power of lidar intensity images for segment-based loop closing in 3-d slam. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)

Witze A (2020) Nasa has launched the most ambitious mars rover ever built: here’s what happens next. Nature 584:15–16

Xiang G, Huang Y, Yu J, Zhu M, Su J (2018) Intelligence evolution for service robot: an ADRC perspective. Control Theory Technol 16:324–335

Xiang H, Shi W, Fan W, Chen P, Bao S, Nie M (2021) Fastlcd: a fast and compact loop closure detection approach using 3d point cloud for indoor mobile mapping. Int J Appl Earth Observ Geoinf 102:102430

Xia Y, Xu Y, Li S, Wang R, Du J, Cremers D, Stilla U (2021) SOE-Net: a self-attention and orientation encoding network for point cloud based place recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)

Xie S, Pan C, Peng Y, Liu K, Ying S (2020) Large-scale place recognition based on camera-lidar fused descriptor. Sensors 20:2870

Xu T-X, Guo Y-C, Lai Y-K, Zhang S-H (2021) Transloc3d: point cloud based large-scale place recognition using adaptive receptive fields. arXiv:2105.11605

Xu Y, Stilla U (2021) Toward building and civil infrastructure reconstruction from point clouds: a review on data and key techniques. IEEE J Select Top Appl Earth Obs Remote Sens 14:2857–2885

Xu X, Yin H, Chen Z, Li Y, Wang Y, Xiong R (2021) Disco: differentiable scan context with orientation. IEEE Robot Autom Lett 6:2791–2798

Ye T, Yan X, Wang S, Li Y, Zhou F (2022) An efficient 3-d point cloud place recognition approach based on feature point extraction and transformer. IEEE Trans Instrum Meas 71:1–9

Yin P, Wang F, Egorov A, Hou J, Jia Z, Han J (2021) Fast sequence-matching enhanced viewpoint-invariant 3-d place recognition. IEEE Trans Ind Electron 69:2127–2135

Yin P, Xu L, Feng Z, Egorov A, Li B (2021) Pse-match: a viewpoint-free place recognition method with parallel semantic embedding. IEEE Trans Intell Transp Syst 23:11249–11260

Yin P, Xu L, Zhang J, Choset H (2021) Fusionvlad: a multi-view deep fusion networks for viewpoint-free 3d place recognition. IEEE Robot Autom Lett 6:2304–2310

Yin H, Xu X, Wang Y, Xiong R (2021) Radar-to-lidar: heterogeneous place recognition via joint learning. Front Robot AI 8:661199

Yin H, Tang L, Ding X, Wang Y, Xiong R (2018) Locnet: global localization in 3d point clouds for mobile vehicles. In: IEEE intelligent vehicles symposium (IV), pp 728–733

Yin P, Wang F, Egorov A, Hou J, Zhang J, Choset H (2020) Seqspherevlad: sequence matching enhanced orientation-invariant place recognition. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)

Yin P, Xu L, Liu Z, Li L, Salman H, He Y, Xu W, Wang H, Choset H (2018) Stabilize an unsupervised feature learning for lidar-based place recognition. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)

Yin H, Xu X, Lu S, Chen X, Xiong R, Shen S, Stachniss C, Wang Y (2023) A survey on global lidar localization. arXiv:2302.07433

Yin P, Zhao S, Cisneros I, Abuduweili A, Huang G, Milford M, Liu C, Choset H, Scherer S (2022) General place recognition survey: towards the real-world autonomy age. arXiv:2209.04497

Yudin D, Solomentsev Y, Musaev R, Staroverov A, Panov AI (2023) Hpointloc: point-based indoor place recognition using synthetic RGB-D images. In: Neural information processing: 29th international conference

Zaffar M, Garg S, Milford M, Kooij J, Flynn D, McDonald-Maier K, Ehsan S (2021) Vpr-bench: an open-source visual place recognition evaluation framework with quantifiable viewpoint and appearance change. Int J Comput Vis 129:2136–2174

Zaganidis A, Zerntev A, Duckett T, Cielniak G (2019) Semantically assisted loop closure in slam using NDT histograms. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)

Zennaro S (2014) Evaluation of Microsoft Kinect 360 and Microsoft Kinect One for robotics and computer vision applications

Zhang X, Wang L, Su Y (2021) Visual place recognition: a survey from deep learning perspective. Pattern Recognit 113:107760

Zhang L, Ghosh BK (2000) Line segment based map building and localization using 2d laser rangefinder. In: IEEE international conference on robotics and automation (ICRA). Symposia Proceedings (Cat. No. 00CH37065)

Zhang W, Xiao C (2019) Pcan: 3d attention map learning using contextual information for point cloud based retrieval. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)

Zhou Z, Zhao C, Adolfsson D, Su S, Gao Y, Duckett T, Sun L (2021) Ndt-transformer: large-scale 3d point cloud localisation using the normal distribution transform representation. In: IEEE international conference on robotics and automation (ICRA)

Zhuang Y, Jiang N, Hu H, Yan F (2012) 3-d-laser-based scene measurement and place recognition for mobile robots in dynamic indoor environments. IEEE Trans Instrum Meas 62:438–450

Zhu Y, Ma Y, Chen L, Liu C, Ye M, Li L (2020) Gosmatch: graph-of-semantics matching for detecting loop closures in 3D lidar data. In: IEEE/RSJ International conference on intelligent robots and systems (IROS)

Zimmerman N, Guadagnino T, Chen X, Behley J, Stachniss C (2023) Long-term localization using semantic cues in floor plan maps. IEEE Robot Autom Lett 8:176–183

Żywanowski K, Banaszczyk A, Nowicki MR, Komorowski J (2021) MinkLoc3D-SI: 3D lidar place recognition with sparse convolutions, spherical coordinates, and intensity. IEEE Robot Autom Lett 7:1079–1086

Żywanowski K, Banaszczyk A, Nowicki MR (2020) Comparison of camera-based and 3d lidar-based place recognition across weather conditions. In: International conference on control, automation, robotics and vision (ICARCV)

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants U2013203, 62373140, U21A20487, U1913202, 62103137, and 61973106, the National Key Research and Development Program of China (2023YFB4704503), the Project of Science Fund for Distinguished Young Scholars of Hunan Province (2021JJ10024), the Leading Talents in Science and Technology Innovation of Hunan Province (2023RC1040), the Natural Science Fund of Hunan Province (2022JJ40100, 2022JJ30024), the Project of Talent Innovation and Sharing Alliance of Quanzhou City (2021C062L), the Key Research and Development Project of the Science and Technology Plan of Hunan Province (2022GK2014), and the Changsha Normal University Cultivation Project (K84022011). Professor Ajmal Mian is the recipient of an Australian Research Council Future Fellowship Award (project number FT210100268) funded by the Australian Government.

Author information

Authors and Affiliations

Science Teaching and Research Section, Changsha Normal University, Changsha, 410100, China

National Engineering Laboratory for Robot Visual Perception and Control Technology, College of Electrical and Information Engineering, Hunan University, Changsha, 410082, China

Kan Luo, Hongshan Yu, Zhengeng Yang, Jingwen Wang & Panfei Cheng

College of Intelligence Science and Technology, National University of Defense Technology, Changsha, 410003, China

Xieyuanli Chen

College of Engineering and Design, Hunan Normal University, Changsha, 410081, China

Zhengeng Yang

Department of Computer Science, The University of Western Australia, Perth, WA, 6009, Australia

Contributions

K.L. wrote the main manuscript text, H.Y. conceived and provided overall revisions for the article, X.C. adjusted and refined the manuscript's structure, Z.Y. conducted the analysis and made revisions. J.W. contributed to the manuscript's framework, P.C. performed data analysis, and A.M. provided final polishing and revisions. All authors reviewed the manuscript.

Corresponding author

Correspondence to Hongshan Yu .

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Luo, K., Yu, H., Chen, X. et al. 3D point cloud-based place recognition: a survey. Artif Intell Rev 57 , 83 (2024). https://doi.org/10.1007/s10462-024-10713-6

Download citation

Accepted : 23 January 2024

Published : 07 March 2024

DOI : https://doi.org/10.1007/s10462-024-10713-6

Keywords

  • 3D point cloud
  • Place recognition
  • Localization

Title: AI Insights: A Case Study on Utilizing ChatGPT Intelligence for Research Paper Analysis

Abstract: This paper discusses the effectiveness of leveraging Chatbot: Generative Pre-trained Transformer (ChatGPT) versions 3.5 and 4 for analyzing research papers to support the effective writing of scientific literature surveys. The study selected the application of Artificial Intelligence in breast cancer treatment as the research topic. Research papers related to this topic were collected from three major publication databases: Google Scholar, PubMed, and Scopus. ChatGPT models were used to identify the category, scope, and relevant information from the research papers for automatic identification of relevant papers related to Breast Cancer Treatment (BCT), organization of papers according to scope, and identification of key information for survey paper writing. Evaluations performed using ground truth data annotated by subject experts reveal that GPT-4 achieves 77.3% accuracy in identifying the research paper categories, and 50% of the papers were correctly identified by GPT-4 for their scopes. Further, the results demonstrate that GPT-4 can generate reasons for its decisions with an average of 27% new words, and 67% of the reasons given by the model were completely agreeable to the subject experts.

Published on 6.3.2024 in Vol 11 (2024)

Leveraging Generative AI Tools to Support the Development of Digital Solutions in Health Care Research: Case Study

Authors of this article:

Original Paper

  • Danissa V Rodriguez 1 , PhD
  • Katharine Lawrence 1, 2 , MPH, MD
  • Javier Gonzalez 2 , BS
  • Beatrix Brandfield-Harvey 1 , BA
  • Lynn Xu 1 , MPH
  • Sumaiya Tasneem 1 , MPH
  • Defne L Levine 1 , MPH
  • Devin Mann 1, 2 , MS, MD

1 Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States

2 Medical Center Information Technology, Department of Health Informatics, New York University Langone Health, New York, NY, United States

Corresponding Author:

Danissa V Rodriguez, PhD

Department of Population Health

New York University Grossman School of Medicine

227 East 30th Street

New York, NY, 10016

United States

Phone: 1 646 501 2684

Email: [email protected]

Background: Generative artificial intelligence has the potential to revolutionize health technology product development by improving coding quality, efficiency, documentation, quality assessment and review, and troubleshooting.

Objective: This paper explores the application of a commercially available generative artificial intelligence tool (ChatGPT) to the development of a digital health behavior change intervention designed to support patient engagement in a commercial digital diabetes prevention program.

Methods: We examined the capacity, advantages, and limitations of ChatGPT to support digital product idea conceptualization, intervention content development, and the software engineering process, including software requirement generation, software design, and code production. In total, 11 evaluators, each with at least 10 years of experience in fields of study ranging from medicine and implementation science to computer science, participated in the output review process (ChatGPT vs human-generated output). All had familiarity or prior exposure to the original personalized automatic messaging system intervention. The evaluators rated the ChatGPT-produced outputs in terms of understandability, usability, novelty, relevance, completeness, and efficiency.

Results: Most metrics received positive scores. We identified that ChatGPT can (1) help developers achieve high-quality products faster and (2) facilitate communication and system understanding between technical and nontechnical team members around the development goal of rapid, easy-to-build computational solutions for medical technologies.

Conclusions: ChatGPT can serve as a usable facilitator for researchers engaging in the software development life cycle, from product conceptualization to feature identification and user story development to code generation.

Trial Registration: ClinicalTrials.gov NCT04049500; https://clinicaltrials.gov/ct2/show/NCT04049500

Introduction

Health care has undergone a digital transformation, resulting in a growing reliance on software engineering for medical use cases, including health care research. However, little guidance exists for health researchers on how to effectively develop digital health interventions [ 1 ]; in particular, software development challenges that include expertise gaps in coding, custom development needs, high costs, and time constraints result in multilevel barriers to designing and deploying a usable, scalable, and sustainable digital health product [ 1 ].

Generative artificial intelligence (GenAI) technologies such as ChatGPT can potentially support researchers in health technology endeavors by providing foundational frameworks and processes for the software development life cycle [ 2 ]. These systems can help reduce time and enhance precision for technology-based research projects by supporting both nonprogrammers and experienced programmers in code development, troubleshooting, and cleaning [ 2 ]. Moreover, the ability to use GenAI to generate content from different perspectives (expert or nonexpert) can facilitate and improve communication between technical and nontechnical team members of multidisciplinary teams. For example, a nontechnical team member can write their ideas in natural text and then use GenAI to request assistance in creating discussion points to communicate to a technical team audience. GenAI tools may also help health technology researchers refine research questions, identify appropriate theoretical frameworks and models, and leverage popular implementation strategies such as design thinking to build effective, theory-grounded, and evidence-based digital health interventions. ChatGPT (OpenAI, Microsoft Corporation) has already demonstrated feasibility as a support tool for clinical decision support development in health care [ 3 ], and more broadly as a coding copilot in programming and engineering [ 4 , 5 ].

This study explores the use of ChatGPT to recreate a personalized automatic messaging system (PAMS), which was developed as part of a digital health research initiative to support patient engagement with a commercial digital diabetes prevention program (dDPP). We examine the capacity, advantages, and limitations of ChatGPT to support product ideation and conceptualization, intervention content development, and the software engineering process, including software requirement generation, software design, and code production. This paper provides insights to support the GenAI-assisted development of computational tools that are usable, reliable, extensible, and in line with the standards of modern coding practices. The framework includes prompts for both the intervention conceptualization and the main phases of the software development process.

Settings and Intervention Development Context

In previous work [ 6 ], we described the development of PAMS, a novel integrated multicomponent communications platform, to promote patient-provider communication and patient engagement in a commercial dDPP (Noom; Noom, Inc). The PAMS intervention included early prototyping and user testing, a technical development phase, and a randomized controlled trial. The core content and user experience features of PAMS were identified, prototyped, and evaluated using the well-established design thinking “discover, define, design, and test” approach to iteratively gather information, define, design, and refine the engagement intervention [ 7 ]. Stakeholders included: patients with prediabetes and their support network (eg, caregivers and partners), primary care providers, health technologists, programmers and computer scientists, behavioral change theorists and subject matter experts, the research administrative team, and dDPP product developers and coaches. The main components of this PAMS intervention include (1) a theory-driven behavior change messaging library, (2) a personalized automated message system delivery platform (SMS text messaging–based), and (3) EHR-integrated data visualizations. The PAMS messaging library uses an integrated framework that combines established theoretical models for behavior change with human-centered design strategies to maximize the evidence-based conditions for behavior change and the user acceptance and use of a digital health product. The technical development of PAMS followed an agile software development approach based on incremental 2-week sprint cycles consisting of requirement planning, design, development, and testing of a specific set of functional features. In this paper, we will recreate this development process using GenAI (ChatGPT).

ChatGPT-PAMS Experiment Design

To evaluate the effectiveness of using GenAI to support the development of digital tools in medical settings, our experiment is based on recreating PAMS using GenAI (ChatGPT) and evaluating human-generated vs ChatGPT-generated documentation. To accurately capture the ideation and development process, our multidisciplinary team reviewed all documentation and processes used in the early stages of PAMS conceptualization, including supporting theoretical models, content and features, and technical development. We then recreated these processes via a series of prompts for ChatGPT-4 to assist with the generation of theory, content, user stories, requirement documents, design diagrams, and the code for a subset of the requirements. Outputs from ChatGPT were reviewed and compared to human-generated documentation by 11 evaluating team members. Evaluators consisted of clinicians, behavioral scientists, programmers, and research staff working in digital health and technology for behavior change research. Collectively, they represent more than 50 years of clinical, research, design, and computer science experience. The evaluators independently rated the quality of various aspects of information provided by ChatGPT on a Likert scale, where higher ratings indicated greater quality of information ( 1: very poor; 2: poor; 3: acceptable; 4: good; 5: very good; N/A: not applicable ). Aspects of evaluation included: understandability (Does this output make sense given the context of the study and prompts?), novelty (Were new ideas generated?) [ 3 ], usability (Does this create a usable output?), relevance (Does this create a useful output?), efficiency (Would having these outputs have saved time?), and potential for bias (What unintended consequences might arise from these outputs?) [ 6 ]. Evaluators were also asked to give an overall score on the quality of the ChatGPT output (Overall, how good would you say this output is?). Post review, a group debrief was conducted, using a semistructured interview guide to facilitate discussion regarding perceptions of outputs and rationale for ratings.
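
The paper does not publish the scripts used to tabulate these ratings; purely as an illustration of the rating structure described above, the short Python sketch below (with hypothetical column names and example rows) shows how per-aspect Likert ratings could be summarized while keeping N/A responses separate.

# Illustrative only: the authors' rating data and analysis code are not published.
# The column names ("evaluator", "component", "aspect", "rating") are hypothetical.
import pandas as pd
# One row per evaluator, per ChatGPT output component, per evaluation aspect.
# Ratings use the paper's 1-5 Likert scale; None stands in for "N/A".
ratings = pd.DataFrame([
    {"evaluator": "E01", "component": "user stories", "aspect": "understandability", "rating": 4},
    {"evaluator": "E02", "component": "user stories", "aspect": "understandability", "rating": 5},
    {"evaluator": "E03", "component": "code", "aspect": "usability", "rating": None},  # N/A response
])
# Count N/A responses per component and aspect, then average the numeric ratings.
na_counts = ratings["rating"].isna().groupby([ratings["component"], ratings["aspect"]]).sum()
mean_ratings = ratings.dropna(subset=["rating"]).groupby(["component", "aspect"])["rating"].mean()
print(na_counts)
print(mean_ratings)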

Ethical Considerations

Ethical considerations guided the initial development of research methods and reduced potential risks for participants in the original study implementation of the PAMS intervention [ 7 ]. Recreating the technical development of a system previously built as part of the dDPP randomized controlled trial (NCT04049500) did not introduce any new risks to patients. Patients were not involved in this research examining the use of GenAI in the development of digital health care solutions. No patient data were used in the prompt generation phase.

To mitigate evaluator bias in the methods used in this paper, we worked with a diverse team of evaluators who were aware of the initial study but were not necessarily involved in its technical development. Additionally, we recognize the limitations and concerns around the use of ChatGPT, including possible hallucinations and incorrect answers; we therefore emphasize the need for human expertise to distinguish correct from incorrect outputs and have flagged this as a study consideration. When developing the GenAI-based solution, we applied the same considerations for data security, patient usability, accessibility, and data privacy used in the original human-developed solution.

Prompt Generation Framework

Prompt engineering is the skill of designing effective prompts that guide ChatGPT to produce the best possible output for a given task. We combined guidance from the existing literature [ 8 - 11 ] with our own expertise and experimentation to derive a prompting framework that yields good results when developing a digital solution like PAMS ( Figure 1 ).

[Figure 1. Prompt generation framework.]
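
The framework itself is presented as a figure rather than as code; as a hedged sketch of the "role + problem description + ask" structure it describes, the Python helper below shows one way such prompts could be assembled programmatically (the function and example wording are illustrative, not taken from the study).

# Illustrative sketch of a "role + problem description + ask" prompt builder;
# not the authors' actual framework, which is described only at the level of Figure 1.
def build_prompt(role: str, problem: str, ask: str, context: str = "") -> str:
    """Assemble a structured prompt from a role, a problem description, and an ask."""
    parts = [
        f"You are acting as a {role}.",
        f"Problem description: {problem}",
        f"Context from earlier iterations: {context}" if context else "",
        f"Task: {ask} Ask clarifying questions if anything is ambiguous.",
    ]
    return "\n\n".join(p for p in parts if p)

example = build_prompt(
    role="software architect",
    problem=("We need a personalized automatic messaging system (PAMS) that nudges "
             "patients to stay engaged with a commercial digital diabetes prevention program."),
    ask="Propose a high-level architecture using AWS, Twilio, and REDCap.",
)
print(example)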

PAMS Concept and User Experience Generation

Core components of the PAMS intervention were conceptualized and designed via an underlying behavior change theory, design principles and personas, and a message content library.

Underpinning Behavior Change Theory and Approach

Human-Generated Solution

Leveraging a review of the behavior change literature and interviews with behavior change theory content experts (n=4), the research team initially identified ten unique behavior change theories and six process models considered an appropriate fit for the aims of the overall intervention. A unique model was developed that captured (1) the relevant underlying behavior change theory, (2) implementation strategies, and (3) the unique contexts of the technology environment (Figure 2A).

[Figure 2. Behavior change theory and approach: (A) human-generated model; (B) ChatGPT-identified theories and frameworks.]

GenAI Solution

When prompted, ChatGPT identified seven relevant, well-accepted behavior change theories and frameworks to inform a “dDPP support intervention” (Figure 2B). It did not provide information on the transtheoretical domains framework or the taxonomy of behavior change techniques but, when prompted on these, identified both as acceptable strategies for use.

User Experience: Design Principles, Personas, and Messaging Content

The research team used a human-centered design approach to identify key design principles, defined as the set of considerations that form the basis of the PAMS product ( Figure 3 B). These were developed from insights gathered via a review of relevant digital behavior change research, consultation with content and theoretical experts in digital health and implementation science (n=3), and two group interviews (n=9). From these insights, five relevant fictional personas were designed to capture the various phenotypes of user engagement with the commercial dDPP, along with unique user journeys developed to describe their projected engagement with the program over time ( Figure 3 D). Overall, over 193 unique messages were developed, each grounded by a relevant behavior change technique and tailored to an individual phenotype’s user journey. These elements were continuously revisited and refined during the testing phases of the dDPP research. This included a 6-month near-live user testing phase consisting of nine patients engaging with various iterations of the PAMS prototype, and a 12-month live single-arm pilot phase consisting of 25 patients using PAMS-beta with the commercial dDPP platform.

[Figure 3. User experience development: (A) ChatGPT-generated design principles; (B) human-generated design principles; (C) ChatGPT-generated personas; (D) human-generated personas and user journeys.]

ChatGPT was prompted from multiple perspectives (researcher, clinician, and patient) to identify key design principles ( Figure 3 A) and sample solutions for the PAMS intervention. It also provided common engagement phenotypes for digital health tool users, based on patterns of use, frequency, duration, and “other elements.” Of note, nonadopters were not identified within the initial round of phenotypes. ChatGPT also developed personas for each of the identified engagement phenotypes, including persona names, backgrounds, and individual journeys. ChatGPT was able to produce five to ten unique messages targeted toward each phenotype and to adapt these messages based on various additional prompts. The user types or personas generated by ChatGPT are consistent with the human-generated users and cover all the phenotypes identified in our previous research (eg, mapping to a specific behavior change technique and reflecting a key design principle; Figure 3 C).

PAMS Technical Development

The technical development comprised a PAMS requirements document, an architectural design, and code.

Technical Requirements (User Stories)

Following the data collection and intervention design period, the team created a series of user stories (Figure 4B), followed by system requirements describing the intended use cases, features, and challenges of the proposed PAMS software. The initial system requirements represent the “minimum viable product” that was developed, piloted, and further refined (Figure 4D). Our development team followed software engineering principles to generate the requirements document.

[Figure 4. Technical requirements: (A) GenAI-generated user stories; (B) human-generated user stories; (C) GenAI-generated requirements; (D) human-generated system requirements.]

We used the output of the “feature construction phase” to inform the GenAI output for requirements. During the initial stages of the prompting phase, we refrained from suggesting solutions, allowing ChatGPT to generate potential solutions autonomously. We reviewed and evaluated these outputs, eliminating impractical or incompatible solution paths that did not align with the intentions or capabilities of our team. Once we reached a satisfactory outcome but faced uncertainty regarding the next steps, we instructed ChatGPT to assume a different “personality” (eg, software architect) and used the previous outputs as a foundation for the new role’s initial prompts. Throughout this process, we encouraged each “personality” to seek clarifications by asking questions and provided feedback without biasing toward any predetermined solution. We repeated this process at least four times for each personality type, engaging in a back-and-forth roleplay with multiple personalities (researcher, architect, and developer), transitioning to a different personality when it became evident that the current one could no longer progress without additional feedback ( Figures 4 A and 4C).
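
The role-switching described above was carried out interactively in ChatGPT; the sketch below is only a rough illustration of the same "personality" hand-off pattern expressed against the OpenAI Python client (the model name, helper function, and message content are assumptions, not the study's actual prompts).

# Rough illustration of the "personality" hand-off pattern described above,
# expressed via the OpenAI Python SDK; model name, helper, and prompts are assumptions.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def ask_as(personality: str, history: list, user_msg: str) -> str:
    """Send a prompt while instructing the model to adopt a given personality."""
    messages = [{"role": "system", "content": f"You are a {personality} on a digital health team."}]
    messages += history + [{"role": "user", "content": user_msg}]
    response = client.chat.completions.create(model="gpt-4", messages=messages)
    return response.choices[0].message.content

history = []
# The researcher personality proposes features; the architect personality builds on them.
features = ask_as("health services researcher", history,
                  "List candidate features for a patient engagement messaging system.")
history.append({"role": "assistant", "content": features})
requirements = ask_as("software architect", history,
                      "Turn these features into system requirements, asking questions where needed.")
print(requirements)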

Architectural Design

After the requirements phase, our software development team developed the PAMS architectural diagram, a graphical representation of the system that includes (1) a set of components (eg, a database and computational modules) that perform functions required by the system; (2) the set of connectors that support coordination, communication, and cooperation between the components; and (3) conditions for how components can be integrated to form the system (Figure 5B).

[Figure 5. Architectural design: (A) GenAI-generated architectural diagram; (B) human-generated architectural diagram.]

For the GenAI-generated architectural design, we leveraged the outputs of the requirement phase and the available ChatGPT plugins to designate the GenAI model as a software engineer and proceeded to develop an architectural diagram. During this process, we engaged in iterative prompting and provided explicit instructions to ChatGPT, specifying the use of Amazon Web Services (AWS) for development, integration of external systems such as Twilio (Twilio Inc) and REDCap (Research Electronic Data Capture; Vanderbilt University), and the adoption of a microservice approach to facilitate the efforts of our development team ( Figure 5 A).

PAMS comprises several Lambda functions that execute its engagement and adherence algorithm, messaging, and data manipulation functionality. Most of the functions are written in Python (Python Software Foundation) and Scala (École Polytechnique Fédérale de Lausanne). AWS was used for the development of PAMS [ 12 ]. Our developers followed a microservice design based on an event-driven model [ 13 , 14 ]. The main components of PAMS are AWS Lambda functions, which are triggered by events such as updates to S3 buckets, modifications to DynamoDB (AWS) tables, or CloudWatch (AWS) events. External interactions of PAMS use application programming interface calls, which provide secure, effective data transfer (Figure 6B).

[Figure 6. Code production: (A) GenAI-generated code for the “calculate engagement trends” function; (B) human-developed PAMS components and external interactions.]
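
The PAMS source code is not included in the paper; as a hypothetical sketch of the event-driven Lambda pattern described above, the Python handler below reacts to a DynamoDB stream event and sends an SMS through Twilio (the table attributes, threshold, and environment variables are invented for illustration).

# Hypothetical sketch of the event-driven AWS Lambda pattern described above;
# not the actual PAMS code. Attribute names, threshold, and env vars are invented.
import os
from twilio.rest import Client

twilio_client = Client(os.environ["TWILIO_ACCOUNT_SID"], os.environ["TWILIO_AUTH_TOKEN"])

def handler(event, context):
    """Triggered by a DynamoDB stream when a patient's engagement record changes."""
    for record in event.get("Records", []):
        if record.get("eventName") != "MODIFY":
            continue
        new_image = record["dynamodb"]["NewImage"]
        phone = new_image["phone"]["S"]                           # hypothetical attribute
        engagement = float(new_image["weekly_engagement"]["N"])   # hypothetical attribute
        if engagement < 0.5:                                      # illustrative threshold
            twilio_client.messages.create(
                to=phone,
                from_=os.environ["TWILIO_FROM_NUMBER"],
                body="We noticed you have been less active this week. Small steps count!",
            )
    return {"statusCode": 200}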

To facilitate the generation of the coded solution using ChatGPT, we assigned the role of a software engineer to the model and requested that it generate Scala code for a specific piece of functionality, namely the “calculate engagement trends” function. Consistent with the iterative nature of the GenAI-based software development process, we engaged in a back-and-forth interaction with ChatGPT, iterating over the prompt and its output while providing expert guidance to ensure optimal results. While allowing ChatGPT to generate free text, we evaluated each output for accuracy and adherence to the desired specifications (Figure 6A).
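
The Scala code that ChatGPT eventually produced is not reproduced in the paper; to make the intended behavior concrete, here is a minimal Python sketch of what a three-week engagement-trend calculation might look like (the input shape and the definition of "trend" are assumptions made purely for illustration).

# Minimal illustrative sketch of a "calculate engagement trends" function.
# The study generated this functionality in Scala; the data shape and trend
# definition below are assumptions, not the study's implementation.
from statistics import mean

def calculate_engagement_trend(daily_engagement):
    """Classify the direction of engagement over the most recent three weeks.
    daily_engagement: list of daily scores between 0.0 and 1.0, most recent day last.
    """
    if len(daily_engagement) < 21:
        raise ValueError("Need at least three weeks (21 days) of data.")
    last_three_weeks = daily_engagement[-21:]
    weekly_means = [mean(last_three_weeks[i:i + 7]) for i in range(0, 21, 7)]
    if weekly_means[2] > weekly_means[1] > weekly_means[0]:
        return "increasing"
    if weekly_means[2] < weekly_means[1] < weekly_means[0]:
        return "decreasing"
    return "stable"

print(calculate_engagement_trend([0.2] * 7 + [0.4] * 7 + [0.7] * 7))  # -> "increasing"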

Internal Review of Human vs GenAI Outputs

The 11 evaluators participated in the output review process. All had familiarity or prior exposure to the original PAMS intervention. Overall, evaluators rated the ChatGPT-produced outputs as positive for the theoretical background and design phase in terms of understandability, usability, novelty, relevance, and efficiency. For these two components, the question about completeness showed the most variability, with opinion divided between “agree” and “disagree,” and bias was mostly categorized as “neither agree nor disagree.” For the first part of the technical development (user stories and requirement documents), most of the raters found the ChatGPT output positive in terms of understandability, usability, and relevance. In terms of completeness and novelty, the requirements were rated better than the user stories, which is an interesting result since the requirements are derived from the user stories. We hypothesize that our raters were expecting better user stories but, once these were defined, considered ChatGPT to be effective at turning them into the requirements. In terms of bias, as with the theoretical background and design phase, the most popular answer was “neither agree nor disagree.” For the more technical pieces of the development that required software engineering knowledge, specifically the architectural diagram and code elements, results showed the highest number of N/A responses. These higher levels of N/As were associated with lower levels of expertise (eg, coding experience), since only 2 of the 11 evaluators had computer science backgrounds. However, the overall score excluding the N/As was positive for the technical component.

Results Summary

This study leveraged ChatGPT-4 to recreate the content features and software development of PAMS. ChatGPT served as a usable facilitator for researchers engaging in the software development life cycle, from product conceptualization to feature identification and user story development to code generation. GenAI technologies facilitated effective communication and understanding within our multidisciplinary team by providing well-described features and supporting the role of a software engineer. Our findings indicate that the ChatGPT-generated output is comprehensive, albeit with occasional ambiguities that required clarification or adjustment by the research team. The ChatGPT-generated output exhibited a high level of accuracy in capturing the intended requirements. We found that ChatGPT supported a highly efficient development process, producing in 5 days what initially required more than 200 human hours from content and technical experts. The results suggested that by efficiently prompting ChatGPT and leveraging the expertise of our team, we could have significantly reduced the time invested in the initial system modeling and conceptualization phases as well as the technical phases of software development (coding). Overall, GenAI technologies like ChatGPT offer a promising approach to efficient software development.

While promising, some significant limitations to ChatGPT’s outputs should be noted. In the design phase, while ChatGPT was able to provide general guidance on tool design (eg, app vs web-based vs EHR solution), it was unable to provide evidence to support its rationale for these choices. This lack of reference support has been well documented and has a material impact on researchers looking to build upon an evidence base for their health technology interventions. Similarly, when asked to provide theoretical frameworks to support behavior change, it offered only a partial list, initially excluding the COM-B (capability, opportunity, motivation, behavior) model upon which the original PAMS intervention was based, and needed prompting from our behavior change expert to provide more specific guidance. In the context of code generation, we focused on testing a specific function, namely the Calculating Patient Engagement feature, which is the core functionality of our software. Initially, we tasked ChatGPT with generating a function to compute a 3-week patient engagement trend. However, the initially generated code deviated from the intended objective and instead calculated a weekly engagement score. Through subsequent iterations, we were able to obtain the desired code. However, the initial attempts exhibited nonidiomatic constructs and contained bugs (inefficient loops and incorrect logic). Finally, we observed that ChatGPT overlooked certain suggested features during the design phase, resulting in generated code that occasionally demonstrated unnecessary complexity and disregarded some of the best practices and features of the target programming language. We believe that further iterations would have improved the code quality, encompassing better adherence to coding standards and the inclusion of desired business features, such as handling edge cases and capturing more nuanced engagement trends. Nevertheless, we reached a point of diminishing returns with ChatGPT, where we determined that engaging an experienced developer would have expedited the code generation process and ensured a more robust implementation.

These limitations highlight the ongoing importance of human expertise in the development process, especially in scenarios where theoretical expertise, intricate coding practices, and business-specific requirements are involved. The lack of rationale to support the generated results shows the value of having human experts on the team who can interpret the results. ChatGPT needs to be used as a support tool, not as the source of truth; thus, we always trusted and relied on human experts to validate the ChatGPT-generated results before moving to the next phase. Overall, it is important to have human experts in the system development process to guide the outputs by reprompting the system (supporting decision-making on acceptable outputs) and ensuring their accuracy. Moreover, results are highly dependent on the quality of the prompts, which emphasizes the role of prompt engineering. The results show that well-structured prompts (role + problem description + ask) that infuse human expertise into every iteration are key to obtaining good results (Figure 1). As part of our prompt framework described in the methodology section, results showed that detailed problem explanations, clear asks, and roleplaying are an excellent combination to guide accurate results. We suggest asking ChatGPT questions using different roles, asking for clarification if needed, and, in cases of wrong outputs, redirecting the prompts.

Related Work

There is near-universal interest in understanding the impacts of GenAI and large language models (LLMs) on human social structures, including the experience of work and the production of work-related outputs in health care and more broadly [ 15 , 16 ]. In health care, LLMs are poised to impact everything from the care delivery experience, diagnostic reasoning, and cognitive skills to training, education, and the overall composition of the workforce [ 17 ]. These theoretical disruptions are tempered, however, by acknowledging that, in their current state, GenAI tools remain suboptimal, with ongoing issues in accuracy, reliability, usability, cost, equity, and ethics.

In commercial spaces, ChatGPT-enabled products designed to assist with coding and software development are already being developed (eg, OpenAI Codex [OpenAI] and CodeGPT [CodeGPT]). These tools can help generate novel code, debug and analyze code issues, assist in code refactoring, and provide code documentation. As yet, however, their usefulness in terms of quality has not been extensively evaluated, and costs and other considerations may make them inaccessible to health care researchers. ChatGPT-enabled tools for front-end design (eg, integrating ChatGPT with Figma [Figma, Inc]), user testing (including synthetic user testing), and prototyping have also been created, all allowing health technology research teams with limited design resources to take advantage of tools from product and experience design to create their interventions. Overall, commercial LLMs have been demonstrated to improve worker efficiency and productivity, through “co-pilot” support services that automate low-skills tasks, organize and present information, and surface insights [ 18 ]. Brynjolfsson et al [ 18 ] found that a ChatGPT-supported tool providing conversational guidance for customer support agents increased worker productivity by almost 14%. The authors further found that these productivity benefits accrued disproportionately to less-experienced and lower-skill workers, allowing less-skilled or newer workers to experience more rapid gains; the authors posit that high-skill workers may have less to gain from artificial intelligence assistance due to tacit knowledge reinforcement rather than new knowledge or skill development. Our work suggests that both less-experienced, lower-skill workers and high-skill workers can benefit, with novices benefitting more from new knowledge (if accurate) and skill development and experts benefiting from knowledge validation and offloading of high-effort low-value tasks.

In the academic computer science literature, ChatGPT has been evaluated as a tool for collaborative software design [ 4 ], including to improve code quality, refactoring, requirements elicitation, and general design solutions [ 5 ], and to fix programming bugs [ 19 ]. Similar findings are reflected in our work, including the caveat of requiring human oversight. Other authors have identified important ethical issues in using GenAI solutions for software engineering, which were not considered in this study [ 20 ].

Within health care, a growing body of research has explored the feasibility of GenAI tools (mostly ChatGPT) in a variety of use cases, including answering patient questions [ 3 , 21 ], creating suggestions to optimize clinical decision support [ 22 ], generating history of present illness summaries [ 23 ], and overall examination performance [ 24 ]. In general, these papers find promising signals for the accurate and acceptable use of GenAI tools, but with many current-state caveats for their optimal, safe, and scaled use. Key areas of concern include reliability (particularly around hallucinations and citation fabrication), reproducibility, and recency of data inputs. Research in this area will continue to grow as more test cases comparing GenAI performance to that of clinical staff are undertaken, but further work is needed to create validated and generalizable outcome measures. Future work must also ensure that the variety of GenAI tools (including general commercial LLMs, health care–specific LLMs, and internally developed tools) is evaluated equally.

Limitations

There are several limitations to this study. First, no research team members had expertise in prompt generation for GenAI tools; as a result, our prompting reflects the a priori perspectives, biases, and knowledge gaps of our team and is therefore particularly subject to issues of framing, recall, and confirmation bias that may influence the interpretation of the results. Second, our research team members, who acted as prompt engineers in this study, were highly familiar with the project and participated in the human-based design process; thus, they knew which deviations from the human-based design to address by reprompting the system. As a result, we introduced bias into the prompting process, and the results may reflect inflated accuracy. Third, the absence of robust tools to objectively measure the “quality” of current ChatGPT outputs poses challenges to assessing its performance accurately and objectively. Furthermore, in this case, the output reviewers were not blinded to the human vs ChatGPT outputs, given the complexity of this study and the difficulty in providing enough research context to support independent blind review. Finally, broader limitations of the technology, such as potential hallucinations and concerns about changes in response behavior over time, deserve acknowledgment, as they could have implications for the practical applications and long-term viability of GenAI in health care research contexts. Future research efforts should address these limitations to enhance and replicate our findings.

Implications and Future Directions for Exploration

We are considering several future directions for the use of ChatGPT in our digital health intervention development. We envision increasing our expertise in prompt engineering (eg, by adding expert prompt engineers to the team) to actively use ChatGPT to further develop PAMS features, particularly for additional messaging content. We anticipate this will save our research team considerable time and effort. We may also use ChatGPT to facilitate more time-consuming aspects of our research documentation, including both coding documentation and larger research archival work (eg, meeting minutes and recording intervention decision-making). Overall, we feel ChatGPT and related tools can be effectively leveraged within health care technology research teams with a spectrum of technical expertise, serving to both augment existing skills and supplement skill gaps. For those with expertise in computer science or programming, we imagine ChatGPT can assist by automating high-effort, low-impact tasks or repetitive work that is considered important but often deprioritized as more urgent tasks arise (eg, code documentation). For those without preexisting programming skills, we imagine ChatGPT can offer technical support, including educational tools and skill-building opportunities. In this way, the process can both validate existing knowledge and create new knowledge for teams, as well as potentially improve interteam communication and collaboration.

Conclusions

In this study, we explored the use of the GenAI tool ChatGPT to recreate a novel digital behavior change intervention which our research team had previously developed to support patient engagement and adherence to a commercial dDPP. Specifically, we reviewed and evaluated the capacity and limitations of ChatGPT to support digital health research intervention ideation, design, and software development, finding it a feasible and potentially time- and resource-saving tool to support research teams in developing novel digital health products and technologies. At the same time, we identified gaps in ChatGPT outputs that may limit its effective use for both novice and advanced technology developers, particularly around the completeness of outputs. Future directions will include the development of more targeted artificial intelligence–based tools to support health care researchers with all levels of software or engineering skills, as well as the development of improved tools to objectively evaluate GenAI outputs.

Acknowledgments

The National Institute of Diabetes and Digestive and Kidney Diseases (1R18DK118545-01A1, principal investigator: DM) funded this research.

Conflicts of Interest

None declared.

  • Risling TL, Risling DE. Advancing nursing participation in user-centred design. J Res Nurs. 2020;25(3):226-238. [ CrossRef ] [ Medline ]
  • Dave T, Athaluri SA, Singh S. ChatGPT in medicine: an overview of its applications, advantages, limitations, future prospects, and ethical considerations. Front Artif Intell. 2023;6:1169595. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Liu S, Wright AP, Patterson BL, Wanderer JP, Turer RW, Nelson SD, et al. Using AI-generated suggestions from ChatGPT to optimize clinical decision support. J Am Med Inform Assoc. 2023;30(7):1237-1245. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Ahmad A, Waseem M, Liang P, Fahmideh M, Aktar MS, Mikkonen T. Towards human-bot collaborative software architecting with ChatGPT. 2023 Presented at: EASE '23: Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering; June 14-16, 2023; 279-285; Oulu, Finland. [ CrossRef ]
  • White J, Hays S, Fu Q, Spencer-Smith J, Schmidt DC. ChatGPT prompt patterns for improving code quality, refactoring, requirements elicitation, and software design. ArXiv. Preprint posted online on March 11 2023. [ FREE Full text ] [ CrossRef ]
  • Rodriguez DV, Lawrence K, Luu S, Yu JL, Feldthouse DM, Gonzalez J, et al. Development of a computer-aided text message platform for user engagement with a digital diabetes prevention program: a case study. J Am Med Inform Assoc. 2021;29(1):155-162. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Lawrence K, Rodriguez DV, Feldthouse DM, Shelley D, Yu JL, Belli HM, et al. Effectiveness of an integrated engagement support system to facilitate patient use of digital diabetes prevention programs: protocol for a randomized controlled trial. JMIR Res Protoc. 2021;10(2):e26750. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Liu V, Chilton LB. Design guidelines for prompt engineering text-to-image generative models. 2022 Presented at: CHI '22: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems; April 29-May 5, 2022; 1-23; New Orleans, LA. [ CrossRef ]
  • Zhou Y, Muresanu AI, Han Z, Paster K, Pitis S, Chan H, et al. Large language models are human-level prompt engineers. ArXiv. Preprint posted online on November 03 2022. [ FREE Full text ]
  • White J, Fu Q, Hays S, Sandborn M, Olea C, Gilbert H, et al. A prompt pattern catalog to enhance prompt engineering with chatGPT. ArXiv. Preprint posted online on February 21 2023. [ FREE Full text ] [ CrossRef ]
  • Wang J, Liu Z, Zhao L, Wu Z, Ma C, Yu S, et al. Review of large vision models and visual prompt engineering. Meta-Radiology. 2023;1(3):100047. [ FREE Full text ] [ CrossRef ]
  • Nadareishvili I, Mitra R, McLarty M, Amundsen M. Microservice Architecture: Aligning Principles, Practices, and Culture. Sebastopol, CA. O'Reilly Media, Inc; 2016.
  • Alshuqayran N, Ali N, Evans R. A systematic mapping study in microservice architecture. 2016 Presented at: 2016 IEEE 9th International Conference on Service-Oriented Computing and Applications (SOCA); November 4-6, 2016; 44-51; Macau, China. [ CrossRef ]
  • Mathew S, Varia J. Overview of Amazon Web Services. Amazon Whitepapers. 2014. URL: https://d1.awsstatic.com/whitepapers/aws-overview.pdf [accessed 2024-02-01]
  • Eloundou T, Manning S, Mishkin P, Rock D. GPTs are GPTs: an early look at the labor market impact potential of large language models. ArXiv. Preprint posted online on March 17 2023. [ FREE Full text ] [ CrossRef ]
  • The impact of artificial intelligence on the future of workforces in the European Union and the United States of America. The White House. 2022. URL: https://www.whitehouse.gov/wp-content/uploads/2022/12/TTC-EC-CEA-AI-Report-12052022-1.pdf [accessed 2024-01-18]
  • Thirunavukarasu AJ, Ting DSJ, Elangovan K, Gutierrez L, Tan TF, Ting DSW. Large language models in medicine. Nat Med. 2023;29(8):1930-1940. [ CrossRef ] [ Medline ]
  • Brynjolfsson E, Li D, Raymond LR. Generative AI at work. National Bureau of Economic Research. 2023. URL: https://www.nber.org/papers/w31161 [accessed 2024-01-18]
  • Surameery NMS, Shakor MY. Use Chat GPT to solve programming bugs. Int J Inf Technol Comput Eng. 2023;3(1):17-22. [ FREE Full text ] [ CrossRef ]
  • Akbar MA, Khan AA, Liang P. Ethical aspects of ChatGPT in software engineering research. IEEE Trans Artif Intell. 2023:1-14. [ CrossRef ]
  • Ayers JW, Poliak A, Dredze M, Leas EC, Zhu Z, Kelley JB, et al. Comparing physician and artificial intelligence Chatbot responses to patient questions posted to a public social media forum. JAMA Intern Med. 2023;183(6):589-596. [ CrossRef ] [ Medline ]
  • Biswas S, Logan NS, Davies LN, Sheppard AL, Wolffsohn JS. Assessing the utility of ChatGPT as an artificial intelligence-based large language model for information to answer questions on myopia. Ophthalmic Physiol Opt. 2023;43(6):1562-1570. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Nayak A, Alkaitis MS, Nayak K, Nikolov M, Weinfurt KP, Schulman K. Comparison of history of present illness summaries generated by a Chatbot and senior internal medicine residents. JAMA Intern Med. 2023;183(9):1026-1027. [ CrossRef ] [ Medline ]
  • Strong E, DiGiammarino A, Weng Y, Kumar A, Hosamani P, Hom J, et al. Chatbot vs medical student performance on free-response clinical reasoning examinations. JAMA Intern Med. 2023;183(9):1028-1030. [ CrossRef ] [ Medline ]

Abbreviations

AWS: Amazon Web Services
COM-B: capability, opportunity, motivation, behavior
dDPP: digital diabetes prevention program
EHR: electronic health record
GenAI: generative artificial intelligence
LLM: large language model
PAMS: personalized automatic messaging system
REDCap: Research Electronic Data Capture

Edited by A Kushniruk; submitted 18.09.23; peer-reviewed by S Mulvaney, I Oakley-Girvan; comments to author 20.11.23; revised version received 27.11.23; accepted 15.12.23; published 06.03.24.

©Danissa V Rodriguez, Katharine Lawrence, Javier Gonzalez, Beatrix Brandfield-Harvey, Lynn Xu, Sumaiya Tasneem, Defne L Levine, Devin Mann. Originally published in JMIR Human Factors (https://humanfactors.jmir.org), 06.03.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Human Factors, is properly cited. The complete bibliographic information, a link to the original publication on https://humanfactors.jmir.org, as well as this copyright and license information must be included.
