Experiment AI - ChatGPT runs a focus group
Can AI do qualitative research?
Introduction
Much has been written about the potential of AI in market research, but this tends to be from the perspective of quantitative research and focuses on making surveys quicker, more automated and of higher quality.
But what about qualitative research? Do the conversational capabilities of large language models, such as ChatGPT, mean that AI could bring about a transformational shift in qualitative research?
To gain some insight into the possibility, we conducted a small experiment to see whether ChatGPT could simulate a focus group environment complete with interactive conversation between participants and moderator. This article examines the process we went through and identifies some of the strengths and weaknesses of the approach.
The group discussion consisted of a short conversation between six AI-generated respondents on the subject of the cost of living crisis.
Focus groups normally last around 60-90 minutes, but as the AI-generated discussion was more a proof of concept (rather than an exercise in depth and breadth of content), it lasted less than 10 minutes. Even so, there are many lessons to be learnt from the analysis.
The video in the link above contains a clip of the discussion to help place this analysis in context. The video uses AI-generated avatars to reproduce ChatGPT's output. AI video is still in its infancy, and the range of avatars and voices is limited at present. There are also short gaps between each participant's contribution, which is again down to the technology used.
Our overall conclusion is that while the application of AI offers substantial efficiencies and innovative capabilities in generating insights, it simultaneously introduces challenges that cannot be overlooked.
Persona generation: achievements and limitations
The first task was to generate typical respondents. ChatGPT (version 4) was prompted to generate a range of personas reflecting a diversity of personality types in relation to demographics, lifestage, attitudes and experiences. This was an iterative process, using prompts to ensure that the AI 'respondents' reflected society today.
The first prompt asked for a set of basic personas to be generated, which produced archetypes such as 'The Dominator' and 'The Quiet Observer', reflecting different communication styles and viewpoints. A second and third layer of prompts asked ChatGPT to describe a typical person who illustrates each persona and to ensure a spread of participants by age, gender and ethnicity.
This simulated diversity was designed to ensure that the conversation encompassed a variety of perspectives that might be achieved from participants drawn from the general population.
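To make the mechanics concrete, the sketch below shows how such layered persona prompts might be issued programmatically through the OpenAI chat API. The prompt wording, model name and helper function are illustrative assumptions, not the exact prompts used in the experiment.

```python
# Illustrative sketch only: prompt wording and model name are assumptions,
# not the exact prompts used in the experiment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
history = [{"role": "system",
            "content": "You are helping design respondents for a simulated focus group."}]

def ask(prompt):
    """Send a prompt and keep the reply in the running conversation history."""
    history.append({"role": "user", "content": prompt})
    reply = client.chat.completions.create(model="gpt-4", messages=history)
    text = reply.choices[0].message.content
    history.append({"role": "assistant", "content": text})
    return text

# Layer 1: basic persona archetypes (e.g. 'The Dominator', 'The Quiet Observer').
ask("Generate six focus-group personas with distinct communication styles and viewpoints.")

# Layer 2: flesh each archetype out into a typical person.
ask("For each persona, describe a typical person who illustrates it: "
    "name, age, lifestage, attitudes and experiences.")

# Layer 3: enforce a realistic spread across the group.
print(ask("Revise the six people so that together they reflect a spread "
          "of age, gender and ethnicity in society today."))
```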
The generated personas are listed below. All the information in the panel, including names and ages, was generated by ChatGPT.

For many focus groups a more targeted audience would be required. This could be people who buy a named product, shop a particular category or use a service. No such layer was applied in this experiment due to concerns that ChatGPT's training base would have more limited data available and would therefore be more likely to generate non-factual statements.
The AI simulated a reasonable variety of personas reflecting different backgrounds, albeit with hints of stereotyping. Although it was prompted to build personas that reflect society, there is a real risk that those generated are an oversimplification or not based on fact. ChatGPT itself warns that some of its 'facts' might not actually be true. In addition, the generated personas may not fully capture subtle cultural and emotional nuances, a gap that only real-world interactions and experiences can authentically fill.
Prompting the discussion: precision versus spontaneity
During the discussion phase, prompts were used to ensure that ChatGPT elicited detailed responses from the participants and maintained a natural conversational flow. This was an iterative process, with prompts added so that the AI could learn how and when the moderator should come into the discussion and how to achieve an organic conversation.
The first prompt was for the moderator to ask an open question on how each person was being affected by the cost of living crisis. This resulted in each participant speaking one after another rather than interacting as in a natural conversation.
ChatGPT was then prompted to rerun the conversation with more exchanges between participants and some interjections from the moderator to bring other participants into the conversation or to stop others dominating it. A third layer of prompts was used to get the moderator to pick up on interesting themes and ask the participant to expand on them.
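As a rough illustration of this layering, the sketch below chains the three prompt stages through the OpenAI chat API. The prompt wording is paraphrased from the description above, and the model name and code structure are assumptions rather than the exact set-up used.

```python
# Illustrative sketch: the three prompt layers described above, paraphrased.
from openai import OpenAI

client = OpenAI()
history = [{"role": "system",
            "content": "Simulate a focus group between the six personas defined "
                       "earlier, written as a transcript with a moderator."}]

prompts = [
    # Prompt 1: an open question (this produced turn-taking monologues).
    "The moderator asks each participant, openly, how the cost of living "
    "crisis is affecting them.",
    # Prompt 2: rerun with genuine interaction and moderator interjections.
    "Rerun the conversation with more exchanges between participants; the "
    "moderator should bring quieter participants in and stop anyone dominating.",
    # Prompt 3: have the moderator probe interesting themes.
    "Where a participant raises an interesting theme, the moderator should "
    "ask them to expand on it.",
]

for prompt in prompts:
    history.append({"role": "user", "content": prompt})
    reply = client.chat.completions.create(model="gpt-4", messages=history)
    history.append({"role": "assistant",
                    "content": reply.choices[0].message.content})

print(history[-1]["content"])  # the final, most conversational transcript
```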
The prompts and the topic question were the only human intervention in the discussion group; all of the content was generated by ChatGPT. No prompts instructed a participant to say a certain thing or to raise certain issues.
One of the dangers in the use of prompts is the potential for bias. It is very easy to steer the discussion in a particular direction by prompting the model to discuss certain themes or to have the group take a particular stance on issues. These types of prompts were avoided in this experiment, but it is recognised that prompt-related bias is a big concern.
Also noticeable was a trade-off between the AI's precision in maintaining relevance to the topic and the spontaneity inherent in human-moderated discussions. ChatGPT could not pick up on emotional undercurrents or pivot the conversation based on nuanced non-verbal cues, potentially missing deeper or tangential insights that a human moderator might have pursued instinctively.
One of AI's primary drawbacks is its reliance on existing data, potentially stifling the spontaneous or unforeseen, yet valuable, tangents that human moderators might intuitively pursue. There is also a danger that the generated conversation reinforces structural bias in the training dataset, particularly relating to gender, race and a 'Western' narrative.
Themes discussed, content control and authenticity
In simulating interaction, ChatGPT was directed to manage conversational flow, ensuring that each participant contributed to the discussion and thereby controlling the dominance of more assertive characters. This method was largely effective in creating a balanced discussion environment.
The AI produced a range of themes and issues discussed among the participants. It covered the precariousness of financial positions and gave examples relevant to each respondent's lifestage. For example, one respondent who rents talked about her struggle to afford skyrocketing rents, while another spoke about difficulties in feeding the family. It also generated examples of how people had changed their behaviour in relation to shopping and cooking, as well as spending on non-essential items. There was also an extended discussion around getting the balance right between needs and wants, as well as togetherness and the sharing of resources.
The discussion also touched on the impact on mental health and wellbeing, and the coping strategies adopted to mitigate these impacts.
Although the discussion was broad, the balance tipped at times towards an over-controlled interaction, lacking the organic and emotional unpredictability of human conversation. The AI could not fully replicate the empathy and psychological safety provided by a human facilitator, subtly impacting the depth and authenticity of the discussion.
The predictability of AI responses might limit the organic unpredictability and emotional depth that human discussions typically present. Moreover, the absence of a genuine emotional stake in the discussion could result in a lack of authentic empathy, further limiting its depth.
Analysing the data
After the focus group, ChatGPT was used to analyse the discussion, drawing on its ability to process and identify themes within large volumes of text. The AI provided a short report drawing together the themes and selecting quotes to illustrate points made. This part of the experiment worked reasonably well, and it's easy to see how ChatGPT can be used to quickly synthesise themes and issues arising from focus groups.
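As a minimal sketch of this step, the transcript can simply be fed back to the model together with an analytical brief. The file name and prompt wording below are hypothetical, not the brief used in the experiment.

```python
# Illustrative sketch: thematic analysis of the saved transcript.
# The file name and prompt wording are hypothetical.
from openai import OpenAI

client = OpenAI()
with open("focus_group_transcript.txt") as f:  # hypothetical transcript file
    transcript = f.read()

report = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "You are a qualitative researcher analysing a focus group."},
        {"role": "user",
         "content": "Identify the main themes in the transcript below and write "
                    "a short report, quoting participants verbatim to illustrate "
                    "each theme.\n\n" + transcript},
    ],
)
print(report.choices[0].message.content)
```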
Nevertheless, the model's analysis is only as good as the data it has been trained on, and without experiential learning it might misinterpret sarcasm, humour or cultural references, leading to inaccuracies in thematic interpretation. This limitation underscores the importance of human oversight in validating ChatGPT's analysis.
Conclusion
The use of ChatGPT in focus group methodologies could mark a notable advance in qualitative research, offering efficiencies in participant creation, discussion moderation and data analysis. However, these advantages come at a price. The lack of authentic human experience and emotional intuition, the potential for oversimplification and the inability to capture the full spectrum of human spontaneity and nuance present challenges that researchers must navigate. Prompt-related bias is also problematic.
The AI was able to generate content that could reasonably be considered in keeping with the debate around the cost of living crisis. It helps that there is already a lot of information online, so ChatGPT's training data covered many points that could have been raised in a real group discussion. This in itself points to a more problematic limitation for AI focus groups: the AI draws heavily on information that already exists. For discussions around new product development or innovation, this is not always the case. The use case for AI-generated discussion groups, therefore, may be limited to the synthesis of data on general topics.
It was an interesting experiment, parts of which were quite impressive, particularly the range of issues generated. That said, the technology is not in a position to replace human-led qualitative research. Yet.
If you're interested in reading an analysis of the findings of the focus group then click here.