Doing Thematic Analysis: A Step-by-Step Guide
Some of the phases of thematic analysis are similar to the phases of other qualitative research, so these stages are not necessarily all unique to thematic analysis. The process starts when the analyst begins to notice, and look for, patterns of meaning and issues of potential interest in the data – this may be during data collection. The endpoint is the reporting of the content and meaning of patterns (themes) in the data, where “themes are abstract (and often fuzzy) constructs the investigators identify [sic] before, during, and after analysis” (Ryan & Bernard, 2000: 780). Analysis involves a constant moving back and forward between the entire data set, the coded extracts of data that you are analysing, and the analysis of the data that you are producing. Writing is an integral part of analysis, not something that takes place at the end, as it does with statistical analyses. Therefore, writing should begin in phase one, with the jotting down of ideas and potential coding schemes, and continue right through the entire coding/analysis process. There are different positions regarding when you should engage with the literature relevant to your analysis – with some arguing that early reading can narrow your analytic field of vision, leading you to focus on some aspects of the data at the expense of other potentially crucial aspects. Others argue that engagement with the literature can enhance your analysis by sensitising you to more subtle features of the data (Tuckett, 2005). Therefore, there is no one right way to proceed with reading for thematic analysis, although a more inductive approach would be enhanced by not engaging with literature in the early stages of analysis, whereas a theoretical approach requires engagement with the literature prior to analysis.
We provide an outline to guide you through the six phases of analysis, and offer examples to demonstrate the process. The different phases are usefully summarised in Table 1. It is important to recognise that qualitative analysis guidelines are exactly that – they are not rules, and, following the basic precepts, will need to be applied flexibly to fit the research questions and data (Patton, 1990). Moreover, analysis is not a linear process where you simply move from one phase to the next. Instead, it is a more recursive process, where you move back and forth as needed, throughout the phases. It is also a process that develops over time (Ely et al., 1997), and should not be rushed.
Familiarising Yourself with Your Data
When you engage in analysis, you may have collected the data yourself, or it may have been given to you. If you collected it through interactive means, you will come to the analysis with some prior knowledge of the data, and possibly some initial analytic interests or thoughts. Regardless, it is vital that you immerse yourself in the data to the extent that you are familiar with the depth and breadth of the content. Immersion usually involves ‘repeated reading’ of the data, and reading the data in an active way – searching for meanings, patterns and so on. It is ideal to read through the entire data set at least once before you begin your coding, as your ideas and your identification of possible patterns will be shaped as you read through.
Whether or not you are aiming for an overall or detailed analysis, are searching for latent or semantic themes, or are data- or theoretically-driven will inform how the reading proceeds. Regardless, it is important to be familiar with all aspects of your data. At this phase, one of the reasons why qualitative research tends to use far smaller samples than, for example, questionnaire data will become apparent – the reading and re-reading of data is time-consuming. It is, therefore, tempting to skip over this phase or be selective. We would strongly advise against this, as this phase provides the bedrock for the rest of the analysis.
During this phase, it is a good idea to start taking notes or marking ideas for coding that you will then go back to in subsequent phases. Once you have done this, you are ready to begin the more formal coding process. In essence, coding continues to be developed and defined throughout the entire analysis.
Transcription of Verbal Data
If you are working with verbal data such as interviews, television programmes or political speeches, the data will need to be transcribed into written form in order to conduct a thematic analysis. The process of transcription, while it may seem time-consuming, frustrating, and at times boring, can be an excellent way to start familiarising yourself with the data (Riessman, 1993). Further, some researchers even argue it should be seen as “a key phase of data analysis within interpretative qualitative methodology” (Bird, 2005: 227), and recognised as an interpretative act, where meanings are created, rather than simply a mechanical one of putting spoken sounds on paper (Lapadat & Lindsay, 1999).
Various conventions exist for transforming spoken texts into written texts (see Edwards & Lampert, 1993; Lapadat & Lindsay, 1999). Some systems of transcription have been developed for specific forms of analysis – such as the ‘Jefferson’ system for CA (see Atkinson & Heritage, 1984; Hutchby & Wooffitt, 1998). However, thematic analysis, even constructionist thematic analysis, does not require the same level of detail in the transcript as conversation, discourse or even narrative analysis. As there is no one way to conduct thematic analysis, there is no one set of guidelines to follow when producing a transcript. However, at a minimum it requires a rigorous and thorough ‘orthographic’ transcript – a ‘verbatim’ account of all verbal (and sometimes nonverbal [e.g., coughs]) utterances. What is important is that the transcript retains the information you need, from the verbal account, and in a way which is ‘true’ to its original nature (e.g., punctuation added can alter the meaning of data – for example ‘I hate it, you know. I do’ versus ‘I hate it. You know I do’, Poland, 2002: 632), and that the transcription convention is practically suited to the purpose of analysis (Edwards, 1993).
As we have noted, the time spent in transcription is not wasted, as it informs the early stages of analysis, and you will develop a far more thorough understanding of your data through having transcribed it. Furthermore, the close attention needed to transcribe data may facilitate the close-reading and interpretative skills needed to analyse the data (Lapadat & Lindsay, 1999). If your data have already been, or will be, transcribed for you, it is important that you spend more time familiarising yourself with the data, and also check the transcripts back against the original audio recordings for ‘accuracy’ (as should always be done).
Generating Initial Codes
Phase 2 begins when you have read and familiarised yourself with the data, and have generated an initial list of ideas about what is in the data and what is interesting about them. This phase then involves the production of initial codes from the data. Codes identify a feature of the data (semantic content or latent) that appears interesting to the analyst, and refer to “the most basic segment, or element, of the raw data or information that can be assessed in a meaningful way regarding the phenomenon” (Boyatzis, 1998: 63). See Figure 1 for an example of codes applied to a short segment of data. The process of coding is part of analysis (Miles & Huberman, 1994), as you are organising your data into meaningful groups (Tuckett, 2005). However, your coded data differs from the units of analysis (your themes) which are (often) broader. Your themes, which you start to develop in the next phase, are where the interpretative analysis of the data occurs, and in relation to which arguments about the phenomenon being examined are made (Boyatzis, 1998).
Coding will to some extent depend on whether the themes are more ‘data-driven’ or ‘theory-driven’: in the former, the themes will depend on the data, but in the latter, you might approach the data with specific questions in mind that you wish to code around. It will also depend on whether you are aiming to code the content of the entire data set, or whether you are coding to identify particular (and possibly limited) features of the data set. Coding can be done either manually or through a software programme (see, e.g., Kelle, 2004; Seale, 2000, for discussion of software programmes).
Work systematically through the entire data set, giving full and equal attention to each data item, and identify interesting aspects in the data items that may form the basis of repeated patterns (themes) across the data set. There are a number of ways of actually coding extracts. If coding manually, you can code your data by writing notes on the texts you’re analysing, by using highlighters or coloured pens to indicate potential patterns, or by using ‘post-it’ notes to identify segments of data. You may initially identify the codes, and then match them up with data extracts that demonstrate that code, but it is important in this phase to ensure that all actual data extracts are coded, and then collated together within each code. This may involve copying extracts of data from individual transcripts or photocopying extracts of printed data, and collating each code together in separate computer files or using file cards. If using computer software, you code by tagging and naming selections of text within each data item.
Key advice for this phase is: a) code for as many potential themes/patterns as possible (time permitting) – you never know what might be interesting later; b) code extracts of data inclusively – i.e., keep a little of the surrounding data if relevant, as a common criticism of coding is that the context is lost (Bryman, 2001); and c) remember that you can code individual extracts of data in as many different ‘themes’ as they fit into – so an extract may be uncoded, coded once, or coded many times, as relevant. Note that no data set is without contradiction, and a satisfactory ‘thematic map’ that you will eventually produce – an overall conceptualisation of the data patterns, and relationships between them – does not have to smooth out or ignore the tensions and inconsistencies within and across data items. It is important to retain accounts which depart from the dominant story in the analysis, so do not ignore these in your coding.
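If you code with a computer rather than on paper, the advice above can be pictured as a very simple data structure. The following is only a minimal sketch under stated assumptions, not a description of any particular software package: the Extract class, the collate_by_code function, and all of the example data and code names are hypothetical and invented for illustration. It shows extracts that keep some surrounding context, extracts that carry no code, one code, or several codes, and the collation of extracts within each code.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Extract:
    """A segment of data, stored with a little surrounding context."""
    item_id: str   # the data item (e.g. a transcript) the extract came from
    text: str      # the extract itself, including nearby context
    codes: set = field(default_factory=set)  # zero, one, or many codes


def collate_by_code(extracts):
    """Group extracts under every code that has been applied to them."""
    collated = defaultdict(list)
    for extract in extracts:
        for code in extract.codes:
            collated[code].append(extract)
    return dict(collated)


# Invented example data and code names, for illustration only.
extracts = [
    Extract("interview_01", "I never really talked about it at school.",
            {"silence", "education"}),
    Extract("interview_02", "My mum explained everything to me.",
            {"family talk"}),
    Extract("interview_02", "We just did not discuss it at home.",
            {"silence"}),
    Extract("interview_03", "Um, let me think for a second.", set()),
]

collated = collate_by_code(extracts)
# Extracts that carry no code are kept, so they can be revisited later.
uncoded = [e for e in extracts if not e.codes]
```

Because an extract holds a set of codes, the same passage can sit under several codes at once, and an uncoded passage is retained rather than discarded. Deciding which codes apply remains the analyst's interpretative work; the structure only stores those decisions.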
Searching for Themes
Phase 3 begins when all data have been initially coded and collated, and you have a long list of the different codes you have identified across your data set. This phase, which re-focuses the analysis at the broader level of themes, rather than codes, involves sorting the different codes into potential themes, and collating all the relevant coded data extracts within the identified themes. Essentially, you are starting to analyse your codes, and consider how different codes may combine to form an overarching theme. It may be helpful at this phase to use visual representations to help you sort the different codes into themes. You might use tables, mind-maps, or you might write the name of each code (and a brief description) on a separate piece of paper and play around with organising them into theme-piles. A thematic map of this early stage can be seen in Figure 2 (the examples in Figures 2 to 4 come from the analysis presented in Braun and Wilkinson (2003) of women’s talk about the vagina). This is when you start thinking about the relationship between codes, between themes, and between different levels of themes (e.g., main overarching themes and sub-themes within them). Some initial codes may go on to form main themes, whereas others may form sub-themes, and others still may be discarded. At this stage you may also have a set of codes that do not seem to belong anywhere, and it is perfectly acceptable to create a ‘theme’ called miscellaneous to house the codes – possibly temporarily – that do not seem to fit into your main themes.
You end this phase with a collection of candidate themes, and sub-themes, and all extracts of data that have been coded in relation to them. At this point, you will start to have a sense of the significance of individual themes. However, do not abandon anything at this stage, as without looking at all the extracts in detail (the next phase) it is uncertain whether the themes hold as they are, or whether some need to be combined, refined and separated, or discarded.
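The sorting of codes into theme-piles described in this phase can also be sketched in code. This is a hedged illustration only: the function sort_codes_into_themes, the theme names, the codes, and the extracts are all hypothetical, and the grouping of codes into themes is supplied by the analyst rather than computed. The sketch collates extracts under candidate themes and places any unassigned codes in a 'miscellaneous' theme so that nothing is abandoned at this stage.

```python
MISCELLANEOUS = "miscellaneous"


def sort_codes_into_themes(code_to_extracts, theme_to_codes):
    """Collate coded extracts under candidate themes.

    Codes that the analyst has not assigned to any theme are housed in a
    'miscellaneous' theme, possibly temporarily, rather than dropped.
    """
    assigned = {code for codes in theme_to_codes.values() for code in codes}
    themes = {
        theme: {code: code_to_extracts.get(code, []) for code in codes}
        for theme, codes in theme_to_codes.items()
    }
    leftovers = {
        code: extracts
        for code, extracts in code_to_extracts.items()
        if code not in assigned
    }
    if leftovers:
        themes[MISCELLANEOUS] = leftovers
    return themes


# Invented codes, extracts, and theme names, for illustration only.
code_to_extracts = {
    "silence": ["We just did not discuss it at home."],
    "embarrassment": ["I went bright red when it came up."],
    "family talk": ["My mum explained everything to me."],
    "weather": ["It was raining that day."],
}
# The analyst's own (provisional) grouping of codes into candidate themes.
theme_to_codes = {
    "not talking": ["silence", "embarrassment"],
    "talking": ["family talk"],
}

candidate_themes = sort_codes_into_themes(code_to_extracts, theme_to_codes)
```

Keeping the analyst's grouping in its own mapping makes it easy to regroup codes and re-run the collation as your sense of the candidate themes changes.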
Reviewing Themes
Phase 4 begins when you have devised a set of candidate themes, and it involves the refinement of those themes. During this phase, it will become evident that some candidate themes are not really themes (e.g., if there are not enough data to support them, or the data are too diverse), while others might collapse into each other (e.g., two apparently separate themes might form one theme). Other themes might need to be broken down into separate themes. Patton’s (1990) dual criteria for judging categories – internal homogeneity and external heterogeneity – are worth considering here. Data within themes should cohere together meaningfully, while there should be clear and identifiable distinctions between themes.
This phase involves two levels of reviewing and refining your themes. Level one involves reviewing at the level of the coded data extracts. This means you need to read all the collated extracts for each theme, and consider whether they appear to form a coherent pattern. If your candidate themes appear to form a coherent pattern, you then move on to the second level of this phase. If your candidate themes do not fit, you will need to consider whether the theme itself is problematic, or whether some of the data extracts within it simply do not fit there – in which case, you would rework your theme, creating a new theme, finding a home for those extracts that do not currently work in an already-existing theme, or discarding them from the analysis. Once you are satisfied that your candidate themes adequately capture the contours of the coded data – once you have a candidate ‘thematic map’ – you are ready to move on to level two of this phase. The outcome of this refinement process can be seen in the thematic map presented in Figure 3.
Level two involves a similar process, but in relation to the entire data set. At this level, you consider the validity of individual themes in relation to the data set, but also whether your candidate thematic map ‘accurately’ reflects the meanings evident in the data set as a whole. To some extent, what counts as ‘accurate representation’ depends on your theoretical and analytic approach. However, in this phase you re-read your entire data set for two purposes. The first is, as discussed, to ascertain whether the themes ‘work’ in relation to the data set. The second is to code any additional data within themes that has been missed in earlier coding stages. The need for re-coding from the data set is to be expected as coding is an ongoing organic process.
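A small script can help you decide where to look during this review, although it cannot judge whether a theme coheres or ‘works’. The sketch below is an assumption-laden aid, not part of the method itself: the function review_prompts, the example themes, and in particular the min_extracts threshold are hypothetical and arbitrary. It lists candidate themes supported by few extracts and data items that contribute no extracts to any theme, both of which may be worth re-reading.

```python
def review_prompts(themes, all_item_ids, min_extracts=2):
    """Return prompts for re-reading; these are not analytic decisions.

    themes: mapping of theme name -> list of (item_id, extract_text) pairs
    all_item_ids: identifiers of every data item in the data set
    min_extracts: arbitrary, hypothetical threshold for a 'thin' theme
    """
    thin_themes = sorted(
        name for name, extracts in themes.items()
        if len(extracts) < min_extracts
    )
    covered = {item_id for extracts in themes.values() for item_id, _ in extracts}
    items_without_extracts = sorted(set(all_item_ids) - covered)
    return {
        "thin_themes": thin_themes,
        "items_without_extracts": items_without_extracts,
    }


# Invented themes and extracts, for illustration only.
themes = {
    "not talking": [
        ("interview_01", "I never really talked about it at school."),
        ("interview_02", "We just did not discuss it at home."),
    ],
    "talking": [
        ("interview_02", "My mum explained everything to me."),
    ],
}
all_item_ids = ["interview_01", "interview_02", "interview_03"]

prompts = review_prompts(themes, all_item_ids)
```

A theme flagged as thin may still be important, and a well-populated theme may still be incoherent, so the output is a reading list rather than a verdict.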
If the thematic map works, then you move on to the next phase. However, if the map does not fit the data set, you need to return to further reviewing and refining your coding until you have devised a thematic map that you are satisfied with. In so doing, it is possible that you will identify potential new themes, and you might need to start coding for these as well, if you are interested in them. However, a word of warning: as coding data and generating themes could go on ad infinitum, it is important not to get over-enthusiastic with endless re-coding. It is impossible to provide clear guidelines on when to stop, but when your refinements are not adding anything substantial, stop! If the process of recoding is only fine-tuning and making more nuanced a coding frame that already works – i.e., it fits the data well – recognise this and stop. Consider it like editing written work – you could endlessly edit your sentences and paragraphs, but after a few editing turns, any further work is usually unnecessary refinement – like rearranging the hundreds and thousands on an already nicely decorated cake.
At the end of this phase, you should have a fairly good idea of what your different themes are, how they fit together, and the overall story they tell about the data.
Defining and Naming Themes
Phase 5 begins when you have a satisfactory thematic map of your data – see Figure 4 for the final refinements of Virginia’s thematic map. At this point, you then define and further refine the themes that you will present for your analysis, and analyse the data within them. By ‘define and refine’ we mean identifying the ‘essence’ of what each theme is about (as well as the themes overall), and determining what aspect of the data each theme captures. It is important not to try and get a theme to do too much, or to be too diverse and complex. You do this by going back to collated data extracts for each theme, and organising them into a coherent and internally consistent account, with an accompanying narrative. It is vital that you do not just paraphrase the content of the data extracts presented, but identify what is interesting about them and why!
For each individual theme, you need to conduct and write a detailed analysis. As well as identifying the ‘story’ that each theme tells, it is important to consider how it fits into the broader overall ‘story’ that you are telling about your data, in relation to your research question or questions, to ensure there is not too much overlap between themes. So you need to consider the themes themselves, and each theme in relation to the others.
As part of the refinement, you will identify whether or not a theme contains any sub-themes. Sub-themes are essentially themes-within-a-theme. They can be useful for giving structure to a particularly large and complex theme, and also for demonstrating the hierarchy of meaning within the data. For instance, in one of Virginia’s analyses of women’s talk about the vagina, she identified two overarching themes in women’s talk: the vagina as liability, and the vagina as asset (Braun & Wilkinson, 2003). Within each theme, three sub-themes were identified: for liability the sub-themes were ‘nastiness and dirtiness’, ‘anxieties’ and ‘vulnerability’; for asset the sub-themes were ‘satisfaction’, ‘power’ and ‘pleasure’. However, these eventual final themes and sub-themes resulted from a process of refinement of initial themes and sub-themes, as shown in Figures 2 to 4.
It is important that by the end of this phase you can clearly define what your themes are, and what they are not. One test for this is to see whether you can describe the scope and content of each theme in a couple of sentences. If you cannot do this, further refinement of that theme may be needed. Although you will have already given your themes working titles, this is also the point to start thinking about the names that you will give them in the final analysis. Names need to be concise, punchy, and immediately give the reader a sense of what the theme is about.
Producing the Report
Phase 6 begins when you have a set of fully worked-out themes, and involves the final analysis and write-up of the report. The task of the write-up of a thematic analysis, whether it is for publication or for a research assignment or dissertation, is to tell the complicated story of your data in a way that convinces the reader of the merit and validity of your analysis. It is important that the analysis (the write-up of it, including data extracts) provides a concise, coherent, logical, non-repetitive, and interesting account of the story the data tell – within and across themes. Your write-up must provide sufficient evidence of the themes within the data – i.e., enough data extracts to demonstrate the prevalence of the theme. Choose particularly vivid examples or extracts that capture the essence of the point you are demonstrating, without unnecessary complexity. The extract should be easily identifiable as an example of the issue. However, your write-up needs to do more than just provide data. Extracts need to be embedded within an analytic narrative that compellingly illustrates the story that you are telling about your data, and your analytic narrative needs to go beyond the description of the data and make an argument in relation to your research question.