The Imperial War Museums (IWM) partnered with Google Cloud and Capgemini to apply generative AI to its oral history archive, using Google’s Gemini models to transcribe and translate over 20,000 hours of audio in just weeks. The collaboration saved over 20 years of manual work and achieved 99% word accuracy. The resulting solution makes the archive more accessible: users can search interviews, follow synchronized transcriptions, read AI-generated summaries, and ask questions in natural language, which changes how wartime experiences can be studied.
Challenge: A significant portion of IWM’s extensive oral history archive, comprising roughly 8,000 interviews, was previously available only as audio files. These recordings, some dating back to 1945 and stored on outdated formats such as reel-to-reel and cassette tapes, couldn’t be indexed by search engines and were inaccessible to people with hearing impairments, which made them hard to discover and use. Manual transcription of these first-hand accounts was slow and error-prone because of heavy accents, specialized military terminology, and inconsistent audio quality.
Solution: IWM partnered with Capgemini and Google Cloud to build an AI solution on Google’s Gemini models. The system achieves 99% transcription accuracy, even with poor audio quality and diverse accents. Beyond transcription and translation, it extracts metadata such as names and locations and generates detailed summaries of key events. With the help of Gemini’s long context window, users can search across interviews, follow synchronized transcriptions, and ask questions in natural language to receive answers drawn directly from the interviews.
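The case study does not say how the 99% accuracy figure was measured. A common way to score transcripts is word error rate (WER): the word-level edit distance between a trusted reference transcript and the machine output, divided by the number of reference words, with word accuracy taken as 1 minus WER. The sketch below illustrates that general metric only; the function names and the sample sentences are hypothetical and are not taken from the IWM project.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are lowercased and split on whitespace. Punctuation is
    not stripped, so "dawn." and "dawn" count as different words.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard dynamic-programming table,
    # kept one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy as 1 - WER (can be negative for very poor output)."""
    return 1.0 - word_error_rate(reference, hypothesis)


if __name__ == "__main__":
    # Hypothetical sample: one substituted word out of seven.
    human = "we landed on the beach at dawn"
    machine = "we landed on a beach at dawn"
    print(f"WER: {word_error_rate(human, machine):.3f}")
    print(f"Word accuracy: {word_accuracy(human, machine):.3f}")
```

Under this definition, an error rate below 1% means fewer than one word in every hundred reference words is substituted, dropped, or inserted.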
Results: Using generative AI, with help from Google Cloud and Capgemini, IWM transcribed and translated more than 20,000 hours of audio in weeks, saving over 20 years of manual transcription work. The project has made IWM’s oral histories more accessible, searchable, and understandable to researchers, academics, and the wider public than ever before.
→ “We have over 8,000 oral histories. We made those recordings available online, but someone with hearing loss won't be able to listen to that audio, and it's not very discoverable. The challenge really was the transcription, and a lot of people have very heavy accents. We started to talk to Google and Capgemini about the possibility of AI transcriptions. Using Gemini, we're able to transcribe over 8,000 oral histories — that's 45,000 unique recordings, over 20,000 hours of audio.”
→ “Our error rate is less than 1%, which is significantly more accurate than our human transcription, and AI was incredible at very quickly picking up these accents. It's also really good at understanding where there are disfluencies, so AI picks up on uncertainty, and the curator or researcher is then able to directly check that bit. The huge context window was really important, and it provided summaries, captured entities, the data, the people. We think we’ve saved 20 years of person-hours of transcription.”
→ “As we distance ourselves from the First World War, accessing accounts becomes increasingly difficult. It's vital to provide tools that enhance understanding of war and conflict: its causes, consequences, and impacts on people. I'm proud of this work and its importance.”
