Tim Löhr

Master's Thesis

AI Trend Detection in Healthcare by applying Topic Clustering and Sentiment Analysis using Podcast Data

Philipp Dumbach (M.Sc.), Leo Schwinn (M.Sc.), Prof. Dr. Björn Eskofier

01 / 2022 – 07 / 2022

In recent years, podcasts have gained much attention as a medium. Their increasing popularity drives production and therefore the amount of available podcast data. This data can be used to identify past, present, and future trends [1, 2].

Research has already been conducted to validate the usefulness of podcast data for analyzing trends [3]. Prior work by Long Do Ma and colleagues [4] investigated healthcare podcasts as a research medium for AI trend analysis by retrieving text data from audio files. The team collected podcast data from 2010 to 2020. Because podcast production is growing exponentially, new data from 2021 will be collected to extend the existing dataset. To transcribe the latest audio files from speech to text, the open-source technologies Vosk and DeepSpeech will be used and compared. The creators of the most extensive corpus of podcast data [5, 3] suggest that sentiment analysis and topic clustering could improve trend detection. This leads to the assumption that these approaches could also benefit future trend detection of AI in healthcare. Sentiment analysis in particular has been studied thoroughly and has repeatedly been shown to improve the quality of trend prediction [6]. For the detection process, Long Do Ma and colleagues [4] used a keyword list. Topic clustering could extend this list, starting from the list of Long Do Ma et al. and incorporating the additional data from 2021. The extended keyword list may enhance the quality of trend prediction. Prior research on topic clustering for enhanced trend detection has already been conducted successfully by Zou [7].
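To illustrate the keyword-based detection step mentioned above, the following minimal sketch computes, per year, the share of transcript tokens that match a keyword list. It is an assumption-laden illustration, not the method of [4]: the function name `keyword_share_per_year`, the toy transcripts, and the two keywords are hypothetical, and only single-word keywords are matched.

```python
import re
from collections import Counter


def keyword_share_per_year(transcripts, keywords):
    """Return {year: share of tokens in that year that match a keyword}.

    transcripts: iterable of (year, text) pairs (hypothetical input format).
    keywords: iterable of single-word keywords, matched case-insensitively.
    """
    keyword_set = {k.lower() for k in keywords}
    hits = Counter()
    totals = Counter()
    for year, text in transcripts:
        # Simple tokenization into lowercase alphanumeric runs.
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        totals[year] += len(tokens)
        hits[year] += sum(1 for token in tokens if token in keyword_set)
    # Skip years without any tokens to avoid division by zero.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# Hypothetical toy data, not taken from the actual podcast dataset.
transcripts = [
    (2020, "the hospital tested a new scheduling tool"),
    (2021, "ai and robotics support the ai assisted diagnosis"),
]
keywords = ["ai", "robotics"]
shares = keyword_share_per_year(transcripts, keywords)
```

Normalizing by the total number of tokens per year keeps the measure comparable across years even when the amount of podcast data grows. Keywords proposed by topic clustering could simply be appended to `keywords` before recomputing the shares.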

[1] Mohammed Yousef Shaheen. Applications of Artificial Intelligence (AI) in healthcare: A review. 2021.
[2] DonHee Lee and Seong No Yoon. Application of Artificial Intelligence-Based Technologies in the Healthcare Industry: Opportunities and Challenges. International Journal of Environmental Research and Public Health, 18(1), 2021.
[3] Ann Clifton, Sravana Reddy, Yongze Yu, Aasish Pappu, Rezvaneh Rezapour, Hamed Bonab, Maria Eskevich, Gareth Jones, Jussi Karlgren, Ben Carterette, and Rosie Jones. 100,000 Podcasts: A Spoken English Document Corpus. In Donia Scott, Nuria Bel, and Chengqing Zong, editors, Proceedings of the 28th International Conference on Computational Linguistics, pages 5903-5917, Stroudsburg, PA, USA, 2020. International Committee on Computational Linguistics.
[4] Long Do Ma, Philipp Dumbach, Leo Schwinn, and Bjoern Eskofier. AI Trend Analysis using Speech to Text Data from Healthcare Podcasts. 2021.
[5] Juliane Welz, Annamaria Riemer, Inga Döbel, Nora Dakkak, and Anna Sophie von Schwartzenberg. Identifying future trends by podcast mining: an explorative approach for Web-based horizon scanning. foresight, 23(1):1-16, 2021.
[6] Ali Alessa and Miad Faezipour. Tweet Classification Using Sentiment Analysis Features and TF-IDF Weighting for Improved Flu Trend Detection. In Petra Perner, editor, Machine Learning and Data Mining in Pattern Recognition, volume 10934 of Lecture Notes in Computer Science, pages 174-186. Springer International Publishing, Cham, 2018.
[7] Chen Zou. Analyzing Research Trends on Drug Safety using Topic Modeling. Expert Opinion on Drug Safety, 17(6):629-636, 2018.