JOURNAL (INTERNATIONAL) Predicting Smoking Prevalence in Japan Using Search Volumes in an Internet Search Engine: An Infodemiology Study

Kazuya Taira (Kyoto Univ.), Takahiro Itaya (Kyoto Univ.), Sumio Fujita

Journal of Medical Internet Research (JMIR)

December 14, 2022

Methods: This study used an infodemiology approach. The outcome variable was the smoking rate by prefecture, obtained from government statistics. The predictor variables were search volumes on Yahoo! JAPAN Search. Search volumes were collected for tobacco-related terms taken from the thesaurus of the Japanese medical article database “Ichu-shi.” Predictor variables were converted to per capita values and standardized as Z-scores. For smoking rates, values for 2016 and 2019 were used; for search volumes, values for the fiscal year (FY) one year before each survey, i.e., FY 2015 and FY 2018, were used. Partial correlation coefficients between smoking rates and search volumes, adjusted for data year, were calculated, and regression analysis was performed using a generalized linear mixed model with random effects for each prefecture. Several models were tested, including a model with all search queries, a model built with a variable reduction method, and a model that excluded cigarette product names; the best model was selected using the corrected Akaike information criterion (AICC) and the Bayesian information criterion (BIC). Based on the best model, predicted and actual smoking rates for 2016 and 2019 were compared, and predicted smoking rates for 2022 were calculated.

Results: In the analysis of men and women combined, no significant correlation coefficients were found. For men, nine search queries were significantly correlated with smoking rates, including cigarette (シガレット; r=-0.417, p<0.001), cigar (葉巻; r=-0.412, p<0.001), and cigar (シガー; r=-0.399, p<0.001). For women, five search queries were significantly correlated, including vape (r=0.335, p=0.001), no smoking (禁煙; r=0.288, p=0.005), and cigar (シガー; r=0.286, p=0.006). The models with all search queries were the best models by both AICC and BIC. Scatter plots of actual and estimated smoking rates in 2016 and 2019 confirmed a relatively high degree of agreement. The estimated smoking rate for 2022, averaged over the 47 prefectures, was 23.492 ± 0.957 in total, showing an increasing trend, and 29.024 ± 0.921 for men and 8.793 ± 0.643 for women.

Conclusions: This study suggests that the search volumes of tobacco-related queries in an Internet search engine can predict smoking rates by prefecture. The findings will enable the development of low-cost, timely, and crisis-resistant health indicators that support the evaluation of health measures and contribute to improved public health.
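
The preprocessing and model-comparison steps named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the use of the population standard deviation for the Z-scores, and the first-order partial correlation formula (one covariate, the data year) are assumptions made for illustration. Fitting the generalized linear mixed model itself is not shown; the `aicc` and `bic` helpers only show how the two criteria would be computed from a fitted model's log-likelihood, the number of estimated parameters `k`, and the number of observations `n`.

```python
import math
from statistics import mean, pstdev


def per_capita(volumes, populations):
    """Divide each prefecture's search volume by its population."""
    return [v / p for v, p in zip(volumes, populations)]


def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1.

    Assumption: the population standard deviation is used; the abstract
    does not state which variant the authors applied.
    """
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def partial_corr(x, y, z):
    """Correlation between x and y after adjusting for one covariate z.

    Here z would play the role of the data year (e.g. coded 0/1).
    """
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def aicc(log_lik, k, n):
    """Corrected Akaike information criterion (lower is better)."""
    return -2 * log_lik + 2 * k + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_lik, k, n):
    """Bayesian information criterion (lower is better)."""
    return -2 * log_lik + k * math.log(n)
```

In this sketch, each query's search volumes would pass through `per_capita` and then `z_scores` before entering the model, and candidate models would be ranked by the lower `aicc` and `bic` values.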

Paper: Predicting Smoking Prevalence in Japan Using Search Volumes in an Internet Search Engine: An Infodemiology Study