Extracting Environmental Research Trends using LDA

  • Authors

    • Do Yeon Kim
    • Sung Won Kang
    2018-06-08
    https://doi.org/10.14419/ijet.v7i2.33.18107
  • Environmental Research Trends, Machine Learning, Natural Language Processing, LDA, Topic Model
  • In this study, we used Latent Dirichlet Allocation (LDA) to compare the topics of two distinct types of text data on environmental issues: environmental research reports and online environmental news. For the research reports, we used digitized reports from the Korea Environment Institute. For the news, we crawled environmental news articles from the Naver portal service. After extracting the topics, we compared the annual share of each topic in each medium. The LDA analysis yielded ten topics for each medium. Six topics are common to both media. Four recent issues, namely “gene variation,” “noise,” “health,” and “data,” appeared only in the environmental news. In addition, among the six common topics, the shares of the “water pollution” and “waste” topics in the environmental news appear to lead the shares of the same two topics in the research reports. Together, these results suggest that research topics tend to lag behind the issues attracting the most recent interest. This study raises the possibility of using the LDA model to analyze research trends and identify new research topics.

     

  • How to Cite

    Kim, D. Y., & Kang, S. W. (2018). Extracting Environmental Research Trends using LDA. International Journal of Engineering & Technology, 7(2.33), 1222-1228. https://doi.org/10.14419/ijet.v7i2.33.18107
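The abstract compares the annual share of each topic across the two media and reports that news shares of “water pollution” and “waste” lead the report shares, but it does not spell out how either quantity was computed. The sketch below is therefore a minimal illustration under stated assumptions, not the authors' procedure. It assumes that a topic's annual share is the mean document-topic proportion (θ) over documents published in that year. It also assumes that “lead” means the non-negative lag, in years, that maximizes the Pearson correlation between the news series and the report series. The θ rows are assumed to come from an already fitted LDA model. The function names `annual_topic_share`, `pearson`, and `best_lead`, along with all the numbers, are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def annual_topic_share(doc_topics, doc_years):
    """Mean document-topic proportion per year (an assumed definition of "share").

    doc_topics: per-document topic distributions (theta rows from a fitted LDA model).
    doc_years:  publication year of each document, in the same order.
    Returns {year: [share of topic 0, share of topic 1, ...]}, keyed in year order.
    """
    by_year = defaultdict(list)
    for theta, year in zip(doc_topics, doc_years):
        by_year[year].append(theta)
    shares = {}
    for year in sorted(by_year):
        rows = by_year[year]
        n_topics = len(rows[0])
        shares[year] = [mean(row[k] for row in rows) for k in range(n_topics)]
    return shares


def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def best_lead(leader, follower, max_lag):
    """Return (lag, r) maximizing corr(leader[t], follower[t + lag]) for 0 <= lag <= max_lag.

    Both series are assumed to cover the same consecutive years.
    Lags that would leave fewer than 3 overlapping years are skipped.
    """
    best = None
    for lag in range(max_lag + 1):
        xs = leader[: len(leader) - lag]  # earlier years of the leading series
        ys = follower[lag:]               # same years shifted forward by `lag`
        if len(xs) < 3:
            break
        r = pearson(xs, ys)
        if best is None or r > best[1]:
            best = (lag, r)
    return best


# Hypothetical toy data: two topics, three news documents over two years.
news_theta = [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]]
news_years = [2015, 2015, 2016]
print(annual_topic_share(news_theta, news_years))

# Hypothetical yearly shares of one common topic, constructed so reports trail news by one year.
news_share = [0.10, 0.20, 0.15, 0.30, 0.25, 0.40]
report_share = [0.05, 0.10, 0.20, 0.15, 0.30, 0.25]
lag, r = best_lead(news_share, report_share, max_lag=2)
print(f"news leads reports by {lag} year(s), r = {r:.2f}")
```

With this toy data, `best_lead` returns a lag of 1 and r close to 1.0, because the report series was built as the news series shifted by one year. Averaging θ gives every document equal weight. Summing per-topic token counts instead would weight long reports more heavily, so the two definitions can disagree. A positive best lag only shows that one series tends to precede the other in the sample. It does not establish that news coverage drives research topics.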