Document Analysis for Evidence Based Medicine. A proposal with Big Data Analytics

Gustavo Verduzco Reyes, Ernesto Bautista Thompson, Jorge A. Ruiz Vanoye, Alejandro Fuentes Penna


Big data analytics is a technology that covers the storage, management, and analysis of large volumes of data; its application in the health field is recent. This paper presents a proposal for document analysis in three phases: diagnosis, analysis, and evaluation. The proposal is part of a thesis in progress, which has identified Cochrane, ACP Journal, and PubMed as the data sources to analyze; support vector machines, Naive Bayes, and k-means clustering as the analysis techniques; and the confusion matrix and ROC curve as the evaluation techniques. The proposal is intended to support decision making by health professionals, helping them make better medical diagnoses.
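As a rough illustration of the evaluation phase named in the abstract, the sketch below computes a confusion matrix and a ROC curve (with its area under the curve) for a binary document classifier. It is not the thesis implementation: the function names, the 0.5 decision threshold, and the labels and scores are hypothetical toy values chosen only to show the calculation.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 means 'relevant'."""
    counts = Counter(zip(y_true, y_pred))
    return counts[(1, 1)], counts[(0, 1)], counts[(1, 0)], counts[(0, 0)]


def roc_points(y_true, scores):
    """Return ROC points (FPR, TPR) by sweeping the threshold over the scores."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    # Rank documents from the highest to the lowest classifier score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # Documents with tied scores are admitted together and share one point.
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve, using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy data: 1 = document is relevant evidence, 0 = not relevant.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]      # assumed classifier scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
print("TP, FP, FN, TN:", tp, fp, fn, tn)
print("AUC:", round(auc(roc_points(y_true, scores)), 4))
```

The confusion matrix summarizes performance at a single threshold, while the ROC curve shows the trade-off between true positive and false positive rates across all thresholds, which is why the two are commonly reported together.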




