Biomedical NLP
Principal Investigator: Dejing Dou
We propose to advance biomedical natural language processing by combining ontology-based reasoning, semantic search, and embedding-driven methods to improve information extraction, representation, and consistency detection in biomedical texts.
Biomedical texts are rich in information, but that richness comes with heterogeneity, inconsistency, and semantic ambiguity. This project addresses these challenges by developing novel methods for biomedical natural language processing (NLP), with a particular focus on ontology-driven and embedding-based approaches.
We investigate how ontologies can enhance semantic representation and reasoning, enabling the detection of inconsistencies and improving interoperability across biomedical resources. We design semantic search systems that give efficient access to knowledge within large biomedical datasets. We also explore embedding-based approaches that capture semantic relations beyond those encoded in traditional ontology frameworks, which allows more flexible integration with neural NLP models.
The research also emphasizes building resources, benchmarks, and evaluation strategies that support the reproducibility and scalability of biomedical NLP. By integrating symbolic reasoning with modern embedding-based techniques, this work contributes toward more accurate, interpretable, and robust NLP systems for advancing biomedical knowledge discovery.
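As a rough illustration of the inconsistency-detection idea described above, the following minimal sketch flags pairs of extracted statements that link the same two entities with opposing relations. It is a toy under stated assumptions, not the method used in the project's publications: the `Triple` type, the `OPPOSITE_RELATIONS` table, the `find_inconsistencies` function, and the entity names (`miR-X`, `GENE_A`, `GENE_B`) are all hypothetical, and a real system would derive relations and their oppositeness from an ontology rather than a hard-coded set.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Triple(NamedTuple):
    """A (subject, relation, object) statement extracted from a text."""
    subject: str
    relation: str
    obj: str
    source: str  # hypothetical identifier of the originating document


# Hypothetical pairs of relations treated as mutually exclusive.
# In practice this knowledge would come from an ontology.
OPPOSITE_RELATIONS = {
    frozenset({"upregulates", "downregulates"}),
    frozenset({"activates", "inhibits"}),
}


def find_inconsistencies(triples):
    """Return pairs of triples that share subject and object
    but assert opposite relations."""
    # Group statements by the entity pair they talk about.
    by_pair = defaultdict(list)
    for t in triples:
        by_pair[(t.subject, t.obj)].append(t)

    conflicts = []
    for group in by_pair.values():
        # Compare every two statements about the same entity pair.
        for a, b in combinations(group, 2):
            if frozenset({a.relation, b.relation}) in OPPOSITE_RELATIONS:
                conflicts.append((a, b))
    return conflicts


# Placeholder statements; the entities and documents are invented.
triples = [
    Triple("miR-X", "upregulates", "GENE_A", "doc1"),
    Triple("miR-X", "downregulates", "GENE_A", "doc2"),
    Triple("miR-X", "inhibits", "GENE_B", "doc3"),
]

conflicts = find_inconsistencies(triples)
for a, b in conflicts:
    print(f"{a.source} vs {b.source}: {a.subject} {a.relation}/{b.relation} {a.obj}")
```

Grouping by entity pair keeps the comparison local to statements that could actually contradict each other. The hard-coded opposites table is the part an ontology or a learned oppositeness measure would replace.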
Objectives:
- Develop ontology-driven methods to improve semantic representation and reasoning in biomedical texts.
- Design semantic search and information extraction systems tailored for complex biomedical datasets.
- Investigate embedding-based approaches to capture nuanced semantic relations in biomedical language.
- Create frameworks to identify inconsistencies and errors in biomedical literature through ontology-based reasoning.
- Build interoperable resources, ontologies, and benchmarks to support biomedical NLP research and applications.
Keywords: Bioinformatics | Big Data | Natural Language Processing | Ontologies | Machine Learning / Deep Learning
Publications
Book chapters
N. H. Nisansa D de Silva, "Relational Databases and Biomedical Big Data", Bioinformatics in MicroRNA Research, pp. 69--81, 2017. doi: 10.1007/978-1-4939-7046-9_5
Journal Papers
Jingshan Huang, Fernando Gutierrez, Harrison J Strachan, Dejing Dou, Weili Huang, Barry Smith, Judith A Blake, Karen Eilbeck, Darren A Natale, Yu Lin, Bin Wu, Nisansa de Silva, and others, "OmniSearch: a semantic search system based on the Ontology for MIcroRNA Target (OMIT) for microRNA-target gene interaction data", Journal of Biomedical Semantics, vol. 7, no. 1, p. 25, 2016. doi: 10.1186/s13326-016-0064-2
Jingshan Huang, Karen Eilbeck, Barry Smith, Judith A Blake, Dejing Dou, Weili Huang, Darren A Natale, Alan Ruttenberg, Jun Huan, Michael T Zimmermann, Guoqian Jiang, Yu Lin, Bin Wu, Harrison J Strachan, Nisansa de Silva, and others, "The development of non-coding RNA ontology", International Journal of Data Mining and Bioinformatics, vol. 15, no. 3, pp. 214--232, 2016. doi: 10.1504/IJDMB.2016.077072
Conference Papers
Nisansa de Silva and Dejing Dou, "Semantic Oppositeness Embedding Using an Autoencoder-based Learning Model", in Database and Expert Systems Applications, 2019, pp. 159--174. doi: 10.1007/978-3-030-27615-7_12
Nisansa de Silva, Dejing Dou, and Jingshan Huang, "Discovering Inconsistencies in PubMed Abstracts Through Ontology-Based Information Extraction", in Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, New York, NY, USA: ACM, 2017, pp. 362--371. doi: 10.1145/3107411.3107452
Team
External Collaborators: Jingshan Huang | Fernando Gutierrez

