
Unsupervised identification of redundant domain entries in InterPro database using clustering techniques

  • Middle East Technical University
  • European Molecular Biology Laboratory

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

InterPro is a widely used database that integrates functional signatures from different protein sequence annotation databases, together with manual curation, to present a comprehensive resource for functional sequence annotation. However, integrating the signatures causes inconsistent and/or redundant annotations in some cases. In this study, we propose an unsupervised method for the automatic detection of inconsistent and redundant entries in the InterPro database. Two clustering methods, the Markov Cluster Algorithm (MCL) and hierarchical clustering, are employed to investigate to what extent these signatures can be detected. Results show that a considerable proportion (~75%) of redundant entries can be identified. The future goal is to develop a system that identifies redundant and inconsistent signatures with very high performance using supervised machine learning techniques. The findings of the study may help InterPro curators fix the problematic entries. They may also serve curators as a road map before new signatures are integrated.
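
The abstract names MCL but does not say how similarity between entries was scored or which parameters were used. As a rough illustration only, the sketch below runs a plain-Python version of the basic MCL loop (expansion, inflation, column normalisation) on a small, made-up symmetric similarity matrix. The matrix values, the `inflation=2.0` setting, the thresholds, and the function names (`mcl`, `normalize_columns`, `matmul`) are all assumptions for this example and are not taken from the study.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Naive square-matrix product, adequate for a toy example."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Basic Markov clustering on a symmetric, non-negative similarity matrix.

    Returns clusters as sorted lists of row/column indices.
    """
    n = len(similarity)
    m = [[float(similarity[i][j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        m[i][i] = 1.0  # self-loops, a common MCL convention
    m = normalize_columns(m)

    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow spreads along paths
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded]
        )  # inflation: strong flow is boosted, weak flow decays
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break

    # Rows that keep mass on their own diagonal act as attractors;
    # the columns they still reach form one cluster.
    clusters = set()
    for i in range(n):
        if m[i][i] > 1e-6:
            clusters.add(frozenset(j for j in range(n) if m[i][j] > 1e-6))
    return sorted(sorted(c) for c in clusters)


# Hypothetical pairwise similarities between five entries (illustrative only).
toy_similarity = [
    [1.00, 0.90, 0.80, 0.00, 0.00],
    [0.90, 1.00, 0.85, 0.00, 0.00],
    [0.80, 0.85, 1.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 1.00, 0.95],
    [0.00, 0.00, 0.00, 0.95, 1.00],
]

if __name__ == "__main__":
    for cluster in mcl(toy_similarity):
        print(cluster)
```

In this toy setting, entries that land in the same cluster would be candidates for manual review as possibly redundant. A real analysis would more likely use an established MCL implementation and a similarity measure suited to the signatures being compared.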

Original language: English
Title of host publication: BCB 2015 - 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
Publisher: Association for Computing Machinery, Inc
Pages: 505-506
Number of pages: 2
ISBN (Electronic): 9781450338530
Publication status: Published - 9 Sept 2015
Externally published: Yes
Event: 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB 2015 - Atlanta, United States
Duration: 9 Sept 2015 to 12 Sept 2015

Publication series

Name: BCB 2015 - 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics

Conference

Conference: 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB 2015
Country/Territory: United States
City: Atlanta
Period: 9/09/15 to 12/09/15

Keywords

  • Clustering
  • HMM alignments
  • Hidden Markov models
  • Protein sequence databases
  • Redundancy analysis
  • Similarity detection
