Spectral clustering of protein sequences
What is SCPS?
SCPS is an efficient, user-friendly, scalable and multi-platform implementation of a spectral clustering method for clustering homologous proteins. SCPS also implements connected component analysis and hierarchical clustering, integrates TribeMCL and interfaces with external tools such as Cytoscape and NCBI BLAST.
Overview
Clustering protein sequences based on their evolutionary relationships is important for sequence annotation, because structural and functional relationships can potentially be inferred from the clusters. Most existing methods simply threshold a measure related to the distance between sequences. Paccanaro et al. (2006) mapped this problem onto that of clustering the nodes of a weighted undirected graph in which each node corresponds to a protein sequence and the weight on each edge corresponds to a measure of distance between the two sequences it connects. The goal is to partition such a graph into a set of discrete clusters whose members are homologs.
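As a rough illustration of the thresholding approach mentioned above, the following Python sketch keeps only the pairs whose score reaches a cutoff and reports the connected components of the resulting graph. The sequence names, scores and cutoffs are made up for the example, and the code assumes similarities (higher means closer) rather than distances; it is not taken from SCPS.

```python
from collections import defaultdict


def connected_components(scores, threshold):
    """Cluster sequences by thresholding pairwise similarities.

    scores: dict mapping (seq_a, seq_b) -> similarity, higher = more similar.
    Returns a list of clusters, each a sorted list of sequence names.
    """
    adjacency = defaultdict(set)
    nodes = set()
    for (a, b), similarity in scores.items():
        nodes.update((a, b))
        if similarity >= threshold:  # keep only sufficiently strong edges
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    clusters = []
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from `start`.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters


# Hypothetical similarity scores between five sequences.
scores = {
    ("P1", "P2"): 0.90,
    ("P2", "P3"): 0.80,
    ("P3", "P4"): 0.35,  # one weak link between the two groups
    ("P4", "P5"): 0.85,
    ("P1", "P5"): 0.05,
}

strict = connected_components(scores, 0.5)   # [['P1', 'P2', 'P3'], ['P4', 'P5']]
lenient = connected_components(scores, 0.3)  # [['P1', 'P2', 'P3', 'P4', 'P5']]
```

With the lenient cutoff a single weak edge is enough to merge both groups, which shows how sensitive a purely threshold-based clustering can be to the chosen cutoff.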
SCPS is an improved implementation of the method of Paccanaro et al. (2006). The algorithm was tested on difficult sets of proteins whose relationships are known from the SCOP database. The method correctly identified many of the superfamily relationships, and the quality of the clusters, as quantified by a measure that combines sensitivity and specificity, was consistently better than that of the alternatives (on average, improvements were 45% over connected component analysis and 28% over TribeMCL).
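The idea behind a spectral method can be sketched on a toy graph. The Python example below is a generic textbook bipartition that uses the second eigenvector of the normalized affinity matrix, found by power iteration; it is not the SCPS code or the exact procedure of Paccanaro et al. (2006). The function name, the matrix and the parameters are hypothetical, and the weights are assumed to be affinities (higher means closer), so a distance would have to be converted first.

```python
import math
import random


def spectral_bipartition(W, iterations=500, seed=0):
    """Split the nodes of a weighted graph into two groups.

    W: symmetric list of lists of non-negative affinities with a zero
       diagonal; every node must have at least one positive weight.
    Returns a list of 0/1 labels; node 0 always gets label 0.
    """
    n = len(W)
    degree = [sum(row) for row in W]
    root = [math.sqrt(d) for d in degree]

    # The normalized matrix N = D^-1/2 W D^-1/2 has leading eigenvector
    # proportional to sqrt(degree), with eigenvalue 1.
    norm = math.sqrt(sum(degree))
    top = [r / norm for r in root]

    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        # Multiply by (N + I) / 2 so that all eigenvalues are non-negative.
        Nv = [
            sum(W[i][j] * v[j] / (root[i] * root[j]) for j in range(n))
            for i in range(n)
        ]
        v = [(Nv[i] + v[i]) / 2.0 for i in range(n)]
        # Remove the leading eigenvector so the iteration converges
        # to the second one instead.
        dot = sum(v[i] * top[i] for i in range(n))
        v = [v[i] - dot * top[i] for i in range(n)]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]

    # The sign pattern of the second eigenvector defines the two groups.
    first_is_positive = v[0] >= 0
    return [0 if (x >= 0) == first_is_positive else 1 for x in v]


# Toy graph: two tightly linked triples joined by very weak edges.
strong, weak = 1.0, 0.01
W = [
    [0.0 if i == j else (strong if (i < 3) == (j < 3) else weak)
     for j in range(6)]
    for i in range(6)
]

labels = spectral_bipartition(W)  # [0, 0, 0, 1, 1, 1]
```

Unlike a hard cutoff, this split uses all edge weights at once, so the weak cross links do not merge the two groups.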
Screenshots
Click on any of the images below to get an idea of what SCPS looks like and whether it is suitable for you.
References
Nepusz T., Sasidharan R., Paccanaro A.: SCPS: a fast implementation of a spectral method for detecting protein families on a genome-wide scale. BMC Bioinformatics 11:120, 2010. (read online)
Paccanaro A., Casbon J.A., Saqi M.: Spectral clustering of protein sequences. Nucleic Acids Research 34(5):1571–1580, 2006. (read online)
