Protein sequence analysis has attracted great interest recently. Through the Protein Data Bank, a protein sequence is easily obtained and readily analyzed for various studies and applications. Conventional neural networks and genetic algorithms have been applied to protein classification; unfortunately, these methods need many empirical parameters and cannot give reasonable predictions in a variety of cases. In this paper, we present efficient statistical algorithms to classify the stability of proteins based on their sequences. A protein sequence consists of successive amino acid codes and can be considered multivariate categorical data. Based on a statistical variance analysis of the data set in each group (stable or unstable proteins), weights are calculated that give an important clue to the effect of combinations of amino acid codes on protein stability. Once the weights for every combination of two successive amino acid residues have been determined, we can assign each protein a score, computed as a function of its sequence, that represents its stability. The distribution of scores for stable proteins differs from that for unstable proteins. In tests on various protein problems, the proposed approach proves well suited to large-scale protein stability analysis based on sequence.
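The scoring idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' method: the abstract does not give the weight formula, so the sketch assumes a weight equal to the difference in mean pair frequency between the stable and unstable groups, scaled by the pooled within-group spread. The function names (`pair_frequencies`, `pair_weights`, `stability_score`), the `eps` smoothing term, and the length-normalized score are all hypothetical choices made for this example.

```python
from collections import Counter
from itertools import product
from statistics import mean, pvariance

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of successive residues (dipeptides).
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def pair_frequencies(seq):
    """Relative frequency of each pair of successive residues in seq."""
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(len(seq) - 1, 1)  # avoid division by zero for short input
    return {p: counts.get(p, 0) / total for p in PAIRS}


def pair_weights(stable, unstable, eps=1e-6):
    """Assumed weight: between-group mean difference over pooled spread.

    This formula is an illustrative stand-in for the variance analysis
    mentioned in the abstract, which does not state the exact definition.
    """
    freq_stable = [pair_frequencies(s) for s in stable]
    freq_unstable = [pair_frequencies(s) for s in unstable]
    weights = {}
    for p in PAIRS:
        xs = [f[p] for f in freq_stable]
        xu = [f[p] for f in freq_unstable]
        spread = (pvariance(xs) + pvariance(xu) + eps) ** 0.5
        weights[p] = (mean(xs) - mean(xu)) / spread
    return weights


def stability_score(seq, weights):
    """Average weight over all successive residue pairs in seq."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        return 0.0
    # Pairs containing non-standard residue codes contribute zero.
    return sum(weights.get(p, 0.0) for p in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy sequences invented for illustration only; not real proteins.
    stable = ["AAAA", "AAAC"]
    unstable = ["CCCC", "CCCA"]
    w = pair_weights(stable, unstable)
    print(stability_score("AAAA", w) > stability_score("CCCC", w))
```

Under these assumptions, a higher score indicates a sequence whose residue pairs resemble those of the stable group. A classifier would then choose a threshold by comparing the score distributions of the two groups on labeled data.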
Journal: TechConnect Briefs
Volume: 1, Technical Proceedings of the 2003 Nanotechnology Conference and Trade Show
Published: February 23, 2003
Pages: 24 - 27
Industry sector: Medical & Biotech
Topics: Biomaterials, Informatics, Modeling & Simulation