Predicting Kinase Functional Sites Using Hierarchical Stochastic Language Modeling

Huan Yu1,*, Guojun Pei1,*, Peng Ge2, Xiangzhong Fang1, Fengzhu Sun3, Luhua Lai2, Minping Qian1,2, Minghua Deng1,2

1 LMAM, School of Mathematical Sciences, Peking University, Beijing 100871, China
2 Center for Theoretical Biology, Peking University, Beijing 100871, China
3 Molecular and Computational Biology Program, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089-2910, USA
* These authors contributed equally to this work.


Original data

1. Kinase sequences with E.C. numbers from Swiss-PROT

        Sequences, Zipped (2.12M bytes)

2. Protein structures in PDB for validation

        PDB entries, Text file (11K bytes)

3. PROSITE patterns in regular expression (RE) format for validation

        PROSITE patterns, Zipped (12K bytes)


Trained models for kinases

        All Models, TGZ file (16K bytes)


Validations (PDB/PROSITE/Cross-validation)

        Validations, EXCEL file (860K bytes)




Program

        Test program, EXE file (776K bytes)
The test program takes two parameters: the first is the filename of the model file, and the second is the filename of the sequence file. The sequence file stores each record on two lines: one line with the sequence name, followed by one line with the sequence itself.
A sample sequence file, a sample model, and a sample result are provided.
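
The two-line record layout of the sequence file can be illustrated with a short Python sketch. This is not part of the distributed test program. The function name read_sequence_records, the record names, and the residue strings are hypothetical, and skipping blank lines is an assumption made here rather than documented behavior of the program.

```python
from typing import Iterable, List, Tuple


def read_sequence_records(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair up alternating name/sequence lines into (name, sequence) tuples.

    Assumption: blank lines carry no information and are skipped.
    """
    # Strip whitespace and line endings, dropping lines that are empty.
    cleaned = [line.strip() for line in lines if line.strip()]

    # Every record needs exactly two lines: a name and a sequence.
    if len(cleaned) % 2 != 0:
        raise ValueError("sequence file must contain name/sequence line pairs")

    # Even positions hold names, odd positions hold the sequences.
    return [(cleaned[i], cleaned[i + 1]) for i in range(0, len(cleaned), 2)]


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    example_lines = [
        "kinase_A\n",
        "MKTAYIAKQR\n",
        "kinase_B\n",
        "GSHMLEDPVA\n",
    ]
    for name, sequence in read_sequence_records(example_lines):
        print(name, len(sequence))
```

A file written in this layout would be passed to the test program as its second parameter, after the model file.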

Corresponding Author:
Minghua Deng