Accurate and Rapid Prediction of Protein pKa: Protein Language Models Reveal the Sequence-pKa Relationship
This is the official implementation of the paper "Accurate and Rapid Prediction of Protein pKa: Protein Language Models Reveal the Sequence-pKa Relationship".
Freely available server
Our pKALM server is available: Access pKALM
Installation
conda create -n pkalm python=3.12
conda activate pkalm
pip install -r requirements.txt
Usage
Use predict_batch.py to predict pKa values using a FASTA file:
- --input: a FASTA-formatted file containing protein sequences.
- --out_dir: the output directory.
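As a rough illustration of preparing a batch run, the sketch below writes a small FASTA file and assembles a command line that uses only the two flags listed above. The helper names (write_fasta, build_command), the example sequence, and the file names are hypothetical and not part of the repository; the script may accept or require other options not shown here.

```python
import sys
import tempfile
from pathlib import Path


def write_fasta(records, path, width=60):
    """Write (name, sequence) pairs as FASTA, wrapping sequence lines at `width`."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    Path(path).write_text("\n".join(lines) + "\n")


def build_command(fasta_path, out_dir):
    """Build an argv list for predict_batch.py with the two documented flags."""
    return [sys.executable, "predict_batch.py",
            "--input", str(fasta_path), "--out_dir", str(out_dir)]


# Demo in a temporary directory; the sequence is an arbitrary placeholder.
with tempfile.TemporaryDirectory() as tmp:
    fasta = Path(tmp) / "example.fasta"
    write_fasta([("example_protein", "MKDEHCYAGL")], fasta)
    print(fasta.read_text(), end="")
    print(" ".join(build_command(fasta, Path(tmp) / "results")))
```

The returned list can be passed to subprocess.run from the repository root once the environment from the Installation section is active.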
Use predict.py to predict pKa values using an input file:
- --input: a CSV-formatted file containing idx, res, and seq columns.
- --out_dir: the output directory.
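The sketch below shows one way such a CSV could be assembled. It rests on assumptions the text above does not confirm: that idx is the 1-based position of the residue, that res is its one-letter code, that seq is the full protein sequence, and that the residues of interest are D, E, H, K, C, and Y. The function names are hypothetical; data/seq_train.csv is the place to check the actual conventions.

```python
import csv
import io

# Assumption: residues commonly treated as titratable; the set pKALM expects may differ.
TITRATABLE = set("DEHKCY")


def titratable_rows(seq, one_based=True):
    """Yield (idx, res, seq) for each residue of `seq` found in TITRATABLE."""
    seq = seq.upper()
    offset = 1 if one_based else 0  # assumption: positions are 1-based
    for i, res in enumerate(seq):
        if res in TITRATABLE:
            yield (i + offset, res, seq)


def write_input_csv(seq, handle):
    """Write a header row and one row per selected residue to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["idx", "res", "seq"])
    for row in titratable_rows(seq):
        writer.writerow(row)


# Demo with an arbitrary placeholder sequence, written to an in-memory buffer.
buf = io.StringIO()
write_input_csv("MKDEH", buf)
print(buf.getvalue(), end="")
```

To write to disk instead, open the target file with newline="" and pass the handle to write_input_csv.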
Datasets
- data/seq_train.csv and data/seq_test.csv: The training and testing datasets used in the paper.
- data/PKAD2_DOWNLOAD.xlsx: The raw PKAD-2 dataset.
- data/process_PKAD2_DOWNLOAD_rev.xlsx: The processed PKAD-2 dataset.
- data/IPC2_peptide_25.csv and data/IPC2_peptide_75.csv: The peptide IPC2 datasets for testing and training.
- data/IPC2_protein_25.csv and data/IPC2_protein_75.csv: The protein IPC2 datasets for testing and training.
- data/UP000005640_9606.fasta: The human proteome dataset for benchmarking speeds.
pKALM User Access Agreement
By requesting access to this model, you acknowledge and agree that:
- Academic-use eligibility: Access is intended for academic research and education users affiliated with recognized universities or research institutions.
- Required institutional information: You must provide accurate institutional details, including institutional email, institution/lab, and PI information for verification.
- Identity and contact sharing: Your Hugging Face username and email address may be used and stored by maintainers for access review, communication, and compliance.
- Non-commercial restriction: You will use this model only for non-commercial academic purposes unless explicit written permission is granted by the maintainers.
- Prohibited harmful use: You will not use this model to design, optimize, or support activities that may harm humans, animals, public safety, critical infrastructure, or the environment.
- No high-stakes sole reliance: You will not rely on this model output as the sole basis for medical, clinical, safety-critical, or other high-stakes decisions without qualified expert validation.
- No unauthorized redistribution: You will not bypass access controls, share gated files with unauthorized parties, or misrepresent your identity, affiliation, or intended use.
- Compliance responsibility: You are responsible for complying with applicable laws, regulations, institutional policies, research ethics requirements, and third-party licenses.
- Access governance: Access decisions are at maintainers' discretion and may be denied or revoked at any time in cases of misuse, policy breach, or unverifiable information.
Help
If you have any questions, please contact me at shijie.xu@ees.hokudai.ac.jp.