Insulin dynamics, Type 2 Diabetes drug targets, molecular simulations, and systems biology — a research program at the intersection of computational chemistry and metabolic disease.
Comprehensive computational investigation of insulin biology and Type 2 Diabetes, leveraging Cornell BioHPC (660 cores, 2.5 PB Lustre), ChEMBL, RCSB PDB, AlphaFold, GEO, and GWAS Catalog.
410 ns all-atom MD simulation of human insulin monomer (PDB: 1MSO, Lispro variant). GROMACS 2022.1, Amber03 force field, SPC/E water, 300 K, 1 bar. The longest continuous insulin monomer trajectory from our lab, revealing progressive structural instability.
| Metric | First 100 ns | Last 100 ns | Trend | Interpretation |
|---|---|---|---|---|
| RMSD | 1.300 nm | 1.881 nm | ↑ Increasing | Progressive structural drift — not equilibrated |
| Rg | 1.845 nm | 2.322 nm | ↑ Expanding | Monomer unfolding / swelling |
| SASA | 38.54 nm² | 38.80 nm² | ~ Stable | Surface area relatively constant |
| RMSF (A-chain), per-residue range 1.45–1.97 nm | n/a | n/a | High | C-terminal residues especially disordered |
| RMSF (B-chain), per-residue range 0.88–1.60 nm | n/a | n/a | Moderate | Residues 9–16 most rigid (~0.9 nm) |
Biological significance: The continuous RMSD increase over 410 ns shows that the monomer never settles into a stable conformation in this simulation. This is consistent with insulin monomers being intrinsically unstable and with the known biological requirement for zinc-mediated hexamer formation in pancreatic β-cell storage granules. The A-chain's higher flexibility (particularly C-terminal residues 17–21) suggests it initiates unfolding, while the B-chain core (residues 9–16) provides residual stability.
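The First/Last 100 ns columns above are window averages of per-frame time series. Below is a minimal sketch of that comparison. It assumes `.xvg` files written by `gmx rms` or `gmx gyrate`, with time in ps (their default) and the quantity of interest in the second column (total Rg for `gmx gyrate`). `read_xvg` and `window_means` are hypothetical helpers, not the lab's analysis code.

```python
from statistics import mean


def read_xvg(lines):
    """Parse GROMACS .xvg text into (time_ps, value) pairs.

    Lines starting with '#' (comments) or '@' (xmgrace directives) are skipped,
    and only the first data column after time is kept (assumption: for
    gmx gyrate that column is the total Rg).
    """
    series = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped[0] in "#@":
            continue
        fields = stripped.split()
        series.append((float(fields[0]), float(fields[1])))
    return series


def window_means(series, window_ps=100_000.0):
    """Mean value over the first and the last `window_ps` of the trajectory."""
    t_start, t_end = series[0][0], series[-1][0]
    first = [v for t, v in series if t < t_start + window_ps]
    last = [v for t, v in series if t > t_end - window_ps]
    return mean(first), mean(last)


# Hypothetical usage on a real file (file name is an assumption):
# with open("gyrate.xvg") as fh:
#     first, last = window_means(read_xvg(fh))
```

A fixed window is the simplest choice. With a 410 ns run, the two 100 ns windows do not overlap, so the comparison is not diluted by shared frames.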
ChEMBL v33 analysis of 7 validated Type 2 Diabetes drug targets. 4,865 compounds evaluated for potency (IC50 < 100 nM). DPP-4 emerges as the most druggable target: it has both the highest fraction of potent compounds (45.5%) and the lowest median IC50 (49 nM).
| Target | ChEMBL ID | Compounds | Potent (<100nM) | Hit Rate | Median IC50 | Approved Drugs |
|---|---|---|---|---|---|---|
| DPP-4 | CHEMBL2842 | 909 | 414 | 45.5% | 49 nM | Sitagliptin, Vildagliptin, Saxagliptin |
| PPARγ | CHEMBL4093 | 435 | 153 | 35.2% | 55.5 nM | Pioglitazone, Rosiglitazone |
| GLP-1R | CHEMBL284 | 783 | 245 | 31.3% | 420 nM | Semaglutide*, Tirzepatide* |
| GPR40 | CHEMBL3983 | 559 | 21 | 3.8% | 170 nM | Fasiglifam (withdrawn) |
| GCK | CHEMBL235 | 520 | 27 | 5.2% | 270 nM | Dorzagliatin (China) |
| SGLT2 | CHEMBL3510 | 768 | 7 | 0.9% | 4,422 nM | Empagliflozin, Dapagliflozin |
| INSR | CHEMBL1981 | 891 | 38 | 4.3% | 7,900 nM | Insulin (biologics) |
* GLP-1R approved drugs are peptide agonists, not small molecules. Small molecule GLP-1R modulators remain an active research area.
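The Hit Rate and Median IC50 columns follow from simple counts; for DPP-4, 414 / 909 = 45.5%. Below is a minimal sketch that assumes one IC50 value in nM per compound. How replicate measurements were aggregated is not stated here, and the example values are made up.

```python
from statistics import median

POTENCY_CUTOFF_NM = 100.0  # "potent" threshold used in the table


def summarize_target(ic50_nm):
    """Return (n_compounds, n_potent, hit_rate_percent, median_ic50_nm)."""
    n = len(ic50_nm)
    n_potent = sum(1 for value in ic50_nm if value < POTENCY_CUTOFF_NM)
    return n, n_potent, 100.0 * n_potent / n, median(ic50_nm)


# Made-up IC50 values (nM) for a hypothetical target.
print(summarize_target([10.0, 50.0, 200.0, 1000.0]))  # → (4, 2, 50.0, 125.0)
```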
22 curated PDB structures and 20 AlphaFold predictions covering insulin variants, receptor complexes, and T2D drug targets.
| PDB | Protein | Resolution | Method | Relevance |
|---|---|---|---|---|
| 1MSO | Insulin Lispro (T6) | 1.0 Å | X-ray | Our MD starting structure, rapid-acting analog |
| 4ZXB | Insulin Receptor Ectodomain | 3.3 Å | X-ray | Hexamer production MD (running) |
| 6PXV | Insulin Receptor (full-length, cryo-EM) | 3.2 Å | Cryo-EM | DPP-4 production MD (queued) |
| 7KI0 | GLP-1R + Semaglutide | 2.5 Å | Cryo-EM | Semaglutide binding mode |
| 6X18 | GLP-1R + GLP-1 peptide | 2.1 Å | Cryo-EM | Endogenous ligand complex |
| 5VEW | GLP-1R + small molecule | 2.7 Å | X-ray | Small molecule GPCR modulation |
| 2PRG | PPARγ LBD | 2.3 Å | X-ray | Thiazolidinedione binding |
| 7VSI | SGLT2-MAP17 | 2.95 Å | Cryo-EM | Empagliflozin binding site |
| 5YQZ | Glucagon Receptor | 3.0 Å | X-ray | Counter-regulatory target |
| 3I40 | Human Insulin | 1.85 Å | X-ray | Native insulin reference |
| 1GCN | Glucagon | 3.0 Å | X-ray | Counter-regulatory hormone |
| 4CFH | AMPK (active form) | 3.24 Å | X-ray | Metformin downstream target |
Large-scale protein–protein interaction network analysis using STRING database. Mapped connectivity of T2D-associated genes to identify hub nodes and functional modules.
| Protein | Gene | UniProt | Function | Drug Class |
|---|---|---|---|---|
| Insulin | INS | P01308 | Glucose uptake signaling | Recombinant insulin |
| Insulin Receptor | INSR | P06213 | Tyrosine kinase receptor | Insulin sensitizers |
| DPP-4 | DPP4 | P27487 | Incretin degradation | Gliptins |
| GLP-1 Receptor | GLP1R | P43220 | Incretin signaling (GPCR) | GLP-1 agonists |
| SGLT2 | SLC5A2 | P31639 | Renal glucose reabsorption | Gliflozins |
| PPARγ | PPARG | P37231 | Adipogenesis, insulin sensitivity | Thiazolidinediones |
| Glucokinase | GCK | P35557 | Glucose sensing (β-cell) | GK activators |
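Hub nodes in a PPI network are commonly ranked by degree. Below is a minimal sketch that assumes the edges were read from a STRING links file (columns `protein1 protein2 combined_score`, with scores on a 0–1000 scale). It uses STRING's conventional 700 "high confidence" cutoff. The cutoff and centrality measure actually used in this analysis are not stated, and the toy edges and scores are invented for illustration.

```python
from collections import Counter

HIGH_CONFIDENCE = 700  # assumption: STRING "high confidence" combined_score


def hub_ranking(edges, min_score=HIGH_CONFIDENCE, top=5):
    """Rank proteins by degree in an undirected edge list.

    edges: iterable of (protein_a, protein_b, combined_score). STRING lists
    each interaction in both directions, so each pair is counted once.
    """
    seen = set()
    degree = Counter()
    for a, b, score in edges:
        if score < min_score or a == b:
            continue
        pair = frozenset((a, b))
        if pair in seen:
            continue
        seen.add(pair)
        degree[a] += 1
        degree[b] += 1
    return degree.most_common(top)


# Invented toy edges; scores are not real STRING values.
toy_edges = [
    ("INS", "INSR", 999), ("INSR", "INS", 999),   # same pair, both directions
    ("INSR", "IRS1", 950), ("INSR", "PTPN1", 900),
    ("PPARG", "INSR", 720), ("GCK", "INS", 650),  # last edge below cutoff
]
print(hub_ranking(toy_edges, top=1))  # → [('INSR', 4)]
```

Degree is the cheapest hub criterion. Betweenness or other centralities may rank proteins differently on the full 13.7M-edge network.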
GEO microarray datasets comparing T2D vs. control pancreatic islet samples. Identifying differentially expressed genes for pathway enrichment and drug target validation.
| Dataset | Samples | Platform | Tissue | Status |
|---|---|---|---|---|
| GSE25724 | 13 (6 T2D / 7 Ctrl) | Affymetrix HG-U133A | Pancreatic islets | 3 significant genes |
| GSE38642 | 63 | Affymetrix HG-U133 Plus 2 | Pancreatic islets | Metadata parsing |
| GSE41762 | 77 | Affymetrix HG-U133 Plus 2 | Pancreatic islets | Metadata parsing |
| GSE50244 | 89 | Illumina HT-12 v4 | Pancreatic islets | Metadata parsing |
| GSE76894 | 103 | Affymetrix HG-U133 Plus 2 | Pancreatic islets | Metadata parsing |
Total: 345 samples across 5 datasets. GSE25724 successfully analyzed (T2D vs control split). Remaining datasets require manual metadata curation for group assignment.
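Calling genes "significant" after testing thousands of probes requires multiple-testing correction. The criterion behind the 3 significant genes in GSE25724 is not stated here. A common choice is the Benjamini-Hochberg false discovery rate, sketched below on made-up p-values; in practice these would come from a T2D-vs-control test per probe. The result should match `statsmodels.stats.multitest.multipletests(p, method="fdr_bh")`.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted q-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values are monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def significant(genes, pvalues, alpha=0.05):
    """Genes whose BH q-value is at or below alpha."""
    return [g for g, q in zip(genes, bh_adjust(pvalues)) if q <= alpha]


# Made-up p-values for four hypothetical genes.
print(significant(["A", "B", "C", "D"], [0.01, 0.04, 0.03, 0.005], alpha=0.03))
# → ['A', 'D']
```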
20 AlphaFold-predicted structures for T2D-related proteins, complementing experimental PDB entries for targets lacking high-resolution crystal structures.
| Protein | Gene | Role in T2D | AlphaFold |
|---|---|---|---|
| Insulin | INS | Primary hormone | AF-P01308 |
| Insulin Receptor | INSR | Signal transduction | AF-P06213 |
| GLP-1 Receptor | GLP1R | Incretin pathway | AF-P43220 |
| SGLT2 | SLC5A2 | Renal glucose transport | AF-P31639 |
| Glucokinase | GCK | Glucose sensor | AF-P35557 |
| PPARγ | PPARG | Insulin sensitization | AF-P37231 |
| DPP-4 | DPP4 | Incretin degradation | AF-P27487 |
| Glucagon | GCG | Counter-regulation | AF-P01275 |
| TLR4 | TLR4 | Inflammation in T2D | AF-O00206 |
| AMPK α1 | PRKAA1 | Metformin target | AF-Q13131 |
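Before using an AlphaFold model in place of a missing crystal structure, it helps to check its per-residue confidence. AlphaFold DB model files (named like `AF-P01308-F1-model_v4.pdb`; the version suffix changes between releases) store pLDDT (0–100) in the B-factor column. Below is a minimal sketch that reads it from CA atoms using fixed PDB column positions and applies the database's confidence bands. The helper names are hypothetical.

```python
from statistics import mean


def ca_plddt(pdb_lines):
    """Map (chain, residue number) -> pLDDT, read from CA atom B-factors."""
    scores = {}
    for line in pdb_lines:
        # PDB fixed columns: atom name 13-16, chain 22, resSeq 23-26, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resseq = int(line[22:26])
            scores[(chain, resseq)] = float(line[60:66])
    return scores


def mean_plddt(scores):
    return mean(scores.values())


def confidence_band(plddt):
    """AlphaFold DB bands: >90 very high, 70-90 confident, 50-70 low, <50 very low."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


# Hypothetical usage:
# with open("AF-P01308-F1-model_v4.pdb") as fh:
#     print(confidence_band(mean_plddt(ca_plddt(fh))))
```

Low-pLDDT regions (for insulin, parts of the prepro-sequence outside the mature chains) are usually trimmed before docking or MD.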
Cornell BioHPC SLURM cluster — active and completed molecular dynamics simulations.
PDB 1MSO (Lispro). Amber03/SPC/E. Revealed progressive instability: Rg 1.85 → 2.32 nm. Consistent with insulin's preference for hexameric storage.
Initial exploration. RMSD equilibrated after ~20 ns, then Rg expansion detected — motivated 410 ns extension.
7 drug targets (ChEMBL), 22 PDB structures, 5 GEO datasets (345 samples), GWAS catalog. Full T2D landscape mapping.
PDB 4ZXB (6 chains, 1601 residues). Comparing hexamer vs monomer stability. T = 299.7 K, PE = −1.06×10⁷ kJ/mol.
PDB 6PXV (6 chains, 1830 residues). Fresh topology rebuild. GROMACS 5.1.2 for large complex handling.
Extend 410 ns trajectory to determine if RMSD ever reaches a plateau. GPU acceleration expected 10–50× speedup.
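Whether the RMSD "reaches a plateau" can be made concrete with a drift test over the final window. Below is a minimal sketch that fits a least-squares line to the last 100 ns and calls the trace flat if the fitted change across that window is below a threshold. The 0.05 nm threshold and the function names are hypothetical. Times here are in ns; divide `.xvg` times by 1000. Block averaging or independent replicas would be more rigorous checks.

```python
def ols_slope(ts, ys):
    """Ordinary least-squares slope of ys against ts."""
    n = len(ts)
    t_bar = sum(ts) / n
    y_bar = sum(ys) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    sxx = sum((t - t_bar) ** 2 for t in ts)
    return sxy / sxx


def has_plateaued(series, window_ns=100.0, max_drift_nm=0.05):
    """series: (time_ns, rmsd_nm) pairs.

    Returns True if the fitted drift over the last `window_ns` is smaller
    than `max_drift_nm` (hypothetical threshold, not from the source).
    """
    t_end = series[-1][0]
    tail = [(t, y) for t, y in series if t >= t_end - window_ns]
    ts, ys = zip(*tail)
    drift = abs(ols_slope(ts, ys)) * window_ns
    return drift < max_drift_nm
```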
Automated daily scan of arXiv, PubMed, bioRxiv, medRxiv, and OpenAlex. Selected high-relevance papers from our continuous monitoring pipeline.
Merkle lab, Cambridge. Semaglutide causes Ca²⁺ influx via L-type channels in hPSC-derived POMC neurons, providing evidence for the GLP-1R mechanism in human neurons. bioRxiv 2024.
Boltz-2 for de novo binary protein structure prediction at atom scale. Applicable to T2D protein network analysis. bioRxiv 2025.
Target trial emulation: thiazolidinediones vs other antidiabetics on dementia incidence. PPARγ neuroprotection. medRxiv 2026.
Interpretable graph convolutional networks for cardiovascular disease risk prediction. Extends network medicine approach. PubMed 2026.
Training protein + small molecule force fields de novo. Potential improvement to our MD simulation accuracy. arXiv 2026.
Continuous glucose monitoring in non-diabetics reveals glycemic phenotypes. Pre-diabetes computational modeling opportunity. PubMed 2026.
Reproducible protocols for all computational analyses.
Software: GROMACS 2022.1 (CPU), 2024.2 (CUDA 12)
Force field: Amber03 + SPC/E water
Box: Dodecahedron, d = 1.5 nm minimum
Equilibration: EM (steepest descent) → NVT 100 ps (V-rescale, 300 K) → NPT 100 ps (Parrinello-Rahman, 1 bar)
Production: dt = 2 fs, PME electrostatics, LINCS H-bond constraints
Analysis: RMSD, RMSF, Rg, SASA (GROMACS built-in tools)
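The settings above map onto GROMACS `.mdp` parameters, with `nsteps` equal to run length divided by `dt`. At 2 fs, 100 ps is 50,000 steps and 410 ns is 2.05×10⁸ steps. Below is a minimal sketch that renders the NVT and NPT stages from Python. The following values are assumptions, not stated in the protocol:

- 1.0 nm Coulomb and van der Waals cut-offs
- `Protein`/`Non-Protein` coupling groups
- τ_T = 0.1 ps and τ_P = 2.0 ps
- compressibility of 4.5×10⁻⁵ bar⁻¹

Topology generation (e.g. `gmx pdb2gmx -ff amber03 -water spce`) and output-frequency settings are omitted.

```python
DT_PS = 0.002  # 2 fs time step, as in the protocol


def nsteps_for(duration_ps, dt_ps=DT_PS):
    """Number of MD steps needed to cover duration_ps."""
    return round(duration_ps / dt_ps)


def render_mdp(params):
    """Format a parameter dict as 'key = value' lines for an .mdp file."""
    return "\n".join(f"{key:<24} = {value}" for key, value in params.items()) + "\n"


COMMON = {
    "integrator": "md",
    "dt": DT_PS,
    "cutoff-scheme": "Verlet",
    "coulombtype": "PME",
    "rcoulomb": 1.0,                 # ASSUMED cut-off (nm)
    "rvdw": 1.0,                     # ASSUMED cut-off (nm)
    "constraints": "h-bonds",
    "constraint-algorithm": "lincs",
    "tcoupl": "V-rescale",
    "tc-grps": "Protein Non-Protein",  # ASSUMED coupling groups
    "tau-t": "0.1 0.1",              # ASSUMED (ps)
    "ref-t": "300 300",
}

NVT = {**COMMON, "nsteps": nsteps_for(100.0), "pcoupl": "no",
       "gen-vel": "yes", "gen-temp": 300}
NPT = {**COMMON, "nsteps": nsteps_for(100.0), "continuation": "yes",
       "pcoupl": "Parrinello-Rahman", "tau-p": 2.0,  # ASSUMED tau-p (ps)
       "ref-p": 1.0, "compressibility": 4.5e-5}      # ASSUMED water value (1/bar)

if __name__ == "__main__":
    print(render_mdp(NPT))
```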
Database: ChEMBL v33
Targets: 7 T2D targets (DPP-4, PPARγ, GLP-1R, SGLT2, INSR, GCK, GPR40)
Filters: IC50, Ki, EC50 assays; potency threshold < 100 nM
Tools: ChEMBL web services API, RDKit, pandas
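The filters above can be sketched as a pass over activity records. The field names follow ChEMBL's activity schema (`molecule_chembl_id`, `standard_type`, `standard_relation`, `standard_value`, `standard_units`), as in records returned by `new_client.activity.filter(target_chembl_id=...)` in `chembl_webresource_client`. Several choices below are assumptions, not the documented pipeline:

- dropping censored values (`>`, `<`)
- the unit conversion table
- collapsing repeat measurements to a per-molecule median
- pooling IC50, Ki and EC50 into one potency value, which is also a simplification

```python
from collections import defaultdict
from statistics import median

KEEP_TYPES = {"IC50", "Ki", "EC50"}
# ASSUMED unit table; ChEMBL standard_units are usually already nM.
TO_NM = {"nM": 1.0, "uM": 1_000.0, "µM": 1_000.0, "pM": 0.001}


def potency_by_molecule(activities):
    """Return {molecule_chembl_id: median potency in nM}.

    Keeps exact ('=') measurements of the selected assay types, converts
    them to nM, and takes the median over repeat measurements.
    """
    values = defaultdict(list)
    for act in activities:
        if act.get("standard_type") not in KEEP_TYPES:
            continue
        if act.get("standard_relation") != "=":
            continue  # censored values such as '>' are dropped (assumption)
        factor = TO_NM.get(act.get("standard_units"))
        if factor is None or act.get("standard_value") is None:
            continue
        values[act["molecule_chembl_id"]].append(float(act["standard_value"]) * factor)
    return {mol: median(v) for mol, v in values.items()}
```

The resulting per-molecule potencies can then be thresholded at 100 nM, as in the hit-rate column of the drug-target table.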
PDB: 22 structures (resolution < 4.5 Å)
AlphaFold: 20 predictions (EBI database)
Visualization: PyMOL, ChimeraX
Planned: AutoDock Vina, Schrödinger Glide for virtual screening
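The < 4.5 Å resolution filter can be applied directly to downloaded entries. Below is a minimal sketch that reads the `REMARK   2 RESOLUTION.` record from legacy-format PDB files. Entries distributed only as PDBx/mmCIF (common for very large complexes) would need an mmCIF parser instead. The helper names are hypothetical.

```python
import re

RESOLUTION_RE = re.compile(r"^REMARK   2 RESOLUTION\.\s+([0-9.]+)\s+ANGSTROMS")
MAX_RESOLUTION_A = 4.5  # cutoff stated in the protocol


def pdb_resolution(lines):
    """Resolution in Å from REMARK 2, or None if absent or not applicable."""
    for line in lines:
        match = RESOLUTION_RE.match(line)
        if match:
            return float(match.group(1))
    return None  # e.g. NMR entries report "NOT APPLICABLE."


def passes_cutoff(lines, cutoff=MAX_RESOLUTION_A):
    """True if the entry reports a resolution strictly better than cutoff."""
    resolution = pdb_resolution(lines)
    return resolution is not None and resolution < cutoff
```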
Network: STRING PPI database (13.7M edges)
Expression: GEO microarray (5 datasets, 345 samples)
Genomics: GWAS Catalog (T2D-associated loci)
Structure prediction: AlphaFold DB + ColabFold
Literature: Daily arXiv/PubMed/bioRxiv scan (automated cron)
| Resource | Specs | Use |
|---|---|---|
| Cornell BioHPC (ECCO) | 12 nodes, 28–112 cores/node, 128–1024 GB RAM, SLURM | MD production runs |
| BioHPC GPU | 2× NVIDIA A40 (48GB), 2× P100, 2× T4 | GPU-accelerated MD (pending reservation) |
| Lustre Storage | 2.5 PB (1.2 + 1.3 PB) | Trajectory storage, shared data |
| Research3 (Johnson) | Kaiko crypto data, WRDS databases | Cross-domain analysis |