Computational Biology Research

Insulin dynamics, Type 2 Diabetes drug targets, molecular simulations, and systems biology — a research program at the intersection of computational chemistry and metabolic disease.

Molecular Dynamics · T2D Drug Discovery · GROMACS · ChEMBL · AlphaFold · Cornell BioHPC

Overview

Comprehensive computational investigation of insulin biology and Type 2 Diabetes, leveraging Cornell BioHPC (660 cores, 2.5 PB Lustre), ChEMBL, RCSB PDB, AlphaFold, GEO, and GWAS Catalog.

410 ns · Insulin MD Trajectory
4,865 · Drug Compounds (ChEMBL)
22 · PDB Crystal Structures
20 · AlphaFold Predictions
7 · Validated Drug Targets
5 · GEO Expression Datasets
660 · BioHPC CPU Cores
3 · Active MD Simulations

Insulin Monomer Molecular Dynamics

410 ns all-atom MD simulation of the human insulin monomer (PDB: 1MSO, Lispro variant), run with GROMACS 2022.1, the Amber03 force field, and SPC/E water at 300 K and 1 bar. It is the longest continuous insulin monomer trajectory from our lab and shows progressive structural instability.

Figure panels: RMSD, backbone deviation (nm) · Radius of gyration (nm) · RMSF, per-residue flexibility (nm) · Solvent accessible surface area (nm²)

Key Findings

Metric | First 100 ns | Last 100 ns | Trend | Interpretation
RMSD | 1.300 nm | 1.881 nm | ↑ Increasing | Progressive structural drift; not equilibrated
Rg | 1.845 nm | 2.322 nm | ↑ Expanding | Monomer unfolding / swelling
SASA | 38.54 nm² | 38.80 nm² | ~ Stable | Surface area relatively constant
RMSF (A-chain) | 1.45 – 1.97 nm (per-residue range) | | High | C-terminal especially disordered
RMSF (B-chain) | 0.88 – 1.60 nm (per-residue range) | | Moderate | Residues 9–16 most rigid (~0.9 nm)

Biological significance: The continuous RMSD increase over 410 ns suggests that the insulin monomer is unstable under these simulation conditions, consistent with the known biological requirement for zinc-mediated hexamer formation in pancreatic β-cell storage granules. Because the RMSD never plateaus, the trajectory has not equilibrated, so this is a qualitative observation rather than a measured unfolding rate. The A-chain's higher flexibility (RMSF 1.45–1.97 nm, particularly residues 17–21) suggests that it initiates unfolding, while the B-chain core (residues 9–16, RMSF ~0.9 nm) provides residual stability.
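The first-versus-last-100-ns comparison in the table can be sketched as a windowed mean over a GROMACS .xvg time series. This is a minimal illustration, not the lab's actual analysis script: the function names are hypothetical, the sample data is synthetic, and the time unit is whatever the .xvg file uses (ps by default for most GROMACS tools).

```python
from statistics import mean


def parse_xvg(text):
    """Parse .xvg text into (time, value) pairs, skipping '#' and '@' header lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#@":
            continue
        fields = line.split()
        rows.append((float(fields[0]), float(fields[1])))
    return rows


def window_mean(rows, t_start, t_end):
    """Mean of the value column for frames with t_start <= time <= t_end."""
    values = [v for t, v in rows if t_start <= t <= t_end]
    if not values:
        raise ValueError("no frames in the requested window")
    return mean(values)


def first_last_means(rows, window):
    """Means over the first and last `window` time units of the series."""
    t_min, t_max = rows[0][0], rows[-1][0]
    return (window_mean(rows, t_min, t_min + window),
            window_mean(rows, t_max - window, t_max))


# Synthetic example series (NOT simulation output).
toy_xvg = """# toy data
@ title "RMSD"
0 1.0
100 1.2
200 1.4
300 1.8
400 2.0
"""
rows = parse_xvg(toy_xvg)
first, last = first_last_means(rows, window=100)
print(f"first window: {first:.3f}  last window: {last:.3f}")
```

For a real trajectory, the text would come from reading the RMSD or Rg .xvg file, with the window expressed in the file's time unit.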

T2D Drug Target Landscape

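ChEMBL v33 analysis of 7 validated Type 2 Diabetes drug targets. 4,865 compounds were evaluated, with a compound counted as potent when its IC50 is below 100 nM. DPP-4 has the highest hit rate (45.5%) and the lowest median IC50 (49 nM), making it the most druggable target by these measures.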

Compound Library Size & Potency by Target

Target | ChEMBL ID | Compounds | Potent (<100 nM) | Hit Rate | Median IC50 | Approved Drugs
DPP-4 | CHEMBL2842 | 909 | 414 | 45.5% | 49 nM | Sitagliptin, Vildagliptin, Saxagliptin
PPARγ | CHEMBL4093 | 435 | 153 | 35.2% | 55.5 nM | Pioglitazone, Rosiglitazone
GLP-1R | CHEMBL284 | 783 | 245 | 31.3% | 420 nM | Semaglutide*, Tirzepatide*
GPR40 | CHEMBL3983 | 559 | 21 | 3.8% | 170 nM | Fasiglifam (withdrawn)
GCK | CHEMBL235 | 520 | 27 | 5.2% | 270 nM | Dorzagliatin (China)
SGLT2 | CHEMBL3510 | 768 | 7 | 0.9% | 4,422 nM | Empagliflozin, Dapagliflozin
INSR | CHEMBL1981 | 891 | 38 | 4.3% | 7,900 nM | Insulin (biologics)

* GLP-1R approved drugs are peptide agonists, not small molecules. Small molecule GLP-1R modulators remain an active research area.
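The hit rate column is the fraction of a target's compounds that fall under the 100 nM threshold. The short check below recomputes it from the compound and potent counts in the table above. The variable and function names are illustrative only.

```python
# (target, compounds, potent) as listed in the table above.
TARGETS = [
    ("DPP-4", 909, 414),
    ("PPARγ", 435, 153),
    ("GLP-1R", 783, 245),
    ("GPR40", 559, 21),
    ("GCK", 520, 27),
    ("SGLT2", 768, 7),
    ("INSR", 891, 38),
]


def hit_rate(potent, compounds):
    """Percentage of compounds below the potency threshold."""
    return 100.0 * potent / compounds


total_compounds = sum(compounds for _, compounds, _ in TARGETS)
rates = {name: round(hit_rate(potent, compounds), 1)
         for name, compounds, potent in TARGETS}

for name, rate in rates.items():
    print(f"{name:7s} {rate:5.1f}%")
print("total compounds:", total_compounds)
```

The per-target counts sum to the 4,865 compounds quoted in the section summary.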

Structural Biology Database

22 curated PDB structures and 20 AlphaFold predictions covering insulin variants, receptor complexes, and T2D drug targets.

Selected PDB Structures

PDB | Protein | Resolution | Method | Relevance
1MSO | Insulin Lispro (T6) | 1.0 Å | X-ray | Our MD starting structure, rapid-acting analog
4ZXB | Insulin Receptor Ectodomain | 3.3 Å | X-ray | Hexamer production MD (running)
6PXV | Insulin Receptor (full-length, cryo-EM) | 3.2 Å | Cryo-EM | DPP-4 production MD (queued)
7KI0 | GLP-1R + Semaglutide | 2.5 Å | Cryo-EM | Semaglutide binding mode
6X18 | GLP-1R + GLP-1 peptide | 2.1 Å | X-ray | Endogenous ligand complex
5VEW | GLP-1R + small molecule | 2.7 Å | X-ray | Small molecule GPCR modulation
2PRG | PPARγ LBD | 2.3 Å | X-ray | Thiazolidinedione binding
7VSI | SGLT2-MAP17 | 2.95 Å | Cryo-EM | Empagliflozin binding site
5YQZ | Glucagon Receptor | 3.0 Å | X-ray | Counter-regulatory target
3I40 | Human Insulin | 1.85 Å | X-ray | Native insulin reference
1GCN | Glucagon | 3.0 Å | X-ray | Counter-regulatory hormone
4CFH | AMPK (active form) | 3.24 Å | X-ray | Metformin downstream target

T2D Protein Interaction Network

Large-scale protein–protein interaction network analysis using STRING database. Mapped connectivity of T2D-associated genes to identify hub nodes and functional modules.

13.7M · Interaction Edges
7 · Core Drug Targets
20 · T2D Hub Proteins
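One common way to identify hub nodes in a protein–protein interaction network is to rank proteins by degree after filtering edges on confidence. The sketch below assumes that approach; the source does not say which hub definition was used. The edge list and its scores are invented toy values, not STRING data, and the 700 cutoff mirrors STRING's "high confidence" level on its 0–1000 combined-score scale.

```python
from collections import Counter


def degree_hubs(edges, min_score=700, top_n=3):
    """Rank nodes by degree, counting each undirected edge once.

    edges: iterable of (protein_a, protein_b, combined_score) tuples.
    """
    degree = Counter()
    seen = set()
    for a, b, score in edges:
        pair = frozenset((a, b))
        # Skip low-confidence edges, self-loops, and reversed duplicates.
        if score < min_score or len(pair) < 2 or pair in seen:
            continue
        seen.add(pair)
        degree[a] += 1
        degree[b] += 1
    return degree.most_common(top_n)


# Toy edges with made-up scores, for illustration only.
toy_edges = [
    ("INS", "INSR", 999),
    ("INSR", "INS", 999),    # reversed duplicate, counted once
    ("INS", "GCK", 800),
    ("INS", "GCG", 950),
    ("GLP1R", "GCG", 900),
    ("DPP4", "GLP1R", 400),  # below the cutoff, dropped
    ("INS", "DPP4", 750),
]
hubs = degree_hubs(toy_edges, top_n=2)
print(hubs)
```

For a network with millions of edges, a streaming pass like this one avoids building the full graph in memory, though the `seen` set still grows with the number of retained edges.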

T2D Core Protein Targets

Protein | Gene | UniProt | Function | Drug Class
Insulin | INS | P01308 | Glucose uptake signaling | Recombinant insulin
Insulin Receptor | INSR | P06213 | Tyrosine kinase receptor | Insulin sensitizers
DPP-4 | DPP4 | P27487 | Incretin degradation | Gliptins
GLP-1 Receptor | GLP1R | P43220 | Incretin signaling (GPCR) | GLP-1 agonists
SGLT2 | SLC5A2 | P31639 | Renal glucose reabsorption | Gliflozins
PPARγ | PPARG | P37231 | Adipogenesis, insulin sensitivity | Thiazolidinediones
Glucokinase | GCK | P35557 | Glucose sensing (β-cell) | GK activators

Differential Gene Expression

GEO microarray datasets comparing T2D and control pancreatic islet samples, used to identify differentially expressed genes for pathway enrichment and drug target validation.

Dataset | Samples | Platform | Tissue | Status
GSE25724 | 13 (T2D) / 7 (Ctrl) | Affymetrix HG-U133A | Pancreatic islets | 3 significant genes
GSE38642 | 63 | Affymetrix HG-U133 Plus 2 | Pancreatic islets | Metadata parsing
GSE41762 | 77 | Affymetrix HG-U133 Plus 2 | Pancreatic islets | Metadata parsing
GSE50244 | 89 | Illumina HT-12 v4 | Pancreatic islets | Metadata parsing
GSE76894 | 103 | Affymetrix HG-U133 Plus 2 | Pancreatic islets | Metadata parsing

Total: 345 samples across 5 datasets. GSE25724 successfully analyzed (T2D vs control split). Remaining datasets require manual metadata curation for group assignment.
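Calling genes "significant" in a T2D versus control comparison normally involves correcting per-gene p-values for multiple testing. The source does not state which correction was applied, so the sketch below assumes Benjamini-Hochberg FDR adjustment, a common choice for microarray data. The p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values (NOT results from GSE25724).
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_q = benjamini_hochberg(toy_p)
significant = [i for i, q in enumerate(toy_q) if q < 0.05]
print(toy_q, significant)
```

With tens of thousands of probes per array, only genes whose adjusted value stays under the chosen FDR level would be reported.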

AlphaFold Structure Predictions

20 AlphaFold-predicted structures for T2D-related proteins, complementing experimental PDB entries for targets lacking high-resolution crystal structures.

Protein | Gene | Role in T2D | AlphaFold
Insulin | INS | Primary hormone | AF-P01308
Insulin Receptor | INSR | Signal transduction | AF-P06213
GLP-1 Receptor | GLP1R | Incretin pathway | AF-P43220
SGLT2 | SLC5A2 | Renal glucose transport | AF-P31639
Glucokinase | GCK | Glucose sensor | AF-P35557
PPARγ | PPARG | Insulin sensitization | AF-P37231
DPP-4 | DPP4 | Incretin degradation | AF-P27487
Glucagon | GCG | Counter-regulation | AF-P01275
TLR4 | TLR4 | Inflammation in T2D | AF-O00206
AMPK α1 | PRKAA1 | Metformin target | AF-Q13131

Simulation Campaign

Cornell BioHPC SLURM cluster — active and completed molecular dynamics simulations.

Insulin Monomer 410 ns

Mar 8–23, 2026 · Job 548214 · cbsuecco14 (56 cores) · 212.4 ns/day

PDB 1MSO (Lispro). Amber03/SPC/E. Revealed progressive instability: Rg 1.85 → 2.32 nm. Consistent with the preference for hexameric storage.

Insulin Monomer 100 ns (Pilot)

Mar 8, 2026 · Job 547545 · cbsuecco12 (48 cores) · 277.7 ns/day

Initial exploration. RMSD equilibrated after ~20 ns, then Rg expansion detected — motivated 410 ns extension.

ChEMBL + PDB + GEO + GWAS Batch Analysis

Mar 8, 2026 · Job 547584 · cbsuecco07 (32 cores)

7 drug targets (ChEMBL), 22 PDB structures, 5 GEO datasets (345 samples), GWAS catalog. Full T2D landscape mapping.

Insulin Hexamer 50 ns

Mar 27, 2026 · Job 548385 · cbsuecco01 (16 cores)

PDB 4ZXB (6 chains, 1601 residues). Comparing hexamer vs monomer stability. T = 299.7 K, PE = −1.06×10⁷ kJ/mol.

DPP-4 Full Pipeline (EM→NVT→NPT→50ns)

Mar 27, 2026 · Job 548383 · Queued

PDB 6PXV (6 chains, 1830 residues). Fresh topology rebuild. GROMACS 5.1.2 for large complex handling.

Insulin Lispro 500 ns Extension

Planned — pending GPU reservation (NVIDIA A40)

Extend the 410 ns trajectory to determine whether the RMSD ever reaches a plateau. GPU acceleration is expected to give a 10–50× speedup.

Literature Monitoring

Automated daily scan of arXiv, PubMed, bioRxiv, medRxiv, and OpenAlex. Selected high-relevance papers from our continuous monitoring pipeline.

GLP-1R Agonists Activate POMC Neurons ⭐⭐⭐⭐

Merkle lab, Cambridge. Semaglutide causes Ca²⁺ influx via L-type channels in hPSC-derived POMC neurons. Direct human evidence for GLP-1R mechanism. bioRxiv 2024.

Structural Atlas of Human Interactome ⭐⭐⭐⭐

Boltz-2 for de novo binary protein structure prediction at atom scale. Applicable to T2D protein network analysis. bioRxiv 2025.

TZDs vs Dementia in T2D ⭐⭐⭐

Target trial emulation: thiazolidinediones vs other antidiabetics on dementia incidence. PPARγ neuroprotection. medRxiv 2026.

GCN for CVD Risk in T2D ⭐⭐⭐

Interpretable graph convolutional networks for cardiovascular disease risk prediction. Extends network medicine approach. PubMed 2026.

Force Field from Scratch ⭐⭐⭐

Training protein + small molecule force fields de novo. Potential improvement to our MD simulation accuracy. arXiv 2026.

CGM Pre-Diabetes Stratification ⭐⭐⭐

Continuous glucose monitoring in non-diabetics reveals glycemic phenotypes. Pre-diabetes computational modeling opportunity. PubMed 2026.

Methods

Reproducible protocols for all computational analyses.

Molecular Dynamics

Software: GROMACS 2022.1 (CPU), 2024.2 (CUDA 12)
Force field: Amber03 + SPC/E water
Box: Dodecahedron, d = 1.5 nm minimum
Equilibration: EM (steepest descent) → NVT 100 ps (V-rescale, 300 K) → NPT 100 ps (Parrinello-Rahman, 1 bar)
Production: dt = 2 fs, PME electrostatics, LINCS H-bond constraints
Analysis: RMSD, RMSF, Rg, SASA (GROMACS built-in tools)
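The production settings above map onto a handful of GROMACS .mdp options, and the step count follows from the target length and the 2 fs timestep. The helper below is a hedged sketch, not the lab's actual input file. It assumes production reuses the V-rescale thermostat and Parrinello-Rahman barostat listed for equilibration. The `tc-grps`, `tau_t`, `tau_p`, and `compressibility` values are typical placeholders that the source does not specify.

```python
def production_mdp(total_ns, dt_ps=0.002):
    """Return (nsteps, mdp_text) for a production run of `total_ns` nanoseconds."""
    nsteps = round(total_ns * 1000 / dt_ps)  # ns -> ps, then divide by timestep
    params = {
        "integrator": "md",
        "dt": dt_ps,                       # 2 fs timestep (from the protocol)
        "nsteps": nsteps,
        "coulombtype": "PME",              # PME electrostatics (from the protocol)
        "constraints": "h-bonds",          # H-bond constraints (from the protocol)
        "constraint_algorithm": "lincs",
        "tcoupl": "V-rescale",             # assumed carried over from NVT
        "tc-grps": "System",               # placeholder, not from the source
        "tau_t": 0.1,                      # placeholder, not from the source
        "ref_t": 300,
        "pcoupl": "Parrinello-Rahman",     # assumed carried over from NPT
        "tau_p": 2.0,                      # placeholder, not from the source
        "ref_p": 1.0,
        "compressibility": 4.5e-5,         # typical value for water, an assumption
    }
    text = "\n".join(f"{key:<22} = {value}" for key, value in params.items())
    return nsteps, text


nsteps, mdp_text = production_mdp(410)
print(nsteps)
print(mdp_text)
```

At 2 fs per step, the 410 ns trajectory corresponds to 205,000,000 integration steps.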

Cheminformatics

Database: ChEMBL v33
Targets: 7 T2D targets (DPP-4, PPARγ, GLP-1R, SGLT2, INSR, GCK, GPR40)
Filters: IC50, Ki, EC50 assays; potency threshold < 100 nM
Tools: ChEMBL web services API, RDKit, pandas
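The filtering step can be illustrated on ChEMBL-style activity records without calling the web service. The sketch below makes several assumptions the source does not confirm: values are converted to nM, repeated measurements for one compound are collapsed by their median, and the 100 nM threshold is applied per compound. The field names follow the ChEMBL activity schema (`standard_type`, `standard_value`, `standard_units`, `molecule_chembl_id`), while the records and compound IDs are hypothetical.

```python
from collections import defaultdict
from statistics import median

UNIT_TO_NM = {"pM": 1e-3, "nM": 1.0, "uM": 1e3}
ASSAY_TYPES = {"IC50", "Ki", "EC50"}


def summarize_potency(records, threshold_nm=100.0):
    """Collapse activity records per compound and summarize potency in nM."""
    per_compound = defaultdict(list)
    for rec in records:
        if rec.get("standard_type") not in ASSAY_TYPES:
            continue
        factor = UNIT_TO_NM.get(rec.get("standard_units"))
        value = rec.get("standard_value")
        if factor is None or value is None:
            continue  # unknown unit or missing measurement
        per_compound[rec["molecule_chembl_id"]].append(float(value) * factor)

    values = [median(v) for v in per_compound.values()]
    n_total = len(values)
    n_potent = sum(v < threshold_nm for v in values)
    return {
        "compounds": n_total,
        "potent": n_potent,
        "hit_rate_pct": 100.0 * n_potent / n_total if n_total else 0.0,
        "median_nm": median(values) if values else None,
    }


# Hypothetical records; IDs and values are invented for illustration.
toy_records = [
    {"molecule_chembl_id": "TOY_1", "standard_type": "IC50", "standard_value": "50", "standard_units": "nM"},
    {"molecule_chembl_id": "TOY_1", "standard_type": "IC50", "standard_value": "30", "standard_units": "nM"},
    {"molecule_chembl_id": "TOY_2", "standard_type": "Ki", "standard_value": "2", "standard_units": "uM"},
    {"molecule_chembl_id": "TOY_3", "standard_type": "EC50", "standard_value": "500", "standard_units": "pM"},
    {"molecule_chembl_id": "TOY_4", "standard_type": "Kd", "standard_value": "1", "standard_units": "nM"},
    {"molecule_chembl_id": "TOY_5", "standard_type": "IC50", "standard_value": None, "standard_units": "nM"},
]
summary = summarize_potency(toy_records)
print(summary)
```

Mixing IC50, Ki, and EC50 in one potency call treats them as interchangeable, which is a simplification worth keeping in mind when comparing targets.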

Structural Biology

PDB: 22 structures (resolution < 4.5 Å)
AlphaFold: 20 predictions (EBI database)
Visualization: PyMOL, ChimeraX
Planned: AutoDock Vina, Schrödinger Glide for virtual screening

Systems Biology

Network: STRING PPI database (13.7M edges)
Expression: GEO microarray (5 datasets, 345 samples)
Genomics: GWAS Catalog (T2D-associated loci)
Structure prediction: AlphaFold DB + ColabFold
Literature: Daily arXiv/PubMed/bioRxiv scan (automated cron)

Compute Infrastructure

Resource | Specs | Use
Cornell BioHPC (ECCO) | 12 nodes, 28–112 cores/node, 128–1024 GB RAM, SLURM | MD production runs
BioHPC GPU | 2× NVIDIA A40 (48 GB), 2× P100, 2× T4 | GPU-accelerated MD (pending reservation)
Lustre Storage | 2.5 PB (1.2 + 1.3 PB) | Trajectory storage, shared data
Research3 (Johnson) | Kaiko crypto data, WRDS databases | Cross-domain analysis