deepTMHMM Topology Variation: Interesting Genes

Genes with transmembrane topology variation across haplotypes (filtered from 2,355 genes with ≥2 distinct topologies in JoGo)

Help — deepTMHMM Topology Variation Viewer

Overview

This page lists 2,355 genes from the JoGo haplotype-resolved proteome whose predicted transmembrane topology varies across a-level haplotypes. Predictions were generated by deepTMHMM, a deep-learning method for transmembrane topology prediction. Only genes with ≥2 distinct topologies are shown.

Data Source

  • Haplotype sequences: JoGo a-level haplotypes (174,376 sequences from 19,193 gene regions)
  • Prediction tool: deepTMHMM (DTU Biolib)
  • Gene annotations: MANE Select v1.2 (GRCh38), 19,316 genes
  • Database: deepTMHMM.db — 597,769 topology segment rows across 19,193 gene regions

Topology States

State | Color | Description
TMhelix | Red | Transmembrane helix: alpha-helical segment spanning the membrane
signal | Orange | Signal peptide: N-terminal targeting sequence cleaved after translocation
inside | Blue | Cytoplasmic (inside) region
outside | Green | Extracellular (outside) / lumenal region
Beta sheet | Teal | Transmembrane beta-barrel sheet (outer membrane proteins)
periplasm | Purple | Periplasmic region (rare in human proteins)

Table Columns

Column | Description
Interest | Star rating (1–5) based on the interest score formula below
Gene | MANE gene symbol. Click to open the per-gene topology viewer page
Region | Genomic region name (GENE_chrN_start_end). Click to open JoGo browser
Prot Len | Maximum protein length (aa) across haplotypes
Haplotypes | Number of distinct a-level haplotypes for this gene
Topologies | Number of distinct topology architectures across haplotypes
TM Range | Min–max number of TM helices across haplotypes (e.g., 0-7 means some haplotypes have 0 TM helices and some have 7)
Total Diffs | Sum of all topology differences vs reference haplotype (a0001)
Diff Breakdown | Stacked bar showing proportion of each diff type (see colors below)
Counts | Badges showing count per diff type
Haps w/ TM | Number of haplotypes that have at least one TM helix
Haps w/ SP | Number of haplotypes that have a signal peptide
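
As a rough illustration of how these columns relate to the per-haplotype predictions, the sketch below derives them from a small in-memory example. The data layout (a dict of haplotype ID to a list of `(state, start, end)` segments), the function name `summarize_gene`, and the definition of an "architecture" as the ordered sequence of states are all assumptions made for this example, not the page's actual implementation.

```python
from typing import Dict, List, Tuple

# Assumed layout: (state, start, end) with 1-based inclusive coordinates.
Segment = Tuple[str, int, int]


def summarize_gene(haps: Dict[str, List[Segment]]) -> dict:
    """Derive summary columns for one gene from per-haplotype segments."""
    tm_counts = [
        sum(1 for state, _, _ in segs if state == "TMhelix")
        for segs in haps.values()
    ]
    # Assumption: an architecture is the ordered list of states,
    # ignoring segment coordinates.
    architectures = {tuple(state for state, _, _ in segs) for segs in haps.values()}
    return {
        "haplotypes": len(haps),
        "topologies": len(architectures),
        "tm_range": f"{min(tm_counts)}-{max(tm_counts)}",
        # Assumption: segments cover the whole protein, so the largest
        # end coordinate equals the protein length.
        "prot_len": max(end for segs in haps.values() for _, _, end in segs),
        "haps_with_tm": sum(1 for n in tm_counts if n > 0),
        "haps_with_sp": sum(
            1 for segs in haps.values()
            if any(state == "signal" for state, _, _ in segs)
        ),
    }


# Made-up haplotypes for illustration only.
example = {
    "a0001": [("signal", 1, 20), ("outside", 21, 100),
              ("TMhelix", 101, 121), ("inside", 122, 300)],
    "a0002": [("outside", 1, 100), ("TMhelix", 101, 121), ("inside", 122, 300)],
    "a0003": [("inside", 1, 250)],
}
summary = summarize_gene(example)
print(summary)
```

In this toy gene, two of the three haplotypes carry a TM helix and one carries a signal peptide, so the TM Range column would read 0-1.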

Difference Types

Type | Badge | Description
TM count change | N TM | Haplotype has a different number of TM helices than the reference (a0001)
Signal change | N SP | Haplotype gained or lost a signal peptide vs reference
Boundary shift | N shift | Same TM/signal count as reference but different segment boundaries
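
The three difference types can be sketched as a comparison of one haplotype against the reference. This is a minimal illustration only: the segment layout, the label strings, and the choice to compare every segment when checking for a boundary shift are assumptions, and the page's own rules may differ in detail.

```python
from typing import List, Tuple

# Assumed layout: (state, start, end) with 1-based inclusive coordinates.
Segment = Tuple[str, int, int]


def classify_diffs(ref: List[Segment], hap: List[Segment]) -> List[str]:
    """Return the difference types of `hap` relative to `ref`."""

    def tm_count(segs: List[Segment]) -> int:
        return sum(1 for state, _, _ in segs if state == "TMhelix")

    def has_signal(segs: List[Segment]) -> bool:
        return any(state == "signal" for state, _, _ in segs)

    diffs = []
    if tm_count(hap) != tm_count(ref):
        diffs.append("tm_count_change")
    if has_signal(hap) != has_signal(ref):
        diffs.append("signal_change")
    # A boundary shift applies only when TM and signal counts match
    # but the segments themselves differ (assumption: all segments compared).
    if not diffs and hap != ref:
        diffs.append("boundary_shift")
    return diffs


# Made-up segments for illustration only.
ref = [("signal", 1, 20), ("outside", 21, 100),
       ("TMhelix", 101, 121), ("inside", 122, 300)]
lost_sp = [("outside", 1, 100), ("TMhelix", 101, 121), ("inside", 122, 300)]
shifted = [("signal", 1, 20), ("outside", 21, 98),
           ("TMhelix", 99, 119), ("inside", 120, 300)]
no_tm = [("inside", 1, 250)]

for name, hap in [("lost_sp", lost_sp), ("shifted", shifted), ("no_tm", no_tm)]:
    print(name, classify_diffs(ref, hap))
```

Note that in this sketch a single haplotype can carry both a TM count change and a signal change, as `no_tm` does.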

Interest Score

Genes are ranked by a composite interest score:

score = n_topologies × 5.0 + n_total_diffs × 0.1 + tm_range × 3.0 + tm_change_bonus + signal_change_bonus

where tm_change_bonus = 2.0 × min(n_tm_change, 10) and signal_change_bonus = 4.0 if any haplotype shows a signal change (0 otherwise).

Higher scores indicate more extensive topology variation, which this page treats as more biologically interesting. Genes with large TM count ranges, many distinct topologies, and signal peptide changes rank highest, since those terms carry the largest weights.
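
The formula above can be written as a short function. The function name and argument names are chosen for this example. Two readings are assumed: `tm_range` is the maximum minus the minimum TM helix count, and `n_tm_change` is the number of TM count change differences for the gene.

```python
def interest_score(
    n_topologies: int,
    n_total_diffs: int,
    tm_min: int,
    tm_max: int,
    n_tm_change: int,
    has_signal_change: bool,
) -> float:
    """Composite interest score, following the formula on this page."""
    tm_range = tm_max - tm_min  # assumed definition: max minus min TM count
    tm_change_bonus = 2.0 * min(n_tm_change, 10)  # capped at 10 changes
    signal_change_bonus = 4.0 if has_signal_change else 0.0
    return (
        n_topologies * 5.0
        + n_total_diffs * 0.1
        + tm_range * 3.0
        + tm_change_bonus
        + signal_change_bonus
    )


# Made-up values for illustration only.
score = interest_score(
    n_topologies=3,
    n_total_diffs=12,
    tm_min=0,
    tm_max=7,
    n_tm_change=4,
    has_signal_change=True,
)
print(round(score, 1))
```

For these made-up inputs the terms are 15.0 + 1.2 + 21.0 + 8.0 + 4.0, which sums to 49.2. The cap on `n_tm_change` means the bonus never exceeds 20.0, however many haplotypes differ.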

Filters

Filter | Description
Search | Filter by gene name (case-insensitive substring match)
Min Topologies | Only show genes with at least N distinct topology architectures
Min Diffs | Only show genes with at least N total differences vs reference
Has TM count change | At least one haplotype has a different TM helix count than reference
Has signal change | At least one haplotype differs in signal peptide presence
TM range ≥ 3 | The difference between max and min TM helix count is ≥ 3
Some haps 0 TM, some >0 | Some haplotypes have no TM helices while others do (the most dramatic variation)
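
The filters combine with AND semantics in the sketch below, which applies them to a list of per-gene records. The record keys, the function name `filter_genes`, and the placeholder gene names are hypothetical. The page itself presumably filters in the browser, so this only illustrates the logic.

```python
from typing import Dict, List


def filter_genes(
    rows: List[Dict],
    search: str = "",
    min_topologies: int = 2,
    min_diffs: int = 0,
    has_tm_change: bool = False,
    has_signal_change: bool = False,
    tm_range_ge3: bool = False,
    zero_and_nonzero_tm: bool = False,
) -> List[Dict]:
    """Keep rows that pass every active filter."""
    needle = search.lower()
    kept = []
    for r in rows:
        if needle and needle not in r["gene"].lower():
            continue  # case-insensitive substring match on gene name
        if r["n_topologies"] < min_topologies:
            continue
        if r["n_total_diffs"] < min_diffs:
            continue
        if has_tm_change and r["n_tm_change"] == 0:
            continue
        if has_signal_change and r["n_signal_change"] == 0:
            continue
        if tm_range_ge3 and r["tm_max"] - r["tm_min"] < 3:
            continue
        if zero_and_nonzero_tm and not (r["tm_min"] == 0 and r["tm_max"] > 0):
            continue
        kept.append(r)
    return kept


# Placeholder records, not real genes or real counts.
rows = [
    {"gene": "GENEA1", "n_topologies": 4, "n_total_diffs": 12,
     "n_tm_change": 3, "n_signal_change": 0, "tm_min": 0, "tm_max": 7},
    {"gene": "GENEB2", "n_topologies": 2, "n_total_diffs": 1,
     "n_tm_change": 0, "n_signal_change": 1, "tm_min": 1, "tm_max": 1},
]
hits = filter_genes(rows, zero_and_nonzero_tm=True)
print([r["gene"] for r in hits])
```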

Per-Gene Viewer

Clicking a gene name opens the detailed topology viewer page (REGION_tmhmm_viewer.html), which shows:

  • Gene Information — MANE gene metadata, TM helix count, topology string
  • Topology Diagram — Interactive SVG showing all segments per haplotype with hover tooltips
  • Topology Differences — Table of per-haplotype differences vs reference
  • Haplotype Details — Collapsible per-haplotype segment tables
  • Summary Statistics — TM helix distribution and diff breakdown

Methods

The pipeline consists of the following steps:

  • Extract amino acid sequences from JoGo haplotype TSV (174,376 sequences, filtering 518 zero-length entries)
  • Split into 174 chunks of up to 1,000 sequences each
  • Run deepTMHMM via Singularity container on SLURM cluster (60 cores, 128 GB per job)
  • Parse GFF3 output and reformat into TSV
  • Build SQLite database with deepTMHMM predictions + MANE gene annotations
  • Generate per-gene HTML viewer pages (19,193 files) and this summary page
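
The extraction and chunking steps above might look like the following sketch. The function names, the `(name, sequence)` record layout, and FASTA as the chunk format are assumptions for this example. If the 518 zero-length entries are removed from the 174,376 extracted sequences, 173,858 remain, which would give 174 chunks at 1,000 sequences per chunk, the last one holding 858.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (sequence name, amino acid sequence); assumed layout


def chunk_sequences(
    records: Iterable[Record], chunk_size: int = 1000
) -> Iterator[List[Record]]:
    """Yield lists of up to `chunk_size` records, skipping empty sequences."""
    chunk: List[Record] = []
    for name, seq in records:
        if not seq:
            continue  # drop zero-length entries
        chunk.append((name, seq))
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, possibly partial chunk


def to_fasta(chunk: List[Record]) -> str:
    """Format one chunk as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in chunk)


# Toy input: every third record has an empty sequence.
records = [(f"hap{i}", "M" * (i % 3)) for i in range(10)]
chunks = list(chunk_sequences(records, chunk_size=4))
print([len(c) for c in chunks])
print(to_fasta(chunks[1]), end="")
```

Each chunk could then be written to its own file and submitted as one cluster job.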