Same-length haplotypes with high domain architecture diversity (filtered from 18,500 genes in JoGo)
This page lists 60 genes from the JoGo haplotype-resolved proteome where Pfam protein domain architectures vary across same-length haplotypes (a-level). These represent cases where amino acid substitutions (not indels) cause changes in domain recognition, suggesting functional divergence among haplotypes at the protein domain level.
Genes are selected with ≥5 distinct domain architectures and ≥30 domain differences, filtered from 18,500 genes with Pfam hits in the database.
Only comparisons between haplotypes of identical protein length are included (ref aalen = alt aalen). This isolates domain variation caused by amino acid substitutions from variation caused by insertions or deletions, which makes the differences easier to interpret biologically.
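
The selection rule above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the pipeline's actual code: the `GeneSummary` class, its field names, and the example genes are hypothetical.

```python
from dataclasses import dataclass

# Thresholds stated on this page.
MIN_ARCHITECTURES = 5
MIN_TOTAL_DIFFS = 30


@dataclass
class GeneSummary:
    """Hypothetical per-gene summary; field names are assumptions."""
    gene: str
    n_architectures: int  # distinct domain architectures among same-length haplotypes
    total_diffs: int      # summed domain differences vs. the reference haplotype


def same_length(ref_aalen: int, alt_aalen: int) -> bool:
    """A haplotype pair is compared only if both proteins have the same length."""
    return ref_aalen == alt_aalen


def passes_page_thresholds(g: GeneSummary) -> bool:
    """Both thresholds must be met for a gene to be listed."""
    return g.n_architectures >= MIN_ARCHITECTURES and g.total_diffs >= MIN_TOTAL_DIFFS


# Made-up genes, for illustration only.
candidates = [
    GeneSummary("GENE_A", n_architectures=7, total_diffs=45),
    GeneSummary("GENE_B", n_architectures=5, total_diffs=29),   # too few diffs
    GeneSummary("GENE_C", n_architectures=4, total_diffs=120),  # too few architectures
]
listed = [g.gene for g in candidates if passes_page_thresholds(g)]
```

Both conditions are combined with a logical AND, so a gene with many differences but only four architectures is still excluded.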
| Type | Badge Color | Description |
|---|---|---|
| boundary_shift | Orange | Same domain present but alignment coordinates shifted — substitutions alter domain boundary recognition |
| domain_lost | Red | Domain present in reference (a0001) is entirely absent in the alternate haplotype |
| domain_gained | Green | Domain absent in reference but present in the alternate haplotype |
| copy_lost | Dark Red | Fewer copies of a repeated domain than in the reference (e.g., immunoglobulin repeats) |
| copy_gained | Blue | More copies of a repeated domain than in the reference |
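
One plausible way to assign these five types is to compare, domain by domain, the hits on the reference haplotype with the hits on an alternate haplotype of the same length. The sketch below is an assumption about the logic, not the pipeline's actual implementation: the function name, the `(name, start, end)` tuple layout, and the example coordinates are all hypothetical.

```python
from collections import defaultdict


def classify_diffs(ref_hits, alt_hits):
    """Return (domain_name, diff_type) pairs for one ref/alt haplotype pair.

    Each hit is a (domain_name, start, end) tuple. Assumed rule: presence is
    checked first, then copy number, then coordinates.
    """
    def group(hits):
        by_name = defaultdict(list)
        for name, start, end in hits:
            by_name[name].append((start, end))
        return {name: sorted(coords) for name, coords in by_name.items()}

    ref, alt = group(ref_hits), group(alt_hits)
    diffs = []
    for name in sorted(set(ref) | set(alt)):
        r, a = ref.get(name, []), alt.get(name, [])
        if r and not a:
            diffs.append((name, "domain_lost"))
        elif a and not r:
            diffs.append((name, "domain_gained"))
        elif len(a) < len(r):
            diffs.append((name, "copy_lost"))
        elif len(a) > len(r):
            diffs.append((name, "copy_gained"))
        elif a != r:
            # Same copy number, but at least one boundary moved.
            diffs.append((name, "boundary_shift"))
    return diffs


# Made-up coordinates, for illustration only.
ref = [("I-set", 10, 95), ("I-set", 120, 205), ("fn3", 300, 380), ("Pkinase", 400, 650)]
alt = [("I-set", 10, 95), ("fn3", 302, 380), ("EGF", 700, 730)]
result = classify_diffs(ref, alt)
```

In this sketch a domain gets at most one label per haplotype pair, so a copy-number change takes precedence over a coordinate shift in the remaining copies. Whether the real pipeline reports both is not stated on this page.
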
| Column | Description |
|---|---|
| Interest | Star rating (1–5) based on the interest score formula below |
| Gene | Gene symbol. Click to open the per-gene Pfam domain viewer page |
| Region | Genomic region name (GENE_chrN_start_end). Click to open JoGo browser |
| Prot Len | Protein length (aa) of reference haplotype (a0001) |
| Haplotypes | Number of a-level haplotypes with Pfam domain hits |
| Architectures | Number of distinct domain architectures across haplotypes |
| Total Diffs | Sum of all domain differences vs reference haplotype (a0001) among same-length haplotypes |
| Diff Breakdown | Stacked bar showing proportion of each diff type (orange=shift, red=lost, green=gained, dark red=copy_lost, blue=copy_gained) |
| Diff Types | Number of distinct diff types present (out of 5) |
| Counts | Color-coded badges showing count per diff type |
Genes are ranked by a composite interest score:
Higher scores indicate more biologically interesting domain variation. Genes with many distinct architectures, complete domain loss/gain events, and diverse diff types rank highest. TTN (titin, 72 architectures) and FBN3 (fibrillin-3, 51 architectures) top the list.
| Filter | Description |
|---|---|
| Search | Filter by gene name (case-insensitive substring match) |
| Min Architectures | Only show genes with at least N distinct domain architectures |
| Min Diffs | Only show genes with at least N total domain differences |
| Diff Types: All 5 types | Only show genes exhibiting all 5 difference types |
| Diff Types: Has domain_lost | At least one domain is completely lost in some haplotype |
| Diff Types: Has domain_gained | At least one domain is gained in some haplotype |
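
The filters combine as successive restrictions on the table rows. The sketch below shows that behaviour under stated assumptions: the `Row` class, the `apply_filters` function, the `diff_filter` keywords, and the example rows are hypothetical and do not come from the page's own scripts.

```python
from dataclasses import dataclass, field

ALL_TYPES = {"boundary_shift", "domain_lost", "domain_gained", "copy_lost", "copy_gained"}


@dataclass
class Row:
    """Hypothetical table row; only the fields needed for filtering."""
    gene: str
    architectures: int
    total_diffs: int
    diff_types: set = field(default_factory=set)


def apply_filters(rows, search="", min_arch=0, min_diffs=0, diff_filter=None):
    """Keep rows that satisfy every active filter.

    diff_filter is None (no restriction), "all5", "domain_lost" or "domain_gained".
    """
    needle = search.lower()
    kept = []
    for row in rows:
        if needle and needle not in row.gene.lower():
            continue  # case-insensitive substring match on the gene symbol
        if row.architectures < min_arch or row.total_diffs < min_diffs:
            continue
        if diff_filter == "all5" and row.diff_types != ALL_TYPES:
            continue
        if diff_filter in ("domain_lost", "domain_gained") and diff_filter not in row.diff_types:
            continue
        kept.append(row)
    return kept


# Made-up rows, for illustration only.
rows = [
    Row("GENE_A", 12, 80, set(ALL_TYPES)),
    Row("GENE_B", 6, 35, {"boundary_shift", "domain_lost"}),
    Row("OTHER1", 9, 50, {"boundary_shift", "copy_gained"}),
]
by_name = [r.gene for r in apply_filters(rows, search="gene")]
with_loss = [r.gene for r in apply_filters(rows, min_arch=6, diff_filter="domain_lost")]
all_five = [r.gene for r in apply_filters(rows, diff_filter="all5")]
```
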
Clicking a gene name opens the detailed domain viewer page ({REGION}_pfam_viewer.html) which shows:
The pipeline consists of the following steps:
hmmscan in parallel (8 jobs × 4 CPUs, E-value ≤ 1e-5)

Nagasaki M, et al. JoGo 1.0: the ACTG hierarchical nomenclature and database covering 4.7 million haplotypes across 19,194 human genes. Nucleic Acids Research, 2026. doi:10.1093/nar/gkaf1232
| Interest | Gene | Region | Prot Len | Haplotypes | Architectures | Total Diffs | Diff Breakdown | Diff Types | Counts |
|---|---|---|---|---|---|---|---|---|---|