Breakthrough in Protein Annotation: 'Peptideins' Illuminate Hidden Genetic Landscapes and Spur New Research
May 6, 2026
A study in Nature expands human protein annotation beyond canonical genes by recognizing microproteins and introducing the term peptidein for products of non-canonical open reading frames (ncORFs) that show evidence of translation and protein synthesis but do not yet meet the full criteria for protein-coding genes.
The TransCODE Consortium analyzed 7,264 DNA sequences suspected to encode dark proteins and found only 15 with strong experimental support to be catalogued as protein-coding genes.
Evolutionary analysis using ORBLv and ORBLq shows that many ncORFs, especially upstream ORFs (uORFs) and overlapping ORFs, are under ORF-level constraint, linking evolutionary signals to the potential detectability of functional peptides.
Despite the new naming, experimental evidence for most peptideins remains limited and their functional roles are largely unknown at this stage.
Experts like Christoph Dietrich hail the development as a major breakthrough that could trigger a new wave of research into these short proteins.
A substantial share of ncORF-encoded microproteins are presented by HLA-I, implying intracellular origins and immunopeptidome relevance; longer coding sequences, ncORF position, tissue expression, and isoelectric point influence detectability and presentation.
The TransCODE workflow certifies ncORFs as reference human proteins using proteomics, immunopeptidomics, and Ribo-seq data under stringent HUPO-HPP guidelines (two unique peptides, minimum 18 residues, FDR <0.1%).
Functional genomics, including CRISPR-Cas9 knockout screens, identified six candidate ncORFs as potential peptideins or protein-coding genes when combining HLA evidence, ORF constraint, and knockout phenotypes.
Rebranding to peptideins aims to attract attention and spur research into their cellular roles and potential disease relevance.
A new classification called peptideins formally brings thousands of short protein-like sequences into human genome and protein databases.
A tiered annotation framework for ncORFs proposes tier 1A for candidates with strong protein-coding evidence and peptidein tiers 1B, 2A, and 2B for candidates with confirmed translation that lack full coding annotation.
Dark proteins, or microproteins, are typically very short, often lack evolutionary relatives, and may be encoded near or overlapping known genes.
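The peptide-evidence thresholds quoted above for the TransCODE workflow (two unique peptides, minimum 18 residues, FDR <0.1%) can be pictured as a simple filter. The sketch below is illustrative only and is not the consortium's pipeline: the `OrfEvidence` record and `meets_peptide_thresholds` function are hypothetical names, and it assumes the 18-residue minimum refers to the combined length of the distinct unique peptides, which the summary does not spell out.

```python
from dataclasses import dataclass

# Thresholds as quoted in the summary; how they are applied here is an assumption.
MIN_UNIQUE_PEPTIDES = 2
MIN_TOTAL_RESIDUES = 18   # assumed: summed length of distinct unique peptides
MAX_FDR = 0.001           # 0.1% expressed as a fraction


@dataclass
class OrfEvidence:
    """Hypothetical record of peptide-level evidence for one ncORF."""
    orf_id: str
    unique_peptides: list[str]  # peptide sequences mapping only to this ncORF
    fdr: float                  # false discovery rate as a fraction


def meets_peptide_thresholds(evidence: OrfEvidence) -> bool:
    """Return True if the evidence passes all three quoted thresholds."""
    distinct = set(evidence.unique_peptides)        # ignore repeated identifications
    total_residues = sum(len(p) for p in distinct)  # overlaps between peptides not handled
    return (
        len(distinct) >= MIN_UNIQUE_PEPTIDES
        and total_residues >= MIN_TOTAL_RESIDUES
        and evidence.fdr < MAX_FDR
    )


if __name__ == "__main__":
    candidate = OrfEvidence("example_orf", ["MKTAYIAKQR", "LLGDFFRKSK"], fdr=0.0005)
    print(candidate.orf_id, meets_peptide_thresholds(candidate))
```

A candidate with a single peptide, with peptides that together fall short of 18 residues, or with an FDR at or above 0.1% would fail this check. In the framework described above, failing it would not rule a candidate out of the peptidein tiers, which rest on confirmed translation.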
Summary based on 2 sources
Sources

Nature • May 6, 2026
Expanding the human proteome with microproteins and peptideins
Nature • May 6, 2026
Revealed: the mysterious ‘dark’ proteins that might play a big role in biology