| Title: | Amino Acid Annotation (A3) Format |
|---|---|
| Description: | Implements the Amino Acid Annotation (A3) format using 'S7' classes. The A3 format is a structured schema for annotating amino acid sequences with site, region, post-translational modification (PTM), processing event, and variant information. Provides functions to create, read, and write A3 objects, and to import annotations from 'UniProt', 'AlphaFold', 'PDBe', 'ClinVar', and 'Ensembl'. |
| Authors: | E.D. Gennatas [aut, cre, cph] (ORCID: <https://orcid.org/0000-0001-9280-3609>) |
| Maintainer: | E.D. Gennatas <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.5.3 |
| Built: | 2026-05-27 18:09:39 UTC |
| Source: | https://github.com/rtemis-org/a3 |
Amino Acid Annotation format utilities
rtemis.a3: Amino Acid Annotation format
Maintainer: E.D. Gennatas [email protected] (ORCID) [copyright holder]
Useful links:
Report bugs at https://github.com/rtemis-org/a3/issues
Perform amino acid substitutions
aa_sub(x, substitutions, verbosity = 1L)aa_sub(x, substitutions, verbosity = 1L)
x |
Character vector: Amino acid sequence. e.g. |
substitutions |
Character vector: Substitutions to perform in the format
"OriginalPositionNew", e.g. |
verbosity |
Integer: Verbosity level. |
Character vector with substitutions performed.
EDG
aa_sub(c("A", "R", "N", "D"), c("R2K", "N3S"))aa_sub(c("A", "R", "N", "D"), c("R2K", "N3S"))
Creates an annotation spec (named list) with an A3Position index and optional type,
for use with create_A3().
annotation_position(x, type = NULL)annotation_position(x, type = NULL)
x |
Integer vector: Positions of the annotation (1-based indexing). |
type |
Optional character scalar: Annotation type. |
Named list with index (A3Position) and type (character)
EDG
# Create a site annotation for positions 5 and 17, of type "active site" annotation_position(c(5L, 17L), type = "active site")# Create a site annotation for positions 5 and 17, of type "active site" annotation_position(c(5L, 17L), type = "active site")
Creates an annotation spec (named list) with an A3Range index and optional type,
for use with create_A3().
annotation_range(x, type = NULL)annotation_range(x, type = NULL)
x |
Integer matrix with 2 columns corresponding to start and end positions of the annotation (1-based indexing). |
type |
Optional character scalar: Annotation type. |
Named list with index (A3Range) and type (character)
EDG
# Create a region annotation for ranges 3-10 and 15-22, of type "repeat" annotation_range(rbind(c(3L, 10L), c(15L, 22L)), type = "repeat")# Create a region annotation for ranges 3-10 and 15-22, of type "repeat" annotation_range(rbind(c(3L, 10L), c(15L, 22L)), type = "repeat")
Creates an A3Variant object for variant annotations with position and info.
annotation_variant(x, info = list())annotation_variant(x, info = list())
x |
Integer scalar: Position of the variant (1-based indexing). |
info |
Named list: Additional information about the variant (e.g. reference and alternate amino acids, variant type, etc.) |
A3Variant object
EDG
# Create a variant annotation for position 10 with info about the variant annotation_variant(10L, info = list(ref = "A", alt = "T", type = "missense"))# Create a variant annotation for position 10 with info about the variant annotation_variant(10L, info = list(ref = "A", alt = "T", type = "missense"))
Queries the NCBI ClinVar database via E-utilities for all variants
associated with a gene, and returns those with protein-level amino acid
changes as a named list of A3Variant objects.
clinvar_variants( gene, significance = c("pathogenic", "likely_pathogenic"), max_variants = 500L, esearch_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi", esummary_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi", batch_size = 500L, verbosity = 1L )clinvar_variants( gene, significance = c("pathogenic", "likely_pathogenic"), max_variants = 500L, esearch_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi", esummary_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi", batch_size = 500L, verbosity = 1L )
gene |
Character scalar: HGNC gene symbol, e.g. |
significance |
Character vector: Clinical significance filter. Any
combination of |
max_variants |
Integer scalar: Maximum number of ClinVar records to
fetch after filtering. Defaults to |
esearch_url |
Character scalar: NCBI E-utilities esearch endpoint. |
esummary_url |
Character scalar: NCBI E-utilities esummary endpoint. |
batch_size |
Integer scalar: Number of UIDs per esummary request. NCBI recommends no more than 500 per request. |
verbosity |
Integer scalar: Verbosity level. |
Variants without a parseable protein change (intronic, regulatory, etc.) are silently skipped.
Named list of A3Variant objects. Names use {from}{position}{to}
notation (e.g. "R406W"), with a numeric suffix to disambiguate
duplicates.
EDG
# Requires internet connection and fetches data from ClinVar. ## Not run: # Pathogenic and likely pathogenic variants (default) mapt_variants <- clinvar_variants("MAPT") # All variants regardless of significance mapt_all <- clinvar_variants("MAPT", significance = NULL) ## End(Not run)# Requires internet connection and fetches data from ClinVar. ## Not run: # Pathogenic and likely pathogenic variants (default) mapt_variants <- clinvar_variants("MAPT") # All variants regardless of significance mapt_all <- clinvar_variants("MAPT", significance = NULL) ## End(Not run)
Concatenate character vector to single string for sequence input
concat(x)concat(x)
x |
Character vector to concatenate |
Single character string
EDG
# Concatenate a character vector of single-letter amino acids into a sequence string concat(c("M", "A", "E", "P", "R"))# Concatenate a character vector of single-letter amino acids into a sequence string concat(c("M", "A", "E", "P", "R"))
Create an A3 object from sequence, annotations, and metadata
create_A3( sequence, site = list(), region = list(), ptm = list(), processing = list(), variant = list(), uniprot_id = NULL, description = NULL, reference = NULL, organism = NULL )create_A3( sequence, site = list(), region = list(), ptm = list(), processing = list(), variant = list(), uniprot_id = NULL, description = NULL, reference = NULL, organism = NULL )
sequence |
Character scalar: Amino acid sequence string. |
site |
Named list of site annotations |
region |
Named list of region annotations |
ptm |
Named list of PTM annotations |
processing |
Named list of processing annotations |
variant |
Named list of variant annotations |
uniprot_id |
Optional character scalar: UniProt accession. |
description |
Optional character scalar: Protein description. |
reference |
Optional character scalar: Citation or URL. |
organism |
Optional character scalar: Species name. |
A3 object
EDG
# Minimal: sequence only a <- create_A3("MAEPRQEFEVMEDHAGTYGLGDRK") # With site, region, and PTM annotations a <- create_A3( "MAEPRQEFEVMEDHAGTYGLGDRK", site = list( Active_site = annotation_position(c(5L, 17L), type = "active site") ), region = list( Domain = annotation_range(rbind(c(3L, 10L), c(15L, 22L)), type = "repeat" ) ), ptm = list( Phosphorylation = annotation_position(c(2L, 18L), type = "phosphoserine") ), uniprot_id = "P10636", organism = "Homo sapiens" )# Minimal: sequence only a <- create_A3("MAEPRQEFEVMEDHAGTYGLGDRK") # With site, region, and PTM annotations a <- create_A3( "MAEPRQEFEVMEDHAGTYGLGDRK", site = list( Active_site = annotation_position(c(5L, 17L), type = "active site") ), region = list( Domain = annotation_range(rbind(c(3L, 10L), c(15L, 22L)), type = "repeat" ) ), ptm = list( Phosphorylation = annotation_position(c(2L, 18L), type = "phosphoserine") ), uniprot_id = "P10636", organism = "Homo sapiens" )
Get the coding sequence of a gene
gene2sequence( gene, organism = "hsapiens", biomart = "ensembl", host = "https://www.ensembl.org", verbosity = 1L )gene2sequence( gene, organism = "hsapiens", biomart = "ensembl", host = "https://www.ensembl.org", verbosity = 1L )
gene |
Character vector: One or more HGNC gene symbols. |
organism |
Character scalar: Organism short name (Ensembl convention,
e.g. |
biomart |
Character scalar: BioMart name. |
host |
Character scalar: Host address. |
verbosity |
Integer: Verbosity level. |
data.frame with columns "gene", "ensembl_transcript_id" and "sequence".
EDG
# Requires internet connection and fetches data from Ensembl using biomaRt. ## Not run: mapt_seq <- gene2sequence("MAPT") ## End(Not run)# Requires internet connection and fetches data from Ensembl using biomaRt. ## Not run: mapt_seq <- gene2sequence("MAPT") ## End(Not run)
Get AlphaFold info for a given UniProt ID
get_alphafold(uniprotid)get_alphafold(uniprotid)
uniprotid |
Character: UniProt ID. |
data frame with AlphaFold info.
EDG
# Requires internet connection and fetches data from AlphaFold. ## Not run: get_alphafold("P10636") ## End(Not run)# Requires internet connection and fetches data from AlphaFold. ## Not run: get_alphafold("P10636") ## End(Not run)
Queries the PDBe APIs for secondary structure and ligand binding sites from
experimental PDB structures, converts residue numbers to UniProt canonical
coordinates via SIFTS, and returns named lists of A3Region and A3Site
objects ready for use with create_A3().
pdb_annotations( accession, pdb_id = NULL, pdbe_graph_url = "https://www.ebi.ac.uk/pdbe/graph-api", pdbe_api_url = "https://www.ebi.ac.uk/pdbe/api", verbosity = 1L )pdb_annotations( accession, pdb_id = NULL, pdbe_graph_url = "https://www.ebi.ac.uk/pdbe/graph-api", pdbe_api_url = "https://www.ebi.ac.uk/pdbe/api", verbosity = 1L )
accession |
Character scalar: UniProt accession, e.g. |
pdb_id |
Optional character scalar: Four-character PDB ID (e.g. |
pdbe_graph_url |
Character scalar: PDBe Graph API base URL. |
pdbe_api_url |
Character scalar: PDBe REST API base URL. |
verbosity |
Integer scalar: Verbosity level. |
Structure ranking and SIFTS residue mappings come from the PDBe Graph API
best_structures endpoint. Secondary structure and binding sites are fetched
from the PDBe REST API.
When pdb_id is NULL the top-ranked structure (by PDBe coverage score) is
used automatically.
Named list with two elements:
regionNamed list of A3Region objects for secondary structure
(types: "helix", "betaStrand", "turn").
siteNamed list of A3Site objects for ligand binding sites
(type: "bindingSite").
EDG
# Requires internet connection and fetches data from PDBe. ## Not run: ann <- pdb_annotations("P10636") mapt <- create_A3( sequence = uniprot_sequence("P10636"), region = ann[["region"]], site = ann[["site"]] ) ## End(Not run)# Requires internet connection and fetches data from PDBe. ## Not run: ann <- pdb_annotations("P10636") mapt <- create_A3( sequence = uniprot_sequence("P10636"), region = ann[["region"]], site = ann[["site"]] ) ## End(Not run)
A3 object from JSON fileRead A3 object from JSON file
read_A3json(filepath, verbosity = 1L)read_A3json(filepath, verbosity = 1L)
filepath |
Character: Path to JSON file. |
verbosity |
Integer: if greater than 0, print messages. |
A3 object.
EDG
a3 <- create_A3("MAEPRQEFEVMEDHAGTYGLGDRK", uniprot_id = "P10636") tmp <- tempfile(fileext = ".json") write_A3json(a3, tmp) a3_read <- read_A3json(tmp, verbosity = 0L) unlink(tmp)a3 <- create_A3("MAEPRQEFEVMEDHAGTYGLGDRK", uniprot_id = "P10636") tmp <- tempfile(fileext = ".json") write_A3json(a3, tmp) a3_read <- read_A3json(tmp, verbosity = 0L) unlink(tmp)
Lightweight FASTA-only fetch. Use this when you only need the amino acid
sequence. For full annotations and metadata use uniprot_to_A3().
uniprot_sequence( accession, base_url = "https://rest.uniprot.org/uniprotkb", verbosity = 1L )uniprot_sequence( accession, base_url = "https://rest.uniprot.org/uniprotkb", verbosity = 1L )
accession |
Character scalar: UniProt accession, e.g. |
base_url |
Character scalar: UniProt REST API base URL. |
verbosity |
Integer scalar: Verbosity level. |
Character scalar: amino acid sequence in single-letter code.
EDG
# Requires internet connection and fetches data from UniProt. ## Not run: seq <- uniprot_sequence("P10636") ## End(Not run)# Requires internet connection and fetches data from UniProt. ## Not run: seq <- uniprot_sequence("P10636") ## End(Not run)
Fetches the UniProt JSON for the given accession via the UniProt REST API,
maps all sequence annotations to the A3 schema (see specs/uniprot.md), and
returns a fully populated A3 object.
uniprot_to_A3( accession, base_url = "https://rest.uniprot.org/uniprotkb", verbosity = 1L )uniprot_to_A3( accession, base_url = "https://rest.uniprot.org/uniprotkb", verbosity = 1L )
accession |
Character scalar: UniProt accession, e.g. |
base_url |
Character scalar: UniProt REST API base URL. |
verbosity |
Integer scalar: Verbosity level. |
Feature types not mapped to A3 (mutagenesis, alternative sequences, sequence conflicts, etc.) are silently ignored.
A3 object with sequence, annotations, and UniProt metadata.
EDG
# Requires internet connection and fetches data from UniProt. ## Not run: mapt <- uniprot_to_A3("P10636") ## End(Not run)# Requires internet connection and fetches data from UniProt. ## Not run: mapt <- uniprot_to_A3("P10636") ## End(Not run)
A3 object to JSON fileWrite A3 object to JSON file
write_A3json(x, filepath, overwrite = FALSE)write_A3json(x, filepath, overwrite = FALSE)
x |
|
filepath |
Character: Path to save JSON file. |
overwrite |
Logical: If TRUE, overwrite existing file. |
Invisible x. Writes JSON file as side effect.
EDG
a3 <- create_A3("MAEPRQEFEVMEDHAGTYGLGDRK", uniprot_id = "P10636") tmp <- tempfile(fileext = ".json") write_A3json(a3, tmp) a3_read <- read_A3json(tmp) unlink(tmp)a3 <- create_A3("MAEPRQEFEVMEDHAGTYGLGDRK", uniprot_id = "P10636") tmp <- tempfile(fileext = ".json") write_A3json(a3, tmp) a3_read <- read_A3json(tmp) unlink(tmp)