Genes
This module is a wrapper around the Gene database provided by NCBI. It exposes a simple interface for working with genes in Python. Additionally, it provides a way to map almost any kind of gene identifier to its corresponding Entrez Id.
Usage
from orangecontrib.bioinformatics.ncbi.gene import GeneMatcher
# Notice that we have symbols, synonyms and an Ensembl ID here
genes_of_interest = ['CD4', 'ENSG00000205426', "2'-PDE", 'HB-1Y']
# Initialize GeneMatcher. Human is our organism of interest.
gm = GeneMatcher('9606')
# this will automatically start the process of name matching
gm.genes = genes_of_interest
# print results
for gene, gene_obj in zip(genes_of_interest, gm.genes):
    print(f"{gene:<20} {gene_obj}")
We are lucky: every gene name has a unique match in the Gene database.
CD4 <Gene symbol=CD4, tax_id=9606, gene_id=920>
ENSG00000205426 <Gene symbol=KRT81, tax_id=9606, gene_id=3887>
2'-PDE <Gene symbol=PDE12, tax_id=9606, gene_id=201626>
HB-1Y <Gene symbol=HMHB1, tax_id=9606, gene_id=57824>
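To make the idea of a "unique match" concrete, here is a minimal, self-contained sketch of identifier matching. It is not GeneMatcher's actual implementation: the `TOY_GENES` records, `build_index` and `match` are hypothetical names, and the records contain only the values shown on this page. The assumption is that an identifier counts as matched only when it points to exactly one gene.

```python
from collections import defaultdict

# Toy records assembled from the values shown on this page (illustrative only;
# the real module works against the full NCBI Gene database).
TOY_GENES = [
    {"gene_id": "920", "symbol": "CD4", "synonyms": ["CD4mut"],
     "db_refs": {"Ensembl": "ENSG00000010610"}},
    {"gene_id": "3887", "symbol": "KRT81", "synonyms": [],
     "db_refs": {"Ensembl": "ENSG00000205426"}},
    {"gene_id": "201626", "symbol": "PDE12", "synonyms": ["2'-PDE"], "db_refs": {}},
    {"gene_id": "57824", "symbol": "HMHB1", "synonyms": ["HB-1Y"], "db_refs": {}},
]


def build_index(genes):
    """Map every known name (symbol, synonym, external id) to a set of gene ids."""
    index = defaultdict(set)
    for gene in genes:
        names = [gene["symbol"], *gene["synonyms"], *gene["db_refs"].values()]
        for name in names:
            index[name].add(gene["gene_id"])
    return index


def match(identifier, index):
    """Return a gene id only if the identifier maps to exactly one gene, else None."""
    hits = index.get(identifier, set())
    return next(iter(hits)) if len(hits) == 1 else None


index = build_index(TOY_GENES)
queries = ["CD4", "ENSG00000205426", "2'-PDE", "HB-1Y", "not-a-gene"]
matches = {name: match(name, index) for name in queries}
```

In this sketch an identifier shared by two genes would return `None`, just like an unknown one. How the real matcher resolves ambiguous names is not described here.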
Now that we have identified our genes, we can explore further. Genes get automatically populated with additional information from the NCBI database.
g = gm.genes[0]
print(g.synonyms)
['CD4mut']
print(g.db_refs)
{'MIM': '186940', 'HGNC': 'HGNC:1678', 'Ensembl': 'ENSG00000010610'}
print(g.type_of_gene)
protein-coding
print(g.description)
CD4 molecule
# look at all the available Gene attributes
print(g.__slots__)
('species', 'tax_id', 'gene_id', 'symbol', 'synonyms', 'db_refs', 'description', 'locus_tag', 'chromosome',
'map_location', 'type_of_gene', 'symbol_from_nomenclature_authority', 'full_name_from_nomenclature_authority',
'nomenclature_status', 'other_designations', 'modification_date', 'homology_group_id',
'homologs', 'input_identifier')
We can also access homologs directly from the Gene interface:
print(g.homologs)
{'9913': '407098', '10090': '12504', '10116': '24932'}
print(g.homology_group_id)
'513'
# Find homolog in mouse.
print(g.homolog_gene(taxonomy_id='10090'))
'12504'
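The `homologs` attribute shown above is a plain mapping from taxonomy id to gene id, so the lookup can be sketched with an ordinary dictionary. The helper `homolog_for` below is hypothetical, and returning `None` for an organism without a recorded homolog is an assumption of this sketch, not documented behaviour of `homolog_gene`.

```python
# Homolog mapping copied from the CD4 example above: taxonomy id -> gene id.
homologs = {'9913': '407098', '10090': '12504', '10116': '24932'}


def homolog_for(homologs, taxonomy_id):
    """Return the homolog gene id for a taxonomy id, or None if none is recorded."""
    return homologs.get(str(taxonomy_id))


mouse = homolog_for(homologs, '10090')      # mouse is in the mapping
zebrafish = homolog_for(homologs, '7955')   # zebrafish is not in the mapping
```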
Class References
- class Gene
Representation of gene summary.
- __init__(input_identifier=None)
If we want to match a gene to its corresponding Entrez ID, we must provide an input identifier when the class is initialized. This way GeneMatcher will know what to match it against in the Gene database.
- Parameters
input_identifier (str) – This can be any of the following: symbol, synonym, locus tag, other database id, …
- class GeneMatcher
Gene name matching interface.
- __init__(tax_id, progress_callback=None, auto_start=True)
- Parameters
tax_id (str) – Taxonomy id of target organism.
- get_known_genes()
Return Genes with known Entrez ID
- Returns
Genes with unique match
- Return type
list of Gene instances
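As a rough illustration of what "genes with a unique match" means, the sketch below filters a list of gene-like objects. `ToyGene` and `known_genes` are hypothetical stand-ins, and the sketch assumes that an unmatched gene simply has no `gene_id`; the real `Gene` class may represent this differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyGene:
    """Hypothetical stand-in for Gene: only the fields needed for this sketch."""
    input_identifier: str
    gene_id: Optional[str] = None  # assumed to stay None when no unique match exists


def known_genes(genes):
    """Keep only the genes that were matched to an Entrez Id."""
    return [gene for gene in genes if gene.gene_id is not None]


genes = [
    ToyGene("CD4", "920"),
    ToyGene("HB-1Y", "57824"),
    ToyGene("not-a-gene"),  # no match, so gene_id is left as None
]
matched = known_genes(genes)
```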
- match_table_attributes(data_table, run=True, rename=False, source_name='Source ID')
Helper function for gene name matching with Orange.data.Table.
Match table attributes and, if a unique match is found, create a new column attribute for Entrez Id. The attribute name is defined in orangecontrib.bioinformatics.ncbi.gene.config.NCBI_ID.
- Parameters
data_table (Orange.data.Table) – Data table
- Returns
Data table column attributes are populated with Entrez Ids
- Return type
Orange.data.Table
- match_table_column(data_table, column_name, target_column=None)
Helper function for gene name matching with Orange.data.Table.
Given a column of genes, GeneMatcher will try to map genes to their corresponding Entrez Ids.
- Parameters
data_table (Orange.data.Table) – Data table
column_name (str) – Name of the column where gene symbols are located
target_column (StringVariable) – Column where we store Entrez Ids. Defaults to StringVariable(ncbi.gene.config.NCBI_ID)
- Returns
Data table with a column of Gene Ids
- Return type
Orange.data.Table
- to_data_table(selected_genes=None)
Transform GeneMatcher results to an Orange data table.
Optionally, we can provide a list of genes (Entrez Ids). The output table will then contain only the provided genes.
- Parameters
selected_genes (list) – List of Entrez Ids
- Returns
Summary of Gene info in tabular format
- Return type
Orange.data.Table
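The effect of `selected_genes` can be sketched without Orange by building plain rows from gene summaries. This is only an illustration of the filtering described above: `SUMMARY`, `to_rows` and the chosen columns are hypothetical, and the real method returns an Orange.data.Table rather than a list of lists.

```python
# Gene summaries using values shown earlier on this page (illustrative subset).
SUMMARY = [
    {"gene_id": "920", "symbol": "CD4"},
    {"gene_id": "3887", "symbol": "KRT81"},
    {"gene_id": "201626", "symbol": "PDE12"},
    {"gene_id": "57824", "symbol": "HMHB1"},
]


def to_rows(genes, selected_genes=None, columns=("gene_id", "symbol")):
    """Build one row per gene; keep only selected Entrez Ids when a list is given."""
    selected = None if selected_genes is None else set(selected_genes)
    return [
        [gene[column] for column in columns]
        for gene in genes
        if selected is None or gene["gene_id"] in selected
    ]


all_rows = to_rows(SUMMARY)
subset = to_rows(SUMMARY, selected_genes=["920", "57824"])
```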