Genes
This module wraps the Gene database provided by NCBI and exposes a simple interface for working with genes in Python. It also provides a way to map almost any kind of gene identifier to its corresponding Entrez ID.
Usage
from orangecontrib.bioinformatics.ncbi.gene import GeneMatcher
# Notice that the list mixes symbols, synonyms and an Ensembl ID
genes_of_interest = ['CD4', 'ENSG00000205426', "2'-PDE", 'HB-1Y']
# Initialize GeneMatcher. Human is our organism of interest.
gm = GeneMatcher('9606')
# this will automatically start the process of name matching
gm.genes = genes_of_interest
# print results
for gene, gene_obj in zip(genes_of_interest, gm.genes):
    print(f"{gene:<20} {gene_obj}")
We are lucky: all of the gene names have a unique match in the Gene database. That’s great!
CD4 <Gene symbol=CD4, tax_id=9606, gene_id=920>
ENSG00000205426 <Gene symbol=KRT81, tax_id=9606, gene_id=3887>
2'-PDE <Gene symbol=PDE12, tax_id=9606, gene_id=201626>
HB-1Y <Gene symbol=HMHB1, tax_id=9606, gene_id=57824>
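To make the idea of a "unique match" concrete, here is a minimal, self-contained sketch that does not use the library. The `build_index` and `match` helpers and the `RECORDS` list are hypothetical and only illustrate the concept; the values are copied from the examples in this section, and storing `2'-PDE` and `HB-1Y` as synonyms is an assumption made for illustration. How GeneMatcher actually performs the lookup is not shown here.

```python
from collections import defaultdict

# Toy records (hypothetical structure); values copied from this section.
RECORDS = [
    {"gene_id": "920", "symbol": "CD4", "synonyms": ["CD4mut"],
     "db_refs": {"Ensembl": "ENSG00000010610"}},
    {"gene_id": "3887", "symbol": "KRT81", "synonyms": [],
     "db_refs": {"Ensembl": "ENSG00000205426"}},
    {"gene_id": "201626", "symbol": "PDE12", "synonyms": ["2'-PDE"],
     "db_refs": {}},
    {"gene_id": "57824", "symbol": "HMHB1", "synonyms": ["HB-1Y"],
     "db_refs": {}},
]


def build_index(records):
    """Map every known identifier to the set of Entrez IDs that use it."""
    index = defaultdict(set)
    for rec in records:
        keys = [rec["symbol"], *rec["synonyms"], *rec["db_refs"].values()]
        for key in keys:
            index[key].add(rec["gene_id"])
    return index


def match(identifier, index):
    """Return the Entrez ID only if exactly one gene matches, else None."""
    hits = index.get(identifier, set())
    return next(iter(hits)) if len(hits) == 1 else None


INDEX = build_index(RECORDS)
for name in ["CD4", "ENSG00000205426", "2'-PDE", "HB-1Y", "unknown"]:
    print(f"{name:<20} {match(name, INDEX)}")
```

In this sketch an identifier shared by two genes is treated the same as an unknown one: `match` returns `None` because the match is not unique.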
Now that we have identified our genes, we can explore further. Genes are automatically populated with additional information from the NCBI database.
g = gm.genes[0]
print(g.synonyms)
['CD4mut']
print(g.db_refs)
{'MIM': '186940', 'HGNC': 'HGNC:1678', 'Ensembl': 'ENSG00000010610'}
print(g.type_of_gene)
protein-coding
print(g.description)
CD4 molecule
# look at all the available Gene attributes
print(g.__slots__)
('species', 'tax_id', 'gene_id', 'symbol', 'synonyms', 'db_refs', 'description', 'locus_tag', 'chromosome',
'map_location', 'type_of_gene', 'symbol_from_nomenclature_authority', 'full_name_from_nomenclature_authority',
'nomenclature_status', 'other_designations', 'modification_date', 'homology_group_id',
'homologs', 'input_identifier')
We can also access homologs directly from the Gene interface:
print(g.homologs)
{'9913': '407098', '10090': '12504', '10116': '24932'}
print(g.homology_group_id)
513
# Find homolog in mouse.
print(g.homolog_gene(taxonomy_id='10090'))
12504
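The homolog lookup can be pictured as a dictionary keyed by taxonomy id. The sketch below is a hypothetical stand-in, not the library's implementation: `homolog_for` is an invented helper, and the mapping is copied from the `g.homologs` output above.

```python
from typing import Dict, Optional

# Copied from the printed `g.homologs`: taxonomy id -> Entrez ID of the homolog.
HOMOLOGS: Dict[str, str] = {"9913": "407098", "10090": "12504", "10116": "24932"}


def homolog_for(homologs: Dict[str, str], taxonomy_id: str) -> Optional[str]:
    """Return the homolog's Entrez ID for a taxonomy id, or None if absent."""
    return homologs.get(taxonomy_id)


print(homolog_for(HOMOLOGS, "10090"))  # 12504 (mouse)
print(homolog_for(HOMOLOGS, "7955"))   # None (no entry for this taxonomy id)
```

Note that the keys are strings, so in this sketch the integer `10090` would not find an entry.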
Class References
- class Gene
Representation of gene summary.
- __init__(input_identifier=None)
If we want to match a gene to its corresponding Entrez ID, we must provide an input identifier when the class is initialized. This way GeneMatcher will know what to match against in the Gene database.
- Parameters
input_identifier (str) – This can be any of the following: symbol, synonym, locus tag, other database id, …
- class GeneMatcher
Gene name matching interface.
- __init__(tax_id, progress_callback=None, auto_start=True)
- Parameters
tax_id (str) – Taxonomy id of target organism.
- get_known_genes()
Return Genes with known Entrez ID
- Returns
Genes with unique match
- Return type
list of Gene instances
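As a rough illustration of what "genes with a known Entrez ID" means, the sketch below filters a list of toy gene objects. `ToyGene` and `known_genes` are hypothetical names, and representing an unmatched gene by `gene_id=None` is an assumption of this sketch, not a documented property of the Gene class.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyGene:
    """Hypothetical stand-in for a gene summary."""
    input_identifier: str
    gene_id: Optional[str] = None  # assumed to stay None when no unique match exists


def known_genes(genes: List[ToyGene]) -> List[ToyGene]:
    """Keep only the genes that received an Entrez ID."""
    return [gene for gene in genes if gene.gene_id is not None]


genes = [ToyGene("CD4", "920"), ToyGene("not-a-gene"), ToyGene("HB-1Y", "57824")]
print([gene.input_identifier for gene in known_genes(genes)])  # ['CD4', 'HB-1Y']
```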
- match_table_attributes(data_table, run=True, rename=False, source_name='Source ID')
Helper function for gene name matching with Orange.data.Table.
Match table attributes and, if a unique match is found, create a new column attribute for the Entrez ID. The attribute name is defined in orangecontrib.bioinformatics.ncbi.gene.config.NCBI_ID.
- Parameters
data_table (Orange.data.Table) – Data table
- Returns
Data table column attributes are populated with Entrez IDs
- Return type
Orange.data.Table
- match_table_column(data_table, column_name, target_column=None)
Helper function for gene name matching with Orange.data.Table.
Given a column of genes, GeneMatcher will try to map genes to their corresponding Entrez IDs.
- Parameters
data_table (Orange.data.Table) – Data table
column_name (str) – Name of the column where gene symbols are located
target_column (StringVariable) – Column where we store Entrez IDs. Defaults to StringVariable(ncbi.gene.config.NCBI_ID)
- Returns
Data table with a column of Gene IDs
- Return type
Orange.data.Table
- to_data_table(selected_genes=None)
Transform GeneMatcher results to an Orange data table.
Optionally, we can provide a list of genes (Entrez IDs). The output table will then contain only the provided genes.
- Parameters
selected_genes (list) – List of Entrez Ids
- Returns
Summary of Gene info in tabular format
- Return type
Orange.data.Table
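The effect of `selected_genes` can be sketched with plain Python rows in place of an Orange table. Everything in this example is hypothetical: `summary_rows` is an invented helper, the column names `Entrez ID` and `Symbol` are assumptions, and the real method returns an Orange.data.Table rather than a list of dictionaries.

```python
from typing import Dict, List, Optional

# Toy summary rows; gene ids and symbols are taken from the usage example.
ROWS: List[Dict[str, str]] = [
    {"Entrez ID": "920", "Symbol": "CD4"},
    {"Entrez ID": "3887", "Symbol": "KRT81"},
    {"Entrez ID": "201626", "Symbol": "PDE12"},
    {"Entrez ID": "57824", "Symbol": "HMHB1"},
]


def summary_rows(rows: List[Dict[str, str]],
                 selected_genes: Optional[List[str]] = None) -> List[Dict[str, str]]:
    """Return all rows, or only those whose Entrez ID is in selected_genes."""
    if selected_genes is None:
        return list(rows)
    wanted = set(selected_genes)
    return [row for row in rows if row["Entrez ID"] in wanted]


print(len(summary_rows(ROWS)))  # 4
print([row["Symbol"] for row in summary_rows(ROWS, ["920", "57824"])])  # ['CD4', 'HMHB1']
```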