Genes
This module is a wrapper around the Gene database provided by NCBI. It exposes a simple interface for working with genes in Python. It also provides a way to map almost any kind of gene identifier to its corresponding Entrez ID.
Usage
from orangecontrib.bioinformatics.ncbi.gene import GeneMatcher
# Notice that we have symbols, synonyms and an Ensembl ID here
genes_of_interest = ['CD4', 'ENSG00000205426', "2'-PDE", 'HB-1Y']
# Initialize GeneMatcher. Human is our organism of interest.
gm = GeneMatcher('9606')
# this will automatically start the process of name matching
gm.genes = genes_of_interest
# print results
for gene, gene_obj in zip(genes_of_interest, gm.genes):
    print(f"{gene:<20} {gene_obj}")
We are lucky: every gene name has a unique match in the Gene database. That’s great!
CD4 <Gene symbol=CD4, tax_id=9606, gene_id=920>
ENSG00000205426 <Gene symbol=KRT81, tax_id=9606, gene_id=3887>
2'-PDE <Gene symbol=PDE12, tax_id=9606, gene_id=201626>
HB-1Y <Gene symbol=HMHB1, tax_id=9606, gene_id=57824>
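To make the idea of identifier matching concrete, here is a minimal, self-contained sketch. It is not GeneMatcher's actual implementation. It assumes a tiny hand-written table built only from values shown on this page; `TOY_GENES`, `build_index` and `match` are hypothetical names. An identifier counts as matched only when it maps to exactly one Entrez ID.

```python
from collections import defaultdict

# Toy records assembled from values shown on this page (not a real NCBI dump).
# Ensembl IDs that this page does not show are left as None.
TOY_GENES = [
    {"gene_id": "920", "symbol": "CD4", "synonyms": ["CD4mut"], "ensembl": "ENSG00000010610"},
    {"gene_id": "3887", "symbol": "KRT81", "synonyms": [], "ensembl": "ENSG00000205426"},
    {"gene_id": "201626", "symbol": "PDE12", "synonyms": ["2'-PDE"], "ensembl": None},
    {"gene_id": "57824", "symbol": "HMHB1", "synonyms": ["HB-1Y"], "ensembl": None},
]


def build_index(genes):
    """Map every known identifier (symbol, synonym, Ensembl ID) to a set of Entrez IDs."""
    index = defaultdict(set)
    for gene in genes:
        names = [gene["symbol"], *gene["synonyms"]]
        if gene["ensembl"]:
            names.append(gene["ensembl"])
        for name in names:
            index[name].add(gene["gene_id"])
    return index


def match(identifier, index):
    """Return the Entrez ID for a unique match, otherwise None."""
    hits = index.get(identifier, set())
    return next(iter(hits)) if len(hits) == 1 else None


index = build_index(TOY_GENES)
for name in ["CD4", "ENSG00000205426", "2'-PDE", "HB-1Y", "NOT_A_GENE"]:
    print(f"{name:<20} {match(name, index)}")
```

An identifier that is unknown, or that is shared by several genes, yields `None` in this sketch, so callers can tell it apart from a unique match.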
Now that we have identified our genes, we can explore further. Each gene is automatically populated with additional information from the NCBI database.
g = gm.genes[0]
print(g.synonyms)
['CD4mut']
print(g.db_refs)
{'MIM': '186940', 'HGNC': 'HGNC:1678', 'Ensembl': 'ENSG00000010610'}
print(g.type_of_gene)
protein-coding
print(g.description)
CD4 molecule
# look at all the available Gene attributes
print(g.__slots__)
('species', 'tax_id', 'gene_id', 'symbol', 'synonyms', 'db_refs', 'description', 'locus_tag', 'chromosome',
'map_location', 'type_of_gene', 'symbol_from_nomenclature_authority', 'full_name_from_nomenclature_authority',
'nomenclature_status', 'other_designations', 'modification_date', 'homology_group_id',
'homologs', 'input_identifier')
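Because the attribute names are listed in `__slots__`, they can be iterated to collect an object's fields into a plain dictionary. The sketch below illustrates this with a hypothetical stand-in class, `ToyGene`, rather than the module's own Gene class; `slots_to_dict` is likewise an illustrative helper, not part of the library.

```python
class ToyGene:
    """Hypothetical stand-in for a slots-based gene object."""

    __slots__ = ("symbol", "tax_id", "gene_id", "description")

    def __init__(self, symbol, tax_id, gene_id):
        self.symbol = symbol
        self.tax_id = tax_id
        self.gene_id = gene_id
        # 'description' is deliberately left unset.


def slots_to_dict(obj):
    """Collect every slot into a dict; slots that were never set become None."""
    return {name: getattr(obj, name, None) for name in obj.__slots__}


record = slots_to_dict(ToyGene("CD4", "9606", "920"))
print(record)
```

Passing a default to `getattr` matters here: reading a slot that was never assigned raises `AttributeError`, and the default turns that into `None`.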
We can also access homologs directly from the Gene interface:
print(g.homologs)
{'9913': '407098', '10090': '12504', '10116': '24932'}
print(g.homology_group_id)
513
# Find homolog in mouse.
print(g.homolog_gene(taxonomy_id='10090'))
12504
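The homolog mapping is keyed by taxonomy ID, so a lookup for another organism is a dictionary access. The following sketch works on a plain copy of the mapping printed above and does not call the library; `find_homolog` and `TAXONOMY_NAMES` are hypothetical names, and the common names are added here only for readability.

```python
# Homolog mapping copied from the output above: taxonomy ID -> Entrez gene ID.
homologs = {"9913": "407098", "10090": "12504", "10116": "24932"}

# Common names for the taxonomy IDs involved (an assumption of this sketch,
# not something the module is shown to provide).
TAXONOMY_NAMES = {"9913": "cattle", "10090": "mouse", "10116": "rat"}


def find_homolog(homologs, taxonomy_id):
    """Return the homolog's Entrez ID for the given organism, or None if absent."""
    return homologs.get(str(taxonomy_id))


for tax_id, name in TAXONOMY_NAMES.items():
    print(f"{name:<8} {find_homolog(homologs, tax_id)}")

# Zebrafish (taxonomy ID 7955) is not in the mapping, so the lookup yields None.
print(find_homolog(homologs, 7955))
```

Returning `None` for a missing organism, instead of raising `KeyError`, is a design choice of this sketch; how the library itself handles a missing homolog is not shown here.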