KEGG - Kyoto Encyclopedia of Genes and Genomes

kegg is a Python module for accessing KEGG (Kyoto Encyclopedia of Genes and Genomes) through its web services.

Note

This module requires the slumber and requests packages.

>>> from orangecontrib.bioinformatics.kegg import KEGGGenome
>>> # Create a KEGG Genome database interface
>>> genome = KEGGGenome()
>>> # List all available entry ids
>>> keys = list(genome.keys())
>>> print(keys[0])
T01001
>>> # Retrieve the entry for the key.
>>> entry = genome[keys[0]]
>>> print(entry.entry_key)
T01001
>>> print(entry.definition)
Homo sapiens (human)
>>> print(entry)  
ENTRY       T01001            Complete  Genome
NAME        hsa, HUMAN, 9606
DEFINITION  Homo sapiens (human)
...

The Organism class is a convenient starting point for organism-specific databases.

>>> from orangecontrib.bioinformatics.kegg import Organism
>>> organism = Organism("Homo sapiens")  # searches for the organism by name
>>> print(organism.org_code)  # prints the KEGG organism code
hsa
>>> genes = organism.genes  # get the genes database for the organism
>>> gene_ids = list(genes.keys()) # KEGG gene identifiers
>>> entry = genes["hsa:672"]
>>> print(entry.definition) 
(RefSeq) BRCA1, DNA repair associated
>>> # print the entry in DBGET database format.
>>> print(entry) 
ENTRY       672               CDS       T01001
NAME        BRCA1, BRCAI, BRCC1, BROVCA1, FANCS, IRIS, PNCA4, PPP1R53, PSCP, RNF53
DEFINITION  ...

class Organism(org)[source]

A convenience class for retrieving information regarding an organism in the KEGG Genes database.

Parameters

org (str) – KEGG organism code (e.g. ‘hsa’, ‘sce’). Can also be a descriptive name (e.g. ‘yeast’, ‘homo sapiens’), in which case the organism code is looked up using the KEGG find API.

See also

organism_name_search

Search KEGG for an organism code

property org

KEGG organism code.

property genes

A Genes database instance for this organism.

gene_aliases()[source]

Return a list of sets of equal genes (synonyms) in KEGG for this organism.

Note

This only includes ‘ncbi-geneid’ and ‘ncbi-proteinid’ records from the KEGG Genes DBLINKS entries.
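The grouping described above can be pictured with a small sketch. It is not the library's implementation: the function name alias_sets, the shape of the input mapping and the toy identifiers are all assumptions made for illustration. It only shows the idea of building one set of synonyms per gene from the two DBLINKS record types named in the note.

```python
def alias_sets(entries, link_dbs=("ncbi-geneid", "ncbi-proteinid")):
    """Return one set of equivalent identifiers per gene.

    `entries` is a hypothetical mapping of gene key -> {db name: [ids]},
    standing in for the DBLINKS section of each gene entry.
    """
    sets = []
    for key, dblinks in entries.items():
        aliases = {key}
        for db in link_dbs:
            # Only the selected link databases contribute aliases.
            for ident in dblinks.get(db, ()):
                aliases.add("%s:%s" % (db, ident))
        sets.append(aliases)
    return sets


# Toy data; the keys and ids are made up for this example.
toy_entries = {
    "org:g1": {"ncbi-geneid": ["101"], "ncbi-proteinid": ["P1"], "uniprot": ["U1"]},
    "org:g2": {"ncbi-geneid": ["102"]},
}
toy_alias_sets = alias_sets(toy_entries)
```

Links from other databases (the "uniprot" record in the toy data) are ignored, which mirrors the restriction stated in the note.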

pathways(with_ids=None)[source]

Return a list of all pathways for this organism.

list_pathways()[source]

List all pathways for this organism.

get_enriched_pathways(genes, reference=None, prob=<orangecontrib.bioinformatics.utils.statistics.Binomial object>, callback=None)[source]

Return a dictionary with enriched pathway ids as keys and (list_of_genes, p_value, num_of_reference_genes) tuples as values.
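To make the shape of the returned dictionary concrete, here is a minimal sketch of a binomial enrichment test. It is an illustration under assumptions, not the library's code: the helper names, the pathway_genes mapping and the toy gene ids are hypothetical, the p-value is taken to be the upper binomial tail, and num_of_reference_genes is assumed to mean the number of reference genes in the pathway.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enriched_pathways(genes, reference, pathway_genes):
    """Map pathway id -> (matched_genes, p_value, num_reference_genes)."""
    reference = set(reference)
    if not reference:
        return {}
    genes = set(genes) & reference
    result = {}
    for pathway_id, members in pathway_genes.items():
        in_reference = set(members) & reference
        matched = sorted(genes & in_reference)
        if not matched:
            continue
        # Chance that a randomly drawn reference gene belongs to this pathway.
        p = len(in_reference) / len(reference)
        p_value = binomial_tail(len(matched), len(genes), p)
        result[pathway_id] = (matched, p_value, len(in_reference))
    return result


# Toy data; pathway and gene ids are made up for this example.
toy_pathways = {"pw1": ["g1", "g2", "g3"], "pw2": ["g4", "g9"]}
toy_reference = ["g%d" % i for i in range(1, 11)]
toy_result = enriched_pathways(["g1", "g2", "g4"], toy_reference, toy_pathways)
```

Pathways with no matching query gene are left out of the result, so every value in the dictionary has a non-empty gene list.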

get_pathways_by_genes(gene_ids)[source]

Pathways that include all genes in gene_ids.
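The "include all genes" condition is a subset test. A minimal sketch, assuming a hypothetical in-memory mapping of pathway id to member genes rather than the library's actual data access:

```python
def pathways_with_all_genes(gene_ids, pathway_genes):
    """Return ids of pathways whose member set contains every gene in gene_ids."""
    wanted = set(gene_ids)
    return sorted(
        pathway_id
        for pathway_id, members in pathway_genes.items()
        if wanted <= set(members)  # subset test: all wanted genes are members
    )


# Toy data; ids are made up for this example.
toy_members = {"pw1": ["g1", "g2", "g3"], "pw2": ["g1", "g4"], "pw3": ["g2"]}
both = pathways_with_all_genes(["g1", "g2"], toy_members)
only_g1 = pathways_with_all_genes(["g1"], toy_members)
```

A pathway that contains only some of the requested genes (pw2 and pw3 for the first query) is excluded.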

KEGGOrganism

alias of orangecontrib.bioinformatics.kegg.Organism

Search for an organism by name and return its KEGG organism code.

pathways(org)[source]

Return a list of all KEGG pathways for a KEGG organism code org.

from_taxid(taxid)[source]

Return a KEGG organism code for an NCBI Taxonomy id string taxid.

to_taxid(name)[source]

Return an NCBI Taxonomy id for a given KEGG organism name.

DBEntry (entry)

The entry.DBEntry class represents a DBGET database entry. The individual KEGG database interfaces below provide their own specializations of this base class.

class DBEntry(text=None)[source]

Bases: object

A DBGET entry object.

property entry_key

Primary entry key used for identifying the entry.

parse(text)[source]

Parse a text string containing a formatted DBGET entry.

format(section_indent=12)[source]

Return a DBGET formatted string representation.
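The DBGET flat-file layout that parse and format deal with can be sketched in a few lines. This is a simplified stand-in, not the DBEntry implementation: the function names are hypothetical, subsections are not modelled (indented lines are folded into the open section), and the section title is assumed to occupy the first section_indent columns, as in the genome entry printed at the top of this page.

```python
def parse_sections(text, section_indent=12):
    """Split a DBGET entry into an ordered list of (title, body) pairs."""
    sections = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("///"):
            continue  # skip blank lines and the DBGET entry terminator
        if not line[0].isspace():
            # A title in the first columns opens a new section.
            title = line[:section_indent].strip()
            sections.append((title, [line[section_indent:].strip()]))
        elif sections:
            # Indented line: continuation of the currently open section.
            sections[-1][1].append(line.strip())
    return [(title, "\n".join(lines)) for title, lines in sections]


def format_sections(sections, section_indent=12):
    """Inverse of parse_sections for entries without subsections."""
    out = []
    for title, body in sections:
        lines = body.split("\n")
        out.append(title.ljust(section_indent) + lines[0])
        out.extend(" " * section_indent + rest for rest in lines[1:])
    return "\n".join(out)


sample = "\n".join([
    "ENTRY".ljust(12) + "T01001            Complete  Genome",
    "NAME".ljust(12) + "hsa, HUMAN, 9606",
    "DEFINITION".ljust(12) + "Homo sapiens (human)",
])
sample_sections = parse_sections(sample)
```

For entries of this simple shape, format_sections(parse_sections(text)) reproduces the input text.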

KEGG Databases interface (databases)

class DBDataBase(**kwargs)[source]

Bases: object

Base class for a DBGET database interface.

ENTRY_TYPE

alias of orangecontrib.bioinformatics.kegg.entry.DBEntry

DB = None

A database name/abbreviation (e.g. ‘pathway’). Needs to be set in a subclass or in the object instance’s constructor before calling the base __init__.

iterkeys()[source]

Return an iterator over the keys.

iteritems()[source]

Return an iterator over the items.

itervalues()[source]

Return an iterator over all DBDataBase.ENTRY_TYPE instances.

keys()[source]

Return an iterator over all database keys. These are unique KEGG identifiers that can be used to query the database.

values()[source]

Return an iterator over all DBDataBase.ENTRY_TYPE instances.

items()[source]

Return an iterator over all (key, DBDataBase.ENTRY_TYPE) tuples.

get(key, default=None)[source]

Return a DBDataBase.ENTRY_TYPE instance for the key. Raises KeyError if not found.

get_text(key)[source]

Return the database entry for key as plain text.

get_entry(key)[source]

Return the database entry for key as an instance of ENTRY_TYPE.

find(name)[source]

Find name using the KEGG find API.

pre_cache(keys=None, batch_size=10, progress_callback=None)[source]

Retrieve all the entries for keys and cache them locally for faster subsequent retrieval. If keys is None then all entries will be retrieved.

batch_get(keys)[source]

Batch retrieve all entries for keys. This can be significantly faster than getting each entry separately, especially if the entries are not yet cached.
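The batching idea behind pre_cache and batch_get can be sketched without any network access. This is only an illustration: fetch_all, iter_batches and the fetch_batch callable are hypothetical names, and reporting progress as a percentage is an assumption about what a progress callback might receive.

```python
def iter_batches(keys, batch_size=10):
    """Yield consecutive slices of at most batch_size keys."""
    keys = list(keys)
    for start in range(0, len(keys), batch_size):
        yield keys[start:start + batch_size]


def fetch_all(keys, fetch_batch, batch_size=10, progress_callback=None):
    """Retrieve entries batch by batch and collect them in a dict.

    fetch_batch is a caller-supplied function taking a list of keys and
    returning a {key: entry} mapping (one request per batch).
    """
    keys = list(keys)
    results = {}
    for batch in iter_batches(keys, batch_size):
        results.update(fetch_batch(batch))
        if progress_callback is not None:
            progress_callback(100.0 * len(results) / len(keys))
    return results


# Toy usage with a fake fetcher that records how it was called.
calls = []
progress = []


def fake_fetch(batch):
    calls.append(list(batch))
    return {key: "entry for %s" % key for key in batch}


toy_keys = ["k%02d" % i for i in range(25)]
cached = fetch_all(toy_keys, fake_fetch, batch_size=10, progress_callback=progress.append)
```

With 25 keys and a batch size of 10, the fetcher is called three times instead of 25, which is where the speed-up comes from when each call is a web request.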

class GenomeEntry(text)[source]

Bases: orangecontrib.bioinformatics.kegg.entry.DBEntry

Entry for a KEGG Genome database.

property organism_code

A three or four letter KEGG organism code (e.g. ‘hsa’, ‘sce’, …)

property taxid

Organism NCBI taxonomy id.

property annotation

ANNOTATION

property chromosome

CHROMOSOME

property comment

COMMENT

property data_source

DATA_SOURCE

property definition

DEFINITION

property disease

DISEASE

property entry

ENTRY

property keywords

KEYWORDS

property name

NAME

property original_db

ORIGINAL_DB

property plasmid

PLASMID

property reference

REFERENCE

property statistics

STATISTICS

property taxonomy

TAXONOMY

class Genome[source]

Bases: orangecontrib.bioinformatics.kegg.databases.DBDataBase

An interface to the KEGG GENOME database.

ENTRY_TYPE

alias of GenomeEntry

org_code_to_entry_key(code)[source]

Map an organism code (‘hsa’, ‘sce’, …) to the corresponding KEGG identifier (‘T’ followed by a five-digit number).
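The identifier format mentioned above ('T' plus five digits, as in the T01001 key from the introductory example) can be checked with a small helper. The function name is hypothetical and not part of the module; the actual code-to-key mapping requires a lookup in the Genome database, which this sketch does not attempt.

```python
import re

# 'T' followed by exactly five digits, e.g. "T01001".
_GENOME_KEY = re.compile(r"T\d{5}")


def is_genome_entry_key(key):
    """Return True if key has the shape of a KEGG GENOME entry key."""
    return _GENOME_KEY.fullmatch(key) is not None
```

Such a check can help to tell apart organism codes such as "hsa" from entry keys before querying the database.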

search(string, relevance=False)[source]

Search the genome database for string using bfind.

class GeneEntry(text=None)[source]

Bases: orangecontrib.bioinformatics.kegg.entry.DBEntry

property aaseq

AASEQ

property brite

BRITE

property class_

CLASS

DBLINKS

property definition

DEFINITION

property disease

DISEASE

property drug_target

DRUG_TARGET

property entry

ENTRY

property module

MODULE

property motif

MOTIF

property name

NAME

property ntseq

NTSEQ

property organism

ORGANISM

property orthology

ORTHOLOGY

property pathway

PATHWAY

property position

POSITION

property structure

STRUCTURE

class Genes(org_code)[source]

Bases: orangecontrib.bioinformatics.kegg.databases.DBDataBase

Interface to the KEGG Genes database.

Parameters

org_code (str) – KEGG organism code (e.g. ‘hsa’).

ENTRY_TYPE

alias of GeneEntry

class CompoundEntry(text=None)[source]

Bases: orangecontrib.bioinformatics.kegg.entry.DBEntry

property atom

ATOM

property bond

BOND

property brite

BRITE

property comment

COMMENT

DBLINKS

property entry

ENTRY

property enzyme

ENZYME

property exact_mass

EXACT_MASS

property formula

FORMULA

property mol_weight

MOL_WEIGHT

property name

NAME

property pathway

PATHWAY

property reaction

REACTION

property reference

REFERENCE

property remark

REMARK

class Compound[source]

Bases: orangecontrib.bioinformatics.kegg.databases.DBDataBase

ENTRY_TYPE

alias of CompoundEntry

class ReactionEntry(text=None)[source]

Bases: orangecontrib.bioinformatics.kegg.entry.DBEntry

property definition

DEFINITION

property entry

ENTRY

property enzyme

ENZYME

property equation

EQUATION

property name

NAME

class Reaction[source]

Bases: orangecontrib.bioinformatics.kegg.databases.DBDataBase

ENTRY_TYPE

alias of ReactionEntry

class EnzymeEntry(text=None)[source]

Bases: orangecontrib.bioinformatics.kegg.entry.DBEntry

property all_reac

ALL_REAC

property class_

CLASS

property comment

COMMENT

DBLINKS

property entry

ENTRY

property genes

GENES

property name

NAME

property orthology

ORTHOLOGY

property pathway

PATHWAY

property product

PRODUCT

property reaction

REACTION

property reference

REFERENCE

property substrate

SUBSTRATE

property sysname

SYSNAME

class Enzyme[source]

Bases: orangecontrib.bioinformatics.kegg.databases.DBDataBase

ENTRY_TYPE

alias of EnzymeEntry

class PathwayEntry(text=None)[source]

Bases: orangecontrib.bioinformatics.kegg.entry.DBEntry

property class_

CLASS

property compound

COMPOUND

DBLINKS

property description

DESCRIPTION

property disease

DISEASE

property drug

DRUG

property entry

ENTRY

property enzyme

ENZYME

property ko_pathway

KO_PATHWAY

property module

MODULE

property name

NAME

property organism

ORGANISM

property pathway_map

PATHWAY_MAP

property reference

REFERENCE

property rel_pathway

REL_PATHWAY

class Pathway(prefix='map')[source]

Bases: orangecontrib.bioinformatics.kegg.databases.DBDataBase

KEGG Pathway database

Parameters

prefix (str) – KEGG Organism code (‘hsa’, …) or ‘map’, ‘ko’, ‘ec’ or ‘rn’

ENTRY_TYPE

alias of PathwayEntry

KEGG Pathway (pathway)

class Pathway(pathway_id, local_cache=None, connection=None)[source]

Bases: object

Class representing a KEGG Pathway (parsed from a “kgml” file)

Parameters

pathway_id (str) – A KEGG pathway id (e.g. ‘path:hsa05130’)

property name

Pathway name/id (e.g. “path:hsa05130”)

property org

Pathway organism code (e.g. ‘hsa’)

property number

Pathway number as a string (e.g. ‘05130’)
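The relationship between the name, org and number properties can be illustrated by splitting a pathway id by hand. This is a sketch, not how the class obtains these values (it reads them from the KGML file). The helper name is hypothetical, and the pattern assumes a lowercase prefix of two to four letters followed by a five-digit number.

```python
import re

# Optional "path:" prefix, then organism code or 'map'/'ko'/'ec'/'rn', then the number.
_PATHWAY_ID = re.compile(r"(?:path:)?([a-z]{2,4})(\d{5})")


def split_pathway_id(pathway_id):
    """Return (org, number) for ids such as 'path:hsa05130' or 'hsa05130'."""
    match = _PATHWAY_ID.fullmatch(pathway_id)
    if match is None:
        raise ValueError("not a recognised pathway id: %r" % (pathway_id,))
    return match.group(1), match.group(2)


org, number = split_pathway_id("path:hsa05130")
```

The number is kept as a string so that leading zeros survive, matching the ‘05130’ example above.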

property title

Pathway title string.

property image

URL of the pathway image.

URL to a pathway on the KEGG web site.

get_image()[source]

Return a local filesystem path to an image of the pathway. The image will be downloaded if not already cached.

classmethod list(organism)[source]

List all pathways for KEGG organism code organism.

Utilities

class DBGETEntryParser[source]

A DBGET entry parser (inspired by xml.dom.pulldom).

Example

>>> from io import StringIO
>>> stream = StringIO(
...     "ENTRY foo\n"
...     "NAME  foo's name\n"
...     "  BAR A subsection of 'NAME'\n"
... )
>>> parser = DBGETEntryParser()
>>> for event, title, contents_part in parser.parse(stream):
...    print(parser.EVENTS[event], title, repr(contents_part))
...
ENTRY_START None None
SECTION_START ENTRY 'foo\n'
SECTION_END ENTRY None
SECTION_START NAME "foo's name\n"
SUBSECTION_START BAR "A subsection of 'NAME'\n"
SUBSECTION_END BAR None
SECTION_END NAME None
ENTRY_END None None

ENTRY_END = 1

Entry end event

ENTRY_START = 0

Entry start event

SECTION_END = 3

Section end event

SECTION_START = 2

Section start event

SUBSECTION_END = 5

Subsection end event

SUBSECTION_START = 4

Subsection start event

TEXT = 6

Text element event
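A pull-style event stream like the one produced by the parser is usually folded into a data structure by the consumer. The sketch below shows one way to do that, using the event codes listed above and the event sequence from the example output. The function collect_sections is hypothetical, and treating TEXT events as continuation text for the innermost open (sub)section is an assumption.

```python
ENTRY_START, ENTRY_END = 0, 1
SECTION_START, SECTION_END = 2, 3
SUBSECTION_START, SUBSECTION_END = 4, 5
TEXT = 6


def collect_sections(events):
    """Fold (event, title, text) triples into {section: {text, subsections}}."""
    entry = {}
    section = None  # dict of the currently open section
    target = None   # list that receives text fragments
    for event, title, text in events:
        if event == SECTION_START:
            section = entry[title] = {"text": [text or ""], "subsections": {}}
            target = section["text"]
        elif event == SUBSECTION_START and section is not None:
            target = section["subsections"][title] = [text or ""]
        elif event == SUBSECTION_END and section is not None:
            target = section["text"]  # back to the enclosing section
        elif event == TEXT and target is not None:
            target.append(text)
        elif event in (SECTION_END, ENTRY_END):
            section = target = None
    return {
        title: {
            "text": "".join(sec["text"]),
            "subsections": {k: "".join(v) for k, v in sec["subsections"].items()},
        }
        for title, sec in entry.items()
    }


# The event sequence printed in the example above, written out by hand.
example_events = [
    (ENTRY_START, None, None),
    (SECTION_START, "ENTRY", "foo\n"),
    (SECTION_END, "ENTRY", None),
    (SECTION_START, "NAME", "foo's name\n"),
    (SUBSECTION_START, "BAR", "A subsection of 'NAME'\n"),
    (SUBSECTION_END, "BAR", None),
    (SECTION_END, "NAME", None),
    (ENTRY_END, None, None),
]
collected = collect_sections(example_events)
```

Keeping the consumer separate from the parser means the same event stream can feed other structures, for instance a flat list of section titles.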