Gene sets
This module can load either gene sets distributed with Orange or custom gene sets in the GMT file format.
Loading gene sets
- list_all(**kwargs)
Returns available gene sets from the server files repository.
- Parameters
kwargs –
organism (str) – Taxonomy ID (NCBI Taxonomy database)
- Return type
list of (hierarchy, organism)
Example
- The available gene set collections can be listed with:
>>> list_all(organism='10090')
- load_gene_sets(hierarchy, tax_id)
Initialize gene sets from a given hierarchy.
- Parameters
hierarchy (tuple) – Gene set hierarchy, for example ('GO', 'biological_process').
tax_id (str) – Taxonomy ID (NCBI Taxonomy database).
- Return type
GeneSets
Example
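- Gene sets provided with Orange are organized hierarchically. Each entry returned by list_all is a (hierarchy, organism) pair, which can be unpacked into the two arguments of load_gene_sets:
>>> list_of_genesets = list_all(organism='10090')
>>> list_of_genesets
[(('KEGG', 'Pathways'), '10090'),
 (('KEGG', 'pathways'), '10090'),
 (('GO', 'biological_process'), '10090'),
 (('GO', 'molecular_function'), '10090'),
 (('GO', 'cellular_component'), '10090')]
>>> hierarchy, tax_id = list_of_genesets[0]
>>> load_gene_sets(hierarchy, tax_id)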
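Because the listing is a plain list of (hierarchy, organism) tuples, it can be filtered or grouped with ordinary Python before anything is loaded. The following is a minimal sketch that groups entries by the first hierarchy level. It works on a hard-coded copy of the listing shown above rather than calling list_all, and the names listing and by_source are illustrative only.

```python
from collections import defaultdict

# Hard-coded copy of the example listing; in practice this would come
# from list_all(organism='10090').
listing = [
    (("KEGG", "Pathways"), "10090"),
    (("KEGG", "pathways"), "10090"),
    (("GO", "biological_process"), "10090"),
    (("GO", "molecular_function"), "10090"),
    (("GO", "cellular_component"), "10090"),
]

# Group the (hierarchy, organism) pairs by the top hierarchy level.
by_source = defaultdict(list)
for hierarchy, organism in listing:
    by_source[hierarchy[0]].append((hierarchy, organism))

# All Gene Ontology entries, in their original order.
go_entries = by_source["GO"]
```

Each entry in go_entries keeps the (hierarchy, organism) shape, so it can still be unpacked into the two arguments of load_gene_sets.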
Supporting functionality
- class GeneSets(sets=None)
Bases: set
A collection of gene sets: contains GeneSet objects.
- static from_gmt_file_format(file_path)
Load a GeneSets object from a GMT file.
- Parameters
file_path – path to a file on local disk
- Return type
GeneSets
- to_gmt_file_format(file_path)
Write the gene sets to a file in the GMT format, a tab-delimited file format that describes gene sets.
In the GMT format, each row represents a gene set.
Columns: gs_id, gmt_description, Gene, Gene, Gene, …
gmt_description: 'gs_id', 'hierarchy', 'organism', 'name', 'genes', 'description', 'link'
- Parameters
file_path – path where the file will be created
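The row layout described above can be illustrated without Orange. The following is a minimal sketch of writing and reading generic GMT rows with the standard library only. The helpers write_gmt and read_gmt and the example gene sets are hypothetical, and the description column is treated as an opaque string, since how GeneSets packs its metadata fields into gmt_description is not shown here.

```python
import os
import tempfile


def write_gmt(path, gene_sets):
    """Write (gs_id, description, genes) triples, one tab-delimited row each."""
    with open(path, "w", encoding="utf-8") as handle:
        for gs_id, description, genes in gene_sets:
            handle.write("\t".join([gs_id, description, *genes]) + "\n")


def read_gmt(path):
    """Parse a GMT file back into (gs_id, description, genes) triples."""
    gene_sets = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed rows
            gene_sets.append((fields[0], fields[1], fields[2:]))
    return gene_sets


# Hypothetical gene sets used only to exercise the round trip.
example_sets = [
    ("set_a", "first hypothetical set", ["GENE1", "GENE2", "GENE3"]),
    ("set_b", "second hypothetical set", ["GENE2", "GENE4"]),
]

with tempfile.TemporaryDirectory() as tmp_dir:
    gmt_path = os.path.join(tmp_dir, "example.gmt")
    write_gmt(gmt_path, example_sets)
    round_tripped = read_gmt(gmt_path)
```

Here round_tripped holds the same triples as example_sets, which shows that the first two columns identify and describe the set and every remaining column is one gene.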
Helper functions to work with serverfiles
- filename(hierarchy, organism)
Obtain a filename for given hierarchy and organism.
- Parameters
hierarchy – GeneSet hierarchy, for example ('GO', 'biological_process')
organism – Taxonomy ID
- Returns
Filename for given hierarchy and organism
Example
>>> filename(('CustomSet', 'subsets'), '6500')
'CustomSet-subsets-6500.gmt'
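The example output suggests a naming scheme in which the hierarchy levels and the taxonomy ID are joined with hyphens and given a .gmt extension. The following is a minimal sketch of that scheme and its inverse, inferred only from the example above. The functions sketch_filename and sketch_parse_filename are hypothetical and are not the module's implementation.

```python
def sketch_filename(hierarchy, organism):
    """Join hierarchy levels and the taxonomy ID with '-' and append '.gmt'."""
    return "-".join(tuple(hierarchy) + (str(organism),)) + ".gmt"


def sketch_parse_filename(name):
    """Split a name built by sketch_filename back into (hierarchy, organism).

    Assumes that no hierarchy level contains a '-' character.
    """
    stem = name[: -len(".gmt")] if name.endswith(".gmt") else name
    *hierarchy, organism = stem.split("-")
    return tuple(hierarchy), organism


name = sketch_filename(("CustomSet", "subsets"), "6500")
parsed = sketch_parse_filename(name)
```

The inverse is only reliable when hierarchy levels are free of hyphens, which is why the sketch states that assumption explicitly.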