Gene sets

This module can load either gene sets distributed with Orange or custom gene sets in the GMT file format.

Loading gene sets

list_all(**kwargs)

Returns available gene sets from the server files repository.

Parameters

kwargs

  • organism (str) – Taxonomy id (NCBI taxonomy database)

Return type

list of (hierarchy, organism)

Example

The available gene set collections can be listed with:
>>> list_all(organism='10090')

load_gene_sets(hierarchy, tax_id)

Initialize gene sets from a given hierarchy.

Parameters

  • hierarchy (tuple) – Gene set hierarchy.

  • tax_id – Taxonomy ID of the organism (NCBI taxonomy database)

Return type

GeneSets

Example

Gene sets provided with Orange are organized hierarchically:
>>> list_of_genesets = list_all(organism='10090')
>>> list_of_genesets
[(('KEGG', 'Pathways'), '10090'),
 (('KEGG', 'pathways'), '10090'),
 (('GO', 'biological_process'), '10090'),
 (('GO', 'molecular_function'), '10090'),
 (('GO', 'cellular_component'), '10090')]
>>> # Unpack the (hierarchy, organism) pair into the two arguments.
>>> load_gene_sets(*list_of_genesets[0])
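Because each entry in the listing is a (hierarchy, organism) pair, ordinary tuple handling is enough to pick out the collections of interest before loading them. The sketch below works on a hard-coded copy of part of the example listing rather than contacting the server, and select_by_source is a hypothetical helper, not part of this module.

```python
# A hard-coded copy of part of the example listing; a real session would
# obtain it from list_all(organism='10090').
listing = [
    (('KEGG', 'Pathways'), '10090'),
    (('GO', 'biological_process'), '10090'),
    (('GO', 'molecular_function'), '10090'),
    (('GO', 'cellular_component'), '10090'),
]


def select_by_source(entries, source):
    """Keep entries whose hierarchy starts with the given top-level name."""
    return [(hierarchy, organism) for hierarchy, organism in entries
            if hierarchy[0] == source]


go_entries = select_by_source(listing, 'GO')
for hierarchy, organism in go_entries:
    # Each pair supplies the two arguments in the load_gene_sets signature.
    print(hierarchy, organism)
```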

Supporting functionality

class GeneSets(sets=None)

Bases: set

A collection of gene sets: contains GeneSet objects.

common_hierarchy()

Return a common hierarchy.

common_org()

Return a common organism.

static from_gmt_file_format(file_path)

Load GeneSets object from GMT file.

Parameters

file_path – path to a file on local disk

Return type

GeneSets
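To make the file layout concrete, here is a minimal sketch of reading GMT rows with plain Python. It assumes each row holds an identifier, a description, and then the genes, all separated by tabs. The read_gmt helper and the sample data are hypothetical, and the sketch returns a plain dictionary rather than a GeneSets object, so it should not be read as how from_gmt_file_format is implemented.

```python
import io


def read_gmt(handle):
    """Parse GMT text into {set_id: (description, set_of_genes)}."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        set_id, description, *genes = fields
        # Drop empty cells left by trailing tabs.
        gene_sets[set_id] = (description, {g for g in genes if g})
    return gene_sets


# Hypothetical two-row file held in memory instead of on disk.
sample = io.StringIO(
    "set_a\tfirst example set\tTP53\tBRCA1\n"
    "set_b\tsecond example set\tMYC\n"
)
parsed = read_gmt(sample)
```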

genes()

Returns

All genes from GeneSets

hierarchies()

Return all hierarchies.

split_by_hierarchy()

Split gene sets by hierarchies. Return a list of GeneSets objects.
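The idea of splitting by hierarchy can be illustrated with a small grouping routine. This is only a sketch under stated assumptions: ToySet is a hypothetical stand-in that carries just gs_id and hierarchy, and split_sets_by_hierarchy is an illustrative function that returns plain Python sets, not GeneSets objects.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a gene set; only the fields needed here.
ToySet = namedtuple("ToySet", ["gs_id", "hierarchy"])


def split_sets_by_hierarchy(sets):
    """Group items that share a hierarchy; return one set per hierarchy."""
    groups = defaultdict(set)
    for gene_set in sets:
        groups[gene_set.hierarchy].add(gene_set)
    return list(groups.values())


toy_sets = {
    ToySet("a", ("GO", "biological_process")),
    ToySet("b", ("GO", "biological_process")),
    ToySet("c", ("KEGG", "pathways")),
}
parts = split_sets_by_hierarchy(toy_sets)
```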

to_gmt_file_format(file_path)

Write the gene sets to a file in the GMT format, a tab-delimited file format in which each row represents one gene set.

Columns: gs_id, gmt_description, Gene, Gene, Gene, …

gmt_description: 'gs_id', 'hierarchy', 'organism', 'name', 'genes', 'description', 'link'

Parameters

file_path – Path to where file will be created
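As an illustration of the row layout, the following sketch writes tab-delimited rows to an in-memory buffer. The write_gmt helper is hypothetical, and the description is treated as an opaque string, since how the attributes are serialized into gmt_description is not spelled out here.

```python
import io


def write_gmt(rows, handle):
    """Write (set_id, description, genes) triples as tab-delimited rows."""
    for set_id, description, genes in rows:
        # Sort genes so the output is reproducible.
        handle.write("\t".join([set_id, description, *sorted(genes)]) + "\n")


buffer = io.StringIO()
write_gmt([("set_a", "first example set", {"TP53", "BRCA1"})], buffer)
```

Writing to a real file would only require passing an object returned by open(file_path, "w") instead of the buffer.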

update(sets)

Update a set with the union of itself and others.

class GeneSet(gs_id=None, hierarchy=None, organism=None, name=None, genes=None, description=None, link=None)

gmt_description()

Represent the GeneSet as a line in the GMT file format.

Returns

Comma-separated GeneSet attributes.

set_enrichment(reference, query)

Parameters
  • reference

  • query

Helper functions to work with serverfiles

filename(hierarchy, organism)

Obtain a filename for given hierarchy and organism.

Parameters
  • hierarchy – GeneSet hierarchy, for example ('GO', 'biological_process')

  • organism – Taxonomy ID

Returns

Filename for given hierarchy and organism

Example

>>> filename(('CustomSet', 'subsets'), '6500')
'CustomSet-subsets-6500.gmt'

filename_parse(fn)

Return the hierarchy and the organism encoded in a gene set filename.

Parameters

fn – GeneSets file name (.gmt)

Returns

A hierarchy and taxonomy id for given filename

Example

>>> filename_parse('Custom-set-6500.gmt')
(('Custom', 'set'), '6500')
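The two examples suggest a naming scheme in which hierarchy levels and the taxonomy ID are joined with hyphens and given a .gmt extension. The sketch below reproduces that scheme with two hypothetical helpers, make_filename and parse_filename. It is inferred from the examples only and may differ from the actual implementation.

```python
def make_filename(hierarchy, organism):
    """Join hierarchy levels and organism with hyphens, then add '.gmt'."""
    return "-".join([*hierarchy, organism]) + ".gmt"


def parse_filename(fn):
    """Split a name built by make_filename back into (hierarchy, organism)."""
    stem = fn[:-len(".gmt")] if fn.endswith(".gmt") else fn
    # The last hyphen-separated part is the organism; the rest is the hierarchy.
    *hierarchy, organism = stem.split("-")
    return tuple(hierarchy), organism


name = make_filename(('CustomSet', 'subsets'), '6500')
parsed = parse_filename('Custom-set-6500.gmt')
```

One consequence of a hyphen-delimited scheme like this is that a hierarchy level containing a hyphen cannot be recovered unambiguously when parsing.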