Genomic Analysis Tool Box

Cancer genomes frequently harbor high mutation rates and complex structural variations, and they are characterized by genome instability. A detailed understanding of these characteristics is necessary for pinpointing the causes of tumor growth and for developing therapeutic interventions. The research group led by Dr. Peter Park at Harvard Medical School has developed a collection of bioinformatic and computational tools to examine genomic abnormalities in cancer and other diseases. Aided by the rapid accumulation of tumor whole-genome sequencing data, these tools make it possible to efficiently identify pathogenic pathways from a single sequencing dataset, leading to a comprehensive understanding of individual tumor samples and to potential novel targets that may accelerate therapeutic breakthroughs.




TEA (Transposable Element Analyzer) 

More than 40% of the human genome is derived from transposable elements (TEs). Retrotransposons, a class of TEs, are mutagenic because they can “copy and paste” themselves across the genome via RNA intermediates. Because whole-genome sequencing reads are short and retrotransposon sequences are highly repetitive, retrotransposition events are challenging to detect. TEA identifies transposition events at single-nucleotide resolution from high-coverage whole-genome sequencing data. Although TE insertions are a prevalent class of mutagenic events, they are seldom explored; TEA therefore enables researchers to examine this under-studied part of the genetic basis of pathogenesis in individual tumors.
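As a rough illustration of why repetitive insertions can still be localized, the sketch below groups genomic positions of "anchor" reads (reads that map uniquely while their mates align to a TE sequence) into candidate insertion sites. This is a generic clustering heuristic and is not TEA's actual algorithm; the function name, `max_gap`, and `min_support` are assumptions made for this example.

```python
from typing import Iterable, List, Tuple


def cluster_anchor_positions(
    positions: Iterable[int],
    max_gap: int = 200,      # hypothetical: largest gap allowed inside one cluster
    min_support: int = 3,    # hypothetical: minimum reads needed to report a site
) -> List[Tuple[int, int, int]]:
    """Group anchor-read positions on one chromosome into candidate sites.

    Returns a list of (start, end, supporting_reads) tuples.
    """
    clusters: List[Tuple[int, int, int]] = []
    current: List[int] = []
    for pos in sorted(positions):
        # A large jump from the previous position closes the current cluster.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_support:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    # Flush the final cluster.
    if len(current) >= min_support:
        clusters.append((current[0], current[-1], len(current)))
    return clusters
```

A site supported by a single read is dropped here. A real caller would also need breakpoint-spanning reads to reach single-nucleotide resolution.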

Meerkat (Complex Structural Variations) 

Structural variations in the genome, such as deletions, insertions, and translocations, can underlie disease progression. However, these variations are difficult to detect using SNP arrays or low-coverage whole-genome sequencing. Meerkat, the algorithm developed by Dr. Park’s group, uses the reads from whole-genome sequencing data to identify complex structural variations. The types of structural variation found allow the causative mechanisms to be inferred, uncovering information about both the genomic landscape and the underlying pathogenic pathways. This capability allows resources to be prioritized toward high-confidence, focused therapeutic candidates.
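To show how sequencing reads can reveal structural variation at all, the sketch below applies a textbook paired-end heuristic: mates that map with an unexpected distance, orientation, or chromosome suggest a candidate event type. It is a simplified illustration under stated assumptions, not Meerkat's method; the `ReadPair` type, the labels, and the `max_insert` threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """Mapped positions and strands of the two mates (hypothetical type)."""
    chrom1: str
    pos1: int
    strand1: str  # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str  # '+' or '-'


def classify_pair(pair: ReadPair, max_insert: int = 500) -> str:
    """Label a read pair with the candidate event its mapping suggests."""
    if pair.chrom1 != pair.chrom2:
        return "translocation"
    # Order the mates by coordinate so orientation can be read left to right.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "inversion"            # both mates on the same strand
    if left_strand == "-" and right_strand == "+":
        return "tandem_duplication"   # mates point away from each other
    if right_pos - left_pos > max_insert:
        return "deletion"             # mates farther apart than the library allows
    return "concordant"
```

A single discordant pair is weak evidence. Callers generally require several pairs with the same signature near the same breakpoints before reporting an event.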

MSI Analysis (Microsatellite Instability) 

Microsatellite instability (MSI) produces extensive length polymorphism in tandem repeats across the genome and is indicative of mutations in the mismatch repair (MMR) pathway. Conventional assays for MSI are biased and time-consuming. The algorithm developed by Dr. Park’s group analyzes MSI using an efficient and unbiased approach. When combined with transcriptome or epigenomic data, the MSI analysis algorithm uncovers differential gene expression of polymorphic alleles and associated epigenetic characteristics in specific cancer types. The wealth of information uncovered using MSI analysis has significant diagnostic, prognostic, and therapeutic value.
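The idea of length polymorphism at a tandem repeat can be made concrete with a small sketch: count the repeat units observed in each read at a locus, then check whether the tumor shows repeat lengths absent from the matched normal sample. This is only an illustrative comparison and not the group's algorithm; the function names and the `min_fraction` cutoff are assumptions.

```python
from collections import Counter
from typing import Iterable, Set


def longest_repeat_run(read: str, motif: str) -> int:
    """Largest number of back-to-back copies of `motif` found in `read`."""
    best = 0
    for start in range(len(read) - len(motif) + 1):
        run, pos = 0, start
        while read.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


def allele_lengths(reads: Iterable[str], motif: str, min_fraction: float = 0.2) -> Set[int]:
    """Repeat lengths supported by at least `min_fraction` of the reads."""
    counts = Counter(longest_repeat_run(read, motif) for read in reads)
    total = sum(counts.values())
    return {n for n, c in counts.items() if total and c / total >= min_fraction}


def is_unstable(tumor_reads, normal_reads, motif: str, min_fraction: float = 0.2) -> bool:
    """True when the tumor carries a repeat length not seen in the normal sample."""
    tumor = allele_lengths(tumor_reads, motif, min_fraction)
    normal = allele_lengths(normal_reads, motif, min_fraction)
    return bool(tumor - normal)
```

The support cutoff is there so that a stray sequencing error in one read does not create a spurious allele. A practical analysis would repeat this comparison across many loci.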