Services

We provide NGS computational analysis for TCI-affiliated faculty members. The BiNGS staff advises on experimental design and protocols related to NGS, performs data analysis, and assists in data interpretation, integration, and visualization.

We offer two tiers of services:

1) Standard Computational Analysis

Step 1: Project design meeting. Upon receiving a project request, we will schedule a meeting to discuss the details. We recommend having the initial meeting before you start experimenting so that we can advise on best experimental practices, available technologies, data analysis, and statistical considerations. This helps ensure that the data generated will be meaningful and appropriate for addressing the scientific questions from a computational perspective. We also accept projects that start from NGS data that have already been generated.

Step 2: Standard analysis. All data sets will go through a rigorous QC evaluation followed by alignment to the appropriate genome. Depending on data type, we will perform basic analysis that allows the researcher a first look at the results (e.g. for RNA-seq data we will generate differential expression tables, and plots to visualize data; for ChIP-seq data we will generate pileup files allowing visualization on a genome browser and call significant peaks). For each project, we will produce a report containing a complete and detailed description of the results and computational methods (e.g. QC reports, protocols used for read trimming/mapping, methods of read filtering, methods of alignment and peak calling, and data normalization techniques).

Step 3: Follow-up meeting (optional). Upon completion of data analysis, we will request a second meeting to discuss the results and perform additional analyses that are within the scope of our standard analysis. Should the investigator choose a more customized analysis upon completion of the standard analysis, they can request to do so as described below.

2) Customized Computational Analysis

Upon request, the BiNGS Shared Resource Facility will provide customized analyses. For such projects, the facility will assign a bioinformatician who will take a ‘deeper dive’ into the data, working closely with the investigator to address specific hypotheses (e.g. clustering, dimensionality reduction, data integration with publicly available datasets, network analysis, enrichment analysis, and more sophisticated and customized data visualization). The payment structure for such projects will be evaluated on a case-by-case basis.

Acknowledgement

We ask that our work be acknowledged in publications and presentations supported by BiNGS. Please also consider including our bioinformaticians in the author list in cases where they contributed significantly.

Please acknowledge us with the following statement:
The development of the Bioinformatics for Next Generation Sequencing (BiNGS) shared resource facility is partially supported by the NCI P30 (P30CA196521) Cancer Center support grant, the ISMMS Skin Biology and Disease Resource-based Center NIAMS P30 support grant (AR079200), and the Black Family Stem Cell Institute. This work was also supported in part through the computational resources and staff expertise provided by Scientific Computing at the Icahn School of Medicine at Mount Sinai and supported by the Clinical and Translational Science Awards (CTSA) grant UL1TR004419 from the National Center for Advancing Translational Sciences. Research reported in this paper was supported by the Office of Research Infrastructure of the National Institutes of Health under award number S10OD026880. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Epigenomics

Chromatin immunoprecipitation followed by massively parallel DNA sequencing (ChIP-seq) analyzes protein interactions with DNA to identify the binding sites of DNA-associated proteins.

Standard analysis:

  • Evaluation of read quality and alignment statistics
  • Sample normalization using internal controls or computational methods
  • A link to a UCSC genome browser session for all normalized datasets
  • Peak files for all significantly enriched regions
  • Assessment of sample similarity (PCA, hierarchical clustering)
  • Annotation of genomic distribution (e.g. promoters, gene bodies)
  • Gene Set Enrichment Analysis (GSEA)
  • Gene-Ontology terms and pathway enrichment analysis
  • Motif discovery

Customized analysis:

  • Differential peaks across multiple samples
  • Data integration (e.g. RNA-seq, ATAC-seq)
  • Characterization of chromatin states
  • Enhancer and Super Enhancer identification and gene association
  • Identification of alternative promoters
  • Alignment to repetitive sequences and enrichment quantification
  • Data integration with publicly available resources (e.g. ENCODE, TCGA)
  • Publication quality figures

Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) is a technique used to assess genome-wide chromatin accessibility, which is a strong indicator of the activity of functional DNA sequences.

Standard analysis:

  • Evaluation of read quality and alignment statistics
  • Sample normalization using computational methods
  • A link to a UCSC genome browser session for all normalized datasets
  • Peak files for all significantly accessible regions
  • Assessment of sample similarity (PCA, hierarchical clustering)
  • Annotation of genomic distribution (e.g. promoters, gene bodies)
  • Gene Set Enrichment Analysis (GSEA)
  • Gene-Ontology terms and pathway enrichment analysis
  • Motif discovery

Customized analysis:

  • Differential peaks across multiple samples
  • Quantification of differential accessibility across multiple samples
  • Footprinting analysis
  • Data integration (e.g. RNA-seq, ChIP-seq)
  • Association of intergenic accessible regions with genes
  • Data integration with publicly available resources (e.g. ENCODE, TCGA)
  • Publication quality figures

Cleavage Under Targets and Release Using Nuclease (CUT&RUN) combines antibody-targeted controlled cleavage by micrococcal nuclease with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. CUT&Tag sequencing is an improvement over CUT&RUN that uses the Tn5 transposase for DNA tagmentation.

Standard analysis:

  • Evaluation of read quality and alignment statistics
  • Sample normalization using internal controls or computational methods
  • A link to a UCSC genome browser session for all normalized datasets
  • Peak files for all significantly enriched regions
  • Assessment of sample similarity (PCA, hierarchical clustering)
  • Annotation of genomic distribution (e.g. promoters, gene bodies)
  • Gene Set Enrichment Analysis (GSEA)
  • Gene-Ontology terms and pathway enrichment analysis
  • Motif discovery

Customized analysis:

  • Differential peaks across multiple samples
  • Data integration (e.g. RNA-seq, ATAC-seq)
  • Characterization of chromatin states
  • Enhancer and Super Enhancer identification and gene association
  • Identification of alternative promoters
  • Alignment to repetitive sequences and enrichment quantification
  • Data integration with publicly available resources (e.g. ENCODE, TCGA)
  • Publication quality figures

Hi-C is a method that uses high-throughput sequencing to capture chromatin interactions in an all-against-all manner across the entire genome. HiChIP combines Hi-C with ChIP-seq to detect interactions mediated by a protein of interest.

Standard analysis:

  • Evaluation of read quality and alignment statistics
  • A link to a UCSC genome browser session for all normalized datasets
  • Loop calls for the significant interactions
  • Compartment and TAD calls

Customized analysis:

  • Differential loops across multiple samples
  • Data integration (e.g. RNA-seq, ATAC-seq)
  • Association of loops with enhancers, super-enhancers and gene promoters
  • Publication quality figures

Single-cell chromatin accessibility sequencing (scATAC-seq) has become a powerful technology for understanding the heterogeneity of the genome-wide epigenetic regulatory landscape in complex tissues.

Standard analysis:

  • Evaluation of read quality, transcription start site enrichment and genomic region distribution
  • Linear dimensionality reduction via LSI
  • Non-linear dimensionality reduction via UMAP and tSNE
  • Unsupervised clustering of cells via the smart local moving algorithm
  • UCSC genome browser session for Tn5 insertion signals grouped by cell cluster
  • Diffusion maps
  • Annotation of peaks to genes
  • Identification of conserved and differential chromatin regions using a logistic regression model
  • Motif enrichment analysis in differentially accessible regions
  • Footprinting analysis
  • Prediction of motif activity per cell cluster via chromVAR
  • Quantification of gene activity from chromatin accessibility data
  • Differential expression and functional enrichment analysis via over-representation analysis for gene activity data
  • Cell type annotation

Customized analysis:

  • Trajectory analysis
  • Cis co-accessibility via Cicero
  • Data integration with scRNA-Seq or scMultiomics
  • Data integration with publicly available resources 
  • Publication quality figures

scMultiome sequencing allows simultaneous profiling of gene expression and open chromatin from the same cell. Multiomic profiling of the transcriptome and epigenome at single-cell resolution can transform our understanding of biology by improving the characterization of cell types and states and by giving deeper insight into the underlying gene regulatory mechanisms, with two readouts from every cell. The scRNA-seq and scATAC-seq components of the scMultiome data are first analyzed separately (see related services) and then integrated.

Standard analysis:

  • Joint non-linear dimensionality reduction via UMAP and tSNE
  • Joint unsupervised clustering of cells via the Louvain algorithm
  • Cell cycle scoring
  • Diffusion maps
  • Identification of conserved and differential genes via Wilcoxon and ROC methods on joint clusters
  • Functional enrichment via Gene Set Enrichment Analysis (GSEA)
  • Gene-Ontology terms and pathway enrichment analysis
  • scRNA-Seq dataset integration
  • Differential accessibility analysis on joint clusters using a logistic regression model, and motif enrichment testing in differentially accessible regions
  • Motif activity analysis on joint clusters via chromVAR
  • Footprinting plots for selected transcription factors
  • Peak-to-gene association

Customized analysis:

  • Integration with published datasets
  • Trajectory analysis
  • Publication quality figures

Transcriptomics

RNA-seq uses next-generation sequencing to reveal the presence and quantity of RNA in a biological sample at a given moment, capturing the continuously changing cellular transcriptome. RNA-seq provides normalized expression levels and allows the identification of differential gene expression patterns between samples, alternative splicing events and usage of alternative promoters.

Standard analysis:

  • Evaluation of read quality and alignment statistics
  • Sample normalization using internal controls or computational methods
  • Normalized read counts
  • Assessment of sample similarity (PCA, hierarchical clustering)
  • Differential gene expression
  • Gene Set Enrichment Analysis (GSEA)
  • Gene-Ontology terms and pathway enrichment analysis
  • Motif discovery
  • A link to a UCSC genome browser session for all normalized datasets

Customized analysis:

  • Gene expression modules
  • Data integration (e.g. ATAC-seq, ChIP-seq)
  • Data integration with publicly available resources (e.g. ENCODE, TCGA)
  • Publication quality figures

Alternative splicing is a process that enables a single gene to produce different mRNA isoforms, which may direct the synthesis of proteins with different cellular functions or properties.

Standard analysis:

  • Evaluation of read quality and alignment statistics
  • Sample normalization using internal controls or computational methods
  • A link to a UCSC genome browser session for all normalized datasets
  • rMATS output report
  • Quantification of differential splicing events (SE, RI, A5’SS, A3’SS and MXE)
  • Gene Set Enrichment Analysis (GSEA)
  • Gene-Ontology terms and pathway enrichment analysis
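
For readers unfamiliar with how splicing events are quantified: event-level tools such as rMATS typically summarize each event as an inclusion level (percent spliced in, PSI), computed from length-normalized read counts supporting the inclusion and skipping isoforms. The sketch below illustrates that calculation only. The function name, read counts and effective lengths are hypothetical, and it is not a description of the facility's pipeline or of any statistical testing.

```python
def inclusion_level(inc_reads, skip_reads, inc_len, skip_len):
    """Length-normalized inclusion level (PSI) for one splicing event.

    inc_reads / skip_reads: reads supporting the inclusion / skipping isoform.
    inc_len / skip_len: effective lengths of the two isoforms (hypothetical
    values here; real tools derive them from read and exon lengths).
    """
    inc = inc_reads / inc_len
    skip = skip_reads / skip_len
    total = inc + skip
    if total == 0:
        return None  # no informative reads for this event
    return inc / total


# Hypothetical counts for one skipped-exon (SE) event in two conditions.
psi_control = inclusion_level(inc_reads=80, skip_reads=20, inc_len=2, skip_len=1)
psi_treated = inclusion_level(inc_reads=30, skip_reads=45, inc_len=2, skip_len=1)

# The difference in inclusion level between conditions.
delta_psi = psi_treated - psi_control
print(f"PSI control={psi_control:.3f}, treated={psi_treated:.3f}, delta={delta_psi:.3f}")
```

A negative difference here means the exon is included less often in the treated condition. Whether a difference is significant depends on replicates and the statistical model, which this sketch leaves out.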

Customized analysis:

  • Shapiro plots of 5’SS strength and motif
  • Publication quality figures

Alternative promoters are one of the main transcriptional regulatory mechanisms that play a central role in determining the set of expressed transcripts as well as their expression levels in a cell.

Standard analysis:

  • Evaluation of read quality and alignment statistics
  • Sample normalization using computational methods
  • A link to a UCSC genome browser session for all normalized datasets
  • Assessment of sample similarity (PCA, hierarchical clustering)
  • Heatmap of promoter activity estimates
  • Identification of alternative promoter usage across conditions
  • Gene Set Enrichment Analysis (GSEA)
  • Gene-Ontology terms and pathway enrichment analysis

Customized analysis:

  • Gene expression modules
  • Motif discovery
  • Data integration (e.g. ChIP-seq, ATAC-seq)
  • Data integration with publicly available resources (e.g. ENCODE, TCGA)
  • Publication quality figures

With the advent of next-generation sequencing, transcriptomic characterization of patient cohorts has become increasingly valuable. Landmark cancer genomic datasets such as The Cancer Genome Atlas (TCGA) have molecularly characterized thousands of matched normal/cancer samples spanning most cancer types. Transcriptomic analysis of public cancer datasets allows the characterization of differential gene expression, enrichment of specific gene sets, and survival in the context of specific cancer types, mutations and/or expression of specific genes.

Standard analysis:

  • Differential gene expression analysis (possibly in the context of specific mutation/cancer type)
  • GO, Pathway analysis, Gene Set Enrichment Analysis (GSEA)
  • Clustering of samples through dimensionality reduction
  • Gene signature analysis
  • Survival correlations
  • Clinical features correlations
  • Publication quality figures

Single-cell RNA sequencing (scRNA-seq) can reveal complex and rare cell populations, uncover regulatory relationships between genes, and track the trajectories of distinct cell lineages in development. 

Standard analysis:

  • Evaluation of read quality and alignment statistics
  • Linear dimensionality reduction via PCA
  • Non-linear dimensionality reduction via UMAP and tSNE
  • Unsupervised clustering of cells via the Louvain algorithm
  • Cell cycle scoring
  • Diffusion maps
  • Identification of conserved and differential biomarkers via Wilcoxon and ROC methods
  • Functional enrichment via Gene Set Enrichment Analysis (GSEA)
  • Gene-Ontology terms and pathway enrichment analysis
  • scRNA-Seq dataset integration
  • Cell type annotation via markers or label transfer
  • Interactive data exploration via cellxgene

Customized analysis:

  • Motif discovery
  • Doublet / empty droplet detection
  • Pseudo-time analysis
  • RNA velocity analysis
  • Malignant/non-malignant cell detection
  • Data integration (e.g. scATAC-seq)
  • Data integration with publicly available resources
  • Publication quality figures

scMultiome sequencing allows simultaneous profiling of gene expression and open chromatin from the same cell. Multiomic profiling of the transcriptome and epigenome at single-cell resolution can transform our understanding of biology by improving the characterization of cell types and states and by giving deeper insight into the underlying gene regulatory mechanisms, with two readouts from every cell. The scRNA-seq and scATAC-seq components of the scMultiome data are first analyzed separately (see related services) and then integrated.

Standard analysis:

  • Joint non-linear dimensionality reduction via UMAP and tSNE
  • Joint unsupervised clustering of cells via the Louvain algorithm
  • Cell cycle scoring
  • Diffusion maps
  • Identification of conserved and differential genes via Wilcoxon and ROC methods on joint clusters
  • Functional enrichment via Gene Set Enrichment Analysis (GSEA)
  • Gene-Ontology terms and pathway enrichment analysis
  • scRNA-Seq dataset integration
  • Differential accessibility analysis on joint clusters using a logistic regression model, and motif enrichment testing in differentially accessible regions
  • Motif activity analysis on joint clusters via chromVAR
  • Footprinting plots for selected transcription factors
  • Peak-to-gene association

Customized analysis:

  • Integration with published datasets
  • Trajectory analysis
  • Publication quality figures

Spatial transcriptomics is a molecular profiling method that measures gene activity across a tissue sample and maps where that activity occurs by preserving spatial information.

Standard analysis:

  • Evaluation of read quality and alignment statistics
  • Visualization of gene expression on spatial slides
  • Unsupervised clustering of cells via Louvain algorithm
  • Identification of spatially variable genes via Mark Variogram
  • Functional enrichment via Gene Set Enrichment Analysis (GSEA) and via Over-Representation Analysis (ORA)
  • Gene-Ontology terms and pathway enrichment analysis
  • Cell type annotation via markers or label transfer
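
As background on the Over-Representation Analysis (ORA) item above: ORA is commonly framed as a one-sided hypergeometric test that asks whether a gene set overlaps a list of selected genes more than expected by chance. The sketch below shows that test using only the Python standard library. The function name and all numbers are hypothetical, and real analyses also apply multiple-testing correction across gene sets, which is omitted here.

```python
from math import comb


def ora_pvalue(universe, gene_set, selected, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    universe: number of genes tested in total (background)
    gene_set: number of background genes belonging to the gene set
    selected: number of genes in the list of interest (e.g. marker genes)
    overlap:  number of selected genes that belong to the gene set
    """
    upper = min(gene_set, selected)
    # Count the ways to draw k gene-set members and (selected - k) non-members.
    favourable = sum(
        comb(gene_set, k) * comb(universe - gene_set, selected - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(universe, selected)


# Hypothetical example: 20,000 background genes, a 200-gene pathway,
# 500 selected genes, 15 of which fall in the pathway.
p = ora_pvalue(universe=20000, gene_set=200, selected=500, overlap=15)
print(f"ORA p-value: {p:.3g}")
```

The choice of background (the `universe` argument) strongly affects the result; using only genes that were detectable in the experiment is a common convention, stated here as an assumption.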

Customized analysis:

  • Data integration (e.g., scRNA-seq)
  • Data integration with publicly available resources
  • Publication quality figures

Genomics

A diverse array of DNA sequencing strategies are currently available. These range from high read-depth cancer gene panel sequencing to whole genome sequencing. These approaches allow precise genetic variant detection as a fundamental step for understanding disease causes, evolution and response to treatment.

Standard analysis:

  • Evaluation of read quality and alignment statistics
  • Duplicate read removal, base recalibration and other best practices
  • Germline short variant discovery (SNPs and indels)
  • Somatic short variant discovery (SNVs and indels)
  • Germline copy number variant (CNV) and structural variant (SV) discovery
  • Somatic copy number variant (CNV) and structural variant (SV) discovery

Customized analysis:

  • Multiple caller consensus integration and reporting
  • Variant annotation and interpretation of pathogenicity
  • Mutational burden and mutational signatures analyses (somatic)
  • Cohort integration and visualization of variants (e.g. oncoprint, lollipop plots)
  • Sub-clonality and tumor purity estimation
  • Tumor evolution studies for longitudinal sample sets
  • Cohort integration and visualization of CNVs (e.g. GISTIC)
  • Multitrack visualizations using Circos plots
  • Analysis of complex genomic rearrangements (e.g. chromothripsis) (WGS)
  • Integrative analyses with RNA-seq (eQTL)
  • Integration of non-coding variants with transcriptomic and epigenetic data (WGS)
  • Publication quality figures