Skip to the content.

FragPipe-Analyst for TMT data analysis

Introduction

This tutorial documents how to use FragPipe-Analyst to analyze quantitative proteomics results generated by the LFQ related workflows of FragPipe. You can download the example dataset here.

Input

For LFQ, You will need two things for FragPipe-Analyst.

After you understand the input, you could upload those two files to FragPipe-Analyst. Don’t forget to choose LFQ in the dropdown menu. After you upload files, FragPipe-Analyst will process the result and present the result shortly. Following material covers the deatils about what you will see.

Output

Result Plots (Upper Right Panel)

  1. Volcano plot: A volcano plot is generated for each pairwise comparison. It is a graphical visualization by plotting the “Fold Change (Log2)” on the x-axis versus the –log10 of the “p-value” on the y-axis. Interesting candidate proteins are located in the left and right upper quadrant. User can toggle the display name checkbox to highlight names of differentially expressed proteins or use ‘adjusted p-value’ as y-axis. Importantly, user can highlight protein or their interest (colored maroon) by selecting the row from “Results Table”. This highlighted plot can be downloaded using “ Save Highlighted Plot” button.

volcano_plot

  1. Heatmap: The heatmap representation gives an overview of all significant/differentially expressed proteins (rows) in all samples (columns). This visualization allows the identification of general trends such as if one sample or replicate is highly different compared to the others and might be considered as an outlier. Additionally, the hierarchical clustering of samples (columns) indicates how related the different samples are and hierarchical clustering of proteins (rows) identifies similarly behaving proteins.

    With our sample data, the heatmap shows expected separation between CCND1 and control samples and a few protein clusters behaving contrastingly different between the groups. User also have option to download protein information from individual cluster.

heatmap

  1. Feature plot: Selecting few genes from the ‘Results Table’, A box plot or a violin plot will be shown comparing expression of those genes between conditions. Note that depends on if you check “Show imputed values” box or not. It will some boxplot with imputed dataset:

boxplot imputed

or the original dataset:

boxplot

QC Plots (Bottom Left Panel)

  1. PCA plot: A Principal Component Analysis(PCA) is a technique used to emphasize variation and bring out strong patterns in a dataset. PC1, which is a linear combination of all features, shown on the x axis, explains the most variation of the data, and followed by the rest of PCs. In brief, the more similar 2 samples are, the closer they cluster together. For further information, here are a few links, which explains the principals of PCAs: Info and Basic introduction

    After plotting PCA with our sample data, samples are presented in the scatter plot below. As you can observed, CCND1 and control samples are separated by PC1 but one particular control seems to be quite different from others.

PCA

  1. Sample Correlation Plot: A correlation matrix is plotted as a heatmap to visualize the Pearson correlation coefficients between the different samples.

sample_correlation

  1. Sample CVs Plots: A plot representing distribution of protein level coefficient of variation for each condition. Each plot also contains a vertical line representing median CVs percentage within that condition.

cv_plot

  1. Feature Numbers: A bar-plot representing number of proteins identified and quantified in each TMT plex.

proteins

  1. Missing values- Heatmap: To explore the pattern of missing values in the data, a heatmap is plotted indicating whether values are missing (0) or not (1). Only proteins with at least one missing value are visualized.

missing_heatmap

Enrichment Analysis (Bottom Right Panel)

FragPipe-Analyst provides enrichment analysis for both Gene Ontology(GO) and pathways.

  1. GO: Selecting the comparison of interest, the GO database (Molecular Function/Cellular Component/Biological Process), and direction (up regulated or down regulated). It checks the differentially expressed (DE) list of genes against known sets of genes. The background gene list is composed with IDs appeared in the input data. A hypergeometric test is performed. Log odds ratio (log_odds) is calculated as log2((IN/OUT)/(bg_IN/bg_OUT)), where IN and OUT are the number of DE genes in and outside of a gene set of interest, and bg_IN and bg_OUT are the nubmer of other genes in and outside of a gene set of interest. Gene sets used are fetched from the Enrichr API

  2. Pathway enrichment: Same algorithm is used as the Gene Ontology part. Pathway database choices are Hallmark, KEGG and Reactome.

Here shows some results after you clicked the “Run Enrichment” button:

Hallmak

KEGG

GO MF Enrichment

Result table

  1. Results Table: Includes names (Gene names), Protein Ids, Log fold changes/ ratios (each pairwise comparisons), Adjusted p-values (applying FDR corrections), p-values, Boolean values for significance, average protein intensity (log transformed) in each sample.

Download options

  1. Results: Same as Results Table
  2. Unimputed data matrix: Original protein intensities before imputation in each sample.
  3. Imputed data matrix: Protein intensities after performing selected imputation method
  4. Full results: Combined table of all above data outputs i.e. with and without imputation information, along with fold change and p-values.

Advanced Options and Details

Differential expression analysis (DE)

As you can see, FragPipe-Analyst functionality relies on differential expression (DE) analysis. Internally, we use a Bioconductor package limma to carry out the DE analysis on each protein. Contrasts are built automatically from condition levels provided by the user allowing the generation of results for all possible comparisons. Multiple test adjustment are done with user specified options (default with “BH”). It also takes into account user defined cutoffs to filter significantly differentially expressed proteins.

Significant protein filtering criteria

Missing value imputation options

False Discovery Rate (FDR) correction option

In sample data demonstration, we set Adjusted p-value cutoff at 0.05, Log2 fold change cutoff at 1 and Type of FDR correction at Benjamini Hochberg.