
Untargeted analysis of DIA datasets using FragPipe

This is the first part of a two-part tutorial on the untargeted analysis of a data-independent acquisition (DIA) dataset using the FragPipe computational platform. We will analyse a subset of samples from a published clear cell renal cell carcinoma (ccRCC) study, originally described in the following publication: D. J. Clark et al. “Integrated Proteogenomic Characterization of Clear Cell Renal Cell Carcinoma”, Cell 2019 179(4):964-983. doi: 10.1016/j.cell.2019.10.007 (https://pubmed.ncbi.nlm.nih.gov/31675502/). Briefly, in the original study, researchers from the CPTAC (Clinical Proteomic Tumor Analysis Consortium) profiled tumor (T) samples, together with normal adjacent tissue (NAT) samples from each cancer patient, to understand the tumorigenesis of ccRCC. In total, 110 tumor and 83 NAT samples were collected from patients and their proteomes were profiled via mass spectrometry. The samples were profiled using two approaches: (1) tandem mass tag (TMT) labeling and (2) data-independent acquisition (DIA). The DIA set was generated on an Orbitrap Lumos mass spectrometer with a variable window acquisition scheme.

Here, we will use just 10 DIA runs from 5 ccRCC patients, one tumor and one paired NAT sample for each patient. To make the data processing faster, we will use only data in two isolation windows (613 to 650 Th mass range) from each original mzML file.
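Conceptually, this subsetting amounts to keeping only the MS2 scans whose isolation window lies inside the chosen mass range. The sketch below only illustrates that idea: the `Ms2Scan` record, its field names, and the window boundaries are hypothetical, and the tutorial files were not necessarily produced this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ms2Scan:
    """Hypothetical minimal record for one DIA MS2 scan."""
    scan_id: int
    window_lower: float  # lower isolation window boundary, in Th
    window_upper: float  # upper isolation window boundary, in Th


def in_mass_range(scan: Ms2Scan, low: float = 613.0, high: float = 650.0) -> bool:
    """Return True if the whole isolation window lies inside [low, high]."""
    return scan.window_lower >= low and scan.window_upper <= high


# Made-up window boundaries, for illustration only.
scans = [
    Ms2Scan(1, 595.0, 613.0),
    Ms2Scan(2, 613.0, 631.0),
    Ms2Scan(3, 631.0, 650.0),
    Ms2Scan(4, 650.0, 670.0),
]
kept = [scan.scan_id for scan in scans if in_mass_range(scan)]
print(kept)
```

With these invented boundaries, only the two windows that sit fully inside 613 to 650 Th are kept.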

For these analyses we will use FragPipe, a suite of computational tools with a graphical user interface (GUI) that enables comprehensive analysis of mass spectrometry-based proteomics data. It is powered by MSFragger, an ultrafast proteomic search engine suitable for both conventional and open (wide precursor mass tolerance) peptide identification. FragPipe includes Percolator as well as the Philosopher toolkit for downstream statistical post-processing of MSFragger search results (PeptideProphet, iProphet, ProteinProphet), FDR filtering, and multi-experiment summary report generation. The software is well documented (https://fragpipe.nesvilab.org/). The FragPipe DIA workflow used in this tutorial is described in F. Yu et al. “Analysis of DIA proteomics data using MSFragger-DIA and FragPipe computational platform”. Nature Communications 2023, 14(1), 4154.

In this tutorial, we will first process the data with MSFragger-DIA to identify peptides directly from the DIA data, then rescore the search results with MSBooster and Percolator, apply FDR filters, build the spectral library using Easy-PQP, and finally pass the FragPipe-built spectral library to DIA-NN to extract peptide quantification. Once we have the identification and quantification results from FragPipe, we will load them in FragPipe-PDV to visualize the identifications, and we will perform some downstream analysis using FragPipe-Analyst. Finally, we will learn how to load the raw data in Skyline to see the extracted ion chromatograms for each of the identified peptides.

Parametrization of the FragPipe graphical user interface

In this first part of the tutorial we will set up the graphical user interface of FragPipe and launch a library-free (direct DIA) search of DIA data with MSFragger-DIA, followed by spectral library building and quantification with DIA-NN. The end result of this part will be a collection of matrices with the quantification values at the precursor and protein levels, as well as a summary PDF file of the experiment. Here we use FragPipe 20.1-build15 as an example:

Parametrization of the Config section

In this section we need to make sure that all the different tools that are required by FragPipe are installed in the system and provide FragPipe with the path to the corresponding executables.

MSFragger

IonQuant

Philosopher

DIA-NN

Python

After the installer has finished installing Python, the path should be automatically updated to “C:\Users\[your user]\AppData\Local\Programs\Python\Python39\python.exe”. Otherwise, set the Python path manually to match your local installation.
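If you are unsure which path to enter, one option is to ask the interpreter itself. The short script below is a generic Python sketch, not a FragPipe feature: run it with the Python installation you intend to use, then copy the printed path into the Config tab.

```python
import sys

# Full path of the interpreter that is running this script.
python_path = sys.executable
print(python_path)

# Version of that interpreter, to confirm it is the installation you expect.
python_version = ".".join(str(part) for part in sys.version_info[:3])
print(python_version)
```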

Now your Config should look like this:

[Image: Config]

Parametrization of the Workflow section

FragPipe supports multiple proteomics workflows which can be customized, saved and shared with other users.

In the Workflow tab:

Parametrization of the Database section

We will skip the Umpire tab as it is not meant to be executed in the selected workflow, and move directly to the Database tab.

[Image: select_database]

[Image: database]

Parametrization of the MSFragger section

In the MSFragger tab you can check the search parameters that will be used to interpret the acquired spectra in our analysis. The parameters have already been filled in with the default values associated with the selected workflow. Let’s review them.

[Image: MSFragger]

[Image: modification]

[Image: spec_processing]

[Image: advanced]

You can choose to save a customized parameter file to load for future use, or save the entire workflow (from either the ‘Workflow’ or the ‘Run’ tab).

Parametrization of the Validation section

The Validation section will also be executed as part of the selected workflow. The search results obtained from MSFragger will be further analyzed by MSBooster, Percolator and ProteinProphet to get confident peptide identifications.

In this process, MSBooster will first use deep learning to predict additional features of the identified peptides including fragmentation spectra, retention time, and detectability (and ion mobility).

[Image: MSBooster]

These features will be used to modify the initial identification scoring, and then Percolator will use them to improve its discrimination model to increase the number of confident identifications in the DIA dataset.
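To give an intuition for how a predicted spectrum can become a rescoring feature, the sketch below compares predicted and observed fragment intensities with a plain cosine similarity. This is an illustrative stand-in, not the metric MSBooster actually uses; the function name and all intensity values are invented for the example.

```python
import math
from typing import Sequence


def cosine_similarity(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Cosine similarity between two equal-length intensity vectors."""
    if len(predicted) != len(observed):
        raise ValueError("intensity vectors must have the same length")
    dot = sum(p * o for p, o in zip(predicted, observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    if norm_p == 0.0 or norm_o == 0.0:
        return 0.0  # an empty spectrum carries no evidence
    return dot / (norm_p * norm_o)


# Made-up relative intensities for the same five fragment ions.
predicted = [1.0, 0.6, 0.3, 0.1, 0.0]
good_match = [0.9, 0.7, 0.25, 0.1, 0.0]
poor_match = [0.0, 0.1, 0.2, 0.9, 1.0]

print(round(cosine_similarity(predicted, good_match), 3))
print(round(cosine_similarity(predicted, poor_match), 3))
```

A candidate peptide whose observed fragments follow the predicted pattern scores close to 1, while a poor match scores much lower. A value of this kind, added to the feature set, gives a rescoring tool an extra dimension for separating correct from incorrect matches.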

[Image: PSM_validation]

Finally, based on the identified peptides, we will run protein inference with ProteinProphet to generate a confident list of the protein groups identified in the sample, at a maximum false discovery rate of 1%.
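The idea behind a 1% false discovery rate threshold can be sketched with simple target-decoy counting. The code below is a generic illustration, not the procedure implemented in Percolator, ProteinProphet, or Philosopher; the function name, scores, and decoy labels are invented.

```python
def q_values(scored_hits):
    """Return (score, is_decoy, q_value) tuples, best score first.

    The FDR at each rank is estimated as decoys / targets seen so far;
    the q-value is the lowest FDR at which a hit would still be accepted.
    """
    ranked = sorted(scored_hits, key=lambda hit: hit[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # Walk from the worst hit upwards so q-values never decrease with rank.
    q_vals = []
    running_min = 1.0
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_vals.append(running_min)
    q_vals.reverse()
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, q_vals)]


# Made-up (score, is_decoy) pairs, for illustration only.
hits = [(9.1, False), (8.7, False), (8.2, False), (7.9, True), (7.5, False), (6.0, True)]
accepted = [score for score, is_decoy, q in q_values(hits) if not is_decoy and q <= 0.01]
print(accepted)
```

In this toy list, the three best target hits pass the 1% threshold, and everything ranked below the first decoy is rejected.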

[Image: protein_prophet]

Parametrization of the Spec Lib section

Next, we will jump directly to the Spec Lib tab as the other ones (PTMs, Glyco, Quant (MS1), and Quant (isobaric)) are not relevant for the selected workflow and will not be executed. In the Spec Lib section we will generate a spectral library from the search results, containing b and y fragment ions, and we will allow for an automatic selection of the runs that will be used as reference for the retention time.
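To make the notion of b and y fragment ions concrete, here is a minimal sketch that computes singly charged b and y ion m/z values for an unmodified peptide from standard monoisotopic residue masses. It is not how Easy-PQP builds the library. The function names are made up for this example, and the peptide SMEDSVDVSAPK is the one used in the FragPipe-PDV section of this tutorial.

```python
# Monoisotopic residue masses (Da) of the 20 standard, unmodified amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def precursor_mz(peptide, charge):
    """m/z of the intact, unmodified peptide at the given charge state."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def by_ions(peptide):
    """Singly charged b and y fragment m/z values for an unmodified peptide."""
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_mass = sum(RESIDUE_MASS[aa] for aa in peptide[:i])
        y_mass = sum(RESIDUE_MASS[aa] for aa in peptide[i:]) + WATER
        b_ions.append((f"b{i}", b_mass + PROTON))
        y_ions.append((f"y{len(peptide) - i}", y_mass + PROTON))
    return b_ions, y_ions


peptide = "SMEDSVDVSAPK"
b_ions, y_ions = by_ions(peptide)
for (b_name, b_mz), (y_name, y_mz) in zip(b_ions, y_ions):
    print(f"{b_name:>4} {b_mz:10.4f}   {y_name:>4} {y_mz:10.4f}")
print(f"precursor 2+: {precursor_mz(peptide, 2):.4f}")
```

With these masses, the doubly charged precursor of SMEDSVDVSAPK comes out at about 632.79 Th, which falls inside the 613 to 650 Th range retained for this tutorial. Fixed or variable modifications, such as carbamidomethylated cysteine, would shift the affected fragments and are deliberately left out of this sketch.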

[Image: Spec_lib]

Parametrization of the Quant (DIA) section

In the Quant (DIA) section we will set the quantification to be performed by DIA-NN with a maximum false discovery rate of 1%. In this section, we will also verify that the “Generate MSstats input” option is checked.

[Image: quant]

Parametrization of the Run section

In this final section, we will indicate the output directory and run the analysis.

[Image: run]

Exploration of the FragPipe main results tables

In this part of the tutorial we will go through the main results tables generated by FragPipe and some of its intermediate files.

Inspection of the FragPipe main output

[Image: output]

[Image: msstats]
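Because each patient contributes one tumor and one paired NAT sample, a natural first look at a protein-level quantification matrix is the per-patient log2(tumor / NAT) ratio. The sketch below runs on a small in-memory table: the column names, sample labels, and intensities are all invented and do not reproduce the exact layout of the FragPipe or DIA-NN output files, so adapt them to the headers you find in your own results.

```python
import csv
import io
import math

# Toy stand-in for a tab-separated protein quantification matrix.
# Column names, sample labels and intensities are invented for this sketch.
toy_matrix = (
    "Protein\tP1_T\tP1_NAT\tP2_T\tP2_NAT\n"
    "PROT_A\t4000\t1000\t8000\t1000\n"
    "PROT_B\t500\t1000\t\t2000\n"
)


def paired_log2_ratios(handle, patients):
    """Per-protein log2(tumor / NAT) for each patient with both values present."""
    ratios = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        per_patient = {}
        for patient in patients:
            tumor, nat = row[f"{patient}_T"], row[f"{patient}_NAT"]
            if not tumor or not nat:
                continue  # missing value (empty cell) in one of the pair
            tumor_value, nat_value = float(tumor), float(nat)
            if tumor_value > 0 and nat_value > 0:
                per_patient[patient] = math.log2(tumor_value / nat_value)
        ratios[row["Protein"]] = per_patient
    return ratios


ratios = paired_log2_ratios(io.StringIO(toy_matrix), ["P1", "P2"])
for protein, per_patient in ratios.items():
    print(protein, per_patient)
```

To apply the same idea to a real file, replace `io.StringIO(toy_matrix)` with an opened file handle and adjust the column names. Keeping the ratios per patient, rather than averaging tumor and NAT intensities separately, preserves the paired design of the experiment.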

Inspection of intermediate FragPipe output files

If you are curious, you can explore FragPipe output files to get a better understanding of various FragPipe modules.

Visualization of the FragPipe main results

Visualization of identification results in FragPipe-PDV

In this section we will visualize the identification results from FragPipe at different levels, including experiments, proteins, peptides and PSM information.

[Image: PDV_main]

There are several functions embedded in FragPipe-PDV that we will explore. For example, you can look for a peptide sequence or protein of interest.

- Search for the protein “CTNA1” using the search function in the top right corner. How many PSMs are associated with this protein? How many unique peptide sequences have been identified for this protein? What are their PeptideProphet probabilities?

[Image: PDV_protein]

In FragPipe-PDV you can also see the annotated spectra in which peptides were identified. FragPipe-PDV has several options for configuring how peptide spectra are displayed.

- Open the “Tools” menu below the spectrum and select “Show Predicted” to display the predicted spectrum in a mirror spectra format.

[Images: PDV_annotated_spec, PDV_annotated_spec2]

Why do you think the spectra contain so many unmatched peaks? Are they good identifications?

- To clean up the spectra, click “Show Matched Peaks” in the “Settings” menu to remove background peaks.

[Images: PDV_annotated_matched_spec, PDV_annotated_matched_spec2]

Do the identifications look better now? Do you think that they are more credible?

Now we will see how different peptides can be identified in the same MS2 spectrum. For this example, we will use the peptide SMEDSVDVSAPK from the protein sp|Q8IVF2|AHNK2_HUMAN. This protein is one of the ccRCC biomarkers (overexpressed in tumor samples) that we will also use as an example later in this tutorial.

[Image: PSM]

[Image: PSM_2]

You can use the PDV viewer to visualize both peptides at the same time.

[Image: PSM_check_peptide]

[Image: amino_acid]

[Image: amino_acid_modification]

[Image: matched_peak]

Next: Downstream analysis using FragPipe-Analyst