DeNovoGUI: A Beginner’s Guide to De Novo Peptide Sequencing

How to Use DeNovoGUI for Protein Identification: Step-by-Step

DeNovoGUI is a user-friendly graphical interface that brings together several de novo peptide sequencing engines (such as PepNovo+, Novor, and DirecTag) and mass-spectrometry data processing tools. It simplifies the process of identifying peptide sequences directly from tandem mass spectrometry (MS/MS) spectra without relying on a protein sequence database, which is especially useful for studying novel proteins, post-translational modifications, unexpected variants, or organisms with incomplete genomic information. This step-by-step guide walks you through preparing your data, configuring DeNovoGUI, running de novo sequencing, validating results, and integrating findings into downstream protein identification workflows.


Overview: When and why to use de novo sequencing

De novo sequencing reconstructs peptide sequences from fragment ion spectra alone. Use it when:

  • The organism’s proteome is not available or incomplete.
  • You suspect novel peptides, sequence variants, or rapidly evolving sequences.
  • You need to identify unexpected post-translational modifications (PTMs) not present in standard databases.
  • You want complementary evidence alongside database search results.

Advantages: detects novel sequences, uncovers modifications and variants, no reliance on databases. Limitations: lower confidence than database searches for complex spectra; requires high-quality MS/MS data.


1. Install and prepare DeNovoGUI

  1. Download the latest DeNovoGUI release for your OS from the official distribution (DeNovoGUI is typically distributed as a Java application). Ensure you have a compatible Java Runtime Environment (JRE) — Java 8 or later is commonly required.
  2. Unpack the archive (if applicable) and launch the DeNovoGUI executable, or run the JAR from a terminal with java -jar DeNovoGUI-X.Y.Z.jar, where X.Y.Z is the version you downloaded.
  3. Verify included engines: check that PepNovo+, Novor, DirecTag, or other available plugins are present and configured. Some engines are distributed separately and might require placement in a specific plugins folder or configuration through the GUI.

Notes:

  • Keep a record of the exact versions of DeNovoGUI and each engine for reproducibility.
  • If Novor or other engines require a license, ensure you have one.

2. Prepare your MS/MS data

High-quality input improves de novo accuracy.

File formats:

  • Use MS/MS peak lists in a common open format such as MGF, mzML, or mzXML. Convert raw vendor files if necessary (e.g., with ProteoWizard’s msconvert).
  • Make sure the spectra are centroided (peak-picked) during conversion; de novo performance suffers with profile-mode data.

Preprocessing recommendations:

  • Remove low-quality spectra (e.g., low total ion current or few peaks).
  • Apply peak filtering to remove noise and reduce the number of peaks per spectrum (e.g., keep top N peaks per window).
  • Consider deisotoping and charge-state deconvolution if supported by your pipeline.
  • Calibrate masses or apply mass error corrections if systematic shifts are suspected.
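
The "top N peaks per window" filter mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not DeNovoGUI's own implementation: the function name top_n_per_window, the fixed 100 m/z window width, and the toy spectrum are all assumptions made for the example.

```python
from collections import defaultdict


def top_n_per_window(peaks, n=10, window=100.0):
    """Keep the n most intense peaks in each fixed-width m/z window.

    peaks is a list of (mz, intensity) tuples; the result is sorted by m/z.
    """
    bins = defaultdict(list)
    for mz, intensity in peaks:
        # Assign each peak to a window by integer division of its m/z.
        bins[int(mz // window)].append((mz, intensity))

    kept = []
    for window_peaks in bins.values():
        # Sort by intensity, highest first, and keep the first n.
        window_peaks.sort(key=lambda peak: peak[1], reverse=True)
        kept.extend(window_peaks[:n])
    return sorted(kept)


# Toy spectrum (hypothetical values): four peaks in 100-200, two in 200-300.
spectrum = [(101.1, 5.0), (150.2, 80.0), (175.1, 40.0), (199.9, 2.0),
            (250.3, 10.0), (260.0, 300.0)]
filtered = top_n_per_window(spectrum, n=2, window=100.0)
print(filtered)
```

With n=2 the two weakest peaks in the 100-200 window are dropped while both peaks in the 200-300 window survive. A window-based filter like this keeps low-intensity but informative fragments in sparse regions, which a single global intensity cutoff would discard.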

Metadata:

  • Ensure precursor m/z, charge state, and instrument type are correctly recorded in the file header.
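
For MGF input, a quick script can flag spectra that lack a precursor m/z or charge before they reach the de novo engines. The sketch below assumes standard MGF blocks (BEGIN IONS / END IONS with PEPMASS and CHARGE lines); the helper names read_mgf and missing_metadata and the min_peaks cutoff are hypothetical choices for illustration.

```python
def read_mgf(lines):
    """Yield one dict per spectrum from an iterable of MGF text lines."""
    spectrum = None
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            spectrum = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if spectrum is not None:
                yield spectrum
            spectrum = None
        elif spectrum is not None and line:
            if "=" in line and not line[0].isdigit():
                # Parameter line such as PEPMASS=500.25 or CHARGE=2+
                key, value = line.split("=", 1)
                spectrum["params"][key.upper()] = value
            else:
                # Peak line: m/z, intensity, optional extra columns
                fields = line.split()
                if len(fields) >= 2:
                    spectrum["peaks"].append((float(fields[0]), float(fields[1])))


def missing_metadata(spectrum, min_peaks=5):
    """Return a list of human-readable problems for one spectrum."""
    problems = []
    if "PEPMASS" not in spectrum["params"]:
        problems.append("no PEPMASS")
    if "CHARGE" not in spectrum["params"]:
        problems.append("no CHARGE")
    if len(spectrum["peaks"]) < min_peaks:
        problems.append("fewer than %d peaks" % min_peaks)
    return problems


# Two toy spectra (hypothetical values); the second lacks a CHARGE line.
example = """BEGIN IONS
TITLE=scan=1
PEPMASS=500.25 12000
CHARGE=2+
100.1 10
200.2 20
END IONS
BEGIN IONS
TITLE=scan=2
PEPMASS=610.30
150.0 5
END IONS
""".splitlines()

reports = [missing_metadata(s, min_peaks=2) for s in read_mgf(example)]
print(reports)
```

To check a real file, pass an open file handle instead of the example list, e.g. read_mgf(open("run1.mgf")), where run1.mgf is a placeholder name.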

3. Configure DeNovoGUI project and parameters

  1. Create a new project and add your MS/MS input files (MGF, mzML, mzXML).
  2. Select which de novo engines to run. Running multiple engines increases coverage and allows consensus scoring.
  3. Set instrument and fragmentation settings:
    • Fragmentation type: CID, HCD, ETD, or mixed. Choose based on your instrument and acquisition method.
    • Precursor and fragment mass tolerances: set according to instrument performance (e.g., 10 ppm precursor for Orbitrap, 0.5 Da for ion traps; fragment tolerances 0.02–0.05 Da for high-res fragments).
  4. Define enzyme specificity if any constraints should be applied for downstream interpretation (note: de novo itself doesn’t require enzyme settings, but specifying helps scoring/post-processing).
  5. Define PTMs:
    • Fixed modifications (e.g., carbamidomethylation of Cys) should be set if the whole sample was treated accordingly (e.g., alkylated with iodoacetamide).
    • Variable modifications can be allowed but beware of combinatorial explosion. For de novo work, limit to a small list of likely modifications (e.g., oxidation of Met).
  6. Peak processing options:
    • Noise threshold, minimum peaks per spectrum, maximum peaks per window.
    • Optionally enable precursor mass correction or recalibration.

Tip: Start with conservative settings (tight mass tolerances, only common PTMs), then relax parameters if yield is low.
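
Because tolerances are quoted in ppm for some instruments and in Da for others, it helps to see how the two relate at a given m/z. The short sketch below only does the arithmetic; the function names are illustrative and not part of any DeNovoGUI API.

```python
def ppm_to_da(mz, ppm):
    """Absolute tolerance in Da that corresponds to a ppm tolerance at mz."""
    return mz * ppm / 1e6


def da_to_ppm(mz, da):
    """Relative tolerance in ppm that corresponds to a Da tolerance at mz."""
    return da / mz * 1e6


for mz in (400.0, 800.0, 1200.0):
    print(mz, round(ppm_to_da(mz, 10), 4), round(da_to_ppm(mz, 0.02), 1))
```

At m/z 400, a 10 ppm window is only 0.004 Da wide, while a fixed 0.02 Da fragment tolerance corresponds to 50 ppm there and to about 16.7 ppm at m/z 1200. A fixed Da tolerance is therefore relatively looser for small fragments than for large ones.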


4. Run de novo sequencing

  1. Start the job and monitor progress in DeNovoGUI. For large datasets, consider batch processing or running on a workstation with multiple cores (engines may be multithreaded).
  2. If using multiple engines, you can run them in sequence or in parallel (depending on system resources and DeNovoGUI configuration).
  3. Save logs and result files for reproducibility.

Performance considerations:

  • CPU and memory affect throughput. High-res spectra with many peaks take longer.
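  • Engine performance depends on fragmentation type: ETD, for example, produces mainly c- and z-type ions rather than the b- and y-type ions typical of CID/HCD, and not every engine models them equally well.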

5. Inspect and interpret peptide-spectrum matches (PSMs)

DeNovoGUI provides per-spectrum candidate sequences and scores.

Key fields to review:

  • Best-scoring peptide sequence(s) per spectrum and associated scores.
  • Sequence tags: short high-confidence subsequences (e.g., 3–5 residues) that can be used for database searching or validation.
  • Mass errors for precursor and fragments.
  • Annotation of matched ion types (b-, y-, c-, z-ions) to assess coverage.

Practical checks:

  • High-scoring peptides with comprehensive ion series coverage (many consecutive b- or y-ions) are more reliable.
  • Look for consistent mass shifts indicating modifications.
  • Compare candidates from multiple engines—consensus sequences are more trustworthy.
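
To make "many consecutive b- or y-ions" concrete, the following sketch computes singly charged b and y ions for an unmodified candidate peptide and counts the longest run of consecutive ions found in a peak list. It is a simplified illustration under stated assumptions: monoisotopic residue masses, charge 1+ fragments only, no modifications, and a hypothetical observed peak list. DeNovoGUI's own spectrum annotation is more complete.

```python
PROTON = 1.007276
WATER = 18.010565
# Monoisotopic residue masses in Da for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def by_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    total = 0.0
    for mass in masses[:-1]:            # b1 .. b(n-1), from the N-terminus
        total += mass
        b_ions.append(total + PROTON)
    total = 0.0
    for mass in reversed(masses[1:]):   # y1 .. y(n-1), from the C-terminus
        total += mass
        y_ions.append(total + WATER + PROTON)
    return b_ions, y_ions


def longest_matched_run(theoretical, observed_mz, tol_da=0.02):
    """Length of the longest run of consecutive theoretical ions observed."""
    best = run = 0
    for ion in theoretical:
        if any(abs(ion - mz) <= tol_da for mz in observed_mz):
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


b_ions, y_ions = by_ions("PEPTIDE")
# Hypothetical observed fragment m/z values for illustration.
observed = [148.06, 227.103, 263.09, 324.155, 425.204, 653.31]
print(longest_matched_run(b_ions, observed))  # b2-b4 matched consecutively
print(longest_matched_run(y_ions, observed))  # y1-y2 matched consecutively
```

Here b2 to b4 form a run of three, and b6 is matched but isolated, so the longest b run is 3. A candidate whose best run is only one or two ions deserves more scrutiny than one with a long unbroken ladder.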

6. Validate and filter results

Because de novo sequencing has higher uncertainty, apply validation steps:

  1. Score thresholds:
    • Use engine-specific score cutoffs or percentile thresholds derived from your run.
  2. Manual inspection:
    • For key peptides (novel findings), inspect spectrum annotations and fragmentation coverage manually.
  3. Cross-engine consensus:
    • Prioritize sequences reported by two or more engines with similar residue assignments.
  4. Sequence tags to database search:
    • Use high-confidence n-mer tags (e.g., 4–6 residues) to search sequence databases with relaxed constraints; this can anchor de novo tags to known proteins or reveal variants.
  5. Use spectral libraries:
    • Compare de novo identifications to library spectra if available.
  6. False discovery rate (FDR):
    • Traditional target-decoy FDR methods don’t directly apply to de novo outputs. Instead, use decoy tag strategies or integrate de novo results into a peptide-spectrum matching framework that supports FDR estimation.
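
Cross-engine consensus can be approximated by finding the longest stretch of residues on which two candidates agree. The sketch below treats leucine and isoleucine as identical, since they have the same mass and cannot be told apart by fragment masses alone. The function names and the example sequences are hypothetical, and real pipelines may also want to tolerate other near-isobaric substitutions.

```python
def normalize(seq):
    """Upper-case the sequence and merge isobaric I into L."""
    return seq.upper().replace("I", "L")


def longest_common_tag(seq_a, seq_b):
    """Longest contiguous stretch shared by two candidates (I/L merged).

    Uses the standard dynamic-programming approach for the longest common
    substring; the returned tag is written with L in place of I.
    """
    a, b = normalize(seq_a), normalize(seq_b)
    best = ""
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common stretch ending at a[i-1] and b[j-1].
                curr[j] = prev[j - 1] + 1
                if curr[j] > len(best):
                    best = a[i - curr[j]:i]
        prev = curr
    return best


# Hypothetical top candidates from two engines for the same spectrum.
candidates = {"Novor": "LSEQLTK", "PepNovo+": "ISEQITR"}
tag = longest_common_tag(candidates["Novor"], candidates["PepNovo+"])
print(tag, len(tag))
```

In this example the engines disagree only on the C-terminal residue, leaving a six-residue consensus tag that is long enough to use for a tag-based database search.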

7. From peptide sequences to protein identification

De novo peptides can be mapped to proteins or used to propose novel protein sequences.

Approaches:

  • Database mapping: BLAST or local sequence alignment of de novo sequences against protein databases. Short peptides may map ambiguously; longer tags increase specificity.
  • Tag-based search: Tools like TagRecon (which consumes DirecTag sequence tags) or InsPecT can use sequence tags to constrain database searches.
  • Assembly of overlapping de novo peptides: If multiple peptides overlap, assemble them into longer contigs to increase confidence and enable protein-level identification.
  • Variant discovery: Map de novo sequences to reference proteins allowing mismatches to reveal single amino acid variants or polymorphisms.
  • De novo protein prediction: For organisms lacking proteomes, assemble peptide tags into predicted protein sequences, then validate with additional MS evidence or complementary sequencing.
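
For small-scale checks, mapping a tag to a reference sequence while allowing mismatches needs nothing more than a sliding window. The sketch below is a toy stand-in for BLAST or a dedicated tag search tool: find_tag is a hypothetical helper, I and L are treated as equal, positions are 0-based, and the protein string is a short illustrative sequence.

```python
def normalize(seq):
    """Upper-case the sequence and merge isobaric I into L."""
    return seq.upper().replace("I", "L")


def find_tag(tag, protein, max_mismatches=1):
    """Return (start, differences) for every window within max_mismatches.

    Each difference is (protein_position, protein_residue, tag_residue),
    so a single entry points at a candidate amino acid variant.
    """
    tag_n, prot_n = normalize(tag), normalize(protein)
    hits = []
    for start in range(len(prot_n) - len(tag_n) + 1):
        diffs = []
        for k, tag_res in enumerate(tag_n):
            if prot_n[start + k] != tag_res:
                diffs.append((start + k, protein[start + k], tag[k]))
        if len(diffs) <= max_mismatches:
            hits.append((start, diffs))
    return hits


protein = "MKWVTFISLLLLFSSAYSRGVFRR"  # short illustrative sequence
exact = find_tag("FLSLL", protein, max_mismatches=0)
variant = find_tag("FLSAL", protein, max_mismatches=1)
print(exact)
print(variant)
```

The first tag matches exactly at position 5 once I and L are merged. The second matches the same position with one difference, reported as position 8 where the reference has L and the tag has A, which is the kind of hit worth following up as a possible variant or a sequencing error.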

8. Reporting and downstream analyses

  1. Create a result table that includes: spectrum ID, precursor m/z, charge, de novo sequence, engine score(s), fragment coverage, and notes on validation status.
  2. Annotate probable PTMs and mass shifts.
  3. For high-confidence novel peptides, provide spectrum images with annotated ions and save raw spectrum IDs for traceability.
  4. Integrate identifications into quantitative workflows (label-free quant, TMT) if needed.
  5. Document parameters and software versions used.

Example:

  • Example column headings for a report: Spectrum, Precursor_mz, Charge, DeNovo_Sequence, Engine, Score, Matched_Ions, Mass_Error_ppm, Validation_Status.
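
A report with these column headings can be written as tab-separated text using Python's standard csv module. This is a minimal sketch: the row values are placeholders for illustration only, not real results, and write_report is a hypothetical helper name.

```python
import csv
import io

COLUMNS = ["Spectrum", "Precursor_mz", "Charge", "DeNovo_Sequence", "Engine",
           "Score", "Matched_Ions", "Mass_Error_ppm", "Validation_Status"]


def write_report(rows, handle):
    """Write rows (dicts keyed by COLUMNS) as TSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Placeholder values for illustration only.
rows = [{
    "Spectrum": "scan=1", "Precursor_mz": 400.6872, "Charge": 2,
    "DeNovo_Sequence": "PEPTLDE", "Engine": "Novor", "Score": 85.0,
    "Matched_Ions": "b2-b4;y1-y2", "Mass_Error_ppm": -0.1,
    "Validation_Status": "manual check pending",
}]

buffer = io.StringIO()
write_report(rows, buffer)
report_text = buffer.getvalue()
print(report_text)
```

To write to disk instead of an in-memory buffer, open the output with open("report.tsv", "w", newline="") and pass that handle to write_report; report.tsv is a placeholder file name.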

9. Common pitfalls and troubleshooting

  • Poor spectra: low S/N or few fragment ions yield unreliable sequences — filter or recollect data.
  • Wrong mass tolerances: tolerances that are too loose increase false positives, while tolerances that are too tight miss true matches. Match tolerances to instrument specifications.
  • Overly permissive PTM lists: combinatorial search space reduces accuracy and increases runtime.
  • Mis-annotated charge states: correct charge assignment is critical for precursor mass and fragmentation interpretation.
  • Ignoring multiple engines: single-engine results risk algorithm-specific biases—use consensus where possible.
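
The effect of a mis-annotated charge state is easy to see numerically, because the neutral precursor mass is derived from the observed m/z and the assumed charge. The sketch below uses the unmodified peptide PEPTIDE (monoisotopic neutral mass about 799.35995 Da) and a hypothetical observed m/z; the function names are illustrative.

```python
PROTON = 1.007276


def neutral_mass(mz, charge):
    """Neutral monoisotopic mass implied by an observed m/z and charge."""
    return (mz - PROTON) * charge


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


theoretical = 799.35995   # PEPTIDE, monoisotopic, unmodified
observed_mz = 400.6872    # hypothetical measurement of the 2+ ion

for charge in (2, 3):
    mass = neutral_mass(observed_mz, charge)
    print(charge, round(mass, 4), round(ppm_error(mass, theoretical), 1))
```

With the correct charge of 2 the implied mass agrees with the theoretical value to well under 1 ppm. With charge 3 the implied mass is about 1199.04 Da, roughly 400 Da too heavy, so no correct sequence can fit the precursor and the engine will return a wrong candidate or nothing.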

10. Advanced tips

  • Use hybrid strategies: combine de novo tags with database searches (open-modification searches) to identify modified or mutated peptides.
  • Iterative refinement: run initial de novo with conservative settings, extract reliable tags, update search space (e.g., include discovered PTMs), and rerun.
  • Use retention time and orthogonal data (e.g., predicted protease cleavage, homologous sequences) to prioritize candidates.
  • If working with immunopeptidomics (HLA peptides), relax enzyme specificity and prioritize length-appropriate peptides (typically 8–11 residues for HLA class I).
  • For large-scale projects, automate DeNovoGUI runs and downstream parsing with scripts (export results in machine-readable formats like CSV/TSV).

Example workflow (concise)

  1. Convert raw files to mzML and centroid peaks (msconvert).
  2. Launch DeNovoGUI, create project, add mzML files.
  3. Select Novor and PepNovo+, set fragmentation = HCD, precursor tol = 10 ppm, fragment tol = 0.02 Da; allow Met oxidation as a variable modification.
  4. Run engines in parallel; export top 5 sequences per spectrum.
  5. Filter sequences: require at least 4 consecutive matched ions or consensus between engines.
  6. Use 5-mer tags in BLAST against UniProt to map to proteins or discover variants.
  7. Manually validate key novel sequences using spectrum annotation.

Conclusion

DeNovoGUI streamlines de novo peptide sequencing by integrating multiple engines and providing an accessible GUI for mass-spectrometry users. Success depends on high-quality MS/MS data, sensible parameter choices (mass tolerances, allowed modifications), cross-engine consensus, and careful validation—especially when reporting novel peptides or protein-level identifications. Use de novo sequencing not as a replacement for database searches but as a complementary approach to uncover sequences missing from reference proteomes or containing unexpected modifications or variants.
