How to Use DeNovoGUI for Protein Identification: Step-by-Step
DeNovoGUI is a user-friendly graphical interface that brings together several de novo peptide sequencing engines (such as PepNovo+, Novor, and DirecTag) and mass-spectrometry data processing tools. It simplifies the process of identifying peptide sequences directly from tandem mass spectrometry (MS/MS) spectra without relying on a protein sequence database, which is especially useful for studying novel proteins, post-translational modifications, unexpected variants, or organisms with incomplete genomic information. This step-by-step guide walks you through preparing your data, configuring DeNovoGUI, running de novo sequencing, validating results, and integrating findings into downstream protein identification workflows.
Overview: When and why to use de novo sequencing
De novo sequencing reconstructs peptide sequences from fragment ion spectra alone. Use it when:
- The organism’s proteome is not available or incomplete.
- You suspect novel peptides, sequence variants, or rapid evolution.
- You need to identify unexpected post-translational modifications (PTMs) not present in standard databases.
- You want complementary evidence alongside database search results.
Advantages: it can detect novel sequences, modifications, and variants without relying on a database.
Limitations: identifications carry lower confidence than database matches, particularly for noisy or complex spectra, and the method requires high-quality MS/MS data.
1. Install and prepare DeNovoGUI
- Download the latest DeNovoGUI release for your OS from the official distribution (DeNovoGUI is typically distributed as a Java application). Ensure you have a compatible Java Runtime Environment (JRE) — Java 8 or later is commonly required.
- Unpack the archive (if applicable) and launch the DeNovoGUI executable, or run the JAR from a terminal with java -jar DeNovoGUI.jar (substitute the actual file name, which normally includes the version number).
- Verify included engines: check that PepNovo+, Novor, DirecTag, or other available plugins are present and configured. Some engines are distributed separately and might require placement in a specific plugins folder or configuration through the GUI.
Notes:
- Keep a record of the exact versions of DeNovoGUI and each engine for reproducibility.
- If Novor or other engines require a license, ensure you have one.
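The Java requirement above can be checked from a script before launching anything. The following is a minimal sketch, not part of DeNovoGUI: the helper name java_major_version is hypothetical, and it only parses the text that the java -version command prints, handling both the legacy "1.8.0_x" scheme and the modern "17.0.x" scheme.

```python
import re


def java_major_version(version_output: str) -> int:
    """Extract the major Java version from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    first, second = int(match.group(1)), match.group(2)
    # Legacy scheme: "1.8.0_292" means Java 8.
    if first == 1 and second is not None:
        return int(second)
    # Modern scheme: "17.0.2" means Java 17.
    return first
```

Note that java -version writes to standard error, so a caller using subprocess.run(["java", "-version"], capture_output=True, text=True) would pass the stderr attribute of the result to this function. Storing the returned value alongside the DeNovoGUI and engine versions helps with the reproducibility record mentioned above.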
2. Prepare your MS/MS data
High-quality input improves de novo accuracy.
File formats:
- Use centroided MS/MS peak lists such as MGF, mzML, or mzXML; de novo performance suffers with profile-mode data.
- Convert raw vendor files if necessary (e.g., with ProteoWizard’s msconvert, which can also centroid spectra through its peak-picking filter), and check which input formats your DeNovoGUI version accepts.
Preprocessing recommendations:
- Remove low-quality spectra (e.g., low total ion current or few peaks).
- Apply peak filtering to remove noise and reduce the number of peaks per spectrum (e.g., keep top N peaks per window).
- Consider deisotoping and charge-state deconvolution if supported by your pipeline.
- Calibrate masses or apply mass error corrections if systematic shifts are suspected.
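As an illustration of the "top N peaks per window" idea, here is a minimal sketch in Python. It assumes a spectrum is already available as a list of (m/z, intensity) pairs; the function name and the default values (10 peaks per 100 m/z window) are illustrative assumptions, not DeNovoGUI settings.

```python
from collections import defaultdict


def top_n_per_window(peaks, n=10, window=100.0):
    """Keep the n most intense peaks in each fixed-width m/z window.

    peaks: iterable of (mz, intensity) tuples.
    Returns the retained peaks sorted by m/z.
    """
    bins = defaultdict(list)
    for mz, intensity in peaks:
        # Peaks in [0, window) fall in bin 0, [window, 2*window) in bin 1, etc.
        bins[int(mz // window)].append((mz, intensity))

    kept = []
    for bucket in bins.values():
        bucket.sort(key=lambda peak: peak[1], reverse=True)
        kept.extend(bucket[:n])
    return sorted(kept)
```

Windowed filtering keeps weak but informative fragments in sparse regions of the spectrum, which a single global intensity cutoff would tend to discard.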
Metadata:
- Ensure precursor m/z, charge state, and instrument type are correctly recorded in the file header.
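For MGF input, the precursor m/z and charge checks can be scripted. The sketch below assumes the common MGF layout (BEGIN IONS / END IONS blocks, KEY=value header lines such as PEPMASS and CHARGE, then "m/z intensity" peak lines). The function names and the min_peaks default are hypothetical, and instrument type is not checked because MGF spectrum blocks do not normally carry it.

```python
def read_mgf(lines):
    """Yield one dict per spectrum: header fields plus a 'peaks' list."""
    spectrum = None
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            spectrum = {"peaks": []}
        elif line == "END IONS":
            if spectrum is not None:
                yield spectrum
            spectrum = None
        elif spectrum is None or not line:
            continue  # skip blank lines and anything outside a spectrum block
        elif "=" in line and not line[0].isdigit():
            key, value = line.split("=", 1)
            spectrum[key.upper()] = value
        else:
            fields = line.split()
            if len(fields) >= 2:
                spectrum["peaks"].append((float(fields[0]), float(fields[1])))


def missing_metadata(spectrum, min_peaks=5):
    """Return a list of human-readable problems found in one spectrum."""
    problems = []
    if "PEPMASS" not in spectrum:
        problems.append("no PEPMASS")
    if "CHARGE" not in spectrum:
        problems.append("no CHARGE")
    if len(spectrum["peaks"]) < min_peaks:
        problems.append(f"fewer than {min_peaks} peaks")
    return problems
```

Calling missing_metadata on every spectrum returned by read_mgf(open(path)) gives a quick list of spectra to fix or drop before they reach the sequencing engines.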
3. Configure DeNovoGUI project and parameters
- Create a new project and add your MS/MS input files (MGF, mzML, mzXML).
- Select which de novo engines to run. Running multiple engines increases coverage and allows consensus scoring.
- Set instrument and fragmentation settings:
  - Fragmentation type: CID, HCD, ETD, or mixed. Choose based on your instrument and acquisition method.
  - Precursor and fragment mass tolerances: set according to instrument performance (e.g., about 10 ppm precursor tolerance on an Orbitrap; fragment tolerances of 0.02–0.05 Da for high-resolution fragment spectra and around 0.5 Da for ion-trap fragment spectra).
- Define enzyme specificity if any constraints should be applied for downstream interpretation (note: de novo sequencing itself does not require enzyme settings, but specifying them helps scoring and post-processing).
- Define PTMs:
  - Fixed modifications (e.g., carbamidomethylation of Cys) should be set if the whole sample received the corresponding treatment, such as reduction and alkylation.
  - Variable modifications can be allowed but beware of combinatorial explosion. For de novo work, limit to a small list of likely modifications (e.g., oxidation of Met).
- Peak processing options:
  - Noise threshold, minimum peaks per spectrum, maximum peaks per window.
  - Optionally enable precursor mass correction or recalibration.
Tip: Start with conservative settings (tight mass tolerances, only common PTMs), then relax parameters if yield is low.
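Because precursor tolerances are usually given in ppm and fragment tolerances often in Da, it helps to see how the two relate: a ppm tolerance scales with m/z, so 10 ppm corresponds to 0.005 Da at m/z 500 and 0.01 Da at m/z 1000. A small sketch (function names are illustrative):

```python
def ppm_to_da(mz, ppm):
    """Absolute tolerance in Da that corresponds to `ppm` at a given m/z."""
    return mz * ppm / 1e6


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error of an observed m/z, in ppm."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, ppm):
    """True if the observed m/z lies within +/- ppm of the theoretical m/z."""
    return abs(observed_mz - theoretical_mz) <= ppm_to_da(theoretical_mz, ppm)
```

The same ppm_error calculation can be used to spot a systematic shift: if most confident matches sit at, say, a consistent positive offset, recalibration is likely to help more than widening the tolerance.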
4. Run de novo sequencing
- Start the job and monitor progress in DeNovoGUI. For large datasets, consider batch processing or running on a workstation with multiple cores (engines may be multithreaded).
- If using multiple engines, you can run them in sequence or in parallel (depending on system resources and DeNovoGUI configuration).
- Save logs and result files for reproducibility.
Performance considerations:
- CPU and memory affect throughput. High-res spectra with many peaks take longer.
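- Fragmentation type matters: ETD produces mainly c- and z-type ions rather than the b- and y-type ions typical of CID and HCD, and not every engine handles every fragmentation type equally well.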
5. Inspect and interpret peptide-spectrum matches (PSMs)
DeNovoGUI provides per-spectrum candidate sequences and scores.
Key fields to review:
- Best-scoring peptide sequence(s) per spectrum and associated scores.
- Sequence tags: short high-confidence subsequences (e.g., 3–5 residues) that can be used for database searching or validation.
- Mass errors for precursor and fragments.
- Annotation of matched ion types (b-, y-, c-, z-ions) to assess coverage.
Practical checks:
- High-scoring peptides with comprehensive ion series coverage (many consecutive b- or y-ions) are more reliable.
- Look for consistent mass shifts indicating modifications.
- Compare candidates from multiple engines—consensus sequences are more trustworthy.
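To make the ion-coverage check concrete, the sketch below computes singly charged b- and y-ion m/z values for a candidate sequence and measures the longest run of consecutive ions found in an observed peak list. It uses standard monoisotopic residue masses for unmodified amino acids; fixed or variable modifications would have to be added to the mass table. The function names and the 0.02 Da default tolerance are assumptions for illustration, not what DeNovoGUI does internally.

```python
PROTON = 1.007276
WATER = 18.010565

# Monoisotopic residue masses (Da) of the 20 unmodified amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def by_ions(sequence):
    """Return (b_ions, y_ions): singly charged m/z lists for b1..b(n-1), y1..y(n-1)."""
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)           # N-terminal fragment
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)  # C-terminal fragment
    return b_ions, y_ions


def longest_matched_run(theoretical, observed_mz, tol_da=0.02):
    """Length of the longest run of consecutive theoretical ions seen in the spectrum."""
    best = run = 0
    for mz in theoretical:
        if any(abs(mz - obs) <= tol_da for obs in observed_mz):
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best
```

A candidate whose y-series shows a long unbroken run is better supported than one with the same number of matched ions scattered along the sequence, because each consecutive pair of ions pins down one residue.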
6. Validate and filter results
Because de novo sequencing has higher uncertainty, apply validation steps:
- Score thresholds:
  - Use engine-specific score cutoffs or percentile thresholds derived from your run.
- Manual inspection:
  - For key peptides (novel findings), inspect spectrum annotations and fragmentation coverage manually.
- Cross-engine consensus:
  - Prioritize sequences reported by two or more engines with similar residue assignments.
- Sequence tags to database search:
  - Use high-confidence n-mer tags (e.g., 4–6 residues) to search sequence databases with relaxed constraints; this can anchor de novo tags to known proteins or reveal variants.
- Spectral libraries:
  - Compare de novo identifications to library spectra if available.
- False discovery rate (FDR):
  - Traditional target-decoy FDR estimation relies on searching a decoy database, so it does not apply directly to database-free de novo output. Instead, use decoy tag strategies or integrate de novo results into a peptide-spectrum matching framework that supports FDR estimation.
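Cross-engine consensus can be approximated by finding the longest stretch of residues that two candidate sequences share. The sketch below treats isoleucine and leucine as identical, since they have the same mass and cannot be told apart by standard fragment ions. The function names and the minimum tag length of 4 are illustrative assumptions rather than a DeNovoGUI feature.

```python
def normalize(sequence):
    """Collapse the isobaric residues I and L into a single symbol (L)."""
    return sequence.upper().replace("I", "L")


def longest_common_tag(seq_a, seq_b):
    """Longest contiguous stretch shared by two sequences, after I/L normalization.

    The returned tag is in normalized form, so every L stands for "I or L".
    """
    a, b = normalize(seq_a), normalize(seq_b)
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common stretch ending at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def has_consensus(seq_a, seq_b, min_tag=4):
    """True if the two engine outputs share a tag of at least min_tag residues."""
    return len(longest_common_tag(seq_a, seq_b)) >= min_tag
```

Engines often disagree at the peptide termini, where fragment ions are sparse, while agreeing on the middle of the sequence. A shared internal tag is therefore a more forgiving consensus criterion than exact full-length identity.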
7. From peptide sequences to protein identification
De novo peptides can be mapped to proteins or used to propose novel protein sequences.
Approaches:
- Database mapping: BLAST or local sequence alignment of de novo sequences against protein databases. Short peptides may map ambiguously; longer tags increase specificity.
- Tag-based search: Tools such as InsPecT or TagRecon can use sequence tags to constrain a database search.
- Assembly of overlapping de novo peptides: If multiple peptides overlap, assemble them into longer contigs to increase confidence and enable protein-level identification.
- Variant discovery: Map de novo sequences to reference proteins allowing mismatches to reveal single amino acid variants or polymorphisms.
- De novo protein prediction: For organisms lacking proteomes, assemble peptide tags into predicted protein sequences, then validate with additional MS evidence or complementary sequencing.
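The database-mapping and variant-discovery approaches above both come down to placing a de novo peptide on a reference sequence while tolerating a small number of differences. The following is a minimal sketch, not a replacement for BLAST: it performs ungapped matching only, treats I and L as equivalent, and uses a hypothetical function name and a default of one allowed mismatch.

```python
def find_matches(peptide, protein, max_mismatches=1):
    """Find ungapped placements of a peptide on a protein sequence.

    Returns a list of (start, differences) tuples, where start is the
    0-based position in the protein and differences is a list of
    (protein_position, protein_residue, peptide_residue) tuples.
    I and L are treated as the same residue.
    """
    pep = peptide.upper().replace("I", "L")
    prot = protein.upper().replace("I", "L")
    hits = []
    for start in range(len(prot) - len(pep) + 1):
        window = prot[start:start + len(pep)]
        differences = [
            (start + k, protein[start + k], peptide[k])
            for k, (p, q) in enumerate(zip(pep, window))
            if p != q
        ]
        if len(differences) <= max_mismatches:
            hits.append((start, differences))
    return hits
```

A placement with an empty difference list is an exact match (up to I/L ambiguity); a placement with one difference is a candidate single amino acid variant and deserves manual inspection of the supporting fragment ions around that position.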
8. Reporting and downstream analyses
- Create a result table that includes: spectrum ID, precursor m/z, charge, de novo sequence, engine score(s), fragment coverage, and notes on validation status.
- Annotate probable PTMs and mass shifts.
- For high-confidence novel peptides, provide spectrum images with annotated ions and save raw spectrum IDs for traceability.
- Integrate identifications into quantitative workflows (label-free quant, TMT) if needed.
- Document parameters and software versions used.
Example column headings for a report: Spectrum, Precursor_mz, Charge, DeNovo_Sequence, Engine, Score, Matched_Ions, Mass_Error_ppm, Validation_Status.
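A report with these column headings can be written as a tab-separated file using Python's standard csv module. This is a sketch under the assumption that each result is held as a dictionary keyed by the column names; the function name is hypothetical and any row values passed in are the caller's own data.

```python
import csv

COLUMNS = [
    "Spectrum", "Precursor_mz", "Charge", "DeNovo_Sequence", "Engine",
    "Score", "Matched_Ions", "Mass_Error_ppm", "Validation_Status",
]


def write_report(rows, handle):
    """Write result dictionaries to an open text handle as TSV.

    Missing keys are written as empty cells; unknown keys raise ValueError,
    which guards against silently dropping a misspelled column.
    """
    writer = csv.DictWriter(
        handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
```

To write to disk, open the target file with open(path, "w", newline="") and pass the handle to write_report. Keeping one row per spectrum and engine makes it straightforward to group by Spectrum later when looking for cross-engine agreement.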
9. Common pitfalls and troubleshooting
- Poor spectra: low S/N or few fragment ions yield unreliable sequences — filter or recollect data.
- Wrong mass tolerances: tolerances that are too loose increase false positives, while tolerances that are too tight miss true matches. Match tolerances to instrument specifications.
- Overly permissive PTM lists: combinatorial search space reduces accuracy and increases runtime.
- Mis-annotated charge states: correct charge assignment is critical for precursor mass and fragmentation interpretation.
- Ignoring multiple engines: single-engine results risk algorithm-specific biases—use consensus where possible.
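The cost of an overly permissive PTM list can be estimated by counting how many modified forms a single sequence can take when every eligible residue may or may not carry each allowed modification. The sketch below is a simplified model with at most one modification per residue; the function name and the modification-to-residue mapping are illustrative assumptions.

```python
def count_modified_forms(sequence, variable_mods):
    """Count distinct modification states of a peptide.

    variable_mods maps a modification name to the set of residues it can
    occupy. Each residue is either unmodified or carries exactly one of
    the modifications that target it.
    """
    forms = 1
    for residue in sequence:
        applicable = sum(
            1 for targets in variable_mods.values() if residue in targets
        )
        forms *= 1 + applicable
    return forms
```

For the peptide MSTMYK, allowing only Met oxidation gives 4 forms, while adding phosphorylation on S, T, and Y raises that to 32. In de novo sequencing the effect is felt at every position of every candidate, since each added modification is one more possible mass step the engine has to consider.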
10. Advanced tips
- Use hybrid strategies: combine de novo tags with database searches (open-modification searches) to identify modified or mutated peptides.
- Iterative refinement: run initial de novo with conservative settings, extract reliable tags, update search space (e.g., include discovered PTMs), and rerun.
- Use retention time and orthogonal data (e.g., predicted protease cleavage, homologous sequences) to prioritize candidates.
- If working with immunopeptidomics (HLA peptides), do not impose enzyme specificity and prioritize length-appropriate peptides (e.g., 8–11 residues for HLA class I ligands; class II ligands are longer).
- For large-scale projects, automate DeNovoGUI runs and downstream parsing with scripts (export results in machine-readable formats like CSV/TSV).
Example workflow (concise)
- Convert raw files to mzML and centroid peaks (ProteoWizard’s msconvert).
- Launch DeNovoGUI, create project, add mzML files.
- Select Novor and PepNovo+, set fragmentation = HCD, precursor tol = 10 ppm, fragment tol = 0.02 Da; allow Met oxidation variable.
- Run engines in parallel; export top 5 sequences per spectrum.
- Filter sequences: require at least 4 consecutive matched ions or consensus between engines.
- Use 5-mer tags in BLAST against UniProt to map to proteins or discover variants.
- Manually validate key novel sequences using spectrum annotation.
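The tag-extraction step of this workflow can be sketched as follows: split each accepted de novo sequence into overlapping 5-mers and write them as FASTA records ready for a sequence search. The function names and the record naming scheme are hypothetical.

```python
def kmer_tags(sequence, k=5):
    """Return the distinct overlapping k-mers of a sequence, in order of appearance."""
    seen, tags = set(), []
    for i in range(len(sequence) - k + 1):
        tag = sequence[i:i + k]
        if tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags


def tags_to_fasta(spectrum_id, sequence, k=5):
    """Format the k-mer tags of one de novo sequence as FASTA text."""
    return "".join(
        f">{spectrum_id}_tag{n}\n{tag}\n"
        for n, tag in enumerate(kmer_tags(sequence, k), start=1)
    )
```

Queries this short tend to need search settings tuned for short sequences (BLAST+ offers a blastp-short task for this purpose), and a single 5-mer will usually match many proteins. Several tags from the same spectrum, or from different spectra, landing on the same protein are what make the mapping informative.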
Conclusion
DeNovoGUI streamlines de novo peptide sequencing by integrating multiple engines and providing an accessible GUI for mass-spectrometry users. Success depends on high-quality MS/MS data, sensible parameter choices (mass tolerances, allowed modifications), cross-engine consensus, and careful validation—especially when reporting novel peptides or protein-level identifications. Use de novo sequencing not as a replacement for database searches but as a complementary approach to uncover sequences missing from reference proteomes or containing unexpected modifications or variants.