libmgf: Mascot Generic Format (MGF) Parser

Introduction

libmgf (formerly mgfp) is a flex/bison-based C++ MGF parser library.
It includes the library code as well as the following set of MGF processing tools:

The Steen & Steen Lab provides the library under the terms of an MIT license for use in academic and non-academic environments.

Citation

If you make use of libmgf in your own projects, please cite the following article:

If you use ms2preproc in your data analysis pipeline, please cite

Installation

Obtaining the Software

Binary packages for Microsoft Windows are available here:

Linux and Mac users, please build from source.

Building from Source

Building libmgf from source is straightforward. It requires CMake 2.6 or later (available from http://cmake.org/).

With CMake available, the build process is:

 git clone git://github.com/kirchnerlab/libmgf.git
 mkdir libmgf-build
 cd libmgf-build
 cmake ../libmgf
 make
 make test
 make install
 make package    # optional: generates binary packages for your platform

For Mac users: if you are using MacPorts, linking errors with boost::program_options appear to be common. This is not a libmgf issue; it indicates a compiler/library incompatibility on your system.

Usage

Using the command line applications

All command line applications come with a --help switch that describes their usage.

Using the libmgf library

To use the parser, one must first create a parser driver instance:

    #include <mgf/mgf.h>
    ...
    mgf::MgfFile mgfFile;
    mgf::Driver driver(mgfFile);

Then set the verbosity flags (both default to off/false)

    driver.trace_parsing = true;
    driver.trace_scanning = true;

and parse the input. The input is a stream (like std::cin in the example here or std::fstream).

    bool result = driver.parse_stream(std::cin);

One should always check if the parsing was successful and only continue if so.

    if (!result) {
        std::cerr << std::endl
          << "Error parsing data stream (use -v for details)." << std::endl;
        return -1;
    }

If the parsing was successful, the contents of the MGF file are available in terms of an MgfFile object; it is possible to iterate over the MS/MS spectra and to read/modify/otherwise process the contents. The example here attempts to extract TMT reporter ion intensities from centroid mode MS/MS spectra:

    // tmtMasses (the six TMT reporter ion m/z values), tmts (the result
    // container) and findClosestMz() are assumed to be defined elsewhere.
    typedef mgf::MgfFile::iterator MFI;
    typedef mgf::MgfSpectrum::iterator MSI;
    for (MFI i = mgfFile.begin(); i != mgfFile.end(); ++i) {
        // sort the MS/MS spectrum by m/z
        std::sort(i->begin(), i->end(), mgf::LessThanMz());
        // extract TMT reporter ion intensities
        std::tr1::array<double, 6> obsTmtAbundances;
        for (size_t n = 0; n < 6; ++n) {
            MSI closestIt = findClosestMz(i->begin(), i->end(), tmtMasses[n]);
            // accept the closest centroid only if it lies within 0.5 m/z units
            if (closestIt != i->end()
                && std::abs(closestIt->first - tmtMasses[n]) < 0.5) {
                obsTmtAbundances[n] = closestIt->second;
            } else {
                obsTmtAbundances[n] = 0.0;
            }
        }
        // store the six abundances once per spectrum
        tmts.push_back(obsTmtAbundances);
    }
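
The example above calls a helper named findClosestMz that is not shown here. The following is a minimal sketch of what such a helper could look like, not the implementation shipped with libmgf. It assumes that a peak is a std::pair<double, double> of (m/z, intensity), which matches the ->first / ->second access used above, and that the range is already sorted by ascending m/z. The names Peak and PeakMzLessThanValue are hypothetical.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical peak type (assumption): (m/z, intensity).
typedef std::pair<double, double> Peak;

// Comparator for std::lower_bound: orders a peak against a bare m/z value.
struct PeakMzLessThanValue {
    bool operator()(const Peak& p, double mz) const { return p.first < mz; }
};

// Returns an iterator to the peak whose m/z is closest to `target`.
// Precondition: [first, last) is sorted by ascending m/z.
// Returns `last` only if the range is empty.
template <typename Iter>
Iter findClosestMz(Iter first, Iter last, double target) {
    if (first == last) {
        return last;
    }
    // First peak with m/z >= target (binary search).
    Iter upper = std::lower_bound(first, last, target, PeakMzLessThanValue());
    if (upper == first) {
        return first;          // target lies at or below the smallest m/z
    }
    Iter lower = upper;
    --lower;                   // last peak with m/z < target
    if (upper == last) {
        return lower;          // target lies above the largest m/z
    }
    // Pick the nearer neighbour; ties go to the lower m/z.
    return (target - lower->first) <= (upper->first - target) ? lower : upper;
}
```

Because the helper relies on a binary search, the spectrum must be sorted by m/z before the call. Since the sketch returns the end iterator for an empty spectrum, callers should compare the result against end() before dereferencing it.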

Examples

Coding examples are in the applications/ subdirectory.

Appendix

Known Issues

The MGF Grammar

The following is the current MGF Grammar, extracted from Parser.ypp.

0 $accept: start "end of file"

1 ion: "double" "double" "end of line"
2    | "integer" "double" "end of line"
3    | "double" "integer" "end of line"
4    | "integer" "integer" "end of line"

5 ions: ions ion
6     | ion

7 charge: "integer" '+'
8       | "integer" '-'

9 charges: '(' charges ')'
10        | charges ',' charge
11        | charges "and keyword" charge
12        | charge

13 csintegerlist: csintegerlist ',' "integer"
14              | "integer"

15 blocks: /* empty */
16       | blocks block

17 block: "begin_ions keyword" "end of line" localparams ions "end_ions keyword" "end of line"
18      | "begin_ions keyword" "end of line" localparams "end_ions keyword" "end of line"

19 globalparams: /* empty */
20             | globalparams globalparam

21 globalparam: "enzyme keyword" '=' "string" "end of line"
22            | "search title keyword" '=' "string" "end of line"
23            | "database keyword" '=' "string" "end of line"
24            | "MS/MS datafile format keyword" '=' "string" "end of line"
25            | "MS/MS ion series keyword" '=' "string" "end of line"
26            | "variable modifications keyword" '=' "string" "end of line"
27            | "units for ITOL keyword" '=' "string" "end of line"
28            | "mass type (mono or avg) keyword" '=' "string" "end of line"
29            | "fixed modifications keyword" '=' "string" "end of line"
30            | "quantitation method keyword" '=' "string" "end of line"
31            | "maximum hits keyword" '=' "string" "end of line"
32            | "type of report keyword" '=' "string" "end of line"
33            | "type of search keyword" '=' "string" "end of line"
34            | "taxonomy keyword" '=' "string" "end of line"
35            | "tolerance units keyword" '=' "string" "end of line"
36            | "user keyword" '=' "string" "end of line"
37            | "user email keyword" '=' "string" "end of line"
38            | "username keyword" '=' "string" "end of line"
39            | "perform decoy search keyword" '=' "integer" "end of line"
40            | "error tolerance keyword" '=' "integer" "end of line"
41            | "partials keyword" '=' "integer" "end of line"
42            | "fragment ion tolerance keyword" '=' "double" "end of line"
43            | "fragment ion tolerance keyword" '=' "integer" "end of line"
44            | "misassigned 13C keyword" '=' "double" "end of line"
45            | "precursor m/z keyword" '=' "double" "end of line"
46            | "precursor m/z keyword" '=' "integer" "end of line"
47            | "protein mass (kDa) keyword" '=' "double" "end of line"
48            | "protein mass (kDa) keyword" '=' "integer" "end of line"
49            | "peptide mass tolerance keyword" '=' "double" "end of line"
50            | "peptide mass tolerance keyword" '=' "integer" "end of line"
51            | "charge set keyword" '=' charges "end of line"
52            | "NA translation keyword" '=' csintegerlist "end of line"
53            | "comment" "end of line"

54 localparams: /* empty */
55            | localparams localparam

56 localparam: "title keyword and full title string" "end of line"
57           | "amino acid composition keyword" '=' "string" "end of line"
58           | "MS/MS ion series keyword" '=' "string" "end of line"
59           | "variable modifications keyword" '=' "string" "end of line"
60           | "retention time or range keyword" '=' "double" "end of line"
61           | "retention time or range keyword" '=' "integer" "end of line"
62           | "retention time or range keyword" '=' "double" '-' "double" "end of line"
63           | "retention time or range keyword" '=' "double" '-' "integer" "end of line"
64           | "retention time or range keyword" '=' "integer" '-' "double" "end of line"
65           | "retention time or range keyword" '=' "integer" '-' "integer" "end of line"
66           | "scan number of range keyword" '=' "integer" "end of line"
67           | "scan number of range keyword" '=' "integer" '-' "integer" "end of line"
68           | "tolerance units keyword" '=' "string" "end of line"
69           | "amino acid sequence keyword" '=' "string" "end of line"
70           | "sequence tag keyword" '=' "string" "end of line"
71           | "error tolerant sequence keyword" '=' "string" "end of line"
72           | "peptide mass tolerance keyword" '=' "double" "end of line"
73           | "peptide mass tolerance keyword" '=' "integer" "end of line"
74           | "charge set keyword" '=' charges "end of line"
75           | "precursor mass keyword" '=' "double" "end of line"
76           | "precursor mass keyword" '=' "double" "double" "end of line"
77           | "precursor mass keyword" '=' "double" "integer" "end of line"
78           | "precursor mass keyword" '=' "integer" "end of line"
79           | "precursor mass keyword" '=' "integer" "double" "end of line"
80           | "precursor mass keyword" '=' "integer" "integer" "end of line"
81           | "comment" "end of line"

82 contents: globalparams blocks "end of file"

83 start: contents
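
To make two of the simpler rules concrete, the following hand-written sketch mirrors rules 1-4 (an ion line is two numeric tokens followed by end of line) and rules 7-8 (a charge is an integer followed by '+' or '-'). It is an illustration only and is independent of the generated flex/bison parser. The function names parseIonLine and parseCharge are hypothetical, the sketch assumes the "integer" token carries no sign of its own, and std::istringstream may accept numeric spellings that the real scanner would reject.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Rules 1-4: two numeric tokens (each an "integer" or a "double"), then
// end of line. Both values are stored as double here.
// Returns false if the line does not match.
bool parseIonLine(const std::string& line, std::pair<double, double>& ion) {
    std::istringstream in(line);
    double mz = 0.0;
    double intensity = 0.0;
    if (!(in >> mz >> intensity)) {
        return false;          // fewer than two numeric tokens
    }
    std::string rest;
    if (in >> rest) {
        return false;          // trailing non-whitespace token
    }
    ion = std::make_pair(mz, intensity);
    return true;
}

// Rules 7-8: digits followed by '+' or '-'.
// On success stores the signed charge, e.g. "2+" -> 2 and "3-" -> -3.
// No overflow check is performed; this is a sketch for short tokens.
bool parseCharge(const std::string& token, int& charge) {
    if (token.size() < 2) {
        return false;
    }
    const char sign = token[token.size() - 1];
    if (sign != '+' && sign != '-') {
        return false;
    }
    int magnitude = 0;
    for (std::string::size_type k = 0; k + 1 < token.size(); ++k) {
        if (token[k] < '0' || token[k] > '9') {
            return false;      // only digits may precede the sign
        }
        magnitude = magnitude * 10 + (token[k] - '0');
    }
    charge = (sign == '+') ? magnitude : -magnitude;
    return true;
}
```

Charge lists (rules 9-12) would apply parseCharge to each element of a list separated by ',' or the "and" keyword, optionally wrapped in parentheses.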



Generated on Thu Nov 24 07:44:09 2011 for libmgf by  doxygen 1.6.1