Deduction of empirical formula


The empirical formula of a compound, as the term is used here, is its formula written simply as a list of the elements present and how many atoms of each there are, for example C6H12O6. (Strictly speaking this is the molecular formula; an empirical formula in the strict sense is the simplest whole-number ratio of the elements, which would be CH2O for the same compound.) Many different isomers can share the same formula, but it is still well worth finding out if you are trying to identify an unknown peak in a chromatogram. The Merck Index has an index by empirical formula, and most chemical databases can be searched that way.

There are three obvious approaches to getting an empirical formula out of a mass spectrometer.

Use exact mass

If you haven't already done so, you might like to read about exact mass. Because hydrogen does not weigh exactly 1.000, and nitrogen does not weigh exactly 14.000, a CH2 group (14.0157) does not weigh the same as a nitrogen atom (14.0031). If you can measure the mass accurately enough, you can work out whether a chemical has a nitrogen atom or a CH2 group, and so on for all the other potential atoms and groups. This can be simply a matter of trying out every empirical formula that could add up to approximately the mass you have, and seeing whether the difference between its calculated mass and the measured mass is within the realistic error of the instrument.
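
The trial-and-error search described above can be sketched in a few lines of Python. This is only a minimal illustration, not the algorithm used by any particular instrument software: it assumes a neutral molecule containing only C, H, N and O, and the names MONO, format_formula and candidate_formulae are made up for this example. The mass of the electron is ignored.

```python
# Monoisotopic masses in Da (most abundant isotope of each element).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}


def format_formula(counts):
    """Write element counts as a string such as 'C6H12O6'."""
    return "".join(
        el + (str(n) if n > 1 else "")
        for el, n in counts.items()
        if n > 0
    )


def candidate_formulae(target, ppm=5.0):
    """List CHNO formulae whose monoisotopic mass is within `ppm` of `target`.

    Returns (formula, mass, error_ppm) tuples, best match first.
    """
    tol = target * ppm * 1e-6
    hits = []
    for c in range(int((target + tol) // MONO["C"]) + 1):
        left_c = target - c * MONO["C"]
        for n in range(int((left_c + tol) // MONO["N"]) + 1):
            left_n = left_c - n * MONO["N"]
            for o in range(int((left_n + tol) // MONO["O"]) + 1):
                left_o = left_n - o * MONO["O"]
                # Only the nearest hydrogen count can fall inside the window
                # (true while the window is far narrower than one hydrogen).
                h = round(left_o / MONO["H"])
                if h < 0:
                    continue
                counts = {"C": c, "H": h, "N": n, "O": o}
                mass = sum(MONO[el] * k for el, k in counts.items())
                error_ppm = (mass - target) / target * 1e6
                if abs(error_ppm) <= ppm:
                    hits.append((format_formula(counts), mass, error_ppm))
    return sorted(hits, key=lambda hit: abs(hit[2]))


if __name__ == "__main__":
    for formula, mass, error_ppm in candidate_formulae(180.0634, ppm=5.0):
        print(f"{formula:12s} {mass:10.5f} {error_ppm:+6.2f} ppm")
```

The sketch makes no attempt to check that a candidate is chemically sensible, so its output will include formulae that no real compound could have.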

Obviously the more accurate your mass spectrometer, the smaller its realistic error, and therefore the fewer formulae it will find that may be correct. There are also more ways to build big masses, so empirical formulae tend to be less certain as the mass gets larger.
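
As a rough illustration of why accuracy matters more at higher mass, the short Python sketch below works out how the fixed mass difference between a CH2 group and a nitrogen atom shrinks, in parts per million, as the molecule gets heavier. The function name ppm_needed is made up for this example, and the instrument's real error needs to be comfortably smaller than the figure printed for the two to be told apart.

```python
# Monoisotopic masses in Da.
C, H, N = 12.0, 1.00782503, 14.00307401

# A CH2 group and a nitrogen atom share the nominal mass 14 but differ slightly.
CH2_VS_N = (C + 2 * H) - N  # about 0.0126 Da


def ppm_needed(mass, difference=CH2_VS_N):
    """Express `difference` (Da) as parts per million of `mass` (Da)."""
    return difference / mass * 1e6


if __name__ == "__main__":
    for mass in (100, 200, 500, 1000):
        print(f"{mass:5d} Da: about {ppm_needed(mass):6.1f} ppm")
```

The same absolute difference is about ten times smaller in ppm terms at 1000 Da than at 100 Da.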

Currently, Q-TOF instruments are good enough to produce a usefully small list for chemicals of a few hundred daltons, while ion cyclotron resonance (FT-ICR) instruments can do much better still.

Use isotope information

Many elements have common heavy (or light) isotopes, occurring at known abundances relative to the main isotope. More information is available about isotope effects. An example is carbon, which is about 1.1% 13C. In theory, an M+1 peak about 6 to 7% as high as the M peak should indicate the presence of 6 carbons. Of course things aren't quite this simple, because many other elements also have isotopes at +1, so it can be quite a task to work out all the combinations that could give rise to a particular pattern of isotope peaks. Nevertheless, this is a good way to cut down the list of potential formulae produced by the exact-mass approach. It works especially well for elements with large abundances of heavy isotopes, and preferably several of them. In biology, this usually means sulphur (34S, about 4% at +2), chlorine and bromine; fluorine and iodine each have only one stable isotope, so they give no such pattern.
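
The M+1 arithmetic can be sketched in Python as follows. The isotope ratios are rounded values assumed for illustration, and the names PLUS_ONE_RATIO, predicted_m_plus_1 and estimate_carbons are made up for this example. The carbon estimate deliberately ignores every other element, which is exactly the simplification the paragraph above warns about.

```python
# Approximate abundance of the +1 isotope relative to the main isotope
# (13C, 2H, 15N, 17O, 33S). Rounded values, assumed for illustration.
PLUS_ONE_RATIO = {
    "C": 0.0108,
    "H": 0.000115,
    "N": 0.0037,
    "O": 0.00038,
    "S": 0.0079,
}


def predicted_m_plus_1(counts):
    """M+1/M intensity ratio expected for a formula given as element counts."""
    return sum(PLUS_ONE_RATIO[el] * n for el, n in counts.items())


def estimate_carbons(m_plus_1_ratio):
    """Rough carbon count, attributing the whole M+1 peak to 13C."""
    return round(m_plus_1_ratio / PLUS_ONE_RATIO["C"])


if __name__ == "__main__":
    glucose = {"C": 6, "H": 12, "O": 6}
    ratio = predicted_m_plus_1(glucose)
    print(f"Predicted M+1/M for C6H12O6: {ratio:.4f}")
    print(f"Carbon estimate from that ratio: {estimate_carbons(ratio)}")
```

In practice the predicted ratio for each candidate formula would be compared with the measured one, and candidates that disagree by more than the measurement error discarded.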

Use good sense

The list produced by approaches 1 and 2 can often be pruned to a single formula if you can discount unlikely candidates. It's hard to imagine a plant metabolite with a formula like C23H2N47.
Things like the nitrogen rule are useful. Nitrogen is the only common element in biochemicals that has an odd valency and an even mass. As a result, a neutral molecule with an odd nominal mass has an odd number of nitrogen atoms, and one with an even nominal mass has an even number (or none). The same holds for a molecular ion formed by losing an electron; for a protonated ion, [M+H]+, the parity is reversed.
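
A nitrogen-rule check is simple enough to sketch in Python. This is a minimal example that assumes neutral molecules built from C, H, N, O, S and P only; the names NOMINAL, nominal_mass and obeys_nitrogen_rule are made up for this illustration.

```python
# Nominal (integer) masses of the most abundant isotopes.
NOMINAL = {"C": 12, "H": 1, "N": 14, "O": 16, "S": 32, "P": 31}


def nominal_mass(counts):
    """Nominal mass of a formula given as a dict of element counts."""
    return sum(NOMINAL[el] * n for el, n in counts.items())


def obeys_nitrogen_rule(counts):
    """True if the parity of the nominal mass matches the parity of the N count."""
    return nominal_mass(counts) % 2 == counts.get("N", 0) % 2


if __name__ == "__main__":
    candidates = {
        "C6H12O6": {"C": 6, "H": 12, "O": 6},
        "C2H5NO2": {"C": 2, "H": 5, "N": 1, "O": 2},
        "C23H2N47": {"C": 23, "H": 2, "N": 47},
    }
    for name, counts in candidates.items():
        verdict = "passes" if obeys_nitrogen_rule(counts) else "fails"
        print(f"{name}: nominal mass {nominal_mass(counts)}, {verdict}")
```

The implausible formula C23H2N47 has an even nominal mass (936) but an odd number of nitrogens, so it fails the check as a neutral molecule.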