International Chemical Identifier (InChI key)

What is InChI?

The International Chemical Identifier (InChI) was developed by the IUPAC to make searching the Internet (for example, with Google) for chemical compounds more efficient. The first version of InChI was released in 2005, and its condensed form, the InChI key, followed in 2007.

Disadvantages of the IUPAC nomenclature

The main problem with searching the Internet for the IUPAC systematic names of organic compounds is that a molecule often has many synonyms. A search for one particular synonym therefore misses documents that refer to the compound by another name.

In addition, systematic names that contain numbers and symbols such as hyphens, parentheses, brackets, quotation marks, or Greek letters are handled poorly by Internet search engines. Many chemical structures on the Internet are also displayed only as image files (PNG, GIF, JPG, etc.), where search engines are of little help.

Finally, large molecules are difficult to name correctly with a systematic name, because the resulting name is too long.

For all these reasons, the InChI and InChI key identifiers were created: to simplify the naming and formulation of chemical compounds and to make information about them easier to find on the Internet.

Advantages of the InChI Key identifier

These identifiers have the following advantages:

  • They are unique for each chemical compound.
  • They are free to use and non-proprietary, unlike, for example, CAS numbers.
  • They can be calculated by any user from structural information and are not assigned by an organization.
  • Most of the information in an InChI is human-readable.

Identifier structure

The InChI algorithm converts the input structural information into a unique InChI identifier in a three-step process: normalization (removing redundant information), canonicalization (generating a unique label for each atom), and serialization (producing the character string).

Normalization, canonicalization, and serialization in the InChI code

The InChI key is a condensed, fixed-length (27 characters) hashed representation of the InChI code that is not human-readable. Strictly speaking, the InChI key of a compound is not guaranteed to be unique. However, collisions are calculated to occur so rarely that, in practice, it is treated as unique.

Examples in organic molecules

Let's look at an example of an InChI string for an organic molecule. The structure of the morphine molecule is as follows:

Structure of morphine (InChI key: BQJCRHHNABKAKU-KBQPJGBKSA-N)

Its InChI code is as follows:

InChI=1S/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3/t10-,11+,13-,16-,17-/m0/s1

Each InChI begins with the string "InChI=" followed by the version number, currently 1. This is followed by the letter S for standard InChI.

The remaining information is structured as a sequence of layers and sublayers, with each layer providing a specific type of information (atoms and their bond connectivity, tautomeric information, isotope information, stereochemistry and electronic charge information).

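Layers and sublayers are separated by the delimiter "/". Apart from the chemical formula, which comes first and has no prefix, each one begins with a characteristic lowercase prefix letter (for example, "c" for connectivity and "h" for hydrogen atoms).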
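As a rough illustration of this layered structure, the following Python sketch splits an InChI string at the "/" delimiter and reads the prefix letter of each layer. It is not part of the official InChI software. The function name `parse_inchi` and the short descriptions in `LAYER_NAMES` are assumptions made for this example, and the sketch simply assumes that the first layer is always the chemical formula.

```python
# Layer prefixes commonly seen in standard InChI strings (not exhaustive;
# the short descriptions are informal labels chosen for this example).
LAYER_NAMES = {
    "c": "atom connectivity",
    "h": "hydrogen atoms",
    "q": "charge",
    "p": "protonation",
    "b": "double-bond stereochemistry",
    "t": "tetrahedral stereochemistry",
    "m": "stereo inversion marker",
    "s": "stereo type",
    "i": "isotopes",
}

# Standard InChI of morphine, split over several lines for readability.
MORPHINE_INCHI = (
    "InChI=1S/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9"
    "(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3"
    "/t10-,11+,13-,16-,17-/m0/s1"
)


def parse_inchi(inchi):
    """Split an InChI string into version, standard flag, formula and layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    head, *rest = inchi[len(prefix):].split("/")
    standard = head.endswith("S")          # trailing "S" marks a standard InChI
    version = head[:-1] if standard else head
    # Simplification: assume the first layer is the chemical formula.
    formula = rest[0] if rest else ""
    # Every later layer is a prefix letter followed by its content.
    layers = [(layer[0], layer[1:]) for layer in rest[1:] if layer]
    return {
        "version": version,
        "standard": standard,
        "formula": formula,
        "layers": layers,
    }


if __name__ == "__main__":
    info = parse_inchi(MORPHINE_INCHI)
    print(info["version"], info["standard"], info["formula"])
    for letter, body in info["layers"]:
        print(letter, LAYER_NAMES.get(letter, "unknown"), body)
```

For morphine, the sketch reports the formula layer followed by the layers with prefixes c, h, t, m and s. It only separates the layers and does not validate or interpret their content.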

From the InChI string, the standard InChI key (introduced in 2009 with version 1.02 of the InChI software) is computed as a hashed version of the compound's InChI string. It consists of 27 characters: uppercase letters and two hyphens. In the case of morphine it is:

InChI Key: BQJCRHHNABKAKU-KBQPJGBKSA-N
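The fixed 27-character format of the key can be checked mechanically. The following Python sketch is only an illustration and not part of the InChI software. The function name `split_inchikey` and the dictionary keys are hypothetical. The reading of the blocks is an assumption taken from the published InChI key format: a 14-letter hash of the connectivity, an 8-letter hash of the remaining layers, a flag letter ("S" for standard), a version letter, and a final protonation letter.

```python
import re

# Three blocks of uppercase letters: 14, 10 and 1, separated by hyphens.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def split_inchikey(key):
    """Check the 27-character format and split the key into its blocks."""
    if INCHIKEY_PATTERN.match(key) is None:
        raise ValueError("not a well-formed InChI key: %r" % key)
    first, second, third = key.split("-")
    return {
        "connectivity_hash": first,        # 14 letters
        "other_layers_hash": second[:8],   # 8 letters
        "standard": second[8] == "S",      # flag letter
        "version_char": second[9],         # "A" is used for InChI version 1
        "protonation_char": third,         # "N" is used for neutral
    }


if __name__ == "__main__":
    parts = split_inchikey("BQJCRHHNABKAKU-KBQPJGBKSA-N")
    for name, value in parts.items():
        print(name, value)
```

This check only confirms that a string has the shape of an InChI key. Because the key is a hash, the sketch cannot recover the structure or confirm that the key belongs to a real compound.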

This 27-character string can be used with web search engines to easily find information on the compound, regardless of the language or the synonym used for the molecule.
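One small, concrete reason the key is convenient in a search is that it contains only uppercase letters and hyphens, so it passes through a URL unchanged. The sketch below compares it with a systematic-style name using Python's standard `urllib.parse.quote`. The helper name `as_query_term` and the example name (chosen only for illustration, not taken from this article) are assumptions, and percent-encoding is of course only one of the ways in which special characters complicate searching.

```python
from urllib.parse import quote


def as_query_term(text):
    """Percent-encode a string for use as a URL query value."""
    return quote(text, safe="")


inchi_key = "BQJCRHHNABKAKU-KBQPJGBKSA-N"
# Illustrative systematic-style name containing parentheses (not from the text).
example_name = "N-(4-hydroxyphenyl)acetamide"

if __name__ == "__main__":
    print(as_query_term(inchi_key))     # letters and hyphens are left as-is
    print(as_query_term(example_name))  # parentheses are percent-encoded
```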

 

Finally, note that the 3D coordinates of the atoms are not represented in InChI.

How can I get the InChI and InChI key identifiers? (InChI calculator)

With the following application (JSME), you can draw chemical structures and interactively obtain the IUPAC codes (InChI and InChI key) of the molecule you want in a simple way.