International Chemical Identifier


The International Chemical Identifier (InChI) is a textual identifier for chemical substances, designed to provide a standard way to encode molecular information and to facilitate the search for such information in databases and on the web. Initially developed by the International Union of Pure and Applied Chemistry (IUPAC) and the National Institute of Standards and Technology (NIST) from 2000 to 2005, the format and algorithms are non-proprietary.

The identifiers describe chemical substances in terms of layers of information: the atoms and their bond connectivity, tautomeric information, isotope information, stereochemistry, and electronic charge information. Not all layers have to be provided; for instance, the tautomer layer can be omitted if that type of information is not relevant to the particular application. The InChI algorithm converts input structural information into a unique InChI identifier in a three-step process: normalization to remove redundant information, canonicalization to generate a unique number label for each atom, and serialization to produce a string of characters.
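The layered structure can be seen by splitting an identifier at its "/" separators. The following Python sketch is illustrative only: the function name split_inchi_layers and the LAYER_NAMES table are assumptions made for this example, the table covers only commonly seen layer prefixes, and the code does not perform normalization or canonicalization. It reads an already generated string and nothing more.

```python
# Illustrative subset of InChI layer prefixes (an assumption for this sketch,
# not the complete specification).
LAYER_NAMES = {
    "c": "connectivity",
    "h": "hydrogens",
    "q": "charge",
    "p": "protons",
    "b": "double-bond stereochemistry",
    "t": "tetrahedral stereochemistry",
    "m": "stereo parity flag",
    "s": "stereo type",
    "i": "isotopes",
    "f": "fixed hydrogens",
    "r": "reconnected",
}


def split_inchi_layers(inchi):
    """Split an InChI string into (layer name, value) pairs, in order."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string: %r" % inchi)
    parts = inchi[len("InChI="):].split("/")
    # The first segment is the version token, e.g. "1S" or "1".
    layers = [("version", parts[0])]
    for part in parts[1:]:
        if not part:
            continue  # tolerate empty segments
        prefix = part[0]
        if prefix in LAYER_NAMES:
            # Known lowercase prefix: strip it and keep the payload.
            layers.append((LAYER_NAMES[prefix], part[1:]))
        elif prefix.isupper() or prefix.isdigit():
            # The chemical formula layer carries no prefix letter.
            layers.append(("formula", part))
        else:
            layers.append(("unknown", part))
    return layers


if __name__ == "__main__":
    # Ethanol
    for name, value in split_inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"):
        print(name, value)
```

For ethanol this yields the version token, the formula C2H6O, the connectivity 1-2-3, and the hydrogen layer 3H,2H2,1H3. A list of pairs is returned rather than a dictionary because some prefixes can occur more than once in a single identifier.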

InChIs differ from the widely used CAS registry numbers in three respects: firstly, they are freely available and non-proprietary; secondly, they can be computed from structural information and do not have to be assigned by some organization; and thirdly, most of the information in an InChI is human readable with practice. InChIs can thus be seen as akin to a general and extremely formalized version of IUPAC names. They can express more information than the simpler SMILES notation and differ in that every structure has a unique InChI string, which is important in database applications. Information about the 3-dimensional coordinates of atoms is not represented in InChI; for this purpose a format such as PDB can be used.

The InChIKey, sometimes referred to as a hashed InChI, is a fixed-length (27-character) condensed digital representation of the InChI that is not human-understandable. The InChIKey specification was released in September 2007 in order to facilitate web searches for chemical compounds, since these were problematic with the full-length InChI. Unlike the InChI, the InChIKey is not unique: although collisions can be calculated to be very rare, they do happen.
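The fixed 27-character shape makes InChIKeys easy to recognize in free text. The sketch below assumes the usual layout of three hyphen-separated blocks of 14, 10, and 1 uppercase letters; the function name looks_like_inchikey is hypothetical. It checks the shape only: it does not compute the hash and cannot tell whether a key corresponds to any real InChI.

```python
import re

# 14 letters, hyphen, 10 letters, hyphen, 1 letter: 27 characters in total.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(key):
    """Return True if the string has the 27-character InChIKey shape."""
    return len(key) == 27 and INCHIKEY_PATTERN.fullmatch(key) is not None


if __name__ == "__main__":
    # InChIKey of ethanol
    print(looks_like_inchikey("LFQSCWFLJHTTHZ-UHFFFAOYSA-N"))
    # A full InChI string is not an InChIKey
    print(looks_like_inchikey("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"))
```

Because two different InChIs can in principle hash to the same key, a shape check like this is suitable for filtering search terms, not for establishing that two records describe the same compound.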

In January 2009 the 1.02 version of the InChI software was released. This offered a means to generate the so-called standard InChI, which does not allow for user-selectable options in dealing with the stereochemistry and tautomeric layers of the InChI string. The standard InChIKey is then the hashed version of the standard InChI string. The standard InChI will simplify comparison of InChI strings and keys generated by different groups, and subsequently accessed via diverse sources such as databases and web resources.
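When identifiers from different groups are compared, it helps to first check whether each one is a standard InChI. A minimal sketch, assuming the convention that standard identifiers begin with "InChI=1S/" and non-standard ones with "InChI=1/"; the function name inchi_version_and_flavor is hypothetical.

```python
def inchi_version_and_flavor(inchi):
    """Return (version number, is_standard) read from the InChI prefix."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string: %r" % inchi)
    # The version token is everything between "InChI=" and the first "/".
    token = inchi[len("InChI="):].split("/", 1)[0]
    # A trailing "S" on the version token marks a standard InChI.
    is_standard = token.endswith("S")
    number = token[:-1] if is_standard else token
    return number, is_standard


if __name__ == "__main__":
    # Methane, written with a standard and a non-standard prefix
    print(inchi_version_and_flavor("InChI=1S/CH4/h1H4"))
    print(inchi_version_and_flavor("InChI=1/CH4/h1H4"))
```

Strings with different flavors should not be compared character by character, since non-standard identifiers may have been generated with different options.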

The continuing development of the standard has been supported since 2010 by the not-for-profit InChI Trust, of which IUPAC is a member. The current software version is 1.06 and was released in December 2020. Prior to 1.04, the software was freely available under the open-source LGPL license, but it now uses a custom license called the IUPAC-InChI Trust License.

Adoption


The InChI has been adopted by many larger and smaller databases, including ChemSpider, ChEMBL, Golm Metabolome Database, OpenPHACTS, and PubChem. However, adoption is not straightforward, and many databases show a discrepancy between the chemical structures and the InChI they contain, which is a problem for linking databases.