Molecule Recognition

The Molecule Recognition (image-to-MOL) task is the process of converting an image of a chemical diagram into a machine-readable format. Molecule diagrams are widely used to show how the atoms of a real molecular structure are connected by bonds. They can be drawn in complicated ways, and in many cases a molecule image is hard to interpret even by eye.
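To make the target format concrete, here is a minimal sketch of writing recognized atoms and bonds out as a V2000 MOL block, the connection-table layout used by MOL files. The `Atom`/`Bond` classes and the `to_molblock` function are hypothetical names for illustration. The sketch emits only the header, counts line, atom block, bond block and `M  END`, leaving charges, isotopes and all other properties at zero, so it is not a complete MOL writer.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Atom:
    symbol: str  # element symbol, e.g. "C", "O", "Br"
    x: float
    y: float
    z: float = 0.0  # molecule images are 2D, so z is usually 0


@dataclass
class Bond:
    a1: int  # 1-based index of the first atom in the atom block
    a2: int  # 1-based index of the second atom
    order: int = 1  # V2000 bond type: 1 single, 2 double, 3 triple, 4 aromatic
    stereo: int = 0  # V2000 stereo flag; 0 means "not stereo"


def to_molblock(name: str, atoms: list[Atom], bonds: list[Bond]) -> str:
    """Serialize atoms and bonds as a minimal (hypothetical) V2000 MOL block."""
    lines = [name, "", ""]  # header: molecule name, program line, comment line
    # Counts line: atom count, bond count, eight unused/zero fields, then "999 V2000".
    lines.append(f"{len(atoms):3d}{len(bonds):3d}" + "  0" * 8 + "999 V2000")
    for a in atoms:
        # x, y, z as 10.4f, the symbol left-justified in 3 chars,
        # a 2-char mass difference, then 11 three-char fields, all left at 0.
        lines.append(f"{a.x:10.4f}{a.y:10.4f}{a.z:10.4f} {a.symbol:<3} 0" + "  0" * 11)
    for b in bonds:
        # Bond block: first atom, second atom, bond type, stereo flag.
        lines.append(f"{b.a1:3d}{b.a2:3d}{b.order:3d}{b.stereo:3d}")
    lines.append("M  END")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Heavy atoms of ethanol with made-up 2D coordinates.
    atoms = [Atom("C", 0.0, 0.0), Atom("C", 1.299, 0.75), Atom("O", 2.598, 0.0)]
    print(to_molblock("ethanol", atoms, [Bond(1, 2), Bond(2, 3)]))
```

A recognizer's job is essentially to recover the inputs to such a function (atom symbols, positions and typed bonds) from pixels; serializing them is the easy part.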

For example, the figure below shows a molecule image with its corresponding MOL file. The shape between the O, N and Br atoms is actually a hexagon with two pentagons: each pentagon shares 3 sides with the hexagon and 2 sides with the other pentagon. This may be slightly confusing, especially once you notice that the vertical line in the middle and the diagonal line running towards the O atom are not actually connected. This is called a bridge bond, and it makes the drawing sort of 2.5-dimensional!
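A bridge bond is where a naive recognizer goes wrong: two drawn lines cross, but no atom sits at the crossing. A minimal sketch of one check, assuming atom positions and bond endpoints have already been extracted from the image, is to flag every pair of bonds whose line segments properly cross while sharing no atom, so that a later step does not invent an atom at the intersection. The function names and the input layout (a dict of 2D atom coordinates and a list of atom-index pairs) are hypothetical, and only strict crossings are counted; segments that merely touch or overlap collinearly are ignored.

```python
from __future__ import annotations

Point = tuple[float, float]


def _orient(p: Point, q: Point, r: Point) -> float:
    """Cross product of (q - p) and (r - p): >0 left turn, <0 right turn, 0 collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segment p1-p2 and segment q1-q2 cross at a single interior point."""
    d1 = _orient(q1, q2, p1)
    d2 = _orient(q1, q2, p2)
    d3 = _orient(p1, p2, q1)
    d4 = _orient(p1, p2, q2)
    # Each segment's endpoints must lie strictly on opposite sides of the other.
    return d1 * d2 < 0 and d3 * d4 < 0


def crossing_without_atom(
    coords: dict[int, Point], bonds: list[tuple[int, int]]
) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """Hypothetical helper: bond pairs whose drawn lines cross but share no atom."""
    hits = []
    for i in range(len(bonds)):
        for j in range(i + 1, len(bonds)):
            a, b = bonds[i]
            c, d = bonds[j]
            if {a, b} & {c, d}:
                continue  # bonds meeting at a shared atom are ordinary connections
            if segments_cross(coords[a], coords[b], coords[c], coords[d]):
                hits.append((bonds[i], bonds[j]))
    return hits
```

For instance, with four atoms at the corners of a unit square, the two diagonal bonds are reported as a crossing pair, while a side bond that shares an atom with either diagonal is not.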

[Figure: a molecule image and its corresponding MOL file]

Molecule images:

There is a huge number of known molecules, and the number grows by thousands every day. As a result, molecule images range widely in complexity, from simple to moderate to highly complex.


Bond Types

Single Bond
Double Bond
Triple Bond
Wedge Bond
Bold Bond
Hollow Wedge Bond
Dashed Wedge Bond
Dashed Bold Bond
Dashed Bond
Wavy Bond
Dative Bond
Aromatic Ring
Open Bridge
Closed Bridge
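Some of these drawing styles correspond directly to codes in the V2000 MOL bond block (a bond type plus a stereo flag), while others do not. The table below is a hedged sketch of that mapping. The style keys are hypothetical names; "dashed wedge" is assumed to mean the usual hashed wedge; and the styles mapped to `None` are deliberately left undecided, because this sketch assumes there is no single standard V2000 encoding for them.

```python
from __future__ import annotations

from typing import Optional

# (bond type, stereo flag) as written in a V2000 MOL bond block.
# Bond types: 1 single, 2 double, 3 triple, 4 aromatic.
# Stereo flags for single bonds: 0 none, 1 up (wedge), 4 either (wavy), 6 down (hashed).
BOND_STYLE_TO_V2000: dict[str, Optional[tuple[int, int]]] = {
    "single": (1, 0),
    "double": (2, 0),
    "triple": (3, 0),
    "wedge": (1, 1),  # solid wedge: bond points towards the viewer
    "dashed_wedge": (1, 6),  # assumption: the usual hashed wedge, pointing away
    "wavy": (1, 4),  # unknown / either configuration
    "aromatic": (4, 0),  # ring bonds written as aromatic; many tools use Kekule form instead
    # Assumption: no single agreed V2000 code for these styles in this sketch,
    # so each needs a project-specific rule before it can be written out.
    "bold": None,
    "hollow_wedge": None,
    "dashed_bold": None,
    "dashed": None,
    "dative": None,
    "open_bridge": None,
    "closed_bridge": None,
}


def v2000_codes(style: str) -> tuple[int, int]:
    """Return (bond type, stereo flag) for a drawing style, or raise if unmapped."""
    codes = BOND_STYLE_TO_V2000.get(style)
    if codes is None:
        raise ValueError(f"unknown or unmapped bond style: {style!r}")
    return codes
```

Raising an error for an unmapped style, rather than silently writing a plain single bond, keeps stereo and bridge information from being dropped without notice.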