Numerous chemical name searches turned into one, all in the same software solution

Olajide Azeez-Singer is the CEO of PADRYT Technologies. PADRYT Technologies, a patent information and software provider, develops innovative solutions across the patent value chain. PADRYT’s customer-oriented focus allows them to adapt their platform, Leonis, to support their customers’ unique data and workflow needs.

Streamline chemical research with PADRYT’s newest addition to their platform

Finding relevant patent documents is always a challenge, especially when the research involves chemical names. Not only are patent documents that mention chemicals written in several languages, but the chemicals themselves can also appear in different notation formats such as SMILES, InChI, and InChIKey. To account for these variations, information professionals, researchers, and scientists hoping to leverage patent information have to create numerous complex searches to find all the documents relevant to their research. PADRYT’s goal is to make this process easier. By automating a portion of the chemical research process, PADRYT can reduce the time needed to complete a chemical name search, allowing researchers and information specialists to leverage the strategic benefits of patent information and focus on developing their newest innovation.

Building from the IFI platform

PADRYT’s Leonis platform uses the OSCAR/OPSIN library for both search query interception and document indexing. Since the OSCAR/OPSIN library mainly relies on chemical names in English, we greatly benefited from the work IFI CLAIMS has done with their English translations. By using the English translations provided by IFI CLAIMS, PADRYT was able to easily integrate the OSCAR/OPSIN library into Leonis. While PADRYT currently works with chemical names in English, we look forward to supporting other languages in the future.

Why search query interception?

Search query interception is a fast and efficient way of helping researchers translate their search terms and extend them as they see fit. Researchers can use natural language when searching for chemical names, and the platform will intercept the search if asked to do so. The platform extracts any chemical names it identifies in the search terms and extends the search by adding the corresponding SMILES, InChI, and InChIKey before the search request is passed to our search engine.
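To make the idea concrete, here is a minimal sketch of extending a search with structure identifiers. It is an illustration only, not the Leonis implementation: the lookup table stands in for a name-to-structure resolver such as OPSIN, the function name extend_query is hypothetical, and the quoted OR syntax is an assumed query format rather than the one PADRYT’s search engine uses.

```python
import re

# Hypothetical stand-in for a name-to-structure resolver such as OPSIN.
# A real system would call the library; this table keeps the sketch runnable.
KNOWN_STRUCTURES = {
    "ethanol": {
        "smiles": "CCO",
        "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
        "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
    },
    "benzene": {
        "smiles": "c1ccccc1",
        "inchi": "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
        "inchikey": "UHOVQNZJYSORNB-UHFFFAOYSA-N",
    },
}


def extend_query(query):
    """Replace each recognized chemical name with an OR group of identifiers.

    Words that are not recognized are left untouched, so the rest of the
    query passes through unchanged.
    """
    def expand(match):
        word = match.group(0)
        ids = KNOWN_STRUCTURES.get(word.lower())
        if ids is None:
            return word
        terms = [word, ids["smiles"], ids["inchi"], ids["inchikey"]]
        # Assumed query syntax: quoted terms joined with OR inside parentheses.
        return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

    return re.sub(r"[A-Za-z][A-Za-z0-9\-]*", expand, query)


if __name__ == "__main__":
    print(extend_query("ethanol solvent"))
```

Because the extended query is plain text, it can be shown to the researcher for review and editing before it is sent to the search engine.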

Why OSCAR/OPSIN?

OPSIN stands for Open Parser for Systematic IUPAC Nomenclature. Developed at the University of Cambridge, it is an open-source, freely available algorithm that interprets the majority of organic chemical nomenclature in a fast and precise manner. The approach uses a basic grammar to guide tokenization, a potentially difficult problem when dealing with chemical names. After the name is parsed, an XML parse tree is constructed, and OPSIN uses this tree to reconstruct the structure from the name step by step. One main reason for choosing this approach is performance. The OSCAR/OPSIN library takes only a fraction of a second to extract the chemical names and extend them with SMILES, InChI, and InChIKey. The results from OPSIN for various computer-generated name/structure pair sets show exceptionally high precision, 99.8% and up. For general organic chemical nomenclature, recall is also high, between 98.7% and 99.2%. With this approach, users have full control over how a search is interpreted.
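For readers less familiar with these two measures, the short sketch below shows how precision and recall are commonly computed for a name-to-structure benchmark. The definitions are an assumption about how such figures are usually derived, and the counts are invented purely for illustration; they are not taken from any OPSIN benchmark.

```python
def precision_recall(total_names, structures_returned, structures_correct):
    """Compute precision and recall for a name-to-structure run.

    precision: share of returned structures that are correct.
    recall:    share of all input names that yielded a correct structure.
    """
    if total_names <= 0 or structures_returned <= 0:
        raise ValueError("counts must be positive")
    precision = structures_correct / structures_returned
    recall = structures_correct / total_names
    return precision, recall


if __name__ == "__main__":
    # Illustrative counts only, not figures from any published benchmark.
    p, r = precision_recall(
        total_names=1000, structures_returned=990, structures_correct=988
    )
    print(f"precision={p:.1%} recall={r:.1%}")
```

High precision means that when the parser returns a structure it is almost always the right one, while high recall means that few names are left uninterpreted.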

The OSCAR/OPSIN library is also used to intercept document indexing: it extracts chemical names from the claims, description, title, and abstract, and adds the corresponding SMILES, InChI, or InChIKey to the document before it is indexed in our customer patent archive. This software can serve as the basis for future open-source developments for chemical name interpretation, which we will discuss further in our next blog.
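The indexing side can be sketched in the same spirit. The example below is an assumption-laden illustration, not the Leonis indexer: the document is modeled as a plain dictionary, the field names and the enrich_document function are hypothetical, and a small lookup table again stands in for a resolver such as OPSIN.

```python
import re

# Hypothetical stand-in for a name-to-structure resolver such as OPSIN.
KNOWN_STRUCTURES = {
    "ethanol": {
        "smiles": "CCO",
        "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
        "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
    },
    "benzene": {
        "smiles": "c1ccccc1",
        "inchi": "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
        "inchikey": "UHOVQNZJYSORNB-UHFFFAOYSA-N",
    },
}

# Assumed field names for the text sections that are scanned.
TEXT_FIELDS = ("title", "abstract", "claims", "description")


def enrich_document(doc):
    """Return a copy of doc with identifiers for the chemical names it mentions."""
    found = []
    for field in TEXT_FIELDS:
        for word in re.findall(r"[A-Za-z][A-Za-z0-9\-]*", doc.get(field, "")):
            key = word.lower()
            if key in KNOWN_STRUCTURES and key not in found:
                found.append(key)

    enriched = dict(doc)  # leave the caller's document untouched
    enriched["chemical_names"] = found
    for id_type in ("smiles", "inchi", "inchikey"):
        enriched[id_type] = [KNOWN_STRUCTURES[name][id_type] for name in found]
    return enriched


if __name__ == "__main__":
    patent = {
        "title": "Process using benzene",
        "abstract": "Ethanol is the solvent.",
        "claims": "A method wherein benzene is heated.",
    }
    print(enrich_document(patent)["inchikey"])
```

Storing the identifiers alongside the text means a search on a SMILES string or InChIKey can match a document that only ever spelled out the chemical name.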

How it works

With the Leonis platform, activating search interception is as simple as clicking the checkbox on the search screen. Once activated, the Leonis search module will intercept a search request, extract the terms, and run them through the OSCAR/OPSIN library to analyze and identify the chemical names. The identified chemical names are then returned to the search module with their corresponding SMILES and InChI values. The final step is to extend the search with these values and pass it to PADRYT’s search engine for processing. The processed search, the extended search passed to PADRYT’s engine, and the results list are then displayed to the user, who can make any necessary modifications.

The Leonis platform provides quick, high-performance search query interception without the need for value extraction and complex processing while indexing documents. Search queries are transparent to the user, giving you the flexibility to edit and reuse them as you see fit. You can also generate SMILES or InChI strings from chemical names and use them to draw chemical structures in the chemical structure tool of your choice. JSME, a free molecule editor, will be integrated in our next product release.

Conclusion

PADRYT Technologies actively seeks to incorporate open-source projects with a strong community to help build sophisticated software and libraries with the goal of solving real-world problems. Through PADRYT’s promotion of this work, they encourage others to contribute and help further the community’s understanding of how the effective use of this software can assist in finding solutions.