cschaerfe edited this page Jul 8, 2015 · 3 revisions

CADDSuite provides tools for accessing an implementation of a molecular database, called MolDB for short, which stores molecules and additional information about them consistently and efficiently. This information includes docking or rescoring results and can be used to filter the database. For faster import and export, it is advisable to store information about topologies and conformations in binary form in MolDB. Furthermore, MolDB automatically keeps track of all conformations and topologies that already exist in the data base, thus preventing the accumulation of redundant information.

MolDB uses MySQL by default, but since the implementation is built on QtSql, the data base engine can quickly be switched to any other engine supported by QtSql.

CADDSuite provides three end-user tools to create and utilize a MolDB. DBImporter stores molecules in a MolDB, together with additional chemical information that it generates automatically. DBExporter filters a MolDB for compounds by a variety of criteria and exports them to molecular files. VendorFinder searches for compound vendors for given molecules (if the MolDB to be used has been set up with the necessary information; see below).

Importing molecules into MolDB

Molecules can be imported into a MolDB using the tool DBImporter. If you do not yet have a MolDB, just supply the desired name of your new data base and make sure that the MySQL user specified to DBImporter has permission to create new data bases. In this case, DBImporter will automatically create a new MolDB and then import the molecules found in the given input files into it.

When molecules are imported with DBImporter, a variety of additional chemical information is automatically generated for each compound and stored in the data base along with topologies and conformations. This information later enables fast and easy filtering of the data base:

  • molecular weight
  • canonical SMILES
  • logP
  • a binary pathway-based fingerprint
  • functional group counts
  • the unique chemical key (UCK)

Furthermore, all property tags existing in the input file are saved in the data base and will automatically be reattached to molecules exported from a MolDB.

When starting DBImporter, only a molecule file and the data base information (data base name, user, host, password) need to be specified. Optionally, a data set name describing the compounds to be imported can be specified to DBImporter. This description is just intended for the user's convenience; it will make it very easy to export exactly the same set of molecules from the data base later. On the command line, running DBImporter could thus look like this:

BALL/build/bin/TOOLS/DBImporter -i compounds.sdf -u mysqluser -d myDataBaseName -h localhost -p mysqlPassword

If you want to create a data base containing purchasable compounds (so that VendorFinder can later be used with this data base), then you have to specify, along with the molecule file provided by the respective vendor, the vendor's name, a description of the vendor's library version or date (for your own reference only), the URL from which the library was obtained (again, just for your own bookkeeping) and the name of the property tag which contains the vendor's own ID for each compound.

An example of this:

BALL/build/bin/TOOLS/DBImporter -i compounds.sdf -u mysqluser -d myDataBaseName -h localhost -p mysqlPassword -vn Asinex -vid ID -vd 2011-09 \
    -vu "http://www.asinex.com/Download/Asinex_Gold_Collection1.zip"
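
When several libraries have to be imported, it can help to assemble the DBImporter call in a small script. The following is only a minimal sketch: the flags are the ones used in the examples above, while the function name `dbimporter_command` and the keys of the `vendor` dictionary are made up for illustration. The sketch only builds and prints the command; whether and how to execute it is left to the caller.

```python
import shlex


def dbimporter_command(input_file, database, user, host, password,
                       vendor=None, binary="BALL/build/bin/TOOLS/DBImporter"):
    """Return the DBImporter argument list for one molecule file.

    `vendor` is an optional dict with the (hypothetical) keys "name",
    "id_tag", "version" and "url"; omit it for a plain import.
    """
    args = [binary, "-i", input_file, "-u", user, "-d", database,
            "-h", host, "-p", password]
    if vendor is not None:
        # Vendor flags exactly as they appear in the example above.
        args += ["-vn", vendor["name"], "-vid", vendor["id_tag"],
                 "-vd", vendor["version"], "-vu", vendor["url"]]
    return args


cmd = dbimporter_command(
    "compounds.sdf", "myDataBaseName", "mysqluser", "localhost", "mysqlPassword",
    vendor={"name": "Asinex", "id_tag": "ID", "version": "2011-09",
            "url": "http://www.asinex.com/Download/Asinex_Gold_Collection1.zip"},
)
print(shlex.join(cmd))  # shell-quoted form, for inspection or logging
# To actually run the import: subprocess.run(cmd, check=True)
```

Passing the arguments as a list (rather than one string) avoids quoting problems with the URL and the password.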

Exporting molecules from MolDB

DBExporter can search a MolDB for molecules by a number of criteria:

  • minimal and maximal logP
  • minimal and maximal molecular weight
  • minimal and maximal conformation IDs
  • minimal and maximal similarity to compounds in a given query data set
  • unique chemical key (UCK)
  • SMARTS (i.e. substructure) search
  • data set description as used with DBImporter

At least one of these criteria has to be used. If more than one criterion is employed, the intersection, i.e. the compounds that pass all filter criteria, will be exported.
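
The intersection rule can be illustrated with a few lines of Python. This is a conceptual sketch only, not the way DBExporter is implemented: the compound records, their field names, and the use of strict bounds are assumptions made for the example.

```python
# Each record stands in for one compound; the field names are hypothetical.
compounds = [
    {"name": "A", "logP": 1.2, "MW": 320.0},
    {"name": "B", "logP": 3.5, "MW": 410.0},   # fails the logP range
    {"name": "C", "logP": -0.4, "MW": 620.0},  # fails the weight range
    {"name": "D", "logP": 0.1, "MW": 150.0},
]

# One predicate per active criterion.
criteria = [
    lambda c: -2 < c["logP"] < 2,
    lambda c: 100 < c["MW"] < 500,
]


def passes_all(compound, criteria):
    """True only if the compound satisfies every active criterion."""
    return all(check(compound) for check in criteria)


if not criteria:
    raise ValueError("at least one criterion is required")

exported = [c["name"] for c in compounds if passes_all(c, criteria)]
print(exported)  # ['A', 'D']
```

Adding a criterion can therefore only shrink the exported set, never enlarge it.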

The following example shows the use of DBExporter to filter for compounds with -2 < logP < 2, 100 g/mol < molecular weight < 500 g/mol, and a minimal and maximal similarity to the compounds in a given query data set of 0.75 and 0.95, respectively.

On the command line, this example could look like this:

BALL/build/bin/TOOLS/DBExporter -u mysqluser -d myDataBaseName -h localhost -p mysqlPassword -min_logP -2 -max_logP 2 -min_MW 100 \
   -max_MW 500 -min_sim 0.75 -max_sim 0.95 -q hivpr-bindingdb.sdf -o db-export.sdf
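
For repeated exports with varying thresholds, the same call can be generated from a dictionary of filters. The sketch below is one possible way to do this and uses only the flags shown in the example above; the function name `dbexporter_command` and the convention of passing flag names without the leading dash are assumptions of this sketch, and flags for the remaining criteria are not covered.

```python
import shlex


def dbexporter_command(database, user, host, password, output_file,
                       filters, query_file=None,
                       binary="BALL/build/bin/TOOLS/DBExporter"):
    """Return the DBExporter argument list for the given filters.

    `filters` maps a flag name without the leading dash (for example
    "min_logP") to its value.
    """
    if not filters:
        # Mirrors the rule that at least one criterion has to be used.
        raise ValueError("DBExporter needs at least one filter criterion")
    args = [binary, "-u", user, "-d", database, "-h", host, "-p", password]
    for flag, value in filters.items():
        args += ["-" + flag, str(value)]
    if query_file is not None:
        # Query data set for the similarity criteria.
        args += ["-q", query_file]
    args += ["-o", output_file]
    return args


cmd = dbexporter_command(
    "myDataBaseName", "mysqluser", "localhost", "mysqlPassword", "db-export.sdf",
    filters={"min_logP": -2, "max_logP": 2, "min_MW": 100, "max_MW": 500,
             "min_sim": 0.75, "max_sim": 0.95},
    query_file="hivpr-bindingdb.sdf",
)
print(shlex.join(cmd))  # shell-quoted form, for inspection or logging
# To actually run the export: subprocess.run(cmd, check=True)
```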

Searching for compound vendors

If a MolDB has been set up with libraries obtained from a number of compound vendors as described above, the tool VendorFinder can be used to search for compound vendors for a set of given molecules.

As input, VendorFinder only needs a molecule file and the usual MySQL data base information (data base name, user, host, password). As output, VendorFinder generates a list containing the vendor name, the vendor's ID and the molecular weight for each compound that was found in the data base.
