MoleculeNet¶
The DeepChem library is packaged alongside the MoleculeNet suite of datasets.
Finding a suitable dataset is one of the most important parts of any machine learning application.
The MoleculeNet suite curates a wide range of datasets and provides them as DeepChem
dc.data.Dataset objects for convenience.
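As a rough illustration of what this looks like in practice, the sketch below wraps one MoleculeNet loader, dc.molnet.load_delaney, in a small helper. The helper name load_delaney_example is hypothetical, and the featurizer and splitter values shown are only one possible choice. It assumes DeepChem is installed and that the dataset can be downloaded on first use.

```python
def load_delaney_example(featurizer="ECFP", splitter="scaffold"):
    """Load the Delaney solubility dataset through MoleculeNet.

    Hypothetical helper for illustration; the actual work is done by
    dc.molnet.load_delaney.
    """
    # Imported lazily so that defining this helper does not require DeepChem.
    import deepchem as dc

    # MoleculeNet loaders return a (tasks, datasets, transformers) triple.
    tasks, datasets, transformers = dc.molnet.load_delaney(
        featurizer=featurizer, splitter=splitter)

    # With a splitter set, datasets holds the train/valid/test
    # dc.data.Dataset objects.
    train, valid, test = datasets
    return tasks, train, valid, test, transformers
```

Calling load_delaney_example() should give back the task names, three dc.data.Dataset splits, and the transformers that were applied, so features and labels can then be read from attributes such as train.X and train.y.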
Contributing a new dataset to MoleculeNet¶
To propose a new dataset for inclusion in the MoleculeNet benchmarking suite, please follow the instructions below. Before contributing, please review the datasets already available in MolNet.
1. Read the Contribution guidelines.
2. Open an issue to discuss the dataset you want to add to MolNet.
3. Write a DatasetLoader class that inherits from deepchem.molnet.load_function.molnet_loader._MolnetLoader and implements a create_dataset method. See the _QM9Loader for a simple example.
4. Write a load_dataset function that documents the dataset, and add your load function to deepchem.molnet.__init__.py for easy importing.
5. Prepare your dataset as a .tar.gz or .zip file. Accepted filetypes include CSV, JSON, and SDF.
6. Ask a member of the technical steering committee to add your .tar.gz or .zip file to the DeepChem AWS bucket. Modify your load function to pull down the dataset from AWS.
7. Add documentation for your loader to the MoleculeNet docs.
8. Submit a [WIP] PR (work-in-progress pull request) following the PR template.
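The packaging step above (preparing the dataset as a .tar.gz file) can be sketched with the Python standard library alone. This is a minimal sketch, not a prescribed workflow: the function name package_csv, the default archive name my_dataset, and the flat archive layout are all assumptions made for illustration.

```python
import csv
import tarfile
from pathlib import Path


def package_csv(rows, header, out_dir, name="my_dataset"):
    """Write rows to <name>.csv and bundle that file into <name>.tar.gz.

    Hypothetical helper: the names and the archive layout are illustrative.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Write the raw table as CSV, one of the accepted filetypes.
    csv_path = out_dir / f"{name}.csv"
    with csv_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)

    # Compress the CSV into a gzip'd tarball. arcname keeps the archive
    # flat, so it unpacks to <name>.csv without any directory prefix.
    archive_path = out_dir / f"{name}.tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(csv_path, arcname=csv_path.name)

    return archive_path
```

The function returns the path of the resulting archive, which is the file a member of the technical steering committee would then upload. A .zip archive could be produced in a similar way with the standard zipfile module.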
BACE Dataset¶
BBBC Datasets¶
BBBP Datasets¶
BBBP stands for Blood-Brain-Barrier Penetration.
Cell Counting Datasets¶
Chembl Datasets¶
Chembl25 Datasets¶
Clearance Datasets¶
Clintox Datasets¶
Delaney Datasets¶
Factors Datasets¶
HIV Datasets¶
HOPV Datasets¶
HOPV stands for the Harvard Organic Photovoltaic Dataset.
HPPB Datasets¶
KAGGLE Datasets¶
Kinase Datasets¶
Lipo Datasets¶
Materials Datasets¶
Materials datasets include inorganic crystal structures, chemical compositions, and target properties such as formation energies and band gaps. Machine learning problems in materials science commonly involve predicting the value of a continuous (regression) or categorical (classification) property of a material from its chemical composition or crystal structure. “Inverse design”, in which ML methods generate crystal structures that have a desired property, is also of great interest. Other areas where ML is applicable in materials include: discovering new or modified phenomenological models that describe material behavior