rdtools.featurizer

This module contains functions for generating molecular fingerprints.

class rdtools.featurizer.AvalonGenerator(*args, **kwargs)

Bases: Protocol

A protocol for the Avalon fingerprint generator.

static GetCountFingerprintAsNumPy(mol: Mol) → ndarray[tuple[int, ...], dtype[int64]]

Get the count fingerprint as a numpy array.

static GetFingerprintAsNumPy(mol: Mol) → ndarray[tuple[int, ...], dtype[int64]]

Get the fingerprint as a numpy array.

rdtools.featurizer.GetAvalonGenerator(fpSize: int = 512, *args: Any, **kwargs: Any) → AvalonGenerator

Get the Avalon fingerprint generator.

Parameters:
  • fpSize (int, optional) – The length of the fingerprint. Defaults to 512.

  • *args (Any) – Additional arguments for the generator.

  • **kwargs (Any) – Additional keyword arguments for the generator.

Returns:

AvalonGenerator – The Avalon fingerprint generator.

rdtools.featurizer.GetMACCSGenerator(*args: Any, **kwargs: Any) → MACCSGenerator

Get the MACCS fingerprint generator.

Parameters:
  • *args (Any) – Additional arguments for the generator.

  • **kwargs (Any) – Additional keyword arguments for the generator.

Returns:

MACCSGenerator – The MACCS fingerprint generator.

class rdtools.featurizer.MACCSGenerator(*args, **kwargs)

Bases: Protocol

A protocol for the MACCS fingerprint generator.

static GetFingerprintAsNumPy(mol: Mol) → ndarray[tuple[int, ...], dtype[int64]]

Get the fingerprint as a numpy array.
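Both AvalonGenerator and MACCSGenerator are declared as Protocol classes, so they describe the methods a generator object must provide rather than a concrete implementation. The sketch below illustrates that idea with Python's typing.Protocol. FingerprintLike, ToyGenerator, and featurize are hypothetical names used only for illustration and are not part of rdtools.

```python
from typing import Protocol, runtime_checkable

import numpy as np


@runtime_checkable
class FingerprintLike(Protocol):
    """Hypothetical protocol that mirrors the documented method name."""

    def GetFingerprintAsNumPy(self, mol: object) -> np.ndarray: ...


class ToyGenerator:
    """Toy stand-in (not an RDKit generator): returns an all-zero vector."""

    def __init__(self, size: int) -> None:
        self.size = size

    def GetFingerprintAsNumPy(self, mol: object) -> np.ndarray:
        return np.zeros(self.size, dtype=np.int64)


def featurize(generator: FingerprintLike, mol: object) -> np.ndarray:
    # Any object exposing GetFingerprintAsNumPy satisfies the protocol;
    # no inheritance from FingerprintLike is required.
    return generator.GetFingerprintAsNumPy(mol)


toy = ToyGenerator(size=8)
protocol_ok = isinstance(toy, FingerprintLike)  # structural check on method presence
toy_fp = featurize(toy, mol=None)
```

Because the check is structural, a type checker accepts any generator that exposes the listed methods, which is presumably why the module types its Avalon and MACCS generators this way.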

rdtools.featurizer.get_fingerprint(mol: Mol, count: bool = False, fp_type: str = 'morgan', num_bits: int = 2048, dtype: str = 'int32', **kwargs: Any) → ndarray[tuple[int, ...], dtype[int64]]

A helper function for generating molecular fingerprints.

See the RDKit documentation for more information. This function also accepts fingerprint-specific arguments; look up the corresponding GetXXXGenerator function in the RDKit documentation for the argument names and allowed value types.

Parameters:
  • mol (Mol) – The molecule to generate a fingerprint for.

  • count (bool, optional) – Whether to generate a count fingerprint. Default is False.

  • fp_type (str, optional) – The type of fingerprint to generate. Options are: 'atom_pair', 'morgan' (default), 'rdkit', 'topological_torsion', 'avalon', and 'maccs'.

  • num_bits (int, optional) – The length of the fingerprint. Default is 2048. It has no effect on the 'maccs' generator.

  • dtype (str, optional) – The data type of the output numpy array. Defaults to 'int32'.

  • **kwargs (Any) – Additional arguments for the generator.

Returns:

npt.NDArray[np.int_] – A numpy array of the molecular fingerprint.
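The count, num_bits, and dtype parameters are easier to picture with a toy example. The sketch below folds a list of integer feature identifiers into a fixed-length vector, either as counts or as bits. It is an illustration only: fold_features is a hypothetical helper, and the modulo folding is a simplifying assumption, not the hashing RDKit actually uses.

```python
import numpy as np


def fold_features(feature_ids, num_bits=8, count=False, dtype="int32"):
    """Toy illustration: fold feature ids into a vector of length num_bits."""
    fp = np.zeros(num_bits, dtype=dtype)
    for fid in feature_ids:
        # Assumption for illustration: fold by modulo, not RDKit's hashing.
        fp[fid % num_bits] += 1
    if not count:
        # A bit fingerprint only records presence or absence.
        fp = (fp > 0).astype(dtype)
    return fp


feature_ids = [3, 11, 5, 3]  # made-up feature identifiers
bit_fp = fold_features(feature_ids, count=False)
count_fp = fold_features(feature_ids, count=True)
```

Here identifiers 3 and 11 collide at position 3, so count_fp holds 3 at that position while bit_fp holds 1. Both arrays have length num_bits and the requested dtype.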

rdtools.featurizer.get_fingerprint_generator(fp_type: str = 'morgan', num_bits: int = 1024, count: bool = True, **kwargs: Any) → Any

Get the fingerprint generator for the specified type.

Parameters:
  • fp_type (str, optional) – The type of fingerprint to generate. Options are: 'atom_pair', 'morgan' (default), 'rdkit', 'topological_torsion', 'avalon', and 'maccs'.

  • num_bits (int, optional) – The length of the fingerprint. Default is 1024.

  • count (bool, optional) – Whether to generate a count fingerprint. Default is True.

  • **kwargs (Any) – Additional arguments for the generator.

Returns:

Any – The fingerprint generator.

rdtools.featurizer.get_rxn_fingerprint(rmol: Mol, pmol: Mol, mode: str = 'REAC_DIFF', fp_type: str = 'morgan', count: bool = False, num_bits: int = 2048, **kwargs: Any) → ndarray[tuple[int, ...], dtype[int64]]

Generate reaction fingerprints based on the reactant molecule complex and the product molecule complex.

Parameters:
  • rmol (Mol) – The reactant complex molecule object.

  • pmol (Mol) – The product complex molecule object.

  • mode (str, optional) – The fingerprint blocks to combine, chosen from 'REAC' (reactant), 'PROD' (product), 'DIFF' (reactant - product), 'REVD' (product - reactant), and 'SUM' (reactant + product), joined by '_'. Defaults to 'REAC_DIFF', which concatenates the reactant fingerprint with the difference between the reactant and product fingerprints.

  • fp_type (str, optional) – The type of fingerprint to generate. Options are: 'atom_pair', 'morgan' (default), 'rdkit', 'topological_torsion', 'avalon', and 'maccs'.

  • count (bool, optional) – Whether to generate a count fingerprint. Default is False.

  • num_bits (int, optional) – The length of each molecular fingerprint. For a mode with N blocks, the final length is num_bits * N. Default is 2048. It has no effect on the 'maccs' generator.

  • **kwargs (Any) – Additional arguments for the generator.

Returns:

npt.NDArray[np.int_] – A numpy array of the reaction fingerprint.
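The mode string can be read as a recipe for concatenating blocks. The sketch below applies the block definitions listed for mode to two small stand-in arrays. combine_rxn_blocks is a hypothetical helper written for illustration; it assumes the reactant and product fingerprints are already available as signed integer arrays and is not the library's implementation.

```python
import numpy as np


def combine_rxn_blocks(r_fp, p_fp, mode="REAC_DIFF"):
    """Toy illustration of combining blocks according to a mode string."""
    blocks = {
        "REAC": r_fp,         # reactant
        "PROD": p_fp,         # product
        "DIFF": r_fp - p_fp,  # reactant - product
        "REVD": p_fp - r_fp,  # product - reactant
        "SUM": r_fp + p_fp,   # reactant + product
    }
    # Each '_'-separated name contributes one block of length num_bits.
    return np.concatenate([blocks[name] for name in mode.split("_")])


# Made-up fingerprints with num_bits = 4 (signed, so differences may be negative).
r_fp = np.array([1, 0, 2, 0])
p_fp = np.array([0, 0, 1, 1])

rxn_fp = combine_rxn_blocks(r_fp, p_fp, mode="REAC_DIFF")
three_block_fp = combine_rxn_blocks(r_fp, p_fp, mode="REAC_PROD_SUM")
```

With two blocks the result has length 4 * 2 = 8, and with three blocks 4 * 3 = 12, matching the num_bits * N rule. Note that 'DIFF' and 'REVD' blocks can contain negative entries.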

rdtools.featurizer.rdkit_vector_to_array(vector: ExplicitBitVect | UIntSparseIntVect, num_bits: int | None = None, dtype: str = 'int32') → ndarray[tuple[int, ...], dtype[int64]]

Convert an RDKit vector to a numpy array.

This function converts an RDKit rdkit.DataStructs.cDataStructs.ExplicitBitVect or rdkit.DataStructs.cDataStructs.UIntSparseIntVect vector to a numpy array.

Parameters:
  • vector (Union[DataStructs.ExplicitBitVect, DataStructs.UIntSparseIntVect]) – The RDKit vector generated by a fingerprint algorithm.

  • num_bits (Optional[int], optional) – The length of the vector. Defaults to None.

  • dtype (str, optional) – The data type of the output numpy array. Defaults to 'int32'.

Returns:

npt.NDArray[np.int_] – A numpy array of the vector.
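To show what such a conversion produces, the sketch below builds dense arrays from plain-Python stand-ins: a list of on-bit indices (comparable to what ExplicitBitVect.GetOnBits() yields in RDKit) and an index-to-count dictionary (comparable to GetNonzeroElements() on a sparse integer vector). Both helper functions are hypothetical and do not reproduce the library's code.

```python
import numpy as np


def on_bits_to_array(on_bits, num_bits, dtype="int32"):
    """Toy stand-in: dense 0/1 array from a collection of on-bit indices."""
    arr = np.zeros(num_bits, dtype=dtype)
    arr[np.asarray(list(on_bits), dtype=np.intp)] = 1
    return arr


def sparse_counts_to_array(counts, num_bits, dtype="int32"):
    """Toy stand-in: dense count array from an {index: count} mapping."""
    arr = np.zeros(num_bits, dtype=dtype)
    for index, value in counts.items():
        arr[index] = value
    return arr


bit_array = on_bits_to_array([1, 4], num_bits=6)
count_array = sparse_counts_to_array({0: 2, 5: 3}, num_bits=6)
```

Both results have length num_bits and the requested dtype, and positions that are absent from the input stay at zero.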