rdtools.featurizer
This module contains functions for generating molecular fingerprints.
- class rdtools.featurizer.AvalonGenerator(*args, **kwargs)
Bases: Protocol
A protocol for the Avalon fingerprint generator.
- static GetCountFingerprintAsNumPy(mol: Mol) → ndarray[tuple[int, ...], dtype[int64]]
Get the count fingerprint as a numpy array.
- static GetFingerprintAsNumPy(mol: Mol) → ndarray[tuple[int, ...], dtype[int64]]
Get the fingerprint as a numpy array.
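Because AvalonGenerator is declared as a Protocol, an object only needs to expose the documented static methods to conform; it does not have to inherit from the class. The following is a minimal sketch of that idea using only the standard library. The names FingerprintGeneratorLike and ToyGenerator are hypothetical, plain Python lists stand in for NumPy arrays, and a string stands in for an RDKit Mol, so this is not how rdtools itself is implemented.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class FingerprintGeneratorLike(Protocol):
    """Hypothetical protocol mirroring the two documented static methods."""

    @staticmethod
    def GetFingerprintAsNumPy(mol: object) -> list[int]: ...

    @staticmethod
    def GetCountFingerprintAsNumPy(mol: object) -> list[int]: ...


class ToyGenerator:
    """Stand-in generator; note that it does not inherit from the protocol."""

    @staticmethod
    def GetCountFingerprintAsNumPy(mol: object) -> list[int]:
        # Toy "count fingerprint": tally the characters of str(mol) in 8 bins.
        counts = [0] * 8
        for ch in str(mol):
            counts[ord(ch) % 8] += 1
        return counts

    @staticmethod
    def GetFingerprintAsNumPy(mol: object) -> list[int]:
        # Toy "bit fingerprint": 1 wherever the count is non-zero.
        return [1 if c else 0 for c in ToyGenerator.GetCountFingerprintAsNumPy(mol)]


# Structural check: True because both method names are present.
conforms = isinstance(ToyGenerator(), FingerprintGeneratorLike)
```

The same structural check is what lets a function annotated with the protocol accept any generator object that offers these methods.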
- rdtools.featurizer.GetAvalonGenerator(fpSize: int = 512, *args: Any, **kwargs: Any) → AvalonGenerator
Get the Avalon fingerprint generator.
- Parameters:
fpSize (int, optional) – The length of the fingerprint. Defaults to 512.
*args (Any) – Additional arguments for the generator.
**kwargs (Any) – Additional keyword arguments for the generator.
- Returns:
AvalonGenerator – The Avalon fingerprint generator.
- rdtools.featurizer.GetMACCSGenerator(*args: Any, **kwargs: Any) → MACCSGenerator
Get the MACCS fingerprint generator.
- Parameters:
*args (Any) – Additional arguments for the generator.
**kwargs (Any) – Additional keyword arguments for the generator.
- Returns:
MACCSGenerator – The MACCS fingerprint generator.
- class rdtools.featurizer.MACCSGenerator(*args, **kwargs)
Bases: Protocol
A protocol for the MACCS fingerprint generator.
- static GetFingerprintAsNumPy(mol: Mol) → ndarray[tuple[int, ...], dtype[int64]]
Get the fingerprint as a numpy array.
- rdtools.featurizer.get_fingerprint(mol: Mol, count: bool = False, fp_type: str = 'morgan', num_bits: int = 2048, dtype: str = 'int32', **kwargs: Any) → ndarray[tuple[int, ...], dtype[int64]]
A helper function for generating molecular fingerprints.
See the RDKit documentation for more information. This function also accepts fingerprint-specific arguments; look up the corresponding GetXXXGenerator function in the RDKit documentation for the argument names and allowed value types.
- Parameters:
mol (Mol) – The molecule to generate a fingerprint for.
count (bool, optional) – Whether to generate a count fingerprint. Default is False.
fp_type (str, optional) – The type of fingerprint to generate. Options are: 'atom_pair', 'morgan' (default), 'rdkit', 'topological_torsion', 'avalon', and 'maccs'.
num_bits (int, optional) – The length of the fingerprint. Default is 2048. It has no effect on the 'maccs' generator.
dtype (str, optional) – The data type of the output numpy array. Defaults to 'int32'.
**kwargs (Any) – Additional arguments for the generator.
- Returns:
npt.NDArray[np.int_] – A numpy array of the molecular fingerprint.
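As a usage illustration, the sketch below wraps get_fingerprint in a small helper that starts from a SMILES string. The helper name fingerprint_from_smiles and the DEFAULT_OPTIONS dictionary are hypothetical; the defaults are copied from the signature documented above. It assumes that the Mol argument can be an RDKit molecule created with Chem.MolFromSmiles, which this page does not state explicitly. The third-party imports are deferred until the helper is called.

```python
# Defaults copied from the documented get_fingerprint signature.
DEFAULT_OPTIONS = {
    "count": False,
    "fp_type": "morgan",
    "num_bits": 2048,
    "dtype": "int32",
}


def fingerprint_from_smiles(smiles, **overrides):
    """Hypothetical helper: parse a SMILES string and fingerprint it."""
    # Imported lazily so this module loads even where RDKit or rdtools
    # is not installed; both are required once the function is called.
    from rdkit import Chem
    from rdtools.featurizer import get_fingerprint

    # Assumption: get_fingerprint accepts an RDKit Mol built this way.
    mol = Chem.MolFromSmiles(smiles)  # returns None when parsing fails
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    options = {**DEFAULT_OPTIONS, **overrides}
    return get_fingerprint(mol, **options)
```

A call such as fingerprint_from_smiles("CCO", fp_type="rdkit", count=True) would then override only the named options and keep the remaining defaults.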
- rdtools.featurizer.get_fingerprint_generator(fp_type: str = 'morgan', num_bits: int = 1024, count: bool = True, **kwargs: Any) → Any
Get the fingerprint generator for the specified type.
- Parameters:
fp_type (str, optional) – The type of fingerprint to generate. Options are: 'atom_pair', 'morgan' (default), 'rdkit', 'topological_torsion', 'avalon', and 'maccs'.
num_bits (int, optional) – The length of the fingerprint. Default is 1024.
count (bool, optional) – Whether to generate a count fingerprint. Default is True.
**kwargs (Any) – Additional arguments for the generator.
- Returns:
Any – The fingerprint generator.
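To find out which keyword arguments a given fp_type accepts, it helps to know which GetXXXGenerator factory to look up. The table below is a sketch of one plausible correspondence, not a statement of how rdtools dispatches internally: the first four names are factory functions in RDKit's rdFingerprintGenerator module, the last two are the functions documented on this page, and the pairing with each fp_type string is an assumption based on the names. The helper factory_name is hypothetical.

```python
# Assumed correspondence between fp_type strings and factory names.
FACTORY_NAMES = {
    "atom_pair": "GetAtomPairGenerator",
    "morgan": "GetMorganGenerator",
    "rdkit": "GetRDKitFPGenerator",
    "topological_torsion": "GetTopologicalTorsionGenerator",
    "avalon": "GetAvalonGenerator",  # documented in rdtools.featurizer
    "maccs": "GetMACCSGenerator",    # documented in rdtools.featurizer
}


def factory_name(fp_type="morgan"):
    """Return the factory name to look up for a given fp_type."""
    try:
        return FACTORY_NAMES[fp_type]
    except KeyError:
        allowed = ", ".join(sorted(FACTORY_NAMES))
        raise ValueError(
            f"unknown fp_type {fp_type!r}; expected one of: {allowed}"
        ) from None
```

Rejecting unknown strings early, as the helper does, gives a clearer error than passing an unsupported fp_type further down the call chain.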
- rdtools.featurizer.get_rxn_fingerprint(rmol: Mol, pmol: Mol, mode: str = 'REAC_DIFF', fp_type: str = 'morgan', count: bool = False, num_bits: int = 2048, **kwargs: Any) → ndarray[tuple[int, ...], dtype[int64]]
Generate a reaction fingerprint based on the reactant molecule complex and the product molecule complex.
- Parameters:
rmol (Mol) – The reactant complex molecule object.
pmol (Mol) – The product complex molecule object.
mode (str, optional) – The fingerprint combination, built from the blocks 'REAC' (reactant), 'PROD' (product), 'DIFF' (reactant - product), 'REVD' (product - reactant), and 'SUM' (reactant + product), separated by '_'. Defaults to 'REAC_DIFF', in which the fingerprint is the concatenation of the reactant fingerprint and the difference between the reactant complex and product complex fingerprints.
fp_type (str, optional) – The type of fingerprint to generate. Options are: 'atom_pair', 'morgan' (default), 'rdkit', 'topological_torsion', 'avalon', and 'maccs'.
count (bool, optional) – Whether to generate a count fingerprint. Default is False.
num_bits (int, optional) – The length of the molecular fingerprint. For a mode with N blocks, the eventual length is num_bits * N. Default is 2048. It has no effect on the 'maccs' generator.
**kwargs (Any) – Additional arguments for the generator.
- Returns:
npt.NDArray[np.int_] – A numpy array of the reaction fingerprint.
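The mode string can be read as a recipe: split it on '_', build one block per name, and concatenate the blocks in order. The sketch below follows the block definitions given above, using plain Python lists in place of NumPy arrays so that it runs without dependencies. The function name combine_blocks is hypothetical and this is not the rdtools implementation; it takes two already-computed fingerprints of equal length rather than molecules.

```python
def combine_blocks(reactant, product, mode="REAC_DIFF"):
    """Concatenate the blocks named in `mode`, separated by '_'."""
    if len(reactant) != len(product):
        raise ValueError("fingerprints must have the same length")

    # One builder per documented block name.
    builders = {
        "REAC": lambda: list(reactant),
        "PROD": lambda: list(product),
        "DIFF": lambda: [r - p for r, p in zip(reactant, product)],
        "REVD": lambda: [p - r for r, p in zip(reactant, product)],
        "SUM": lambda: [r + p for r, p in zip(reactant, product)],
    }

    combined = []
    for name in mode.split("_"):
        if name not in builders:
            raise ValueError(f"unknown block {name!r} in mode {mode!r}")
        combined.extend(builders[name]())
    return combined


reactant_fp = [1, 0, 2, 0]
product_fp = [0, 1, 1, 0]
default_fp = combine_blocks(reactant_fp, product_fp)
# default_fp == [1, 0, 2, 0, 1, -1, 1, 0]: 4 bits * 2 blocks = 8 entries
```

Note that the 'DIFF' and 'REVD' blocks can contain negative entries, which is worth keeping in mind when choosing an output data type.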
- rdtools.featurizer.rdkit_vector_to_array(vector: ExplicitBitVect | UIntSparseIntVect, num_bits: int | None = None, dtype: str = 'int32') → ndarray[tuple[int, ...], dtype[int64]]
Convert an RDKit vector to a numpy array.
This function converts an RDKit rdkit.DataStructs.cDataStructs.ExplicitBitVect or rdkit.DataStructs.cDataStructs.UIntSparseIntVect vector to a numpy array.
- Parameters:
vector (Union[DataStructs.ExplicitBitVect, DataStructs.UIntSparseIntVect]) – The RDKit vector generated by a fingerprint algorithm.
num_bits (Optional[int], optional) – The length of the vector. Defaults to None.
dtype (str, optional) – The data type of the output numpy array. Defaults to 'int32'.
- Returns:
npt.NDArray[np.int_] – A numpy array of the vector.
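Conceptually, converting a sparse count vector means scattering its non-zero entries into a dense array of the requested length. The sketch below illustrates this with a plain dictionary of {index: count} pairs, which is the shape RDKit's sparse integer vectors return from GetNonzeroElements(). The function name sparse_counts_to_dense is hypothetical, a Python list stands in for the NumPy array, and the out-of-range check is an assumption of this sketch rather than documented rdtools behaviour.

```python
def sparse_counts_to_dense(nonzero, num_bits):
    """Scatter {index: count} pairs into a dense list of length num_bits."""
    dense = [0] * num_bits
    for index, count in nonzero.items():
        # Assumption of this sketch: indices must already fit in num_bits.
        if not 0 <= index < num_bits:
            raise IndexError(f"index {index} is outside 0..{num_bits - 1}")
        dense[index] = count
    return dense


dense_counts = sparse_counts_to_dense({1: 2, 5: 1}, num_bits=8)
# dense_counts == [0, 2, 0, 0, 0, 1, 0, 0]
```

A bit vector can be handled the same way by treating every set bit as a count of 1.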