rdtools.featurizer
This module contains functions for generating molecular fingerprints.
- class rdtools.featurizer.AvalonGenerator(*args, **kwargs)
Bases: Protocol
A protocol for the Avalon fingerprint generator.
- static GetCountFingerprintAsNumPy(mol: Mol) → ndarray[tuple[int, ...], dtype[int64]]
Get the count fingerprint as a numpy array.
- static GetFingerprintAsNumPy(mol: Mol) → ndarray[tuple[int, ...], dtype[int64]]
Get the fingerprint as a numpy array.
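Because `AvalonGenerator` is declared as a `Protocol`, an object can satisfy it structurally, by exposing matching methods, without inheriting from it. The sketch below illustrates the idea only: `FingerprintLike` and `ConstantGenerator` are hypothetical names that are not part of `rdtools`, and plain lists stand in for NumPy arrays so the snippet has no dependencies.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class FingerprintLike(Protocol):
    """Hypothetical stand-in for a generator protocol (not an rdtools class)."""

    @staticmethod
    def GetFingerprintAsNumPy(mol: object) -> list[int]:
        ...


class ConstantGenerator:
    """Toy generator: no inheritance, only a method with the matching name."""

    @staticmethod
    def GetFingerprintAsNumPy(mol: object) -> list[int]:
        # Always returns the same 4-element fingerprint, whatever `mol` is.
        return [0, 1, 0, 1]


def featurize(generator: FingerprintLike, mol: object) -> list[int]:
    # Static type checkers accept any object that structurally matches.
    return generator.GetFingerprintAsNumPy(mol)


fp = featurize(ConstantGenerator(), mol=None)
is_match = isinstance(ConstantGenerator(), FingerprintLike)
```

Here `is_match` is true even though `ConstantGenerator` never subclasses the protocol, which is the property that lets generators from different backends share one interface.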
- rdtools.featurizer.GetAvalonGenerator(fpSize: int = 512, *args: Any, **kwargs: Any) → AvalonGenerator
Get the Avalon fingerprint generator.
- Parameters:
fpSize (int, optional) – The length of the fingerprint. Defaults to 512.
*args (Any) – Additional arguments for the generator.
**kwargs (Any) – Additional keyword arguments for the generator.
- Returns:
AvalonGenerator – The Avalon fingerprint generator.
- rdtools.featurizer.GetMACCSGenerator(*args: Any, **kwargs: Any) → MACCSGenerator
Get the MACCS fingerprint generator.
- Parameters:
*args (Any) – Additional arguments for the generator.
**kwargs (Any) – Additional keyword arguments for the generator.
- Returns:
MACCSGenerator – The MACCS fingerprint generator.
- class rdtools.featurizer.MACCSGenerator(*args, **kwargs)
Bases: Protocol
A protocol for the MACCS fingerprint generator.
- static GetFingerprintAsNumPy(mol: Mol) → ndarray[tuple[int, ...], dtype[int64]]
Get the fingerprint as a numpy array.
- rdtools.featurizer.get_fingerprint(mol: Mol, count: bool = False, fp_type: str = 'morgan', num_bits: int = 2048, dtype: str = 'int32', **kwargs: Any) → ndarray[tuple[int, ...], dtype[int64]]
A helper function for generating molecular fingerprints.
See the RDKit documentation for more information. This function also accepts fingerprint-specific arguments; look up the matching GetXXXGenerator function in the RDKit documentation for the argument names and allowed value types.
- Parameters:
mol (Mol) – The molecule to generate a fingerprint for.
count (bool, optional) – Whether to generate a count fingerprint. Default is False.
fp_type (str, optional) – The type of fingerprint to generate. Options are: 'atom_pair', 'morgan' (default), 'rdkit', 'topological_torsion', 'avalon', and 'maccs'.
num_bits (int, optional) – The length of the fingerprint. Default is 2048. It has no effect on the 'maccs' generator.
dtype (str, optional) – The data type of the output numpy array. Defaults to 'int32'.
**kwargs (Any) – Additional arguments for the generator.
- Returns:
npt.NDArray[np.int_] – A numpy array of the molecular fingerprint.
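A minimal usage sketch, based only on the signature documented above. The wrapper name `fingerprint_from_smiles` is hypothetical, and it assumes that a plain RDKit `Mol` built with `Chem.MolFromSmiles` is accepted for the `mol` argument. The imports sit inside the function so that the module still loads where RDKit or rdtools is not installed.

```python
def fingerprint_from_smiles(smiles: str, num_bits: int = 2048):
    """Featurize a SMILES string with a binary Morgan fingerprint."""
    # Imported lazily so this module loads even without RDKit/rdtools.
    from rdkit import Chem
    from rdtools.featurizer import get_fingerprint

    # Assumption: a plain RDKit Mol is a valid `mol` argument.
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")

    # All keyword names below come from the documented signature.
    return get_fingerprint(
        mol,
        count=False,
        fp_type="morgan",
        num_bits=num_bits,
        dtype="int32",
    )
```

Going by the parameter descriptions, calling `fingerprint_from_smiles("CCO")` should return a one-dimensional array with `num_bits` entries; switching `count` to `True` would request a count fingerprint instead.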
- rdtools.featurizer.get_fingerprint_generator(fp_type: str = 'morgan', num_bits: int = 1024, count: bool = True, **kwargs: Any) → Any
Get the fingerprint generator for the specified type.
- Parameters:
fp_type (str, optional) – The type of fingerprint to generate. Options are: 'atom_pair', 'morgan' (default), 'rdkit', 'topological_torsion', 'avalon', and 'maccs'.
num_bits (int, optional) – The length of the fingerprint. Default is 1024.
count (bool, optional) – Whether to generate a count fingerprint. Default is True.
**kwargs (Any) – Additional arguments for the generator.
- Returns:
Any – The fingerprint generator.
- rdtools.featurizer.get_rxn_fingerprint(rmol: Mol, pmol: Mol, mode: str = 'REAC_DIFF', fp_type: str = 'morgan', count: bool = False, num_bits: int = 2048, **kwargs: Any) → ndarray[tuple[int, ...], dtype[int64]]
Generate reaction fingerprints based on the reactant molecule complex and the product molecule complex.
- Parameters:
rmol (Mol) – The reactant complex molecule object.
pmol (Mol) – The product complex molecule object.
mode (str, optional) – The blocks that make up the fingerprint, chosen from 'REAC' (reactant), 'PROD' (product), 'DIFF' (reactant - product), 'REVD' (product - reactant), and 'SUM' (reactant + product), joined by '_'. Defaults to 'REAC_DIFF', which concatenates the reactant fingerprint with the difference between the reactant and product fingerprints.
fp_type (str, optional) – The type of fingerprint to generate. Options are: 'atom_pair', 'morgan' (default), 'rdkit', 'topological_torsion', 'avalon', and 'maccs'.
count (bool, optional) – Whether to generate a count fingerprint. Default is False.
num_bits (int, optional) – The length of each molecular fingerprint. For a mode with N blocks, the final length is num_bits * N. Default is 2048. It has no effect on the 'maccs' generator.
**kwargs (Any) – Additional arguments for the generator.
- Returns:
npt.NDArray[np.int_] – A numpy array of the molecular fingerprint.
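The `mode` string can be read as a recipe for concatenating blocks. The sketch below mirrors the block definitions from the parameter description; it is not the library's implementation. `combine_rxn_blocks` is a hypothetical helper, and plain lists stand in for NumPy arrays so the snippet runs without dependencies.

```python
def combine_rxn_blocks(
    reac: list[int], prod: list[int], mode: str = "REAC_DIFF"
) -> list[int]:
    """Concatenate the blocks named in `mode` (hypothetical helper)."""
    if len(reac) != len(prod):
        raise ValueError("reactant and product fingerprints must have equal length")

    # Block definitions follow the documented meaning of each keyword.
    blocks = {
        "REAC": lambda: list(reac),
        "PROD": lambda: list(prod),
        "DIFF": lambda: [r - p for r, p in zip(reac, prod)],
        "REVD": lambda: [p - r for r, p in zip(reac, prod)],
        "SUM": lambda: [r + p for r, p in zip(reac, prod)],
    }

    combined: list[int] = []
    for name in mode.split("_"):
        if name not in blocks:
            raise ValueError(f"unknown block {name!r} in mode {mode!r}")
        combined.extend(blocks[name]())
    return combined


reac_fp = [1, 0, 2, 0]
prod_fp = [0, 0, 1, 1]
rxn_fp = combine_rxn_blocks(reac_fp, prod_fp, mode="REAC_DIFF")
# rxn_fp == [1, 0, 2, 0, 1, 0, 1, -1]
```

With two blocks and four entries per molecule, the result has 4 * 2 = 8 entries, matching the `num_bits * N` rule stated for `num_bits`.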
- rdtools.featurizer.rdkit_vector_to_array(vector: ExplicitBitVect | UIntSparseIntVect, num_bits: int | None = None, dtype: str = 'int32') → ndarray[tuple[int, ...], dtype[int64]]
Convert an RDKit vector to a numpy array.
This function converts an RDKit rdkit.DataStructs.cDataStructs.ExplicitBitVect or rdkit.DataStructs.cDataStructs.UIntSparseIntVect vector to a numpy array.
- Parameters:
vector (Union[DataStructs.ExplicitBitVect, DataStructs.UIntSparseIntVect]) – RDKit vector generated by fingerprint algorithms.
num_bits (Optional[int], optional) – The length of the vector. Defaults to None.
dtype (str, optional) – The data type of the output numpy array. Defaults to 'int32'.
- Returns:
npt.NDArray[np.int_] – A numpy array of the vector.
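Conceptually, the conversion expands a sparse description of the fingerprint into a dense array of length `num_bits`. The sketch below shows that idea under stated assumptions and is not the library's implementation: `to_dense` is a hypothetical helper, its input imitates the index-to-count mapping returned by RDKit's `UIntSparseIntVect.GetNonzeroElements()`, a bit vector is represented by its on-bit indices (as from `ExplicitBitVect.GetOnBits()`), and a plain list stands in for the NumPy array. How `rdtools` handles `num_bits=None` or out-of-range indices is not covered here.

```python
def to_dense(nonzero: dict[int, int], num_bits: int) -> list[int]:
    """Expand an index -> value mapping into a dense list (hypothetical helper)."""
    dense = [0] * num_bits
    for index, value in nonzero.items():
        # Assumption: indices outside the vector are treated as an error.
        if not 0 <= index < num_bits:
            raise IndexError(f"index {index} is outside a vector of length {num_bits}")
        dense[index] = value
    return dense


# Bit vector: every on-bit contributes a 1.
on_bits = [1, 4]
bit_array = to_dense({i: 1 for i in on_bits}, num_bits=8)

# Count vector: every nonzero element keeps its count.
count_array = to_dense({0: 3, 5: 1}, num_bits=8)
```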