Skip to content

generation

3D structure generation from molecular representations.

This module provides tools for generating 3D molecular structures from representations without explicit coordinates (e.g., SMILES, molecular graphs).

RDKitGeneration dataclass

Bases: StructureGeneration, RDKitMaker[InputType, OutputType]


              flowchart TD
              jfchemistry.generation.RDKitGeneration[RDKitGeneration]
              jfchemistry.generation.base.StructureGeneration[StructureGeneration]
              jfchemistry.core.makers.rdkit_maker.RDKitMaker[RDKitMaker]
              jfchemistry.core.makers.jfchem_maker.JFChemMaker[JFChemMaker]
              jfchemistry.core.makers.core_maker.CoreMaker[CoreMaker]

                              jfchemistry.generation.base.StructureGeneration --> jfchemistry.generation.RDKitGeneration
                
                jfchemistry.core.makers.rdkit_maker.RDKitMaker --> jfchemistry.generation.RDKitGeneration
                                jfchemistry.core.makers.jfchem_maker.JFChemMaker --> jfchemistry.core.makers.rdkit_maker.RDKitMaker
                                jfchemistry.core.makers.core_maker.CoreMaker --> jfchemistry.core.makers.jfchem_maker.JFChemMaker
                




              click jfchemistry.generation.RDKitGeneration href "" "jfchemistry.generation.RDKitGeneration"
              click jfchemistry.generation.base.StructureGeneration href "" "jfchemistry.generation.base.StructureGeneration"
              click jfchemistry.core.makers.rdkit_maker.RDKitMaker href "" "jfchemistry.core.makers.rdkit_maker.RDKitMaker"
              click jfchemistry.core.makers.jfchem_maker.JFChemMaker href "" "jfchemistry.core.makers.jfchem_maker.JFChemMaker"
              click jfchemistry.core.makers.core_maker.CoreMaker href "" "jfchemistry.core.makers.core_maker.CoreMaker"
            

Generate 3D structures using RDKit distance geometry methods.

This class uses RDKit's distance geometry embedding algorithms (ETKDG family) to generate 3D conformers from molecular graphs. The methods use distance bounds derived from experimental torsion angle preferences and small ring conformations to produce chemically reasonable 3D structures.

The implementation supports all RDKit embedding parameters, allowing fine control over the generation process including optimization settings, pruning criteria, and random seed control.

Units

Pass a float in the listed unit or a pint Quantity (e.g. jfchemistry.ureg or jfchemistry.Q_):

  • prune_rms_thresh: [Å]
ATTRIBUTE DESCRIPTION
name

Name of the job (default: "rdKit Generation").

TYPE: str

method

Distance geometry method to use: - "ETKDGv3": ETKDG version 3 (default, recommended) - "ETKDGv2": ETKDG version 2 - "ETKDG": Original ETKDG method - "ETDG": Experimental torsion distance geometry - "KDG": Basic distance geometry - "srETKDGv3": Small ring ETKDG version 3

TYPE: Literal['ETDG', 'ETKDG', 'ETKDGv2', 'ETKDGv3', 'KDG', 'srETKDGv3']

basin_thresh

Energy threshold for basin hopping (default: None). Accepts float.

TYPE: float | Quantity | None

bounds_mat_force_scaling

Scaling factor for distance bounds (default: None).

TYPE: float | None

box_size_mult

Box size multiplier for embedding (default: None).

TYPE: float | None

clear_confs

Clear existing conformers before embedding (default: None).

TYPE: bool | None

embed_fragments_separately

Embed molecular fragments separately (default: None).

TYPE: bool | None

enable_sequential_random_seeds

Use sequential random seeds (default: None).

TYPE: bool | None

enforce_chirality

Enforce stereochemistry from molecular graph (default: None).

TYPE: bool | None

force_trans_amides

Force amide bonds to trans configuration (default: None).

TYPE: bool | None

ignore_smoothing_failures

Continue if smoothing fails (default: None).

TYPE: bool | None

max_iterations

Maximum optimization iterations per conformer (default: None).

TYPE: Annotated[int, positive] | None

num_threads

Number of parallel threads for embedding (default: 1).

TYPE: Annotated[int, positive]

num_zero_fail

Number of failures before reporting (default: None).

TYPE: Annotated[int, positive] | None

only_heavy_atoms_for_rms

Use only heavy atoms for RMSD pruning (default: None).

TYPE: bool | None

optimizer_force_tol

Force tolerance for optimization (default: None).

TYPE: float | Quantity | None

prune_rms_thresh

RMSD threshold for conformer pruning [Å] (default: None). Accepts float or pint Quantity.

TYPE: float | Quantity | None

rand_neg_eig

Randomize negative eigenvalues (default: None).

TYPE: bool | None

random_seed

Random seed for reproducibility (default: None).

TYPE: Annotated[int, positive] | None

symmetrize_conjugated_terminal_groups_for_pruning

Symmetrize terminal groups for RMSD calculations (default: None).

TYPE: bool | None

timeout

Timeout [seconds] for embedding (default: None).

TYPE: Annotated[int, positive] | None

track_failures

Track embedding failures (default: None).

TYPE: bool | None

use_basic_knowledge

Use basic chemical knowledge (default: None).

TYPE: bool | None

use_exp_torsion_angle_prefs

Use experimental torsion preferences (default: None).

TYPE: bool | None

use_macrocycle_14config

Use 1,4 distance bounds for macrocycles (default: None).

TYPE: bool | None

use_macrocycle_torsions

Use macrocycle torsion preferences (default: None).

TYPE: bool | None

use_random_coords

Start from random coordinates (default: None).

TYPE: bool | None

use_small_ring_torsions

Use small ring torsion preferences (default: None).

TYPE: bool | None

use_symmetry_for_pruning

Use symmetry when pruning conformers (default: None).

TYPE: bool | None

num_conformers

Number of conformers to generate (default: 1).

TYPE: Annotated[int, positive]

Source code in jfchemistry/generation/rdkit_generation.py
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
@dataclass
class RDKitGeneration[InputType: RDMolMolecule, OutputType: RecursiveMoleculeList](
    StructureGeneration, RDKitMaker[InputType, OutputType]
):
    """Generate 3D structures using RDKit distance geometry methods.

    This class uses RDKit's distance geometry embedding algorithms (ETKDG family)
    to generate 3D conformers from molecular graphs. The methods use distance
    bounds derived from experimental torsion angle preferences and small ring
    conformations to produce chemically reasonable 3D structures.

    The implementation supports all RDKit embedding parameters, allowing fine
    control over the generation process including optimization settings, pruning
    criteria, and random seed control.

    Units:
        Pass a float in the listed unit or a pint Quantity (e.g. ``jfchemistry.ureg``
        or ``jfchemistry.Q_``):

        - prune_rms_thresh: [Å]

    Attributes:
        name: Name of the job (default: "rdKit Generation").
        method: Distance geometry method to use:
            - "ETKDGv3": ETKDG version 3 (default, recommended)
            - "ETKDGv2": ETKDG version 2
            - "ETKDG": Original ETKDG method
            - "ETDG": Experimental torsion distance geometry
            - "KDG": Basic distance geometry
            - "srETKDGv3": Small ring ETKDG version 3
        basin_thresh: Energy threshold for basin hopping (default: None). Accepts float.
        bounds_mat_force_scaling: Scaling factor for distance bounds (default: None).
        box_size_mult: Box size multiplier for embedding (default: None).
        clear_confs: Clear existing conformers before embedding (default: None).
        embed_fragments_separately: Embed molecular fragments separately (default: None).
        enable_sequential_random_seeds: Use sequential random seeds (default: None).
        enforce_chirality: Enforce stereochemistry from molecular graph (default: None).
        force_trans_amides: Force amide bonds to trans configuration (default: None).
        ignore_smoothing_failures: Continue if smoothing fails (default: None).
        max_iterations: Maximum optimization iterations per conformer (default: None).
        num_threads: Number of parallel threads for embedding (default: 1).
        num_zero_fail: Number of failures before reporting (default: None).
        only_heavy_atoms_for_rms: Use only heavy atoms for RMSD pruning (default: None).
        optimizer_force_tol: Force tolerance for optimization (default: None).
        prune_rms_thresh: RMSD threshold for conformer pruning [Å] (default: None). \
            Accepts float or pint Quantity.
        rand_neg_eig: Randomize negative eigenvalues (default: None).
        random_seed: Random seed for reproducibility (default: None).
        symmetrize_conjugated_terminal_groups_for_pruning: Symmetrize terminal groups
            for RMSD calculations (default: None).
        timeout: Timeout [seconds] for embedding (default: None).
        track_failures: Track embedding failures (default: None).
        use_basic_knowledge: Use basic chemical knowledge (default: None).
        use_exp_torsion_angle_prefs: Use experimental torsion preferences (default: None).
        use_macrocycle_14config: Use 1,4 distance bounds for macrocycles (default: None).
        use_macrocycle_torsions: Use macrocycle torsion preferences (default: None).
        use_random_coords: Start from random coordinates (default: None).
        use_small_ring_torsions: Use small ring torsion preferences (default: None).
        use_symmetry_for_pruning: Use symmetry when pruning conformers (default: None).
        num_conformers: Number of conformers to generate (default: 1).

    """

    # Name of the job
    name: str = "rdKit Generation"
    # Number of conformers to generate
    num_conformers: Annotated[int, "positive"] = field(
        default=1,
        metadata={"description": "the number of conformers to generate"},
    )
    # Method to use for generating the structure
    method: Literal["ETDG", "ETKDG", "ETKDGv2", "ETKDGv3", "KDG", "srETKDGv3"] = field(
        default="ETKDGv3",
        metadata={"description": "The method to use for generating the structure"},
    )
    basin_thresh: Optional[float | Quantity] = field(
        default=None,
        metadata={
            "description": "set the basin threshold for the DGeom force field. \
                Accepts float or pint Quantity.",
            "unit": "kcal/mol",
        },
    )
    bounds_mat_force_scaling: Optional[float] = field(
        default=None,
        metadata={
            "description": "scale the weights of the atom pair distance restraints relative \
                to the other types of restraints."
        },
    )
    box_size_mult: Optional[float] = field(
        default=None,
        metadata={"description": "determines the size of the box used for random coordinates"},
    )
    clear_confs: Optional[bool] = field(
        default=None, metadata={"description": "clear existing conformers"}
    )
    embed_fragments_separately: Optional[bool] = field(
        default=None, metadata={"description": "embed fragments separately"}
    )
    enable_sequential_random_seeds: Optional[bool] = field(
        default=None,
        metadata={
            "description": "handle random number seeds so that \
                conformer generation can be restarted"
        },
    )
    enforce_chirality: Optional[bool] = field(
        default=None,
        metadata={"description": "enforce correct chirality if chiral centers are present"},
    )
    force_trans_amides: Optional[bool] = field(
        default=None, metadata={"description": "force trans amide bonds"}
    )
    ignore_smoothing_failures: Optional[bool] = field(
        default=None,
        metadata={
            "description": "try and embed the molecule if if triangle smoothing of the \
                bounds matrix fails"
        },
    )
    max_iterations: Optional[Annotated[int, "positive"]] = field(
        default=None,
        metadata={"description": "the maximum number of iterations to perform for each conformer"},
    )
    num_threads: Annotated[int, "positive"] = field(
        default=1,
        metadata={"description": "the number of threads to use for the embedding"},
    )
    num_zero_fail: Optional[Annotated[int, "positive"]] = field(
        default=None,
        metadata={"description": "fail embedding if we have at least this many zero eigenvalues"},
    )
    only_heavy_atoms_for_rms: Optional[bool] = field(
        default=None,
        metadata={"description": "Only consider heavy atoms when doing RMS filtering"},
    )
    optimizer_force_tol: Optional[float | Quantity] = field(
        default=None,
        metadata={
            "description": "the force tolerance during distance-geometry minimization. \
                Accepts float or pint Quantity.",
            "unit": "eV/Å",
        },
    )
    prune_rms_thresh: Optional[float | Quantity] = field(
        default=None,
        metadata={
            "description": "keep only conformations at least this far apart [Å]. \
                Accepts float in [Å] or pint Quantity.",
            "unit": "Å",
        },
    )
    rand_neg_eig: Optional[bool] = field(
        default=None,
        metadata={
            "description": "if the embedding yields a negative eigenvalue, pick coordinates that \
                correspond to this component at random"
        },
    )
    random_seed: Optional[Annotated[int, "positive"]] = field(
        default=None,
        metadata={
            "description": "if the embedding yields a negative eigenvalue, pick coordinates that \
                correspond to this component at random"
        },
    )
    symmetrize_conjugated_terminal_groups_for_pruning: Optional[bool] = field(
        default=None,
        metadata={"description": "symmetrize terminal groups for RMSD pruning"},
    )
    timeout: Optional[Annotated[int, "positive"]] = field(
        default=None,
        metadata={
            "description": "maximum time [seconds] to generate a conformer for a \
                single molecule fragment. If set to 0, no timeout is set"
        },
    )
    track_failures: Optional[bool] = field(
        default=None,
        metadata={"description": "keep track of which checks during the embedding process fail"},
    )
    use_basic_knowledge: Optional[bool] = field(
        default=None,
        metadata={
            "description": "impose basic-knowledge constraints \
            such as flat rings"
        },
    )
    use_exp_torsion_angle_prefs: Optional[bool] = field(
        default=None,
        metadata={"description": "impose experimental torsion angle preferences"},
    )
    use_macrocycle_14config: Optional[bool] = field(
        default=None,
        metadata={"description": "use the 1-4 distance bounds from ETKDGv3"},
    )
    use_macrocycle_torsions: Optional[bool] = field(
        default=None,
        metadata={"description": "impose macrocycle torsion angle preferences"},
    )
    use_random_coords: Optional[bool] = field(
        default=None,
        metadata={
            "description": "start the embedding from random coordinates \
                instead of using eigenvalues of the distance matrix"
        },
    )
    use_small_ring_torsions: Optional[bool] = field(
        default=None,
        metadata={"description": "impose small ring torsion angle preferences"},
    )
    use_symmetry_for_pruning: Optional[bool] = field(
        default=None,
        metadata={
            "description": "use molecule symmetry when doing the RMSD pruning. \
                Note that this option automatically also sets onlyHeavyAtomsForRMS to true."
        },
    )

    def __post_init__(self):
        """Normalize unit-bearing attributes."""
        if self.prune_rms_thresh is not None and isinstance(self.prune_rms_thresh, Quantity):
            object.__setattr__(
                self, "prune_rms_thresh", to_magnitude(self.prune_rms_thresh, "angstrom")
            )
        if self.basin_thresh is not None and isinstance(self.basin_thresh, Quantity):
            object.__setattr__(self, "basin_thresh", self.basin_thresh.magnitude)
        if self.optimizer_force_tol is not None and isinstance(self.optimizer_force_tol, Quantity):
            object.__setattr__(self, "optimizer_force_tol", self.optimizer_force_tol.magnitude)
        super().__post_init__()

    def _operation(
        self, input: InputType, **kwargs
    ) -> tuple[OutputType | list[OutputType], Properties | list[Properties] | None]:
        """Generate 3D structure(s) using RDKit distance geometry embedding.

        Embeds 3D coordinates into the molecule using the specified ETKDG method
        and parameters. Automatically configures the RDKit embedding parameters
        from the instance attributes and generates the requested number of conformers.

        The method:
        1. Creates RDKit embedding parameters object for the selected method
        2. Transfers all non-None attributes to the parameters
        3. Embeds conformers using distance geometry
        4. Converts conformers to Pymatgen Molecule objects
        5. Sets charge and spin multiplicity based on formal charge

        Args:
            input: RDKit molecule without 3D coordinates.
            **kwargs: Additional kwargs to pass to the operation.

        Returns:
            Tuple containing:
                - List of Pymatgen Molecule objects with 3D coordinates
                - Empty dictionary (no additional properties)

        Examples:
            >>> from rdkit import Chem # doctest: +SKIP
            >>> from jfchemistry import RDMolMolecule # doctest: +SKIP
            >>> from jfchemistry.generation import RDKitGeneration # doctest: +SKIP
            >>>
            >>> # Create molecule from SMILES
            >>> mol = Chem.MolFromSmiles("CCO") # doctest: +SKIP
            >>> rdmol = RDMolMolecule(Chem.AddHs(mol)) # doctest: +SKIP
            >>>
            >>> # Generate 10 conformers
            >>> gen = RDKitGeneration(num_conformers=10, random_seed=42) # doctest: +SKIP
            >>> structures, props = gen.operation(rdmol) # doctest: +SKIP
            >>> print(f"Generated {len(structures)} conformers") # doctest: +SKIP
        """
        params = getattr(rdDistGeom, self.method)()
        param_keys = [x[0] for x in inspect.getmembers(params)]
        for key, value in vars(self).items():

            def _to_camel_case(s: str) -> str:
                parts = s.split("_")
                return parts[0] + "".join(word.capitalize() for word in parts[1:])

            camel_key = _to_camel_case(key)
            if camel_key not in param_keys or value is None:
                continue
            setattr(params, camel_key, value)
        rdDistGeom.EmbedMultipleConfs(input, self.num_conformers, params)
        molecules: list[Molecule] = []
        for confId in range(input.GetNumConformers()):
            molecule: Molecule = Molecule.from_str(
                rdmolfiles.MolToXYZBlock(input, confId=int(confId)),
                fmt="xyz",
            )
            molecule.to("input.xyz", fmt="xyz")
            charge = rdmolops.GetFormalCharge(input)
            spin = int(2 * (abs(charge) // 2) + 1)
            molecule.set_charge_and_spin(charge, spin)
            molecules.append(molecule)
        if self.num_conformers == 1:
            return cast("OutputType", molecules[0]), Properties()
        else:
            return cast("OutputType", molecules), Properties()

make

make(input: InputType | list[InputType], **kwargs) -> Response[_output_model]

Create a workflow job for processing structure(s).

Automatically handles job distribution for lists of structures. Each structure in a list is processed as a separate job for parallel execution.

PARAMETER DESCRIPTION
input

Single Pymatgen SiteCollection or list of SiteCollections.

TYPE: InputType | list[InputType]

**kwargs

Additional kwargs to pass to the operation.

DEFAULT: {}

RETURNS DESCRIPTION
Response[_output_model]

Response containing: - structure: Processed structure(s) - files: XYZ format file(s) of the structure(s) - properties: Computed properties from the operation

Examples:

>>> from jfchemistry.conformers import CRESTConformers
>>> from pymatgen.core import Molecule
>>> molecule = Molecule.from_ase_atoms(molecule("C2H6"))
>>> # Generate conformers
>>> conformer_gen = CRESTConformers(ewin=6.0)
>>> job = conformer_gen.make(input)
Source code in jfchemistry/core/makers/jfchem_maker.py
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
@jfchem_job()
def make(
    self,
    input: InputType | list[InputType],
    **kwargs,
) -> Response[_output_model]:
    """Create a workflow job for processing structure(s).

    Automatically handles job distribution for lists of structures. Each
    structure in a list is processed as a separate job for parallel execution.

    Args:
        input: Single Pymatgen SiteCollection or list of SiteCollections.
        **kwargs: Additional kwargs to pass to the operation.

    Returns:
        Response containing:
            - structure: Processed structure(s)
            - files: XYZ format file(s) of the structure(s)
            - properties: Computed properties from the operation

    Examples:
        >>> from jfchemistry.conformers import CRESTConformers # doctest: +SKIP
        >>> from pymatgen.core import Molecule # doctest: +SKIP
        >>> molecule = Molecule.from_ase_atoms(molecule("C2H6")) # doctest: +SKIP
        >>> # Generate conformers
        >>> conformer_gen = CRESTConformers(ewin=6.0) # doctest: +SKIP
        >>> job = conformer_gen.make(input) # doctest: +SKIP
    """
    return self._run_job(input, **kwargs)