diff --git a/README.md b/README.md index 9962ee54..62720c0b 100644 --- a/README.md +++ b/README.md @@ -246,6 +246,7 @@ These are currently used to compute properties of a minimum energy conformation | `OpenFF Aniline Para Hessian v1.0` | [2024-10-07-OpenFF-Aniline-Para-Hessian-v1.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-10-07-OpenFF-Aniline-Para-Hessian-v1.0) | Hessian single points for the final molecules in the `OpenFF Aniline Para Opt v1.0` [dataset](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2021-04-02-OpenFF-Aniline-Para-Opt-v1.0) | 'O', 'Cl', 'S', 'Br', 'H', 'F', 'N', 'C' || |`OpenFF Gen2 Hessian Dataset Protomers v1.0` | [2024-10-07-OpenFF-Gen2-Hessian-Dataset-Protomers-v1.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-10-07-OpenFF-Gen2-Hessian-Dataset-Protomers-v1.0/) | Hessian single points for the final molecules in the `OpenFF Gen2 Optimization Dataset Protomers v1.0` [dataset](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2021-12-21-OpenFF-Gen2-Optimization-Set-Protomers) | 'H', 'C', 'Cl', 'P', 'F', 'Br', 'O', 'N', 'S'|| | `MLPepper-RECAP-Optimized-Fragments-Add-Iodines-v1.0` | [2024-10-11-MLPepper-RECAP-Optimized-Fragments-Add-Iodines-v1.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-10-11-MLPepper-RECAP-Optimized-Fragments-Add-Iodines-v1.0) | Set of diverse iodine containing molecules with a number of calculated electrostatic properties. 
| Br, Cl, S, B, O, Si, C, N, I, P, H, F| | +| `OpenFF Iodine Chemistry Hessian Dataset v1.0` | [2024-11-11-OpenFF-Iodine-Chemistry-Hessian-Dataset-v1.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-11-11-OpenFF-Iodine-Chemistry-Hessian-Dataset-v1.0) | Hessian single points for the final molecules in the `OpenFF Iodine Chemistry Optimization Dataset v1.0` [dataset](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2022-07-27-OpenFF-iodine-optimization-set) | I, F, Br, C, Cl, O, S, N, H || # Optimization Datasets diff --git a/submissions/2024-11-11-OpenFF-Iodine-Chemistry-Hessian-Dataset-v1.0/README.md b/submissions/2024-11-11-OpenFF-Iodine-Chemistry-Hessian-Dataset-v1.0/README.md new file mode 100644 index 00000000..b6647e9a --- /dev/null +++ b/submissions/2024-11-11-OpenFF-Iodine-Chemistry-Hessian-Dataset-v1.0/README.md @@ -0,0 +1,50 @@ +# OpenFF Iodine Chemistry Hessian Dataset v1.0 + +## Description + +Hessian single points for the final molecules in the [OpenFF Iodine Chemistry Optimization Dataset v1.0](https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2022-07-27-OpenFF-iodine-optimization-set) at the B3LYP-D3BJ/DZVP level of theory. + +## General information + +* Date: 2024-11-11 +* Class: OpenFF Basic Dataset +* Purpose: Hessian dataset generation for MSM parameters +* Name: OpenFF Iodine Chemistry Hessian Dataset v1.0 +* Number of unique molecules: 99 +* Number of conformers: 327 +* Number of conformers (min, mean, max): 1, 3.30, 22 +* Molecular weight (min, mean, max): 221.95, 318.54, 533.92 +* Charges: -1.0, 0.0, 1.0 +* Dataset submitter: Alexandra McIsaac +* Dataset generator: Alexandra McIsaac + + +## QCSubmit generation pipeline + +* `generate-dataset.ipynb`: This notebook shows how the dataset was prepared from the completed, connectivity-filtered records of the `OpenFF Iodine Chemistry Optimization Dataset v1.0` on QCArchive.
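In this submission `dataset.json.bz2` is committed through Git LFS, so a plain clone contains only a small text pointer (`version https://git-lfs.github.com/spec/v1`, `oid`, `size`) rather than the compressed export. The sketch below, which assumes only the Python standard library, loads the exported JSON for inspection and fails loudly if it is handed the pointer instead. `load_exported_dataset` is a hypothetical helper, not part of QCSubmit, and it makes no assumptions about the key layout of the export.

```python
import bz2
import json
from pathlib import Path

# First line of every Git LFS pointer file (see the pointer committed in this PR).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def load_exported_dataset(path):
    """Hypothetical helper: return the bz2-compressed JSON export as a dict.

    Raises ValueError if the file is a Git LFS pointer, i.e. the real
    content has not been fetched yet (for example with `git lfs pull`).
    """
    raw = Path(path).read_bytes()
    if raw.startswith(LFS_POINTER_PREFIX):
        raise ValueError(f"{path} is a Git LFS pointer; fetch the LFS object first")
    # Decompress in memory and parse the JSON text.
    return json.loads(bz2.decompress(raw).decode("utf-8"))
```

For example, `sorted(load_exported_dataset("dataset.json.bz2"))` lists the top-level fields written by `dataset.export_dataset(...)`; for the full object model, loading through QCSubmit itself is the more faithful route.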
+ + ## QCSubmit Manifest + + * `dataset.json.bz2`: Compressed dataset ready for submission + * `dataset.pdf`: Visualization of dataset molecules + * `dataset.smi`: SMILES strings for dataset molecules + * `generate-dataset.ipynb`: Notebook describing dataset generation and submission + * `input-environment.yaml`: Environment file used to create the Python environment for the notebook + * `full-environment.yaml`: Fully-resolved environment used to execute the notebook. + + + ## Metadata + + * elements: I, F, Br, C, Cl, O, S, N, H + * unique molecules: 99 + * Spec: B3LYP-D3BJ/DZVP + * SCF properties: + * dipole + * quadrupole + * wiberg_lowdin_indices + * mayer_indices + * QC Spec: + * name: default + * method: B3LYP-D3BJ + * basis: DZVP diff --git a/submissions/2024-11-11-OpenFF-Iodine-Chemistry-Hessian-Dataset-v1.0/dataset.json.bz2 b/submissions/2024-11-11-OpenFF-Iodine-Chemistry-Hessian-Dataset-v1.0/dataset.json.bz2 new file mode 100644 index 00000000..378b12cb --- /dev/null +++ b/submissions/2024-11-11-OpenFF-Iodine-Chemistry-Hessian-Dataset-v1.0/dataset.json.bz2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b77e8e72afd332de289774b74688a0538436042e30aacf35151bb3c14b2ee962 +size 246511 diff --git a/submissions/2024-11-11-OpenFF-Iodine-Chemistry-Hessian-Dataset-v1.0/dataset.pdf b/submissions/2024-11-11-OpenFF-Iodine-Chemistry-Hessian-Dataset-v1.0/dataset.pdf new file mode 100644 index 00000000..b33311ef Binary files /dev/null and b/submissions/2024-11-11-OpenFF-Iodine-Chemistry-Hessian-Dataset-v1.0/dataset.pdf differ diff --git a/submissions/2024-11-11-OpenFF-Iodine-Chemistry-Hessian-Dataset-v1.0/dataset.smi b/submissions/2024-11-11-OpenFF-Iodine-Chemistry-Hessian-Dataset-v1.0/dataset.smi new file mode 100644 index 00000000..640643cf --- /dev/null +++ b/submissions/2024-11-11-OpenFF-Iodine-Chemistry-Hessian-Dataset-v1.0/dataset.smi @@ -0,0 +1,99 @@ +c1cc2c(cc(c(c2nc1)O)I)Cl +c1cc2c(cc(c(c2nc1)[O-])I)Cl +c1ccc2c(c1)ccc(n2)/C=N/c3ccccc3I
+c1ccc2c(c1)c(cc(n2)/C=N/c3ccccc3I)C(=O)O +c1cc(c(c(c1Br)C=O)F)I +c1cc(c(c(c1)Br)I)[N+](=O)[O-] +c1cc(c(cc1Br)I)[N+](=O)[O-] +c1ccc(c(c1)[C@@H]2CCN2)I +c1cc(c(cc1Cl)Cl)NC(=O)c2cc(cc(c2O)I)I +c1cc(c(c(c1Cl)C=O)F)I +c1c(cc(c(c1Cl)I)Cl)OC(F)(F)F +c1ccc(c(c1)/C=N/NC2=Nc3ccccc3S2)I +c1ccc(c(c1)/C=N/NC2=NCCCCC2)I +c1cc(ccc1C(=O)CC#N)I +c1ccc(c(c1)C(=O)NN=C2CCCC2)I +c1ccc(c(c1)C(=O)NN=C2CCCCC2)I +c1c(cc(c(c1C(=O)N)N)I)I +c1cc(c(c(c1C(=O)O)F)F)I +c1c(cc(c(c1C(=O)O)O)C(F)(F)F)I +c1cc(c(cc1I)Br)C(=O)O +c1cc(c(cc1I)C(F)(F)F)F +c1cc(cc(c1)I)[C@@H]2CCN2 +c1cc(c(c(c1)I)Cl)C(=O)O +c1c(c(cc(c1I)Cl)F)C(=O)O +c1cc(c(cc1I)C(=O)O)C#N +c1cc(c(cc1I)C(=O)O)N2C=NN=N2 +c1c(c(cc(c1I)F)Br)[N+](=O)[O-] +c1c(c(cc(c1I)F)Cl)C(=O)O +c1cc(c(c(c1)I)[N+](=O)[O-])N +c1cc(c(c(c1)I)O)C(=O)O +c1cc(ccc1N2CC(=O)N(CC2=O)N)I +c1cc(ccc1N/N=C(/C#N)\C2=NN=C3N2CCCCC3)I +c1cc(ccc1/N=N\C(C#N)C#N)I +c1c(cc(c(c1[N+](=O)[O-])N)I)Cl +c1c(cc(c(c1[N+](=O)[O-])O)I)C#N +c1cc(ccc1OC2CC2)I +c1cc(c(cc1OC(F)(F)F)N)I +c1cc(ccc1S(=O)(=O)NN)I +c1c(c(cnc1Cl)C(=O)O)I +C1CCC(=NNC(=O)C2=C(C=NN2)I)C1 +c1c(cnc(c1C(=O)O)Br)I +c1cc(nc(c1)I)SC(F)(F)F +C1=C(C(=NN1CC(=O)O)C(=O)O)I +C1=C(SC(=C1)I)C(=O)CC#N +CC1(c2ccccc2[N@@]([C@@]13C=Cc4cc(ccc4O3)I)C)C +Cc1cccc2c1N=C(S2)NS(=O)(=O)c3ccc(cc3)I +Cc1c(cc(c(c1Br)I)Br)[N+](=O)[O-] +Cc1ccc(cc1C)C(=O)Nc2ccc(c(c2)I)C +Cc1ccc(cc1Cl)C(=O)Nc2c(cc(cn2)I)C +Cc1c(cc(c(c1Cl)I)Cl)[N+](=O)[O-] +Cc1c(cc(cc1I)Cl)/N=N\OC(=O)C +Cc1ccc(c(c1)I)NC(=O)C(C)(C)C +Cc1c(c(ccc1OC)I)C +CC1=CC(=C(C(=O)O1)/C=N/c2ccc(cc2)I)O +Cc1cc(cnc1NC(=O)c2ccc(cc2)F)I +Cc1ccc(nc1)NC(=O)c2cccc(c2)I +CC1CCC(=NNC(=O)c2ccccc2I)CC1 +Cc1c(cnc(c1Br)I)[N+](=O)[O-] +Cc1cc(nc(n1)N/N=C/C2=CC=C(O2)I)C +Cc1cc(nc(n1)N/N=C/C2=C(N=C3N2C=C(C=C3)I)C)C +CC1=C(C=NN1CC(=O)C)I +CC1=C(N2C=C(C=CC2=N1)I)/C=N/Nc3ccc4ccccc4c3 +CC1=NN(C=C1I)CC(=O)C +CC(C)(C#N)C(=O)Nc1cc(ccc1Cl)I +C=C(C(F)(F)F)I +CC(C)N1C=C(C(=N1)C(=O)OC)I +CCN1C=C(C(=N1)C(=O)OC)I +CCOc1cccc(c1I)Br +CC(=O)C1=NN(C=C1I)C +CCOC(=O)c1cc(cc(c1C)I)Br +CCOC(=O)C1=C(SC(=N1)N)I +CCOC(=O)C1=NN(C=C1I)C 
+CC(=O)N1CCCc2c1cc(cc2)I +CC(=O)Oc1cccc(c1)I +CN1C(=c2cccc(c2=N1)C(=O)OC)I +CN1C=C(C(=N1)C(=O)N2CCCc3c2cccc3)I +CN1C=C(C(=N1)C(=O)OC)I +CN1C(=C(C(=N1)C(=O)OC)I)[N+](=O)[O-] +CN1C=C(C(=O)N(C1=O)CC(=O)N2CC(=O)Nc3c2cccc3)I +CN(c1ccccc1)C(=O)c2ccc(cc2)I +CNc1c(nc2=NON=c2n1)Nc3ccc(cc3)I +CN(C1=Nc2ccccc2S1)C(=O)c3ccc(cc3)I +CN(C)C(=O)Cc1cccc(c1)I +COc1ccc(c2c1NN=C2N)I +COc1c(cc(cc1F)I)F +COc1ccccc1I +COc1ccc(cc1I)Cl +COc1ccc(cc1I)C=O +COc1cccc(c1I)F +COc1cc(ccc1I)[N+](=O)[O-] +COc1ccc(cc1I)[N+](=O)[O-] +COc1cc(c(cc1I)O)F +COc1cccc(c1[N+](=O)[O-])I +COc1cc(cc(c1O)I)C#N +COc1ccc(c(c1O)I)C=O +COc1ccnc2c1cc(cc2)I +COC(=O)c1cc(cc(c1O)I)C#N +COC(=O)C1=C(SC(=N1)N)I +COC(=O)[C@@H]1[C@H]2CC[C@H]([N@@H+]2CCCF)C[C@@H]1c3ccc(cc3)I diff --git a/submissions/2024-11-11-OpenFF-Iodine-Chemistry-Hessian-Dataset-v1.0/full-environment.yaml b/submissions/2024-11-11-OpenFF-Iodine-Chemistry-Hessian-Dataset-v1.0/full-environment.yaml new file mode 100644 index 00000000..cf29ad68 --- /dev/null +++ b/submissions/2024-11-11-OpenFF-Iodine-Chemistry-Hessian-Dataset-v1.0/full-environment.yaml @@ -0,0 +1,326 @@ +name: submit-hess-ds-newqcs +channels: + - openeye + - conda-forge + - defaults +dependencies: + - ambertools=23.6=cuda_None_nompi_py312hc98840c_105 + - annotated-types=0.7.0=pyhd8ed1ab_0 + - anyio=4.6.2.post1=pyhd8ed1ab_0 + - appnope=0.1.4=pyhd8ed1ab_0 + - apsw=3.46.1.0=py312h920bacf_1 + - argcomplete=3.5.1=pyhd8ed1ab_0 + - argon2-cffi=23.1.0=pyhd8ed1ab_0 + - argon2-cffi-bindings=21.2.0=py312hb553811_5 + - arpack=3.9.1=nompi_hf81eadf_101 + - arrow=1.3.0=pyhd8ed1ab_0 + - asttokens=2.4.1=pyhd8ed1ab_0 + - async-lru=2.0.4=pyhd8ed1ab_0 + - attrs=24.2.0=pyh71513ae_0 + - babel=2.16.0=pyhd8ed1ab_0 + - basis_set_exchange=0.10=pyhd8ed1ab_1 + - beautifulsoup4=4.12.3=pyha770c72_0 + - bleach=6.2.0=pyhd8ed1ab_0 + - blosc=1.21.6=h7d75f6d_0 + - brotli=1.1.0=h00291cd_2 + - brotli-bin=1.1.0=h00291cd_2 + - brotli-python=1.1.0=py312h5861a67_2 + - bson=0.5.9=py_0 + - bzip2=1.0.8=hfdf4475_7 + - 
c-ares=1.34.3=hf13058a_0 + - c-blosc2=2.15.1=hb9356d3_0 + - ca-certificates=2024.8.30=h8857fd0_0 + - cached-property=1.5.2=hd8ed1ab_1 + - cached_property=1.5.2=pyha770c72_1 + - cachetools=5.5.0=pyhd8ed1ab_0 + - cairo=1.18.0=h37bd5c4_3 + - certifi=2024.8.30=pyhd8ed1ab_0 + - cffi=1.17.1=py312hf857d28_0 + - chardet=5.2.0=py312hb401068_2 + - charset-normalizer=3.4.0=pyhd8ed1ab_0 + - cmiles=0.1.6=ha770c72_2 + - cmiles-base=0.1.6=pyhd8ed1ab_2 + - colorama=0.4.6=pyhd8ed1ab_0 + - comm=0.2.2=pyhd8ed1ab_0 + - contourpy=1.3.0=py312h2a50410_2 + - cpp-expected=1.1.0=hb8565cd_0 + - cycler=0.12.1=pyhd8ed1ab_0 + - debugpy=1.8.8=py312haafddd8_0 + - decorator=5.1.1=pyhd8ed1ab_0 + - defusedxml=0.7.1=pyhd8ed1ab_0 + - entrypoints=0.4=pyhd8ed1ab_0 + - exceptiongroup=1.2.2=pyhd8ed1ab_0 + - executing=2.1.0=pyhd8ed1ab_0 + - fftw=3.3.10=nompi_h292e606_110 + - fmt=11.0.2=h3c5361c_0 + - font-ttf-dejavu-sans-mono=2.37=hab24e00_0 + - font-ttf-inconsolata=3.000=h77eed37_0 + - font-ttf-source-code-pro=2.038=h77eed37_0 + - font-ttf-ubuntu=0.83=h77eed37_3 + - fontconfig=2.15.0=h37eeddb_1 + - fonts-conda-ecosystem=1=0 + - fonts-conda-forge=1=0 + - fonttools=4.54.1=py312hbe3f5e4_1 + - fqdn=1.5.1=pyhd8ed1ab_0 + - freetype=2.12.1=h60636b9_2 + - freetype-py=2.3.0=pyhd8ed1ab_0 + - greenlet=3.1.1=py312h5861a67_0 + - h11=0.14.0=pyhd8ed1ab_0 + - h2=4.1.0=pyhd8ed1ab_0 + - hdf4=4.2.15=h8138101_7 + - hdf5=1.14.4=nompi_h1607680_103 + - hpack=4.0.0=pyh9f0ad1d_0 + - httpcore=1.0.6=pyhd8ed1ab_0 + - httpx=0.27.2=pyhd8ed1ab_0 + - hyperframe=6.0.1=pyhd8ed1ab_0 + - icu=75.1=h120a0e1_0 + - idna=3.10=pyhd8ed1ab_0 + - importlib-metadata=8.5.0=pyha770c72_0 + - importlib_metadata=8.5.0=hd8ed1ab_0 + - importlib_resources=6.4.5=pyhd8ed1ab_0 + - iniconfig=2.0.0=pyhd8ed1ab_0 + - ipdb=0.13.13=pyhd8ed1ab_0 + - ipykernel=6.29.5=pyh57ce528_0 + - ipython=8.29.0=pyh707e725_0 + - ipywidgets=8.1.5=pyhd8ed1ab_0 + - isoduration=20.11.0=pyhd8ed1ab_0 + - jedi=0.19.2=pyhff2d567_0 + - jinja2=3.1.4=pyhd8ed1ab_0 + - joblib=1.4.2=pyhd8ed1ab_0 
+ - json5=0.9.28=pyhff2d567_0 + - jsonpointer=3.0.0=py312hb401068_1 + - jsonschema=4.23.0=pyhd8ed1ab_0 + - jsonschema-specifications=2024.10.1=pyhd8ed1ab_0 + - jsonschema-with-format-nongpl=4.23.0=hd8ed1ab_0 + - jupyter=1.1.1=pyhd8ed1ab_0 + - jupyter-lsp=2.2.5=pyhd8ed1ab_0 + - jupyter_client=8.6.3=pyhd8ed1ab_0 + - jupyter_console=6.6.3=pyhd8ed1ab_0 + - jupyter_core=5.7.2=pyh31011fe_1 + - jupyter_events=0.10.0=pyhd8ed1ab_0 + - jupyter_server=2.14.2=pyhd8ed1ab_0 + - jupyter_server_terminals=0.5.3=pyhd8ed1ab_0 + - jupyterlab=4.2.5=pyhd8ed1ab_0 + - jupyterlab_pygments=0.3.0=pyhd8ed1ab_1 + - jupyterlab_server=2.27.3=pyhd8ed1ab_0 + - jupyterlab_widgets=3.0.13=pyhd8ed1ab_0 + - khronos-opencl-icd-loader=2024.05.08=h00291cd_0 + - kiwisolver=1.4.7=py312hc5c4d5f_0 + - krb5=1.21.3=h37d8d59_0 + - lcms2=2.16=ha2f27b4_0 + - lerc=4.0.0=hb486fe8_0 + - libaec=1.1.3=h73e2aa4_0 + - libarchive=3.7.4=h20e244c_0 + - libblas=3.9.0=25_osx64_openblas + - libboost=1.84.0=hbe88bda_6 + - libboost-python=1.84.0=py312h082b758_6 + - libbrotlicommon=1.1.0=h00291cd_2 + - libbrotlidec=1.1.0=h00291cd_2 + - libbrotlienc=1.1.0=h00291cd_2 + - libcblas=3.9.0=25_osx64_openblas + - libcurl=8.10.1=h58e7537_0 + - libcxx=19.1.3=hf95d169_0 + - libdeflate=1.22=h00291cd_0 + - libedit=3.1.20191231=h0678c8f_2 + - libev=4.33=h10d778d_2 + - libexpat=2.6.4=h240833e_0 + - libffi=3.4.2=h0d85af4_5 + - libgfortran=5.0.0=13_2_0_h97931a8_3 + - libgfortran5=13.2.0=h2873a65_3 + - libglib=2.82.2=hb6ef654_0 + - libiconv=1.17=hd75f5a5_2 + - libintl=0.22.5=hdfe23c8_3 + - libjpeg-turbo=3.0.0=h0dc2134_1 + - liblapack=3.9.0=25_osx64_openblas + - libmamba=2.0.2=h7eaea7e_2 + - libnetcdf=4.9.2=nompi_h976d569_115 + - libnghttp2=1.64.0=hc7306c3_0 + - libopenblas=0.3.28=openmp_hbf64a52_1 + - libpng=1.6.44=h4b8f8c9_0 + - libpq=16.4=h365486b_3 + - librdkit=2024.03.5=hbc19afa_3 + - libsodium=1.0.20=hfdf4475_0 + - libsolv=0.7.30=h69d5d9b_0 + - libsqlite=3.46.1=h4b8f8c9_0 + - libssh2=1.11.0=hd019ec5_0 + - libtiff=4.7.0=h583c2ba_1 + - 
libwebp-base=1.4.0=h10d778d_0 + - libxcb=1.17.0=hf1f96e2_0 + - libxml2=2.13.4=h12808cf_2 + - libxslt=1.1.39=h03b04e6_0 + - libzip=1.11.2=h31df5bb_0 + - libzlib=1.3.1=hd23fc13_2 + - llvm-openmp=19.1.3=hf78d878_0 + - lxml=5.3.0=py312h4feaf87_2 + - lz4-c=1.9.4=hf0c8a7f_0 + - lzo=2.10=h10d778d_1001 + - mamba=2.0.2=h5750114_2 + - markupsafe=3.0.2=py312hbe3f5e4_0 + - matplotlib-base=3.9.2=py312h30cc4df_2 + - matplotlib-inline=0.1.7=pyhd8ed1ab_0 + - mda-xdrlib=0.2.0=pyhd8ed1ab_0 + - mdtraj=1.10.1=py312hd8e3cac_0 + - mistune=3.0.2=pyhd8ed1ab_0 + - msgpack-python=1.1.0=py312hc5c4d5f_0 + - munkres=1.1.4=pyh9f0ad1d_0 + - nbclient=0.10.0=pyhd8ed1ab_0 + - nbconvert-core=7.16.4=pyhd8ed1ab_1 + - nbformat=5.10.4=pyhd8ed1ab_0 + - ncurses=6.5=hf036a51_1 + - nest-asyncio=1.6.0=pyhd8ed1ab_0 + - netcdf-fortran=4.6.1=nompi_h4dd9276_107 + - networkx=3.4.2=pyhd8ed1ab_1 + - nglview=3.1.2=pyhceb8b5e_1 + - nlohmann_json=3.11.3=hf036a51_1 + - notebook=7.2.2=pyhd8ed1ab_0 + - notebook-shim=0.2.4=pyhd8ed1ab_0 + - numexpr=2.10.1=py312h6b48bed_3 + - numpy=1.26.4=py312he3a82b2_0 + - ocl_icd_wrapper_apple=1.0.0=hbcb3906_0 + - openeye-toolkits=2024.1.3=py312_0 + - openff-amber-ff-ports=0.0.4=pyhca7485f_0 + - openff-forcefields=2024.09.0=pyhff2d567_0 + - openff-interchange=0.4.0=pyhd8ed1ab_0 + - openff-interchange-base=0.4.0=pyhd8ed1ab_0 + - openff-qcsubmit=0.54.0=pyhd8ed1ab_0 + - openff-toolkit=0.16.6=pyhd8ed1ab_0 + - openff-toolkit-base=0.16.6=pyhd8ed1ab_0 + - openff-units=0.2.2=pyhca7485f_0 + - openff-utilities=0.1.12=pyhd8ed1ab_0 + - openjpeg=2.5.2=h7310d3a_0 + - openmm=8.2.0=py312h182294e_0_khronos + - openmmforcefields=0.14.1=pyhd8ed1ab_0 + - openssl=3.3.2=hd23fc13_0 + - overrides=7.7.0=pyhd8ed1ab_0 + - packaging=24.1=pyhd8ed1ab_0 + - pandas=2.2.3=py312h98e817e_1 + - pandocfilters=1.5.0=pyhd8ed1ab_0 + - panedr=0.8.0=pyhd8ed1ab_0 + - parmed=4.3.0=py312h93b526f_0 + - parso=0.8.4=pyhd8ed1ab_0 + - pcre2=10.44=h7634a1b_2 + - perl=5.32.1=7_h10d778d_perl5 + - pexpect=4.9.0=pyhd8ed1ab_0 + - 
pickleshare=0.7.5=py_1003 + - pillow=11.0.0=py312h66fe14f_0 + - pint=0.23=pyhd8ed1ab_1 + - pip=24.3.1=pyh8b19718_0 + - pixman=0.43.4=h73e2aa4_0 + - pkgutil-resolve-name=1.3.10=pyhd8ed1ab_1 + - platformdirs=4.3.6=pyhd8ed1ab_0 + - pluggy=1.5.0=pyhd8ed1ab_0 + - prometheus_client=0.21.0=pyhd8ed1ab_0 + - prompt-toolkit=3.0.48=pyha770c72_0 + - prompt_toolkit=3.0.48=hd8ed1ab_0 + - psutil=6.1.0=py312h3d0f464_0 + - pthread-stubs=0.4=h00291cd_1002 + - ptyprocess=0.7.0=pyhd3deb0d_0 + - pure_eval=0.2.3=pyhd8ed1ab_0 + - py-cpuinfo=9.0.0=pyhd8ed1ab_0 + - pycairo=1.27.0=py312hc3eca63_0 + - pycalverter=1.6.1=pyhd8ed1ab_1 + - pycparser=2.22=pyhd8ed1ab_0 + - pydantic=2.9.2=pyhd8ed1ab_0 + - pydantic-core=2.23.4=py312h669792a_0 + - pyedr=0.8.0=pyhd8ed1ab_0 + - pygments=2.18.0=pyhd8ed1ab_0 + - pyjwt=2.9.0=pyhd8ed1ab_1 + - pyobjc-core=10.3.1=py312hab44e94_1 + - pyobjc-framework-cocoa=10.3.1=py312hab44e94_1 + - pyparsing=3.2.0=pyhd8ed1ab_1 + - pysocks=1.7.1=pyha2e5f31_6 + - pytables=3.10.1=py312h9f7c63d_3 + - pytest=8.3.3=pyhd8ed1ab_0 + - python=3.12.7=h8f8b54e_0_cpython + - python-constraint=1.4.0=py_0 + - python-dateutil=2.9.0=pyhd8ed1ab_0 + - python-fastjsonschema=2.20.0=pyhd8ed1ab_0 + - python-json-logger=2.0.7=pyhd8ed1ab_0 + - python-tzdata=2024.2=pyhd8ed1ab_0 + - python_abi=3.12=5_cp312 + - pytz=2024.1=pyhd8ed1ab_0 + - pyyaml=6.0.2=py312hb553811_1 + - pyzmq=26.2.0=py312h1060d5c_3 + - qcelemental=0.28.0=pyhd8ed1ab_1 + - qcportal=0.56=pyhd8ed1ab_1 + - qhull=2020.2=h3c5361c_5 + - rdkit=2024.03.5=py312hcfd6466_3 + - readline=8.2=h9e318b2_1 + - referencing=0.35.1=pyhd8ed1ab_0 + - regex=2024.11.6=py312h01d7ebd_0 + - reportlab=4.2.5=py312hb553811_0 + - reproc=14.2.4.post0=h10d778d_1 + - reproc-cpp=14.2.4.post0=h93d8f39_1 + - requests=2.32.3=pyhd8ed1ab_0 + - rfc3339-validator=0.1.4=pyhd8ed1ab_0 + - rfc3986-validator=0.1.1=pyh9f0ad1d_0 + - rlpycairo=0.2.0=pyhd8ed1ab_0 + - rpds-py=0.21.0=py312h0d0de52_0 + - scipy=1.13.1=py312hb9702fa_0 + - send2trash=1.8.3=pyh31c8845_0 + - 
setuptools=75.3.0=pyhd8ed1ab_0 + - simdjson=3.10.1=h37c8870_0 + - six=1.16.0=pyh6c4a22f_0 + - smirnoff-plugins=2024.07.0=pyhd8ed1ab_0 + - smirnoff99frosst=1.1.0=pyh44b312d_0 + - snappy=1.2.1=he1e6707_0 + - sniffio=1.3.1=pyhd8ed1ab_0 + - soupsieve=2.5=pyhd8ed1ab_1 + - spdlog=1.14.1=h325aa07_1 + - sqlalchemy=2.0.36=py312h3d0f464_0 + - sqlite=3.46.1=he26b093_0 + - stack_data=0.6.2=pyhd8ed1ab_0 + - tabulate=0.9.0=pyhd8ed1ab_1 + - terminado=0.18.1=pyh31c8845_0 + - tinycss2=1.4.0=pyhd8ed1ab_0 + - tinydb=4.8.2=pyhd8ed1ab_0 + - tk=8.6.13=h1abcd95_1 + - toml=0.10.2=pyhd8ed1ab_0 + - tomli=2.0.2=pyhd8ed1ab_0 + - tornado=6.4.1=py312hb553811_1 + - tqdm=4.67.0=pyhd8ed1ab_0 + - traitlets=5.14.3=pyhd8ed1ab_0 + - types-python-dateutil=2.9.0.20241003=pyhff2d567_0 + - typing-extensions=4.12.2=hd8ed1ab_0 + - typing_extensions=4.12.2=pyha770c72_0 + - typing_utils=0.1.0=pyhd8ed1ab_0 + - tzdata=2024b=hc8b5060_0 + - unicodedata2=15.1.0=py312h3d0f464_1 + - unidecode=1.3.8=pyhd8ed1ab_0 + - uri-template=1.3.0=pyhd8ed1ab_0 + - urllib3=2.2.3=pyhd8ed1ab_0 + - validators=0.34.0=pyhd8ed1ab_0 + - wcwidth=0.2.13=pyhd8ed1ab_0 + - webcolors=24.8.0=pyhd8ed1ab_0 + - webencodings=0.5.1=pyhd8ed1ab_2 + - websocket-client=1.8.0=pyhd8ed1ab_0 + - wheel=0.45.0=pyhd8ed1ab_0 + - widgetsnbextension=4.0.13=pyhd8ed1ab_0 + - xmltodict=0.14.2=pyhd8ed1ab_0 + - xorg-libice=1.1.1=h00291cd_1 + - xorg-libsm=1.2.4=h00291cd_1 + - xorg-libx11=1.8.10=ha6c16c8_0 + - xorg-libxau=1.0.11=h00291cd_1 + - xorg-libxdmcp=1.1.5=h00291cd_0 + - xorg-libxext=1.3.6=h00291cd_0 + - xorg-libxt=1.3.0=h00291cd_2 + - xorg-xorgproto=2024.1=h00291cd_1 + - xz=5.2.6=h775f41a_0 + - yaml=0.2.5=h0d85af4_2 + - yaml-cpp=0.8.0=he965462_0 + - zeromq=4.3.5=he4ceba3_6 + - zipp=3.21.0=pyhd8ed1ab_0 + - zlib=1.3.1=hd23fc13_2 + - zlib-ng=2.2.2=hac325c4_0 + - zstandard=0.23.0=py312h7122b0e_1 + - zstd=1.5.6=h915ae27_0 + - pip: + - amberutils==21.0 + - edgembar==0.2 + - mmpbsa-py==16.0 + - packmol-memgen==2024.2.9 + - pdb4amber==22.0 + - pymsmt==22.0 + - 
pytraj==2.0.6 + - sander==22.0 +prefix: /Users/lexiemcisaac/miniconda3/envs/submit-hess-ds-newqcs diff --git a/submissions/2024-11-11-OpenFF-Iodine-Chemistry-Hessian-Dataset-v1.0/generate-dataset.ipynb b/submissions/2024-11-11-OpenFF-Iodine-Chemistry-Hessian-Dataset-v1.0/generate-dataset.ipynb new file mode 100644 index 00000000..311faca7 --- /dev/null +++ b/submissions/2024-11-11-OpenFF-Iodine-Chemistry-Hessian-Dataset-v1.0/generate-dataset.ipynb @@ -0,0 +1,556 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "id": "3d3cf7ba-e0db-439e-bb2e-983a2aa463ef", + "metadata": { + "execution": { + "iopub.execute_input": "2024-11-12T01:22:01.466893Z", + "iopub.status.busy": "2024-11-12T01:22:01.465947Z", + "iopub.status.idle": "2024-11-12T01:22:31.512562Z", + "shell.execute_reply": "2024-11-12T01:22:31.512027Z", + "shell.execute_reply.started": "2024-11-12T01:22:01.466826Z" + } + }, + "outputs": [], + "source": [ + "from qcportal import PortalClient\n", + "from openff.qcsubmit.results import OptimizationResultCollection,BasicResultCollection\n", + "from openff.qcsubmit.datasets import BasicDataset\n", + "from openff.qcsubmit.results.filters import ConnectivityFilter, RecordStatusEnum, RecordStatusFilter\n", + "from openff.qcsubmit.factories import BasicDatasetFactory\n", + "from openff.qcsubmit.common_structures import Metadata, QCSpec" + ] + }, + { + "cell_type": "markdown", + "id": "c3600edd-6772-47fa-b4fa-9d4428851b8e", + "metadata": {}, + "source": [ + "# Load optimization dataset" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "b2d28e39-5696-4577-a0a8-c02b0a98d019", + "metadata": { + "execution": { + "iopub.execute_input": "2024-11-12T01:22:31.514238Z", + "iopub.status.busy": "2024-11-12T01:22:31.513903Z", + "iopub.status.idle": "2024-11-12T01:22:32.948609Z", + "shell.execute_reply": "2024-11-12T01:22:32.947680Z", + "shell.execute_reply.started": "2024-11-12T01:22:31.514220Z" + } + }, + "outputs": [], + "source": [ + "client 
= PortalClient(\"https://api.qcarchive.molssi.org:443/\")" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "ad9d088a-8000-41e0-b1e5-9b3978b886da", + "metadata": { + "execution": { + "iopub.execute_input": "2024-11-12T01:22:32.952187Z", + "iopub.status.busy": "2024-11-12T01:22:32.950461Z", + "iopub.status.idle": "2024-11-12T01:22:35.255590Z", + "shell.execute_reply": "2024-11-12T01:22:35.255226Z", + "shell.execute_reply.started": "2024-11-12T01:22:32.952145Z" + } + }, + "outputs": [], + "source": [ + "opt_ds = OptimizationResultCollection.from_server(client=client,datasets=['OpenFF Iodine Chemistry Optimization Dataset v1.0'])" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "bf13b1f2-66ae-44e7-aed2-103bdfdfe822", + "metadata": { + "execution": { + "iopub.execute_input": "2024-11-12T01:22:35.256959Z", + "iopub.status.busy": "2024-11-12T01:22:35.256823Z", + "iopub.status.idle": "2024-11-12T01:22:41.041278Z", + "shell.execute_reply": "2024-11-12T01:22:41.040913Z", + "shell.execute_reply.started": "2024-11-12T01:22:35.256947Z" + } + }, + "outputs": [], + "source": [ + "filtered = opt_ds.filter(\n", + " RecordStatusFilter(status=RecordStatusEnum.complete),\n", + " ConnectivityFilter(tolerance=1.2),\n", + " )" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "b3571231-289b-41f1-8287-85abca89b6bf", + "metadata": { + "execution": { + "iopub.execute_input": "2024-11-12T01:22:41.041896Z", + "iopub.status.busy": "2024-11-12T01:22:41.041755Z", + "iopub.status.idle": "2024-11-12T01:22:41.044577Z", + "shell.execute_reply": "2024-11-12T01:22:41.044219Z", + "shell.execute_reply.started": "2024-11-12T01:22:41.041880Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "99 327\n", + "99 327\n" + ] + } + ], + "source": [ + "print(opt_ds.n_molecules,opt_ds.n_results)\n", + "print(filtered.n_molecules,filtered.n_results)" + ] + }, + { + "cell_type": "markdown", + "id": 
"4a825469-397a-415b-8d5d-016e798dac76", + "metadata": {}, + "source": [ + "# Set up single points" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "bba475b9-1fa9-4655-a6d7-bf52fae73682", + "metadata": { + "execution": { + "iopub.execute_input": "2024-11-12T01:23:04.213882Z", + "iopub.status.busy": "2024-11-12T01:23:04.212599Z", + "iopub.status.idle": "2024-11-12T01:23:09.259702Z", + "shell.execute_reply": "2024-11-12T01:23:09.259232Z", + "shell.execute_reply.started": "2024-11-12T01:23:04.213831Z" + } + }, + "outputs": [], + "source": [ + "from qcelemental.models import DriverEnum\n", + "\n", + "dataset = filtered.create_basic_dataset(dataset_name=\"OpenFF Iodine Chemistry Hessian Dataset v1.0\",\n", + " tagline=\"Hessian single points for the OpenFF Iodine Chemistry Optimization Dataset v1.0 dataset.\", \n", + " description=\"Hessian single points for the final molecules in the OpenFF Iodine Chemistry Optimization Dataset v1.0 dataset at the B3LYP-D3BJ/DZVP level of theory.\",\n", + " driver=DriverEnum.hessian,\n", + " metadata=Metadata(submitter=\"amcisaac\",\n", + " long_description_url=(\n", + " \"https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2024-10-08-OpenFF-Iodine-Chemistry-Hessian-Dataset-v1.0\"\n", + " )\n", + " )\n", + " )" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "8e419645-dbaa-40aa-bec5-957cfd1586c7", + "metadata": { + "execution": { + "iopub.execute_input": "2024-11-12T01:23:10.841447Z", + "iopub.status.busy": "2024-11-12T01:23:10.840499Z", + "iopub.status.idle": "2024-11-12T01:23:10.856787Z", + "shell.execute_reply": "2024-11-12T01:23:10.855388Z", + "shell.execute_reply.started": "2024-11-12T01:23:10.841408Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "{'default': QCSpecification(program='psi4', driver=, method='b3lyp-d3bj', basis='dzvp', keywords={'maxiter': 200, 'scf_properties': [, , , ]}, protocols=AtomicResultProtocols(wavefunction=, stdout=True, 
error_correction=ErrorCorrectionProtocol(default_policy=True, policies=None), native_files=))}" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dataset._get_specifications()" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "34a3958a-d774-481a-9e81-69bebbd06493", + "metadata": { + "execution": { + "iopub.execute_input": "2024-11-12T01:23:52.820602Z", + "iopub.status.busy": "2024-11-12T01:23:52.819671Z", + "iopub.status.idle": "2024-11-12T01:23:55.318347Z", + "shell.execute_reply": "2024-11-12T01:23:55.317976Z", + "shell.execute_reply.started": "2024-11-12T01:23:52.820560Z" + } + }, + "outputs": [], + "source": [ + "opt_hashes = {\n", + " rec.final_molecule.get_hash() for rec, _mol in filtered.to_records()\n", + " }" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "1e0980b0-0e57-4d08-a0fc-aa49c15e9cf8", + "metadata": { + "execution": { + "iopub.execute_input": "2024-11-12T01:24:16.132512Z", + "iopub.status.busy": "2024-11-12T01:24:16.131645Z", + "iopub.status.idle": "2024-11-12T01:24:16.146242Z", + "shell.execute_reply": "2024-11-12T01:24:16.144938Z", + "shell.execute_reply.started": "2024-11-12T01:24:16.132471Z" + } + }, + "outputs": [], + "source": [ + "new_hashes = {\n", + " qcemol.identifiers.molecule_hash\n", + " for moldata in dataset.dataset.values()\n", + " for qcemol in moldata.initial_molecules\n", + "}\n" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "4e1be1c3-d67c-4703-a454-9c989f2dc7c8", + "metadata": { + "execution": { + "iopub.execute_input": "2024-11-12T01:24:22.433788Z", + "iopub.status.busy": "2024-11-12T01:24:22.432933Z", + "iopub.status.idle": "2024-11-12T01:24:22.447630Z", + "shell.execute_reply": "2024-11-12T01:24:22.446630Z", + "shell.execute_reply.started": "2024-11-12T01:24:22.433746Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 11, + "metadata": {}, + 
"output_type": "execute_result" + } + ], + "source": [ + "opt_hashes==new_hashes" + ] + }, + { + "cell_type": "markdown", + "id": "3edb7cf3-0065-4031-9e99-e9e432f840d8", + "metadata": {}, + "source": [ + "Final optimization molecules and new Hessian molecules should be identical." + ] + }, + { + "cell_type": "markdown", + "id": "8e812a07-bd89-47f2-9380-6c939838aac6", + "metadata": {}, + "source": [ + "# Exporting dataset" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "d69c1de7-42a6-407f-a555-6fc5a7ef0e61", + "metadata": { + "execution": { + "iopub.execute_input": "2024-11-12T01:24:38.040314Z", + "iopub.status.busy": "2024-11-12T01:24:38.039564Z", + "iopub.status.idle": "2024-11-12T01:24:39.498041Z", + "shell.execute_reply": "2024-11-12T01:24:39.497645Z", + "shell.execute_reply.started": "2024-11-12T01:24:38.040266Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'default': QCSpec(method='B3LYP-D3BJ', basis='DZVP', program='psi4', spec_name='default', spec_description='Standard OpenFF optimization quantum chemistry specification.', store_wavefunction=, implicit_solvent=None, maxiter=200, scf_properties=[, , , ], keywords={})}\n" + ] + } + ], + "source": [ + "dataset.export_dataset(\"dataset.json.bz2\")\n", + "dataset.molecules_to_file('dataset.smi', 'smi')\n", + "dataset.visualize(\"dataset.pdf\", columns=8)\n", + "\n", + "print(dataset.qc_specifications)" + ] + }, + { + "cell_type": "markdown", + "id": "88a45064-d703-457e-b29b-ba0aa758ca9a", + "metadata": {}, + "source": [ + "# Dataset information" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "dc940078-e0e4-4395-9e71-69e49064d3a1", + "metadata": { + "execution": { + "iopub.execute_input": "2024-11-12T01:25:40.255588Z", + "iopub.status.busy": "2024-11-12T01:25:40.253994Z", + "iopub.status.idle": "2024-11-12T01:25:40.265387Z", + "shell.execute_reply": "2024-11-12T01:25:40.264074Z", + "shell.execute_reply.started": 
"2024-11-12T01:25:40.255523Z" + } + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "from collections import Counter" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "388aff3d-ff0a-4c36-9d83-e6ec82d31db8", + "metadata": { + "execution": { + "iopub.execute_input": "2024-11-12T01:25:40.431555Z", + "iopub.status.busy": "2024-11-12T01:25:40.429905Z", + "iopub.status.idle": "2024-11-12T01:25:40.441579Z", + "shell.execute_reply": "2024-11-12T01:25:40.440425Z", + "shell.execute_reply.started": "2024-11-12T01:25:40.431497Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "n_molecules: 99\n", + "n_conformers: 327\n" + ] + } + ], + "source": [ + "print(\"n_molecules:\", dataset.n_molecules)\n", + "print(\"n_conformers:\", dataset.n_records)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "8e7d50e8-201d-4f32-a2fd-c3daa60fe35d", + "metadata": { + "execution": { + "iopub.execute_input": "2024-11-12T01:25:49.370929Z", + "iopub.status.busy": "2024-11-12T01:25:49.370087Z", + "iopub.status.idle": "2024-11-12T01:25:50.323727Z", + "shell.execute_reply": "2024-11-12T01:25:50.323356Z", + "shell.execute_reply.started": "2024-11-12T01:25:49.370892Z" + } + }, + "outputs": [], + "source": [ + "n_confs = np.array(\n", + " [mol.n_conformers for mol in dataset.molecules]\n", + ")\n", + "n_heavy_atoms = np.array(\n", + " [mol.to_rdkit().GetNumHeavyAtoms() for mol in dataset.molecules]\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "7ccb51cc-523a-43cd-8ab0-b5d63f9141fc", + "metadata": { + "execution": { + "iopub.execute_input": "2024-11-12T01:25:51.205755Z", + "iopub.status.busy": "2024-11-12T01:25:51.205259Z", + "iopub.status.idle": "2024-11-12T01:25:51.216055Z", + "shell.execute_reply": "2024-11-12T01:25:51.215261Z", + "shell.execute_reply.started": "2024-11-12T01:25:51.205722Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ 
+ "Number of conformers (min, mean, max): 1 3.303030303030303 22\n", + "# heavy atoms\n", + " 7: 1\n", + " 9: 1\n", + " 10: 3\n", + " 11: 24\n", + " 12: 20\n", + " 13: 11\n", + " 14: 6\n", + " 15: 6\n", + " 16: 3\n", + " 17: 5\n", + " 18: 3\n", + " 19: 6\n", + " 20: 2\n", + " 21: 1\n", + " 22: 4\n", + " 23: 1\n", + " 24: 2\n" + ] + } + ], + "source": [ + "print(\n", + " \"Number of conformers (min, mean, max):\",\n", + " n_confs.min(), n_confs.mean(), n_confs.max()\n", + ")\n", + "print(\"# heavy atoms\")\n", + "counts = Counter(n_heavy_atoms)\n", + "for n_heavy in sorted(counts):\n", + " print(f\"{str(n_heavy):>3}: {counts[n_heavy]}\")" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "62625c83-8904-44f7-b219-8c88827bb971", + "metadata": { + "execution": { + "iopub.execute_input": "2024-11-12T01:25:51.854833Z", + "iopub.status.busy": "2024-11-12T01:25:51.853913Z", + "iopub.status.idle": "2024-11-12T01:25:52.236624Z", + "shell.execute_reply": "2024-11-12T01:25:52.236307Z", + "shell.execute_reply.started": "2024-11-12T01:25:51.854788Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "{-1.0, 0.0, 1.0}" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from openff.units import unit\n", + "unique_charges = set([\n", + " mol.total_charge.m_as(unit.elementary_charge)\n", + " for mol in dataset.molecules\n", + "])\n", + "unique_charges" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "f0291cf3-b3b4-4dfd-a2ac-07612c8d713a", + "metadata": { + "execution": { + "iopub.execute_input": "2024-11-12T01:25:52.571884Z", + "iopub.status.busy": "2024-11-12T01:25:52.571371Z", + "iopub.status.idle": "2024-11-12T01:25:52.929324Z", + "shell.execute_reply": "2024-11-12T01:25:52.929009Z", + "shell.execute_reply.started": "2024-11-12T01:25:52.571852Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "MW (min, mean, max): 221.94791675 
318.5357888080808 533.916695\n" + ] + } + ], + "source": [ + "masses = np.array([\n", + " sum([atom.mass.m for atom in mol.atoms])\n", + " for mol in dataset.molecules\n", + "])\n", + "print(\"MW (min, mean, max):\", masses.min(), masses.mean(), masses.max())" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "4d17e60d-eb09-4a0c-a243-8c05fba536db", + "metadata": { + "execution": { + "iopub.execute_input": "2024-11-12T01:25:53.423114Z", + "iopub.status.busy": "2024-11-12T01:25:53.422037Z", + "iopub.status.idle": "2024-11-12T01:25:53.777148Z", + "shell.execute_reply": "2024-11-12T01:25:53.776806Z", + "shell.execute_reply.started": "2024-11-12T01:25:53.423069Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'F', 'H', 'N', 'Cl', 'C', 'O', 'Br', 'I', 'S'}\n" + ] + } + ], + "source": [ + "elements = set(\n", + " atom.symbol\n", + " for mol in dataset.molecules\n", + " for atom in mol.atoms\n", + ")\n", + "print(elements)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9bdd7a35-9d41-4307-bb8c-26d1d5115435", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/submissions/2024-11-11-OpenFF-Iodine-Chemistry-Hessian-Dataset-v1.0/input-environment.yaml b/submissions/2024-11-11-OpenFF-Iodine-Chemistry-Hessian-Dataset-v1.0/input-environment.yaml new file mode 100644 index 00000000..fe4da71f --- /dev/null +++ b/submissions/2024-11-11-OpenFF-Iodine-Chemistry-Hessian-Dataset-v1.0/input-environment.yaml @@ -0,0 +1,20 @@ +name: submit-hess-ds-newqcs 
+channels: + - conda-forge + - openeye +dependencies: + - python + - openff-toolkit + - openff-qcsubmit >= 0.54 + - openmmforcefields + - openeye::openeye-toolkits + - openff-interchange-base + - qcportal + - jupyter + - ipython + - ipdb + - smirnoff-plugins + - tqdm + - cmiles + - mamba==2.0.2 + - nglview