Innovation of Education in Chemistry and Biology with Regard to Current Trends
OPVK CZ.1.07/2.2.00/28.0184
Drug design: rational design of drugs
KFC/DD 04: Drug discovery strategies
RNDr. Karel Berka, Ph.D.
Winter semester 2012/2013

Motto: i.e. 18 h 8 min 18 s

Outline
• Options for rational drug design: computer-aided drug design (CADD)
1. Unknown target structure: ligand-based drug design (LBDD)
– Similarity search
– Pharmacophore
– QSAR
2. Known target structure: structure-based drug design (SBDD)
– Docking
– De novo design

Rational drug design: options
• Ligand-based DD (target structure unknown): similar-ligand search, QSAR, pharmacophore
• Structure-based DD (target structure known): docking, de novo design
• Virtual screening can build on either approach

Rational drug design: options II
• Target structure known: structure-based drug design (SBDD): de novo design, docking
• Target structure unknown, ligands known: ligand-based drug design (LBDD)
– 1 or more ligands: search for similar ligands
– Several ligands: pharmacophore search
– Many ligands (20+): Quantitative Structure-Activity Relationships (QSAR)
• Neither target structure nor ligand known: CADD cannot be used; experimental data must be obtained first, although ADMET filtering can still be applied

Virtual screening
• Virtual screening is the in silico analogue of biological testing
• One or more in silico techniques are used to
– score,
– rank, and
– filter molecules
• What for?
– To decide which molecules to test experimentally
– To decide which library to synthesize
– To decide which molecules to buy
– To analyse experimental results, e.g. HTS

Virtual screening: workflow
[Figure: virtual screening workflow, from A. R. Leach, V. J. Gillet, An Introduction to Chemoinformatics]

Ligand-based drug design
• 1 or more ligands: similarity (see earlier)
• Several ligands: pharmacophore
• Many ligands (20+): Quantitative Structure-Activity Relationships (QSAR)

Similarity search
Searching for compounds whose structure is similar to an existing lead can lead to improved biological activity.
• 2D substructure
• 3D substructure
• 3D conformational flexibility

Similarity to the natural ligand
[Figure: 5-hydroxytryptamine (5-HT, serotonin), a natural neurotransmitter synthesized in certain neurons of the CNS, next to sumatriptan (Imitrex), used to treat migraine headaches and known to be a 5-HT1 agonist]

2D substructure search
• Functional groups
• Connectivity
E.g. a halogen [F,Cl,Br,I] on an aromatic ring together with a carboxyl group.
[Figure: the query substructure and several matching structures]

3D substructure search
• Spatial distances play a bigger role
• Bioisosterism
• Usually the lowest-energy conformation is stored
[Figure: 3D query with distance ranges of 3.3 to 4.3 Å, 6.8 to 7.8 Å and 3.6 to 4.6 Å between O and [O,S] atoms; steric energy (kcal/mol) versus dihedral angle (0 to 360°)]

Bioisosterism (two slides)
Young, D. C. Computational Drug Design. Wiley, 2009.

Searching for a suitable conformation (up to about 30 kJ/mol)
• Note: "keys" adapt to "locks", but "locks" also adapt to "keys"
• All freely rotatable bonds are rotated
• Many conformations => many hits
• Conformations that are somewhat energetically unfavourable must also be considered
[Figure: example structure with Cl and OH groups and interatomic distances of 3.2 Å and 4.3 Å; steric energy (kcal/mol) versus dihedral angle (0 to 360°)]

Quercetin
• Antioxidant
• Nearly planar (rotational barrier 16 kJ/mol)
• Dihedral angle in crystals around 180°
• But 90° in docking!
Wu, Chien-Ming, et al. Antiplatelet Effect and Selective Binding to Cyclooxygenase (COX) by Molecular Docking Analysis of Flavonoids and Lignans. Int. J. Mol. Sci. 2007, 8, 830–841.
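The conformational search above (rotate every freely rotatable bond, keep conformations up to about 30 kJ/mol above the minimum) can be sketched as a systematic grid scan. This is a minimal sketch under stated assumptions. The threefold cosine torsion term, the barrier `V3_KJ` and the 30° grid step are hypothetical, illustrative choices, not a real force field.

```python
import itertools
import math

V3_KJ = 16.0          # hypothetical threefold torsional barrier per bond (kJ/mol)
ENERGY_WINDOW = 30.0  # keep conformations up to ~30 kJ/mol above the minimum
STEP_DEG = 30         # systematic grid step for each dihedral angle


def torsion_energy(dihedrals_deg):
    """Toy energy: independent threefold cosine terms, one per rotatable bond."""
    return sum(0.5 * V3_KJ * (1 + math.cos(math.radians(3 * phi)))
               for phi in dihedrals_deg)


def enumerate_conformers(n_rotatable, step=STEP_DEG, window=ENERGY_WINDOW):
    """Rotate every rotatable bond on a grid and keep low-energy conformations.

    Returns (relative energy, dihedral tuple) pairs sorted by energy.
    """
    grid = range(0, 360, step)
    scored = [(torsion_energy(combo), combo)
              for combo in itertools.product(grid, repeat=n_rotatable)]
    e_min = min(e for e, _ in scored)
    # Slightly unfavourable conformations are kept too, not only the global minimum.
    return sorted((e - e_min, combo) for e, combo in scored if e - e_min <= window)


if __name__ == "__main__":
    kept = enumerate_conformers(n_rotatable=2)
    print(f"{len(kept)} of {(360 // STEP_DEG) ** 2} grid conformations kept")
```

With two rotatable bonds, only the combinations where both bonds sit near the barrier maximum (about 32 kJ/mol) fall outside the window. This also shows the point from the slide that "many conformations => many hits". The number of grid points grows as 12^n with the number of rotatable bonds.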
The similarity paradox, or not everything is simple…
Structurally very similar molecules can differ markedly in activity:
• Aminogenistein (against cystic fibrosis) vs. 7-hydroxy-2-(4-nitrophenyl)chromen-4-one
• Pargyline (against hypertension) vs. N-benzyl-N,1-dimethyl-2-propynylamine

Pharmacophore
• The search for the structural motif responsible for pharmacological activity (by analogy with a chromophore)
• A set of geometric constraints between specific functional groups that are responsible for biological activity
Bojarski, Curr. Top. Med. Chem. 2006, 6, 2005.

Overview of pharmacophore-based drug design
Pharmacophore generation → searching libraries for candidate active compounds → purchase or synthesis of hits → activity testing → measured activity
See also John Van Drie's http://pharmacophore.org

Pharmacophore search: procedure
• The protein structure is not required
– but it can help; for example, a pharmacophore can be built from an analysis of the active site
• Assumption:
– all (or most) known active compounds bind to the same site
• Pharmacophore generation
– Identify the characteristic "pharmacophoric" features (hydrogen-bond donors and acceptors, lipophilic groups, charge distribution)
– Find a geometric arrangement of these features that occurs in a stable (low-energy) conformation of every active molecule
• Pharmacophore search
– Search for all molecules that satisfy the pharmacophore in a stable conformation
– Scaffold hopping: structural similarity in the sense of 2D comparison is not required; it is enough to hit the pharmacophore

Example pharmacophore for HIV protease
The geometric arrangement of the functional-group types required for activity against the HIV protease active site (residues Asp25, Gly27, Ile50).
[Figure: two donors, one acceptor and one hydrophobic feature with inter-feature distances of 6.9, 6.0, 10.4, 5.2 and 6.3 Å]

Pharmacophore identification
1) Analysis of the receptor (Asp25, Gly27, Ile50): candidate features acceptor or anion, donor and hydrophobic, with distances of 6.9, 12.2, 9.6, 8.8 and 6.3 Å
2) Definition of the feature types
3) Determination of the distances

Final pharmacophore
Donor, donor, acceptor and hydrophobic features with inter-feature distances of 6.9, 6.0, 10.4, 5.2 and 6.3 Å.
Last step: searching a database (of conformations) for molecules that satisfy this pharmacophore.

QSAR
• Quantitative Structure-Activity Relationships
• A mathematical relationship between the biological activity of molecules and their geometric and chemical properties
• The "rules" found are used to predict the activity of new molecules
Compounds with known activity → QSAR → new compounds with predicted activity

Why QSAR?
Placing 10 different groups in 4 positions of a benzene ring requires the synthesis of 10^4 compounds.
Solution: synthesize a small number of compounds and derive rules from their data to predict the biological activity of other compounds.

3D-QSAR assumptions
• The effect is produced by the modelled compound and not by its metabolites.
• The proposed conformation is the bioactive one.
• The binding site is the same for all modelled compounds.
• The biological activity is largely explained by enthalpic processes; entropic terms are similar for all the compounds.
• The system is considered to be at equilibrium, and kinetic aspects are usually not considered.
• Pharmacokinetics (solvent effects, diffusion, transport) are not included.

General procedure of QSAR
• Select a set of molecules with known activities that interact with the same receptor.
• Calculate features (e.g. physicochemical properties; 2D or 3D descriptors).
• Divide the set into two subsets: one for training and one for testing.
• Build a model: find the relationship between activities and properties (a regression problem; statistical methods, machine-learning approaches, etc.).
• Test the model on the test set.
• Publish a paper if your results are good!
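The last pharmacophore step above checks whether any stored conformation places the required feature types at the required distances. This can be sketched as a brute-force assignment. This is a minimal sketch: the three-point pharmacophore, its distances, the 0.5 Å tolerance and the example molecules are all hypothetical. It is not the HIV protease model from the slides, whose exact feature pairings are not recoverable.

```python
import itertools
import math

# Hypothetical 3-point pharmacophore: feature types plus required pairwise distances (Å).
PHARMACOPHORE_TYPES = ["donor", "acceptor", "hydrophobic"]
PHARMACOPHORE_DISTANCES = {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 5.0}
TOLERANCE = 0.5  # Å, hypothetical


def matches_pharmacophore(features, types=PHARMACOPHORE_TYPES,
                          distances=PHARMACOPHORE_DISTANCES, tol=TOLERANCE):
    """True if some assignment of the conformation's features to the
    pharmacophore points has matching types and distances within tol.

    features: list of (feature_type, (x, y, z)) for one conformation.
    """
    for combo in itertools.permutations(range(len(features)), len(types)):
        # Feature types must agree point by point.
        if any(features[i][0] != t for i, t in zip(combo, types)):
            continue
        # Every constrained pair must lie within the tolerance.
        if all(abs(math.dist(features[combo[a]][1], features[combo[b]][1]) - d) <= tol
               for (a, b), d in distances.items()):
            return True
    return False


def screen_conformers(database):
    """Names of molecules with at least one matching conformation."""
    return [name for name, conformers in database.items()
            if any(matches_pharmacophore(c) for c in conformers)]


# Hypothetical database: molecule name -> list of conformations.
EXAMPLE_DATABASE = {
    "mol_A": [[("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)),
               ("hydrophobic", (0, 4, 0)), ("donor", (9, 9, 9))]],
    "mol_B": [[("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)),
               ("hydrophobic", (0, 8, 0))]],
}

if __name__ == "__main__":
    print(screen_conformers(EXAMPLE_DATABASE))
```

Here `mol_A` matches despite an extra donor, and `mol_B` fails because its hydrophobic point is too far away. Brute-force permutation is fine for a handful of features. Real tools use smarter matching and precomputed conformer sets.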
• You can also develop new descriptors, methodologies, algorithms, etc.

Advantages of QSAR
• Quantifying the relationship between structure and activity provides an understanding of the effect of structure on activity.
• It also makes it possible to make predictions that lead to the synthesis of novel analogues.
• The results can help explain how functional groups in the most active molecules interact with their target.

Statistical concepts
• Input: n descriptors P1, …, Pn and the value of biological activity (e.g. EC50) for m compounds:

Compound   Bio   P1    P2   …   Pn
Cpd 1      0.7   3.7   …
Cpd 2      3.2   0.4   …
…
Cpd m      …

Outline
• Hammett relationships
• log P: octanol-water partition coefficients
– uses in pharmaceutical chemistry
– uses in environmental chemistry
– uses in chromatography
• Other descriptors
• Multivariate least squares
• Nicotinic agonists (neurobiology)

Hammett relationships
• pKa of benzoic acids
• Effect of electron-withdrawing and electron-donating groups
• Based on $\Delta_r G^\circ = -RT \ln K_{eq}$

pKa of substituted benzoic acids
• $\log K_a - \log K_a^{H} = \sigma$ (for this reference reaction, benzoic acid ionization in water at 25 °C, ρ = 1 by definition)
• $K_a^{H}$ refers to the reference compound, unsubstituted benzoic acid
[Figure: log Ka versus σ for substituted benzoic acids]

Hammett constants
Group     σp      σm
-NH2     -0.57   -0.09
-OH      -0.38    0.13
-OCH3    -0.28    0.10
-CH3     -0.14   -0.06
-H        0       0
-F        0.15    0.34
-Cl       0.24    0.37
-COOH     0.44    0.35
-CN       0.70    0.62
-NO2      0.81    0.71

Sigma-rho plots
• One application of QSPR
• $\text{activity} = \rho\sigma + \text{constant}$, i.e. y = mx + b
• σ: descriptor
• ρ: slope

Octanol-water partition coefficients
• $P = C_{\mathrm{octanol}} / C_{\mathrm{water}}$
• log P is analogous to $\Delta_r G^\circ = -RT \ln K_{eq}$
• It expresses hydrophobic versus hydrophilic character
• The higher P, the more hydrophobic the compound

QSAR and log P: isonarcotic activity of esters, alcohols, ketones and ethers with tadpoles
Compound             log(1/C)   log P
CH3OH                0.30       -1.27
C2H5OH               0.50       -0.75
CH3COCH3             0.65       -0.73
(CH3)2CHOH           0.90       -0.36
(CH3)3COH            0.90        0.07
CH3CH2CH2OH          1.00       -0.23
CH3COOCH3            1.10       -0.38
C2H5COCH3            1.10       -0.27
HCOOC2H5             1.20       -0.38
C2H5COC2H5           1.20        0.59
(CH3)2C(C2H5)OH      1.20        0.59
CH3(CH2)3OH          1.40        0.29
(CH3)2CHCH2OH        1.40        0.16
CH3COOC2H5           1.50        0.14
C2H5COC2H5           1.50        0.31
CH3(CH2)4OH          1.60        0.81
CH3CH2CH2COCH3       1.70        0.31
CH3COOCH2C2H5        2.00        0.66
C2H5COOC2H5          2.00        0.66
(CH3)2CHCOOC2H5      2.20        1.05

[Figure: log(1/C) versus log P for these 20 compounds; y = 0.7315x + 1.2211, R² = 0.7767, R = 0.881, n = 20]

Isonarcotic activity of esters, alcohols, ketones and ethers with tadpoles
• log(1/C) = 0.869 log P + 1.242; n = 28, r = 0.965
• Subset of alcohols: log(1/C) = 1.49 log P - 0.10 (log P)² + 0.50; n = 10, r = 0.995

log P values (from hydrophobic to hydrophilic)
• benzene 2.13, pentanol 0.81, n-propanol -0.23, isopropanol -0.36, ethanol -0.75, methanol -1.27
• butylamine 0.85, pyridine 0.64, diethylamine 0.45, imidazole -0.08, phenylalanine -1.38, tetraethylammonium iodide -2.82, alanine -2.85

Molecule properties
SPC: structure-property correlation
Molecule structure
→ intrinsic properties: molar volume, connectivity indices, charge distribution, molecular weight, polar surface area, …
→ chemical properties: pKa, log P, solubility, stability
→ biological properties: activity, toxicity, biotransformation, pharmacokinetics

Molecular descriptors
• Molecular descriptors are numerical values that characterize properties of molecules.
• The descriptors fall into four classes:
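The linear log(1/C) versus log P relationship above is an ordinary least-squares fit with one descriptor. The sketch below refits the 20 (log P, log(1/C)) pairs from the tadpole table. It is a minimal illustration using plain Python rather than any particular statistics package, and it should reproduce the slope, intercept and R² shown on the slide.

```python
import math

# (log P, log(1/C)) for the 20 compounds in the tadpole table, in table order.
LOG_P = [-1.27, -0.75, -0.73, -0.36, 0.07, -0.23, -0.38, -0.27, -0.38, 0.59,
         0.59, 0.29, 0.16, 0.14, 0.31, 0.81, 0.31, 0.66, 0.66, 1.05]
LOG_INV_C = [0.30, 0.50, 0.65, 0.90, 0.90, 1.00, 1.10, 1.10, 1.20, 1.20,
             1.20, 1.40, 1.40, 1.50, 1.50, 1.60, 1.70, 2.00, 2.00, 2.20]


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept; also returns R²."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    slope, intercept, r2 = fit_line(LOG_P, LOG_INV_C)
    print(f"log(1/C) = {slope:.4f} log P + {intercept:.4f}, "
          f"R² = {r2:.4f}, R = {math.sqrt(r2):.3f}")
```

Extending this to several descriptors (multivariate least squares) replaces the closed-form slope with a solve of the normal equations.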
a) topological
b) geometrical
c) electronic
d) hybrid or 3D

Classification of descriptors

Topological descriptors
Topological descriptors are derived directly from the connection-table representation of the structure. They include:
a) atom and bond counts
b) substructure counts
c) molecular connectivity indices (Wiener index, Randić index, chi indices)
d) kappa indices
e) path descriptors
f) distance-sum connectivity
g) molecular symmetry

Descriptors
• Molar volume, Vm
• Surface area
• Rotatable bonds (Rotbonds, b_rotN)
• Atomic polarizability, Apol
– ease of distortion of electron clouds
– sum of van der Waals A coefficients
• Molecular refractivity, MR
– size and polarizability
– local non-lipophilic interactions

Geometrical descriptors
Geometrical descriptors are derived from the three-dimensional representation and include:
a) principal moments of inertia
b) molecular volume
c) solvent-accessible surface area
d) charged partial surface area
e) molecular surface area

Electronic descriptors
Electronic descriptors characterize molecular structures by quantities such as:
a) dipole moment
b) quadrupole moment
c) polarizability
d) HOMO and LUMO energies
e) dielectric energy
f) molar refractivity

Hybrid and 3D descriptors
a) geometric atom pairs and topological torsions
b) spatial autocorrelation vectors
c) WHIM indices
d) BCUTs
e) GETAWAY descriptors
f) topomers
g) pharmacophore fingerprints
h) EVA descriptors
i) descriptors of molecular fields

Limit on the number of descriptors
The data set should contain at least 5 times as many compounds as there are descriptors in the QSAR model. Too few compounds relative to the number of descriptors give a falsely high correlation: 2 points exactly determine a line, 3 points exactly determine a plane, and so on.
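The "falsely high correlation" argument above can be demonstrated numerically. With m compounds, an intercept and m - 1 descriptors, the regression is a square linear system and fits any activities exactly, even pure noise. This is a minimal sketch. The random descriptors and activities are synthetic, and the helper names are hypothetical.

```python
import random


def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def r_squared_with_too_many_descriptors(n_compounds, seed=0):
    """Fit random 'activities' with n_compounds - 1 random descriptors plus an
    intercept. The system is square, so the fit is exact and R² is 1."""
    rng = random.Random(seed)
    X = [[1.0] + [rng.gauss(0, 1) for _ in range(n_compounds - 1)]
         for _ in range(n_compounds)]
    y = [rng.gauss(0, 1) for _ in range(n_compounds)]
    coef = solve(X, y)
    pred = [sum(c * v for c, v in zip(coef, row)) for row in X]
    mean_y = sum(y) / n_compounds
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def min_compounds(n_descriptors, ratio=5):
    """Rule of thumb from the text: at least 5 compounds per descriptor."""
    return ratio * n_descriptors


if __name__ == "__main__":
    print("R² on pure noise:", r_squared_with_too_many_descriptors(6))
    print("compounds needed for 4 descriptors:", min_compounds(4))
```

The noise fit reaches R² of 1 up to rounding error, which is exactly why the 5:1 ratio matters.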
A data set of drug candidates whose size is similar to the number of descriptors gives a meaningless correlation.

Atomic polarizability, Apol
• Ease of distortion of electron clouds
• Sum of van der Waals A coefficients from the pair term
$E_{\mathrm{vdW},ij} = -\frac{A}{r_{ij}^{6}} + \frac{B}{r_{ij}^{12}}$

Molecular refractivity, MR
• Size and polarizability
• Local non-lipophilic interactions
Lorentz-Lorentz equation: $MR = \frac{n^{2} - 1}{n^{2} + 2}\,\frac{MW}{d}$
where n is the refractive index, MW the molecular weight and d the density.

Group additive properties (GAPs)
Substituent       Volume (SA)   MR     π                Rot. bonds
-H                 1.48         0.10   0 (reference)    0
-CH3              18.78         0.57   0.56             0
-CH2CH3           35.35         1.03   1.02             1
-CH2CH2CH3        51.99         1.5    1.55             2
-CH(CH3)2         51.33         1.5    1.53             1
-CH2CH2CH2CH3     68.63         1.96   2.13             3
-C(CH3)3          86.99         1.96   1.98             1
-C6H5             72.20         2.54   1.96             1
-F                 7.05         0.10   0.14             0
-Cl               15.85         0.60   0.71             0

QSAR and 3D-QSAR software
• Tripos: CoMFA
• VolSurf
• Catalyst
• Cerius2 QSAR+
• Schrödinger
• DISCOVER

Tools to calculate molecular descriptors (freely available)
• CDK tool: http://rguha.net/code/java/cdkdesc.html
• PowerMV: http://nisla05.niss.org/PowerMV/?q=PowerMV/
• Mold2: http://www.fda.gov/ScienceResearch/BioinformaticsTools/Mold2/default.htm
• PaDEL-Descriptor: http://www.downv.com/Windows/install-PaDELDescriptor-10439915.htm

ADMET descriptors to screen molecules

Bioavailability
The bioavailability of a compound depends on: absorption, permeability, lipophilicity, hydrogen bonding, liver metabolism, gut-wall metabolism, solubility, molecular size/shape, transporters and flexibility.

Prediction of ADMET properties
• Requirements for a drug:
– it must bind tightly to the biological target in vivo
– it must pass through one or more physiological barriers (cell membrane or blood-brain barrier)
– it must remain long enough to take effect
– it must be removed from the body by metabolism, excretion or other means
• ADMET: absorption, distribution, metabolism, excretion (elimination), toxicity

Lipinski rule of five (oral drug properties)
• Poor absorption or permeation is more likely when:
– MW > 500
– log P > 5
– there are more than 5 H-bond donors (sum of OH and NH groups)
– there are more than 10 H-bond acceptors (sum of N and O atoms)

Polar surface area
• PSA is defined as the part of the (van der Waals) molecular surface arising from polar atoms (nitrogen and oxygen atoms together with their attached hydrogens).
• PSA appears to encode well the drug properties that play an important role in membrane penetration: molecular polarity, hydrogen bonding and also solubility.
• It correlates well with the transport properties of drugs; PSA has been used to predict oral absorption, brain penetration, intestinal absorption and Caco-2 permeability.
• It has also been used effectively to characterize drug-likeness during virtual screening and combinatorial library design.
• Calculating PSA is, however, rather time-consuming, because a reasonable 3D molecular geometry must be generated and the surface itself calculated.
• Peter Ertl introduced an extremely rapid method that obtains the PSA descriptor simply as the sum of contributions of polar fragments in a molecule, without generating its three-dimensional (3D) geometry (topological PSA, TPSA).

PSA in intestinal absorption
• Intestinal absorption is usually expressed as the fraction absorbed (FA), i.e. the percentage of the initial dose appearing in the portal vein.
• A PSA model was built for β-adrenoreceptor antagonists [1], and an excellent sigmoidal relationship between PSA and FA after oral administration was obtained. Similar sigmoidal relationships can also be obtained with the topological PSA (TPSA).
• These results suggest that drugs with PSA < 60 Å² are almost completely (more than 90%) absorbed, whereas drugs with PSA > 140 Å² are absorbed to less than 10%. This conclusion was later confirmed by the correct classification of a set of endothelin receptor antagonists as having low, intermediate or high permeability.
• PSA was also shown to play an important role in explaining human in vivo jejunal permeability [2].
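The Lipinski criteria and the PSA absorption thresholds above translate directly into a simple pre-filter on precomputed properties. This is a minimal sketch under stated assumptions. The `Compound` record and the example values are hypothetical, the properties are assumed to be calculated elsewhere, and allowing one rule-of-five violation is a common convention rather than something stated in the slides.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mw: float    # molecular weight
    logp: float  # (calculated) log P
    hbd: int     # H-bond donors: sum of OH and NH groups
    hba: int     # H-bond acceptors: sum of N and O atoms
    psa: float   # polar surface area, Å²


def rule_of_five_violations(c):
    """Count how many of Lipinski's four criteria the compound breaks."""
    return sum([c.mw > 500, c.logp > 5, c.hbd > 5, c.hba > 10])


def absorption_class_from_psa(psa):
    """Coarse classes from the thresholds in the text:
    < 60 Å² well absorbed (> 90 %), > 140 Å² poorly absorbed (< 10 %)."""
    if psa < 60:
        return "high"
    if psa > 140:
        return "low"
    return "intermediate"


def passes_filter(c, max_violations=1):
    """Hypothetical ADMET pre-filter: at most max_violations rule-of-five
    breaches (assumed convention) and not in the low-absorption PSA class."""
    return (rule_of_five_violations(c) <= max_violations
            and absorption_class_from_psa(c.psa) != "low")


if __name__ == "__main__":
    library = [
        Compound("small", 300.0, 2.0, 1, 3, 45.0),
        Compound("big", 650.0, 6.2, 6, 12, 160.0),
        Compound("mid", 520.0, 4.0, 2, 6, 90.0),
    ]
    for c in library:
        print(c.name, rule_of_five_violations(c),
              absorption_class_from_psa(c.psa), passes_filter(c))
```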
• A model based on PSA and log P for predicting drug absorption was developed using 199 well-absorbed and 35 poorly absorbed compounds [3].

PSA in blood-brain barrier (BBB) penetration
• Drugs that act on the CNS must cross the BBB to reach their target, whereas other drugs should show minimal BBB penetration to avoid CNS side effects.
• A common measure of BBB penetration is the ratio of drug concentrations in brain and blood, expressed as $\log BB = \log (C_{\mathrm{brain}} / C_{\mathrm{blood}})$.
• Van de Waterbeemd and Kansy were probably the first to correlate the PSA of a series of CNS drugs with their membrane transport. They obtained a fair correlation of brain uptake with single-conformer PSA and molecular-volume descriptors.
• Clark et al. derived models for 55 compounds:
– TPSA alone: $\log BB = 0.516 - 0.115\,\mathrm{TPSA}$; n = 55, r² = 0.686, r = 0.828, σ = 0.42
– TPSA in combination with ClogP: $\log BB = 0.070 - 0.014\,\mathrm{TPSA} + 0.169\,\mathrm{ClogP}$; n = 55, r² = 0.787, r = 0.887, σ = 0.35
• The great majority of orally administered CNS drugs have PSA < 70 Å², while non-CNS compounds have been suggested to have PSA < 120 Å². Most orally absorbed compounds that do not penetrate the CNS therefore have PSA values between 70 and 120 Å².

Partition coefficients
The partition coefficient P of a compound X between an aqueous and an octanol phase (usually expressed as log10 P, or log P) is defined as:
$P = \frac{[X]_{\mathrm{octanol}}}{[X]_{\mathrm{aqueous}}}$
P measures the relative affinity of a molecule for the lipid and aqueous phases in the absence of ionisation.
1-Octanol is the most frequently used lipid phase in pharmaceutical research because:
• it has a polar and a non-polar region (like a membrane phospholipid)
• P(o/w) is fairly easy to measure
• P(o/w) often correlates well with many biological properties
• it can be predicted fairly accurately using computational models

Calculation of log P
Log P for a molecule can be calculated as a sum of fragmental or atom-based terms plus various corrections.
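The partition-coefficient definition and the two-descriptor Clark et al. logBB model quoted above are simple enough to evaluate directly. This is a minimal sketch. The function names and example inputs are illustrative, and the 70 Å² CNS cut-off is the rough guide from the text, not a strict rule.

```python
import math


def log_p(conc_octanol, conc_water):
    """log P = log10([X]octanol / [X]aqueous) for the neutral species."""
    return math.log10(conc_octanol / conc_water)


def log_bb_clark(tpsa, clogp):
    """Two-descriptor model quoted in the text (n = 55):
    logBB = 0.070 - 0.014 * TPSA + 0.169 * ClogP."""
    return 0.070 - 0.014 * tpsa + 0.169 * clogp


def cns_psa_flag(psa):
    """Rough guide from the text: most oral CNS drugs have PSA < 70 Å²."""
    return psa < 70


if __name__ == "__main__":
    print("log P:", log_p(10.0, 0.1))             # 100:1 partitioning
    print("logBB:", log_bb_clark(60.0, 2.0))      # hypothetical TPSA and ClogP
    print("CNS-like PSA?", cns_psa_flag(60.0))
```

For a hypothetical compound with TPSA = 60 Å² and ClogP = 2 the model gives logBB of about -0.43, i.e. a brain/blood ratio below 1.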
$\log P = \sum \text{fragments} + \sum \text{corrections}$
[Figure: structure of phenylbutazone with the fragments and branch used by ClogP]

ClogP for Windows output for phenylbutazone (C: 3.16, M: 3.16)
Class        Type     Description                               Value
FRAGMENT     #1       3,5-pyrazolidinedione                     -3.240
ISOLATING    CARBON   5 aliphatic isolating carbon(s)            0.975
ISOLATING    CARBON   12 aromatic isolating carbon(s)            1.560
EXFRAGMENT   BRANCH   1 chain and 0 cluster branch(es)          -0.130
EXFRAGMENT   HYDROG   20 H(s) on isolating carbons               4.540
EXFRAGMENT   BONDS    3 chain and 2 alicyclic (net)             -0.540
RESULT       2.11     All fragments measured                    clogP 3.165

What else does log P affect?
log P needs to be optimised, because it affects:
• binding to the enzyme/receptor
• aqueous solubility
• absorption through membranes
• binding to P450 metabolising enzymes
• binding to blood/tissue proteins (less free drug available to act)
• binding to the hERG heart ion channel (cardiotoxicity risk)

ADMET descriptor calculation tools
• PreADMET http://preadmet.bmdrc.org/
– Molecular descriptor calculation: 1081 diverse molecular descriptors
– Drug-likeness prediction: Lipinski rule, lead-like rule, drug-DB-like rule
– ADME prediction: Caco-2, MDCK, BBB, HIA, plasma protein binding and skin permeability data
– Toxicity prediction: Ames test and rodent carcinogenicity assay
• SPARC Online Calculator http://ibmlc2.chem.uga.edu/sparc/
– prediction of pKa, solubility, polarizability and other properties; a database of experimental pKa values can also be searched
• Daylight Chemical Information Systems www.daylight.com/daycgi/clogp
– calculation of log P with the CLOGP algorithm from BioByte; also access to the LOGPSTAR database of experimental log P data

ADMET tools (continued)
• Molinspiration Cheminformatics www.molinspiration.com/services/index.
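The ClogP output above is a worked example of "log P = sum of fragments + sum of corrections". Adding its six contributions reproduces the reported clogP of 3.165. This is a minimal sketch that only re-adds the values printed in that output. It does not implement the BioByte fragmentation rules, and the term labels are shortened paraphrases.

```python
# Contributions copied from the ClogP output for phenylbutazone above.
PHENYLBUTAZONE_TERMS = [
    ("FRAGMENT", "3,5-pyrazolidinedione", -3.240),
    ("ISOLATING", "5 aliphatic isolating carbons", 0.975),
    ("ISOLATING", "12 aromatic isolating carbons", 1.560),
    ("EXFRAGMENT", "1 chain branch", -0.130),
    ("EXFRAGMENT", "20 H on isolating carbons", 4.540),
    ("EXFRAGMENT", "3 chain and 2 alicyclic bonds (net)", -0.540),
]


def clogp_from_terms(terms):
    """Sum all fragment and correction contributions, rounded like the output."""
    return round(sum(value for _, _, value in terms), 3)


if __name__ == "__main__":
    running = 0.0
    for cls, description, value in PHENYLBUTAZONE_TERMS:
        running += value
        print(f"{cls:<11} {description:<38} {value:+.3f}  (running {running:+.3f})")
    print("clogP =", clogp_from_terms(PHENYLBUTAZONE_TERMS))
```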
– calculation of molecular properties relevant to drug design and QSAR, including log P, polar surface area, Rule of Five parameters and a drug-likeness index
• Pirika www.pirika.com
– calculation of various molecular properties, including boiling point, vapour pressure and solubility; the web demo is restricted to aliphatic molecules
• Actelion www.actelion.com/page/property_explorer
– calculation of molecular weight, log P, solubility, drug score and toxicity risk
• Virtual Computational Chemistry Laboratory www.vcclab.org
– prediction of log P and water solubility with associative neural networks, among other parameters; comparison of various prediction methods

Continued…
b) QSAR: The goal of QSAR studies is to predict the activity of new compounds based solely on their chemical structure. The underlying assumption is that biological activity can be attributed to incremental contributions of the molecular fragments; this assumption is called the linear free-energy principle. Information about the strength of interactions is captured for each compound by, for example, steric, electronic and hydrophobic descriptors.

Molecular similarity and searching
What is it?
• Two molecules are similar when their chemical, pharmacological or biological properties match; the more features they have in common, the higher their similarity.
• Chemical similarity: the two structures on top are chemically similar to each other. This is reflected in their common subgraph, or scaffold: they share 14 atoms.
• Pharmacophore similarity: the two structures above are less similar chemically (topologically), yet they have the same pharmacological activity: both are angiotensin-converting enzyme (ACE) inhibitors.

Molecular similarity
How to calculate it?
• A quantitative assessment of the similarity or dissimilarity of structures needs a numerically tractable form: molecular descriptors, fingerprints, structural keys.
• These are sequences or vectors of bits or numeric values that can be compared by distance functions or similarity metrics, for example:
– Euclidean distance: $E(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
– Tanimoto index: $T(x, y) = \frac{B(x \,\&\, y)}{B(x) + B(y) - B(x \,\&\, y)}$, where B(·) is the number of bits set

Molecular descriptors
a) Chemical fingerprint (hashed binary fingerprint)
• encodes topological properties of the chemical graph: connectivity, edge labels (bond types) and node labels (atom types)
• allows two molecules to be compared with respect to their chemical structure
Construction:
1. find all 0-, 1-, …, n-step walks in the chemical graph
2. generate a bit array with a given number of bits set for each walk
3. merge the bit arrays with the logical OR operation

Example 1: chemical fingerprint of CH3–CH2–OH (walks from the first carbon atom)
Length   Walk       Bit array
0        C          1010000000
1        C–H        0001010000
1        C–C        0001000100
2        C–C–H      0001000010
2        C–C–O      0100010000
3        C–C–O–H    0000011000
Merged bit array for the first carbon atom: 1111011110
This example shows how a 10-bit topological chemical fingerprint is created for a simple chain structure. All walks up to 3 steps are considered, and 2 bits are set for each pattern.

Molecular similarity, example 1: chemical fingerprints of two similar molecules
0100010100010100010000000001101010011010100000010100000000100000
0100010100010100010000000001101010011010100000000100000000100000

Example 2: pharmacophore fingerprint
• encodes pharmacophore properties of molecules as frequency counts of pharmacophore-point pairs at a given topological distance
• allows two molecules to be compared with respect to their pharmacophore
Construction:
1. map pharmacophore point types to atoms
2. calculate the length of the shortest path between each pair of atoms
3. assign a histogram to every pharmacophore-point pair and count the frequency of the pair with respect to its distance
[Figure: atoms coloured by pharmacophore point type (acceptor, donor, hydrophobic, none) and, for two molecules, histograms of pair frequencies for each combination of acceptor (A), donor (D) and hydrophobic (H) points over topological distances 1 to 6]

Virtual screening using fingerprints
[Figure: the fingerprint of an individual query structure is compared with each target fingerprint in a library; targets in close proximity to the query are reported as hits]
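The three construction steps of a hashed path fingerprint (enumerate walks, set a few bits per walk, OR everything together) and the Tanimoto and Euclidean measures can be sketched in a few lines. This is a minimal sketch under stated assumptions. It uses heavy-atom graphs only and simple paths that never revisit an atom. Bit positions come from SHA-256 of the path label, which is an arbitrary choice; real toolkits use their own hashing schemes. The two 64-bit strings are the fingerprints from example 1.

```python
import hashlib
import math


def enumerate_paths(atoms, bonds, max_len):
    """All simple paths of 0..max_len bonds, as canonical atom-label strings."""
    neighbours = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    paths = set()

    def extend(path):
        labels = "-".join(atoms[i] for i in path)
        reverse = "-".join(atoms[i] for i in reversed(path))
        paths.add(min(labels, reverse))  # same path read in either direction
        if len(path) - 1 < max_len:
            for nxt in neighbours[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return paths


def hashed_fingerprint(atoms, bonds, n_bits=1024, bits_per_path=2, max_len=3):
    """Set bits_per_path bits for every path and OR them into one integer."""
    fp = 0
    for path in enumerate_paths(atoms, bonds, max_len):
        digest = hashlib.sha256(path.encode()).digest()
        for k in range(bits_per_path):
            index = int.from_bytes(digest[4 * k:4 * k + 4], "big") % n_bits
            fp |= 1 << index
    return fp


def tanimoto(x, y):
    """T = B(x & y) / (B(x) + B(y) - B(x & y)), B = number of set bits."""
    common = bin(x & y).count("1")
    union = bin(x).count("1") + bin(y).count("1") - common
    return common / union if union else 1.0


def euclidean(x, y):
    """For bit vectors each differing bit adds 1, so E = sqrt(popcount(x XOR y))."""
    return math.sqrt(bin(x ^ y).count("1"))


# The two fingerprints from example 1.
FP_A = int("0100010100010100010000000001101010011010100000010100000000100000", 2)
FP_B = int("0100010100010100010000000001101010011010100000000100000000100000", 2)

if __name__ == "__main__":
    ethanol = hashed_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
    methanol = hashed_fingerprint(["C", "O"], [(0, 1)])
    print("T(ethanol, methanol) =", round(tanimoto(ethanol, methanol), 3))
    print("T(A, B) =", tanimoto(FP_A, FP_B), " E(A, B) =", euclidean(FP_A, FP_B))
```

The two example fingerprints differ in a single bit (17 versus 16 bits set), giving T = 16/17 ≈ 0.94 and E = 1. Every methanol path also occurs in ethanol, so the methanol bits are a subset of the ethanol bits.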
Hypothesis fingerprints
First hypothesis type
• Advantages: strict conditions for hits if the actives are fairly similar
• Disadvantages: false results with asymmetric metrics; misses common features of highly diverse sets; very sensitive to one missing feature
Second hypothesis type
• Advantages: captures common features of more diverse active sets
• Disadvantages: less selective if the actives are very similar
Third hypothesis type
• Advantages: captures common features of more diverse active sets; specific treatment of the absence of a feature; less sensitive to outliers
• Disadvantages: less selective if the actives are very similar

SUMMARY
• Virtual screening methods are central to many cheminformatics problems in:
– design
– selection
– analysis
• Increasing numbers of molecules can be evaluated with these techniques
• Reliability and accuracy remain problems in docking and in predicting ADMET properties
• Much more reliable and consistent experimental data are needed
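Fingerprint-based virtual screening combines the three operations named earlier: score each target fingerprint against the query, rank, and filter the hits. A minimal sketch follows. The library, the 0.7 Tanimoto threshold and the optional `top_n` cut are hypothetical choices for illustration, and fingerprints are stored as Python integers.

```python
def tanimoto(x, y):
    """T = B(x & y) / (B(x) + B(y) - B(x & y)) for integer bit vectors."""
    common = bin(x & y).count("1")
    union = bin(x).count("1") + bin(y).count("1") - common
    return common / union if union else 1.0


def screen(query_fp, library, threshold=0.7, top_n=None):
    """Score every target by Tanimoto similarity to the query (score), sort by
    decreasing score (rank) and keep those at or above threshold (filter)."""
    scored = [(tanimoto(query_fp, fp), name) for name, fp in library.items()]
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
    hits = [(name, score) for score, name in ranked if score >= threshold]
    return hits[:top_n] if top_n is not None else hits


if __name__ == "__main__":
    # Hypothetical 8-bit target fingerprints.
    library = {"a": 0b1111, "b": 0b1110, "c": 0b0001, "d": 0b11110000}
    print(screen(0b1111, library))
```

With the query 0b1111, target "a" scores 1.0, "b" 0.75, "c" 0.25 and "d" 0, so only "a" and "b" pass the assumed 0.7 threshold.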