Innovation of education in chemistry and biology with regard to current trends

OPVK CZ.1.07/2.2.00/28.0184
Drug design - rational drug design
KFC/DD
04 – drug discovery strategies
RNDr. Karel Berka, Ph.D.
Winter semester 2012/2013
Motto
i.e. 18 h 8 min 18 s
Outline
• Options for rational drug design
– Computer-aided drug design (CADD)
1. Unknown target structure
Ligand-based drug design (LBDD)
Similarity searching
Pharmacophore
QSAR
2. Known target structure
Structure-based drug design (SBDD)
Docking
De novo design
Rational drug design – options
• Unknown target structure: ligand-based DD
• Known target structure: structure-based DD
• Methods: similarity searching of ligands, QSAR, pharmacophore, docking, virtual screening, de novo design
Rational drug design – options II
• Known target structure: structure-based drug design (SBDD)
– Ligand known: docking
– Ligand unknown: de novo design
• Unknown target structure, ligand known: ligand-based drug design (LBDD)
– 1 or more ligands: similarity searching
– Several ligands: pharmacophore search
– Many ligands (20+): Quantitative Structure-Activity Relationships (QSAR)
• Unknown target structure, ligand unknown: CADD cannot be used
– Experimental data must be obtained
– Filtering for ADMET can still be applied
Virtual screening
• Virtual screening: the in silico analogue of biological testing
• Using one or more in silico techniques, molecules are
– Scored
– Ranked
– Filtered
• What for?
– Which molecules to test (experimentally)
– Which library to synthesize
– Which molecules to buy
– To analyse the results of experiments, e.g. HTS
Virtual screening - workflow
[Figure: virtual screening workflow. Source: A. R. Leach, V. J. Gillet, An Introduction to Chemoinformatics]
Ligand-based drug design
• 1 or more ligands
Similarity (see earlier)
• Several ligands
Pharmacophore
• Many ligands (20+)
Quantitative Structure-Activity Relationships (QSAR)
Similarity searching
Searching for compounds with a structure similar to an existing lead; this can lead to improved biological activity
• 2D substructure
• 3D substructure
• 3D conformational flexibility
Similarity to the natural ligand
• 5-Hydroxytryptamine (5-HT), serotonin: a natural neurotransmitter synthesized in certain neurons in the CNS
• Sumatriptan (Imitrex): used to treat migraine headaches; known to be a 5-HT1 agonist
[Figure: structures of serotonin and sumatriptan]
2D substructure search
• Functional groups
• Connectivity
E.g. a halogen [F,Cl,Br,I] on an aromatic ring together with a carboxyl group
[Figure: the substructure query and examples of matching structures]
3D substructure search
• Distances in space play a larger role
• Bioisosterism
• The lowest-energy conformation is the one most often stored
[Figure: 3D query with atom types O(s1), [O,S], C(u) and A and distance ranges 3.3 - 4.3 Å, 3.6 - 4.6 Å and 6.8 - 7.8 Å; plot of steric energy (kcal/mol) versus dihedral angle (0-360°)]
Bioisosterism
Young, D.C. Computational Drug Design. Wiley, 2009.
Bioisosterism II
Young, D.C. Computational Drug Design. Wiley, 2009.
Searching for a suitable conformation (up to approx. 30 kJ/mol)
• ! "Keys" adapt to "locks", but "locks" also adapt to "keys"
• All freely rotatable bonds are rotated
• Many conformations => many hits
• Conformations that are somewhat energetically unfavourable must also be taken into account
[Figure: two conformations of a structure containing Cl and OH groups, with distances 3.2 Å and 4.3 Å; plot of steric energy (kcal/mol) versus dihedral angle (0-360°)]
Quercetin
• Antioxidant
• Nearly planar (barrier 16 kJ/mol)
• Angle in crystals around 180°
• But 90° in docking!
Wu, Chien-Ming, et al. Antiplatelet Effect and Selective Binding to Cyclooxygenase (COX) by Molecular Docking Analysis of Flavonoids and Lignans. Intl J Mol Sci 2007, 8, 830–841.
The similarity paradox
or: things are not always that simple…
Aminogenistein (against cystic fibrosis)
7-Hydroxy-2-(4-nitro-phenyl)-chromen-4-one
Pargyline (against hypertension)
N-benzyl-N,1-dimethyl-2-propynylamine
Pharmacophore
• Search for the structural motif responsible for pharmacological activity (analogous to a chromophore)
• A set of geometric constraints between specific functional groups that are responsible for biological activity
Bojarski, Curr. Top. Med. Chem. 2006, 6, 2005.
Overview of Pharmacophore-based Drug Design
Cycle: measured activity → pharmacophore generation → searching libraries for candidate active compounds → purchase or synthesis of hits → activity testing → measured activity
See also John Van Drie’s
http://pharmacophore.org
Pharmacophore search - procedure
• A protein structure is not required
– But it can help: for example, a pharmacophore can be built from an analysis of the active site
• Assumption:
– All (or most) known active compounds bind to the same site
• Pharmacophore generation
– Identify the characteristic "pharmacophoric" features
• (hydrogen bond donors and acceptors, lipophilic groups, charge distribution)
– Find the geometric arrangement of pharmacophoric features that occurs in all active molecules in a stable (low-energy) conformation
• Pharmacophore searching
– Search for all molecules that satisfy the pharmacophore in a stable conformation
– Scaffold hopping
• Structural similarity in the sense of 2D comparison is not required
• Matching the pharmacophore is enough
Example of a pharmacophore for HIV
Geometric arrangement of the different types of functional groups required for HIV protease activity (active site)
[Figure: active-site residues Asp25, Gly27 and Ile50 with the pharmacophore features Donor, Donor, Acceptor and Hydrophobic; distances 6.9 Å, 6.0 Å, 10.4 Å, 5.2 Å and 6.3 Å]
Pharmacophore identification
1) Receptor analysis
[Figure: active-site residues Asp25, Gly27 and Ile50 labelled as Acceptor, Acceptor or Anion, Donor and Hydrophobic; distances 6.9 Å, 12.2 Å, 9.6 Å, 8.8 Å and 6.3 Å]
Pharmacophore identification
2) Definition of feature types
3) Finding the distances
[Figure: receptor features (Acceptor, Acceptor or Anion, Donor, Hydrophobic on Asp25, Gly27 and Ile50; distances 6.9 Å, 12.2 Å, 9.6 Å, 8.8 Å, 6.3 Å) and the complementary ligand features (Donor, Donor, Acceptor, Hydrophobic; distances 6.9 Å, 6.0 Å, 10.4 Å, 5.2 Å, 6.3 Å)]
Final pharmacophore
Features: Donor, Donor, Acceptor, Hydrophobic
Distances: 6.9 Å, 6.0 Å, 10.4 Å, 5.2 Å, 6.3 Å
Final step:
Search a database of molecules (conformations) for those that satisfy this pharmacophore.
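The final search step can be sketched as a distance check, assuming each database conformation has already been reduced to one 3D point per pharmacophore feature. The five distances come from the slide, but the transcript does not preserve which feature pair each one belongs to, so the pairing in `PHARMACOPHORE`, the 0.5 Å tolerance and the coordinates below are illustrative assumptions.

```python
from math import dist

# Distance constraints in angstroms. The five values are from the slide;
# the assignment of each value to a feature pair is an ASSUMPTION.
PHARMACOPHORE = {
    ("donor1", "donor2"): 6.9,
    ("donor1", "acceptor"): 6.0,
    ("donor1", "hydrophobic"): 10.4,
    ("donor2", "acceptor"): 5.2,
    ("acceptor", "hydrophobic"): 6.3,
}
TOLERANCE = 0.5  # hypothetical tolerance in angstroms


def matches(conformation, constraints=PHARMACOPHORE, tol=TOLERANCE):
    """True if every constrained feature pair is within tol of its target distance."""
    for (a, b), target in constraints.items():
        if abs(dist(conformation[a], conformation[b]) - target) > tol:
            return False
    return True


# Two made-up conformations (feature name -> x, y, z in angstroms)
good = {
    "donor1": (0.0, 0.0, 0.0),
    "donor2": (6.9, 0.0, 0.0),
    "acceptor": (4.10, 4.38, 0.0),
    "hydrophobic": (5.95, 6.36, 5.69),
}
bad = dict(good, hydrophobic=(0.0, 0.0, 3.0))  # hydrophobic group far too close to donor1

print(matches(good), matches(bad))
```

A real search would run this check over every stored low-energy conformation of every molecule and keep the molecules for which at least one conformation matches.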
QSAR
• Quantitative Structure-Activity Relationships
• A mathematical relationship between the biological activity of a molecule and its geometric and chemical properties
• The "rules" found are used to predict the activity of new molecules
Compounds with known activity → QSAR → new compounds with predicted activity
Why QSAR?
The number of compounds that would have to be synthesized in order to place 10 different groups in 4 positions of a benzene ring is 10^4.
Solution: synthesize a small number of compounds and from their data derive rules to predict the biological activity of other compounds.
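The count of 10^4 can be checked by brute-force enumeration. This is only a sketch: the substituent labels are placeholders, and the four positions are treated as distinct, so ring symmetry is ignored.

```python
from itertools import product

substituents = [f"R{i}" for i in range(1, 11)]  # 10 hypothetical substituent labels
n_positions = 4                                  # substituted positions on the ring

# Every way of assigning one substituent to each position
analogues = list(product(substituents, repeat=n_positions))
print(len(analogues))
```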
3D-QSAR Assumptions
• The effect is produced by the modeled compound and not its metabolites.
• The proposed conformation is the bioactive one.
• The binding site is the same for all modeled compounds.
• The biological activity is largely explained by enthalpic processes.
• Entropic terms are similar for all the compounds.
• The system is considered to be at equilibrium, and kinetic aspects are usually not considered.
• Pharmacokinetics: solvent effects, diffusion and transport are not included.
General Procedure of QSAR
• Select a set of molecules interacting with the same
receptor with known activities.
• Calculate features (e.g. physicochemical properties, 2D and 3D descriptors).
• Divide the set to two subgroups: one for training and one
for testing.
• Build a model: find the relations between the activities
and properties (regression problem, statistic methods,
machine learning approaches, etc).
• Test the model on the testing dataset.
• Publish a paper if your results are good!
• You can also develop new descriptors, new
methodologies, algorithms, etc.
Advantages of QSAR
• Quantifying the relationship between structure
and activity provides an understanding of the
effect of structure on activity.
• It is also possible to make predictions leading to
the synthesis of novel analogues.
• The results can be used to help understand
interactions between functional groups in the
molecules of greatest activity, with those of their
target
Statistical Concepts
• Input: n descriptors P1, ..., Pn and the value of biological activity (EC50 for example) for m compounds

         Bio   P1    P2   ...  Pn
Cpd 1    0.7   3.7   ..   ..   ..
Cpd 2    3.2   0.4   ..   ..   ..
...
Cpd m    ..    ..    ..   ..   ..
Outline
• Hammett Relationships
• log P : Octanol-water partition coefficients
– uses in Pharmaceutical Chemistry
– uses in Environmental Chemistry
– uses in Chromatography
• Other Descriptors
• Multivariate Least Squares
• Nicotinic Agonists - Neurobiology
Hammett Relationships
• pKa of benzoic acids
• Effect of electron-withdrawing and electron-donating groups
• based on $\Delta_r G = -RT \ln K_{eq}$
pKa of Substituted Benzoic Acids
• $\log K_a - \log K_{aH} = \sigma$
• $K_{aH}$ is the reference compound (unsubstituted)
[Figure: plot of log Ka versus sigma for benzoic acids carrying a substituent R1]
Hammett σ Constants
Group    σp      σm
-NH2     -0.57   -0.09
-OH      -0.38   0.13
-OCH3    -0.28   0.10
-CH3     -0.14   -0.06
-H       0       0
-F       0.15    0.34
-Cl      0.24    0.37
-COOH    0.44    0.35
-CN      0.70    0.62
-NO2     0.81    0.71
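As a worked use of these constants, the relation log Ka - log KaH = σ from the previous slide can be rearranged to pKa = pKaH - σ. The sketch below applies it with the σ values listed above; the reference pKa of 4.20 for unsubstituted benzoic acid is a literature value that does not appear on the slides and is therefore an assumption here.

```python
# (sigma_para, sigma_meta) for each substituent, as listed in the table above
SIGMA = {
    "NH2": (-0.57, -0.09), "OH": (-0.38, 0.13), "OCH3": (-0.28, 0.10),
    "CH3": (-0.14, -0.06), "H": (0.0, 0.0), "F": (0.15, 0.34),
    "Cl": (0.24, 0.37), "COOH": (0.44, 0.35), "CN": (0.70, 0.62),
    "NO2": (0.81, 0.71),
}
PKA_BENZOIC_ACID = 4.20  # assumed reference value, not taken from the slides


def predicted_pka(group, position="p"):
    """Estimate the pKa of a substituted benzoic acid from pKa = pKa(H) - sigma."""
    sigma_p, sigma_m = SIGMA[group]
    sigma = sigma_p if position == "p" else sigma_m
    return PKA_BENZOIC_ACID - sigma


for group in ("NH2", "H", "NO2"):
    print(group, round(predicted_pka(group, "p"), 2))
```

Electron-withdrawing groups (positive σ) lower the predicted pKa, and electron-donating groups (negative σ) raise it.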
Sigma-rho plots
• One application of QSPR
• Activity = ρσ + constant
• Y = mx + b
• σ: descriptor
• ρ: slope
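Octanol-Water Partition Coefficients
• $P = \frac{C(\text{octanol})}{C(\text{water})}$
• log P is used, by analogy with $\Delta_r G = -RT \ln K_{eq}$
• Measures hydrophobic versus hydrophilic character
• The larger P is, the more hydrophobic the compound
[Figure: octanol phase above the H2O phase]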
QSAR and log P
Isonarcotic Activity of Esters, Alcohols, Ketones, and Ethers with Tadpoles
Compound             log(1/C)   log P
CH3OH                0.30       -1.27
C2H5OH               0.50       -0.75
CH3COCH3             0.65       -0.73
(CH3)2CHOH           0.90       -0.36
(CH3)3COH            0.90       0.07
CH3CH2CH2OH          1.00       -0.23
CH3COOCH3            1.10       -0.38
C2H5COCH3            1.10       -0.27
HCOOC2H5             1.20       -0.38
C2H5COC2H5           1.20       0.59
(CH3)2C(C2H5)OH      1.20       0.59
CH3(CH2)3OH          1.40       0.29
(CH3)2CHCH2OH        1.40       0.16
CH3COOC2H5           1.50       0.14
C2H5COC2H5           1.50       0.31
CH3(CH2)4OH          1.60       0.81
CH3CH2CH2COCH3       1.70       0.31
CH3COOCH2C2H5        2.00       0.66
C2H5COOC2H5          2.00       0.66
(CH3)2CHCOOC2H5      2.20       1.05
QSAR and log P
Isonarcotic Activity of Esters, Alcohols, Ketones, and Ethers with Tadpoles
[Figure: scatter plot of log(1/C) versus log P with the linear fit y = 0.7315x + 1.2211, R2 = 0.7767, R = 0.881, n = 20]
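The linear fit quoted for this plot can be recomputed from the 20 tabulated compounds with ordinary least squares. The sketch below uses only the standard library; the (log P, log(1/C)) pairs are copied from the table in row order, and the function name is an illustrative choice.

```python
# (log P, log(1/C)) pairs in the row order of the tadpole table
data = [
    (-1.27, 0.30), (-0.75, 0.50), (-0.73, 0.65), (-0.36, 0.90), (0.07, 0.90),
    (-0.23, 1.00), (-0.38, 1.10), (-0.27, 1.10), (-0.38, 1.20), (0.59, 1.20),
    (0.59, 1.20), (0.29, 1.40), (0.16, 1.40), (0.14, 1.50), (0.31, 1.50),
    (0.81, 1.60), (0.31, 1.70), (0.66, 2.00), (0.66, 2.00), (1.05, 2.20),
]


def linear_fit(points):
    """Ordinary least squares for y = slope * x + intercept; also returns R^2."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


slope, intercept, r_squared = linear_fit(data)
print(f"log(1/C) = {slope:.4f} * log P + {intercept:.4f}  (R^2 = {r_squared:.4f}, n = {len(data)})")
```

The slope, intercept and R^2 should agree with the values shown on the slide to the quoted precision.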
Isonarcotic Activity of Esters, Alcohols,
Ketones, and Ethers with Tadpoles
• log(1/C) = 0.869 log P + 1.242
n = 28, r = 0.965
• subset of alcohols:
log(1/C) = 1.49 log P - 0.10 (log P)^2 + 0.50
n = 10, r = 0.995
log P
hydrophobic
benzene 2.13
pentanol 0.81
n-propanol -0.23
isopropanol -0.36
ethanol -0.75
methanol -1.27
hydrophilic
butylamine 0.85
pyridine 0.64
diethylamine 0.45
imidazole -0.08
phenylalanine -1.38
tetraethylammonium iodide -2.82
alanine -2.85
Molecule Properties
SPC : Structure Property Correlation
MOLECULE
STRUCTURE
INTRINSIC PROPERTIES
Molar Volume
Connectivity Indices
Charge Distribution
Molecular Weight
Polar surface Area....
.......
CHEMICAL PROPERTIES
pKa
Log P
Solubility
Stability
BIOLOGICAL PROPERTIES
Activity
Toxicity
Biotransformation
Pharmacokinetics
Molecule Descriptors
o Molecular descriptors are numerical values that
characterize properties of molecules.
o The descriptors fall into four classes:
a) Topological
b) Geometrical
c) Electronic
d) Hybrid or 3D Descriptors
Classification of Descriptors
Topological Descriptors
Topological descriptors are derived directly from the connection table representation of the
structure which include:
a) Atom and Bond Counts
b) substructure counts
c) molecular connectivity indices (Wiener index, Randic index, Chi index)
d) Kappa Indices
e) path descriptors
f) distance-sum Connectivity
g) Molecular Symmetry
Descriptors
• Molar Volume, Vm
• Surface area
• Rotatable Bonds, Rotbonds, b_rotN
• Atomic Polarizability, Apol
– Ease of distortion of electron clouds
– sum of Van der Waals A coefficients
• Molecular Refractivity, MR
– size and polarizability
– local non-lipophilic interactions
Geometrical Descriptors
Geometrical descriptors are derived from the three-dimensional
representations and include:
a) principal moments of inertia,
b) molecular volume,
c) solvent-accessible surface area,
d) charged partial surface area,
e) Molecular Surface area
Electronic Descriptors
Electronic descriptors characterize molecular structures with quantities such as:
a) dipole moment,
b) quadrupole moment,
c) polarizability,
d) HOMO and LUMO energies,
e) dielectric energy,
f) molar refractivity
Hybrid and 3D Descriptors
a) geometric atom pairs and topological torsions
b) spatial autocorrelation vectors
c) WHIM indices
d) BCUTs
e) GETAWAY descriptors
f) Topomers
g) pharmacophore fingerprints
h) EVA descriptors
i) descriptors of molecular field
Limit of Descriptors
• The data set should contain at least 5 times as many compounds as there are descriptors in the QSAR.
• The reason is that too few compounds relative to the number of descriptors will give a falsely high correlation:
– 2 points exactly determine a line.
– 3 points exactly determine a plane (etc.)
• A data set of drug candidates that is similar in size to the number of descriptors gives a meaningless correlation.
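The "2 points exactly determine a line" argument can be demonstrated numerically. In this sketch, random and therefore unrelated x and y values are fitted with a one-descriptor linear model: with two compounds R^2 is 1 regardless of the data, while with 50 compounds it drops to a small value. The seed, sample sizes and function names are arbitrary choices for the illustration.

```python
import random


def r_squared(xs, ys):
    """R^2 of a one-descriptor least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)


def random_r_squared(n_compounds, rng):
    """Fit unrelated random 'descriptor' and 'activity' values."""
    xs = [rng.random() for _ in range(n_compounds)]
    ys = [rng.random() for _ in range(n_compounds)]
    return r_squared(xs, ys)


def min_compounds(n_descriptors, ratio=5):
    """Smallest data set size allowed by the 5:1 rule of thumb."""
    return ratio * n_descriptors


rng = random.Random(0)
r2_two = random_r_squared(2, rng)    # as many points as fitted parameters
r2_many = random_r_squared(50, rng)  # far more points than parameters
print(r2_two, r2_many, min_compounds(3))
```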
Atomic Polarizability, Apol
• Atomic Polarizability
– Ease of distortion of electron clouds
– sum of Van der Waals A coefficients
$E_{VdW,ij} = -\frac{A}{r_{ij}^{6}} + \frac{B}{r_{ij}^{12}}$
Molecular Refractivity, MR
• Molecular Refractivity, MR
– size and polarizability
– local non-lipophilic interactions
Lorentz-Lorenz equation:
$MR = \frac{n^{2} - 1}{n^{2} + 2} \cdot \frac{MW}{d}$
Group Additive Properties, GAPs
Substituent      Volume (SA)  MR    π              Rot Bonds
-H               1.48         0.10  0 (reference)  0
-CH3             18.78        0.57  0.56           0
-CH2CH3          35.35        1.03  1.02           1
-CH2CH2CH3       51.99        1.5   1.55           2
-CH(CH3)2        51.33        1.5   1.53           1
-CH2CH2CH2CH3    68.63        1.96  2.13           3
-C(CH3)3         86.99        1.96  1.98           1
-C6H5            72.20        2.54  1.96           1
-F               7.05         0.10  0.14           0
-Cl              15.85        0.60  0.71           0
QSAR and 3D-QSAR Software
Tripos – CoMFA
VolSurf
Catalyst
Serius
QSAR+
Schrodinger
DISCOVER
Freely Available Tools to Calculate Molecular Descriptors
• CDK tool
http://rguha.net/code/java/cdkdesc.html
• POWER MV
http://nisla05.niss.org/PowerMV/?q=PowerMV/
• MOLD2
http://www.fda.gov/ScienceResearch/BioinformaticsTools/Mold2/default.htm
• PADEL Descriptor
http://www.downv.com/Windows/install-PaDELDescriptor-10439915.htm
ADMET Descriptors to Screen Molecules
Bioavailability
The bioavailability of a compound depends on:
Absorption
Permeability
Lipophilicity
Hydrogen Bonding
Liver Metabolism
Gut-wall Metabolism
Solubility
Molecular Size/Shape
Transporters
Flexibility
PREDICTION OF
ADMET PROPERTIES
• Requirements for a drug:
– Must bind tightly to the biological target in vivo
– Must pass through one or more physiological barriers (cell
membrane or blood-brain barrier)
– Must remain long enough to take effect
– Must be removed from the body by metabolism, excretion,
or other means
• ADMET: Absorption, Distribution, Metabolism, Excretion (Elimination), Toxicity
Lipinski Rule of Five (Oral Drug Properties)
• Poor absorption or permeation is more likely when:
– MW > 500
– LogP > 5
– More than 5 H-bond donors (sum of OH and NH groups)
– More than 10 H-bond acceptors (sum of N and O atoms)
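The four criteria translate directly into a filter. The sketch below assumes the descriptor values (molecular weight, logP, donor and acceptor counts) have already been computed by some other tool; the function name and the two example property sets are made up for illustration.

```python
def lipinski_violations(mw, logp, h_donors, h_acceptors):
    """Return the list of Rule-of-Five criteria that a compound violates."""
    violations = []
    if mw > 500:
        violations.append("MW > 500")
    if logp > 5:
        violations.append("LogP > 5")
    if h_donors > 5:
        violations.append("more than 5 H-bond donors")
    if h_acceptors > 10:
        violations.append("more than 10 H-bond acceptors")
    return violations


# Hypothetical descriptor values for two compounds
small_polar = lipinski_violations(mw=350.4, logp=2.1, h_donors=2, h_acceptors=5)
large_greasy = lipinski_violations(mw=650.0, logp=6.2, h_donors=6, h_acceptors=12)
print(small_polar)
print(large_greasy)
```

In a virtual screening pipeline the returned list can be used either as a hard filter or as a count for ranking.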
Polar Surface Area
o Defined as the amount of molecular (van der Waals) surface arising from polar atoms (nitrogen and oxygen atoms together with their attached hydrogens).
o PSA seems to optimally encode those drug properties which play an important role in membrane penetration: molecular polarity, H-bonding features and also solubility.
o It provides excellent correlations with the transport properties of drugs (PSA is used in the prediction of oral absorption, brain penetration, intestinal absorption and Caco-2 permeability).
o It has also been used effectively to characterize drug-likeness during virtual screening and combinatorial library design.
o The calculation of PSA, however, is rather time-consuming because of the need to generate a reasonable 3D molecular geometry and to calculate the surface itself.
o Peter Ertl introduced an extremely rapid method to obtain the PSA descriptor simply from the sum of contributions of polar fragments in a molecule, without the need to generate its three-dimensional (3D) geometry.
PSA in Intestinal Absorption
• Intestinal absorption is usually expressed as fraction absorbed (FA), the percentage of the initial dose appearing in the portal vein.
• A PSA model was built for the β-adrenoreceptor antagonists [1]. An excellent sigmoidal relationship between PSA and FA after oral administration was obtained. Similar sigmoidal relationships can also be obtained for the topological PSA (TPSA).
• These results suggest that drugs with a PSA < 60 Å² are completely (more than 90%) absorbed, whereas drugs with a PSA > 140 Å² are absorbed to less than 10%. This conclusion was later confirmed by the correct classification of a set of endothelin receptor antagonists as having low, intermediate or high permeability.
• PSA was also shown to play an important role in explaining human in vivo jejunum permeability [2]. A model based on PSA and LogP for the prediction of drug absorption was developed for 199 well absorbed and 35 poorly absorbed compounds [3].
PSA in Blood-Brain Barrier (BBB) Penetration
• Drugs that act on the CNS need to be able to cross the BBB in order to reach their target, while minimal BBB penetration is required for other drugs to prevent CNS side effects.
• A common measure of BBB penetration is the ratio of drug concentrations in the brain and the blood, expressed as log(Cbrain/Cblood).
• Van de Waterbeemd and Kansy were probably the first to correlate the PSA of a series of CNS drugs with their membrane transport. They obtained a fair correlation of brain uptake with single-conformer PSA and molecular volume descriptors.
• Clark et al. derived a model for 55 compounds using TPSA and LogP:
LogBB = 0.516 - 0.115 * TPSA
n = 55, r² = 0.686, r = 0.828, σ = 0.42
TPSA in combination with ClogP:
LogBB = 0.070 - 0.014 * TPSA + 0.169 * ClogP
n = 55, r² = 0.787, r = 0.887, σ = 0.35
• The great majority of orally administered CNS drugs have a PSA < 70 Å². For non-CNS compounds, a PSA < 120 Å² was suggested.
• To conclude, the majority of non-CNS-penetrating, orally absorbed compounds have PSA values between 70 and 120 Å².
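The two-descriptor model quoted above is easy to evaluate once TPSA and ClogP are known. This sketch copies the coefficients from the slide as they stand; the function name and the input values are hypothetical and only show how the terms combine.

```python
def log_bb(tpsa, clogp):
    """LogBB from the TPSA + ClogP model, coefficients as given on the slide."""
    return 0.070 - 0.014 * tpsa + 0.169 * clogp


# Hypothetical compounds: (TPSA in square angstroms, ClogP)
lipophilic_low_psa = log_bb(tpsa=40.0, clogp=2.0)
polar_high_psa = log_bb(tpsa=120.0, clogp=0.5)
print(round(lipophilic_low_psa, 3), round(polar_high_psa, 3))
```

A larger TPSA lowers the predicted LogBB and a larger ClogP raises it, in line with the PSA limit for CNS drugs mentioned above.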
Partition coefficients
The partition coefficient P (usually expressed as log10 P or logP) is defined as:
$P = \frac{[X]_{octanol}}{[X]_{aqueous}}$
P is a measure of the relative affinity of a molecule for the lipid and aqueous phases in the absence of ionisation.
1-Octanol is the most frequently used lipid phase in pharmaceutical research. This is because:
• It has a polar and a non-polar region (like a membrane phospholipid)
• Po/w is fairly easy to measure
• Po/w often correlates well with many biological properties
• It can be predicted fairly accurately using computational models
Calculation of logP
LogP for a molecule can be calculated from a sum of fragmental or
atom-based terms plus various corrections.
logP = Σ fragments + Σ corrections
[Figure: structure of phenylbutazone with a chain branch marked]
clogP for Windows output: Phenylbutazone
C: 3.16 M: 3.16 PHENYLBUTAZONE
Class      | Type   | Log(P) Contribution Description   | Value
FRAGMENT   | # 1    | 3,5-pyrazolidinedione             | -3.240
ISOLATING  | CARBON | 5 Aliphatic isolating carbon(s)   | 0.975
ISOLATING  | CARBON | 12 Aromatic isolating carbon(s)   | 1.560
EXFRAGMENT | BRANCH | 1 chain and 0 cluster branch(es)  | -0.130
EXFRAGMENT | HYDROG | 20 H(s) on isolating carbons      | 4.540
EXFRAGMENT | BONDS  | 3 chain and 2 alicyclic (net)     | -0.540
RESULT     | 2.11   | All fragments measured            | clogP 3.165
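The additive scheme can be checked against the phenylbutazone output: summing the listed contributions should give the reported clogP. The values below are copied from the output table; the variable names are illustrative.

```python
# (description, contribution) pairs from the clogP output for phenylbutazone
contributions = [
    ("3,5-pyrazolidinedione fragment", -3.240),
    ("5 aliphatic isolating carbons", 0.975),
    ("12 aromatic isolating carbons", 1.560),
    ("1 chain branch", -0.130),
    ("20 H on isolating carbons", 4.540),
    ("3 chain and 2 alicyclic bonds (net)", -0.540),
]

# logP = sum of fragment terms + sum of correction terms
clogp = sum(value for _, value in contributions)
print(f"clogP = {clogp:.3f}")
```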
What else does logP affect?
• Binding to enzyme / receptor
• Aqueous solubility
• Binding to P450 metabolising enzymes
• Absorption through membrane
• Binding to blood / tissue proteins: less drug free to act
• Binding to hERG heart ion channel: cardiotoxicity risk
So log P needs to be optimised
ADMET Descriptors Calculation Tools
• PreADMET http://preadmet.bmdrc.org/
– Molecular Descriptors Calculation - 1081 diverse molecular descriptors
– Drug-Likeness Prediction - Lipinski rule, lead-like rule, Drug DB like rule
– ADME Prediction - Caco-2, MDCK, BBB, HIA, plasma protein binding and skin permeability data
– Toxicity Prediction - Ames test and rodent carcinogenicity assay
• SPARC Online Calculator http://ibmlc2.chem.uga.edu/sparc/
On-line calculator for prediction of pKa, solubility, polarizability, and other properties; search in the database of experimental pKa values is also available
• Daylight Chemical Information Systems www.daylight.com/daycgi/clogp
Calculation of log P by the CLOGP algorithm from BioByte; also access to the LOGPSTAR database of experimental log P data.
ADMET Tools Continued
• Molinspiration Cheminformatics www.molinspiration.com/seruices/index.
Calculation of molecular properties relevant to drug design and QSAR, including log P, polar surface area, Rule of Five parameters, and drug-likeness index
• Pirika - www.pirika.com
Calculation of various types of molecular properties, including boiling point, vapor pressure, and solubility; web demo restricted to aliphatic molecules
• Actelion - www.actelion.com/page/property_explorer
Calculation of molecular weight, logP, solubility, drug-score and toxicity risk.
• Virtual Computational Chemistry Laboratory www.vcclab.org
Prediction of log P and water solubility based on associative neural networks as well as other parameters; comparison of various prediction methods
b) QSAR: The goal of QSAR studies is to predict the activity of new compounds based solely on their chemical structure. The underlying assumption is that the biological activity can be attributed to incremental contributions of the molecular fragments that determine it. This assumption is called the linear free energy principle. Information about the strength of interactions is captured for each compound by, for example, steric, electronic, and hydrophobic descriptors.
Molecular similarity and searching Molecules
What is it?
Chemical, pharmacological or biological properties of two compounds match.
The more the common features, the higher the similarity between two
molecules.
Chemical
The two structures on top are chemically similar to each other. This is reflected in their
common sub-graph, or scaffold: they share 14 atoms
Pharmacophore
The two structures above are less similar chemically (topologically) yet have the same
pharmacological activity, namely they both are Angiotensin-Converting Enzyme (ACE)
inhibitors
Molecular similarity
How to calculate it?
Quantitative assessment of similarity/dissimilarity of structures
• need a numerically tractable form
• molecular descriptors, fingerprints, structural keys
Sequences/vectors of bits, or numeric values that can be compared by distance functions and similarity metrics.
E = Euclidean distance
T = Tanimoto index
$E(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
$T(x, y) = \frac{B(x \,\&\, y)}{B(x) + B(y) - B(x \,\&\, y)}$
where B counts the bits that are set.
Molecular descriptors
a) chemical fingerprint
hashed binary fingerprint
o encodes topological properties of the chemical graph: connectivity,
edge label (bond type), node label (atom type)
o allows the comparison of two molecules with respect to their
chemical structure
Construction
1. find all 0, 1, …, n step walks in the chemical graph
2. generate a bit array for each walk with a given number of bits set
3. merge the bit arrays with the logical OR operation
Molecular descriptors
Example 1: chemical fingerprint
Example: CH3 – CH2 – OH, walks from the first carbon atom
length  walk      bit array
0       C         1010000000
1       C–H       0001010000
1       C–C       0001000100
2       C–C–H     0001000010
2       C–C–O     0100010000
3       C–C–O–H   0000011000
merge bit arrays for the first carbon atom: 1111011110
This example illustrates how a 10-bit topological chemical fingerprint is created for a simple chain structure. In this example all walks up to 3 steps are considered, and 2 bits are set for each pattern.
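The merge step of this example can be reproduced with a bitwise OR. The sketch below takes the per-walk bit arrays exactly as listed for the first carbon atom; how a walk is hashed to its bits is not shown on the slide, so the arrays are treated as given inputs.

```python
# Bit arrays for the walks starting at the first carbon atom of CH3-CH2-OH
walk_bits = {
    "C": "1010000000",
    "C-H": "0001010000",
    "C-C": "0001000100",
    "C-C-H": "0001000010",
    "C-C-O": "0100010000",
    "C-C-O-H": "0000011000",
}


def merge(bit_arrays, width=10):
    """OR together equal-width bit strings and return the merged bit string."""
    merged = 0
    for bits in bit_arrays:
        merged |= int(bits, 2)
    return format(merged, f"0{width}b")


fingerprint = merge(walk_bits.values())
print(fingerprint)
```

Because several walks can set the same bit, a set bit in the merged fingerprint does not identify a unique pattern.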
Molecular Similarity
Example 1: chemical fingerprint
0100010100010100010000000001101010011010100000010100000000100000
0100010100010100010000000001101010011010100000000100000000100000
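The Tanimoto index and Euclidean distance defined earlier can be applied to these two fingerprints directly. This is a minimal sketch that treats each fingerprint as a bit string; for binary vectors the Euclidean distance reduces to the square root of the number of differing bits. The function names are illustrative.

```python
from math import sqrt

fp_a = "0100010100010100010000000001101010011010100000010100000000100000"
fp_b = "0100010100010100010000000001101010011010100000000100000000100000"


def popcount(value):
    """Number of set bits in a non-negative integer."""
    return bin(value).count("1")


def tanimoto(x, y):
    """T = B(x & y) / (B(x) + B(y) - B(x & y)) for two bit strings."""
    a, b = int(x, 2), int(y, 2)
    common = popcount(a & b)
    return common / (popcount(a) + popcount(b) - common)


def euclidean(x, y):
    """Euclidean distance between two bit strings viewed as 0/1 vectors."""
    return sqrt(popcount(int(x, 2) ^ int(y, 2)))


print(tanimoto(fp_a, fp_b), euclidean(fp_a, fp_b))
```

The two fingerprints differ in very few positions, so the Tanimoto index comes out close to 1 and the Euclidean distance is small.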
Molecular descriptors
Example 2: pharmacophore fingerprint
• encodes pharmacophore properties of molecules as frequency counts of pharmacophore point pairs at a given topological distance
• allows the comparison of two molecules with respect to their pharmacophore
Construction
1. map pharmacophore point types to atoms
2. calculate the length of the shortest path between each pair of atoms
3. assign a histogram to every pharmacophore point pair and count the frequency of the pair with respect to its distance
Molecular descriptors
Example 2: pharmacophore fingerprint
Pharmacophore point type based coloring of atoms: acceptor, donor, hydrophobic, none.
[Figure: two histograms of pair counts (0-12) for the pharmacophore point pairs AA, DA, DD, HA, HD and HH at topological distances 1-6]
Virtual screening using fingerprints
Individual query structure
0101010100010100010100100000000000010010000010010100100100010000
query fingerprint
query
proximity
0000000100001101000000101010000000000110000010000100001000001000
0100010110010010010110011010011100111101000000110000000110001000
0100010100011101010000110000101000010011000010100000000100100000
0001101110011101111110100000100010000110110110000000100110100000
0100010100110100010000000010000000010010000000100100001000101000
0100011100011101000100001011101100110110010010001101001100001000
0101110100110101010111111000010000011111100010000100001000101000
0100010100111101010000100010000000010010000010100100001000101000
0001000100010100010100100000000000001010000010000100000100000000
0100010100010011000000000000000000010100000010000000000000000000
0100010100010100000000000000101000010010000000000100000000000000
0101010101111100111110100000000000011010100011100100001100101000
0100010100011000010000011000000000010001000000110000000001100000
0000000100000000010000100000000000001010100000000100000100100000
0100010100010100000000100000000000010000000000000100001000011000
0001000100001100010010100000010100101011100010000100001000101000
0100011100010100010000100001001110010010000010001100000000101000
0101010100010100010100100000000000010010000010010100100100010000
targets
target fingerprints
hits
Hypothesis Fingerprints
Advantages | Disadvantages
• strict conditions for hits if actives are fairly similar | • false results with asymmetric metrics • misses common features of highly diverse sets • very sensitive to one missing feature
• captures common features of more diverse active sets | • less selective if actives are very similar
• captures common features of more diverse active sets • specific treatment of the absence of a feature • less sensitive to outliers | • less selective if actives are very similar
SUMMARY
• Virtual screening methods are central to many
cheminformatics problems in:
– Design
– Selection
– Analysis
• Increasing numbers of molecules can be evaluated
using these techniques
• Reliability and accuracy remain as problems in
docking and predicting ADMET properties
• Need much more reliable and consistent
experimental data