Bioinformatics
for PřfUK 2003 (Faculty of Science, Charles University)
Jiří Vondrášek
Institute of Organic Chemistry and Biochemistry
[email protected]
Jan Pačes
Institute of Molecular Genetics
[email protected]
http://bio.img.cas.cz/PrfUK2003
Databases: contents
principles
SQL
biological sequence formats
IUB codes
DNA databases
protein and genome databases
structure databases
database organization
Relational database
The tables are linked through the shared c_id column.

article
  c_id      identifier, number
  title     text
  journal   short text
  year      date
  …         …

author
  a_id      identifier
  c_id      identifier
  name      short text

keyword
  k_id      identifier
  c_id      identifier
  keyword   short text
SQL: Structured Query Language
  c_id      identifier, number
  title     text
  journal   short text
  year      date
  …         …

CREATE TABLE article (
  c_id    INTEGER,
  title   TEXT,
  journal VARCHAR(30),
  year    DATE
);
SQL: Structured Query Language
  a_id      identifier
  c_id      identifier
  name      short text

CREATE TABLE author (
  a_id INTEGER,
  c_id INTEGER,
  name VARCHAR(30)
);
SQL: Structured Query Language
(INSERT ... SET is MySQL syntax; standard SQL uses INSERT ... VALUES.)

INSERT INTO article SET
  c_id    = '1',
  title   = 'Something absolutely fantastic',
  journal = 'Bioinformatics',
  year    = '2002';

INSERT INTO author SET
  a_id = '1',
  c_id = '1',
  name = 'Paces, Jan';

INSERT INTO author SET
  a_id = '2',
  c_id = '1',
  name = 'Vondrasek, Jiri';
SQL: Structured Query Language
SELECT article.title, article.journal, author.name
  FROM article, author
  WHERE article.c_id = author.c_id AND
        article.year > '2000' AND
        author.name LIKE 'Paces%';
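The two tables and the join query can be tried out with a minimal Python sketch that uses the standard-library sqlite3 module. Several things here are assumptions made for illustration and are not on the slides: SQLite stands in for the database server, the rows are inserted with the standard INSERT ... VALUES form (SQLite does not accept INSERT ... SET), the year is stored as an INTEGER rather than a DATE, and the helper name articles_by is hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here only as a convenient stand-in.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Assumption: year is stored as an INTEGER so that "year > 2000" is unambiguous.
cur.execute("CREATE TABLE article (c_id INTEGER, title TEXT, "
            "journal VARCHAR(30), year INTEGER)")
cur.execute("CREATE TABLE author (a_id INTEGER, c_id INTEGER, name VARCHAR(30))")

# Standard INSERT ... VALUES with bound parameters.
cur.execute("INSERT INTO article VALUES (?, ?, ?, ?)",
            (1, "Something absolutely fantastic", "Bioinformatics", 2002))
cur.executemany("INSERT INTO author VALUES (?, ?, ?)",
                [(1, 1, "Paces, Jan"), (2, 1, "Vondrasek, Jiri")])


def articles_by(prefix, after_year):
    """Return (title, journal, author) rows joined on the shared c_id column."""
    cur.execute(
        "SELECT article.title, article.journal, author.name "
        "FROM article JOIN author ON article.c_id = author.c_id "
        "WHERE article.year > ? AND author.name LIKE ?",
        (after_year, prefix + "%"))
    return cur.fetchall()


rows = articles_by("Paces", 2000)
for row in rows:
    print(row)
```

The article has two authors, but only the author row whose name starts with "Paces" passes the LIKE filter, so the join yields a single row.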
IUB codes

code   nucleotides   complement
A      A             T
C      C             G
G      G             C
T      T             A
(U     U)            A
M      AC            K
R      AG            Y
W      AT            W
S      CG            S
Y      CT            R
K      GT            M
V      ACG           B
H      ACT           D
D      AGT           H
B      CGT           V
N      ACGT          N
-      gap           -
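The code table can be applied directly to compute a reverse complement. The following is a minimal Python sketch; the names COMPLEMENT and reverse_complement are chosen here for illustration. It relies on the fact that W (A or T) and S (C or G) are each their own complement, while the other ambiguity codes pair up as M/K, R/Y, V/B and H/D.

```python
# Complement of every IUB nucleotide code; the gap character maps to itself.
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A", "U": "A",
    "M": "K", "K": "M", "R": "Y", "Y": "R", "W": "W", "S": "S",
    "V": "B", "B": "V", "H": "D", "D": "H", "N": "N", "-": "-",
}


def reverse_complement(seq):
    """Reverse the sequence and complement each code (KeyError on unknown symbols)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


print(reverse_complement("ATGC"))
print(reverse_complement("aartn"))
```

Input is upper-cased first, so lower-case sequences such as those in EMBL or GenBank records are accepted; the result is always upper case and reads as DNA (U is complemented to A).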
amino acids

code   three-letter code   amino acid
A      Ala                 alanine
C      Cys                 cysteine
D      Asp                 aspartic acid
E      Glu                 glutamic acid
F      Phe                 phenylalanine
G      Gly                 glycine
H      His                 histidine
I      Ile                 isoleucine
K      Lys                 lysine
L      Leu                 leucine
M      Met                 methionine
N      Asn                 asparagine
P      Pro                 proline
Q      Gln                 glutamine
R      Arg                 arginine
S      Ser                 serine
T      Thr                 threonine
V      Val                 valine
W      Trp                 tryptophan
Y      Tyr                 tyrosine
B      Asx                 aspartic acid or asparagine
Z      Glx                 glutamic acid or glutamine
X      Xxx                 any amino acid
*      ---                 stop
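Converting between the two notations is a plain table lookup. The sketch below, in Python, turns a whitespace-separated run of three-letter codes (as found in the SEQRES lines of a PDB file) into a one-letter string; THREE_TO_ONE and to_one_letter are illustrative names, not part of any library.

```python
# Three-letter to one-letter amino acid codes, including the ambiguity codes.
THREE_TO_ONE = {
    "Ala": "A", "Cys": "C", "Asp": "D", "Glu": "E", "Phe": "F",
    "Gly": "G", "His": "H", "Ile": "I", "Lys": "K", "Leu": "L",
    "Met": "M", "Asn": "N", "Pro": "P", "Gln": "Q", "Arg": "R",
    "Ser": "S", "Thr": "T", "Val": "V", "Trp": "W", "Tyr": "Y",
    "Asx": "B", "Glx": "Z", "Xxx": "X",
}


def to_one_letter(residues):
    """Convert e.g. 'SER HIS SER' or 'Ser His Ser' to 'SHS'."""
    return "".join(THREE_TO_ONE[name.capitalize()] for name in residues.split())


print(to_one_letter("SER HIS SER GLY VAL ASN GLN LEU GLY GLY VAL PHE"))
```

The capitalize() call normalizes the upper-case residue names used in PDB files to the mixed-case keys of the table.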
sequence formats
binary
  with chromatograms
    SCF
    ALF
    ABI
  for programs
    internal database formats
text
  minimal
    text
    fasta
  annotated
    EMBL
    GenBank
    ASN
    XML
sequence formats - SCF
SCF (Standard Chromatogram Format)
sequence formats - EMBL
EMBL (format of the EMBL database)
ID   AF031150   standard; RNA; ROD; 1379 BP.
XX
AC   AF031150;
XX
SV   AF031150.1
XX
DT   27-FEB-1998 (Rel. 54, Created)
DT   27-FEB-1998 (Rel. 54, Last updated, Version 1)
XX
DE   Mus musculus paired-box transcription factor (Pax4) mRNA, complete cds.
XX
KW   .
XX
OS   Mus musculus (house mouse)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
XX
RN   [1]
RP   1-1379
RA   Inoue H., Nomiyama J., Nakai K., Matsutani A., Tanizawa Y., Oka Y.;
RT   Isolation of full-length cDNA of mouse PAX4 gene and identification of its
RT   human homologue;
RL   Biochem. Biophys. Res. Commun. 243:628-633(1998).
XX
RN   [2]
RP   1-1379
RA   Inoue H., Nomiyama J., Nakai K., Tanizawa Y., Oka Y.;
RT   ;
RL   Submitted (23-OCT-1997) to the EMBL/GenBank/DDBJ databases.
RL   Third Dept. of Int. Med., Yamaguchi University, 1144 Kogushi, Ube,
RL   Yamaguchi 755, Japan
XX
FH   Key             Location/Qualifiers
…
sequence formats - EMBL
EMBL (format of the EMBL database)
…
FH   Key             Location/Qualifiers
FH
FT   source          1..1379
FT                   /db_xref=taxon:10090
FT                   /organism=Mus musculus
FT                   /cell_line=MIN6
FT   CDS             297..1346
FT                   /codon_start=1
FT                   /gene=Pax4
FT                   /product=paired-box transcription factor
FT                   /protein_id=AAC40046.1
FT                   /translation=MQQDGLSSVNQLGGLFVNGRPLPLDTRQQIVQLAIRGMRPCDISR
FT                   SLKVSNGCVSKILGRYYRTGVLEPKCIGGSKPRLATPAVVARIAQLKDEYPALFAWEIQ
FT                   HQLCTEGLCTQDKAPSVSSINRVLRALQEDQSLHWTQLRSPAVLAPVLPSPHSNCGAPR
FT                   GPHPGTSHRNRTIFSPGQAEALEKEFQRGQYPDSVARGKLAAATSLPEDTVRVWFSNRR
FT                   AKWRRQEKLKWEAQLPGASQDLTVPKNSPGIISAQQSPGSVPSAALPVLEPLSPSFCQL
FT                   CCGTAPGRCSSDTSSQAYLQPYWDCQSLLPVASSSYVEFAWPCLTTHPVHHLIGGPGQV
FT                   PSTHCSNWP
XX
SQ   Sequence 1379 BP; 327 A; 402 C; 347 G; 303 T; 0 other;
     aaaaaaaaaa aaaaagcggc cgctgaattc tagcagaagg ctgccctctg ctcctgagtg        60
     aaggctctgt gaagctctgg accccctggc aggactgaag cagctggagg ctgttacaag       120
     accagaccac cagcaaaccc tggagcctgc acaggaccct gagacctctt cctggaattc       180
     ccaccttttt tcctccatcc agaaccagtc ccaaagagaa acttccagaa ggagctctcc       240
     gttttcagtt tgccagttgg cttcctgtcc ttctgtgagg agtaccagtg tgaagcatgc       300
     agcaggacgg actcagcagt gtgaatcagc tagggggact ctttgtgaat ggccggcccc       360
…
     gctgtgggac agcaccaggc agatgttcca gtgacacctc atcccaggcc tatctccaac      1200
     cctactggga ctgccaatcc ctccttcctg tggcttcctc ctcatatgtg gaatttgcct      1260
     ggccctgcct caccacccat cctgtgcatc atctgattgg aggcccagga caagtgccat      1320
     caacccattg ctcaaactgg ccataagagg cctctatttg acagtaataa aaacctttt       1379
//
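Because every EMBL line starts with a two-letter line code, a record can be split into fields with very little code. The Python sketch below is a simplification for illustration, not a complete EMBL parser: parse_embl is a hypothetical helper, the sample text is a shortened excerpt of entry AF031150, and feature-table and sequence lines are not interpreted. As a sanity check, it also sums the base counts given on the SQ line.

```python
import re


def parse_embl(text):
    """Group line contents by their two-letter line code (ID, AC, DT, ...)."""
    fields = {}
    for line in text.splitlines():
        code, value = line[:2], line[2:].strip()
        # XX is a spacer, // ends the entry, blank codes are sequence lines.
        if code in ("XX", "//") or not code.strip():
            continue
        fields.setdefault(code, []).append(value)
    return fields


# Shortened excerpt of the entry, for illustration only.
example = """ID   AF031150   standard; RNA; ROD; 1379 BP.
XX
AC   AF031150;
XX
DT   27-FEB-1998 (Rel. 54, Created)
DT   27-FEB-1998 (Rel. 54, Last updated, Version 1)
XX
SQ   Sequence 1379 BP; 327 A; 402 C; 347 G; 303 T; 0 other;
//
"""

entry = parse_embl(example)
# Per-base counts from the SQ line; they should add up to the stated length.
counts = [int(n) for n in re.findall(r"(\d+) [ACGT];", entry["SQ"][0])]
print(entry["AC"], len(entry["DT"]), sum(counts))
```

Repeated line codes such as DT are collected into a list in their original order, which keeps multi-line fields (DE, OC, RT, RL) intact for later joining.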
sequence formats - GenBank
GenBank
LOCUS       AF145233     1360 bp    mRNA            ROD       23-OCT-1999
DEFINITION  Mus musculus transcription factor PAX4 (Pax4) mRNA, complete cds.
ACCESSION   AF145233
VERSION     AF145233.1  GI:6102607
KEYWORDS    .
SOURCE      house mouse.
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 1360)
  AUTHORS   Kalousova,A., Benes,V., Paces,J., Paces,V. and Kozmik,Z.
  TITLE     DNA binding and transactivating properties of the paired and
            homeobox protein Pax4
  JOURNAL   Biochem. Biophys. Res. Commun. 259 (3), 510-518 (1999)
  MEDLINE   99294619
   PUBMED   10364449
REFERENCE   2  (bases 1 to 1360)
  AUTHORS   Kalousova,A., Paces,J. and Kozmik,Z.
  TITLE     Direct Submission
  JOURNAL   Submitted (23-APR-1999) Dept. of Transcription Regulation,
            Institute of Molecular Genetics, Videnska 1083, Prague 142 20,
            Czech Republic
FEATURES             Location/Qualifiers
     source          1..1360
                     /organism="Mus musculus"
                     /db_xref="taxon:10090"
     gene            1..1360
                     /gene="Pax4"
     CDS             211..1260
                     /gene="Pax4"
                     /note="DNA binding protein; paired box protein; homeobox
                     protein"
                     /codon_start=1
                     /product="transcription factor PAX4"
                     /protein_id="AAF03533.1"
…
sequence formats - GenBank
GenBank
     CDS             211..1260
                     /gene="Pax4"
                     /note="DNA binding protein; paired box protein; homeobox
                     protein"
                     /codon_start=1
                     /product="transcription factor PAX4"
                     /protein_id="AAF03533.1"
                     /db_xref="GI:6102608"
                     /translation="MQQDGLSSVNQLGGLFVNGRPLPLDTRQQIVQLAIRGMRPCDIS
                     RSLKVSNGCVSKILGRYYRTGVLEPKCIGGSKPRLATPAVVARIAQLKDEYPALFAWE
                     IQHQLCTEGLCTQDKAPSVSSINRVLRALQEDQSLHWTQLRSPAVLAPVLPSPHSNCG
                     APRGPHPGTSHRNRTIFSPGQAEALEKEFQRGQYPDSVARGKLAAATSLPEDTVRVWF
                     SNRRAKWRRQEKLKWEAQLPGASQDLTVPKNSPGIISAQQSPGSVPSAALPVLEPLSP
                     SFCQLCCGTAPGRCSSDTSSQAYLQPYWDCQSLLPVASSSYVEFAWPCLTTHPVHHLI
                     GGPGQVPSTHCSNWP"
BASE COUNT      359 a    381 c    328 g    292 t
ORIGIN
        1 tggcaggact gaagcagctg gaggctgtta caagaccaga ccaccagcaa accctggagc
       61 ctgcacagga ccctgagacc tcttcctgga attcccacct tttttcctcc atccagaacc
      121 agtcccaaag agaaacttcc agaaggagct ctccgttttc agtttgccag ttggcttcct
      181 gtccttctgt gaggagtacc agtgtgaagc atgcagcagg acggactcag cagtgtgaat
…
     1081 tccagtgaca cctcatccca ggcctatctc caaccctact gggactgcca atccctcctt
     1141 cctgtggctt cctcctcata tgtggaattt gcctggccct gcctcaccac ccatcctgtg
     1201 catcatctga ttggaggccc aggacaagtg ccatcaaccc attgctcaaa ctggccataa
     1261 gaggcctcta tttgacagta ataaaaacct tttcttagat gttaaaaaaa aaaaaaaaaa
     1321 aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa
//
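Feature locations such as 211..1260 are 1-based and inclusive, which is easy to get wrong when slicing a string. The Python sketch below shows one way to rebuild the sequence from ORIGIN lines and cut out a feature. The helper names are hypothetical, and only the simple start..end form is handled (no complement(), join() or partial locations).

```python
def join_origin(lines):
    """Drop the leading position number of each ORIGIN line and join the blocks."""
    return "".join("".join(line.split()[1:]) for line in lines)


def parse_location(location):
    """Parse a simple 'start..end' location into 1-based inclusive integers."""
    start, end = location.split("..")
    return int(start), int(end)


def extract(sequence, location):
    """Return the subsequence covered by a 1-based inclusive location."""
    start, end = parse_location(location)
    return sequence[start - 1:end]


# First two ORIGIN lines of AF145233 (positions 1 to 120).
origin = [
    "        1 tggcaggact gaagcagctg gaggctgtta caagaccaga ccaccagcaa accctggagc",
    "       61 ctgcacagga ccctgagacc tcttcctgga attcccacct tttttcctcc atccagaacc",
]
sequence = join_origin(origin)
start, end = parse_location("211..1260")
cds_length = end - start + 1
print(len(sequence), extract(sequence, "1..10"), cds_length, cds_length // 3)
```

For the CDS of this record the length is 1260 - 211 + 1 = 1050 bases, that is 350 codons, which is consistent with the 349 amino acids of the /translation qualifier plus the stop codon.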
sequence formats - FASTA
FASTA
>gi|6102607|gb|AF145233.1|AF145233 Mus musculus transcription factor PAX4 (Pax4) mRNA, complete cds
TGGCAGGACTGAAGCAGCTGGAGGCTGTTACAAGACCAGACCACCAGCAAACCCTGGAGCCTGCACAGGA
CCCTGAGACCTCTTCCTGGAATTCCCACCTTTTTTCCTCCATCCAGAACCAGTCCCAAAGAGAAACTTCC
AGAAGGAGCTCTCCGTTTTCAGTTTGCCAGTTGGCTTCCTGTCCTTCTGTGAGGAGTACCAGTGTGAAGC
ATGCAGCAGGACGGACTCAGCAGTGTGAATCAGCTAGGGGGACTCTTTGTGAATGGCCGGCCCCTTCCTC
TGGACACCAGGCAGCAGATTGTGCAGCTAGCAATAAGAGGGATGCGACCCTGTGACATTTCACGGAGCCT
TAAGGTATCTAATGGCTGTGTGAGCAAGATCCTAGGACGCTACTACCGCACAGGTGTCTTGGAACCCAAG
TGTATTGGGGGAAGCAAACCACGTCTGGCCACACCTGCTGTGGTGGCTCGAATTGCCCAGCTAAAGGATG
AGTACCCTGCTCTTTTTGCCTGGGAGATCCAACACCAGCTTTGCACTGAAGGGCTTTGTACCCAGGACAA
GGCTCCCAGTGTGTCCTCTATCAATCGAGTACTTCGGGCACTTCAGGAAGACCAGAGCTTGCACTGGACT
CAACTCAGATCACCAGCTGTGTTGGCTCCAGTTCTTCCCAGTCCCCACAGTAACTGTGGGGCTCCCCGAG
GCCCCCACCCAGGAACCAGCCACAGGAATCGGACTATCTTCTCCCCGGGACAAGCCGAGGCACTGGAGAA
AGAGTTTCAGCGTGGGCAGTATCCAGATTCAGTGGCCCGTGGGAAGCTGGCTGCTGCCACCTCTCTGCCT
GAAGACACGGTGAGGGTTTGGTTTTCTAACAGAAGAGCCAAATGGCGCAGGCAAGAGAAGCTGAAATGGG
AAGCACAGCTGCCAGGTGCTTCCCAGGACCTGACAGTACCAAAAAATTCTCCAGGGATCATCTCTGCACA
GCAGTCCCCCGGCAGTGTACCCTCAGCTGCCTTGCCTGTGCTGGAACCATTGAGTCCTTCCTTCTGTCAG
CTATGCTGTGGGACAGCACCAGGCAGATGTTCCAGTGACACCTCATCCCAGGCCTATCTCCAACCCTACT
GGGACTGCCAATCCCTCCTTCCTGTGGCTTCCTCCTCATATGTGGAATTTGCCTGGCCCTGCCTCACCAC
CCATCCTGTGCATCATCTGATTGGAGGCCCAGGACAAGTGCCATCAACCCATTGCTCAAACTGGCCATAA
GAGGCCTCTATTTGACAGTAATAAAAACCTTTTCTTAGATGTTAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
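FASTA is simple enough to parse by hand: a line starting with ">" opens a record, and all following lines up to the next ">" are sequence. The Python sketch below assumes this plain layout; parse_fasta is an illustrative name, and the sample sequence is deliberately truncated to keep the example short.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Truncated excerpt of the record, for illustration only.
example = """>gi|6102607|gb|AF145233.1|AF145233 Mus musculus transcription factor PAX4 (Pax4) mRNA, complete cds
TGGCAGGACTGAAGCAGCTGGAGG
CTGTTACAAG
"""

records = parse_fasta(example)
header, sequence = records[0]
# The NCBI-style header is a "|"-separated list; field 3 is accession.version.
print(header.split("|")[3], len(sequence))
```

Sequence lines are concatenated without the line breaks, so the line width used in the file does not matter.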
sequence formats - ASN.1
ASN.1
Seq-entry ::= set {
  class nuc-prot ,
  descr {
    title "Mus musculus transcription factor PAX4 (Pax4) mRNA, complete cds." ,
    source {
      org {
        taxname "Mus musculus" ,
        common "house mouse" ,
        db {
          {
            db "taxon" ,
            tag
              id 10090 } } ,
        orgname {
          name
            binomial {
              genus "Mus" ,
              species "musculus" } ,
          lineage "Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
Euteleostomi; Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae;
Mus" ,
          gcode 1 ,
          mgcode 2 ,
          div "ROD" } } } ,
    pub {
      pub {
        sub {
          authors {
            names
              std
Bioinformatic Links
GenBank
Swiss-Prot
Entrez
• Literature (PubMed)
• Nucleotide (GenBank)
• Protein (PIR)
• Genome
• Structure (PDB)
• PopSet
• Taxonomy
• OMIM
SRS
SRS - list
PDB
HEADER    GENE REGULATION/DNA                     22-APR-99   6PAX
TITLE     CRYSTAL STRUCTURE OF THE HUMAN PAX-6 PAIRED DOMAIN-DNA
TITLE    2 COMPLEX REVEALS A GENERAL MODEL FOR PAX PROTEIN-DNA
TITLE    3 INTERACTIONS
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: HOMEOBOX PROTEIN PAX-6;
COMPND   3 CHAIN: A;
COMPND   4 ENGINEERED: YES;
COMPND   5 BIOLOGICAL_UNIT: MONOMER;
COMPND   6 MOL_ID: 2;
COMPND   7 MOLECULE: 26 NUCLEOTIDE DNA;
COMPND   8 CHAIN: B;
COMPND   9 ENGINEERED: YES;
COMPND  10 BIOLOGICAL_UNIT: MONOMER;
COMPND  11 MOL_ID: 3;
COMPND  12 MOLECULE: 26 NUCLEOTIDE DNA;
COMPND  13 CHAIN: C;
COMPND  14 ENGINEERED: YES;
COMPND  15 BIOLOGICAL_UNIT: MONOMER
SOURCE    MOL_ID: 1;
PDB
SEQRES   1 A  133  SER HIS SER GLY VAL ASN GLN LEU GLY GLY VAL PHE VAL
SEQRES   2 A  133  ASN GLY ARG PRO LEU PRO ASP SER THR ARG GLN ARG ILE
SEQRES   3 A  133  VAL GLU LEU ALA HIS SER GLY ALA ARG PRO CYS ASP ILE
SEQRES   4 A  133  SER ARG ILE LEU GLN VAL SER ASN GLY CYS VAL SER LYS
SEQRES   5 A  133  ILE LEU GLY ARG TYR TYR ALA THR GLY SER ILE ARG PRO
SEQRES   6 A  133  ARG ALA ILE GLY GLY SER LYS PRO ARG VAL ALA THR PRO
SEQRES   7 A  133  GLU VAL VAL SER LYS ILE ALA GLN TYR LYS GLN GLU CYS
SEQRES   8 A  133  PRO SER ILE PHE ALA TRP GLU ILE ARG ASP ARG LEU LEU
SEQRES   9 A  133  SER GLU GLY VAL CYS THR ASN ASP ASN ILE PRO SER VAL
SEQRES  10 A  133  SER SER ILE ASN ARG VAL LEU ARG ASN LEU ALA SER GLU
SEQRES  11 A  133  LYS GLN GLN
SEQRES   1 B   26    A   A   G   C   A   T   T   T   T   C   A   C   G
SEQRES   2 B   26    C   A   T   G   A   G   T   G   C   A   C   A   G
SEQRES   1 C   26    T   T   C   T   G   T   G   C   A   C   T   C   A
SEQRES   2 C   26    T   G   C   G   T   G   A   A   A   A   T   G   C
FORMUL   4  HOH   *84(H2 O1)
HELIX    1   1 ASP A   20  HIS A   31  1   12
HELIX    2   2 PRO A   36  LEU A   43  1    8
HELIX    3   3 ASN A   47  THR A   60  1   14
HELIX    4   4 PRO A   78  GLU A   90  1   13
HELIX    5   5 ALA A   96  SER A  105  1   10
HELIX    6   6 VAL A  117  GLU A  130  1   14
SHEET    1   A 2 SER A   3  VAL A   5  0
SHEET    2   A 2 VAL A  11  VAL A  13 -1  N  PHE A  12   O  GLY A   4
CRYST1   33.840   61.686  171.111  90.00  90.00  90.00 P 21 21 21    4
ORIGX1      1.000000  0.000000  0.000000        0.00000
ORIGX2      0.000000  1.000000  0.000000        0.00000
ORIGX3      0.000000  0.000000  1.000000        0.00000
SCALE1      0.029551  0.000000  0.000000        0.00000
SCALE2      0.000000  0.016211  0.000000        0.00000
SCALE3      0.000000  0.000000  0.005844        0.00000
ATOM      1  N   SER A   1      -1.985 -12.356  81.201  1.00 60.11           N
ATOM      2  CA  SER A   1      -1.709 -12.440  82.636  1.00 60.41           C
ATOM      3  C   SER A   1      -2.774 -13.282  83.373  1.00 59.35           C
ATOM      4  O   SER A   1      -3.734 -13.763  82.751  1.00 58.16           O
ATOM      5  CB  SER A   1      -1.638 -11.029  83.229  1.00 64.08           C
ATOM      6  OG  SER A   1      -2.862 -10.345  83.045  1.00 69.46           O
ATOM      7  H   SER A   1      -2.431 -11.538  80.917  1.00 40.00           H
ATOM      8  HG  SER A   1      -2.887  -9.549  83.596  1.00 40.00           H
ATOM      9  N   HIS A   2      -2.634 -13.393  84.701  1.00 59.45           N
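The ATOM records carry the Cartesian coordinates (in ångströms) that most structural calculations start from. The Python sketch below reads two of the records above and computes the N to CA bond length of the first serine. It is a simplification under a stated assumption: the fields are split on whitespace, which works for these lines, whereas the PDB format is defined by fixed columns and a robust parser should slice by column. The helper names parse_atom and distance are hypothetical.

```python
import math


def parse_atom(line):
    """Parse one ATOM record by whitespace (simplification; PDB is fixed-column)."""
    f = line.split()
    return {
        "serial": int(f[1]),
        "name": f[2],
        "res_name": f[3],
        "chain": f[4],
        "res_seq": int(f[5]),
        "xyz": (float(f[6]), float(f[7]), float(f[8])),
    }


def distance(a, b):
    """Euclidean distance between two parsed atoms, in angstroms."""
    return math.dist(a["xyz"], b["xyz"])


lines = [
    "ATOM      1  N   SER A   1      -1.985 -12.356  81.201  1.00 60.11           N",
    "ATOM      2  CA  SER A   1      -1.709 -12.440  82.636  1.00 60.41           C",
]
n_atom, ca_atom = (parse_atom(line) for line in lines)
bond = distance(n_atom, ca_atom)
print(f"{n_atom['name']}-{ca_atom['name']} distance: {bond:.2f} A")
```

The result is close to 1.46 Å, the usual length of an N to CA bond in a protein backbone, which is a quick way to confirm that the coordinate columns were read in the right order.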
SCOP
PDBsum
CATH
FSSP - Fold classification
Structural genomics
Bioinformatics WWW portals
EBI:     http://www.ebi.ac.uk/Tools
ExPASy:  http://www.expasy.ch
Pasteur: http://bioweb.pasteur.fr
Lyon:    http://pbil.univ-lyon1.fr
NCBI:    http://ncbi.nlm.nih.gov
EBI
ExPASy
PBIL
Pasteur
Bioinformatic Links
