7. Text files

7.1. Reading

The simplest approach is to read the entire contents of a file into a string.

Reading a file this way is done with the .read() method.

But the process is a little more involved than a single function call.

Programmatic access to a file on a computer requires the program to indicate when that access begins (opening the file) and when it ends (closing the file).

7.1.1. .read(), with explicit open() and close()

a = open('eno1.fasta')
seq = a.read()
a.close()

print(type(seq))

print('A sequência, em FASTA é')
print(seq)
<class 'str'>
A sequência, em FASTA é
>sp|P00924|ENO1_YEAST Enolase 1 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) OX=559292 GN=ENO1 PE=1 SV=3
MAVSKVYARSVYDSRGNPTVEVELTTEKGVFRSIVPSGASTGVHEALEMRDGDKSKWMGK
GVLHAVKNVNDVIAPAFVKANIDVKDQKAVDDFLISLDGTANKSKLGANAILGVSLAASR
AAAAEKNVPLYKHLADLSKSKTSPYVLPVPFLNVLNGGSHAGGALALQEFMIAPTGAKTF
AEALRIGSEVYHNLKSLTKKRYGASAGNVGDEGGVAPNIQTAEEALDLIVDAIKAAGHDG
KIKIGLDCASSEFFKDGKYDLDFKNPNSDKSKWLTGPQLADLYHSLMKRYPIVSIEDPFA
EDDWEAWSHFFKTAGIQIVADDLTVTNPKRIATAIEKKAADALLLKVNQIGTLSESIKAA
QDSFAAGWGVMVSHRSGETEDTFIADLVVGLRTGQIKTGAPARSERLAKLNQLLRIEEEL
GDNAVFAGENFHHGDKL

7.1.2. .read(), inside the block of a with statement

In a more “modern” version, the file can be opened and then closed automatically by using the with statement:

with open('eno1.fasta') as a:
    seq = a.read()

print('A sequência, em FASTA é')
print(seq)
A sequência, em FASTA é
>sp|P00924|ENO1_YEAST Enolase 1 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) OX=559292 GN=ENO1 PE=1 SV=3
MAVSKVYARSVYDSRGNPTVEVELTTEKGVFRSIVPSGASTGVHEALEMRDGDKSKWMGK
GVLHAVKNVNDVIAPAFVKANIDVKDQKAVDDFLISLDGTANKSKLGANAILGVSLAASR
AAAAEKNVPLYKHLADLSKSKTSPYVLPVPFLNVLNGGSHAGGALALQEFMIAPTGAKTF
AEALRIGSEVYHNLKSLTKKRYGASAGNVGDEGGVAPNIQTAEEALDLIVDAIKAAGHDG
KIKIGLDCASSEFFKDGKYDLDFKNPNSDKSKWLTGPQLADLYHSLMKRYPIVSIEDPFA
EDDWEAWSHFFKTAGIQIVADDLTVTNPKRIATAIEKKAADALLLKVNQIGTLSESIKAA
QDSFAAGWGVMVSHRSGETEDTFIADLVVGLRTGQIKTGAPARSERLAKLNQLLRIEEEL
GDNAVFAGENFHHGDKL

The with statement keeps the file open until the end of the “block”, which (here too) is indicated by the indentation of one or more statements following the line that contains with. When the block ends, the file is closed without calling close().
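To check that the file really is closed when the block ends, one can inspect the closed attribute of the file object. The sketch below is self-contained: it first creates a small temporary file (the name demo.fasta and its contents are invented for illustration), so it does not depend on eno1.fasta.

```python
import os
import tempfile

# Create a small throwaway file so the example runs anywhere.
# (The file name and its contents are made up for this illustration.)
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'demo.fasta')
with open(path, 'w') as a:
    a.write('>demo\nMAVSK\n')

with open(path) as a:
    seq = a.read()
    inside = a.closed    # False: the file is still open inside the block

outside = a.closed       # True: the file was closed when the block ended

print(inside, outside)
```

The file is also closed if an error occurs inside the block, which is the main practical advantage over calling close() by hand.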

Besides read(), which reads the entire contents of a file into a string, there are other ways to read a file.

7.1.3. .readlines()

The readlines() method reads a file and splits it into a list of lines:

with open('eno1.fasta') as a:
    seq = a.readlines()

print(seq)
['>sp|P00924|ENO1_YEAST Enolase 1 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) OX=559292 GN=ENO1 PE=1 SV=3\n', 'MAVSKVYARSVYDSRGNPTVEVELTTEKGVFRSIVPSGASTGVHEALEMRDGDKSKWMGK\n', 'GVLHAVKNVNDVIAPAFVKANIDVKDQKAVDDFLISLDGTANKSKLGANAILGVSLAASR\n', 'AAAAEKNVPLYKHLADLSKSKTSPYVLPVPFLNVLNGGSHAGGALALQEFMIAPTGAKTF\n', 'AEALRIGSEVYHNLKSLTKKRYGASAGNVGDEGGVAPNIQTAEEALDLIVDAIKAAGHDG\n', 'KIKIGLDCASSEFFKDGKYDLDFKNPNSDKSKWLTGPQLADLYHSLMKRYPIVSIEDPFA\n', 'EDDWEAWSHFFKTAGIQIVADDLTVTNPKRIATAIEKKAADALLLKVNQIGTLSESIKAA\n', 'QDSFAAGWGVMVSHRSGETEDTFIADLVVGLRTGQIKTGAPARSERLAKLNQLLRIEEEL\n', 'GDNAVFAGENFHHGDKL\n']

What are the \n at the end of the strings?

In a string, \n marks a line break. (It counts as a single character.)

Here they appear because the original file contains line breaks.
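A quick way to confirm that \n is a single character is to compare lengths. The sketch below uses the last line of the sequence as a literal string, so no file is needed; the variable names are only illustrative.

```python
# The last line of the file, as it would come back from readlines()
linha = 'GDNAVFAGENFHHGDKL\n'

n_com = len(linha)                  # 17 residues + 1 newline character
n_sem = len('GDNAVFAGENFHHGDKL')    # the same text without the newline

print(n_com, n_sem)
# repr() shows the escape sequence instead of actually breaking the line
print(repr(linha))
```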

Often they need to be removed. For that we can use the .strip() method:

with open('eno1.fasta') as a:
    seq = a.readlines()

seq = [linha.strip() for linha in seq]
print(seq)
['>sp|P00924|ENO1_YEAST Enolase 1 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) OX=559292 GN=ENO1 PE=1 SV=3', 'MAVSKVYARSVYDSRGNPTVEVELTTEKGVFRSIVPSGASTGVHEALEMRDGDKSKWMGK', 'GVLHAVKNVNDVIAPAFVKANIDVKDQKAVDDFLISLDGTANKSKLGANAILGVSLAASR', 'AAAAEKNVPLYKHLADLSKSKTSPYVLPVPFLNVLNGGSHAGGALALQEFMIAPTGAKTF', 'AEALRIGSEVYHNLKSLTKKRYGASAGNVGDEGGVAPNIQTAEEALDLIVDAIKAAGHDG', 'KIKIGLDCASSEFFKDGKYDLDFKNPNSDKSKWLTGPQLADLYHSLMKRYPIVSIEDPFA', 'EDDWEAWSHFFKTAGIQIVADDLTVTNPKRIATAIEKKAADALLLKVNQIGTLSESIKAA', 'QDSFAAGWGVMVSHRSGETEDTFIADLVVGLRTGQIKTGAPARSERLAKLNQLLRIEEEL', 'GDNAVFAGENFHHGDKL']

Or, more concisely, applying the list comprehension directly to the result of readlines():

with open('eno1.fasta') as a:
    seq = [linha.strip() for linha in a.readlines()]
print(seq)
['>sp|P00924|ENO1_YEAST Enolase 1 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) OX=559292 GN=ENO1 PE=1 SV=3', 'MAVSKVYARSVYDSRGNPTVEVELTTEKGVFRSIVPSGASTGVHEALEMRDGDKSKWMGK', 'GVLHAVKNVNDVIAPAFVKANIDVKDQKAVDDFLISLDGTANKSKLGANAILGVSLAASR', 'AAAAEKNVPLYKHLADLSKSKTSPYVLPVPFLNVLNGGSHAGGALALQEFMIAPTGAKTF', 'AEALRIGSEVYHNLKSLTKKRYGASAGNVGDEGGVAPNIQTAEEALDLIVDAIKAAGHDG', 'KIKIGLDCASSEFFKDGKYDLDFKNPNSDKSKWLTGPQLADLYHSLMKRYPIVSIEDPFA', 'EDDWEAWSHFFKTAGIQIVADDLTVTNPKRIATAIEKKAADALLLKVNQIGTLSESIKAA', 'QDSFAAGWGVMVSHRSGETEDTFIADLVVGLRTGQIKTGAPARSERLAKLNQLLRIEEEL', 'GDNAVFAGENFHHGDKL']
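One detail worth keeping in mind: called without arguments, .strip() removes all leading and trailing whitespace, not only the newline. When leading or trailing spaces matter, .rstrip('\n') removes just the final newline. A minimal sketch, with lines invented for illustration:

```python
# Example lines (made up): one has spaces around the text, one is blank
linhas = ['  indented line \n', 'MAVSK\n', '\n']

com_strip = [linha.strip() for linha in linhas]
com_rstrip = [linha.rstrip('\n') for linha in linhas]

print(com_strip)    # spaces and newlines removed
print(com_rstrip)   # only the trailing newline removed
```

For FASTA files the difference rarely matters, since sequence lines are not expected to carry meaningful spaces.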

With very large files, reading with .read() or .readlines() can exhaust a computer's memory and “freeze” the program.

There is a third way to read a file, which does not cause problems with large files:

7.1.4. Iterating over a file with for

Iterating over a file “walks through” its lines, one at a time:

with open('eno1.fasta') as a:
    for linha in a:
        linha = linha.strip()
        print('Linha:', linha)
Linha: >sp|P00924|ENO1_YEAST Enolase 1 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) OX=559292 GN=ENO1 PE=1 SV=3
Linha: MAVSKVYARSVYDSRGNPTVEVELTTEKGVFRSIVPSGASTGVHEALEMRDGDKSKWMGK
Linha: GVLHAVKNVNDVIAPAFVKANIDVKDQKAVDDFLISLDGTANKSKLGANAILGVSLAASR
Linha: AAAAEKNVPLYKHLADLSKSKTSPYVLPVPFLNVLNGGSHAGGALALQEFMIAPTGAKTF
Linha: AEALRIGSEVYHNLKSLTKKRYGASAGNVGDEGGVAPNIQTAEEALDLIVDAIKAAGHDG
Linha: KIKIGLDCASSEFFKDGKYDLDFKNPNSDKSKWLTGPQLADLYHSLMKRYPIVSIEDPFA
Linha: EDDWEAWSHFFKTAGIQIVADDLTVTNPKRIATAIEKKAADALLLKVNQIGTLSESIKAA
Linha: QDSFAAGWGVMVSHRSGETEDTFIADLVVGLRTGQIKTGAPARSERLAKLNQLLRIEEEL
Linha: GDNAVFAGENFHHGDKL
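Because iteration handles one line at a time, totals can be accumulated without ever holding the whole file in memory. The sketch below counts sequence lines and residues this way; it writes its own small temporary file (name and contents invented for illustration) so that it runs on its own.

```python
import os
import tempfile

# Throwaway input file (hypothetical contents, just for this example)
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'demo.fasta')
with open(path, 'w') as a:
    a.write('>demo protein\nMAVSKVYARS\nVYDSRGNPTV\nEVEL\n')

n_linhas = 0
n_residuos = 0
with open(path) as a:
    for linha in a:
        linha = linha.strip()
        if linha.startswith('>'):
            continue              # skip the header line
        n_linhas += 1
        n_residuos += len(linha)  # only this one line is in memory

print(n_linhas, 'sequence lines,', n_residuos, 'residues')
```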

We can even use the enumerate() function with a file. It generates pairs of values (line number, line), with numbering starting at 0:

with open('eno1.fasta') as a:
    for i, linha in enumerate(a):
        linha = linha.strip()
        print('linha', i, ':', linha)
linha 0 : >sp|P00924|ENO1_YEAST Enolase 1 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) OX=559292 GN=ENO1 PE=1 SV=3
linha 1 : MAVSKVYARSVYDSRGNPTVEVELTTEKGVFRSIVPSGASTGVHEALEMRDGDKSKWMGK
linha 2 : GVLHAVKNVNDVIAPAFVKANIDVKDQKAVDDFLISLDGTANKSKLGANAILGVSLAASR
linha 3 : AAAAEKNVPLYKHLADLSKSKTSPYVLPVPFLNVLNGGSHAGGALALQEFMIAPTGAKTF
linha 4 : AEALRIGSEVYHNLKSLTKKRYGASAGNVGDEGGVAPNIQTAEEALDLIVDAIKAAGHDG
linha 5 : KIKIGLDCASSEFFKDGKYDLDFKNPNSDKSKWLTGPQLADLYHSLMKRYPIVSIEDPFA
linha 6 : EDDWEAWSHFFKTAGIQIVADDLTVTNPKRIATAIEKKAADALLLKVNQIGTLSESIKAA
linha 7 : QDSFAAGWGVMVSHRSGETEDTFIADLVVGLRTGQIKTGAPARSERLAKLNQLLRIEEEL
linha 8 : GDNAVFAGENFHHGDKL

Problem: read a FASTA file and separate the header from the sequence into two strings (joining the whole sequence into a single string).

with open('eno1.fasta') as a:
    linhas = [k.strip() for k in a.readlines()]

header = linhas[0]
# use a list slice from index 1 to the end
outras = linhas[1:]
# and the .join() method with an empty separator
# to join them together
seq = ''.join(outras)

print("cabeçalho:", header)
print('sequência, com', len(seq), 'aminoácidos:')
print(seq)
cabeçalho: >sp|P00924|ENO1_YEAST Enolase 1 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) OX=559292 GN=ENO1 PE=1 SV=3
sequência, com 437 aminoácidos:
MAVSKVYARSVYDSRGNPTVEVELTTEKGVFRSIVPSGASTGVHEALEMRDGDKSKWMGKGVLHAVKNVNDVIAPAFVKANIDVKDQKAVDDFLISLDGTANKSKLGANAILGVSLAASRAAAAEKNVPLYKHLADLSKSKTSPYVLPVPFLNVLNGGSHAGGALALQEFMIAPTGAKTFAEALRIGSEVYHNLKSLTKKRYGASAGNVGDEGGVAPNIQTAEEALDLIVDAIKAAGHDGKIKIGLDCASSEFFKDGKYDLDFKNPNSDKSKWLTGPQLADLYHSLMKRYPIVSIEDPFAEDDWEAWSHFFKTAGIQIVADDLTVTNPKRIATAIEKKAADALLLKVNQIGTLSESIKAAQDSFAAGWGVMVSHRSGETEDTFIADLVVGLRTGQIKTGAPARSERLAKLNQLLRIEEELGDNAVFAGENFHHGDKL

Sometimes files have no header! It is safer to test whether the first line starts with “>”.

with open('eno1.fasta') as a:
    linhas = [k.strip() for k in a]

if linhas[0].startswith('>'):
    header = linhas[0]
    seq = ''.join(linhas[1:])
else:
    header = ""
    seq = ''.join(linhas)

print("cabeçalho:", header)
print('sequência, com', len(seq), 'aminoácidos:')
print(seq)
cabeçalho: >sp|P00924|ENO1_YEAST Enolase 1 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) OX=559292 GN=ENO1 PE=1 SV=3
sequência, com 437 aminoácidos:
MAVSKVYARSVYDSRGNPTVEVELTTEKGVFRSIVPSGASTGVHEALEMRDGDKSKWMGKGVLHAVKNVNDVIAPAFVKANIDVKDQKAVDDFLISLDGTANKSKLGANAILGVSLAASRAAAAEKNVPLYKHLADLSKSKTSPYVLPVPFLNVLNGGSHAGGALALQEFMIAPTGAKTFAEALRIGSEVYHNLKSLTKKRYGASAGNVGDEGGVAPNIQTAEEALDLIVDAIKAAGHDGKIKIGLDCASSEFFKDGKYDLDFKNPNSDKSKWLTGPQLADLYHSLMKRYPIVSIEDPFAEDDWEAWSHFFKTAGIQIVADDLTVTNPKRIATAIEKKAADALLLKVNQIGTLSESIKAAQDSFAAGWGVMVSHRSGETEDTFIADLVVGLRTGQIKTGAPARSERLAKLNQLLRIEEELGDNAVFAGENFHHGDKL

Blank lines can sometimes cause problems. But they are easy to “ignore”.

Suppose the file gre3.txt has the following content:

>sp|P38715|GRE3_YEAST NADPH-dependent aldose reductase GRE3 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) GN=GRE3 PE=1 SV=1

MSSLVTLNNGLKMPLVGLGCWKIDKKVCANQIYEAIKLGYRLFDGACDYGNEKEVGEGIR
KAISEGLVSRKDIFVVSKLWNNFHHPDHVKLALKKTLSDMGLDYLDLYYIHFPIAFKYVP
FEEKYPPGFYTGADDEKKGHITEAHVPIIDTYRALEECVDEGLIKSIGVSNFQGSLIQDL
LRGCRIKPVALQIEHHPYLTQEHLVEFCKLHDIQVVAYSSFGPQSFIEMDLQLAKTTPTL
FENDVIKKVSQNHPGSTTSQVLLRWATQRGIAVIPKSSKKERLLGNLEIEKKFTLTEQEL
KDISALNANIRFNDPWTWLDGKFPTFA

How can the header be separated from the sequence?

with open('gre3.txt') as a:
    linhas = [k.strip() for k in a]

linhas = [k for k in linhas if len(k) > 0]

if linhas[0].startswith('>'):
    header = linhas[0]
    seq = ''.join(linhas[1:])
else:
    header = ""
    seq = ''.join(linhas)

print("cabeçalho:")
print(header)
print('sequência, com', len(seq), 'aminoácidos:')
print(seq)
cabeçalho:
>sp|P38715|GRE3_YEAST NADPH-dependent aldose reductase GRE3 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) GN=GRE3 PE=1 SV=1
sequência, com 327 aminoácidos:
MSSLVTLNNGLKMPLVGLGCWKIDKKVCANQIYEAIKLGYRLFDGACDYGNEKEVGEGIRKAISEGLVSRKDIFVVSKLWNNFHHPDHVKLALKKTLSDMGLDYLDLYYIHFPIAFKYVPFEEKYPPGFYTGADDEKKGHITEAHVPIIDTYRALEECVDEGLIKSIGVSNFQGSLIQDLLRGCRIKPVALQIEHHPYLTQEHLVEFCKLHDIQVVAYSSFGPQSFIEMDLQLAKTTPTLFENDVIKKVSQNHPGSTTSQVLLRWATQRGIAVIPKSSKKERLLGNLEIEKKFTLTEQELKDISALNANIRFNDPWTWLDGKFPTFA
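The steps used above (strip each line, drop blank lines, test for “>”) can be collected into a reusable function. This is a minimal sketch, not a full FASTA parser: the function name separar_fasta is hypothetical, and it takes the text of the file rather than a file name so that it can be tried without any file on disk.

```python
def separar_fasta(texto):
    """Return (header, sequence) from the text of a single-entry FASTA file."""
    linhas = [k.strip() for k in texto.splitlines()]
    linhas = [k for k in linhas if k]          # drop blank lines
    if linhas and linhas[0].startswith('>'):
        return linhas[0], ''.join(linhas[1:])
    return '', ''.join(linhas)                 # no header (or empty text)

# Made-up example with blank lines before and after the header
exemplo = '\n>demo protein\n\nMAVSK\nVYARS\n'
header, seq = separar_fasta(exemplo)
print(header)
print(seq)
```

With a real file it could be called as separar_fasta(a.read()) inside a with block. Testing that the list is not empty before looking at linhas[0] also avoids an IndexError when the file is empty.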

7.2. Example: extracting information from a multi-FASTA file

Problem: extract the headers and the sequences from a multi-FASTA file. Show the length of each protein and its number of tryptophans (W).

with open('proteins.fasta') as a:
    tudo = a.read()
prots = tudo.split('>')

for p in prots:
    print(len(p))
0
745
725
726
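438

The first element is empty because the file starts with “>”, so we discard the empty elements and look at the beginning of each record:

with open('proteins.fasta') as a: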
    tudo = a.read()
prots = tudo.split('>')
prots = [p for p in prots if len(p) > 0]

for p in prots:
    print(len(p))
    print(p[:30])
745
sp|P38090|AGP2_YEAST General a
725
sp|Q12001|ALG6_YEAST Dolichyl 
726
sp|P53309|AP18B_YEAST Clathrin
438
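sp|P40467|ASG1_YEAST Activator

Next, each record is split into lines: the first line is the header and the remaining lines are joined into the sequence:

with open('proteins.fasta') as a: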
    tudo = a.read()
prots = tudo.split('>')
prots = [p for p in prots if len(p) > 0]

headers = []
seqs = []
for p in prots:
    linhas = [k.strip() for k in p.split('\n')]
    headers.append(linhas[0])
    seqs.append(''.join(linhas[1:]))

for h in headers:
    print(h)
sp|P38090|AGP2_YEAST General amino acid permease AGP2 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) OX=559292 GN=AGP2 PE=1 SV=1
sp|Q12001|ALG6_YEAST Dolichyl pyrophosphate Man9GlcNAc2 alpha-1,3-glucosyltransferase OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) OX=559292 GN=ALG6 PE=1 SV=1
sp|P53309|AP18B_YEAST Clathrin coat assembly protein AP180B OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) OX=559292 GN=YAP1802 PE=1 SV=1
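sp|P40467|ASG1_YEAST Activator of stress genes 1 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) OX=559292 GN=ASG1 PE=1 SV=1

Finally, the accession is taken from each header (the second field when splitting on “|”), and the length and the tryptophan count are printed for each sequence:

with open('proteins.fasta') as a: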
    tudo = a.read()
prots = tudo.split('>')
prots = [p for p in prots if len(p) > 0]

headers = []
seqs = []
for p in prots:
    linhas = [k.strip() for k in p.split('\n')]
    headers.append(linhas[0])
    seqs.append(''.join(linhas[1:]))

ids = []
for h in headers:
    separados = h.split('|')
    ids.append(separados[1])

for i, s in zip(ids, seqs):
    print(i, 'tem', len(s), 'aminoácidos,', s.count('W'), 'são triptofanos')
P38090 tem 596 aminoácidos, 11 são triptofanos
Q12001 tem 544 aminoácidos, 16 são triptofanos
P53309 tem 568 aminoácidos, 2 são triptofanos
P40467 tem 300 aminoácidos, 1 são triptofanos
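The approach above reads the whole file with .read() and splits on every “>” character. An alternative, sketched here under the assumption that records should start only at lines beginning with “>”, is to iterate over the file line by line and yield one record at a time. The function name ler_fasta_multiplo and the two short records are invented for illustration; io.StringIO stands in for an open file so the example runs without proteins.fasta.

```python
import io

def ler_fasta_multiplo(ficheiro):
    """Yield (header, sequence) pairs from an open multi-FASTA file."""
    header = None
    partes = []
    for linha in ficheiro:
        linha = linha.strip()
        if not linha:
            continue                          # ignore blank lines
        if linha.startswith('>'):
            if header is not None:
                yield header, ''.join(partes) # previous record is complete
            header = linha[1:]                # drop the leading '>'
            partes = []
        else:
            partes.append(linha)
    if header is not None:
        yield header, ''.join(partes)         # last record

# Made-up data with the same header layout as the examples above
exemplo = io.StringIO(
    '>sp|X00001|DEMO1 first\nMAWW\nKL\n\n>sp|X00002|DEMO2 second\nGGW\n'
)
resultado = {}
for h, s in ler_fasta_multiplo(exemplo):
    ident = h.split('|')[1]
    resultado[ident] = (len(s), s.count('W'))
print(resultado)
```

Since only one record is held at a time, this version also behaves well with very large files. Note that any sequence lines appearing before the first header are discarded by this sketch.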

7.3. Writing

7.3.1. The print() function for files

Simply open the file in write mode by passing 'w' as the second argument to open(). Then pass the file argument to print() to indicate that the output should be sent to the file.

with open('exp.txt', 'w') as a:
    print('1, 2, 3, experiência, som, som', file=a)
    for i in range(30):
        print(i, i**0.5, file=a)

Apparently nothing happened, but a new file was created.

Let's read the file:

with open('exp.txt') as a:
    print(a.read())
1, 2, 3, experiência, som, som
0 0.0
1 1.0
2 1.4142135623730951
3 1.7320508075688772
4 2.0
5 2.23606797749979
6 2.449489742783178
7 2.6457513110645907
8 2.8284271247461903
9 3.0
10 3.1622776601683795
11 3.3166247903554
12 3.4641016151377544
13 3.605551275463989
14 3.7416573867739413
15 3.872983346207417
16 4.0
17 4.123105625617661
18 4.242640687119285
19 4.358898943540674
20 4.47213595499958
21 4.58257569495584
22 4.69041575982343
23 4.795831523312719
24 4.898979485566356
25 5.0
26 5.0990195135927845
27 5.196152422706632
28 5.291502622129181
29 5.385164807134504
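One consequence of the 'w' mode is worth illustrating: opening an existing file with 'w' discards its previous contents, whereas the 'a' (append) mode adds to the end. The sketch below uses a temporary file (the name demo.txt is arbitrary) so that no existing file is touched.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, 'demo.txt')   # arbitrary name for this example

with open(path, 'w') as a:
    print('first', file=a)

with open(path, 'w') as a:    # 'w' again: the previous content is discarded
    print('second', file=a)

with open(path, 'a') as a:    # 'a': new lines go after the existing content
    print('third', file=a)

with open(path) as a:
    conteudo = a.read()
print(conteudo)
```

Only 'second' and 'third' remain in the file; 'first' was overwritten by the second open() in 'w' mode.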

7.3.2. The .write() method

There is also the .write() method, which works as the opposite of .read():

tudo = """
Um texto que ocupa
1 linha
2 linhas
3 linhas
"""

with open('exp2.txt', 'w') as a:
    a.write(tudo)

with open('exp2.txt') as a:
    print(a.read())
Um texto que ocupa
1 linha
2 linhas
3 linhas
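Unlike print(), .write() does not add a line break at the end, so any \n has to be included in the string being written. It also returns the number of characters written. A minimal sketch, using a temporary file with an arbitrary name:

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, 'demo.txt')   # arbitrary name for this example

linhas = ['1 linha', '2 linhas', '3 linhas']
with open(path, 'w') as a:
    for linha in linhas:
        # the newline must be added explicitly
        n = a.write(linha + '\n')   # n = number of characters written

with open(path) as a:
    conteudo = a.read()

print(repr(conteudo))
print(n)
```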

Problem: read a file containing numeric data and convert the decimal point to a decimal comma.

In the file exp.txt, created above, can we concisely change every . into , ?

with open('exp.txt') as a:
    tudo = a.read().replace('.', ',')

with open('exp.txt', 'w') as a:
    a.write(tudo)

with open('exp.txt') as a:
    print(a.read())
1, 2, 3, experiência, som, som
0 0,0
1 1,0
2 1,4142135623730951
3 1,7320508075688772
4 2,0
5 2,23606797749979
6 2,449489742783178
7 2,6457513110645907
8 2,8284271247461903
9 3,0
10 3,1622776601683795
11 3,3166247903554
12 3,4641016151377544
13 3,605551275463989
14 3,7416573867739413
15 3,872983346207417
16 4,0
17 4,123105625617661
18 4,242640687119285
19 4,358898943540674
20 4,47213595499958
21 4,58257569495584
22 4,69041575982343
23 4,795831523312719
24 4,898979485566356
25 5,0
26 5,0990195135927845
27 5,196152422706632
28 5,291502622129181
29 5,385164807134504
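Two caveats of the concise version: .replace('.', ',') changes every period in the file, including any that are not decimal points, and the original file is overwritten. A more cautious sketch, assuming the number of interest is the last field of each line, converts only fields that parse as numbers and writes the result to a separate file. The file names and the header line are invented for illustration.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
origem = os.path.join(folder, 'demo.txt')            # hypothetical input
destino = os.path.join(folder, 'demo_virgulas.txt')  # hypothetical output

with open(origem, 'w') as a:
    print('v. 1 of the experiment', file=a)  # a period that is not a decimal point
    for i in range(3):
        print(i, i**0.5, file=a)

with open(origem) as a, open(destino, 'w') as b:
    for linha in a:
        campos = linha.split()
        try:
            float(campos[-1])            # is the last field a number?
        except (ValueError, IndexError):
            b.write(linha)               # not numeric: copy the line unchanged
            continue
        campos[-1] = campos[-1].replace('.', ',')
        print(*campos, file=b)

with open(destino) as b:
    resultado = b.read()
print(resultado)
```

The header keeps its period, while the numeric column uses commas.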

7.4. The requests module

import requests

url = 'https://www.uniprot.org/uniprot/?query=proteome:UP000002407%20reviewed:yes&format=list'

data = requests.get(url).text

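print(data)
{"url":"http://rest.uniprot.org/uniprotkb/query=proteome:UP000002407%20reviewed:yes&amp;format=list","messages":["The 'accession' value has invalid format. It should be a valid UniProtKB accession"]}

The response above is an error message rather than a list of accessions: the request ended up at rest.uniprot.org, which did not accept the query in this form. Requesting a single entry by its accession works:

import requests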
r = requests.get('http://www.uniprot.org/uniprot/P00924.fasta')
print(r.text)
>sp|P00924|ENO1_YEAST Enolase 1 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) OX=559292 GN=ENO1 PE=1 SV=3
MAVSKVYARSVYDSRGNPTVEVELTTEKGVFRSIVPSGASTGVHEALEMRDGDKSKWMGK
GVLHAVKNVNDVIAPAFVKANIDVKDQKAVDDFLISLDGTANKSKLGANAILGVSLAASR
AAAAEKNVPLYKHLADLSKSKTSPYVLPVPFLNVLNGGSHAGGALALQEFMIAPTGAKTF
AEALRIGSEVYHNLKSLTKKRYGASAGNVGDEGGVAPNIQTAEEALDLIVDAIKAAGHDG
KIKIGLDCASSEFFKDGKYDLDFKNPNSDKSKWLTGPQLADLYHSLMKRYPIVSIEDPFA
EDDWEAWSHFFKTAGIQIVADDLTVTNPKRIATAIEKKAADALLLKVNQIGTLSESIKAA
QDSFAAGWGVMVSHRSGETEDTFIADLVVGLRTGQIKTGAPARSERLAKLNQLLRIEEEL
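GDNAVFAGENFHHGDKL

The downloaded text can be split into header and sequence in the same way as the contents of a file:

import requests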
r = requests.get('http://www.uniprot.org/uniprot/P00924.fasta')
linhas = r.text.split('\n')

if linhas[0].startswith('>'):
    cab = linhas[0]
    seq = ''.join(linhas[1:])
else:
    cab = ""
    seq = ''.join(linhas)

print("cabeçalho: ", cab)
print("sequência:")
print(seq)
cabeçalho:  >sp|P00924|ENO1_YEAST Enolase 1 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) OX=559292 GN=ENO1 PE=1 SV=3
sequência:
MAVSKVYARSVYDSRGNPTVEVELTTEKGVFRSIVPSGASTGVHEALEMRDGDKSKWMGKGVLHAVKNVNDVIAPAFVKANIDVKDQKAVDDFLISLDGTANKSKLGANAILGVSLAASRAAAAEKNVPLYKHLADLSKSKTSPYVLPVPFLNVLNGGSHAGGALALQEFMIAPTGAKTFAEALRIGSEVYHNLKSLTKKRYGASAGNVGDEGGVAPNIQTAEEALDLIVDAIKAAGHDGKIKIGLDCASSEFFKDGKYDLDFKNPNSDKSKWLTGPQLADLYHSLMKRYPIVSIEDPFAEDDWEAWSHFFKTAGIQIVADDLTVTNPKRIATAIEKKAADALLLKVNQIGTLSESIKAAQDSFAAGWGVMVSHRSGETEDTFIADLVVGLRTGQIKTGAPARSERLAKLNQLLRIEEELGDNAVFAGENFHHGDKL
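As the first example in this section shows, a request can succeed at the network level and still return an error message instead of data. A more defensive sketch is given below. The function names are hypothetical, and the URL pattern https://rest.uniprot.org/uniprotkb/<accession>.fasta is an assumption about the current UniProt service rather than something taken from the examples above. The calls requests.get(..., timeout=...) and Response.raise_for_status() are part of the requests library.

```python
def url_fasta(acesso):
    """Build the FASTA URL for a UniProtKB accession (assumed URL pattern)."""
    return 'https://rest.uniprot.org/uniprotkb/' + acesso + '.fasta'

def obter_fasta(acesso, timeout=10):
    """Download a FASTA entry, raising an exception on HTTP errors."""
    # Imported here so that the rest of the sketch runs without network access
    import requests
    r = requests.get(url_fasta(acesso), timeout=timeout)
    r.raise_for_status()    # raises requests.HTTPError for 4xx/5xx responses
    return r.text

print(url_fasta('P00924'))
# texto = obter_fasta('P00924')   # uncomment to actually perform the request
```

The timeout prevents the program from waiting indefinitely if the server does not answer, and raise_for_status() turns an error response into an exception instead of letting the error text be parsed as if it were a sequence.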
import requests
r = requests.get('http://www.uniprot.org/uniprot/P00924.txt')
print(r.text)
ID   ENO1_YEAST              Reviewed;         437 AA.
AC   P00924; D6VV34; P99013;
DT   21-JUL-1986, integrated into UniProtKB/Swiss-Prot.
DT   05-OCT-2010, sequence version 3.
DT   22-FEB-2023, entry version 235.
DE   RecName: Full=Enolase 1;
DE            EC=4.2.1.11;
DE   AltName: Full=2-phospho-D-glycerate hydro-lyase 1;
DE   AltName: Full=2-phosphoglycerate dehydratase 1;
GN   Name=ENO1; Synonyms=ENOA, HSP48; OrderedLocusNames=YGR254W;
GN   ORFNames=G9160;
OS   Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast).
OC   Eukaryota; Fungi; Dikarya; Ascomycota; Saccharomycotina; Saccharomycetes;
OC   Saccharomycetales; Saccharomycetaceae; Saccharomyces.
OX   NCBI_TaxID=559292;
RN   [1]
RP   NUCLEOTIDE SEQUENCE [GENOMIC DNA].
RX   PubMed=6256394; DOI=10.1016/s0021-9258(19)69976-x;
RA   Holland M.J., Holland J.P., Thill G.P., Jackson K.A.;
RT   "The primary structures of two yeast enolase genes. Homology between the 5'
RT   noncoding flanking regions of yeast enolase and glyceraldehyde-3-phosphate
RT   dehydrogenase genes.";
RL   J. Biol. Chem. 256:1385-1395(1981).
RN   [2]
RP   NUCLEOTIDE SEQUENCE [GENOMIC DNA].
RC   STRAIN=ATCC 204508 / S288c;
RX   PubMed=9133741;
RX   DOI=10.1002/(sici)1097-0061(19970330)13:4<369::aid-yea81>3.0.co;2-v;
RA   Mazzoni C., Ruzzi M., Rinaldi T., Solinas F., Montebove F., Frontali L.;
RT   "Sequence analysis of a 10.5 kb DNA fragment from the yeast chromosome VII
RT   reveals the presence of three new open reading frames and of a tRNAThr
RT   gene.";
RL   Yeast 13:369-372(1997).
RN   [3]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RC   STRAIN=ATCC 204508 / S288c;
RX   PubMed=9169869;
RA   Tettelin H., Agostoni-Carbone M.L., Albermann K., Albers M., Arroyo J.,
RA   Backes U., Barreiros T., Bertani I., Bjourson A.J., Brueckner M.,
RA   Bruschi C.V., Carignani G., Castagnoli L., Cerdan E., Clemente M.L.,
RA   Coblenz A., Coglievina M., Coissac E., Defoor E., Del Bino S., Delius H.,
RA   Delneri D., de Wergifosse P., Dujon B., Durand P., Entian K.-D., Eraso P.,
RA   Escribano V., Fabiani L., Fartmann B., Feroli F., Feuermann M.,
RA   Frontali L., Garcia-Gonzalez M., Garcia-Saez M.I., Goffeau A.,
RA   Guerreiro P., Hani J., Hansen M., Hebling U., Hernandez K., Heumann K.,
RA   Hilger F., Hofmann B., Indge K.J., James C.M., Klima R., Koetter P.,
RA   Kramer B., Kramer W., Lauquin G., Leuther H., Louis E.J., Maillier E.,
RA   Marconi A., Martegani E., Mazon M.J., Mazzoni C., McReynolds A.D.K.,
RA   Melchioretto P., Mewes H.-W., Minenkova O., Mueller-Auer S., Nawrocki A.,
RA   Netter P., Neu R., Nombela C., Oliver S.G., Panzeri L., Paoluzi S.,
RA   Plevani P., Portetelle D., Portillo F., Potier S., Purnelle B., Rieger M.,
RA   Riles L., Rinaldi T., Robben J., Rodrigues-Pousada C.,
RA   Rodriguez-Belmonte E., Rodriguez-Torres A.M., Rose M., Ruzzi M.,
RA   Saliola M., Sanchez-Perez M., Schaefer B., Schaefer M., Scharfe M.,
RA   Schmidheini T., Schreer A., Skala J., Souciet J.-L., Steensma H.Y.,
RA   Talla E., Thierry A., Vandenbol M., van der Aart Q.J.M., Van Dyck L.,
RA   Vanoni M., Verhasselt P., Voet M., Volckaert G., Wambutt R., Watson M.D.,
RA   Weber N., Wedler E., Wedler H., Wipfli P., Wolf K., Wright L.F.,
RA   Zaccaria P., Zimmermann M., Zollner A., Kleine K.;
RT   "The nucleotide sequence of Saccharomyces cerevisiae chromosome VII.";
RL   Nature 387:81-84(1997).
RN   [4]
RP   GENOME REANNOTATION.
RC   STRAIN=ATCC 204508 / S288c;
RX   PubMed=24374639; DOI=10.1534/g3.113.008995;
RA   Engel S.R., Dietrich F.S., Fisk D.G., Binkley G., Balakrishnan R.,
RA   Costanzo M.C., Dwight S.S., Hitz B.C., Karra K., Nash R.S., Weng S.,
RA   Wong E.D., Lloyd P., Skrzypek M.S., Miyasato S.R., Simison M., Cherry J.M.;
RT   "The reference genome sequence of Saccharomyces cerevisiae: Then and now.";
RL   G3 (Bethesda) 4:389-398(2014).
RN   [5]
RP   PROTEIN SEQUENCE OF 2-437.
RX   PubMed=7005235; DOI=10.1016/s0021-9258(19)69975-8;
RA   Chin C.C.Q., Brewer J.M., Wold F.;
RT   "The amino acid sequence of yeast enolase.";
RL   J. Biol. Chem. 256:1377-1384(1981).
RN   [6]
RP   PROTEIN SEQUENCE OF 2-12.
RC   STRAIN=ATCC 26786 / X2180-1A;
RA   Sanchez J.-C., Golaz O., Schaller D., Morch F., Frutiger S., Hughes G.J.,
RA   Appel R.D., Deshusses J., Hochstrasser D.F.;
RL   Submitted (AUG-1995) to UniProtKB.
RN   [7]
RP   PROTEIN SEQUENCE OF 30-47.
RC   STRAIN=ATCC 204508 / S288c;
RX   PubMed=7895733; DOI=10.1002/elps.11501501210;
RA   Garrels J.I., Futcher B., Kobayashi R., Latter G.I., Schwender B.,
RA   Volpe T., Warner J.R., McLaughlin C.S.;
RT   "Protein identifications for a Saccharomyces cerevisiae protein database.";
RL   Electrophoresis 15:1466-1486(1994).
RN   [8]
RP   PROTEIN SEQUENCE OF 69-79.
RC   STRAIN=ATCC 38531 / Y41;
RX   PubMed=7737086; DOI=10.1002/elps.1150160124;
RA   Norbeck J., Blomberg A.;
RT   "Gene linkage of two-dimensional polyacrylamide gel electrophoresis
RT   resolved proteins from isogene families in Saccharomyces cerevisiae by
RT   microsequencing of in-gel trypsin generated peptides.";
RL   Electrophoresis 16:149-156(1995).
RN   [9]
RP   MUTAGENESIS OF LYS-346, AND ACTIVE SITE.
RX   PubMed=8634301; DOI=10.1021/bi952186y;
RA   Poyner R.R., Laughlin L.T., Sowa G.A., Reed G.H.;
RT   "Toward identification of acid/base catalysts in the active site of
RT   enolase: comparison of the properties of K345A, E168Q, and E211Q
RT   variants.";
RL   Biochemistry 35:1692-1699(1996).
RN   [10]
RP   MUTAGENESIS OF HIS-160.
RX   PubMed=11027610; DOI=10.1006/bbrc.2000.3618;
RA   Brewer J.M., Holland M.J., Lebioda L.;
RT   "The H159A mutant of yeast enolase 1 has significant activity.";
RL   Biochem. Biophys. Res. Commun. 276:1199-1202(2000).
RN   [11]
RP   SUBCELLULAR LOCATION.
RX   PubMed=11502169; DOI=10.1021/bi010277r;
RA   Grandier-Vazeille X., Bathany K., Chaignepain S., Camougrand N., Manon S.,
RA   Schmitter J.-M.;
RT   "Yeast mitochondrial dehydrogenases are associated in a supramolecular
RT   complex.";
RL   Biochemistry 40:9758-9769(2001).
RN   [12]
RP   MUTAGENESIS OF HIS-160 AND ASN-208.
RX   PubMed=13678299; DOI=10.1023/a:1025390123761;
RA   Brewer J.M., Glover C.V., Holland M.J., Lebioda L.;
RT   "Enzymatic function of loop movement in enolase: preparation and some
RT   properties of H159N, H159A, H159F, and N207A enolases.";
RL   J. Protein Chem. 22:353-361(2003).
RN   [13]
RP   LEVEL OF PROTEIN EXPRESSION [LARGE SCALE ANALYSIS].
RX   PubMed=14562106; DOI=10.1038/nature02046;
RA   Ghaemmaghami S., Huh W.-K., Bower K., Howson R.W., Belle A., Dephoure N.,
RA   O'Shea E.K., Weissman J.S.;
RT   "Global analysis of protein expression in yeast.";
RL   Nature 425:737-741(2003).
RN   [14]
RP   PHOSPHORYLATION [LARGE SCALE ANALYSIS] AT SER-119, AND IDENTIFICATION BY
RP   MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
RX   PubMed=17287358; DOI=10.1073/pnas.0607084104;
RA   Chi A., Huttenhower C., Geer L.Y., Coon J.J., Syka J.E.P., Bai D.L.,
RA   Shabanowitz J., Burke D.J., Troyanskaya O.G., Hunt D.F.;
RT   "Analysis of phosphorylation sites on proteins from Saccharomyces
RT   cerevisiae by electron transfer dissociation (ETD) mass spectrometry.";
RL   Proc. Natl. Acad. Sci. U.S.A. 104:2193-2198(2007).
RN   [15]
RP   UBIQUITINATION [LARGE SCALE ANALYSIS] AT LYS-358, AND IDENTIFICATION BY
RP   MASS SPECTROMETRY [LARGE SCALE ANALYSIS].
RX   PubMed=22106047; DOI=10.1002/pmic.201100166;
RA   Starita L.M., Lo R.S., Eng J.K., von Haller P.D., Fields S.;
RT   "Sites of ubiquitin attachment in Saccharomyces cerevisiae.";
RL   Proteomics 12:236-240(2012).
RN   [16]
RP   X-RAY CRYSTALLOGRAPHY (2.25 ANGSTROMS).
RX   PubMed=3374614; DOI=10.1038/333683a0;
RA   Lebioda L., Stec B.;
RT   "Crystal structure of enolase indicates that enolase and pyruvate kinase
RT   evolved from a common ancestor.";
RL   Nature 333:683-686(1988).
RN   [17]
RP   X-RAY CRYSTALLOGRAPHY (2.25 ANGSTROMS).
RX   PubMed=2645275; DOI=10.2210/pdb2enl/pdb;
RA   Lebioda L., Stec B., Brewer J.M.;
RT   "The structure of yeast enolase at 2.25-A resolution. An 8-fold beta +
RT   alpha-barrel with a novel beta beta alpha alpha (beta alpha)6 topology.";
RL   J. Biol. Chem. 264:3685-3693(1989).
RN   [18]
RP   X-RAY CRYSTALLOGRAPHY (2.25 ANGSTROMS).
RX   PubMed=2405163; DOI=10.1016/0022-2836(90)90023-f;
RA   Stec B., Lebioda L.;
RT   "Refined structure of yeast apo-enolase at 2.25-A resolution.";
RL   J. Mol. Biol. 211:235-248(1990).
RN   [19]
RP   X-RAY CRYSTALLOGRAPHY (1.8 ANGSTROMS) IN COMPLEX WITH SUBSTRATE AND
RP   MAGNESIUM IONS, AND COFACTOR.
RX   PubMed=8605183; DOI=10.1021/bi952859c;
RA   Larsen T.M., Wedekind J.E., Rayment I., Reed G.H.;
RT   "A carboxylate oxygen of the substrate bridges the magnesium ions at the
RT   active site of enolase: structure of the yeast enzyme complexed with the
RT   equilibrium mixture of 2-phosphoglycerate and phosphoenolpyruvate at 1.8-A
RT   resolution.";
RL   Biochemistry 35:4349-4358(1996).
RN   [20]
RP   X-RAY CRYSTALLOGRAPHY (2.0 ANGSTROMS) IN COMPLEX WITH SUBSTRATE.
RX   PubMed=9376357; DOI=10.1021/bi9712450;
RA   Zhang E., Brewer J.M., Minor W., Carreira L.A., Lebioda L.;
RT   "Mechanism of enolase: the crystal structure of asymmetric dimer enolase-2-
RT   phospho-D-glycerate/enolase-phosphoenolpyruvate at 2.0-A resolution.";
RL   Biochemistry 36:12526-12534(1997).
RN   [21]
RP   X-RAY CRYSTALLOGRAPHY (2.1 ANGSTROMS) OF MUTANT ALA-40 IN COMPLEX WITH
RP   MAGNESIUM IONS AND SUBSTRATE ANALOG.
RX   PubMed=12054465; DOI=10.1016/s0003-9861(02)00024-3;
RA   Poyner R.R., Larsen T.M., Wong S.-W., Reed G.H.;
RT   "Functional and structural changes due to a serine to alanine mutation in
RT   the active-site flap of enolase.";
RL   Arch. Biochem. Biophys. 401:155-163(2002).
RN   [22]
RP   X-RAY CRYSTALLOGRAPHY (1.8 ANGSTROMS) OF MUTANT GLN-212 AND MUTANT GLN-169,
RP   MUTAGENESIS OF GLU-212 AND LYS-346, AND ACTIVE SITE.
RX   PubMed=12846578; DOI=10.1021/bi0346345;
RA   Sims P.A., Larsen T.M., Poyner R.R., Cleland W.W., Reed G.H.;
RT   "Reverse protonation is the key to general acid-base catalysis in
RT   enolase.";
RL   Biochemistry 42:8298-8306(2003).
CC   -!- CATALYTIC ACTIVITY:
CC       Reaction=(2R)-2-phosphoglycerate = H2O + phosphoenolpyruvate;
CC         Xref=Rhea:RHEA:10164, ChEBI:CHEBI:15377, ChEBI:CHEBI:58289,
CC         ChEBI:CHEBI:58702; EC=4.2.1.11;
CC   -!- COFACTOR:
CC       Name=Mg(2+); Xref=ChEBI:CHEBI:18420;
CC         Evidence={ECO:0000269|PubMed:8605183};
CC       Note=Mg(2+) is required for catalysis and for stabilizing the dimer.
CC       {ECO:0000269|PubMed:8605183};
CC   -!- PATHWAY: Carbohydrate degradation; glycolysis; pyruvate from D-
CC       glyceraldehyde 3-phosphate: step 4/5.
CC   -!- SUBUNIT: Homodimer. {ECO:0000269|PubMed:12054465,
CC       ECO:0000269|PubMed:8605183, ECO:0000269|PubMed:9376357}.
CC   -!- SUBCELLULAR LOCATION: Cytoplasm {ECO:0000269|PubMed:11502169}.
CC   -!- MISCELLANEOUS: Present with 76700 molecules/cell in log phase SD
CC       medium. {ECO:0000269|PubMed:14562106}.
CC   -!- SIMILARITY: Belongs to the enolase family. {ECO:0000305}.
CC   ---------------------------------------------------------------------------
CC   Copyrighted by the UniProt Consortium, see https://www.uniprot.org/terms
CC   Distributed under the Creative Commons Attribution (CC BY 4.0) License
CC   ---------------------------------------------------------------------------
DR   EMBL; J01322; AAA88712.1; -; Genomic_DNA.
DR   EMBL; X99228; CAA67616.1; -; Genomic_DNA.
DR   EMBL; Z73039; CAA97283.1; -; Genomic_DNA.
DR   EMBL; BK006941; DAA08345.1; -; Genomic_DNA.
DR   PIR; S64586; NOBY.
DR   RefSeq; NP_011770.3; NM_001181383.3.
DR   PDB; 1EBG; X-ray; 2.10 A; A/B=2-437.
DR   PDB; 1EBH; X-ray; 1.90 A; A/B=2-437.
DR   PDB; 1ELS; X-ray; 2.40 A; A=2-437.
DR   PDB; 1L8P; X-ray; 2.10 A; A/B/C/D=2-437.
DR   PDB; 1NEL; X-ray; 2.60 A; A=2-437.
DR   PDB; 1ONE; X-ray; 1.80 A; A/B=2-437.
DR   PDB; 1P43; X-ray; 1.80 A; A/B=2-437.
DR   PDB; 1P48; X-ray; 2.00 A; A/B=2-437.
DR   PDB; 2AL1; X-ray; 1.50 A; A/B=2-437.
DR   PDB; 2AL2; X-ray; 1.85 A; A/B=2-437.
DR   PDB; 2ONE; X-ray; 2.00 A; A/B=2-437.
DR   PDB; 2XGZ; X-ray; 1.80 A; A/B=2-437.
DR   PDB; 2XH0; X-ray; 1.70 A; A/B/C/D=2-437.
DR   PDB; 2XH2; X-ray; 1.80 A; A/B/C/D=2-437.
DR   PDB; 2XH4; X-ray; 1.70 A; A/B/C/D=2-437.
DR   PDB; 2XH7; X-ray; 1.80 A; A/B=2-437.
DR   PDB; 3ENL; X-ray; 2.25 A; A=2-437.
DR   PDB; 4ENL; X-ray; 1.90 A; A=2-437.
DR   PDB; 5ENL; X-ray; 2.20 A; A=2-437.
DR   PDB; 6ENL; X-ray; 2.20 A; A=2-437.
DR   PDB; 7ENL; X-ray; 2.20 A; A=2-437.
DR   PDBsum; 1EBG; -.
DR   PDBsum; 1EBH; -.
DR   PDBsum; 1ELS; -.
DR   PDBsum; 1L8P; -.
DR   PDBsum; 1NEL; -.
DR   PDBsum; 1ONE; -.
DR   PDBsum; 1P43; -.
DR   PDBsum; 1P48; -.
DR   PDBsum; 2AL1; -.
DR   PDBsum; 2AL2; -.
DR   PDBsum; 2ONE; -.
DR   PDBsum; 2XGZ; -.
DR   PDBsum; 2XH0; -.
DR   PDBsum; 2XH2; -.
DR   PDBsum; 2XH4; -.
DR   PDBsum; 2XH7; -.
DR   PDBsum; 3ENL; -.
DR   PDBsum; 4ENL; -.
DR   PDBsum; 5ENL; -.
DR   PDBsum; 6ENL; -.
DR   PDBsum; 7ENL; -.
DR   AlphaFoldDB; P00924; -.
DR   SMR; P00924; -.
DR   BioGRID; 33505; 166.
DR   DIP; DIP-5561N; -.
DR   IntAct; P00924; 82.
DR   MINT; P00924; -.
DR   STRING; 4932.YGR254W; -.
DR   Allergome; 786; Sac c Enolase.
DR   MoonDB; P00924; Curated.
DR   MoonProt; P00924; -.
DR   CarbonylDB; P00924; -.
DR   iPTMnet; P00924; -.
DR   MetOSite; P00924; -.
DR   COMPLUYEAST-2DPAGE; P00924; -.
DR   SWISS-2DPAGE; P00924; -.
DR   UCD-2DPAGE; P00924; -.
DR   MaxQB; P00924; -.
DR   PaxDb; P00924; -.
DR   PeptideAtlas; P00924; -.
DR   TopDownProteomics; P00924; -.
DR   EnsemblFungi; YGR254W_mRNA; YGR254W; YGR254W.
DR   GeneID; 853169; -.
DR   KEGG; sce:YGR254W; -.
DR   AGR; SGD:S000003486; -.
DR   SGD; S000003486; ENO1.
DR   VEuPathDB; FungiDB:YGR254W; -.
DR   eggNOG; KOG2670; Eukaryota.
DR   GeneTree; ENSGT00950000182805; -.
DR   HOGENOM; CLU_031223_0_0_1; -.
DR   InParanoid; P00924; -.
DR   OMA; EFMIIPV; -.
DR   OrthoDB; 1093250at2759; -.
DR   BioCyc; YEAST:YGR254W-MON; -.
DR   BRENDA; 4.2.1.11; 984.
DR   Reactome; R-SCE-70171; Glycolysis.
DR   Reactome; R-SCE-70263; Gluconeogenesis.
DR   SABIO-RK; P00924; -.
DR   UniPathway; UPA00109; UER00187.
DR   BioGRID-ORCS; 853169; 6 hits in 10 CRISPR screens.
DR   EvolutionaryTrace; P00924; -.
DR   PRO; PR:P00924; -.
DR   Proteomes; UP000002311; Chromosome VII.
DR   RNAct; P00924; protein.
DR   GO; GO:0005737; C:cytoplasm; HDA:SGD.
DR   GO; GO:0005829; C:cytosol; HDA:SGD.
DR   GO; GO:0000324; C:fungal-type vacuole; IDA:SGD.
DR   GO; GO:0005739; C:mitochondrion; IDA:SGD.
DR   GO; GO:0000015; C:phosphopyruvate hydratase complex; IDA:SGD.
DR   GO; GO:0005886; C:plasma membrane; HDA:SGD.
DR   GO; GO:0000287; F:magnesium ion binding; IEA:InterPro.
DR   GO; GO:1904408; F:melatonin binding; IDA:SGD.
DR   GO; GO:0004634; F:phosphopyruvate hydratase activity; IMP:SGD.
DR   GO; GO:0006096; P:glycolytic process; IMP:SGD.
DR   GO; GO:0032889; P:regulation of vacuole fusion, non-autophagic; IDA:SGD.
DR   CDD; cd03313; enolase; 1.
DR   Gene3D; 3.20.20.120; Enolase-like C-terminal domain; 1.
DR   Gene3D; 3.30.390.10; Enolase-like, N-terminal domain; 1.
DR   HAMAP; MF_00318; Enolase; 1.
DR   InterPro; IPR000941; Enolase.
DR   InterPro; IPR036849; Enolase-like_C_sf.
DR   InterPro; IPR029017; Enolase-like_N.
DR   InterPro; IPR020810; Enolase_C.
DR   InterPro; IPR020809; Enolase_CS.
DR   InterPro; IPR020811; Enolase_N.
DR   PANTHER; PTHR11902; ENOLASE; 1.
DR   PANTHER; PTHR11902:SF1; ENOLASE; 1.
DR   Pfam; PF00113; Enolase_C; 1.
DR   Pfam; PF03952; Enolase_N; 1.
DR   PIRSF; PIRSF001400; Enolase; 1.
DR   PRINTS; PR00148; ENOLASE.
DR   SFLD; SFLDS00001; Enolase; 1.
DR   SFLD; SFLDF00002; enolase; 1.
DR   SMART; SM01192; Enolase_C; 1.
DR   SMART; SM01193; Enolase_N; 1.
DR   SUPFAM; SSF51604; Enolase C-terminal domain-like; 1.
DR   SUPFAM; SSF54826; Enolase N-terminal domain-like; 1.
DR   TIGRFAMs; TIGR01060; eno; 1.
DR   PROSITE; PS00164; ENOLASE; 1.
PE   1: Evidence at protein level;
KW   3D-structure; Cytoplasm; Direct protein sequencing; Glycolysis;
KW   Isopeptide bond; Lyase; Magnesium; Metal-binding; Phosphoprotein;
KW   Reference proteome; Ubl conjugation.
FT   INIT_MET        1
FT                   /note="Removed"
FT                   /evidence="ECO:0000269|PubMed:7005235, ECO:0000269|Ref.6"
FT   CHAIN           2..437
FT                   /note="Enolase 1"
FT                   /id="PRO_0000134062"
FT   ACT_SITE        212
FT                   /note="Proton donor"
FT                   /evidence="ECO:0000269|PubMed:12846578"
FT   ACT_SITE        346
FT                   /note="Proton acceptor"
FT                   /evidence="ECO:0000269|PubMed:12846578,
FT                   ECO:0000269|PubMed:8634301"
FT   BINDING         160
FT                   /ligand="substrate"
FT                   /evidence="ECO:0000269|PubMed:8605183,
FT                   ECO:0000269|PubMed:9376357"
FT   BINDING         169
FT                   /ligand="substrate"
FT                   /evidence="ECO:0000269|PubMed:8605183,
FT                   ECO:0000269|PubMed:9376357"
FT   BINDING         247
FT                   /ligand="Mg(2+)"
FT                   /ligand_id="ChEBI:CHEBI:18420"
FT                   /evidence="ECO:0000269|PubMed:8605183"
FT   BINDING         296
FT                   /ligand="Mg(2+)"
FT                   /ligand_id="ChEBI:CHEBI:18420"
FT                   /evidence="ECO:0000269|PubMed:8605183"
FT   BINDING         296
FT                   /ligand="substrate"
FT                   /evidence="ECO:0000269|PubMed:8605183,
FT                   ECO:0000269|PubMed:9376357"
FT   BINDING         321
FT                   /ligand="Mg(2+)"
FT                   /ligand_id="ChEBI:CHEBI:18420"
FT                   /evidence="ECO:0000269|PubMed:8605183"
FT   BINDING         321
FT                   /ligand="substrate"
FT                   /evidence="ECO:0000269|PubMed:8605183,
FT                   ECO:0000269|PubMed:9376357"
FT   BINDING         373..376
FT                   /ligand="substrate"
FT                   /evidence="ECO:0000269|PubMed:12054465,
FT                   ECO:0000269|PubMed:8605183, ECO:0000269|PubMed:9376357"
FT   BINDING         397
FT                   /ligand="substrate"
FT                   /evidence="ECO:0000269|PubMed:8605183,
FT                   ECO:0000269|PubMed:9376357"
FT   MOD_RES         119
FT                   /note="Phosphoserine"
FT                   /evidence="ECO:0007744|PubMed:17287358"
FT   MOD_RES         138
FT                   /note="Phosphoserine"
FT                   /evidence="ECO:0000250|UniProtKB:P00925"
FT   MOD_RES         188
FT                   /note="Phosphoserine"
FT                   /evidence="ECO:0000250|UniProtKB:P00925"
FT   MOD_RES         313
FT                   /note="Phosphothreonine"
FT                   /evidence="ECO:0000250|UniProtKB:P00925"
FT   MOD_RES         324
FT                   /note="Phosphothreonine"
FT                   /evidence="ECO:0000250|UniProtKB:P00925"
FT   CROSSLNK        60
FT                   /note="Glycyl lysine isopeptide (Lys-Gly) (interchain with
FT                   G-Cter in ubiquitin)"
FT                   /evidence="ECO:0000250|UniProtKB:P00925"
FT   CROSSLNK        243
FT                   /note="Glycyl lysine isopeptide (Lys-Gly) (interchain with
FT                   G-Cter in ubiquitin)"
FT                   /evidence="ECO:0000250|UniProtKB:P00925"
FT   CROSSLNK        358
FT                   /note="Glycyl lysine isopeptide (Lys-Gly) (interchain with
FT                   G-Cter in ubiquitin)"
FT                   /evidence="ECO:0007744|PubMed:22106047"
FT   MUTAGEN         40
FT                   /note="S->A: Reduces activity by 99.9%."
FT   MUTAGEN         160
FT                   /note="H->A,F,N: Reduces activity by 99%."
FT                   /evidence="ECO:0000269|PubMed:11027610,
FT                   ECO:0000269|PubMed:13678299"
FT   MUTAGEN         169
FT                   /note="E->Q: Reduces kcat over 100000-fold."
FT   MUTAGEN         208
FT                   /note="N->A: Reduces activity by 44%."
FT                   /evidence="ECO:0000269|PubMed:13678299"
FT   MUTAGEN         212
FT                   /note="E->Q: Reduces kcat over 100000-fold."
FT                   /evidence="ECO:0000269|PubMed:12846578"
FT   MUTAGEN         346
FT                   /note="K->A: Reduces kcat over 100000-fold. Abolishes of
FT                   the proton exchange reaction that initiates the enzymatic
FT                   reaction."
FT                   /evidence="ECO:0000269|PubMed:12846578,
FT                   ECO:0000269|PubMed:8634301"
FT   CONFLICT        242
FT                   /note="I -> V (in Ref. 1; AAA88712)"
FT                   /evidence="ECO:0000305"
FT   STRAND          5..12
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   STRAND          18..26
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   STRAND          29..34
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   STRAND          43..45
FT                   /evidence="ECO:0007829|PDB:1EBH"
FT   HELIX           57..59
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   HELIX           63..71
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   HELIX           73..80
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   HELIX           87..98
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   STRAND          100..102
FT                   /evidence="ECO:0007829|PDB:1EBH"
FT   TURN            104..106
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   HELIX           108..125
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   HELIX           130..138
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   STRAND          145..147
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   STRAND          152..156
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   HELIX           158..160
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   STRAND          161..164
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   STRAND          169..173
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   HELIX           180..202
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   HELIX           204..207
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   STRAND          213..215
FT                   /evidence="ECO:0007829|PDB:1P48"
FT   HELIX           222..236
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   TURN            239..241
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   STRAND          243..247
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   HELIX           250..253
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   TURN            261..264
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   HELIX           270..272
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   HELIX           276..289
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   STRAND          292..296
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   HELIX           304..311
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   TURN            312..314
FT                   /evidence="ECO:0007829|PDB:2AL2"
FT   STRAND          316..321
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   TURN            322..326
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   HELIX           328..336
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   STRAND          341..345
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   HELIX           347..350
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   HELIX           353..365
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   STRAND          369..373
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   HELIX           383..390
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   STRAND          394..397
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   HELIX           404..420
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   HELIX           421..423
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   STRAND          424..426
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   HELIX           428..430
FT                   /evidence="ECO:0007829|PDB:2AL1"
FT   HELIX           434..436
FT                   /evidence="ECO:0007829|PDB:2AL1"
SQ   SEQUENCE   437 AA;  46816 MW;  69F45214DBD375BE CRC64;
     MAVSKVYARS VYDSRGNPTV EVELTTEKGV FRSIVPSGAS TGVHEALEMR DGDKSKWMGK
     GVLHAVKNVN DVIAPAFVKA NIDVKDQKAV DDFLISLDGT ANKSKLGANA ILGVSLAASR
     AAAAEKNVPL YKHLADLSKS KTSPYVLPVP FLNVLNGGSH AGGALALQEF MIAPTGAKTF
     AEALRIGSEV YHNLKSLTKK RYGASAGNVG DEGGVAPNIQ TAEEALDLIV DAIKAAGHDG
     KIKIGLDCAS SEFFKDGKYD LDFKNPNSDK SKWLTGPQLA DLYHSLMKRY PIVSIEDPFA
     EDDWEAWSHF FKTAGIQIVA DDLTVTNPKR IATAIEKKAA DALLLKVNQI GTLSESIKAA
     QDSFAAGWGV MVSHRSGETE DTFIADLVVG LRTGQIKTGA PARSERLAKL NQLLRIEEEL
     GDNAVFAGEN FHHGDKL
//

Problem

  • obtain the information for protein P00924

  • filter the line that starts with SQ

  • show the number of amino acids and the molecular mass

The format of this line is evident from an example, but it is also described in the UniProt documentation.

The line has the format:

SQ   SEQUENCE XXXX AA; XXXXX MW; XXXXXXXXXXXXXXXX CRC64;

import requests

# Download the UniProt flat-file entry for P00924 as text
info = requests.get('http://www.uniprot.org/uniprot/P00924.txt').text

linhas = info.split('\n')

# Keep the line that starts with SQ
sq = ''
for i in linhas:
    if i.startswith('SQ'):
        sq = i

print('SQ line:')
print(sq)

# SQ   SEQUENCE XXXX AA; XXXXX MW; XXXXXXXXXXXXXXXX CRC64;
# split() on whitespace gives ['SQ', 'SEQUENCE', 'XXXX', 'AA;', 'XXXXX', 'MW;', ...]
partes = sq.split()
print(partes[2], 'amino acids')
print(partes[4], 'Da')
SQ line:
SQ   SEQUENCE   437 AA;  46816 MW;  69F45214DBD375BE CRC64;
437 amino acids
46816 Da
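
Splitting on whitespace works, but it leaves the values as strings and depends on counting positions in the list. As a minimal sketch, assuming the SQ line always follows the format shown above, a regular expression can extract the fields and convert the numbers to int. The name parse_sq is hypothetical, and the SQ line is hard-coded here so that the example runs without network access.

```python
import re

# Pattern for: SQ   SEQUENCE   437 AA;  46816 MW;  69F45214DBD375BE CRC64;
SQ_PATTERN = re.compile(
    r'^SQ\s+SEQUENCE\s+(\d+)\s+AA;\s+(\d+)\s+MW;\s+([0-9A-F]+)\s+CRC64;'
)

def parse_sq(line):
    """Return (n_aa, mw, crc64) from an SQ line, or None if it does not match."""
    m = SQ_PATTERN.match(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), m.group(3)

# SQ line copied from the P00924 entry
sq = 'SQ   SEQUENCE   437 AA;  46816 MW;  69F45214DBD375BE CRC64;'
n_aa, mw = parse_sq(sq)[:2]
print(n_aa, 'amino acids')
print(mw, 'Da')
```

Returning None for a line that does not match makes it easy to skip entries where the SQ line is missing or malformed.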

The UniProt documentation on lines starting with FT reads:

INIT_MET - Initiator methionine.

This feature key is associated with a '1' value in the 'FROM' and 'TO' fields to indicate that the initiator methionine has been cleaved off:


    FT   INIT_MET      1      1       Removed.

It is not used when the initiator methionine is not cleaved off
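
In the flat-file entry for P00924 shown earlier, this feature appears as an FT line whose second field is INIT_MET, with the "Removed" note on the following line. A minimal sketch of a check for it, assuming the entry text is already held in a string (the name init_met_cleaved and the sample lines are for illustration only):

```python
def init_met_cleaved(entry_text):
    """Return True if the entry has an FT line with the INIT_MET feature key."""
    for line in entry_text.splitlines():
        fields = line.split()
        # A feature line looks like: FT   INIT_MET        1
        if len(fields) >= 2 and fields[0] == 'FT' and fields[1] == 'INIT_MET':
            return True
    return False

# A few lines copied from the P00924 entry
sample = '''KW   Reference proteome; Ubl conjugation.
FT   INIT_MET        1
FT                   /note="Removed"
FT   CHAIN           2..437
'''
print(init_met_cleaved(sample))
```

Since the documentation states that the key is not used when the methionine is kept, this sketch reads the absence of an INIT_MET line as "not cleaved".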

Problem

For the following proteins,

Q96UH7, Q8J0N6, Q9URB4, Q9C2U0, P36580, P14540

generate a table with

AC       AA         MW       init M cleaved