AS6056 Bioinformatics & Molecular Modelling
2013-2014
Assessment 1
Illustrate all of your answers FULLY, including ALL details of what you did, at what websites, and why. Include all relevant output from online programs/servers as appropriate to provide evidence of what you did. Accession numbers of all sequence files must be given. Any references used should be cited in your answer.
Q1
Ferritin is an iron storage protein expressed in most cells when needed, i.e. when cells accumulate an excess of iron.
a) Use the NCBI or EBI portals to retrieve one file for each of the several different forms of ferritin found in humans. Each file should contain the complete mRNA sequence.
b) Compile a table, similar to the one from tutorial 1, comparing the sequence elements of the mRNA of each different ferritin that you find. Include extra columns indicating the length of the protein and the type of ferritin.
c) Retrieve files for each of the genes of the ferritins and compare the structure of the genes. Comment on your findings.
d) What is meant by a pseudogene?
e) Are there any ferritin pseudogenes in the human genome?
35 marks
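For part b), the lengths of the sequence elements can be worked out from the CDS coordinates given in each mRNA record. The following is a minimal sketch only, assuming the CDS is annotated as 1-based inclusive coordinates (as in a GenBank CDS feature written start..end) and that the CDS includes the stop codon. The function name mrna_elements and the numbers in the example are made up for illustration and are not taken from any ferritin record.

```python
def mrna_elements(mrna_length, cds_start, cds_end):
    """Return the lengths of the 5' UTR, CDS and 3' UTR (nucleotides)
    and of the encoded protein (amino acids).

    Assumptions: coordinates are 1-based and inclusive, and the CDS
    includes the stop codon.
    """
    if not (1 <= cds_start <= cds_end <= mrna_length):
        raise ValueError("CDS coordinates must lie within the mRNA")
    cds_length = cds_end - cds_start + 1
    if cds_length % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    return {
        "five_prime_utr": cds_start - 1,
        "cds": cds_length,
        "three_prime_utr": mrna_length - cds_end,
        # One codon is the stop codon, which encodes no amino acid.
        "protein": cds_length // 3 - 1,
    }


# Made-up coordinates for illustration only, not a real ferritin record.
example = mrna_elements(mrna_length=1000, cds_start=101, cds_end=400)
print(example)
```

The values you enter in your table should always be checked against the annotation in the retrieved file itself.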
Q2
Cytochrome P450s are a group of heme-thiolate monooxygenases. In liver microsomes, these enzymes are involved in an NADPH-dependent electron transport pathway. They oxidize a variety of structurally unrelated compounds, including steroids, fatty acids, and xenobiotics.
In this question you will explore the relationship of the Cytochrome P450 2E1 sequence for a number of different species.
(i) Locate the full-length human Cytochrome P450 2E1 protein sequence. Run a BLAST search with this sequence against the UniProt/Swiss-Prot database and identify the same protein from 7 other species with close similarity to the human sequence (make sure they are full-length sequences).
(ii) Give the Accession number for each protein sequence identified, together with the species. Give the percentage identity for each of the 7 sequences with that of the human sequence. State E values and the length of each sequence.
(iii) For all 8 sequences, run a multiple sequence alignment using the program Clustal Omega. Submit your sequence alignment. How many positions along the multiply aligned sequences are fully conserved between species?
(iv) Display both the cladogram and phylogram trees for the aligned sequences and submit with the assessment.
Briefly discuss the evolutionary relationship between the 8 species as indicated by the phylogram and cladogram. Which species is the closest relation to the human species?
30 marks
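The two quantities asked for in parts (ii) and (iii) can be made concrete with a short sketch. It assumes percent identity is defined as identical positions divided by alignment length (the form in which BLAST reports identities), and that a fully conserved position is a column with the same residue in every sequence and no gaps (the columns Clustal Omega marks with an asterisk). The function names and the toy alignment are hypothetical and are not real Cytochrome P450 data.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two aligned sequences (gaps written as '-').

    Assumed definition: identical positions / alignment length * 100.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned, equal length, non-empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def count_conserved_columns(aligned_seqs):
    """Count columns with one identical residue in every sequence, no gaps."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    conserved = 0
    for column in zip(*aligned_seqs):
        if "-" not in column and len(set(column)) == 1:
            conserved += 1
    return conserved


# Toy alignment for illustration only, not real P450 sequences.
toy_alignment = ["MKV-LA", "MKVTLA", "MRV-LA"]
pair_identity = percent_identity(toy_alignment[0], toy_alignment[1])
n_conserved = count_conserved_columns(toy_alignment)
print(round(pair_identity, 1), n_conserved)
```

In your answer, report the values given by the BLAST and Clustal Omega output; a calculation like this is only a cross-check.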
Q3
Detecting remote homologs with BLAST and PSI-BLAST.
The NCBI website (http://www.ncbi.nlm.nih.gov) gives the option to run both BLAST and PSI-BLAST for a query protein sequence. For this question you need to use the NCBI website to run both BLAST and PSI-BLAST.
The enzyme adenosine deaminase (UniProt accession number P03958) and the enzyme guanine deaminase (UniProt accession number P76641) perform a similar function and are remote homologs, both belonging to the SCOP superfamily metallo-dependent hydrolase. The two sequences have a percentage identity of only 14%.
Perform a protein-protein BLAST search using the adenosine deaminase sequence (UniProt accession number P03958) against the UniProtKB/Swiss-Prot database. Search the results for the guanine deaminase enzyme (UniProt accession number P76641). Now repeat the search using PSI-BLAST and compare your results with those obtained from protein-protein BLAST.
Discuss what you observe from the BLAST and PSI-BLAST searches. Discuss which of the two search methods proved more effective and why. Include output as appropriate to illustrate your answer, including the pairwise alignment of the two sequences generated from your work.
35 marks
Submission deadline is 15th November 2013 at 3pm to U/G registry.