Bioinformatics & Molecular Modelling
Assessment 1
Q1
Ferritin is an iron storage protein expressed in most cells when needed, i.e. when cells accumulate an excess of iron.
a) Use the NCBI or EBI portals to retrieve one file for each of the several different forms of ferritin found in humans. Each file should contain the complete mRNA sequence.
b) Compile a table, similar to the one from tutorial 1, comparing the sequence elements of the mRNA of each different ferritin that you find. Include extra columns indicating the length of the protein and the type of ferritin.
c) Retrieve files for each of the genes of the ferritins and compare the structure of the genes. Comment on your findings.
d) What is meant by a pseudogene?
e) Are there any ferritin pseudogenes in the human genome?
35 marks
Q2
Cytochrome P450s are a group of heme-thiolate monooxygenases. In liver microsomes, these enzymes are involved in an NADPH-dependent electron transport pathway. They oxidize a variety of structurally unrelated compounds, including steroids, fatty acids, and xenobiotics.
In this question you will explore the relationship between the Cytochrome P450 2E1 sequences of a number of different species.
(i) Locate the full-length human Cytochrome P450 2E1 sequence. Run a BLAST search for this sequence against the UniProt/Swiss-Prot database and identify the same protein from 7 other species, each with close similarity to the human sequence (make sure they are full-length sequences).
(ii) Give the Accession number for each protein sequence identified, together with the species. Give the percentage identity for each of the 7 sequences with that of the human sequence. State E values and the length of each sequence.
(iii) For all 8 sequences run a multiple sequence alignment using the program Clustal Omega. Submit your sequence alignment. How many positions along the multiply aligned sequences are fully conserved between species?
(iv) Display both the cladogram and phylogram trees for the aligned sequences and submit with the assessment.
Briefly discuss the evolutionary relationship between the 8 species as indicated by the phylogram and cladogram. Which species is the closest relation to the human species?
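As a cross-check on a manual count for part (iii), the number of fully conserved positions can be tallied from the alignment itself. The sketch below is one possible approach, not a required method: it assumes the alignment has been saved in aligned FASTA format, uses `-` as the gap character, and the function names `read_fasta_alignment` and `count_conserved_columns` are illustrative only. In Clustal-format output the same positions are marked with `*` in the consensus line.

```python
def read_fasta_alignment(text):
    """Parse aligned FASTA text into {identifier: aligned sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the identifier.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def count_conserved_columns(aligned):
    """Count columns where every sequence has the same residue and no gap."""
    seqs = list(aligned.values())
    if not seqs:
        return 0
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    conserved = 0
    for column in zip(*seqs):
        residues = {r.upper() for r in column}
        # Assumption: '-' is the gap character; gapped columns never count.
        if len(residues) == 1 and "-" not in residues:
            conserved += 1
    return conserved


# Toy alignment (made-up sequences, for illustration only).
example = """>seqA
MKT-AL
>seqB
MKTLAL
>seqC
MRT-AL
"""
print(count_conserved_columns(read_fasta_alignment(example)))
```

For the toy alignment, columns 1, 3, 5 and 6 are identical in all three sequences, so the count is 4.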
30 marks
Q3
Detecting remote homologs with BLAST and PSI-BLAST.
The NCBI website (http://www.ncbi.nlm.nih.gov) gives the option to run both BLAST and PSI-BLAST for a query protein sequence. For this question you need to use the NCBI website to run both BLAST and PSI-BLAST.
The enzyme adenosine deaminase (UniProt accession number P03958) and the enzyme guanine deaminase (UniProt accession number P76641) perform a similar function and are remote homologs, both belonging to the SCOP superfamily metallo-dependent hydrolase. The two sequences have a percentage identity of only 14%.
Perform a protein-protein BLAST search using the adenosine deaminase sequence (UniProt accession number P03958), searching against the UniProtKB/Swiss-Prot database. Search the results for the guanine deaminase enzyme (UniProt accession number P76641). Now repeat using PSI-BLAST and compare your results with those obtained from protein-protein BLAST.
Discuss what you observe from the BLAST and PSI-BLAST searches. Discuss which of the two search methods proved more effective and why. Include output as appropriate to illustrate your answer, including the pairwise alignment for the two sequences generated from your work.
35 marks
AS6056 Bioinformatics & Molecular Modelling
Semester A 2013
Assessment 2 – Data Analysis
PLEASE READ
Illustrate all of your answers FULLY, including ALL details of what you did, at what websites, and why. Include all relevant output from online programs/servers as appropriate to provide evidence of what you did. Accession numbers of all sequence files must be given. Any references used should be cited in your answer.
Q1
Hepatitis C is a serious disease that affects about 200 million people worldwide. No anti-Hepatitis C Virus (HCV) vaccine or specific anti-viral drugs are available today. Non-structural protein 3 (NS3) of Hepatitis C virus is a bifunctional serine protease/helicase, and the protease has become a prime target in the search for anti-HCV drugs.
Locate within the Protein Data Bank the structure of the NS3 protease domain of the Hepatitis C Virus complexed with an inhibitor. Download the coordinates for the structure and use RasMol to investigate the structure.
(a) Identify the active site residues for your chosen PDB entry.
(b) Produce an image of the structure that you think clearly illustrates the major structural features within the complex.
(c) Identify those interactions which exist between the inhibitor and the enzyme. Illustrate with appropriate images how the inhibitor binds close to the enzyme active site.
(d) Discuss how molecular modelling studies could be carried out to design improved inhibitors for the NS3 protease domain of HCV.
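Alongside visual inspection for part (c), a simple distance search over the downloaded coordinates can list which enzyme residues lie close to the inhibitor. The following is a minimal sketch under stated assumptions: the inhibitor is stored as HETATM records under a single residue name (some peptide inhibitors are instead stored as ATOM records in a separate chain, which this sketch would miss), the 4.0 Å cutoff is an arbitrary illustrative choice, and the function names are hypothetical. It reads the standard fixed-column PDB coordinate fields.

```python
import math


def parse_atoms(pdb_text):
    """Extract ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        atoms.append({
            "record": record,
            "atom": line[12:16].strip(),      # atom name, columns 13-16
            "resname": line[17:20].strip(),   # residue name, columns 18-20
            "chain": line[21],                # chain identifier, column 22
            "resseq": int(line[22:26]),       # residue number, columns 23-26
            "xyz": (float(line[30:38]),       # x, columns 31-38
                    float(line[38:46]),       # y, columns 39-46
                    float(line[46:54])),      # z, columns 47-54
        })
    return atoms


def residues_near_ligand(pdb_text, ligand_resname, cutoff=4.0):
    """Return sorted (chain, number, name) of residues near the ligand.

    A residue is reported if any of its atoms lies within `cutoff`
    angstroms of any ligand atom. The cutoff is an assumed value.
    """
    atoms = parse_atoms(pdb_text)
    ligand = [a for a in atoms
              if a["record"] == "HETATM" and a["resname"] == ligand_resname]
    protein = [a for a in atoms if a["record"] == "ATOM"]
    near = set()
    for p in protein:
        for lig in ligand:
            if math.dist(p["xyz"], lig["xyz"]) <= cutoff:
                near.add((p["chain"], p["resseq"], p["resname"]))
                break
    return sorted(near)
```

To use it, read the downloaded file into a string and pass the inhibitor's three-letter residue name as it appears in the HETATM records of your chosen entry. A contact list like this only suggests candidates; the type of each interaction (hydrogen bond, hydrophobic contact, covalent link) still has to be judged from the geometry.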
30 marks
Q2
You are told by a colleague that a new website has appeared which predicts targets for microRNAs (URL: www.targetscan.org). You are asked for your opinion of the website.
a) Decide on three criteria to assess the website and write a brief summary of the usefulness of the website to bioinformaticians, based on those three criteria.
b) Use the website to find the miRNA(s) predicted to interact with the mRNA of human transferrin receptor 1 and write a brief summary of your findings.
c) Retrieve the mRNA / cDNA sequence of human transferrin receptor (TfR) from the EMBL / ENA file X01060 and identify the IREs in the mRNA. You can use the paper “Chicken transferrin receptor gene: conservation 3′ noncoding sequences and expression in erythroid cells” Nucleic Acids Research vol 17, p3763, 1989 posted on WebLearn to help you. Summarise your findings.
d) Using an RNA folding programme, plot the local folding of the 3′ UTR of the human TfR in the following regions:
i) The binding site of the miRNAs
ii) The first two IREs
iii) The region between the second and third IREs.
Discuss your findings in relation to the expectation that (i) and (iii) are single-stranded and (ii) has significant secondary structure.
[Note: plots of local folding of RNA were discussed in the lecture on RNA Structure and Function, slides 60 and 62. It is recommended to use a sliding window of 20 bases and calculate folding energies every 5 to 10 bases.]
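The sliding-window procedure recommended in the note can be sketched as follows. This is an illustrative outline only: the folding energy itself must come from whichever RNA folding programme you choose, so it is represented here by a caller-supplied function `energy_fn`, which is a hypothetical placeholder and not part of any real library. The defaults (window of 20 bases, step of 5) follow the note above.

```python
def sliding_windows(sequence, window=20, step=5):
    """Yield (1-based start position, subsequence) for each full window."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for start in range(0, len(sequence) - window + 1, step):
        yield start + 1, sequence[start:start + window]


def local_folding_profile(sequence, energy_fn, window=20, step=5):
    """Return [(start position, energy)] for each window.

    `energy_fn` is a placeholder: it should take a subsequence and return
    the folding energy reported by your chosen folding programme.
    """
    return [(pos, energy_fn(sub))
            for pos, sub in sliding_windows(sequence, window, step)]


# Illustration with a stand-in scoring function (NOT a real folding energy):
# it simply counts G and C bases so that the example runs without any
# external software.
def gc_count(sub):
    return sum(sub.upper().count(base) for base in "GC")

for pos, score in local_folding_profile("GGGCCCAUAUAUAUAUAUAUGGGCCCAUAU", gc_count):
    print(pos, score)
```

Plotting energy against window position for each region then gives the local folding profile. Regions expected to be single-stranded should show energies close to zero, while structured regions such as stem-loops should show distinctly more negative values.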
40 marks
Q3
Using the 509 amino acid Tyrosine-protein kinase from Homo sapiens (UniProt sequence entry P06239), determine the domains present within this protein sequence, using your choice of domain database.
Run homology modelling for this sequence using SWISS-MODEL to obtain a 3-dimensional structure for this sequence.
DISCUSS, in detail, the results of the modelling that you obtain, including a discussion of all output generated.
Download the coordinates of what you consider to be the best model obtained, as a Protein Data Bank (*.pdb) file, and create an image of your modelled structure using RasMol or Swiss-PdbViewer. Include your image together with the secondary structure predicted for that model.