Homework assignments

Important

I encourage you to discuss homework assignments with each other, but you may not view other students’ assignments or share your assignment with others.

Homework assignments will be listed here as they are assigned.

Turning in your homework by email

Your homework must always be turned in with a standardized name. That name should be <nau_id>_<homework_id>.<extension>, where <nau_id> is your NAU identifier (for example, mine is jgc53), and <homework_id> and <extension> are provided on a per-assignment basis. Assignments that are not turned in according to these specifications will lose 10%.
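
As a quick check on the naming rule above, here is a minimal Python sketch that assembles a file name from its three parts. The function name `homework_filename` is hypothetical and is not part of any course tooling; it only illustrates the <nau_id>_<homework_id>.<extension> pattern.

```python
def homework_filename(nau_id: str, homework_id: str, extension: str) -> str:
    """Build a name of the form <nau_id>_<homework_id>.<extension>."""
    # Tolerate an extension written with a leading dot, e.g. ".pdf".
    extension = extension.lstrip(".")
    if not (nau_id and homework_id and extension):
        raise ValueError("nau_id, homework_id and extension must all be non-empty")
    return f"{nau_id}_{homework_id}.{extension}"


# Example using the identifiers given on this page.
print(homework_filename("jgc53", "gc_content", "pdf"))
```

Multi-part extensions such as biom.gz pass through unchanged, since only a leading dot is removed.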

Unless otherwise noted, homework must be turned in by email to Arron (ams379@nau.edu) before class on the day it is due.

Extra credit assignment (due 5 May 2016)

Complete this assignment worksheet and the associated assigned reading (described in the worksheet). You will turn in a PDF containing the answers to the questions. You may answer some or all of the questions, and the points that you accumulate will be added to your total homework score for the semester.

Important

Homework id: clustering; Extension: pdf; For this assignment, the file I turn in would be named jgc53_clustering.pdf.

Final assignment (part 2): Interpreting and reporting QIIME results (due 10 May 2016 9:30am, 20 points)

This assignment continues on from Final assignment (part 1). You should continue working from the IPython Notebook that you developed for that assignment, this time addressing the questions in Part 3 only. For this assignment you will turn in a 2.5 to 3 page paper as a PDF. Details on the format are provided in the IPython Notebook.

Important

Homework id: qiime2; Extension: pdf; For this assignment, the file I turn in would be named jgc53_qiime2.pdf.

Final assignment (part 1): Microbiome-based forensics with QIIME (due 21 April 2016, 20 points)

For this assignment, you will run analyses and answer questions in the IPython Notebook provided here. Download that file (click the download icon on the top-right, and save it to your computer) and then upload it to the class IPython Notebook server. You should answer the questions in Parts 1 and 2 only in your copy of the IPython Notebook, and submit that notebook by email. You will also submit the rarefied BIOM table generated in your analysis as a gzipped BIOM file (instructions for locating that file are in the IPython Notebook).
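
If you need to gzip the BIOM file yourself, one possible approach uses Python's standard gzip module, sketched below. The function name `gzip_file` and the input path in the usage comment are assumptions for illustration; the actual location of your rarefied table is given in the IPython Notebook.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src: str, dest: str) -> Path:
    """Write a gzip-compressed copy of src to dest, leaving src untouched."""
    with open(src, "rb") as f_in, gzip.open(dest, "wb") as f_out:
        # Stream the bytes across so large tables are not read into memory at once.
        shutil.copyfileobj(f_in, f_out)
    return Path(dest)


# Hypothetical usage (replace the first path with your table's real location):
# gzip_file("path/to/rarefied_table.biom", "jgc53_qiime1.biom.gz")
```

Running `gzip` on the command line should give an equivalent result; the Python version is shown only because the rest of the assignment is done in the notebook.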

Important

Homework id: qiime1; Extensions: ipynb and biom.gz; For this assignment, the files I turn in would be named jgc53_qiime1.ipynb and jgc53_qiime1.biom.gz.

Assignment 4: Multiple sequence alignment and phylogeny exercises (due 24 March 2016, 20 points)

For this assignment, you will answer questions from the IPython Notebook provided here. Download that file (click the download icon on the top-right, and save it to your computer) and then upload it to the class IPython Notebook server. You should answer the questions in your copy of the IPython Notebook, and submit that notebook by email.

Important

Homework id: multiple_alignment; Extension: ipynb; For this assignment, the file I turn in would be named jgc53_multiple_alignment.ipynb.

Presentations

Important

Homework id: app; Extension: pdf; For BIO/CS 499 students, the assignment should be named <group-number>_app_slides.pdf, so for example Group 1’s assignment would be named group1_app_slides.pdf. For BIO/CS 599 students, the assignments should be named <nau_id>_app.pdf and <nau_id>_app_slides.pdf, so for example my assignments would be named jgc53_app.pdf and jgc53_app_slides.pdf.

BIO/CS 499 Group Application Report and Presentation

Each group will be pre-assigned a bioinformatics software package and associated documentation or paper two weeks before their presentation date, and will present the software in class the day they’re assigned. Every member of the group must give part of the presentation. Your presentation should answer the following questions and your slides must be turned in as a PDF (by email, with all group member names included, before class on the day of your presentation).

  1. What is the biological problem that the authors are trying to address?
  2. What is the motivation for addressing this problem?
  3. What previous work has been done in this area? Are there preexisting tools that address this problem?
  4. What computational technologies did the authors make use of to create this tool (e.g., programming language, databases, etc)?
  5. What preexisting biological resources (e.g., sequence databases) did the authors make use of (if any)?
  6. What is the input to this tool?
  7. What is the output of this tool?
  8. How did the authors test this tool? Was performance benchmarking included in their paper?
  9. How did the authors evaluate whether this tool was giving biologically meaningful results?

Your presentation will additionally include a live demo of the software where the presenters show/discuss the input data, run the application, and show/discuss the output. Your presentation should be 15 +/- 2 minutes, including the live demo. You will lose points if your presentation falls outside of this time range.

All students in a group will receive the same grade on this assignment, unless there is clear evidence that some student(s) didn’t contribute, in which case the rest of the group should discuss with the TA or professor.

BIO/CS 599 Individual Application Report and Presentation

Graduate student presentations should be 15 +/- 2 minutes and should address all of the following questions. You will lose points if your presentation falls outside of this time range. You will also turn in a 2 +/- 1/3 page report answering these questions, by email on the day of your presentation. You will lose points if your report falls outside of this length range.

  1. Background/Introduction to your research.
  2. How does bioinformatics play a role in your research?
  3. What bioinformatics tool have you used (or do you plan to use) to address your questions?
  4. What computational technologies did the authors make use of to create this tool (e.g., programming language, databases, etc)?
  5. What pre-existing biological resources (e.g., sequence databases) did the authors make use of (if any)?
  6. What is the input to this tool (ideally you will demo this using your own data)?
  7. What is the output of this tool (ideally you will demo this using your own data)?
  8. How did the authors test this tool? Was performance benchmarking included in their paper?
  9. How did the authors evaluate whether this tool was giving biologically meaningful results?
  10. Discuss the pros and cons of this software relative to related bioinformatics software.

The presentation should address all of the questions listed here and should include a live demonstration of the software where the presenters show/discuss the input data, run the application, and show/discuss the output.

Presentation groups

Assignment 3: Pairwise alignment exercises (due 18 February 2016, 20 points)

For this assignment, you will answer four questions from the IPython Notebook provided here. Download that file (click the download icon on the top-right) and then upload it to the class Jupyter Notebook server. You should answer the questions in a text or word processing document, and email that document as a PDF before class on the due date.

Point breakdown:
  • Question 1: 4 points
  • Question 2: 2 points
  • Question 3: 2 points
  • Question 4: 12 points; This is the meat of the assignment, and you should answer with 1-2 paragraphs. If you provide sufficient detail, this is also where we can score you on your process, rather than just on whether you get the correct answer.

Important

Homework id: pairwise_alignment; Extension: pdf; For this assignment, the file I turn in would be named jgc53_pairwise_alignment.pdf.

Assignment 2: BLAST exercises (due 4 February 2016, 20 points)

Using NCBI nucleotide BLAST, complete the assignment worksheet. You should turn in a PDF of that file with all answers filled in by email.

Important

Homework id: blast; Extension: pdf; For this assignment, the file I turn in would be named jgc53_blast.pdf.

Query sequences:

>Sequence1
AACAATTCATTTTTCCTGCTTTCCTAGAAAATTCTATAAAAGCTTCAAAA
TGAATTACTTGGTGATGATTAGTTTGGCACTTCTCTTCGTGACAGGTGTA
GAGAGTGTAAAAGACGGTTATATTGTCGACGATGTAAACTGCACATACTT
TTGTGGTAGAAATGCATACTGCAACGAGGAATGTACCAAGTTGAAAGGTG
AGAGTGGTTATTGCCAATGGGCAAGTCCATATGGAAACGCCTGTTATTGC
TATAAATTGCCCGATCATGTACGTACTAAAGGACCAGGAAGATGCCATGG
CCGATAAATTATAAGATGGAATGTATCCTAAGTATCAATGTTAAATAAAT
ATAATCAAAAAATT
>Sequence2
CTAATAATCCTTGGAATACTCCTATATTTTGTATAAAGAAGAAATCAGGG
AAATGGAGAATGCTAATTGATTTTAGAGAACTTAATGCAAAAACAGAAAA
AGGAGCAGAAGTCCAATTAGGATTACCTCACCCATCTGGATTACAGAAGA
GAAAGAATGTAACAGTTTTAGATATAGGAGATGCTTATTTTACCATCCCT
TTAGATCCTGATTATCAGCCCTATACTGCATTTACTTTACCATCTAAGAA
TAATCAAAGTCCAGGAAAAAGGTATATTTGGAAATCTCTTCCACAGGGGT
GGGTCTTGAGTCCCTTAATATACCAGAGCACTCTAGATAATATTCTACAA
CCATTTAGAA
>Sequence3
TCTTGGTGAGGATCCGTTGAGAACAACCCAACCGCCGCCCCATCGCCCTN
GTTAGANTNATGGCCGCGTCGGCGCTGCACCAGACCACCAGCTTCCTCNG
CACCGCCCCTCGCCGGGATGAGCTCGTCCGCCGCGTCGGCGACTCCGGTG
GCCGCATCACCATGCGCCGCACCGTCAAGAGCGCGCCCCAGAGCATCTGG
TATGGACCTGACCGTCCCAAGTNCCTGGGCCCGTTCTCGGAGCAGACGCC
ATCGTACCTGACCGGAGAGTTCCCGGGAGACTACGGGTGGGACACGGCGG
GGCTATCGGCCGACCCGGANACGTTCGCTATGAACAGGGAGCTGGANGTG
ATCCACTCNCGGTGGGCGATGCTGGGGGCGCTGGGCTGCGTCTTCCCGGA
GATCCTGTCCAANAACGGGG
>Sequence4
TTAATACATGCGAGTTGAACGTGAATTTTTTAATTAAAATGAAAGTAGCGT
ACTGGTGAGTAACACGTGAGAATCTACCTTTCAAATCAACATAAAATGTTG
AATAAAAGCTTCTAAAGCTATAAAGATATGTTTTCGTTGAAAGATGAGCTT
GCGCAAGATTAGGTAGTTGGTAAGGTAACGGCTTACCAAGCCAAAGATCTT
TAGCTGGTTTGAGAAAATGATCAGCCACATTGGAACTGAAACACAGTCCAA
ACGTAATATAACGGCAGCAGTAGGGAATTTTGAACACTGAGCGAAAGCTTG
ATTCAGCCAAGTATCGTGGATGAAGAAGGCTGTCTTTTGGTCGTAAAATCC
ATTTATATAGTCACATGAAATGTGTCTTTTATTTCGATAAAAGGAAAGATT
ATGACTTTCTATTGAAAAGTCCCGGCTAATCTCGTGCCAGCAGCCGCGGTA
ATACGAGAGGGGCAAACGATGTTTAGCATGATTGGGCGTAAAGAGCTTGTA
GATGGTTTCTTTTAATTTTATATAAAAGCTCTAAGCTTAACTTTGATTATA
TATAAAGGAAAGATAACTTGAGTTATGGAAAGGAAAGTAGAATTCTTGGAG
GAGAGGTAGAATTTGGTGATATCAAGAGGAATTCCAAAAGCGAAGGCAGCT
TTCTTGCCATATACTGACATTGAAGGGCGAAAGCGTGGGTAGCGACAGGGA
TTAGATACCCCATTAGTCCACGCCGTCAACGATGACCTTTATTTATTGGTT
TCTCTTAAAATAAATAAATTATTTTTTAGTTTGATCAGTGAAACAGTTAAC
GCGTTAAAAGGTCCGCCTGAGGAGTACGATCGCAAGATTAAAACTCAAAAG
AATAGACGGGAGCGTTCACAAGTGGTGGAGCATGAAGTTTAATGCGATACA
ACACGCAAAACCTTACCATTTTTTGATATTTTACTTATCAGTTATTTCTCA
TGAAATAATGTTTTTTACTAAAGTAAAAATTTGTTTGTATAACAGGCGTTG
CATGGCTGTCGTAAGTTCGTACTGTGAAGTGTTGGATTAATTTCCTTAACG
AACGTAACCCCTTGGTTTTGTTAAAACTAAAATCTACCGCTAGTCATAAAC
TAGAGGAAGGGAGGGATCACGTCAAGTCCTCATGACCCTTATAAAATGGGC
TACGCTTTTCGTGCTACAATGATAAATACAATAAGAAGCAATAACGAAAGT
TGGAGCAAATCTATAAAATTTATCTCAGTTCAGATTGTTCTCTGCAATTCG
AGAACATGAAGATGGAATCACTAGTAATCGTAGATCAGCATGCTACGGTGA
ATATGTAATTACGCTCTGTACTCACAGCCCGTCACACAATGGAAGTAAAAT
GTATCGGAAATTTGTCAAATATTGTTAGATTTTCTTTTTTAAATTTATTGA
ATAAATTATTTTAATTAATATCTTTCAACTAAATGGGAACTGATGATATGT
TTCATGACTGTTGTGAAGTCGTAACAAGGTAGCGCTAGCGGAAGCTGGTGC
TGGAT
>Sequence5
TTCCGGTTGATCCTGCCGGACCCGACTGCTACTTGGGTGAGAATAAGCCAT
GCAAGTCGAATGGAATACCAAAATATTCCATAGCAAACTGCTCAATAACAC
GTGATCAACTTACCCTATGGAAAACAATAACCTCTGGAAACGGAGGATAAT
GGTTTATAGTTGAAAAGGCTTGGAAAAGTTTTTCAATAAAAGGGAATAATA
AAAATGGTTATTATTTTGCCATAGGATAGGATTGCGGTCGATCATGGCTGT
TGGTGAGGTAATGGCTCACCAAACCAATAATCGATAGGGGCCGTGAGAGCG
GGAGCCCCGAGATGGGTACTGAGACAGCGACCCAGGCCTTACGAGGTGCAG
CAGGCGCGAAAACTCCGCAATACGCGAAAGTGTGACGGGGTTACCCAAGGT
GCTTAATTTTTAAGCTGTGGTAAGTGTGTAATGTACCTTACTAGAAAGGAG
AGGGCAAGGCTGGTGCCAGCCGCCGCGGTAAAACCAGCTCTTCAAGTGGTC
GGGATAATTATTGGGCTTAAAGTGTCCGTAGCTTGTATAATAAGTTCCTGG
TAAAATCTAATAGCTTAACTATNAGTATGCTAGGAATACTGTTGTACTAGA
GGGCGGGAGAGGTCTGAGGTACTTCAGGGGTAGGGGTGAAATCCTATAATC
CTTGAAGGACCACCAGTGGCGAGGGCGTCAGACTGGAACGCGCCTGANAGT
GAGGGACGAAAGCCAGGGGAGCGAACCGGATTAGATACCCGGTAGTCCTGG
CCGNTAAACGATGCACACTAGGTGTGGTATGGCTATTGAGCCCATATCAGT
GCCGAAGGGAAACCCATTAAGCGTGCCGCCTGGGGAAGTACGGTCGCAAGG
CTAAAACTAAAAGGAATTGGCGGGGGAGCACCACAAAGGGGTGAAGCCTGC
GGTTCAATTGGACTCAACGCCGGGAAAACTTCCCAGGGGAGACAGCAGAAA
TGAAAAGTCAGGTTGACGACCTTACTTAACGAGCTGAGAGGAGGGTGCCAT
GGCCGTCGCCAGTTCGTGCCGTGAGGTATCCTGTTAAGTCAGGCAACGAAC
GAGACCCGTGCTTTTAGTTCCCAGCAAGACGTCACGACTTCGATGGGAACA
CTAAAAGGACCGCCATCGATAAGATGGAGGAAGGAGCGGGCCAAGGCAGGT
CAGTATGCCCCGAAACCCCTGGGCCACACGCGGGCTGCAATGGTATGAACA
ATGGGCTGTAACTCCGAAAGGAGAAACCAATCCCGAAATCATATCTCAGTT
GGGATTGTTGGCTGTAACTCGCTGACATGAACGTGGAAT
>Sequence6
AGAGTTTGATCCTGGCTCAGGATGAACGCTAGCTACAGGCTTAACACATGC
AAGTCGAGGGGCAGCATGGTGTATCAATATATCTATGGCGACCAGCGCACC
GGTGATGCACACCTCTCCTACCTGCCCCTTACTCCGGGATGATCTTTCTAA
AAAAATATTACTACTCCATGGTATTACCGAAAAACGTCTTTTTGTTGTTTA
AAAACTTCGATGGTGGAAGGTGATGCTTTCTATTATATACTTGGTGGGGTA
ACAGCCCACCACCTCAGCGATGAATAGGGGTTCTAATAAGAAGGTCCCCCC
CATGGTAACTGGGCCCCGGTCCAAATTCTTCGGGAAGCCACCAGTGAGGAT
TATTGTTCAATGGCGGAGATTTTGACCCAGCCCAAGTAGCGTGAAGGATGA
CTGCTCCCATAGGTGGTAAACTTCTTTTATATGGGAATAAAGTGAGTCACG
TGTGTCTTTTTGTATGTATCATATGAATAAGGATCGGCTAACTCCGTGCCA
GCAGCCGCGGTAATACGGAGGATTCGAGCGTTATCCGGATTTATTGGGTTT
AAAGGGAGCGTAGGCGGTTTGTTAAGTCAGTGGTGAAAGTTTGGGGCTCAA
CCGTGAAATTGCATTTGATACTGGCGGTCTTGAGTGCAGTAGAGGTGGGCG
GAATTTGTGGTGTAGCGGTGAAATGCTTAGATATCATGCAGAACTCCGATT
GCGAAGGCAGCTCACCGGAGTGTATCTGACGTTGAGGCTCGAAAGTGTGGG
TATCAAACAGGATTAGATACCCTGGTAGTCCACACAGTAAAGAAGGAATAT
TGTCGTTGTGGGATCTCCATTAAGGGGTCAAGGGAAAGCATTAATTATTCC
CCTGGGGGAGTAGTCCGCCAGAGGTGAAATTAAAAGAAATGGAGGGGGGCC
GGCCCAAGGGAAGGACCATGTGGTTTAATTGGAGGATAGGGGAGGACCTTT
CCCGGGGTTGAAAGTGCAAATGAATTATGGGGAGAGCCATTCCCTTCAAGG
CATGAGAGAAGGTGCTGCATGGTTGTCGTCAGCTCGTGCCGTGAGGTGTCG
GGTTAAGTCCCATAACGAGCGCAACCCTTATCTTCAGTTACTATCAGGTCA
AGCTGAGCACTCTGGAGAGACTGCCGTTGTAAGATGAGAGGAAGGTGGGGA
TGACGTCAAATCAGCACGGCCCTTACGTCCGGGGCTACACACGTGTTACAA
TGGGGGGTACAGAAGGCAGCTACCCAGCGACAGGATGCCAATCCCAAAAAC
CTATCTCAGTTCGGATTGAAGTCTGCAACCCGCCTTCGTGAAGTTGGATTC
GCTAGTAATCGCGCATCAGCCATGGCGCGGTGAATACGTTCCCGGGCCTTG
CACACACCGCCCGTCA
>Sequence7
GATGAACGCTGGCGGCGTGCCTAATACATGCCAGTCGAGCGAACTTATGAT
AAGCTTGCTTCTCTGATGTTAGCGGCGGACAGGTGAGTAACGCTTGGGTAA
CCTACCTATAACAGTGGGATAACTCCGGAAAACCGGGGCTAATACCGGATA
ATATATTGAACCGCATGGTTCAATGTTGAAAGACGGTTTCGGCTGTCTCTT
ATAGATGGACCCTCGCCCCATTATCTATTTGGTAAGGGAACAGCTTACCGA
GGCAACGAGACGTAACCCACCTGAGAGGGTGATCGGCCACCCTGCAACTGA
GACCCGGTCCACACTCCTAACGCAGGCAGCAGGAAGGAATCTTCCACCATG
GGCGAAAGCCTGACGGATCACCGCCCCGCGACTGATGAATGACTTAGGATC
TCAAATCTCTGTTGTCAGGGAAGAACAAATATGTTAGATACTGAACAAATC
TTGACCGCACCTCACCATAAAGCCACGGCTAACTACGTGCCAGCAGCCGCG
GTAATACGTAGGCGGCAATCGTCATCCGGAATTATTGGGCGTAAAGCGCGC
GTAGGCGTTTTCTTTAGTCTGATGTGACAGCCCGCGCCTCAGCCGTGGAGC
GTCATTGGAAACTGGGGAACTTGAGTGCAGAGGAGAGTGGAATTCCATGTG
TAGCGGTGAAATGCGCAGAGATATGGAAGAACACCAGTGGCGAAGGCGGCT
CTCTGGTCTGTAACTGACGCTGATGTGCGAAAGCGTGGGGATCAAACAGAA
TTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGGTAAGTGTTAGGG
TGTTTGCGCTCCTTAGTGCTGCAGCTAACGCATTAAGCACTCCGCTCGGGG
AGTGCGACTGCAAGGTTGAGATTCAAATGAATTGACGGGACCCGCACAAGC
GGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACATTAACAAATCT
TGACATCGTCAGATCGCTCTAGAGATAGAGTTTTAGCTTTCGGTGGACAAA
GTGACAGGTGGTGCATGGTTGTCGTCAGCTAGTGTCGTGAGATGTTGGGTT
AAGTACAGTGCAACGAGCGCAACCCTTAAGTTTAGTTGCCATCATTAAGTT
GGGCACTATTGGTTGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGAC
GTCAAATCATCATGCTCCTTATGATTTGGGGTACACAAGTGGTGCAATGGA
TAATACGAAGGGCAGTGAACCCGTGAGGTCAAGCAAATCCTATAAAATTAT
TTTCAGTTGGGATTGTAGTATGCAACTAGTCTACATGAAGAAGGAATAGTT
AGTAATAGTAGATCAGCATGATACGGTGAATAAGTTCCTGGGTGTCGTACA
CCCCGCCCGTCACCCCACCAGAGTTTGTAACACCAGAAGCCGGTGGAGTAA
CATTTTATTAGGAGCTAGCCGTCGAAGGTGGGAC
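
The query sequences above are in FASTA format: each record starts with a header line beginning with ">", followed by one or more lines of sequence. If you want to confirm that you copied each record completely before pasting it into BLAST, a minimal Python sketch like the following may help. The function name `parse_fasta` is hypothetical, and the example text contains only short excerpts from the start of Sequence1 and Sequence2, not the full queries.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Return {record_id: sequence} for FASTA-formatted text."""
    records: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The record id is the first whitespace-delimited token after ">".
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {name: "".join(chunks) for name, chunks in records.items()}


# Toy excerpts only; paste the full records to check the real queries.
example = """>Sequence1
AACAATTCAT
TGAATTACTT
>Sequence2
CTAATAATCC
"""

for name, seq in parse_fasta(example).items():
    print(name, len(seq))
```

Comparing the reported lengths against the records on this page is a simple way to catch a truncated copy and paste.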

Assignment 1: GC content (due 26 January 2016, 10 points)

Download a genome and compute its GC content. Copy or download the assignment, fill in your answers, and turn the assignment in by email as a PDF. While you will get started on this assignment in class (optionally in small groups), you will complete the questions in the assignment yourself.

Note that there are various ways that you can just look up the GC content, including via the IMG website. I’m asking you to compute it, and you’re being graded on your descriptions. Getting the right answer is a bonus (i.e., if you spend a couple of hours trying, and get it wrong, you’ll be graded on your well-documented effort, not your final answer).

Hints: Start with the NCBI Genome Browser, and work with a bacterial, archaeal or viral genome.

Be creative - there are many ways to achieve this.

Important

Homework id: gc_content; Extension: pdf; For this first assignment, the file I turn in would be named jgc53_gc_content.pdf.