After reviewing the joined read results for the pickle box analysis, I decided to combine the forward reads from both runs and avoid joining paired reads altogether. The reverse reads were pretty poor across the board, so we were losing way too many sequences and joining others incorrectly.
Here is the R script regarding quality: http://rpubs.com/jfmeadow/27882
Mostly it came down to really low reverse read quality that tanked the total output number, as seen here:
The x and y axes here are equal at the top sequence count. Thus the forward read in the second run had the most sequences, and by combining both forward runs, I can get the most from the total number of sequences.
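To illustrate why poor read quality can tank the output count: the workflow later in these notes filters with usearch's -fastq_maxee 0.5, which discards reads whose expected number of errors exceeds the threshold. A minimal Python sketch of that calculation follows, assuming Phred+33 quality encoding. The function names and the two quality strings are hypothetical, chosen only for illustration.

```python
def expected_errors(quality, offset=33):
    """Sum of per-base error probabilities for one read's quality string.

    Each character encodes a Phred score Q (assumed Phred+33), and the
    error probability for that base is 10 ** (-Q / 10).
    """
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def passes_maxee(quality, max_ee=0.5):
    """True if the read would be kept under an expected-error ceiling."""
    return expected_errors(quality) <= max_ee


# Hypothetical quality strings: 'I' encodes Q40, '#' encodes Q2 (Phred+33).
good_read = "I" * 250
poor_tail = "I" * 200 + "#" * 50

for name, qual in [("good_read", good_read), ("poor_tail", poor_tail)]:
    print(name, round(expected_errors(qual), 3), passes_maxee(qual))
```

A read with a long low-quality tail fails the filter even when most of its bases are excellent. That matches the pattern of reverse reads dragging down the joined output.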
I've been trying to find alternative methods to the standard QIIME OTU clustering. One candidate is UPARSE (published here).
Here is the workflow I am currently using to process sequence data using a combination of QIIME and UPARSE. Ann W put this together based on Mike Robeson's post here.
The only step that takes a lot of time is the Python script that builds the OTU table. I have no idea why it is so slow, and it might be worth rewriting that script to speed things up.
# Split libraries with QIIME
split_libraries_fastq.py -v -q 0 --store_demultiplexed_fastq -i $COMBINED/seqs.fastq -b $COMBINED/barcodes.fastq -o splitLib/ -m map.txt --barcode_type 16 # -n 300

# get quality stats
usearch -fastq_stats splitLib/seqs.fastq -log splitLib/seqs.stats.log

# remove low quality reads - trimmed short seqs - presumably didn't join correctly.
mkdir qF
usearch -fastq_filter splitLib/seqs.fastq -fastq_maxee 0.5 -fastaout qF/seqs.filtered.tmp.fasta -fastq_minlen 400 # -fastq_trunclen 296
sed 's/>/>barcodelabel=/' qF/seqs.filtered.tmp.fasta > qF/seqs.filtered.fasta

# dereplicate sequences. Last step with files separate.
mkdir deRep
usearch -derep_fulllength qF/seqs.filtered.fasta -output deRep/seqs.filtered.derep.fasta -sizeout

# filter singletons
mkdir filterSingles
usearch -sortbysize deRep/seqs.filtered.derep.fasta -minsize 2 -output filterSingles/seqs.filtered.derep.mc2.fasta

# cluster OTUs
mkdir OTUs
usearch -cluster_otus filterSingles/seqs.filtered.derep.mc2.fasta -otus OTUs/seqs.filtered.derep.mc2.repset.fasta

# reference chimera check
mkdir chiCheck
usearch -uchime_ref OTUs/seqs.filtered.derep.mc2.repset.fasta -db scripts/gold.fa -strand plus -nonchimeras chiCheck/seqs.filtered.derep.mc2.repset.nochimeras.fasta

# label OTUs using python script from UPARSE
mkdir labelOTUs
python scripts/fasta_number.py chiCheck/seqs.filtered.derep.mc2.repset.nochimeras.fasta OTU_ > labelOTUs/seqs.filtered.derep.mc2.repset.nochimeras.otus.fasta

# match original quality filtered reads back to otus - this is with bash derep workaround.
mkdir matchOTUs
usearch -usearch_global qF/seqs.filtered.fasta -db labelOTUs/seqs.filtered.derep.mc2.repset.nochimeras.otus.fasta -strand plus -id 0.97 -uc matchOTUs/otu.map.uc

# make otu table
mkdir otuTable
# python scripts/uc2otutab.py matchOTUs/otu.map.uc > otuTable/seqs.filtered.derep.mc2.repset.nochimeras.otu-table.txt
python scripts/uc2otutab_jl.py matchOTUs/otu.map.uc > otuTable/seqs.filtered.derep.mc2.repset.nochimeras.otu-table.txt
#### still slow - running pbs script Monday afternoon.

# convert to biom
biom convert --table-type="OTU table" -i otuTable/seqs.filtered.derep.mc2.repset.nochimeras.otu-table.txt -o otuTable/seqs.filtered.derep.mc2.repset.nochimeras.otu-table.biom

# assign taxonomy
assign_taxonomy.py -t gg_13_5_otus/taxonomy/97_otu_taxonomy.txt -r gg_13_8_otus/rep_set/97_otus.fasta -i labelOTUs/seqs.filtered.derep.mc2.repset.nochimeras.otus.fasta -o assigned_taxonomy

# add taxonomy to BIOM table
biom add-metadata --sc-separated taxonomy --observation-header OTUID,taxonomy --observation-metadata-fp assigned_taxonomy/seqs.filtered.derep.mc2.repset.nochimeras.OTUs_tax_assignments.txt -i otuTable/seqs.filtered.derep.mc2.repset.nochimeras.otu-table.biom -o otuTable/otu_table.biom
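Since the OTU table step is the slow one, here is a rough sketch of how a .uc file could be tallied into an OTU table in a single pass with dictionary counters. This is not the uc2otutab.py or uc2otutab_jl.py script, and the original has not been profiled, so it may not address the real bottleneck. It assumes the standard 10-column tab-separated .uc layout, with hit records marked H, the query label in column 9 and the target label in column 10. It also assumes read labels look like barcodelabel=SampleID_N after the sed step above. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def sample_from_label(label):
    """Extract the sample ID from a read label.

    Assumed layout (hypothetical): "barcodelabel=SampleA_12 extra info",
    where the trailing "_12" is a per-read counter.
    """
    token = label.split()[0].split(";")[0]
    if token.startswith("barcodelabel="):
        token = token[len("barcodelabel="):]
    return token.rsplit("_", 1)[0]


def uc_to_otu_counts(lines):
    """Tally reads per OTU per sample from .uc lines in one pass."""
    counts = defaultdict(Counter)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Keep only hit records; skip no-hit (N) and malformed lines.
        if len(fields) < 10 or fields[0] != "H":
            continue
        otu = fields[9].split(";")[0]
        counts[otu][sample_from_label(fields[8])] += 1
    return counts


def format_otu_table(counts):
    """Render the counts as a tab-delimited table, one row per OTU."""
    samples = sorted({s for per_otu in counts.values() for s in per_otu})
    rows = ["\t".join(["OTUId"] + samples)]
    for otu in sorted(counts):
        rows.append("\t".join([otu] + [str(counts[otu][s]) for s in samples]))
    return "\n".join(rows)
```

In use, the open file handle for matchOTUs/otu.map.uc would be passed to uc_to_otu_counts, and the string from format_otu_table written to the otuTable/ output file. Memory grows with the number of distinct OTU and sample pairs, not with the number of reads.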
We received great feedback on the knitr documents that were submitted with the Lillis microbial surface paper that was published in Microbiome Journal. The editors actually wrote a great piece about our efforts! So I've been asked to speak recently about how to pull this off. This has given me an opportunity to create some teaching materials. They are mostly composed of a small subset of those data, along with analysis scripts and example manuscript documents, all created in the dynamic analysis document style using knitr and pandoc.
Github repository here: https://github.com/jfmeadow/ReproducibleDemo
Slides here: https://dl.dropboxusercontent.com/u/62653716/ReproDemo.pdf
This is a topic I've also been asked to talk about during my upcoming visit to San Luis Potosi, Mexico. So I'll get to reuse this potentially several times.
Lately we've been working really hard getting the next round of PickleBox studies off the ground, including IRB and planning with ESBL. We're also almost ready to submit our phone microbiome paper.
I've been working all week to migrate lots of data onto our new storage system. We're trying to maintain a consistent directory structure for each project so that we can all navigate easily in each other's projects well after we each leave.
We published the Lillis Dust paper in PLOS ONE this week, so I've also been fielding press for that. It has shown up in Gizmodo, Popular Science, Quartz, Fast Company, Futurity, and lots of other news sites that just printed the press release. So that's fun.
The surface paper was also accepted in Microbiome Journal so that will finally be published soon.
I also submitted the pickle box paper last week.

Finished a draft of the picklebox manuscript today and handed it off for editing! It is coming along really nicely and the study gets better and better as I write and dig into results and figures. Rare.
As for manuscript writing in markdown, it is going really well. Except tables are a real bummer. Converting tables between markdown, LaTeX, and Word is a hassle, and editing tables in anything other than Excel or R is also a hassle. So for now, I'm keeping tables out (in their own Excel file) until the final version. Then they can be plugged in pretty easily. Other than that, manuscript writing in markdown is wonderful and pretty much hassle free!
I'll be absent from the lab for a couple of weeks. Enjoy!
The Lillis surfaces manuscript was resubmitted to AEM last week, and for the first time I was able to submit an R markdown knitr document as supplementary information. If this is well received by reviewers, I'll try to do so for most manuscripts in the future.
I have also been re-preparing all raw Lillis sequence data to be put into the QIIME DB. The previous efforts to get the data in the shape they wanted didn't work, so instead they have added dual-indexing capability to the DB. Adam and I prepared a full document detailing all sequence files, metadata, barcodes, and sequence format. If they are able to get these datasets into the DB, it will be a much better solution. For now, the indoor evol meta-analysis is put on hold until the situation is more amicable. Rachel, Holly, Ashley and I agreed that we should hold off until about January and reassess. Holly has some great ideas about how to approach the situation without the standard QIIME analysis pipeline which might help us combine all of these disparate datasets.
After resubmitting the Lillis dust and surface manuscripts in the last 2 weeks, I've been spending most of my time working on the pickle box manuscript. This ideally needs to be submitted within the next couple of weeks, but plenty of work remains. Since analysis for this project is so complex, and the datasets are so big, I have broken R markdown scripts into:
analysis where raw data are processed and heavy stats routines are conducted
results where stat outputs are finalized and tables and tests are output
figures where figures are created.
However, this becomes unwieldy in a hurry, so I think in the future, for similarly complex projects, it would be best to break into:
processing where raw data are formatted
several different results scripts to answer different questions
several different figures scripts for each individual figure.
The results scripts should be compartmentalized because workspaces become really big in a hurry when all analyses are combined. So some routines, such as indicator analysis, should be computed separately so as not to bog down one big analysis script.
The figures scripts really suck to work on inside of RStudio (where knitr works best), because of the crappy way that RStudio handles figure windows. Harder to output to pdf, harder to tweak on the fly, etc. So now I've taken to creating a standalone script for each figure so that the whole script can be sourced every time a change is made. Then the final version can be put into R markdown format to be curated.
Additionally, for the pickle box project, we really have to deidentify subjects (occupants) and unfortunately our names are shown throughout the analysis. This made life infinitely easier during analysis, but it might mean that we cannot submit an R markdown script with this manuscript.
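One possible way around this is to swap names for neutral codes in the metadata before any analysis script reads it. The Python sketch below is only an illustration of that idea, not the approach used for this project. The column name "occupant", the code prefix, the seed, and the example names are all hypothetical.

```python
import random


def build_pseudonyms(names, prefix="occupant", seed=0):
    """Map each distinct name to a neutral code such as occupant_01.

    Names are shuffled with a seeded RNG so that code order does not
    reveal alphabetical order; the same seed reproduces the same mapping.
    """
    unique = sorted(set(names))
    random.Random(seed).shuffle(unique)
    width = max(2, len(str(len(unique))))
    return {name: f"{prefix}_{i:0{width}d}" for i, name in enumerate(unique, 1)}


def deidentify(rows, column, mapping):
    """Return copies of the metadata rows with names replaced by codes."""
    return [{**row, column: mapping[row[column]]} for row in rows]


# Hypothetical metadata rows, for illustration only.
metadata = [
    {"sample": "S1", "occupant": "Alice"},
    {"sample": "S2", "occupant": "Bob"},
    {"sample": "S3", "occupant": "Alice"},
]
key = build_pseudonyms(row["occupant"] for row in metadata)
print(deidentify(metadata, "occupant", key))
```

The name-to-code key would need to stay out of any shared repository or supplement. Otherwise the codes can be reversed.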
I spent today working on the swimming/skin/MB idea details, and then spent the rest of the day filling in results and discussion on the pickle box manuscript.