Keeping track of the science – lab book equivalents for Bioinformaticians

I’ve recently been employed as a Research Assistant with the group I did my PhD with at the Menzies Institute for Medical Research (University of Tasmania, Hobart, TAS, Australia). My work continues with genomics-based analyses of inherited diseases, including cancer (my thesis was based on a Tasmanian familial blood cancer resource, but we also study other cancers) and now also eye diseases.

Something I’m working on at the moment is how to keep a record of my day-to-day analyses when I’m working on different projects for different people. I was originally trained as a lab-based scientist, so for me this is a lab book equivalent: somewhere I can record the processes I went through, ideas, relevant notes and output. I also want it to be a living document that can be shared with my colleagues and supervisors. I’d like to be able to demarcate code, differentiating between R code and Unix commands, and also be able to include plots with ease.

So I’ve been investigating a few different options, which is what this post is about:

Google Docs

My initial go-to was Google Documents. I can use different font styles there for Unix or R code, it’s easily shareable and collaborative with others, changes are trackable, and it’s stored online securely (from what I understand).

However, it’s a little bit cumbersome copying and pasting from the terminal or an R script into the Google document, and I won’t necessarily always have WiFi access when analysing data.

R Markdown

So my next port of call has been RStudio, using an R Markdown document. The idea here would be to keep everything in the Markdown and regularly upload the PDF. With knitr this creates a PDF fairly quickly and I can still demarcate different code types. I’ve been using the standard demarcation for R code (which also runs the code when the document is knitted) and tab-indenting Unix commands and Unix output. A challenge I’ve hit is that when I have a long Unix command string without spaces or new lines, Markdown doesn’t wrap it automatically (I believe I can make a change to allow that). I also don’t always want or need it to run my R code when the PDF (or HTML or Word document) compiles, so I’d like to be able to mark something as R code without it being active R code.
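
One possible way to get R code that is displayed but not run is knitr’s eval=FALSE chunk option. The same option can be put on a bash chunk, so Unix commands are shown as code but never executed. A minimal sketch, where the object names, file name and commands are purely hypothetical placeholders:

````markdown
R code recorded for reference; eval=FALSE means knitr shows it without running it:

```{r, eval=FALSE}
# hypothetical object names
fit <- lm(expression ~ genotype, data = samples)
summary(fit)
```

A Unix command recorded the same way (shown, never executed):

```{bash, eval=FALSE}
# hypothetical file name
wc -l variants.vcf
```
````

Leaving eval=FALSE off an R chunk restores the default behaviour of running the code each time the document is knitted.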

Other options?

I could achieve a similar thing with LaTeX, which is many levels more complicated than Markdown but would allow me to personalise my recordings and make the desired demarcations.

I’ve had a brief google around and found a few posts on the topic:

http://saaientist.blogspot.com.au/2008/05/keeping-track-of-things-using-labbook.html

This post advocates a tiered organisation: Project, Task, Steps. Record the details of the project, record the tasks in the project as you do them, and then record the steps within each task (including all data mangling).
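
As a rough illustration only, that tiering could map onto Markdown headings something like the skeleton below. The headings and bracketed wording are placeholders, not taken from the linked post:

```markdown
# Project: (project name and aim)

## Task: (what is being done, with the date)

1. Step: (command or code run)
2. Step: (output, and any data mangling applied)
```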

https://www.biostars.org/p/46444/

There’s a bit of dialogue happening here (the post is a few years old), and one recurring suggestion is to keep a personal wiki. Seems like a workable idea.

The biostars post also led me to this article: Noble, WS, (2009) A Quick Guide to Organizing Computational Biology Projects, PLoS Computational Biology

I like this article. It brings together a lot of different conversations on this topic, with opinions from an experienced bioinformatician. The article also recommends a wiki or other online alternative / blog so that it can be accessed collaboratively.

Tentative conclusions

With that in mind, I think what I’m going to aim for is the Google document approach. If I set up some styles it will be easy to change fonts for different types of code. It will mean copying and pasting notes and code around, but I think it’s the most useful approach for working accountably and collaboratively. I’ll keep looking into R Markdown as well; it might be useful as a final resting place for what I put in the Google doc when part of a project is complete.