Welcome to the 2016 #uioCarpentry Oslo workshop on Reproducibility¶
These are the workshop materials for the experimental Data Carpentry/Reproducibility workshop we offered at the University of Oslo on March 17th, 2016.
This content is under CC0.
Instructors: C. Titus Brown, Tracy K. Teal, and Lex Nederbragt.
Materials:¶
Welcome!¶
- Who are we? Current positions, backgrounds, interests...
- meta-goals for this workshop: hands-on exposure, with lots of room for questions. Please ask questions.
- etherpad link: http://pad.software-carpentry.org/2016-oslo-reproducibility
- We have a Code of Conduct
- Caveat: this is the first time we're presenting all of this workshop material together!
- All of this material will be open (CC0). Use it as you wish!
- all material will stay up indefinitely (although it may bitrot!)
- Pretty please, ask questions!
- Notes on sticky notes.
- Where are the bathrooms??
- How many of you have Docker installed? How many of you have AWS accounts?
- Show of hands –
- worked with Jupyter Notebook before?
- shell experience?
- used ‘make’?
- git & github experience?
- Docker?
Repeatability vs reproducibility - a discussion¶
Learning goals: attendees will understand some of the basic arguments for computational repeatability in science.
While the name of the workshop is “reproducibility”, what we’re really going to be talking about here is what I will call “repeatability” - the ability to repeat, exactly, a computational analysis.
(There is a lot of epistemic confusion around the precise meanings of the words “reproducibility” and “replication”, and even “repeatability”. Sorry.)
In many computational analyses, repeatability should be pretty easy to achieve - given the same input data, scripts, workflow, and installed packages, you should be able to generate the same results.
Why is repeatability valuable? There are two reasons:
- Repeatability is valuable for efficiency, reuse, and remixing - you and others can repeat, edit, reuse, and repurpose the work.
- Repeatability is important for reproducibility - if someone else cannot reach approximately the same results with similar input data, being able to repeat each analysis exactly is a prerequisite for tracking down where the differences come from.
Two stories¶
PROBLEM: Amanda did variant calling on some genomic data, and wrote up the results for publication. One of the reviewers thinks her broader results are biased by the choice of aligner and asks her to redo the analysis with a different program.
SOLUTION: Amanda reruns her analysis to repeat her original results, then forks her workflow, edits three lines to replace the Bowtie2 aligner with the BWA-mem aligner, and reruns the analysis.
OUTCOME 1: Amanda discovers that indeed her analysis results are quite different with BWA. More work is needed.
OUTCOME 2: Amanda discovers that her analysis results are very similar whether she uses BWA or Bowtie. Her work is done.
PROBLEM: Amanda did variant calling on some genomic data, and published the results. After publication, another research group led by Julio (with access to different samples) says that they find a different global pattern of genotypic variants underlying the same phenotype. Because both research groups used different samples, data collection approaches, and data analysis approaches, they wonder where the true disagreement lies.
SOLUTION: Julio takes Amanda’s pipeline and runs it on their data.
OUTCOME 1: They discover that Julio’s data run with Amanda’s pipeline gives Amanda’s results, and now they can start tracking down what Julio did differently from Amanda in data analysis.
OUTCOME 2: They discover that Julio’s data run with Amanda’s pipeline gives Julio’s results. This suggests that there is something different about the samples or data collection, while the data analysis itself is not the source of the differences.
Repeatability vs reproducibility: what’s the target?¶
For science, reproducibility is the goal - if other scientists can’t follow your process and reach roughly the same results, then the results aren’t robust. (For more, read this excellent Wikipedia article, and you can also read Dorothy Bishop’s excellent discussion of the reproducibility crisis in psychology.) Consider the second story above - if neither Amanda nor Julio’s data analysis workflow had been easy to repeat, it would have been difficult to know whether data analysis was the source of the differences or not.
IMO, reproducibility cannot be demonstrated in a single lab / within a single paper. It has to be the product of multiple people, labs, and publications.
For computational folk, repeatability is useful for reproducibility, but also for other purposes - consider the first story, above, where Amanda can re-run her analysis quickly with a different aligner (or with different parameters). This is an increase in efficiency. This efficiency argument can be extended to many more scenarios –
- Tim, who collaborates with Amanda’s lab, wants to do a similar analysis but with different data. He can use Amanda’s pipeline as a starting point.
- Kim, Amanda’s advisor, needs to repeat Amanda’s analysis after Amanda leaves the lab. They can re-run Amanda’s pipeline even without Amanda around.
- Amanda, in 3 years, needs to run a similar analysis on new data (or has a student that needs to run a similar analysis). Rather than trying to remember all the details of her own analysis from 3 years ago, she can start from a working analysis.
- Qin wants to do a meta-analysis of Amanda and Julio’s data. They can start from Amanda’s workflow and use it to run analyses on Amanda’s data, Julio’s data, etc. with some consistency.
The first three of these scenarios increase efficiency and shorten time to publication for Amanda and her group; the fourth is more generally beneficial for the field overall.
The question for you all to consider is this: how much time and effort should you put into making your workflows reproducible, given the expected benefits?
The overall goal of this workshop is to show you the kind of workflow and toolset that we, and many others, have converged upon.
Next: Overview and agenda
Overview and agenda¶
Overview¶
It may just be my imagination, but it seems like many practicing computational scientists have settled on the same basic workflow for highly repeatable computational analyses:
- a workflow engine, for long-running/compute-intensive analyses;
- some form of narrative documentation for exploring and tracking data analyses and graphing;
- a “cleanroom” execution environment for specifying and isolating dependencies;
- some form of version control for tracking digital artifacts and collaborating around them;
There are many different choices here, but some common ones are shown in the figure:

The way this all works in practice is shown in the next figure:

Basically, some form of virtual environment is used to execute the workflow and data analysis narrative, transforming raw data into results; all of the associated digital artifacts (except for the raw data and, generally, the results) are checked into some form of version control system.
What we will do today is build a small example that demonstrates all of these pieces working together.
Agenda¶
- Introducing Jupyter Notebook (CTB)
- (Re)visiting Make: a workflow engine (CTB)
- Lunch!
- Putting it all together: make, git, & jupyter notebook. (CTB)
- Docker for containerization. (CTB)
- Advanced topics (if time) (CTB)
- Travis: building things automagically (TKT)
An introduction to Jupyter Notebook¶
To open Jupyter for this, click: Start here
Go here to see a notebook transcription of this lesson.
Basics¶
This is the console. This is where you start and stop notebooks, along with other things.
We’ll start with a Python notebook. You can pick the version of Python to run; for demonstration purposes, let’s use Python 3. New... Python 3.
This opens a new window. You can go back to your Home window and open up multiple notebooks in Jupyter.
New notebooks start as Untitled; click to change the name.
The main feature of a notebook is the code cell, here. You can type any valid python in here.
Put in
print("hello, world")
To execute it, go up to ‘Cell...’ and select ‘Run’. You should see the output (or a syntax error).
You can also edit and re-execute this cell. Try it!
To add a cell, go to Insert and Insert cell below. Now add another statement and execute it.
Using the menu to execute cells is annoying, so you can use CTRL-ENTER and SHIFT-ENTER instead. What's the difference between these?
You can also define variables: put a=5, b=10. Then in another cell put c=a+b and print(c). These definitions persist within the notebook's namespace.
You may notice that the numbers change a bit. What’s up with that?
That's the execution counter: higher numbers were executed more recently. So it's possible to execute cells in a different order than they appear on the page, which can get confusing.
Try adding a new cell with a=20. Then execute it. Now re-execute the c=a+b cell.
The order of execution follows the numbers, not the order of the cells on the page. Generally, before saving a notebook you should run everything from the top: Cell menu, Run All. That will run all the cells in the notebook, in order.
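A minimal sketch of the cells described above (each group below is its own cell):

a = 5
b = 10

c = a + b
print(c)    # prints 15 the first time around

a = 20

# re-running the c = a + b cell now prints 30: the result depends on the
# current value of a, i.e. on execution order, not on cell order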
Cell output updates in real time. So, if you have long-running cells you'll see output as it comes. For example,

import time
for i in range(10):
    print(i)
    time.sleep(1)

will produce output in real time.
Plotting¶
Probably the single most useful thing for me about notebooks is the plotting, which also shows up in the notebook.
We’ll use the matplotlib plotting. To get started, do:
%matplotlib inline
from pylab import *
Now, create a plot:
x = range(10)
y = [ i**2 for i in x ]
plot(x, y)
You can update this plot incrementally, too, of course, so e.g. change the plot line to 'plot(x, y, label="my line")' and then add a 'legend()' command.
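Put together (and assuming the '%matplotlib inline' and pylab import from above), the updated plotting cell looks something like this:

x = range(10)
y = [ i**2 for i in x ]
plot(x, y, label="my line")   # label the line...
legend()                      # ...and show the legend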
One super neat feature of Jupyter is the ability to load code into a cell from other places. For example, if we go to the matplotlib gallery and see something we want to work with, we can load that code into the notebook - if we like this plot, we can grab it directly like so:
%load http://matplotlib.org/mpl_examples/images_contours_and_fields/streamplot_demo_features.py
(Here, we copied the source code link.)
This kind of feature is called a "magic", and you can see the long list of options in the cell magics documentation.
(If this returns an error, you may need to add ‘%matplotlib inline’ at the beginning of the cell OR re-run your notebook. Why?)
Moar features¶
There are some other nice features for writing code.
You can do tab completion – try typing ‘pr’ and hitting tab. You’ll see a popup that gives you all of the commands that start with ‘pr’.
This works for modules, too. Above, we imported 'time'. Let's take a look at what's in time by typing 'time.' and then hitting TAB.
You can also get help - type ‘print?’ and hit enter.
Getting in (and out) of trouble¶
If you want to clear the variables and restart, you can go up to Kernel and do “restart”. Now try typing ‘print(a)’ in a new cell. You’ll see the numbers have reset (and ‘a’ isn’t defined!)
If you get into an infinite loop, or just want to break into a cell, use Interrupt Kernel. Try 'import time; time.sleep(15)' and then interrupt kernel. Note that the cell number is '*' while it's executing.
It’s generally good practice to restart the kernel and execute from the beginning before saving or continuing a notebook.
Markdown cells¶
By default, cells in notebooks are code cells. But you can also have markdown cells, that display formatted text; see this cheatsheet for more info on all the syntax.
For now, put in
# this is a heading
here is some text with *markup*
and change the cell type to ‘Markdown’. Now, when you leave the cell, it renders nicely.
The only things I know about markdown are: titles start with #, you can *emphasize* with asterisks or **bold** with double asterisks, and you can add links with [link text](URL). This is a pretty nice way to write text around analysis discussions, rather than as comments in the cell.
Jupyter Notebook also has some nice extensions for markdown that let you put in equations, if you’re so inclined. For example, put in:
$f = \frac{x}{y}$
in a markdown cell and you’ll see it render as an equation. This uses latex symbol notation.
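A couple more examples you can paste into a markdown cell, using the same LaTeX math syntax:

$\sum_{i=0}^{9} i^2 = 285$

$e^{i\pi} + 1 = 0$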
So, in a notebook, you can intersperse code with text.
That concludes an initial tour of what notebooks can do. Next we’ll talk more about the whole Jupyter Notebook ecosystem. But, if you’re interested in seeing more about what you can do with notebooks, there’s lots of interesting notebooks to look at in the Jupyter Notebook gallery.
Return to index | Next: More Jupyter
More Jupyter: Multiple languages, the console, and caveats¶
Other languages: R notebooks¶
Try starting an R notebook, and executing:
require(graphics)
(yl <- range(beaver1$temp, beaver2$temp))
beaver.plot <- function(bdat, ...) {
    nam <- deparse(substitute(bdat))
    with(bdat, {
        # Hours since start of day:
        hours <- time %/% 100 + 24*(day - day[1]) + (time %% 100)/60
        plot (hours, temp, type = "l", ...,
              main = paste(nam, "body temperature"))
        abline(h = 37.5, col = "gray", lty = 2)
        is.act <- activ == 1
        points(hours[is.act], temp[is.act], col = 2, cex = .8)
    })
}
op <- par(mfrow = c(2, 1), mar = c(3, 3, 4, 2), mgp = 0.9 * 2:0)
beaver.plot(beaver1, ylim = yl)
beaver.plot(beaver2, ylim = yl)
par(op)
Basically it all works as you’d expect...
Import Python code¶
Generally it’s a bad idea to write a LOT of code in a single cell; we tend to suggest using the notebook as a way to explore data, rather than write lots of code. Luckily, you can import code from modules just like you normally would.
Try entering this in a cell in a Python notebook:
%%file mycode.py
def f():
    print('hello, world')
and then in the following cell, enter:
import mycode
mycode.f()
Note, here the ‘%%file’ is just a way of creating a file - you can do that in a variety of ways. Speaking of which...
Console¶
Basically, you can interact with the file system in a variety of ways: via notebook and/or running code, OR via console/upload/download/edit, OR via terminal.
- upload files;
- download and edit files;
- save and download figures;
- terminal window;
Where is this all running?¶
The general architecture of Jupyter Notebook is this:
We are running things on mybinder specifically; we'll cover this later, but basically we're running on Google Compute Engine.
Caveats¶
- long-running notebooks don’t work that well;
- multiple views of the same notebook share the kernel but don’t share the output;
- this is the same on a reload...
- the execution order can be confusing: re-run your notebook from scratch, frequently.
A brief introduction to ‘make’, a workflow engine¶
Learning objectives: get a first look at a Makefile, and put your code and new Makefile into a git repository on GitHub.
‘make’ is a way to set up a computational pipeline, or workflow, in such a way that you can execute it all with a single command. (There are other programs than ‘make’, including snakemake, pydoit, scons, and makeflow; they all do fairly similar things.) We’re using ‘make’ because it’s simple, it’s ubiquitous, and it illustrates the basic points.
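The heart of a Makefile is the rule: a target, the files it depends on, and a tab-indented action that builds the target from those files. A generic sketch, with hypothetical filenames:

# target : dependencies
# (the action line below must be indented with a TAB, not spaces)
results.dat : rawdata.txt process.py
	python process.py rawdata.txt results.dat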
We’re going to walk through the first two parts of the Software Carpentry ‘make’ tutorial. To get started, we’ll have to:
- download some data;
- configure matplotlib to work without interactive graphics;
- fix our editor to put tabs in Makefiles;
Create a new terminal by going to the Console (File... Open...) and then doing New... Terminal. See the screenshot below –

Once you have a new terminal, copy and paste the following:
wget https://swcarpentry.github.io/make-novice/make-lesson.zip
unzip make-lesson.zip
cd ./make-lesson/
This downloads the 'make' lesson materials, unpacks them, and changes into that directory.
You will also need to configure matplotlib to display to a file - see this stackoverflow issue:
echo backend : Agg >> matplotlibrc
Note
A tip on organizing your windows: we're going to be doing a lot of copy-pasting, and you can arrange your tabs and windows to facilitate this! Put your terminal window in a new window, and then you can use Alt-TAB (on Windows) or Command-backquote (on Mac) to switch quickly between the windows.
One last step in configuration – to edit Makefiles with tabs in the Jupyter editor, you’ll need to do this:
mkdir -p ~/.jupyter/nbconfig
echo '{ "Editor": { "codemirror_options": { "indentWithTabs": true } } }' >\
~/.jupyter/nbconfig/edit.json
(See my issue in Jupyter’s issue tracker for this and other resolutions.)
Building a Makefile¶
Now, let's walk through the first two parts of the make lesson.
When the time comes to edit files, you can do so in the Jupyter console by entering the make-lesson folder, creating a New... Text file, and then renaming it to the desired filename. (See screenshots below.)


In your Makefile, you’ll need to be sure to put in tabs instead of spaces for indents; your final Makefile should look like this:
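As a rough sketch (assuming the standard make-lesson file layout - double-check against the lesson itself, and remember that the action lines are indented with real tabs), it will contain something like:

# count words in each book, producing one .dat file per book
.PHONY : dats
dats : isles.dat abyss.dat last.dat

isles.dat : books/isles.txt
	python wordcount.py books/isles.txt isles.dat

abyss.dat : books/abyss.txt
	python wordcount.py books/abyss.txt abyss.dat

last.dat : books/last.txt
	python wordcount.py books/last.txt last.dat

# remove the generated .dat files
.PHONY : clean
clean :
	rm -f *.dat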

Saving everything to GitHub¶
At this point, you should be able to type ‘make’ to build everything, and ‘make clean’ to remove all the outputs and start over from scratch. Now we want to save this to github so that we can communicate it to ourselves (and others!)
First, we’ll make a git repository.
Making a local git repo¶
Configure git, replacing the e-mail and name with your own:
git config --global user.email some@user.com
git config --global user.name "Some User"
Now, create a new local git repository, and add all your input files and scripts:
git init
git add Makefile matplotlibrc plotcount.py wordcount.py books
git commit -m "initial commit"
(Note: here we're adding the 'books' data into git for convenience, but in general we don't recommend putting raw data into your github repo.)
Now, we want to push this to a public repository on github.
Pushing from a local repository to github¶
Finally, go to github.com and create a new repository, and then copy and paste the commands under ”...or push an existing repository from the command line.” (See screenshot, below.)
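Those commands will look something like the following (GitHub shows you the exact URL to use; the username and repository name here are placeholders):

git remote add origin https://github.com/YOUR-USERNAME/make-lesson.git
git push -u origin master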

You should now have a new repository, full of your data, scripts, and now a Makefile. Congrats!
Next: Putting it all together: make, git, and a Jupyter notebook
Putting it all together: make, git, and a Jupyter notebook¶
Discussion topic: why are we using ‘make’ for the word counting exercise?
Related discussion topic: how do we decide what to put in 'make' and what to put in a notebook?
Adding a data analysis narrative¶
Rather than doing a real data analysis, we’re just going to make a word cloud from the word count information. (If this were a Real Scientific Project, this is where you’d be doing your statistics and your graphing.)
Create an empty Python2 notebook named ‘wordcloud’ in the make-lesson directory; in it, add three cells.
First, install the wordcloud library:
!pip2 install wordcloud
then write & run some parsing code to generate the wordcloud object:
def load_frequencies(filename):
    for line in open(filename):
        word, count, freq = line.split()
        if len(word) < 5:
            continue
        freq = float(freq)
        yield word, freq
import wordcloud
wc = wordcloud.WordCloud(stopwords=wordcloud.STOPWORDS)
wc.generate_from_frequencies(load_frequencies('abyss.dat'))
and finally show the wordcloud object:
%matplotlib inline
from pylab import imshow, axis
imshow(wc)
axis('off')
When you run this all, you should see a word cloud!
Now add and commit this, & push to github. In the terminal window, do:
git add wordcloud.ipynb
git commit -m "wordcloud notebook"
git push origin master
Recap¶
You have done the following:
- encapsulated your analysis in a Makefile that generates results from raw data;
- put your analysis in a git repository, and posted it to github;
- written an analysis notebook to generate figures from the results of running the make command;
At this point, you have a fully articulated analysis workflow with both a script-based data reduction and a Jupyter Notebook graphing analysis.
Moreover, anyone who has access to your git repository (which is everybody, in this instance) can both get your workflow and (with the proper software) run it.
At this point we can take advantage of a service called ‘mybinder’ which will actually let you run your analysis and your notebooks for free on the Google cloud.
Testing it out with mybinder¶
Take the URL of your github repository and paste it into the top of mybinder.org; you don’t need to configure any dependencies. Now click ‘make my binder’. Once it builds, click on the black-and-red button ‘launch binder’.
This will spin up an execution container, running Jupyter notebook, that has your analysis repo in it, with everything copied from your github repo.
The ‘launch binder’ URL is something you can give to other people so they can run your analysis, too.
To read more on mybinder, see my blog post.
Running in a container environment¶
Learning objectives: learn about Docker containers, and how they can help deal with dependencies.
Note
To go through the following, you’ll need to have Docker installed. For most of it, you’ll also need to have an Amazon Web Services account.
(This is an abbreviated & focused version of a 2-day Docker workshop that I ran at UC BIDS; see the full notes)
Introduction: dependencies, and containers¶
If you think back to our original overview, we’ve got three of the four workflow components in place - we’ve got a workflow that takes our raw data and converts it into a summary, we’ve got an analysis notebook that generates results from that data, and we’ve put it in git.

But, in order to run it, we still need a lot of things installed. In this case, we don't have that weird a stack of software: we need Python, make, Jupyter Notebook, and wordcloud - which depends on numpy and matplotlib. Not too bad.
But, hypothetically, we could have many dependencies - for example, bioinformatic workflows often involve dozens of dependencies, with things written in Java and Ruby and Perl and ...
Installing all of these dependencies on your own laptop may be difficult enough, but it becomes nearly impossible in shared environments like lab workstations and HPCs, where you generally don’t have admin/install privileges, and where you have to be concerned about conflicts between dependencies because it’s a shared environment.
This is where isolated environments come in. These allow you to install a collection of software in such a way that it only interacts within that collection.
There’s a bunch of ways to do this, ranging from language-specific (virtualenv in Python) to whole-machine (virtual machines and cloud).
Docker containers are a new-ish way to do this. Docker basically provides a lightweight virtual machine that is quick and easy to start up, and the Docker ecosystem includes ways of specifying configurations and also shipping these fully configured “containers” around.
A super quick introduction to Docker¶
Before Getting Started¶
Can you run this?
docker run hello-world
and see something like this?
Hello from Docker.
...
If that works, you should do:
docker pull ubuntu:14.04
If that doesn’t work, you might need to run this to reset your local docker install:
docker-machine restart default && eval "$(docker-machine env default)"
Getting started¶
Try:
docker run ubuntu:14.04 /bin/echo 'Hello world'
Now try:
docker run -it ubuntu:14.04 /bin/bash
Q: What does the ‘-it’ do?
Points to cover:
- you can use CTRL-D or type ‘exit’ to leave the container
- “images” are the disk, “containers” are the running bit
Try creating a file in your container; first run:
docker run -it ubuntu:14.04 /bin/bash
and then inside the docker container run:
echo hello, there > /home/foo.txt
Now, exit the container. Make note of the container ID if you can - it’s the string right after ‘root@’ and before the ‘:’, and should be something like ‘003eafc0422c’.
First, if you run 'docker run -it ubuntu:14.04 /bin/bash' again, you will see that the file is not there:
cat /home/foo.txt
This is because every container starts fresh from the base image (here, ubuntu:14.04).
However, you can still access files from old containers (unless you specify '--rm' when you run things). Here,
docker cp 003eafc0422c:/home/foo.txt .
will copy the file into your current directory.
So: data is transient, unless you make other provisions (we’ll talk about this later!)
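One such provision is the '-v' flag, which mounts a directory from your host machine into the container; a sketch (Linux/Mac shell syntax):

docker run -it -v $(pwd):/data ubuntu:14.04 /bin/bash
# inside the container, anything you write under /data lands in your
# current directory on the host, and survives after the container exits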
You can get a list of running containers by doing:
docker ps
You can get a list of all containers (running and stopped) by doing:
docker ps -a
You can get a list of all images by doing:
docker images
You can remove a container with 'docker rm', and remove an image with 'docker rmi'.
Two handy commands to clean up (a) all stopped containers and (b) all untagged images:
docker rm $(docker ps -a -q)
docker rmi $(docker images | grep "^<none>" | awk '{print $3}')
Using docker-machine to run Docker on AWS¶
Documentation: https://docs.docker.com/machine/; also see Amazon Web Services driver docs
Here, we’re going to use Amazon to host and run our Docker images, while controlling it from our local machine.

Start by logging into the AWS EC2 console.
Find your AWS credentials and your VPC ID.
- Your AWS credentials are here; if you haven't used them before, you may need to "Create a New Access Key". (Be sure not to store these in a place where other people can view them.)
- To get your VPC ID, go to https://console.aws.amazon.com/vpc/home and select "Your VPCs". Your VPC ID should look something like vpc-9efe1afa (that's mine and won't work for you ;)
Then, set your AWS_KEY and AWS_SECRET and VPC_ID; on Linux/Mac, fill in the values and execute:
export AWS_KEY=
export AWS_SECRET=
export VPC_ID=
...not sure what to do on Windows, maybe build the command below in a text editor?
Then, run:
docker-machine create --driver amazonec2 --amazonec2-access-key ${AWS_KEY} \
--amazonec2-secret-key ${AWS_SECRET} --amazonec2-vpc-id ${VPC_ID} \
--amazonec2-zone b --amazonec2-instance-type m3.xlarge \
aws
and to connect to it, do:
eval $(docker-machine env aws)
and now you can run all the ‘docker’ commands as you would expect, EXCEPT that your docker host is now running Somewhere Else.
If you have trouble getting a subnet, make sure that your VPC has subnets in the zone/region you’re trying to use. You can set these with:
--amazonec2-region "us-east-1" --amazonec2-zone b
Take a look at the help for the EC2 driver here:
docker-machine create --driver amazonec2 -h | less
Things to discuss:
- diagram out what we’re doing!
- docker-machine manages your docker host; docker manages your containers/images ON that host.
- talk about AWS host sizes/instance types: https://aws.amazon.com/ec2/instance-types/
- explain docker client, docker host, docker container relationship
- also include -p, -v discussion.
—
You can use ‘docker-machine stop aws’ and ‘docker-machine start aws’ to stop and start this machine; with AWS, you will need to do a ‘docker-machine regenerate-certs aws’ after starting it in order to connect to it with docker-machine env.
To kill the machine, do ‘docker-machine kill aws’. This will also, I believe, trash the configuration settings so you would need to reconfigure it with a ‘create’.
Note that while the machine is running or stopped, you should be able to see it at the AWS EC2 console.
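Collected in one place, the machine lifecycle commands mentioned above are:

docker-machine stop aws                # stop the EC2 instance
docker-machine start aws               # start it up again
docker-machine regenerate-certs aws    # needed after a restart on AWS
eval $(docker-machine env aws)         # point your local docker client at it again
docker-machine kill aws                # destroy the machine (configuration is lost)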
Let’s talk more about why you would want to do this :).
Also, diagrams!
Building your own Docker image (by writing a Dockerfile)¶
On your local machine, create a new (empty) directory called ‘wordcloud-image’. In that directory, create a file ‘Dockerfile’ that contains:
FROM jupyter/notebook
RUN apt-get update && apt-get -y install python-matplotlib python-numpy \
unzip
and then execute:
docker build -t wordcloud-image .
Now run it:
docker run -it -p 9000:8888 wordcloud-image
and in the docker container you should be able to execute your entire workflow from within the notebook:
In the terminal,
- check out your source code;
- change into the directory;
- run make (see the sketch below);
In the notebook,
- run the notebook analysis.
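A sketch of the terminal steps, assuming git and make are available inside the image (if they aren't, add them to the apt-get line in the Dockerfile) and using a placeholder repository URL:

git clone https://github.com/YOUR-USERNAME/make-lesson.git   # check out your source code
cd make-lesson                                               # change into the directory
make                                                         # run the workflow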
Some final points on Docker¶
Docker is a decent solution for “single chassis” compute, where you can run everything on one computer.
Docker:
- gives you a consistent environment - either static OR procedurally generated;
- lets you move compute to the data;
- it also gives you a way to run anonymous or encapsulated compute.
but:
- it’s not declarative (you can’t computationally analyze all Dockerfiles without just running them);
- you run the risk of just dumping everything into a binary blob that provides no insight or remixability.
You can read some more thoughts on Docker (and mybinder) here.
Return to index: Welcome to the 2016 #uioCarpentry Oslo workshop on Reproducibility
More advanced materials, if there’s time:
README - 2016-oslo-repeatability¶
See https://www.ub.uio.no/english/courses/other/data-carpentry/time-and-place/Reproducibility-data-carpentry2016-03-17.html
All of the materials in this repository are under CC0 unless otherwise specified.
Advanced Jupyter topics¶
Sharing notebooks via github¶
GitHub renders Jupyter Notebooks on their site – see this for example.
To try this for yourself,
- Download your notebook from your running Jupyter site.
- Go to github.com and create a new repository.
- Select the ‘README’ link at the top of the new repository.
- Enter something, click commit.
- Select “upload files”, and upload your downloaded notebook.
- If you click on the ipynb file in your repository you will now see your rendered notebook. This is something you can send to collaborators etc.
Distributing executable notebooks with github and mybinder¶
But wait! There’s more!
On public repositories, you can feed this github URL into mybinder.org and actually get it running – try it!
- Go to mybinder.org and enter the URL of your github repo, then click “make my binder”.
- wait a minute, then click “launch binder”.
- voila, you’re at your github repo - but executing it.
Wait, where the heck is this all running?¶
(Discuss the architecture of Jupyter Notebooks)
Other cool notebook ideas: really interactive blog posts¶
See Tim Head’s demo
Other topics¶
- comparison with RStudio, RMarkdown
- running on your laptop; running on AWS
(Everything we’ve done in the notebook can be done on your laptop – you just have to install things ;).
LICENSE: This documentation and all textual/graphic site content is licensed under the Creative Commons - 0 License (CC0) -- fork @ github. Presentations (PPT/PDF) and PDFs are the property of their respective owners and are under the terms indicated within the presentation.