Why do we need version control?#

Because this….#

image

… leads to this#

image

Version management best practices#

Why is version management important?#

  • You can revert to a working version if things break.

  • It benefits team collaboration.

  • It improves efficiency.

How should we manage changes?#

Keeping track of changes:#

  • Back up (almost) everything created by a human as soon as it is created.

  • Keep changes small.

  • Share changes frequently.

  • Create, maintain and use a checklist for saving and sharing changes to the project.

  • Store each project in a folder that is mirrored off the researchers’ working machine.

This list comes from the section “Keeping track of changes” in the Software Carpentry paper “Good Enough Practices in Scientific Computing”.

Exercise 1: Manual versioning#

Versions can be managed either by hand or by using a Version Control System (VCS). To illustrate how a VCS works, we start with an exercise in manual versioning. The goals of this exercise are:

  • Practice with versioning best practices

  • Understand the limitations of manual version management

1A Setting up the project#

We have set up a shared folder on the JupyterHub used for this course that is accessible to all participants of the course.

  1. Go to the shared folder and create a folder named simple_trigonometry_YOURNAME where you replace YOURNAME with your name. This folder is your project folder.

  2. Add a file called CHANGELOG.txt to your project folder with timestamped changes to your project.

  3. Create a subfolder called current which is your latest version of your project.

1B Single user version tracking#

Whenever you make a significant change

  1. Copy the entire project (current folder) to a directory that is datetimestamped.

  2. Update CHANGELOG.txt with a timestamped note on the changes.

This will result in your project folder looking like this:

.
|-- project_name
|   |-- current
|   |   |-- ...project content as described earlier...
|   |-- 171106_130000
|   |   |-- ...content of 'current' on Nov 6, 2017 1pm...
|   |-- 171108_110000
|   |   |-- ...content of 'current' on Nov 8, 2017 11am...

And your CHANGELOG.txt to look something like this:

## 2016-04-08

* Switched to cubic interpolation as default.
* Moved question about family's TB history to end of questionnaire.

## 2016-04-06

* Added option for cubic interpolation.
* Removed question about staph exposure (can be inferred from blood test results).

  1. Create a new file called test.py

    • Add the text print('hello world')
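The snapshot routine of 1B (copy current to a datetimestamped directory, then note the change in CHANGELOG.txt) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name snapshot and its parameters are hypothetical, and the changelog entry format is one possible choice.

```python
import shutil
from datetime import datetime
from pathlib import Path


def snapshot(project_dir, note, now=None):
    """Copy current/ to a datetimestamped folder and log the change."""
    project = Path(project_dir)
    now = now or datetime.now()
    # Folder name in the YYMMDD_HHMMSS style used above, e.g. 171106_130000.
    target = project / now.strftime("%y%m%d_%H%M%S")
    shutil.copytree(project / "current", target)
    # Append a timestamped note; the file is created if it does not exist yet.
    with open(project / "CHANGELOG.txt", "a", encoding="utf-8") as log:
        log.write(f"## {now:%Y-%m-%d %H:%M}\n\n* {note}\n\n")
    return target
```

Calling snapshot("simple_trigonometry_YOURNAME", "Added test.py") after each significant change performs both steps at once. The discipline problem remains: you still have to remember to call it.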

1C Practice basic version control using trigonometry#

Add your changes (copy current to a datetimestamped directory and update CHANGELOG.txt, as in 1B) every time you finish a bullet point.

  • Add a function to test.py to calculate the circumference of a circle. Add your changes.

  • Add a function to test.py to calculate the surface area of a circle. Add your changes.

  • Create a new file called script.py that is empty. Add your changes.

  • Add some print statement to script.py and execute it. Add your changes.

  • Import test.py into script.py and call the functions in test.py and print the output. Add your changes.
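If you are unsure where to start, here is one possible shape for the two functions. It is a minimal sketch, not the reference solution: the names circumference and area and the parameter radius are assumptions, and in the exercise the two functions would live in test.py while the print calls would go into script.py.

```python
import math


def circumference(radius):
    # Circumference of a circle: 2 * pi * r
    return 2 * math.pi * radius


def area(radius):
    # Surface area of a circle: pi * r^2
    return math.pi * radius ** 2


# In the exercise these calls would sit in script.py after importing test.py.
print(circumference(1.0))
print(area(1.0))
```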

1D collaborating on a project, resolving conflicts#

Work with your partner#

  • Agree on which of your two project folders you will continue to work in. For the rest of this exercise both of you will work in that single project folder.

Creating and resolving a conflict#

  • Person A makes a temporary copy of test.py called test_A.py, and person B makes a temporary copy of test.py called test_B.py.

  • Person A and B each edit their temporary copies:

    • Person A adds a docstring to the function that calculates the circumference of the circle.

    • Person B adds a docstring to the function that calculates the surface area of the circle.

  • Person A and B now collaborate to merge the files test_A.py and test_B.py, so as to incorporate both their individual changes, and save the result into the original file test.py.

  • Think about how you would do this if each of you were making more complex changes. What about if you were both editing the same lines?
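To get a feeling for what merging involves, the sketch below combines two edited copies by comparing each against the original. It is a naive line-based three-way merge built on Python's difflib, written only as an illustration: the function names edits and merge are hypothetical, and it simply gives up with an error when both people touched the same place.

```python
import difflib


def edits(base, other):
    """Return (start, end, new_lines) for each region of base that other changed."""
    matcher = difflib.SequenceMatcher(a=base, b=other, autojunk=False)
    return [
        (i1, i2, other[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def merge(base, ours, theirs):
    """Combine two sets of edits to base; raise ValueError if they collide."""
    a_edits, b_edits = edits(base, ours), edits(base, theirs)
    for a in a_edits:
        for b in b_edits:
            if a == b:
                continue  # both people made the identical change
            overlap = a[0] < b[1] and b[0] < a[1]
            if overlap or a[0] == b[0]:
                raise ValueError(f"conflict near line {a[0] + 1}")
    merged = list(base)
    # Apply edits from the bottom up so earlier line numbers stay valid.
    unique = {(start, end, tuple(new)) for start, end, new in a_edits + b_edits}
    for start, end, new in sorted(unique, reverse=True):
        merged[start:end] = new
    return merged


base = [
    "def circumference(r):",
    "    return 2 * math.pi * r",
    "",
    "def area(r):",
    "    return math.pi * r ** 2",
]
doc_a = '    """Return the circumference of a circle."""'
doc_b = '    """Return the surface area of a circle."""'
test_a = base[:1] + [doc_a] + base[1:]  # person A's copy
test_b = base[:4] + [doc_b] + base[4:]  # person B's copy
merged = merge(base, test_a, test_b)
print("\n".join(merged))
```

Because the two docstrings land in different places, the merge succeeds. If both copies change the same line in different ways, merge raises ValueError and a human has to decide, which is exactly the situation the last bullet asks you to think about.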

More practice (optional)#

  • Both work on the same repository, use script.py to test your functionality.

  • Add a function that plots a circle

  • Add save to png functionality to the plot function

  • Make the plotting function fancier (add units, labels, etc.)

  • Add surface calculation for other shapes (triangle, square, pentagon, hexagon … )

  • Add circumference calculation for same shapes.
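For the other shapes, one option is a single pair of functions that covers every regular polygon. This is a sketch under the assumption that all shapes are regular (equal sides and angles); the names polygon_perimeter and polygon_area and their parameters are hypothetical.

```python
import math


def polygon_perimeter(n_sides, side):
    # Perimeter ("circumference") of a regular polygon: n * s
    return n_sides * side


def polygon_area(n_sides, side):
    # Area of a regular polygon: n * s^2 / (4 * tan(pi / n))
    return n_sides * side ** 2 / (4 * math.tan(math.pi / n_sides))


for name, n in [("triangle", 3), ("square", 4), ("pentagon", 5), ("hexagon", 6)]:
    print(name, polygon_perimeter(n, 1.0), polygon_area(n, 1.0))
```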

N.B. Once frustration sets in for enough people we will move on to Git.

End of exercise 1#

Problems with manual version control#

  • It requires a lot of discipline

  • It is virtually impossible to resolve conflicts

What is Git?#

Git is a distributed version control system (VCS) that automates version management#

git logo

references: git book

A VCS stores all historical versions of a file#

Git stores snapshots called commits

The differences between files in 2 commits are human readable if they are text files (e.g., .txt, .py, .tex)
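To see what such a human-readable difference looks like, Python's difflib can produce a unified diff, the same general format that git diff prints. The two file versions and the labels "commit 1" and "commit 2" below are made up for illustration.

```python
import difflib

# Two hypothetical versions of the same text file.
old = [
    "def area(r):",
    "    return 3.14 * r * r",
]
new = [
    "import math",
    "",
    "def area(r):",
    "    return math.pi * r ** 2",
]

diff = list(
    difflib.unified_diff(
        old,
        new,
        fromfile="test.py (commit 1)",
        tofile="test.py (commit 2)",
        lineterm="",
    )
)
print("\n".join(diff))
```

Lines starting with - were removed, lines starting with + were added, and unmarked lines are unchanged context. For binary files (images, most office documents) no such line-by-line view exists.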

image

Central authority#

For collaborating on code or papers it is useful to have a central authoritative version

  • Know what the latest version is

  • Know what other people are working on

  • Easier to maintain than emailing around versions

image

Git is a “distributed” VCS#

  • Every copy of the repository contains the complete history

  • You can keep working if the internet is down

  • You don’t lose your data (and history) if the server dies

image

A git “repository” is a folder which has files it keeps track of#

  • You choose which files to track

  • Looks like a normal folder but there is a hidden folder (.git) inside with the history
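The hidden .git folder is what turns an ordinary folder into a repository. As a small illustration, the sketch below looks for the repository a given path belongs to by walking up the directory tree until it finds .git. The function name find_repo_root is hypothetical, and the real lookup that Git performs has more rules than this.

```python
from pathlib import Path


def find_repo_root(start):
    """Return the nearest ancestor of start that contains .git, or None."""
    path = Path(start).resolve()
    for candidate in (path, *path.parents):
        # .git is normally a folder, but it can also be a file, so only
        # check that something with that name exists.
        if (candidate / ".git").exists():
            return candidate
    return None


print(find_repo_root("."))
```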

GitHub is a web-based Git repository hosting service#

image

  • Web hosting

  • Open issues / bug reports

  • Suggest changes to projects

  • Free private repositories for academic users

  • Convenient tools

    • Diff viewer

    • Commit browser

Exercise 2: Basic single user git#

  • Setting up a new git repository using GitHub + clone

  • A basic single user workflow involving committing, pulling and pushing your changes

2A setting up git settings#

You only need to do this once (per machine)!

  • Set up your git config

    • git config --global user.email "you@example.com"

    • git config --global user.name "Your Name"

    • You can check your config using git config --list. Once you have cloned a repository, running this command inside it also lists remote.origin.url, which shows whether you are pointing to your repository on GitHub

  • Add your SSH public key to your GitHub account to allow you to access your repository without entering a password every time

    • We have pre-generated an SSH key for you to use during this course. You will find the public part of the key in your home directory in the file id_rsa.pub (find instructions how to do that yourself here)

    • Click on the file id_rsa.pub in the file browser to open it

    • Copy the contents of this file to the clipboard with Ctrl-C

    • Go to settings/keys and click the New SSH key button

    • Paste the contents of id_rsa.pub into the text area labeled Key. You may enter anything you like into the title field (e.g. casimir course jupyterhub)

2B setting up a project#

  • Create a new repository on GitHub.

    • Go to GitHub and log in

    • Click create a new repository

    • Name it “Casimir-programming”

    • Add a readme.md

    • Add a .gitignore

    • Add a license (e.g., MIT) (optional)

  • Clone the repository into your home directory

    • Go to the page for your Casimir-programming repository on GitHub

    • Click the Clone or download button

    • Verify that the popup title is Clone with SSH. If it is Clone with HTTPS click on the Use SSH link in the corner of the popup

    • Copy the URL in the text field of the popup. It should look like git@github.com:...

    • Open a terminal and type git clone followed by a space, then paste the URL that you copied from GitHub and hit Enter.

    • This will create a copy of the entire repository in a new folder

2B My first commit#

  • Create a new file called test.py

    • Add the text print('hello world')

  • Commit and sync your changes

    • Type git status

    • Type git add test.py

    • Type git commit -m 'my first commit'

    • Type git pull

    • Type git push

  • View the commit history

    • Using the terminal

      • Type git log. See if you understand what you see.

      • Type git log -p -2. This shows the changes introduced by the last 2 commits

      • Take a look at “Viewing the Commit History” in the git book to see other useful options

    • Using GitHub

      • Click on Commits, open your latest commit.

        • Click browse files to browse your code at the time of the commit.

      • Go to Graphs/network, this shows you a line with all commits.

        • Very useful once we move on to multi-user workflows.

      • Open a file and check out history

        • this shows a list of all commits that changed that specific file.

      • Open a file and look at blame
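The commit steps above can also be replayed from Python in a throwaway repository, which is handy if you want to experiment without touching your real project. This is a minimal sketch, assuming git is installed and on your PATH; the helper names git and first_commit_demo are hypothetical, and git pull and git push are left out because the temporary repository has no remote.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run one git command in cwd and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout


def first_commit_demo():
    """Replay status / add / commit / log in a temporary repository."""
    if shutil.which("git") is None:
        return None  # git is not installed; nothing to demonstrate
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git("init", cwd=repo)
        # Repository-local identity, so the demo ignores your global config.
        git("config", "user.email", "you@example.com", cwd=repo)
        git("config", "user.name", "Your Name", cwd=repo)
        (repo / "test.py").write_text("print('hello world')\n")
        before = git("status", "--short", cwd=repo)  # test.py is untracked
        git("add", "test.py", cwd=repo)
        git("commit", "-m", "my first commit", cwd=repo)
        after = git("status", "--short", cwd=repo)  # nothing left to commit
        log = git("log", "--oneline", cwd=repo)
        return before, after, log


result = first_commit_demo()
if result is not None:
    print(result[2])
```

In the short status format an untracked file is shown with ??, and an empty status after the commit means the working tree is clean.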

2C Practice basic git using trigonometry#

Commit your changes every time you finish a bullet point. Use the flowchart shown below.

  • Add a function to test.py to calculate the circumference of a circle. Commit your changes.

  • Add a function to test.py to calculate the surface area of a circle. Commit your changes.

  • Create a new file called script.py that is empty. Commit your changes.

  • Add some print statement to script.py and execute it. Commit your changes.

  • Import test.py into script.py and call the functions in test.py and print the output. Commit your changes.

  • Take a look at your repository on GitHub to see an overview of your work

End of exercise 2#

Git-Flow chart#

There is a git flowchart (in pdf, pptx and png) in the day3/img folder.
N.B. Avoid using the GitHub App; it gets you into all kinds of trouble!

image

Exercise 3: Multiple users, resolving conflicts#

  • Working with multiple users on a single branch

  • Resolving conflicts

Work with your partner#

Add your partner as a collaborator#

  • Go to the repository of person A on GitHub.

  • Go to settings/collaborators. Enter the GitHub ID of person B and make them a collaborator (write access).

  • Person B clones the repository of person A (look at exercise 2B if you forgot how)

Creating and resolving a conflict#

  • Both persons add a docstring (with the triple quotes) to the function that calculates the surface area of the circle

  • Person A commits, pulls and pushes.

  • Person B commits and pulls, this will raise a conflict. Resolve this conflict.

  • Look at the GitHub network graph to see what happened.
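When the pull reports a conflict, Git writes both versions into the file, separated by marker lines that start with <<<<<<<, ======= and >>>>>>>. Resolving the conflict means editing the file so that only the text you want remains, removing the markers, and committing the result. The sketch below shows what such a file can look like and pulls the two competing versions apart; the function name split_conflicts, the docstring texts and the label after >>>>>>> are made up for illustration.

```python
EXAMPLE = '''def area(r):
<<<<<<< HEAD
    """Return the surface area of a circle with radius r."""
=======
    """Compute the area of a circle."""
>>>>>>> hypothetical-commit-id
    return math.pi * r ** 2
'''


def split_conflicts(text):
    """Return a list of (ours, theirs) line lists, one pair per conflict."""
    conflicts, ours, theirs, side = [], [], [], None
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            ours, theirs, side = [], [], "ours"  # your local version follows
        elif line.startswith("=======") and side == "ours":
            side = "theirs"  # the version that was pulled in follows
        elif line.startswith(">>>>>>>") and side == "theirs":
            conflicts.append((ours, theirs))
            side = None
        elif side == "ours":
            ours.append(line)
        elif side == "theirs":
            theirs.append(line)
    return conflicts


for mine, pulled in split_conflicts(EXAMPLE):
    print("local version: ", mine)
    print("pulled version:", pulled)
```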

More practice#

  • Both work on the same repository, use script.py to test your functionality.

  • Add a function that plots a circle

  • Add save to png functionality to the plot function

  • Make the plotting function fancier (add units, labels, etc.)

  • Add surface calculation for other shapes (triangle, square, pentagon, hexagon … )

  • Add circumference calculation for same shapes.