Stand, walk, run

I hope to help you under_stand_ the use case for version control

(with git)

So that you can walk the walk when you work with other developers

(for example, the Google Summer of Code)

And I hope you won’t run away screaming before the end of this.

Crawling

Every developer starts "version control" with directories like:

assignment_1
|
|- broken_marc21.go
|- old_marc21.go
|- old_marc21.go.1
|- marc21.go
|- marc21.go.old
|- marc21.go.1
  • It’s nothing to be ashamed of. We just don’t know any better.
  • But we know that we need a solution!

Celebrating our differences (diff)

  • diff: the key to comparing files
  • diff -u: generates unified diff format, the choice of the discerning developer
$ diff -u Intro_git.txt Intro_git_2.txt
--- Intro_git.txt       2013-01-17 14:47:56.401016950 -0500
+++ Intro_git_2.txt     2013-01-17 14:36:09.824555910 -0500
@@ -36,3 +36,10 @@

 But we know that we need a solution!

+Celebrating our differences
+---------------------------
+* `diff`: the key to comparing files
+* `diff -u`: generates _unified diff_ format, the choice of the discerning
+  developer
+
+
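
To get a feel for the unified format, here is a minimal sketch that builds two throwaway files and inspects the diff between them. The file names notes_v1.txt and notes_v2.txt are made up for illustration. The hunk header @@ -1,2 +1,3 @@ reads as "2 lines starting at line 1 of the old file, 3 lines starting at line 1 of the new file".

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Two hypothetical versions of the same file
printf 'line one\nline two\n' > notes_v1.txt
printf 'line one\nline two\nline three\n' > notes_v2.txt

# diff exits with status 1 when the files differ, so "|| true"
# keeps "set -e" from stopping the script
diff -u notes_v1.txt notes_v2.txt > notes.diff || true

# The hunk header says which line ranges of each file the hunk covers
hunk=$(grep '^@@' notes.diff)
# Added lines start with a single "+"; the "+++" file header is skipped
added=$(grep -c '^+[^+]' notes.diff)

cat notes.diff
echo "hunk header: $hunk"
echo "added lines: $added"
```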

Bringing our worlds together: (patch)

$ # Create a patch file by redirecting STDOUT
$ diff -u Intro_git.txt Intro_git_2.txt > Intro.patch
$
$ # Apply the patch to the target file
$ patch Intro_git.txt < Intro.patch
$
$ # More concisely, because the file name is recorded in the patch:
$ patch < Intro.patch
$ diff -u Intro_git.txt Intro_git_2.txt
$
$ # No output from diff because the files are now the same

Collaboration

  • The goal of diff and patch is to give developers the ability to collaborate towards a greater good
  • But we still want to avoid…
assignment_1
|
|-\ casey
|  - marc21_1.go
|- emily_marc21.go
|- marc21_dan.go
|- marc21_dan.go.old
|- marc21.go
  • And we need to send these things around… email, sneakernet, SFTP, Dropbox, Google Drive, …

The sales pitch

  • What if I told you that there was a tool that could handle all of this, and more? [1]
  • How much do you think it would cost?
  • What if I told you it was free (as in freedom, and as in beer)?

Introducing git

Our collective goal for today is to get you to start using git as of now.

For everything you do. Because it is easy to start, and it will save you time and sanity.

Begin by creating a directory that you want to hold your work, and then initializing a new git repository:

$ mkdir intro_to_git
$ cd intro_to_git
$ git init .
Initialized empty Git repository in /home/dan/intro_to_git/.git/

Adding files to your repository

You need to add every change you want to track to git’s staging area

staging area: conceptual space where changes are collected in preparation for a commit

A new file is just another change. All git commands begin with git followed by whatever command you’re giving. So, to add a file:

$ gvim marc21.go # Add a comment and save the file
$ git add marc21.go
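
The staging area holds a snapshot of the file as it was when you ran git add, not a live link to the file. A minimal sketch, assuming a throwaway repository in a temporary directory (the file contents are invented for illustration):

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q .

echo "// first draft" > marc21.go
git add marc21.go

# Edit the file again after staging it
echo "// second draft" >> marc21.go

# --cached compares the staging area with the last commit (none yet);
# a plain diff compares the working directory with the staging area
staged=$(git diff --cached --name-only)
unstaged=$(git diff --name-only)
# ":marc21.go" names the staged copy of the file
staged_text=$(git show :marc21.go)

echo "staged:         $staged"
echo "not yet staged: $unstaged"
echo "staged content: $staged_text"
```

The second edit only reaches the next commit if you run git add again.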

Checking the status of your repository

The git status command shows you at a glance:

  • Which branch you are on
  • Which commit of the branch you are on
  • Which changes are in your staging area
  • Lots of other status info…
$ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
#       new file:   marc21.go
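
For a terser view, git status --porcelain prints one line per change with a two-letter code in front. A small sketch, assuming a throwaway repository; README here is a hypothetical file that has not been added yet:

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q .

echo "// comment" > marc21.go
echo "notes" > README
git add marc21.go          # staged
                           # README is left untracked on purpose

# "A " marks a newly staged file, "??" marks an untracked file
status=$(git status --porcelain)
echo "$status"
```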

Committing a change

When you want to record the state of your project at a given point in time, you commit the staging area to the repository history.

Commits record:

  • The timestamp of the commit
  • The author of the commit, and the committer of the commit
  • A commit message describing the changes
    • Summary line: the reason for the commit (short, 70 chars or less)
    • Body: a longer description of the changes (keep lines 72 chars or less)
  • A SHA-1 hash that uniquely identifies the commit

Commits are cheap. When in doubt, commit early, commit often.

The command to commit a change to git is, naturally, git commit:

$ git commit
# $EDITOR opens asking you to write your description: write and save

[master eb970c5] Your short description went here
 1 file changed, 121 insertions(+), 7 deletions(-)
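
If you would rather skip the editor, git commit accepts the message on the command line. The sketch below assumes a throwaway repository and an invented identity (Example Dev); each -m becomes one paragraph of the message, so the first is the short summary and the second is the longer description.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q .
# Hypothetical identity, stored only in this repository
git config user.name "Example Dev"
git config user.email "dev@example.org"

echo "// Package marc21 parses MARC records" > marc21.go
git add marc21.go
git commit -q -m "Add marc21.go skeleton" \
    -m "Start with a single documented file so later commits stay small."

# Read back what the commit recorded
subject=$(git log -1 --format=%s)
author=$(git log -1 --format='%an <%ae>')
short_hash=$(git rev-parse --short HEAD)

echo "$short_hash $subject"
echo "author: $author"
```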

Ideal workflow

Nobody makes mistakes, right? So ideally your workflow would look like:

$ git add marc21.go
$ git commit # edit, test, edit, test
$ git add marc21.go README
$ git commit # edit, test, edit, test
$ git add marc21.go tests/
$ git commit # edit, test, edit, test

This linear workflow creates a history that effectively looks like:

A - B - C
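
The same A - B - C history can be reproduced in a scratch repository. This is only a sketch: the commit messages and the identity are placeholders.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q .
git config user.name "Example Dev"      # hypothetical identity
git config user.email "dev@example.org"

# Three rounds of edit, add, commit
for label in A B C; do
    echo "change $label" >> marc21.go
    git add marc21.go
    git commit -q -m "Commit $label"
done

count=$(git rev-list --count HEAD)                      # commits reachable from HEAD
oldest=$(git log --reverse --format=%s | head -n 1)     # first commit made
newest=$(git log -1 --format=%s)                        # tip of the branch

echo "$count commits, from '$oldest' to '$newest'"
```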

Checkpoint 1

You now know how to:

  1. Create a new repository: git init
  2. Add one or more changes to the staging area of a repository: git add
  3. Commit changes in the staging area to the repository as one atomic operation: git commit

Even if that’s all that you take away today, that’s a great start.

But wait, there’s more!

Mistakes happen

Let’s pretend you made a mistake and need to get back to a previous version of your work. Here’s where version control shines.

Check the log of your changes via git log:

$ git log --oneline
f4f00cd Update doc strings to match godoc conventions
ea70154 Add a test for Record.String()
3835856 Whitespace - run "go fmt"
388c710 Add a test for GetSubFields
492c66f Test the record.getFields() method
bd2d3ac Add a test for the MARC21XML transform
  1. Find the last working commit to which you want to roll back [2] via git log
  2. Show the differences between the working commit and the current state of your project using git diff
$ git diff 492c66f           # show all changes since commit 492c66f,
                             # including uncommitted changes
$ git diff 492c66f -- README # show all changes since commit 492c66f
                             # just for the file named README
$ git diff 492c66f..         # show all committed changes since 492c66f
$ git diff 492c66f..HEAD     # same thing: a bare ".." implies HEAD
$ git diff 492c66f..388c710  # show all changes between 492c66f..388c710
$ git diff 492c66f..HEAD^    # show all changes between commit 492c66f
                             # and the second-last commit in the branch
$ git diff 492c66f..HEAD^^   # show all changes between commit 492c66f
                             # and the third-last commit in the branch
$ git show 492c66f           # show the change just for commit 492c66f
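
The difference between git diff <commit> and git diff <commit>.. is easy to miss: the first compares against your working directory, the second against the last commit. A minimal sketch in a throwaway repository (file names and messages are placeholders):

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q .
git config user.name "Example Dev"      # hypothetical identity
git config user.email "dev@example.org"

echo "v1" > marc21.go
git add marc21.go
git commit -q -m "First version"
first=$(git rev-parse HEAD)

echo "docs" > README
git add README
git commit -q -m "Add README"

# An edit that has not been committed yet
echo "v2" >> marc21.go

# Committed changes only (first..HEAD)
committed=$(git diff --name-only "$first"..)
# Committed changes plus whatever is in the working directory
with_worktree=$(git diff --name-only "$first")

echo "committed since first: $committed"
echo "including uncommitted edits:"
echo "$with_worktree"
```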

You could manually apply those changes, but that requires effort. Version control is for the lazy, and laziness is a virtue.

Instead, you can preserve everything, mistakes and all, in your working branch and begin a new branch starting at the last commit where everything was good:

  1. Return to the desired version of your project’s history via:
    git checkout <commit-hash>
  2. Create a new branch and move forward towards the marvellous future

This creates a history that looks like:

A - B - C (original branch)
     \- D (new branch)

Checking out a previous commit

When you check out a bare commit hash rather than a branch name, you need to create a new branch to record any changes that you commit from that point on.

$ git checkout ea70154
Note: checking out 'ea70154'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b new_branch_name

HEAD is now at ea70154... Add a test for Record.String()
Tip
git checkout -b <new-branch> <commit-hash> combines the checkout and branch creation steps.
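
Here is a sketch of the whole "branch off from the last good commit" move in a scratch repository. It assumes the initial branch is called master (the script sets that explicitly) and uses placeholder commits A to D; new_branch is a made-up name.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example Dev"        # hypothetical identity
git config user.email "dev@example.org"

for label in A B C; do
    echo "change $label" >> marc21.go
    git add marc21.go
    git commit -q -m "Commit $label"
done

# HEAD^ is commit B, the parent of C; branch from there in one step
git checkout -q -b new_branch HEAD^
echo "change D" >> marc21.go
git add marc21.go
git commit -q -m "Commit D"

master_tip=$(git log -1 --format=%s master)
branch_tip=$(git log -1 --format=%s new_branch)
# merge-base finds the commit where the two branches part ways
fork_point=$(git log -1 --format=%s "$(git merge-base master new_branch)")

echo "master ends at '$master_tip', new_branch ends at '$branch_tip'"
echo "they share history up to '$fork_point'"
```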

Checking out a previous commit to the same branch

Alternately, you can add another commit to your working branch that restores the state of your project at a given commit, while retaining all of the previous history:

  1. Return to the desired version of your project’s history via:
    git checkout <commit-hash> .
  2. Commit the changed files (which are automatically in the staging area)

This creates a linear history that looks like:

A - B - C - B1
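
A sketch of that restore-and-commit approach, using placeholder commits in a throwaway repository. One caveat worth knowing: git checkout <commit> . restores files that existed in the old commit, but does not delete files that were added afterwards.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q .
git config user.name "Example Dev"      # hypothetical identity
git config user.email "dev@example.org"

# Each commit overwrites the file so the versions are easy to tell apart
for label in A B C; do
    echo "content $label" > marc21.go
    git add marc21.go
    git commit -q -m "Commit $label"
done

good=$(git rev-parse HEAD^)             # commit B, the last good state

# Copy the files from commit B into the working directory and staging area
git checkout -q "$good" -- .
git commit -q -m "Restore the state of commit B"

count=$(git rev-list --count HEAD)      # A, B, C and the new B1
content=$(cat marc21.go)

echo "$count commits, marc21.go now contains: $content"
```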

Hitting the reset button

Or you can rewrite history entirely, pretending that all commits after the desired commit never existed, via git reset:

  1. Return to the desired version of your project’s history via:
    git reset <commit-hash>
Note
git reset --hard throws away all of the changes. The default --mixed option keeps the changes to the files on disk, but removes them from the staging area.
Caution
Rewriting history for a branch on which another branch was based causes misery.

git reset creates a linear history that looks like (with --hard):

A - B
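
To see what the reset modes do to your files, try them in a scratch repository first. In this sketch (placeholder commits, invented identity) a plain git reset, which behaves as --mixed, moves the branch back but leaves the file alone, while --hard also overwrites the file.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q .
git config user.name "Example Dev"      # hypothetical identity
git config user.email "dev@example.org"

for pair in A:one B:two C:three; do
    echo "${pair#*:}" > marc21.go       # file content: one, two, three
    git add marc21.go
    git commit -q -m "Commit ${pair%%:*}"
done

# Plain reset (--mixed): history goes back to B, the file keeps "three"
git reset -q HEAD^
mixed_tip=$(git log -1 --format=%s)
mixed_file=$(cat marc21.go)
mixed_staged=$(git diff --cached --name-only)   # nothing is staged

# --hard: history goes back to A and the file is overwritten as well
git reset -q --hard HEAD^
hard_tip=$(git log -1 --format=%s)
hard_file=$(cat marc21.go)

echo "after plain reset: tip=$mixed_tip file=$mixed_file"
echo "after --hard:      tip=$hard_tip file=$hard_file"
```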

Branching out

When you initialize a repository, you begin with the master branch.

A branch is just a named line of commits within a repository; under the hood, it is a movable pointer to the most recent commit in that line.

Branches typically follow the master branch for a period of time, then diverge to try out something experimental, or to support release management principles.

Examples:

  • "I’m going to try porting this bash script to Python"
  • "2.1 is a stable release, so we will only accept bug fixes. You can add features to the 2.2 branch though"

To create a new branch, use the git checkout -b command, passing in:

  • The name of the new branch
  • (Optionally) the branch or commit on which you want to base the new branch; the default is HEAD of whatever branch you are currently on
$ git checkout -b write_xml_files_right ea70154
Previous HEAD position was f4f00cd... Update doc strings to match godoc
Switched to a new branch 'write_xml_files_right'
Note
Yes, you use git checkout to switch to a different branch, and you use git checkout -b to create a new branch. I’m sorry about that.
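
Creating a branch is cheap because nothing is copied: right after git checkout -b, the new branch and the old one point at the same commit. A small sketch, assuming a throwaway repository whose first branch is set to master:

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example Dev"        # hypothetical identity
git config user.email "dev@example.org"

echo "// comment" > marc21.go
git add marc21.go
git commit -q -m "Initial commit"

# Create the branch and switch to it in one step
git checkout -q -b write_xml_files_right

current=$(git symbolic-ref --short HEAD)
master_commit=$(git rev-parse master)
branch_commit=$(git rev-parse write_xml_files_right)

echo "now on: $current"
echo "master:                $master_commit"
echo "write_xml_files_right: $branch_commit"
```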

Help

git ships with a ton of documentation:

  • git help may be the most useful command for beginners
  • Correction, git help <command> is the most useful command for beginners
  • Correction, git help <command> is the most useful command

Checkpoint 2

You now know how to:

  1. Create a new repository: git init
  2. Add one or more changes to the staging area of a repository: git add
  3. Commit changes in the staging area to the repository as one atomic operation: git commit
  4. Display commit history: git log
  5. Show differences between a range of commits: git diff
  6. Show the changes associated with a single commit: git show
  7. Switch to a different commit in your branch history: git checkout
  8. Create a new branch based on any point in another branch’s history: git checkout -b
  9. Rewrite the history of a branch: git reset

Getting picky: applying changes from other branches

Development teams try to keep the history of their master and release branches "clean".

Also, rewriting history is not an option!

Thus, developers create experimental / development branches where they can work, make mistakes, and rewrite history until they have a working solution to apply to the master branch.

Example: master with two dev branches
A - B - C         (master)
     \   \- E - G (dev_branch_2)
      \
       \- D - F   (dev_branch_1)

Merging an entire branch

To apply all of the commits in a development branch to master, you can use the git merge <branch-name> command:

$ git merge dev_branch_1
Updating 492c66f..f88a174
Fast-forward
 new_file.go | 5 +++++
 1 file changed, 5 insertions(+)
 create mode 100644 new_file.go

$ git log --oneline
f88a174 New stuff from dev 1
492c66f Test the record.getFields() method

Result: a clean merge where the commits were simply added to the end of the master branch’s history.

If the parent branch’s history has changed since you created the development branch, git tries to merge the changes and adds its own commit to record the merge:

$ git merge dev_branch_2
Merge made by the 'recursive' strategy.
 new_file_2.go | 3 +++
 1 file changed, 3 insertions(+)
 create mode 100644 new_file_2.go
$ git log --oneline
d278cca Merge branch 'dev_branch_2'
a7c76a5 New stuff for dev 2
f88a174 New stuff from dev 1
492c66f Test the record.getFields() method

If all goes well, you will encounter no merge conflicts.

If not, you will have to edit each file that contains conflicts to resolve the merge conflict before you can commit the merged changes.
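
Merge conflicts are less scary once you have caused one on purpose. The sketch below, in a throwaway repository with made-up file contents, changes the same line on two branches, lets the merge fail, and then resolves it by writing the wanted content and committing.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example Dev"        # hypothetical identity
git config user.email "dev@example.org"

echo "original" > marc21.go
git add marc21.go
git commit -q -m "Base"

git checkout -q -b dev_branch_2
echo "dev version" > marc21.go
git commit -q -a -m "Change on dev_branch_2"

git checkout -q master
echo "master version" > marc21.go
git commit -q -a -m "Change on master"

# git merge exits non-zero when it hits a conflict
if git merge dev_branch_2 >/dev/null 2>&1; then
    outcome="clean"
else
    outcome="conflict"
fi
conflicted=$(git diff --name-only --diff-filter=U)   # files still unmerged
markers=$(grep -c '^<<<<<<<' marc21.go)              # conflict markers in the file

# Resolve: write the content you want, stage it, commit the merge
echo "combined version" > marc21.go
git add marc21.go
git commit -q -m "Merge dev_branch_2 and resolve the conflict"
parents=$(git log -1 --format=%P | wc -w | tr -d ' ')

echo "merge outcome: $outcome in $conflicted ($markers marker block)"
echo "merge commit has $parents parents"
```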

Picking individual commits into your current branch

You may want to add specific commits into your branch instead of performing a complete merge:

  • To keep branch history clean (with no merge commits)
  • To avoid unnecessary commits from a given development branch

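git cherry-pick enables you to add specific commits to your current branch. For example, to avoid the merge commit that dev_branch_2 produced in the merge example, we could simply cherry-pick the desired commit. A cherry-pick records a new commit, so in a real repository the copy gets its own hash rather than keeping a7c76a5 as the simplified log below suggests: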

Look ma, no merge commit!
$ git reset --hard f88a174
$ git cherry-pick a7c76a5
$ git log --oneline
a7c76a5 New stuff for dev 2
f88a174 New stuff from dev 1
492c66f Test the record.getFields() method
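
A sketch of a cherry-pick in a scratch repository, with placeholder files and the commit messages borrowed from the example above. It also shows that the picked commit lands on master as a new commit with a single parent and a hash of its own.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example Dev"        # hypothetical identity
git config user.email "dev@example.org"

echo "base" > marc21.go
git add marc21.go
git commit -q -m "Base"

git checkout -q -b dev_branch_2
echo "dev 2" > new_file_2.go
git add new_file_2.go
git commit -q -m "New stuff for dev 2"
picked=$(git rev-parse HEAD)

# Meanwhile master moves on with an unrelated change
git checkout -q master
echo "dev 1" > new_file.go
git add new_file.go
git commit -q -m "New stuff from dev 1"

git cherry-pick "$picked" >/dev/null

new_hash=$(git rev-parse HEAD)
subject=$(git log -1 --format=%s)
parents=$(git log -1 --format=%P | wc -w | tr -d ' ')

echo "picked $picked"
echo "became $new_hash ($subject, $parents parent)"
```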

Checkpoint 3

In addition to everything else you’ve learned, you now know how to:

  • Merge entire branches into your current branch: git merge
  • Add specific commits from any other branch to your current branch: git cherry-pick
  • Deal with merge conflicts, if and when they arise

And you have a conceptual grasp of how multiple branches interact.

No developer is an island

So far all of our work has been on our own machine. What if we want to collaborate with other developers?

  • git format-patch <commit> generates one patch file per commit, intended to be sent via email, and applied with git am
    • This workflow is primarily associated with the Linux kernel.
  • Alternately, and much more commonly, you can push your branches to a remote, publicly visible repository and point other developers at it
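
The patch-by-email workflow can be tried without any email at all. In this sketch two local repositories stand in for two developers (alice and bob are invented names); the patch file is handed over through a shared directory instead of a mailbox.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Alice's repository, with one commit that Bob also has
git init -q "$work/alice"
cd "$work/alice"
git config user.name "Alice Example"      # hypothetical identity
git config user.email "alice@example.org"
echo "v1" > marc21.go
git add marc21.go
git commit -q -m "Base"
git clone -q "$work/alice" "$work/bob"

# Alice makes a change and exports it as one patch file
echo "v2" >> marc21.go
git commit -q -a -m "Describe the record layout"
git format-patch -1 -o "$work/patches" HEAD >/dev/null

# Bob applies the patch; git am creates the commit for him
cd "$work/bob"
git config user.name "Bob Example"        # hypothetical identity
git config user.email "bob@example.org"
git am -q "$work/patches"/*.patch

subject=$(git log -1 --format=%s)
author=$(git log -1 --format=%an)
committer=$(git log -1 --format=%cn)

echo "$subject (author: $author, committer: $committer)"
```

This is also where the author and the committer of a commit end up being different people.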

Repositories that are not local to your machine are called remotes.

Side benefit: if your hard drive crashes, your work is still available from the remote repository!

Pushing your work to a remote repository

  1. Create the remote repository. All of the third-party code repos have fancy point-and-click UIs to create a new repository, although you will likely have to submit a public SSH key to the code repo for authentication purposes.
  2. Add the location of the remote to your local repository using the git remote command:
    $ # Add the new remote with the name "upstream"
    $ git remote add upstream git@gitorious.org:intro_to_git/intro_to_git.git
    $ # Show the list of remotes
    $ git remote -v
    upstream        git@gitorious.org:intro_to_git/intro_to_git.git (fetch)
    upstream        git@gitorious.org:intro_to_git/intro_to_git.git (push)
  3. Push the changes from your current branch (master) to the remote branch named master using the git push command:
    $ # git push <remote-name> <local-branch-name>:<remote-branch-name>
    $ git push upstream master:master
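
You can rehearse the remote workflow without a hosting account: a bare repository on your own disk behaves like a remote. The sketch below makes that assumption, so the path stands in for a URL such as the gitorious one above.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# A bare repository has history but no working files; it plays the remote
git init -q --bare "$work/remote.git"

git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example Dev"        # hypothetical identity
git config user.email "dev@example.org"
echo "// comment" > marc21.go
git add marc21.go
git commit -q -m "Initial commit"

# Register the remote under the name "upstream" and push master to it
git remote add upstream "$work/remote.git"
git push -q upstream master:master

local_tip=$(git rev-parse master)
remote_tip=$(git --git-dir="$work/remote.git" rev-parse refs/heads/master)

echo "local master:  $local_tip"
echo "remote master: $remote_tip"
```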

Cloning an existing repository to local storage

The git clone command creates a complete copy of a repository, including the entire history of all branches and commits.

Third-party code repos give you the complete command required to clone the remote git repository, such as:

$ git clone https://git.gitorious.org/intro_to_git/intro_to_git.git
Cloning into 'intro_to_git'...
remote: Counting objects: 15, done
remote: Finding sources: 100% (15/15)
remote: Compressing objects: 100% (10/10)
Unpacking objects: 100% (15/15), done.

Referring to branches from remotes

A freshly cloned repository has a remote named origin that refers to the original repository.

Use the remote name to distinguish between local branches and remote branches.

For example, to create a new working branch called dev_branch_3 based on the master branch of the remote repository:

$ git checkout -b dev_branch_3 origin/master
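
A sketch of the same step against a local stand-in for the remote repository (the source directory is made up for illustration). It confirms that the clone knows the original repository as origin and that the new branch starts at the same commit as origin/master.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# A local repository that plays the part of the remote
git init -q "$work/source"
cd "$work/source"
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example Dev"        # hypothetical identity
git config user.email "dev@example.org"
echo "// comment" > marc21.go
git add marc21.go
git commit -q -m "Initial commit"

git clone -q "$work/source" "$work/clone"
cd "$work/clone"

remote_name=$(git remote)
git checkout -q -b dev_branch_3 origin/master

branch_tip=$(git rev-parse dev_branch_3)
remote_tip=$(git rev-parse origin/master)

echo "remote is called: $remote_name"
echo "dev_branch_3 starts at $branch_tip"
```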

Updating local copies of remote history

Over time, as other developers push branches and commits to the remotes you have configured for your local repository, the history in your local repository gets out of sync with the remotes.

  • git fetch <remote> updates the local copy of history for the named remote
  • git fetch --all updates the local copy of history for all remotes

Once you update the remotes, you can check out your local master branch to see where the local and remote histories have diverged.

$ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.

To update a local branch to match the remote branch on which it is based, you can use the git pull command:

$ git pull
Updating 704f4f7..61c4bf9
Fast-forward
 Introducing_git.txt | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 80 insertions(+), 4 deletions(-)
Note
If you have committed changes to the local branch while the remote branch has also changed, you may need to resolve a merge conflict when you pull changes.
Tip
git pull automatically refreshes remote history before applying the changes locally.
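
The difference between fetch and pull shows up if you count how far behind you are. In this sketch a local repository (source, a made-up name) plays the remote; --ff-only is an extra safety flag, not part of the example above, that makes git pull refuse anything other than a fast-forward.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

git init -q "$work/source"
cd "$work/source"
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example Dev"        # hypothetical identity
git config user.email "dev@example.org"
echo "v1" > marc21.go
git add marc21.go
git commit -q -m "First"

git clone -q "$work/source" "$work/clone"

# Another developer adds a commit to the original repository
echo "v2" >> marc21.go
git commit -q -a -m "Second"

cd "$work/clone"
# fetch updates origin/master but leaves the local master alone
git fetch -q origin
behind_before=$(git rev-list --count master..origin/master)
local_before=$(git log -1 --format=%s master)

# pull also moves the local branch forward
git pull -q --ff-only
behind_after=$(git rev-list --count master..origin/master)
local_after=$(git log -1 --format=%s master)

echo "after fetch: master at '$local_before', $behind_before commit behind"
echo "after pull:  master at '$local_after', $behind_after commits behind"
```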

The most common mistake with a cloned repository

  • You will begin excitedly working on a freshly cloned repository
  • You will realize that you’re committing changes to your local master branch, which is (almost) never what you want to do
  • To correct the situation, just use git checkout -b <new-branch-name> to create a new branch, and git reset --hard to restore the state of your local master branch.
$ git checkout -b dev_branch_3
Switched to a new branch 'dev_branch_3'
$ git checkout master
Switched to branch 'master'
Your branch is ahead of 'origin/master' by 1 commit.
$ git reset --hard HEAD^
HEAD is now at 704f4f7 Mergery and cherry-picking
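
If you made more than one commit before noticing, counting HEAD^ hops gets tedious. A variation, sketched here against a local stand-in for the remote, is to reset master to origin/master directly; the stray commits stay safe on the new branch.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

git init -q "$work/source"
cd "$work/source"
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example Dev"        # hypothetical identity
git config user.email "dev@example.org"
echo "v1" > marc21.go
git add marc21.go
git commit -q -m "Initial commit"

git clone -q "$work/source" "$work/clone"
cd "$work/clone"
git config user.name "Example Dev"
git config user.email "dev@example.org"

# The mistake: committing straight to the local master branch
echo "oops" >> marc21.go
git commit -q -a -m "Work that belongs on a branch"

# Keep the work on a new branch...
git checkout -q -b dev_branch_3
# ...then make the local master match the remote again
git checkout -q master
git reset -q --hard origin/master

ahead=$(git rev-list --count origin/master..master)
saved=$(git log -1 --format=%s dev_branch_3)

echo "master is $ahead commits ahead of origin/master"
echo "dev_branch_3 still has: $saved"
```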

Communicating your changes to the development team

A common approach for a developer who wants their changes to be merged to a master or release branch is to:

  1. Push their nice, clean branch with descriptive commit log entries to a public repository
  2. Open a bug, send an email (or in some hosted code repos, click a "request merge" or "request pull" button) to notify the development team of the branch, with a description of what the branch does

A "clean" branch is one in which each commit exists for a logical purpose, and no commit on its own breaks the existing tests or functionality of the software.

A commit that has the side effect of changing the actual output for a test should also change the expected output for the test so that it continues to pass.

A benefit of this workflow is that some level of code review is built into the process.

Checkpoint 4

At this point, you know how to:

  • Clone remote repositories: git clone
  • Add a reference to a remote repository to an existing local repository: git remote add
  • Update your local copy of a remote repository’s history: git fetch
  • Update a local branch with commits from a corresponding remote branch: git pull
  • Communicate a request to have your changes added to a project’s master or release branch

Advanced topics

Although you have enough knowledge of git now to be perfectly functional, over time you will want to learn more shortcuts and more powerful commands, such as:

  • Rewriting history selectively and in bulk: git rebase and git rebase --interactive
  • Staging subsets of changes: git add -p
  • Finding which commit introduced a bug: git bisect

License

Dan Scott <dscott@laurentian.ca>