Sunday, April 6, 2008

Git the Version Control System - part one

Git is a version control system (VCS). It keeps a history of all the changes you make to your code, helps you manage releases, and makes it easier for several programmers to share their code with the rest of the group.

There are dozens of good VCSs out there. For a long time the most famous was CVS (Concurrent Versions System); then came Subversion, built as a successor to CVS with the same concepts but cleaner than its ancestor. Both CVS and Subversion are free. There are other sophisticated VCSs out there; some are free and others are proprietary. Linux was developed using a free-of-charge license of BitKeeper, a proprietary distributed VCS, but when that arrangement came to an end Linus Torvalds was challenged enough to start writing his own VCS, and Git was born.

There are two types of VCS: centralized and distributed. In a centralized VCS like CVS or Subversion there is always one central repository. You check out a copy of whatever version you like from the repository, work on it, and then submit your changes back to the central repository.

In a distributed VCS like Git there can be one or more central repositories. You clone a repository onto your local machine and end up with a standalone repository of your own, which is much like forking the code at a certain point in its development line into another repository. You then maintain your repository and change it to your heart's content, and it is up to you or the maintainer of the original (upstream) repository to merge your changes.
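
To make the "standalone repository" idea concrete, here is a minimal sketch that clones one local repository into another. The directory names (upstream, mycopy) and the committer identity are made up for illustration, and everything happens in a throwaway temporary directory.

```shell
# Sketch: a clone is a complete repository of its own (all names are illustrative).
set -e
work=$(mktemp -d)              # throwaway directory so nothing real is touched
cd "$work"

git init -q upstream           # stands in for the original (upstream) repository
cd upstream
echo "hello" > dialog.txt
git add dialog.txt
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "saying hello"
cd ..

git clone -q upstream mycopy   # mycopy gets its own .git folder with the history
cd mycopy
git log --format=%s            # lists the commit that was made in upstream
```

After the clone, mycopy can be committed to without touching upstream at all.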

I will write a note on the basic operations of managing my code in Git. For the sake of simplicity, this post does not cover how to serve or access a remote Git repository over SSH; I will just assume that all repositories are on the local machine.
$ mkdir myapp
$ cd myapp
$ git init
Initialized empty Git repository in .git/
I created a folder for my application, and inside that folder I had Git initialize a new repository.

At this point we should have a folder named .git inside 'myapp'. That .git folder is the repository itself: in it Git stores all the transactions, changes, versions, and releases of your code. The beauty of Git is that it does not pollute your code folder with other files the way Subversion does; Subversion created a .svn folder inside every folder of my app.

As I said, .git is the repository, and the rest of your folder is called the working copy. The working copy is an important concept: it holds the files of your current version. In Git, if you switch to another version of your code, the files in the folder change to reflect the version you switched to. We can also create a repository without a working copy. That is what we do when we need a central repository that programmers pull changes from and push changes into; such a repository needs no working copy and is called a 'bare' repository.
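
As a rough sketch of the difference, the commands below create one bare and one regular repository side by side and ask Git which is which. The name central.git is only an illustrative choice, and the work happens in a temporary directory.

```shell
# Sketch: a bare repository next to a regular one (names are illustrative).
set -e
work=$(mktemp -d)
cd "$work"

git init -q --bare central.git   # repository only, no working copy
git init -q myapp                # working copy plus a .git folder inside it

git -C central.git rev-parse --is-bare-repository   # prints "true"
git -C myapp rev-parse --is-bare-repository         # prints "false"

ls central.git   # the repository files sit directly in this folder
```

In the bare case the files that would normally live under .git are placed directly in central.git.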

The repository is still empty: it has no files (or changes) in it. To start populating it we use Git's 'add' command. But we do not actually have any files yet, so let's make some.
$ echo "hello" > dialog.txt
I created a file 'dialog.txt'. Now I will ask Git about the status, and it will answer with information that sorts files into three cases:
  • The modified files in my working copy that need to be committed.
  • The modified files that are marked for committing.
  • The files that are not yet tracked by Git, and need to be added to the repository first.
Right now we have only one file, and it falls into the third case.
$ git status
# On branch master
#
# Initial commit
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# dialog.txt
Git suggests using the add command to start tracking the file. What does tracking mean? In a working copy there are tracked files, which Git keeps a history of, and untracked files. Untracked files are either newly created files that still need to be added to the repository, or files that are not source code, such as log files, object files, and compiled files. For the latter we need to tell Git to ignore them, so that it stops notifying us about them when we use the 'status' command.
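
The three cases that git status distinguishes can be seen side by side in a throwaway repository. This is only a sketch: the file names a.txt, b.txt, c.txt and the committer identity are invented for the example, and it uses the compact `--short` output instead of the long form shown above.

```shell
# Sketch: one file in each of the three status cases (file names are illustrative).
set -e
work=$(mktemp -d)
cd "$work"
git init -q .
git config user.name "Example"               # identity for this repository only
git config user.email "example@example.com"

echo one > a.txt
echo two > b.txt
git add a.txt b.txt
git commit -q -m "two tracked files"

echo more >> a.txt     # case 1: modified in the working copy, not yet staged
echo more >> b.txt
git add b.txt          # case 2: modified and marked for the next commit
echo new > c.txt       # case 3: not tracked by Git at all

git status --short
```

In the short format the first column describes the index and the second the working copy, so a.txt shows up as ` M`, b.txt as `M ` and c.txt as `??`.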

To be more specific, the 'add' command does not commit the files to the repository by itself. It prepares them to be committed the next time you use the commit command. What 'add' actually does is copy the files, whether they were tracked before or not, to an area called the 'index' (also known as the staging area). Once you have added all the files you need and you are satisfied that every change you want to commit is staged, you use the commit command to commit those changes to the repository.
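
One way to see that the index holds its own copy of a file is to edit the file again after adding it. The following is a small sketch in a temporary directory; the second line of text is made up for the example.

```shell
# Sketch: the index keeps the content as it was when 'git add' ran.
set -e
work=$(mktemp -d)
cd "$work"
git init -q .

echo "hello" > dialog.txt
git add dialog.txt                      # the index now holds "hello"
echo "edited after add" >> dialog.txt   # this change is NOT in the index

git ls-files             # files currently recorded in the index
git show :dialog.txt     # the staged content: only "hello"
cat dialog.txt           # the working copy: both lines
git status --short       # AM = added to the index, modified again afterwards
```

A commit at this point would record only the staged "hello" version; the later edit would need another 'git add' first.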
$ git add dialog.txt
Now that we have added dialog.txt, it is in the 'index' and will be committed by the next commit command.
$ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
# (use "git rm --cached <file>..." to unstage)
#
# new file: dialog.txt
#
Well, it is time for our first commit. Git stores the name and email of the committer with each commit, so we need to identify ourselves to Git first.
$ git config --global user.email "ibrahim@mymail.com"
$ git config --global user.name "Ibrahim Ahmed"
$ git commit -m "saying hello"
I committed dialog.txt. Notice the '-m': a commit always needs a log message, so try to write one that describes the changes you have made to the file.
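
To check that the identity really ends up in the commit, the sketch below sets a name and email and then reads them back from the log. The name and address are placeholders, and the settings are made without `--global`, so they apply to the throwaway repository only.

```shell
# Sketch: the committer identity is stored with the commit (identity is a placeholder).
set -e
work=$(mktemp -d)
cd "$work"
git init -q .

git config user.name "Example Committer"       # no --global: this repository only
git config user.email "committer@example.com"

echo "hello" > dialog.txt
git add dialog.txt
git commit -q -m "saying hello"

git log -1 --format='%an <%ae>: %s'   # author name, author email, log message
```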
$ echo "what's your name" >> dialog.txt
$ git status
I appended one more line to dialog.txt and ran git status to see the change. Git should report that dialog.txt is modified, but the change will not be part of the next commit until it is added to the index.

Once more, and excuse my boring repetition: you need to add dialog.txt to the index first, before it is ready to be committed.
$ git add dialog.txt
$ git commit -m "asking about your name"
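
The need for the extra 'add' can be sketched by trying to commit without it. The example replays the steps above in a temporary directory with a placeholder identity; the echoed message is my own wording, not Git's.

```shell
# Sketch: a modified file is not committed until it has been added again.
set -e
work=$(mktemp -d)
cd "$work"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

echo "hello" > dialog.txt
git add dialog.txt
git commit -q -m "saying hello"

echo "what's your name" >> dialog.txt
# Nothing is staged yet, so this commit attempt fails with a non-zero exit code.
git commit -q -m "asking about your name" || echo "nothing staged, nothing committed"

git add dialog.txt
git commit -q -m "asking about your name"   # now it goes through

git log --format=%s    # log messages, newest first
```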
Ignoring files
Git status will always nag about untracked files, so we should either 'add' them or ignore them.

There are always some files in your app that you do not need to track, files like logs for example.
$ touch myapp.log
$ git status
# On branch master
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# myapp.log
$ echo "*.log" >> .git/info/exclude
$ git status
# On branch master
nothing to commit (working directory clean)
Notice how git status listed myapp.log under untracked files. Once we add a line containing '*.log' to '.git/info/exclude', git status ignores the myapp.log file and gives the message 'working directory clean'.
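
Patterns in '.git/info/exclude' stay on your own machine. A common alternative, sketched here as a variation on the example above rather than as part of it, is a '.gitignore' file in the working copy, which takes the same patterns and can itself be committed.

```shell
# Sketch: ignoring log files through a .gitignore file instead of .git/info/exclude.
set -e
work=$(mktemp -d)
cd "$work"
git init -q .

touch myapp.log
echo "*.log" > .gitignore

git status --short    # myapp.log is not listed; only .gitignore shows up as untracked
```

Because '.gitignore' is an ordinary file, adding and committing it shares the ignore rules with everyone who clones the repository.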

Remove files
'git rm' removes files from the working directory and from the index, so that the removal is recorded by the next commit. With the '--cached' option it removes files from the index only and leaves the copy in the working directory untouched:
$ touch delete.me
$ git add delete.me
$ git commit -m "adding delete.me"
$ git rm delete.me
$ echo "how old are you?" >> dialog.txt
$ git add dialog.txt
$ git rm --cached dialog.txt
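
The effect of the two forms can be checked in a throwaway repository. This sketch simplifies the session above by committing both files in one go, and it uses a placeholder identity.

```shell
# Sketch: 'git rm' versus 'git rm --cached' (simplified from the session above).
set -e
work=$(mktemp -d)
cd "$work"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

touch delete.me
echo "hello" > dialog.txt
git add delete.me dialog.txt
git commit -q -m "adding two files"

git rm -q delete.me             # removed from the index and from disk
git rm -q --cached dialog.txt   # removed from the index only

ls                   # dialog.txt is still on disk, delete.me is gone
git ls-files         # prints nothing: the index is now empty
git status --short   # both deletions are staged; dialog.txt is also listed as untracked
```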

That's it for the first part of my notes about Git. In the second part I will write about diff, clone, pull, adding remote repositories, merge, and conflict resolution.

2 comments:

Anatol Pomozov said...

Linux has been using BitKeeper not Mercurial (actually Mercurial was a fork of early Git AFAIK)

Jakub Narebski said...

Actually Git was started because Linus didn't find a good, fast OSS replacement for the proprietary DSCM BitKeeper (which was used to develop the Linux kernel). From the start it was developed using Git.

Mercurial project was created at almost the same time Git development has started; Mercurial is not a fork of Git, as they use different designs (content addressed object database in Git, per filename revlogs + manifest + commitlog in Mercurial) and even different languages (C + shell scripts in Git, Python and later (?) core in C for Mercurial).

See also
* "A short history of revision control" chapter in "Distributed revision control with Mercurial" (hgbook)
* GitHistory page on Git Wiki (currently formatting is somewhat broken due to MoinMoin upgrade)
* List of KernelTrap articles on GitLinks page on Git Wiki.