Version Control

						diff cpio diff3 ar tar patch jar rsync
git (2005)	hg (2005)	bzr (2005)	svn (2000)	bk (1999)	p4 (1995)	cvs (1990)	rcs (1982)	sccs (1972)
add	add	add	add	new	add	add
blame	annotate	annotate/ann	blame/ann		annotate	annotate
branch	branch(es)	branch	copy/cp		branch(es)	tag -b
checkout	update/up	checkout/co	update/up		sync	update	co -l	get -e
clone	clone	branch	checkout/co	clone	sync	checkout/co
commit	commit/ci	commit/ci	commit/ci	commit	submit	commit/ci	ci	delta
config
diff	diff	diff/di	diff/di	diffs	diff	diff	rcsdiff	sccsdiff
fetch	incoming/in	missing
grep	grep
help	help	help	help		help
init	init	init		setup		init	ci	admin
log	log	log	log	log	filelog	log	rlog
merge	merge	merge	merge		integrate
mv	rename/mv	mv	move/mv	mv	move
pull	pull	pull		pull
push	push	push		push
rebase
remote
reset	revert		revert		revert
revert	backout	revert
	resolve	resolve	resolve		resolve
rm	remove/rm	remove/rm	delete/rm	rm	delete	remove
show	cat	cat	cat		print	checkout -p	co	get -p
stash		shelve
status	status/st	status/st	status/st	status	changes	status
tag	tag(s)	tag(s)	copy/cp	tag	tag	tag

Distributed Version Control

git usage	git description	hg usage	hg description
add PATH … add -e FILE add -i FILE … add -u PATH …	Add file contents to the index. If PATH is a directory it is added with all its contents recursively. Error if no arguments are provided. Add a portion of a change to the index by editing the diff. Add file contents to the index interactively. Only add file contents to the index which are already tracked. Newly created files will never be added to the index when the -u flag is used.	add [PATH] …	Put files under version control. If no argument is provided all files in the working directory are put under version control; equivalent to git add . Under hg a file must be added only once, before it is first committed. Under git a file must be added each time it is modified. hg add is used to notify Mercurial that a file is being tracked by the version control system. It is not possible to add part of a file change. git add, by contrast, adds the changes to a file, including partial changes, to a staging area called the index to be flushed out with the next commit.
none	How to perform the equivalent of Mercurial addremove with Git: git add .; git ls-files --deleted | xargs git rm (git add -A does both in one step)	addremove [PATH] …	Add or remove files depending upon whether they are in the working directory; if no PATHs are provided, all new files are added and all missing files are removed.
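archive --format=tar TREEISH > NAME.tar	Create a tarball from TREEISH.	archive -t tar ../NAME.tar	Places the archived files in a top-level directory named after the archive; git does not add a top-level directory unless --prefix is used.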
bisect see manual	Find by binary search the change that introduced a bug.	bisect
blame PATH [COMMIT]	Show the revision number, author and timestamp of the last commit which modified each line in FILE. COMMIT can be used to specify an older version of FILE.	annotate -cudln [-r REV] [PATH]	Mercurial by default only indicates the local revision number. The flags -c, -u, -d, -l, and -n add changeset, user, date, line number, and local revision number.
branch [-r|-a] branch [--contains|--merged] COMMIT branch NAME [COMMIT] branch --track NAME [BRANCH] branch -m BRANCH NAME branch (-d|-D) BRANCH	List branches. If the -r option is provided remote tracking branches are listed. If the -a option is provided both local and remote tracking branches are listed. List branches that are descendants of COMMIT if the --contains option is used. List branches that are ancestors of COMMIT if --merged is used. Create a branch named NAME using COMMIT as a starting point. If COMMIT is not specified then HEAD is the starting point. Create a branch NAME which tracks BRANCH. Usually BRANCH is a remote tracking branch. This configures the repository so that when `git pull` is executed on NAME a merge equivalent to `git merge BRANCH` is automatically performed. If BRANCH is not specified the current branch is tracked. Rename the branch BRANCH to NAME. Delete branch BRANCH. Use -D to delete a branch with commits which have not been merged.	branches branch branch BRANCH	List branches. Show the current branch. A close git equivalent is git branch | grep '*'. Create a branch named BRANCH which will be created from the working directory with the next commit. Mercurial does not provide a mechanism for renaming or deleting branches. The recommended way to get rid of unwanted branches is to rename the repository and then clone it to the original name with hg clone -r REV.
bundle see manual	Move objects and refs by archive.
cat-file (commit|tree|blob) HASH cat-file -t HASH	Display content of repository object HASH. Get the type of repository object HASH. The type can be 'blob', 'tree', 'commit', or 'tag'.	none	Mercurial does not assign identifiers to files and directories, so no equivalent of git cat-file is possible. The following are equivalent, however: git cat-file commit HASH hg log -r REV
checkout [-f] BRANCH checkout TREEISH PATH … checkout -p PATH … checkout -b NAME [COMMIT]	Check out the branch named BRANCH. BRANCH becomes the current branch. Changes in the index are carried over but if there are changes to tracked files that are not in the index the checkout fails. If the -f option is specified changes in the index and to tracked files will be discarded. Copy the files or directories `PATH …` from TREEISH to the working directory. The current branch is not changed. Interactively select hunks of the files or directories `PATH …` to discard, restoring them from the index. The current branch is not changed. Create a branch named NAME using COMMIT as the starting point. If COMMIT is not specified the HEAD of the current branch is used. NAME becomes the current branch.	update [-c|-C] (BRANCH|-r REV) revert [-a] [-C] [-r REV] PATH … none branch BRANCH	Check out BRANCH or REV. If there are changes in the working directory they are applied to the new working directory; the -C option discards changes in the working directory and the -c option prevents an update when there are changes. Revert PATHs to how they are according to the parent of the working directory or REV if specified. If this makes the files different from how they are in the parent of the working directory then the file will have a modified status. Backup copies of the files will be saved with .orig suffixes unless the -C option is used. If no PATHs are provided and the -a option is used, the entire working tree will be reverted. Mercurial has no index and thus no equivalent to git checkout -p. Start a new BRANCH using the current working directory which will be created with the next commit.
cherry-pick COMMIT …	Apply the changes introduced by some COMMITs to current branch. Although it is possible to specify multiple commits, it is better to use git rebase --onto if the commits are a chain because rebasing provides mechanisms (continue, skip, abort) for dealing with conflicts.	export import
clean -n clean -f	Show what files would be removed if run with `-f` option. Remove untracked files from the working tree.	none
clone [-b BRANCH] URL [DIR] clone [-o NAME] URL [DIR] clone [-c SECTION.KEY=VAL] URL [DIR] clone (--bare|--mirror) URL [DIR]	Clone a repository. If BRANCH is provided, then it will be the current branch in the new repository. If DIR is provided it will be the name of the directory containing the repository. If NAME is provided it is used as the name of the origin instead of the default 'origin'. If any KEY=VAL pairs are provided they are written in the .git/config file of the new repository. If --bare is provided as an option a bare repository will be created. In a bare repository there is no working directory and the contents of the top directory are what would have been in the .git directory had the --bare flag not been used.	clone [-r REV|-b BRANCH] … URL [DIR] none none clone -U URL [DIR]	Clone the repository at URL. Only changesets in the history of REV or BRANCH are copied over to the new repository. If DIR is provided it will be the name of the directory containing the new repository. The name default which is assigned to URL can be changed by editing .hg/hgrc. Configuration settings are changed by editing ~/.hgrc. Clone the repository at URL. The clone will have no working directory files, only a .hg subdirectory.
commit [-m STR] commit -a [-m STR] commit --amend commit --amend --author=STR	Record changes to the repository. STR is the commit message. Commit all changes to tracked files. Replace the most recent commit with a new commit made from the index. Change author of most recent commit.	commit [-m STR] commit -A [-m STR] none none	With both `git` and `hg` the files to be committed can be specified on the command line. If no files are specified `hg commit` will commit all modified files that are currently tracked in the working directory. Newly created files that have not been added with `hg add` will not be committed. `git commit` without arguments, by contrast, will only commit the changes that have been staged with `git add`. `git commit -a` behaves like `hg commit`, however.
config -l [--global] config -e [--global] config --get [--global] SECTION.KEY config [--global] SECTION.KEY VAL config --unset [--global] SECTION.KEY config --remove-section SECTION	List configuration settings. Open configuration settings file in an editor. Lookup configuration setting KEY in section SECTION. Add configuration setting KEY in section SECTION with value VAL. Remove configuration setting KEY in section SECTION. Remove all keys in SECTION. Writes modify .git/config unless --global is specified, in which case ~/.gitconfig is edited. Reads look at both files unless --global is specified, in which case they only look at ~/.gitconfig.	none	Configuration settings are changed by editing ~/.hgrc
		copy
describe	Show the most recent tag that is reachable from a commit.
diff [PATH …] diff --cached [COMMIT] [PATH …] diff COMMIT1 COMMIT2 [PATH …] diff COMMIT [PATH …]	Produce a diff between the working directory and the index. If PATHs are provided only diffs for those files are produced. Produce a diff between the index and COMMIT. If COMMIT is not specified it defaults to HEAD. Produce a diff between COMMIT1 and COMMIT2. Produce a diff between the working directory and COMMIT.	diff	`hg diff` shows the difference between tracked files in the working directory and the last commit.
fetch [-f] REPO [[+]REF1:REF2] fetch [-f] fetch [-f] --all fetch [-f] --multiple REPO …	Fetch objects and refs from REPO. If REF1 and REF2 are not supplied, then all tracking branches are fetched if REPO is a remote; HEAD is fetched if REPO is a URL. FETCH_HEAD is set to point to the local copy of the remote HEAD. The branch to fetch can be specified with REF1 and the destination in the local repository with REF2. The -f option will force a fetch if the destination exists and the update isn't a fast forward. Fetch objects and refs from origin. Fetch objects and refs from all remotes. Fetch objects and refs from multiple REPOs.	none	Mercurial does not have remote tracking branches; hence no equivalent to git fetch.
gc see manual	Remove unnecessary files and optimize the local repository.
grep [-i] [-v] [-E|F|P] STR grep [-h|H] [-l|L] [-n] STR grep -e STR (--and|--or) -e STR grep STR (--cached|TREEISH)	Print lines matching a pattern.	grep
hash-object PATH hash-object -w PATH	Compute the object ID for a file. Add a blob to the object database.	none
		heads
help help CMD	List most common commands and shared options. Show help for git command CMD.	help help CMD
none		incoming	Shows the changesets that are available to be pulled.
init [DIR] init --bare [DIR]	Create an empty git repository or reinitialize an existing one. If DIR is not specified the current directory is used. Create a bare empty git repository or reinitialize an existing one. In a bare repository there is no working tree and the files normally in `.git` are in the top directory. If DIR is not specified the current directory is used.	init [DIR] none
		locate
log [-N] [PATH …] log [-N] --branches [PATH …]	Show commit log for current branch. If N is provided limit output to last N commits. If PATHs are provided, limit output to commits that affected one or more of them. Show commit log for all branches.	log [-l N] -b BRANCH [PATH …] log [-l N] [PATH …]	Show commit log for BRANCH. Use 'tip' for the current branch. If N is provided limit output to last N commits. If PATHs are provided limit output to commits that affected one or more of them. Show commit log for all branches.
ls-files [PATH] … ls-files --stage [PATH] … ls-files --deleted [PATH] …	List files under version control. These are the files which have had "git add" run on them and have not subsequently had "git rm" run on them. If PATH is not specified, all files are listed. Otherwise only files in PATH are listed. With the --stage option the command includes the mode bits, object ID, and stage number of the files. List files under version control which aren't in the working directory.	manifest [-r REV] none status -d
ls-tree TREEISH ls-tree -r[t] TREEISH	List the contents of a tree. List the contents of a tree and all its subtrees recursively. Use the -t option to include subtrees and their object IDs in the output.
merge COMMIT … merge --abort merge --squash	Merge one or more commits into the current branch. Restore the working directory to the state it had before a merge was attempted. This might not be possible if there were uncommitted changes in the working directory. Modify index and working directory with results of merge but don't commit.	merge [[-r] REV] update --clean
mv OLDPATH NEWPATH mv FILE … DIR	Move or rename a file, a directory, or a symlink. Move one or more files into a directory.	rename OLD NEW rename FILE … DIR
notes see manual	Add or inspect object notes.
none		outgoing	Show the changesets that have not been pushed. Synonym: out
		parents
pull [-f] REPO [[+]REF1:REF2] pull [-f]	Short for git fetch [-f] REPO [[+]REF1:REF2] git merge FETCH_HEAD Short for git fetch [-f] git merge FETCH_HEAD	pull -u	When there are local changes which are not reflected in the remote repository, then `hg pull -u` creates a new local branch which matches the remote repository and switches to it.
push [-f] push [-f] REPO [BRANCH] … push [-f] --all REPO push [-f] REPO [+]REF1:REF2 push --delete REPO BRANCH …	If the current branch is a tracking branch for a remote branch, then push to the repository for the remote branch. Otherwise the command does nothing. If the `-f` option is used conflicts will be overwritten in favor of the local repository. Push to REPO. If one or more BRANCHES are specified, all necessary objects are copied to the remote repository and the remote refs are updated. If no BRANCHES are specified, the branches that were set using 'remote set-branch' are used. Push all local branches to REPO. If any local branches do not have remote branches and remote tracking branches they are added. Push local branch REF1 to remote branch REF2 on REPO. If necessary a remote tracking branch is created. Delete the specified remote BRANCHES and their remote tracking branches.	push
rebase BRANCH rebase --onto BRANCH COMMIT1 COMMIT2 rebase --abort rebase --continue rebase --skip rebase -i COMMIT	Rebase the current branch onto BRANCH. All commits on the current branch going back to the latest common ancestor are applied to BRANCH; the head of BRANCH remains the same and the head of the current branch points to the new branch. Apply all commits after but not including COMMIT1 and up to and including COMMIT2 to BRANCH. If successful the repository will have a detached HEAD, meaning that HEAD points at a commit and not a named branch. Use git branch NAME to assign a branch name to HEAD and then git checkout NAME to switch to the new branch. Abort the results of a rebase that had conflicts. Continue with a rebase that had conflicts which have been resolved. Skip commit that caused conflicts and continue with rebase. Perform an interactive rebase on current branch using all commits after but not including COMMIT.	none
reflog see manual	Show the history of changes to refs and HEAD. This will contain branch commits as well as the creation and switching of branches.
remote remote add [-t BRANCH] … NAME URL remote add [-m BRANCH] NAME URL remote rm REMOTE remote rename REMOTE NAME remote show REMOTE remote set-head REMOTE (-a|-d) BRANCH remote set-url --add REMOTE URL remote set-url --delete REMOTE URL remote set-branches REMOTE [--add] BRANCH …	List the remotes. Add a remote NAME at url URL. The -t option can be used repeatedly to track specific branches. Otherwise all branches are tracked. Add a remote NAME at url URL. The -m option can be used to set the head. The head can also be set with the set-head subcommand. Remove REMOTE. Rename REMOTE to NAME. Get information about REMOTE. Set the head for the remote to BRANCH. Having a remote head permits the remote name to be used in places a branch name would normally be used. Add a URL to REMOTE. This can be used to push to multiple repositories simultaneously. Delete a URL from REMOTE. Set branches for REMOTE. If the --add option is used, the branches are added to the existing branches. Otherwise the new branches replace the existing branches. These are the branches that will get pushed or pulled when no branches are explicitly specified.	none	Names can be assigned to repository urls in the [paths] section of the .hg/hgrc file. When a repository is cloned the source url is given the name default.
reset [--mixed] [COMMIT] reset --hard [COMMIT] reset --soft COMMIT	Reset index to COMMIT and move branch head to COMMIT. The working directory is not changed. If COMMIT is not specified, then HEAD is used and the branch head is not moved. Reset index and working directory to COMMIT. If COMMIT is not specified, then HEAD is used and the branch head is not moved. Move the branch head to COMMIT. Neither the index nor the working directory are modified.	none revert	Mercurial does not have an equivalent to the Git index. `revert` modifies the working directory.
		resolve FILE … resolve -a resolve -l resolve -m FILE … resolve -u FILE …	List all unresolved
revert [-n] COMMIT … revert [-n] COMMIT1..COMMIT2	Create one or more commits which reverse the effects of the COMMITs. If the `-n` option is used the reversing changes are not committed but merely applied to the index and working directory. Create commits which reverse the effects of the commits after COMMIT1 up to and including COMMIT2.	backout -r REV
rev-list COMMIT rev-list COMMIT1 ^COMMIT2	Show commits which are ancestors of COMMIT in reverse chronological order. Show commits which are ancestors of COMMIT1 and not ancestors of COMMIT2 in reverse chronological order.
rm [-f] FILE … rm -r DIR … rm --cached FILE …	Remove files from the working tree and from the index. The `-f` option can be used to remove the files even if they have changes staged in the index. Remove directories from the working tree and from the index. Remove files from the index only.	remove
shortlog [COMMIT1..COMMIT2]	Summarize the commit history in a one-line-per-commit format. If a commit range is provided, it will include commits after COMMIT1 and up to and including COMMIT2.
show COMMIT:FILE	Show blob.	cat -r REV FILE
show-ref	List all references.	none
stash [save [STR]] stash show [STASH] stash pop [STASH] stash list stash drop [STASH] stash clear	Stash the changes in a dirty working dir. If STR is provided it is used as an identifier. Show specified or latest stash. Recover specified or latest stash. List stashes. Delete specified or latest stash. Delete all stashes.
status [PATH …]	Show paths in the working tree that differ from the index, paths in the index which differ from HEAD, and paths in the working directory which are not in the index or HEAD. Reports on all files unless PATHs are provided.	status
submodule see manual	Initialize, update or inspect submodules.	none
tag tag NAME [COMMIT] tag -d TAG	List tags. Create a tag. If COMMIT is not specified, HEAD is used. Delete a tag.	tags tag [-r REV] NAME tag --remove NAME
		tip
_______________________________________________	______________________________________________	______________________________________________	______________________________________________
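The COMMIT1 ^COMMIT2 form of rev-list, and the COMMIT1..COMMIT2 range notation used by shortlog and revert (git reads COMMIT1..COMMIT2 as COMMIT2 ^COMMIT1), are set differences over the commit graph. Here is a minimal Python sketch of that set arithmetic. The history is a hypothetical toy stored as a dict from commit name to parent names, and the sketch ignores the reverse chronological ordering that git rev-list applies to its output.

```python
# Hypothetical toy history: A <- B <- C, B <- D <- E, and F merges C and E.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["B"],
    "E": ["D"],
    "F": ["C", "E"],
}

def ancestors(parents, commit):
    """Return every commit reachable from `commit`, including itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def rev_range(parents, include, exclude):
    """Commits reachable from `include` but not from `exclude`.

    This is the set selected by `git rev-list INCLUDE ^EXCLUDE`,
    i.e. EXCLUDE..INCLUDE, without git's ordering.
    """
    return ancestors(parents, include) - ancestors(parents, exclude)

print(sorted(rev_range(parents, "F", "C")))  # prints ['D', 'E', 'F']
```

Note that C..F includes the merge commit F and the side-branch commits D and E, but not C itself: the left end of a range is always excluded.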

hg metasyntactic variables

BRANCH	the name of a branch.
CMD	the name of a version control command: the first argument of the base command.
DIR	a directory on the file system. In some cases it must exist; in others it will be created.
FILE	a regular file on the file system. In some cases it must exist; in others it will be created.
NAME	a name for an entity which will be created. Usually there are restrictions on the characters that can be used.
PATH	a path on the file system. In some cases it must exist; in others it will be created.
REV	the revision number for a changeset. It can be either the local revision number, which is a small decimal integer, or the changeset ID, a 40 hex digit string which Mercurial usually displays abbreviated to 12 hex digits.
SOURCE	A URL or a name for a URL in the [paths] section of the .hg/hgrc file
STR	a string. There are no restrictions on the characters that can be used, but if they include whitespace or characters special to the shell they must be escaped or quoted.
URL	a url for a repository.

git metasyntactic variables

BRANCH	the name of a branch.
CMD	the name of a version control command: the first argument of the base command.
COMMIT	the HASH for a commit. A commit can be referenced indirectly via a branch or tag name or via commit notation. The symbolic references HEAD or FETCH_HEAD can also be used to reference commits.
DIR	a directory on the file system. In some cases it must exist; in others it will be created.
FILE	a regular file on the file system. In some cases it must exist; in others it will be created.
HASH	a 40 digit hex string used as an identifier for something in the object database.
HEAD	the literal string HEAD.
NAME	a name for an entity which will be created. Usually there are restrictions on the characters that can be used.
PATH	a path on the file system. In some cases it must exist; in others it will be created.
REF	HEAD or refs/heads/BRANCH
REMOTE	the name of a remote.
REPO	A REMOTE or a URL.
STASH	stash identifier format: stash@{0}, stash@{1}, …
STR	a string. There are no restrictions on the characters that can be used, but if they include whitespace or characters special to the shell they must be escaped or quoted.
TREEISH	the HASH for a tree, a commit, or a tag. If the HASH is for a commit or a tag the tree in the commit is used.
URL	a url for a repository.

sccs (1972)

CSSC Documentation CSSC is the GNU implementation of SCCS
The Source Code Control System Rochkind 1975

In his 1975 paper Rochkind describes SCCS as a "radical departure from conventional methods for controlling source code". SCCS was initially implemented in 1972 on the IBM 370. The implementation language was SNOBOL. Rochkind was an employee of Bell Laboratories and SCCS was soon ported to Unix where it became a cornerstone of the "Programmer's Workbench", a suite of software distributed with early Unix.

The radical departure of SCCS appears to be the decision to store every version of each file under source control. This is done in a space efficient manner by means of deltas: the original file is stored with a delta for each change. To get the most recent version of the file all of the deltas must be applied to the original file. Also stored with each delta is the name of the user who made the change, the date and time of the change, and a user supplied comment explaining the change.

SCCS introduces a file format so that the original file, the deltas, and the meta-information can all be stored in a single history file. If the original file was foo.c, a common early convention was for the history file to be named s.foo.c. In the original Unix implementation the SCCS commands were standalone Unix commands. Starting with the version of SCCS which Allman wrote for BSD Unix in 1980 the SCCS commands became subcommands of an sccs executable.

Here is a sample SCCS session. The file foo.txt is put under source control. It is then checked out, edited, and the change committed. Finally a non-editable copy of the most recent version is checked out.

$ echo "foo" > foo.txt
$ sccs admin -ifoo.txt s.foo.txt
$ rm foo.txt
$ sccs get -e s.foo.txt
$ vi foo.txt
$ sccs delta s.foo.txt
$ sccs get -p s.foo.txt > foo.txt

An SCCS history file is a text file in which control lines begin with the Ctrl-A (ASCII 1) character. The file is divided into a header, which contains the meta-information, and a body, which contains the original file and the deltas. The original file is given revision number 1, and the number is incremented with each change.

The body consists of the original file interspersed with nested insert blocks and delete blocks. The format for an insert block is

^AI REV
added line one
added line two
...
^AE REV

where REV is the revision number for which the lines were added. Similarly the format for a delete block is

^AD REV
deleted line one
deleted line two
...
^AE REV

When extracting a version of the file, the desired version is compared with the revision number of each block. The lines of an insert block are omitted if its revision number is higher than the desired version, and the lines of a delete block are omitted if its revision number is lower than or equal to the desired version.
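The extraction rule fits in a short function. This is a minimal Python sketch that assumes the simplified body format described above: control lines start with Ctrl-A followed by I, D, or E and a revision number, revisions form a single linear sequence, and each ^AE closes the open block with the same revision. Real SCCS files also carry a header and use internal serial numbers, which this sketch ignores. The function name and the sample body are hypothetical: revision 2 replaces foo with bar, and revision 3 appends baz.

```python
CTRL = "\x01"  # control lines start with Ctrl-A (ASCII 1)

def sccs_extract(body, version):
    """Return the lines of `version` from a simplified SCCS body."""
    open_blocks = []  # (kind, rev) for every block enclosing the current line
    out = []
    for line in body:
        if line.startswith(CTRL):
            kind, rev = line[1], int(line[2:])
            if kind in ("I", "D"):
                open_blocks.append((kind, rev))
            elif kind == "E":
                # Close the most recently opened block with this revision.
                for i in range(len(open_blocks) - 1, -1, -1):
                    if open_blocks[i][1] == rev:
                        del open_blocks[i]
                        break
            continue
        # Keep a text line only if no enclosing block hides it:
        # insert blocks hide lines newer than `version`,
        # delete blocks hide lines deleted at or before `version`.
        visible = all(
            (r <= version) if k == "I" else (r > version)
            for k, r in open_blocks
        )
        if visible:
            out.append(line)
    return out

body = [
    CTRL + "D 2", "foo", CTRL + "E 2",
    CTRL + "I 2", "bar", CTRL + "E 2",
    CTRL + "I 3", "baz", CTRL + "E 3",
]
print(sccs_extract(body, 3))  # prints ['bar', 'baz']
```

One pass over the body serves every version, which is what lets SCCS keep all revisions in a single file.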

diff (1974)

An Algorithm for Differential File Comparison Hunt & McIlroy 1976
man diff

To implement an efficient version control system it is desirable to find a minimal delta or difference between two similar text files. The problem led to the development of the Unix diff utility. Regarding a file as a sequence of lines, the problem can be treated as an example of the longest common subsequence problem. The standard solution to this problem has O(nm) performance in both time and space, where n and m are the lengths of the two files in lines. To facilitate quick comparison of lines, each line is replaced with a hash code. When implementing diff, Hunt and McIlroy developed an algorithm that was more efficient than the standard solution in most cases.
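For illustration, here is the standard O(nm) dynamic-programming solution in Python, not the Hunt and McIlroy algorithm itself. It hashes each line before comparing, as described above, and returns a line-by-line edit script using diff's < and > prefixes, with two spaces marking common lines. The function name lcs_diff is hypothetical.

```python
def lcs_diff(a, b):
    """Return an edit script turning line list `a` into line list `b`."""
    n, m = len(a), len(b)
    # Replace each line by a hash code so most comparisons are integer compares;
    # equal hashes are confirmed by comparing the strings themselves.
    ha, hb = [hash(x) for x in a], [hash(y) for y in b]

    def same(i, j):
        return ha[i] == hb[j] and a[i] == b[j]

    # L[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if same(i, j):
                L[i][j] = L[i + 1][j + 1] + 1
            else:
                L[i][j] = max(L[i + 1][j], L[i][j + 1])

    # Walk the table from the start, emitting common, deleted, and added lines.
    i = j = 0
    script = []
    while i < n and j < m:
        if same(i, j):
            script.append("  " + a[i])
            i += 1
            j += 1
        elif L[i + 1][j] >= L[i][j + 1]:
            script.append("< " + a[i])
            i += 1
        else:
            script.append("> " + b[j])
            j += 1
    script += ["< " + x for x in a[i:]]
    script += ["> " + y for y in b[j:]]
    return script

print(lcs_diff(["a", "b", "c"], ["a", "c", "d"]))  # prints ['  a', '< b', '  c', '> d']
```

The n by m table is what makes this approach quadratic in both time and space.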

The standard diff notation prefixes lines with < and > to indicate whether the line originated in the first or second file. It also uses the letters a, c, and d to indicate lines being added, changed, or deleted:

$ echo "foo" > foo.txt

$ echo "bar" > bar.txt

$ diff foo.txt bar.txt 
1c1
< foo
---
> bar

$ diff foo.txt /dev/null
1d0
< foo

$ diff /dev/null foo.txt 
0a1
> foo

The letters used in diff notation are also ed commands. In fact, diff -e will output an ed script which can be used to convert the first file into the second:

$ diff -e foo.txt bar.txt > diff.ed

$ ( cat diff.ed ; echo "w" ) | ed foo.txt
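Because diff -e output is just a sequence of ed commands, applying it needs only a tiny interpreter. This Python sketch is a hypothetical helper, not part of any tool: it handles only the a, c, and d commands with numeric line addresses (plus the lone . that ends inserted text) and does not implement w or any other ed command. diff -e emits its commands starting from the end of the file, so applying them in order keeps each remaining command's line numbers valid.

```python
import re

def apply_ed_script(lines, script):
    """Apply the a/c/d commands of a `diff -e` style ed script to `lines`."""
    lines = list(lines)
    cmds = script.splitlines()
    k = 0
    while k < len(cmds):
        m = re.fullmatch(r"(\d+)(?:,(\d+))?([acd])", cmds[k])
        if m is None:
            raise ValueError("unsupported ed command: %r" % cmds[k])
        start = int(m.group(1))
        end = int(m.group(2) or start)
        op = m.group(3)
        k += 1
        text = []
        if op in ("a", "c"):
            # Collect the text lines up to the terminating "."
            while cmds[k] != ".":
                text.append(cmds[k])
                k += 1
            k += 1
        if op == "a":
            lines[start:start] = text    # insert after line `start`
        elif op == "c":
            lines[start - 1:end] = text  # replace lines start..end
        else:
            del lines[start - 1:end]     # delete lines start..end
    return lines

print(apply_ed_script(["foo"], "1c\nbar\n.\n"))  # prints ['bar']
```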

The version of diff released with BSD 2.8 in 1981 added the -c option to show the context of lines and an -r option to perform a recursive diff on directories.

cpio (1977)

man cpio

diff3 (1979)

man diff3

diff3 displays the differences between three versions of the same file.

The three way diff is the foundation of branch merging. A two way diff is insufficient for merging because deleting a line in one branch looks like adding a line in the other branch. Only by comparing both branches with the original can these two cases be distinguished.
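The role of the common ancestor can be captured in a few lines. This is a deliberately simplified Python sketch, not diff3's algorithm: it assumes the three versions are already aligned line for line (same number of lines, changes only, no insertions or deletions). A line changed on only one side takes that side's version, and a line changed differently on both sides is reported as a conflict. The function name is hypothetical.

```python
def merge3_aligned(mine, base, yours):
    """Three-way merge of line lists assumed to be aligned line for line.

    Returns (merged_lines, conflicting_line_numbers).
    """
    if not (len(mine) == len(base) == len(yours)):
        raise ValueError("this sketch only handles aligned files")
    merged, conflicts = [], []
    for lineno, (m, b, y) in enumerate(zip(mine, base, yours), start=1):
        if m == y:      # both sides agree, whether or not they changed it
            merged.append(m)
        elif m == b:    # only YOURFILE changed this line
            merged.append(y)
        elif y == b:    # only MYFILE changed this line
            merged.append(m)
        else:           # both changed it differently: keep mine, flag it
            merged.append(m)
            conflicts.append(lineno)
    return merged, conflicts

print(merge3_aligned(["a", "b1", "c"], ["a", "b", "c"], ["a", "b", "c1"]))
```

Without the base column the second and third cases could not be told apart, which is the point made above.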

diff3 has three basic invocations:

diff3 MYFILE OLDFILE YOURFILE
diff3 -e MYFILE OLDFILE YOURFILE
diff3 -m MYFILE OLDFILE YOURFILE

The first invocation writes a description of the three-way diff to standard output. The second invocation writes to standard output an ed script which will incorporate into MYFILE the changes between OLDFILE and YOURFILE. The third invocation writes to standard output a version of MYFILE with the changes of YOURFILE merged in.

Here is an example of the output format used by the first invocation:

$ cat /tmp/orig.txt 
a
b
c
d
e

$ cat /tmp/edit1.txt 
a
b1
c
d
e
f

$ cat /tmp/edit2.txt 
a
b
c
d1
e

$ diff3 /tmp/edit1.txt /tmp/orig.txt /tmp/edit2.txt
====1
1:2c
  b1
2:2c
3:2c
  b
====3
1:4c
2:4c
  d
3:4c
  d1
====1
1:6c
  f
2:5a
3:5a

Each hunk of the diff3 output starts with four equals signs (====). All of the hunks in the example above are two-way hunks, meaning that two of the three files are the same. In this case the number of the differing file, according to its position in the diff3 arguments, is placed after the equals signs.

Here is an example of a three-way hunk, where all three files differ and no number is placed after the equals signs:

$ cat /tmp/orig.txt 
a

$ cat /tmp/edit1.txt                               
a1

$ cat /tmp/edit2.txt 
a2

$ diff3 /tmp/edit1.txt /tmp/orig.txt /tmp/edit2.txt
====
1:1c
  a1
2:1c
  a
3:1c
  a2

ar (1979)

man ar

tar (1979)

man tar

How to create a tar file; list the contents of a tar file; compare a tar file with the file system; and extract the contents of a tar file:

tar [-]cf NAME.tar DIR
tar [-]tf TARFILE
tar [-]df TARFILE [DIR]
tar [-]xf TARFILE

The -v option can be used with -c or -x to list the files being added or extracted.

Tar files store the files in sequential order. Each file is preceded by a 512-byte header. The file itself is padded with null bytes to a multiple of 512 bytes.

Tar can write an archive to standard output and read one from standard input; the archive name - stands for either. The following two invocations behave identically:

tar cf - . | (cd DIR ; tar xf -)
tar cf - . | tar xf - -C DIR

Tar can append data to an existing tar file. These commands append the contents of a directory to a tar file; append the contents of the directory which are newer than what is already on a tarfile; append subsequent tar files to the first tar file:

tar [-]rf TARFILE DIR
tar [-]uf TARFILE DIR
tar [-]Af TARFILE1 TARFILE2 ...

How to create a tar file compressed with gzip, bzip2, or xz respectively:

tar [-]czf NAME.tar.gz DIR
tar [-]cjf NAME.tar.bz2 DIR
tar [-]cJf NAME.tar.xz DIR

In 1988 POSIX extended the format of the header block in a backward-compatible way. Additional header type flags were added in 2001.

header format
offset	length	original format	ustar
0	100	file name
100	8	file mode
108	8	owner user id
116	8	group id
124	12	file size in bytes
136	12	last modification time
148	8	header checksum
156	1	type flag
157	100	name of linked file
257	6		"ustar"
263	2		"00"
265	32		owner user name
297	32		group name
329	8		device major number
337	8		device minor number
345	155		filename prefix
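The offsets in the table can be checked against a real archive. This Python sketch builds a one-file ustar archive in memory with the standard tarfile module and then decodes the first 512-byte header by hand. Numeric fields are NUL-terminated octal ASCII strings, and the checksum is the sum of all header bytes with the checksum field itself counted as eight spaces. The helper names are hypothetical, and only a subset of the fields is decoded.

```python
import io
import tarfile

def text_field(block, offset, length):
    """Decode a NUL-terminated ASCII field."""
    return block[offset:offset + length].split(b"\0", 1)[0].decode("ascii")

def octal_field(block, offset, length):
    """Decode a NUL/space-terminated octal number field."""
    raw = block[offset:offset + length].split(b"\0", 1)[0].strip()
    return int(raw, 8) if raw else 0

def parse_ustar_header(block):
    if len(block) != 512:
        raise ValueError("a tar header is exactly 512 bytes")
    return {
        "name": text_field(block, 0, 100),
        "mode": octal_field(block, 100, 8),
        "size": octal_field(block, 124, 12),
        "mtime": octal_field(block, 136, 12),
        "checksum": octal_field(block, 148, 8),
        "typeflag": block[156:157],
        "magic": block[257:263],
        "version": block[263:265],
        "prefix": text_field(block, 345, 155),
    }

def header_checksum(block):
    # Sum every header byte, treating the 8-byte checksum field as spaces.
    return sum(block[:148]) + 8 * ord(" ") + sum(block[156:512])

# Build a one-file archive in memory.
data = b"foo\n"
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
    info = tarfile.TarInfo("foo.txt")
    info.size = len(data)
    info.mode = 0o644
    tf.addfile(info, io.BytesIO(data))
archive = buf.getvalue()

header = parse_ustar_header(archive[:512])
print(header["name"], header["size"], header["typeflag"], header["magic"])
```

The file contents start at offset 512, immediately after the header, and the archive as a whole is a multiple of 512 bytes long.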

header type flags
flag	original meaning	ustar	2001
'\0'	normal file
'0'	normal file
'1'	hard link
'2'	symlink
'3'		character device
'4'		block device
'5'		directory
'6'		FIFO
'7'		contiguous file
'g'			global extended header
'x'			extended header for the next file

rcs (1982)

RCS--A System for Version Control Tichy 1985
man rcsfile The RCS history file format

RCS works in a similar manner to SCCS. There is a history file which is indicated with a ,v suffix. Thus, the history file for foo.txt would be foo.txt,v. The RCS commands take the original file as an argument instead of the history file as in SCCS. RCS supports multiline commit messages and it adds the rlog command for getting all the commit messages for a file. RCS has always been freely available software, a factor which has promoted its use over SCCS.

Here is a sample work session using RCS. It is equivalent to the SCCS work session in the previous section.

$ echo "foo" > foo.txt
$ ci foo.txt
$ co -l foo.txt
$ vi foo.txt
$ ci foo.txt
$ co foo.txt

Examining an RCS history file reveals some improvements in the implementation over SCCS. First of all, at signs (@) are used instead of Ctrl-A to delimit sections of the file. At signs in the data are escaped by doubling them. This makes the history files more pleasant to inspect at the command line.

Another change is that the current version of the file is stored in its entirety. Older revisions are obtained by applying a chain of reverse diffs. The advantage of this design is that it is optimized for the common case of fetching the current version.

Here is an example of adding two lines after line 6:

@a6 2
added line one
added line two
@

Here is an example of deleting two lines starting at line 6:

@d6 2
@
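Since RCS stores the newest revision whole, recovering an older one means applying an edit script to the newer text. Here is a minimal Python sketch of applying one such script. It assumes, as in diff -n output, that line numbers refer to the text as it was before any command was applied and that commands appear in ascending order; it also assumes the @ delimiters have already been stripped. The function name is hypothetical.

```python
def apply_rcs_script(lines, script):
    """Apply an RCS-style edit script ("aN COUNT" / "dN COUNT") to `lines`."""
    out = []
    consumed = 0  # how many input lines have been copied or deleted so far
    cmds = script.splitlines()
    k = 0
    while k < len(cmds):
        op = cmds[k][0]
        start, count = (int(x) for x in cmds[k][1:].split())
        k += 1
        if op == "d":
            # Copy the lines before line `start`, then skip `count` lines.
            out.extend(lines[consumed:start - 1])
            consumed = start - 1 + count
        elif op == "a":
            # Copy up to and including line `start`, then add `count` new lines.
            out.extend(lines[consumed:start])
            consumed = max(consumed, start)
            out.extend(cmds[k:k + count])
            k += count
        else:
            raise ValueError("unknown RCS edit command: %r" % cmds[k - 1])
    out.extend(lines[consumed:])
    return out

print(apply_rcs_script(["foo"], "d1 1\na1 1\nbar"))  # prints ['bar']
```

A changed line appears as a delete followed by an add at the same place, which is why the add branch must not re-copy lines that a preceding delete skipped.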

patch (1985)

man patch

The patch command can apply the output of diff to a file. The diff output is read from standard input:

$ echo "foo" > foo.txt
$ echo "bar" > bar.txt 
$ diff foo.txt bar.txt > foo.patch
$ patch foo.txt < foo.patch 
patching file foo.txt
$ cat foo.txt 
bar

The above is only a slight improvement over what could have been achieved with diff -e and ed. The novelty of patch is its ability to apply a patch file to an entire directory:

$ mkdir foo
$ echo "bar" > foo/bar.txt
$ echo "baz" > foo/baz.txt
$ cp -R foo foo2
$ echo "qux" > foo2/bar.txt
$ diff -cr foo foo2 > foo.patch
$ patch -p0 < foo.patch
patching file foo/bar.txt
$ cat foo/bar.txt 
qux

When creating the patch file, diff must be given the -c flag so that filenames are included in its output; the -r flag makes diff descend into subdirectories. Around 1990 a -u option was added to diff. It is preferred over -c because its unified format is more compact.

The -p flag is needed if no file or directory is provided to patch as an argument, because it controls how the pathnames in the patch file are interpreted. -p0 uses them unmodified. patch -p1 strips off the top level of each pathname (to be precise, everything up to and including the first slash). This is useful if the diff was created outside the directory being modified but patch is being run inside it.
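The -p rule fits in a few lines of Python. This sketch follows the GNU patch documentation, where -pN removes the smallest prefix containing N slashes and a run of adjacent slashes counts as one. The function name is hypothetical, and this is not patch's own code.

```python
def strip_components(path: str, n: int) -> str:
    """Return path with the smallest prefix containing n slashes removed (like patch -pN)."""
    rest = path
    for _ in range(n):
        idx = rest.find("/")
        if idx == -1:
            raise ValueError(f"{path!r} has fewer than {n} slashes")
        rest = rest[idx + 1:].lstrip("/")  # a run of slashes counts as one
    return rest


print(strip_components("foo/bar.txt", 0))  # foo/bar.txt
print(strip_components("foo/bar.txt", 1))  # bar.txt
```

Take a patch whose headers name foo/bar.txt, as produced by diff -cr foo foo2 in the session above. It would be applied with patch -p1 from inside the foo directory.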

cvs (1990)

CVS was the first popular revision control system with a client-server architecture. The client would have a local copy of a recent version of the source code, and only the server would have the complete version history. This made CVS somewhat cleaner to work with than RCS or SCCS, which keep their history files on the filesystem where the client can see them. It also permitted developers to collaborate without logging in to the same machine. The CVS client-server protocol communicated over rsh and later over ssh. The well known port number for a CVS server is 2401.

CVS also enabled a user to commit several files together. Multiple file commits are sometimes necessary to keep the source code "consistent" after each commit. The definition of consistency varies from project to project, but C developers want the source code to compile without errors, for example. Although CVS permits a user to submit changes to several files with a single command, the file system operation performed by the server is not actually atomic.

Setting up a CVS server is a bit of a bother and I'm not aware of any free CVS hosting services. As a result, it is difficult these days to experiment with CVS even though the client is still installed by default on Mac OS X. There are GNU projects which still use CVS. One can register at savannah.gnu.org and upload a public SSH key to participate in a project. One can perform an anonymous checkout of source like this:

cvs -z3 -d:pserver:anonymous@cvs.savannah.gnu.org:/sources/emacs co emacs

TeamWare (1994)

A tool developed at Sun for version control. It was built on top of SCCS. It supported merging and atomic commits of multiple files. It provided distributed version control with the help of NFS.

p4 (1995)

Perforce Command Reference

Perforce has a client server model. It supports atomic commits. It provides the ability to create and, unlike CVS, merge branches.

Perforce licenses are several hundred dollars per user.

jar (1995)

man jar

jar supports some of the tar commands:

jar cf NAME.jar DIR
jar tf JARFILE
jar xf JARFILE
jar uf JARFILE DIR

jar can write to stdout and read from stdin when the f option is omitted; the syntax is different from tar:

jar c . | (cd DIR ; jar x)
jar c . | jar x -C DIR

Use the e option (as in jar cef below) to make a jar file runnable by java. The argument to e is the fully qualified name of a class with a main method, which will be used as the entry point.

$ mkdir foo

$ cat > foo/A.java
package foo;

public class A {
    public static void main(String[] args) {
        System.out.println("A");
    }
}

$ sed s/A/B/ foo/A.java > foo/B.java

$ javac foo/*.java

$ jar cef foo.A foo.jar foo

$ java -jar foo.jar        
A

A jar file is a zip file; unzip can also be used to extract the contents. jar stores extra information about the jar file in META-INF/MANIFEST.MF:

$ unzip foo.jar

$ cat META-INF/MANIFEST.MF 
Manifest-Version: 1.0
Created-By: 1.6.0_26 (Sun Microsystems Inc.)
Main-Class: foo.A
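Because a jar file is an ordinary zip archive, the manifest can also be read programmatically. Below is a minimal Python sketch using the standard zipfile module. It builds a small stand-in archive in memory rather than reading the foo.jar above, and the helper name read_manifest is hypothetical.

The sketch handles the manifest's continuation lines (a line starting with a single space continues the previous value). It parses only the main section, not the per-entry sections.

```python
import io
import zipfile


def read_manifest(jar: zipfile.ZipFile) -> dict[str, str]:
    """Return the main-section attributes of META-INF/MANIFEST.MF."""
    text = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
    attrs: dict[str, str] = {}
    last_key = None
    for line in text.splitlines():
        if not line:
            break  # a blank line ends the main section
        if line.startswith(" ") and last_key is not None:
            attrs[last_key] += line[1:]  # continuation of the previous value
            continue
        key, _, value = line.partition(": ")
        attrs[key] = value
        last_key = key
    return attrs


# Build a stand-in jar in memory; the manifest mirrors the one shown above.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("META-INF/MANIFEST.MF",
                "Manifest-Version: 1.0\r\n"
                "Created-By: 1.6.0_26 (Sun Microsystems Inc.)\r\n"
                "Main-Class: foo.A\r\n\r\n")
    zf.writestr("foo/A.class", b"")  # placeholder entry, not real bytecode

with zipfile.ZipFile(buf) as zf:
    print(read_manifest(zf)["Main-Class"])  # foo.A
```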

rsync (1996)

man rsync

bk (1999)

Larry McVoy worked on smoosh and TeamWare at Sun. He founded BitMover to develop and sell BitKeeper.

Linux kernel development moved onto BitKeeper in 2002, thanks to a free license granted to the kernel developers. The license was revoked in 2005. This was the crisis which motivated the development of Git.

svn (2000)

Version Control with Subversion (multiple HTML pages)
Version Control with Subversion (single HTML page)
Free SVN Hosting Services
Setting Up an Ubuntu Subversion Server

bzr (2005)

To get a list of common commands; to get help on a specific command:

bzr help
bzr help commit

To make a commit it is necessary to register a name and an email address:

bzr whoami "Joe Foo <joe@foo.com>"

hg (2005)

How to get Mercurial documentation from the command line:

$ man hg
$ hg help
$ hg help clone

identifiers and notation

In Mercurial, every commit is assigned two identifiers: a local revision number and a universal changeset identifier. The local revision number is a small integer that is unique only to the local repository. The first local revision number issued is zero, and it increments by one with each commit added to the local repository. The changeset identifier is a 40 digit hex string (a SHA-1 hash), usually displayed abbreviated to its first twelve digits. It is unique across all repositories.

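The -r option is used to pass a Mercurial commit identifier to a command. The argument can be a local revision number or a changeset identifier (a unique prefix of one is enough). hg log displays the two separated by a colon, but on the command line a colon denotes a range of revisions, as in -r 2:5.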

Here are URL formats:

local/filesystem/path[#revision]
file://local/filesystem/path[#revision]
http://[user[:pass]@]host[:port]/[path][#revision]
https://[user[:pass]@]host[:port]/[path][#revision]
ssh://[user@]host[:port]/[path][#revision]

.hg

name	file type	description
00changelog.i
dirstate
last-message.txt
requires
store	directory
undo.branch
undo.desc
undo.dirstate

~/.hgrc

This file has an INI format:

[ui]
username = Your Name Comes Here <you@yourdomain.example.com>

.hgignore

Unlike .gitignore, an .hgignore file must be in the root of the working directory.

The format is one Perl regular expression per line. All files which match the regular expression will be ignored.

Comments start with the pound sign: #

It is also possible to use glob syntax:

# regexp to ignore twiddle files:
~$

# glob to ignore compiled python files:
syntax: glob
*.pyc

# additional patterns will use regexp format:
syntax: regexp

hg-git

hg-git

hg-git is a Mercurial plug-in. With it, Mercurial can push and pull from a Git repository. Once the plug-in is installed, the following must appear in ~/.hgrc:

[extensions]
hgext.bookmarks =
hgext.git =

One can then clone a Git repo with

hg clone REPO.git

git (2005)

man git contains a list of git commands
gittutorial
gittutorial-2
gitcore-tutorial
gitglossary
gitrevisions
gitrepository-layout
Git User's Manual
github free git hosting

documentation and configuration

Configuration File (from man git-config)

Git installs a man page for each top level command:

$ man git
$ man git-clone

If the man pages weren't installed use the top level help command to get documentation:

$ git help
$ git help clone

To configure Git run these commands:

git config --global user.name "Joe Foo"
git config --global user.email joe@foo.com

plumbing

cat-file | hash-object | ls-tree | rev-list | show-ref

The Git object database is kept in .git/objects. Git has four types of objects: blobs, trees, commits, and annotated tags. Each version of a file which is under version control becomes a blob, and each version of a directory becomes a tree. Each object is uniquely identified by a 40 digit hex string called an object ID (also called an object name).

The porcelain commands only expose the object IDs of commits. The object ID of each commit is shown by git log for example. One can use git cat-file -t to find out what type of object an object ID refers to:

git cat-file -t OBJECTID

If the object is a commit, one can get information about the commit with:

git cat-file commit OBJECTID

Included in the information are the parent commits, if any. If the commit was an initial commit it will have no parents. If it was the product of a merge it will have two or more parents. If it was the product of an octopus merge it will have three or more parents.
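The body printed by git cat-file commit is plain text. It consists of header lines (tree, zero or more parent lines, author, committer), a blank line, and then the message. Below is a minimal Python sketch that parses this text and classifies the commit by its parent count.

The function names and the sample object IDs are made up for illustration. Multi-line headers such as gpgsig continue on lines that begin with a space, and the sketch folds those into the preceding value.

```python
def parse_commit(text: str) -> tuple[dict[str, list[str]], str]:
    """Split `git cat-file commit` output into headers and message."""
    header_part, _, message = text.partition("\n\n")
    headers: dict[str, list[str]] = {}
    last = None
    for line in header_part.split("\n"):
        if line.startswith(" ") and last is not None:
            headers[last][-1] += "\n" + line[1:]  # continuation of a multi-line header
            continue
        key, _, value = line.partition(" ")
        headers.setdefault(key, []).append(value)  # parent may appear many times
        last = key
    return headers, message


def kind(headers: dict[str, list[str]]) -> str:
    n = len(headers.get("parent", []))
    if n == 0:
        return "initial commit"
    if n == 1:
        return "ordinary commit"
    return "merge" if n == 2 else "octopus merge"


sample = "\n".join([
    f"tree {'a' * 40}",
    f"parent {'b' * 40}",
    f"parent {'c' * 40}",
    "author Joe Foo <joe@foo.com> 1300000000 +0000",
    "committer Joe Foo <joe@foo.com> 1300000000 +0000",
    "",
    "Merge branch 'bar'",
    "",
])
headers, message = parse_commit(sample)
print(kind(headers))  # merge
```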

Also included in the commit information is the tree representing the files in the top directory of the repository when the commit was made. To see the contents of the tree use

git ls-tree OBJECTID

git ls-tree can be used on any of the subtrees as well. To see the contents of a blob in a tree, use

git cat-file blob OBJECTID

An easy way to get the object IDs of all the blobs currently recorded in the index (that is, the files under version control) is to run the following:

git ls-files --stage

If a file has not been added, it is possible to get the object ID that would be assigned to it if it were added with

git hash-object PATH

Furthermore, git hash-object -w can be used to put the object in the object database.
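git hash-object does not need the object database to compute an ID. A blob's object ID is the SHA-1 of a short header followed by the file's bytes. The header is the word blob, a space, the content length in bytes, and a NUL byte.

Below is a minimal Python sketch for SHA-1 repositories (the default), with a hypothetical function name. It does not apply the input filters, such as end-of-line conversion, that hash-object may apply to a file in a working tree.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Object ID git would assign to a blob with these bytes (SHA-1 object format)."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


print(blob_id(b""))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(blob_id(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

For a file whose contents need no filtering, the result should match what git hash-object PATH prints.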

plumbing commands for making a tree

plumbing commands for making a commit

how to display and manipulate refs

plumbing for the index

identifiers and notation

hashes

Git has four types of objects: commits, trees, blobs, and annotated tags. Every Git object is assigned a unique hash ID, which is a 40 digit hex string. It is called the hash, SHA1, object name, or object identifier, with no difference in meaning. An object that is a tree, or that can be dereferenced to one (such as a commit), is also called a tree-ish.

Commit hashes are the hashes the user most commonly sees and needs to reference. A git command only needs as many leading digits as are necessary to uniquely identify an object in the object database; usually the first 6 or 7 are sufficient.
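Resolving an abbreviation amounts to a prefix search over the object IDs in the database. The sketch below is a minimal version of that search. The function name and the third sample ID are hypothetical, and the four-digit minimum mirrors Git's own lower limit on abbreviations.

```python
def resolve_prefix(prefix: str, object_ids: list[str]) -> str:
    """Expand an abbreviated object ID, failing if it is too short or ambiguous."""
    prefix = prefix.lower()
    if len(prefix) < 4:
        raise ValueError("abbreviation too short")
    matches = [oid for oid in object_ids if oid.startswith(prefix)]
    if not matches:
        raise KeyError(f"unknown revision {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"ambiguous abbreviation {prefix!r}")
    return matches[0]


ids = [
    "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391",
    "d670460b4b4aece5915caf5c68d12f560a9fe3e4",
    "d6704aa" + "0" * 33,  # made-up ID sharing the prefix d6704
]
print(resolve_prefix("d67046", ids))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

Here d6704 would be rejected as ambiguous, while one more digit is enough to pick out either ID.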

refs

A ref is a user provided name which has been assigned to a commit. Branches and tags are both refs. The difference is that a tag always refers to the same commit, whereas a branch is updated to the most recent commit each time a commit is made to the branch.

HEAD is a file in .git containing the ref of the current branch. When a repository is created it starts out with a branch called 'master'.

commit notation

Git uses 40 character hash IDs to refer to commits. HEAD is a special name which refers to the most recent commit of the current branch. The previous commit is HEAD^, and the commit before that is HEAD^^. There is also a numerical notation: HEAD~4 is the commit 4 generations before HEAD, following first parents. If HEAD is the result of a merge, then its parents can be referenced with HEAD^1 and HEAD^2.
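The ^ and ~ suffixes can be modeled over a toy commit graph. This Python sketch assumes a dictionary mapping each commit name to an ordered list of its parents, first parent first; the graph and the function name are hypothetical. It follows the rules in gitrevisions:

- ^N picks the Nth parent;
- ^ alone means ^1, and ^0 is the commit itself;
- ~N follows first parents N times.

```python
import re

# Hypothetical history: M merges F into C; A is the root commit.
#   A -- B -- C -- M
#         \       /
#          E --- F
PARENTS: dict[str, list[str]] = {
    "A": [], "B": ["A"], "C": ["B"], "E": ["B"], "F": ["E"], "M": ["C", "F"],
}


def resolve(rev: str, parents: dict[str, list[str]]) -> str:
    """Resolve NAME followed by any sequence of ^, ^N, ~, ~N suffixes."""
    m = re.match(r"[^~^]+", rev)
    rest = rev[m.end():] if m else ""
    if m is None or not re.fullmatch(r"(?:[~^]\d*)*", rest):
        raise ValueError(f"bad revision {rev!r}")
    commit = m.group(0)
    for op, num in re.findall(r"([~^])(\d*)", rest):
        n = int(num) if num else 1
        if op == "^":
            if n == 0:
                continue  # ^0 is the commit itself
            commit = parents[commit][n - 1]  # IndexError if there is no Nth parent
        else:
            for _ in range(n):
                commit = parents[commit][0]  # ~N: follow the first parent N times
    return commit


print(resolve("M^2~1", PARENTS))  # E
```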

git url notation

protocol	format
ssh	ssh://[user@]host.xz[:port]/path/to/repo.git/ [user@]host.xz:path/to/repo.git/
git	git://host.xz[:port]/path/to/repo.git/
http	http[s]://host.xz[:port]/path/to/repo.git/
ftp	ftp[s]://host.xz[:port]/path/to/repo.git/
rsync	rsync://host.xz/path/to/repo.git/
local	/path/to/repo.git/ file:///path/to/repo.git/

branching

When a branch is created, a file is created in .git/refs/heads with the name of the branch. The file contains the object ID of the COMMIT that was provided on the command line, or of the commit HEAD refers to if no commit was explicitly provided.

When a commit is made on a branch the parent of the commit is the commit in the branch ref. The branch ref is then updated to contain the new commit.

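One switches branches by changing .git/HEAD to refer to the new branch. git checkout does this, and it also updates the index and working directory to match the commit the branch points to.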
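The ref bookkeeping described above can be sketched as a toy in-memory model in Python. Everything here is hypothetical and simplified:

- refs live in a dict instead of in files under .git/refs/heads;
- commit IDs are counters rather than hashes;
- the index and working directory are ignored.

HEAD holds a symbolic ref in the same ref: refs/heads/NAME form that .git/HEAD uses.

```python
from typing import Optional


class ToyRepo:
    """Toy model of branch refs and HEAD: no objects, index, or working tree."""

    def __init__(self) -> None:
        self.refs: dict[str, str] = {}        # stands in for .git/refs/heads/*
        self.head = "ref: refs/heads/master"  # stands in for .git/HEAD
        self.parents: dict[str, list[str]] = {}
        self._counter = 0

    def _branch_ref(self) -> str:
        return self.head[len("ref: "):]

    def current_commit(self) -> Optional[str]:
        return self.refs.get(self._branch_ref())

    def commit(self) -> str:
        self._counter += 1
        new_id = f"c{self._counter}"  # hypothetical stand-in for an object ID
        parent = self.current_commit()
        self.parents[new_id] = [parent] if parent else []
        self.refs[self._branch_ref()] = new_id  # the branch ref advances
        return new_id

    def branch(self, name: str, commit: Optional[str] = None) -> None:
        start = commit or self.current_commit()
        if start is None:
            raise ValueError("no commit to branch from")
        self.refs[f"refs/heads/{name}"] = start

    def checkout(self, name: str) -> None:
        ref = f"refs/heads/{name}"
        if ref not in self.refs:
            raise KeyError(name)
        self.head = f"ref: {ref}"  # switching branches rewrites HEAD


repo = ToyRepo()
repo.commit()          # c1 on master
repo.branch("bar")     # refs/heads/bar -> c1
repo.commit()          # c2 on master, parent c1
repo.checkout("bar")   # HEAD becomes ref: refs/heads/bar
repo.commit()          # c3 on bar, parent c1
print(repo.refs)  # {'refs/heads/master': 'c2', 'refs/heads/bar': 'c3'}
```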

fetching, pushing, pulling

Fetching, pulling, and pushing are controlled by the remotes that have been set up for a repository. When a repository is created with the clone command, a remote named origin is created unless the -o option is used to specify a different name. Remotes can be added, removed, and modified with the remote command.

merging

A merge adds changes from one or more branches into the current branch. If the merge is successful and committed, the new commit will have two or more parents. A merge commit with more than two parents is called an octopus merge.

The most common case is when one branch is merged into the current branch, yielding a merge commit with two parents.

To perform a merge, Git gets the tree contained in the common ancestor and puts its items into the staging area with stage number 1. It puts the current branch tree items in the staging area with stage number 2, and the tree items of the branch being merged with stage number 3. Paths that merge cleanly are then collapsed to stage 0; paths with conflicts keep their stage 1, 2, and 3 entries until the conflict is resolved.
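The per-path decision can be sketched with the classic trivial three-way rule:

- if only one side changed a path relative to the common ancestor, that side wins;
- if both sides made the same change, either can be taken;
- otherwise the path is left unmerged, with its base, ours, and theirs versions kept at stages 1, 2, and 3.

This Python sketch is a simplification. Git additionally attempts a line-level merge before declaring a conflict, and it detects renames. Trees are modeled as hypothetical dicts from path to blob ID, where a missing key means the path is absent.

```python
from typing import Optional

Tree = dict[str, str]  # path -> blob ID


def merge_trees(base: Tree, ours: Tree, theirs: Tree):
    """Return (merged, conflicts); conflicts maps path -> {stage: blob ID or None}."""
    merged: Tree = {}
    conflicts: dict[str, dict[int, Optional[str]]] = {}
    for path in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o  # both sides agree (this includes both deleting it)
        elif o == b:
            result = t  # only the branch being merged changed it
        elif t == b:
            result = o  # only the current branch changed it
        else:
            conflicts[path] = {1: b, 2: o, 3: t}  # unmerged: keep all three stages
            continue
        if result is not None:
            merged[path] = result  # a cleanly merged path ends up at stage 0
    return merged, conflicts


base = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}
ours = {"a.txt": "2", "b.txt": "1", "c.txt": "2"}
theirs = {"a.txt": "1", "b.txt": "3", "c.txt": "3", "d.txt": "4"}
merged, conflicts = merge_trees(base, ours, theirs)
print(merged)     # {'a.txt': '2', 'b.txt': '3', 'd.txt': '4'}
print(conflicts)  # {'c.txt': {1: '1', 2: '2', 3: '3'}}
```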

Suppose that bar is a branch of foo. If commits have subsequently been made to foo but not to bar, then running the following when bar is the current branch will perform a fast-forward:

git merge foo

In a fast-forward no merge commit is created. Instead the head of bar is simply moved to point to the same commit as the head of foo.
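Whether git merge foo can fast-forward comes down to an ancestry test: the head of bar must be an ancestor of (or equal to) the head of foo. The sketch below runs that test over a hypothetical parent map; the names are illustrative.

```python
def is_ancestor(parents: dict[str, list[str]], ancestor: str, descendant: str) -> bool:
    """True if ancestor is reachable from descendant by following parent links."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False


def can_fast_forward(parents: dict[str, list[str]], current_head: str, other_head: str) -> bool:
    """A fast-forward is possible when the current head is an ancestor of the other head."""
    return is_ancestor(parents, current_head, other_head)


# bar was branched at B; foo then gained C and D.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
print(can_fast_forward(parents, current_head="B", other_head="D"))  # True
```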

rebasing

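git rebase BRANCH takes the commits on the current branch that are not on BRANCH (those made since the most recent common ancestor) and applies them successively on top of BRANCH. The current branch is then moved to the last of the new commits. The process stops at the first commit which encounters a conflict, which must be resolved in the same manner as a merge conflict.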

When a conflict is encountered, rebase --abort can be used to return the repository and working directory to the state they were in before the rebase started. rebase --skip skips the commit that caused the conflict, without applying its change, and continues with the next one. The conflict can also be resolved in the same manner as a merge conflict: edit the conflicted files in the working directory and run add on them. rebase --continue then continues with the rebase.
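The first step of a rebase, choosing which commits to replay, can be sketched as set subtraction over the commit graph. Take every commit reachable from the current branch that is not reachable from BRANCH, oldest first. For a linear history this is roughly what git rev-list --reverse BRANCH..HEAD lists.

Below is a minimal Python sketch that assumes linear first-parent history on the branch being rebased. The graph and function names are hypothetical, and the replay itself (applying each commit's diff) is not shown.

```python
def reachable(parents: dict[str, list[str]], start: str) -> set[str]:
    """All commits reachable from start, including start itself."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def commits_to_replay(parents: dict[str, list[str]], head: str, upstream: str) -> list[str]:
    """Commits on head but not on upstream, oldest first (linear history assumed)."""
    exclude = reachable(parents, upstream)
    todo = []
    commit = head
    while commit not in exclude:
        todo.append(commit)
        if not parents[commit]:
            break  # reached a root that upstream does not share
        commit = parents[commit][0]
    return list(reversed(todo))


# The current branch adds X and Y on top of B; BRANCH has since moved on to D.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "X": ["B"], "Y": ["X"]}
print(commits_to_replay(parents, head="Y", upstream="D"))  # ['X', 'Y']
```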

.git

This directory exists in the root directory of a repository. It is created by git init and git clone. Its presence is what makes a directory a git repository.

Running git init on an empty directory creates HEAD, config, description, hooks, info, objects, and refs (with refs/heads and refs/tags) inside .git. The remaining entries below appear as the repository is used:

name	file type	description
HEAD	file	names the current branch
config	INI file	per repo configuration info
description	text file	a file which can be manually edited describing the repo
hooks	directory	scripts (usually shell scripts) with certain names will be executed when certain events happen
index	file	the index (staging area)
info	directory
logs	directory	records changes to HEAD and refs
objects	directory	the object database
refs	directory
refs/heads	directory	branch refs
refs/remotes	directory	remote-tracking branch refs
refs/stash	file	ref for the most recent stash
refs/tags	directory	tag refs

~/.gitconfig

This file can be created in a user's home directory. It has INI format and can be used to specify the name and email address for commits:

[user]
        name = Your Name Comes Here
        email = you@yourdomain.example.com

.gitignore

man gitignore

A list of file patterns, one per line. The patterns specify files that git status and git add should ignore. Shell glob syntax (e.g. the asterisk: *) can be used.

A .gitignore can be placed in any directory in the repository. The rules in a given .gitignore file only apply to the directory containing it and the directories beneath it.

Lines starting with a pound sign: # are ignored.

A pattern starting with an exclamation point: ! will negate a pattern. This can be used to re-include files that were excluded by a broader pattern earlier in the file.
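Because of the negation rule, within a file the last matching pattern decides. Below is a simplified Python sketch of that precedence for a single .gitignore; the names are hypothetical. Patterns without a slash are matched against the file's basename, and patterns with a slash are matched against the path relative to the .gitignore.

The sketch deliberately leaves out several real gitignore rules:

- ** patterns;
- a trailing / for directories;
- escaping;
- the rule that a file cannot be re-included if a parent directory is excluded.

Also, fnmatch's * matches / as well, unlike Git's.

```python
import fnmatch


def is_ignored(path: str, gitignore_lines: list[str]) -> bool:
    """Decide whether path (relative to the .gitignore's directory) is ignored."""
    ignored = False
    for raw in gitignore_lines:
        line = raw.rstrip()  # simplification: Git trims only unescaped trailing spaces
        if not line or line.startswith("#"):
            continue
        negate = line.startswith("!")
        pattern = line[1:] if negate else line
        if "/" in pattern:
            target, pattern = path, pattern.lstrip("/")  # anchored to this directory
        else:
            target = path.rsplit("/", 1)[-1]  # slash-free patterns match the basename
        if fnmatch.fnmatchcase(target, pattern):
            ignored = not negate  # a later match overrides earlier ones
    return ignored


rules = ["# build products", "*.log", "!keep.log", "/tmp/*"]
print(is_ignored("debug.log", rules), is_ignored("logs/keep.log", rules))  # True False
```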

tig

$ tig [BRANCH]
$ tig --all

tig displays the commit history of branches. Each commit is represented by an o or an I, the I being used for a commit with no parents. There is one commit per row with lines connecting each commit with its parents.

With no arguments tig shows the current branch. A branch can be specified as an argument or the --all option can be provided to show all branches.