A note about documentation: as these documents progress, a number of commands will appear in bold face. In every instance the reader should either be able to use the command competently or consult the online documentation in order to become competent.

As the quantity of biological information continues to grow faster than Moore's Law, it becomes increasingly important for a modern biologist to use the most efficient tools available. The Unix family of operating systems grew out of a single individual's desire to make use of an otherwise neglected computer; as a result these systems have a legacy of simplicity and speed.

In the 1960s AT&T's Bell Labs was looking for a system to run on its mainframes and joined the development of a system known as Multics, but in 1969 Bell Labs pulled out of the project. One of the original developers, Ken Thompson, then needed work and so he "found a small computer (a Digital Equipment Corp. PDP-7) on which he began developing space related programs (satellite orbit predictors, lunar calendars, space war games, etc.)." [*] Shortly thereafter he rewrote the system so that multiple people could work simultaneously. Brian Kernighan saw this and suggested the name Unics, which soon became UNIX. The system was quickly adopted within Bell Labs, and the programming language C was written for UNIX (and later used to rewrite UNIX). The system was simple enough and small enough that programmers all over the country ported UNIX to other architectures and added new features.

Over time the UNIX trademark passed from company to company, so that today no single system can properly be called 'UNIX'; each implementation is instead a flavor of UNIX. To learn more about the history of the UNIX family of operating systems, please explore: http://www.levenez.com/unix/
The shell is the fundamental method of interacting with a Unix-like operating
system. Neal Stephenson wrote an interesting exposition about why this is the
case, entitled "In the Beginning was the Command Line."
The shell provides the user a simple and consistent interface to most commands on the system. Running a shell command takes the form "COMMAND ARGUMENTS INPUT." There are a few caveats: case matters; the arguments generally take the form "--switch option" or "-s option"; and most commands have a simple built-in help, accessible via "--help" or "-h", which enumerates the available options. The most important command for a new user is the manual, man(1).
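As a small illustration of the "COMMAND ARGUMENTS INPUT" form, here is a minimal sketch using ls. The scratch directory and the file names in it are made up for the demonstration; the --help switch is shown only as a comment because not every implementation of ls supports it.

```shell
# Build a small scratch directory so the commands have something to list.
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/b.txt" "$demo/.hidden"

# COMMAND ARGUMENTS INPUT: ls is the command, -a is a switch, "$demo" is the input.
plain=$(ls "$demo")          # regular files only
all=$(ls -a "$demo")         # -a also shows names that begin with a dot
reversed=$(ls -r "$demo")    # lower-case -r reverses the sort order

echo "$plain"
echo "$all"
echo "$reversed"

# Case matters: upper-case -R means "recurse into subdirectories" instead.
ls -R "$demo"

# Built-in help and the manual (left as comments because the output is long):
#   ls --help    # available in GNU ls; other implementations may lack it
#   man ls

rm -rf "$demo"
```

Comparing the output of `ls -r` and `ls -R` is a quick way to convince yourself that switches are case sensitive.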
There are many different shells, including but not limited to sh, bash, ash, csh, tcsh, ksh, zsh, and psh. Each shell has its own syntax, variable system, and oddities, but they all share at the very least three extremely important streams: STDIN, STDOUT, and STDERR (file descriptors 0, 1, and 2). These carry a program's current input, output, and error messages. For example:
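The sketch below shows one way to redirect each of the three streams in a Bourne-style shell (sh, bash, and relatives; csh and tcsh use a different syntax). The file names are hypothetical and live in a throwaway directory.

```shell
demo=$(mktemp -d)

# STDOUT (file descriptor 1): '>' sends normal output to a file.
echo "hello" > "$demo/out.txt"

# STDERR (file descriptor 2): '2>' captures error messages separately.
# The file does not exist, so ls complains on STDERR and exits non-zero.
ls "$demo/no-such-file" 2> "$demo/err.txt" || echo "ls failed; its message was saved"

# STDIN (file descriptor 0): '<' feeds a file to a command's input.
upper=$(tr 'a-z' 'A-Z' < "$demo/out.txt")
echo "$upper"

# '2>&1' points STDERR at wherever STDOUT is going, so both land in one file.
ls "$demo" "$demo/no-such-file" > "$demo/both.txt" 2>&1 || true
cat "$demo/both.txt"
```

Keeping errors in their own file is handy for long analyses: the results stay clean while the complaints remain available for later inspection.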
As a challenge: find which three single-letter alphabetical arguments are _not_ valid switches for ls.

There are two types of variables when working with a Unix shell: shell variables and environment variables. By convention environment variables consist of all capital letters while shell variables do not. Environment variables are passed from the process that sets them down to every subprocess it starts. Shell variables, on the other hand, exist only within the shell that defines them.
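The difference can be seen with a minimal sketch in Bourne-style syntax. The variable names local_note and SHARED_NOTE are invented for this illustration.

```shell
# A shell variable: known only to the current shell.
local_note="only in this shell"

# An environment variable: 'export' makes it visible to child processes.
SHARED_NOTE="visible to children"
export SHARED_NOTE

# Start a child shell and ask it what it can see.
# ${name:-unset} prints the word "unset" when the variable has no value.
child_local=$(sh -c 'echo "${local_note:-unset}"')
child_env=$(sh -c 'echo "${SHARED_NOTE:-unset}"')

echo "current shell sees local_note: $local_note"
echo "child shell sees local_note:   $child_local"
echo "child shell sees SHARED_NOTE:  $child_env"
```

The child shell reports the shell variable as unset but inherits the exported one.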
If you are using the [t]c-shell, the commands required to do the same thing are set and setenv, respectively.

On the computer I am using, the command prompt looks like this: (19:38:08)trey@sedition:~/docs/> and is defined by the environment variable PS1. Look up in bash(1) how to change this variable to something more interesting than just '$'.

Unix-like operating systems have a hierarchical filesystem with a single root called '/' (unlike some other systems). Every accessible disk, floppy, pseudo-fs (to be explained later), etc. is grafted, or 'mount(8)ed', onto this tree. There are few rules regarding the layout of files in this tree, but many customs. Here are a few: etc is for configuration information, bin for binaries, sbin for superuser binaries, var for variable information (logs, spools, etc.), lib for libraries, tmp for temporary information, and share for information shared among programs (things like images, icons, or data files).

Every segment of the filesystem has its own permissions, viewable via the 'ls' command. Use the manual to find the correct switches to ls to see the permissions of the files in your home directory.

Follow along for a short tour of a Unix system: (Note: Users of Apple OSX will find that the names of the assorted directories are changed to their unabbreviated cognates: Library instead of lib, Applications instead of bin, etc.)
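A tour of this kind might look like the following sketch. It assumes a conventional layout; which directories exist, and what they contain, will differ from system to system.

```shell
# Start at the root of the tree.
cd /
here=$(pwd)
echo "now in: $here"

# Everything on the system hangs somewhere below these top-level directories.
ls

# Peek at the first few configuration files.
ls /etc | head -n 5

# Check which of the customary directories are present on this machine.
for d in /etc /bin /sbin /var /lib /tmp /usr/share; do
    if [ -d "$d" ]; then
        echo "$d exists"
    else
        echo "$d is missing on this system"
    fi
done

# Move somewhere else and confirm the change.
cd /tmp
pwd
```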
Some of the most important commands for dealing with the filesystem include:

chmod(1) allows a user to change the permissions of any file that he or she owns. Perform the following:
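One possible sequence of chmod commands is sketched below; it is not necessarily the one originally given here. The file notes.txt is a made-up example, and `cut -c1-10` simply trims the ls -l output down to the permission string.

```shell
demo=$(mktemp -d)
touch "$demo/notes.txt"

# Numeric form: owner may read and write, group and others may only read.
chmod 644 "$demo/notes.txt"
before=$(ls -l "$demo/notes.txt" | cut -c1-10)

# Symbolic form: remove (-) read permission (r) from group (g) and others (o).
chmod go-r "$demo/notes.txt"
after=$(ls -l "$demo/notes.txt" | cut -c1-10)

# Symbolic form again: add (+) execute permission (x) for the owner (u).
chmod u+x "$demo/notes.txt"
final=$(ls -l "$demo/notes.txt" | cut -c1-10)

echo "after chmod 644:  $before"
echo "after chmod go-r: $after"
echo "after chmod u+x:  $final"
```

The first character of the permission string is the file type; the remaining nine are the read, write, and execute bits for owner, group, and others.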
At this point in the document it is assumed that the reader is making great use of the man(1)ual. The following discussion covers these commands: awk(1), a programming language and tool; find(1), used to find files; grep(1), the [g]lobal [r]egular [e]xpression [p]rint tool, which prints the lines of its input that match a pattern; head(1)/tail(1), which print the beginning/end of a file; sort(1), which sorts its input; uniq(1), which filters out repeated adjacent lines; and xargs(1), which runs a command using its input as arguments.

An extremely important attribute of Unix-like systems is the ability to 'pipe' the output from one program as the input to another; for example:
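A minimal sketch of a pipeline built from several of the commands above follows. The codon list is invented data written to a scratch file, purely so the example has something to count.

```shell
demo=$(mktemp -d)
printf 'ATG\nTAA\nATG\nGGC\nATG\nTAA\n' > "$demo/codons.txt"

# sort groups identical lines together, uniq -c collapses each group into
# a count, and sort -rn puts the largest count first.
sort "$demo/codons.txt" | uniq -c | sort -rn

# Two more stages: head keeps only the first line, and awk prints its
# second whitespace-separated field (the codon itself, without the count).
top=$(sort "$demo/codons.txt" | uniq -c | sort -rn | head -n 1 | awk '{print $2}')
echo "most common codon: $top"
```

Each program does one small job; the pipe symbol connects the STDOUT of the command on its left to the STDIN of the command on its right.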
Two ways to reproduce the functionality of killall(1):
Why does the previous example return an error while still working properly?
(hint, try each piece of the command one at a time: ps aux,
At the beginning of this discussion, Unix was called 'multitasking.' The operating system is capable of switching quickly from one task to another. When working from within the shell one accesses this functionality through 'job control.'
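The basics of job control can be sketched as follows, with sleep standing in for any long-running program. The interactive commands appear only as comments because they need a terminal to be meaningful.

```shell
# Start a long-running command in the background with '&'.
sleep 30 &
pid=$!          # $! holds the process ID of the most recent background job

# List the jobs this shell is currently managing.
jobs

# At an interactive prompt one would also use:
#   Ctrl-Z   suspend the job running in the foreground
#   bg       let a suspended job continue in the background
#   fg       bring a background or suspended job to the foreground

# Ask the job to terminate, then collect its exit status.
kill "$pid"
status=0
wait "$pid" || status=$?
echo "background job ended with status $status"
```

A non-zero status here reflects that the job was stopped by a signal rather than finishing on its own.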
Another aspect of job control deals with the problem that when one logs off from a computer, one's processes are by default killed by what is called the 'hangup' signal. nohup(1) is a command which tells the computer to keep a process alive even if it receives a hangup signal. The example provided uses paup, a program with which we will become extremely familiar. Its purpose is to examine a multiple sequence alignment and find the phylogenetic tree which best fits this alignment. Running paup on a single dataset may take many months. It is therefore good practice to run paup under nohup so that the job does not die prematurely.
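The usual nohup pattern is sketched below. Because paup may not be installed, a short sh command stands in for it; the paup invocation in the comment, including the file names, is hypothetical and should be checked against paup's own documentation.

```shell
demo=$(mktemp -d)

# With paup the command might look something like this (hypothetical):
#   nohup paup my_alignment.nex > paup.log 2>&1 &
#
# Stand-in job: wait a second, then report completion.
# STDOUT and STDERR go to a log file, STDIN comes from /dev/null, and the
# trailing '&' puts the job in the background so the prompt returns at once.
nohup sh -c 'sleep 1; echo finished' > "$demo/job.log" 2>&1 < /dev/null &
pid=$!

# One could log out at this point and the job would keep running.
# Here we simply wait for it and read the log.
wait "$pid"
result=$(cat "$demo/job.log")
echo "$result"
```

Redirecting the output explicitly keeps nohup from choosing a destination on its own (by default it appends terminal output to a file called nohup.out).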
The problem with nohup is that it requires that you think of it before running your job. If you use the bash shell, this is not a problem:
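One way to do this in bash is the disown builtin, sketched below under the assumption that the shell really is bash (other shells may lack disown or spell it differently). Again sleep stands in for a long analysis.

```shell
#!/usr/bin/env bash
# A job started the ordinary way, without nohup.
sleep 30 &
pid=$!

# At an interactive prompt, a job already running in the foreground would
# first be suspended with Ctrl-Z and resumed in the background with bg.

# disown -h marks the job so that bash does not pass a hangup signal on
# to it when the shell exits.
disown -h "$pid"

# The job itself is untouched and keeps running.
if kill -0 "$pid" 2>/dev/null; then
    still_running=yes
else
    still_running=no
fi
echo "job $pid still running: $still_running"

# Clean up the demonstration job.
kill "$pid"
```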
It turns out that the Bourne shell is a self-contained and complete programming language with an Algol-like syntax. Below is an example of a shell script which I use to start up the windowing system on my computer:
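The following is not the author's startup script. It is a minimal stand-in sketch that exercises several of the same constructs (an env-based first line, backticks, ${VARIABLE}, for, if, grep, and tee), and any line numbers mentioned in the surrounding text refer to the original script rather than to this one. The FASTA files and the PREFIX variable are invented for the illustration.

```shell
#!/usr/bin/env bash
# Scratch data: two tiny FASTA-style files.
demo=$(mktemp -d)
printf '>seq1\nMKV\n' > "$demo/a.fasta"
printf '>seq2\nMKVL\n>seq3\nMK\n' > "$demo/b.fasta"

# Backticks run a command and capture its output in a variable.
today=`date +%Y-%m-%d`

# Braces mark where the variable name ends: ${PREFIX}_x uses PREFIX,
# whereas $PREFIX_x would look for a variable called PREFIX_x.
PREFIX=report
log="$demo/${PREFIX}_${today}.log"

# for; do; done runs the body once per entry in the list.
for f in "$demo"/*.fasta; do
    # grep -c counts the lines that start with '>' (one per sequence).
    count=$(grep -c '^>' "$f")
    # if; then; else; fi chooses between two messages.
    if [ "$count" -gt 1 ]; then
        echo "$(basename "$f"): $count sequences"
    else
        echo "$(basename "$f"): single sequence"
    fi
done | tee "$log"    # tee shows the output and also saves a copy in the log
```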
There are some interesting elements to this script. Line 1 shows one of many ways to start a shell to interpret a script: env(1) allows one to run any given program in its own environment, searching through the system's path to find it. Line 3 makes use of the '.' command, which executes an input file (analogous to the [t]csh source command) and sets up some other variables for the rest of the script. Lines 8-19 show the peculiar bash if; then; else; fi syntax. Line 20 illustrates a for; do; done loop. For loops are especially important and tricky: the given commands are run once for every space-delimited entry of the given list.

You may also note that variables can be evaluated as either $VARIABLE or ${VARIABLE}; this allows one to write something like echo ${VARIABLE}Now where echo $VARIABLENow would fail, because the shell would look for a variable named VARIABLENow. exec(1) is important to this script, as are shell redirections (as shown above) and grep(1). tee(1) is a less commonly used command; it duplicates STDOUT so that one may view a command's output and save it for later. `` is an especially interesting convention: it allows one to execute a command in a subshell within the running shell and save its output in a variable.

Here is an unordered list of potentially useful links:
Use mkdir(1) to create a directory in which to keep some amino acid sequences, then download the sequences to your computer. Use mv(1) to get them into your sequences directory.

Examine the permissions and sizes of the files and change their permissions so that only you may read or write to the files.

Use one of the default Unix editors, emacs(1), vi(1), or pico(1), to add the current date and time to the sequence files previously downloaded.

Use a pipe or two, cal(1), and grep(1) to find the day of the week of your day of birth for every month of the year you were born; perhaps use cut(1) to print out only the column of the calendar containing your birthday.

(Do this _after_ reading through the Blast exercises.) Write a short shell script which allows some cursory examination of the sequences you downloaded earlier. Some ideas include using a script to con cat(1)enate the files into a single file so that clustalx may make an alignment of them, using blastcl3 to perform individual local alignments of the files, and using wc(1) to examine their relative lengths. (I suggest lines 20-22 of the above script for ideas.)

Check this output from a script an anonymous student wrote:

PS UUUUUUAGUU......11-......222222.......222222..11.CUGAAUAAGA 0,p:5,s:79,b:11
PS UUUUUUAGUUU.......-......11.1....1.11...UUACGGGUAC 0,p:1,s:85,b:5
PS UUUUUUAGU111111...-......111111.........UUACGGGUAC 0,p:4,s:82,b:8
PS UUUUUUAGU.......11-......222222.......222222..11.CUGAAUAAGA 0,p:5,s:79,b:11
PS UUUUUUAGUU........-......11.1....1.11...UUACGGGUAC 0,p:1,s:85,b:5
PF AGCGCUUUUUUUAGUUUUUACAAC-AAAAGAGUGAGAGAUGACGUUUUACGGGUACUGAAUAAGAUCCCG YCR032W=BPH1
NT :| | .::: : :: ::|:: ::::|:| .:| :|:||:.::: :: .:::: |::
SO UCAAGUGAAACAAGAAUUAUGUUCAUUUUCUCCUUCGGAACUGCAGAAUAAAAAUUCUUUAUACUAUAAA YBR182C

There are approximately 300 similar stanzas in the output file, with different lines which start with 'PF' and 'SO'. These lines end with the name of a gene in the yeast genome which may be of significance. How might one quickly get a list of all of these potentially interesting genes?
Created: Wed Sep 15 00:58:22 EDT 2004 by Charles F. Delwiche
Last modified: Mon Nov 8 15:49:44 EST 2004
by Ashton Trey Belew.