This is the short description of the program that is running.
PROGRAMNAME what sequence(s) ? ge:someseq
Begin (* 1 *) ?
Select one of:
A) First option
Please choose one (* A *): B (don't accept defaults without
What should I call the output file (* someseq.pgmnm *) ?
Note that the arguments can occur before or after any
switches; an argument is actually the answer to the programme's
default switch "-INfile=". If the arguments are
not present on the command line, then the programme will prompt for them. If
switches are not present on the command line, the programme will use default
values and will NOT prompt for them.
To see what switches are available and optionally to set them, run the programme
with the switch "-CHEck". You may abbreviate a switch by
entering only the uppercase part of the switchname; the rest is optional.
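The abbreviation rule can be sketched in a few lines of Python. This is only an illustration, not how E/GCG parses its command line: the helper names `required_part`, `matches` and `resolve` are hypothetical, the switch list is limited to switches mentioned in this section, and two points are assumptions: that matching ignores case (suggested by the `-che` example) and that any leftover letters typed after the uppercase part must agree with the full switch name.

```python
from itertools import takewhile

# Switch names as they are written in this section; the uppercase part is mandatory.
SWITCHES = ["-INfile=", "-OUTfile=", "-CHEck", "-DOCLines=", "-NOMONitor", "-PROtein"]


def required_part(switch: str) -> str:
    """Return the leading uppercase letters of a switch name, e.g. 'CHE' for '-CHEck'."""
    name = switch.lstrip("-")
    return "".join(takewhile(str.isupper, name))


def matches(typed: str, switch: str) -> bool:
    """True if `typed` is an acceptable abbreviation of `switch` (assumed rule)."""
    typed_name = typed.lstrip("-").split("=", 1)[0].lower()
    full_name = switch.lstrip("-").split("=", 1)[0].lower()
    required = required_part(switch).lower()
    # The typed text must contain the whole uppercase part and must not
    # contradict the rest of the switch name.
    return typed_name.startswith(required) and full_name.startswith(typed_name)


def resolve(typed: str) -> list[str]:
    """List every known switch that the typed text could stand for."""
    return [switch for switch in SWITCHES if matches(typed, switch)]


print(resolve("-che"))             # ['-CHEck']
print(resolve("-out=hsfau2.map"))  # ['-OUTfile=']
print(resolve("-ch"))              # [] because the uppercase part is incomplete
```

Under these assumptions `-che`, `-chec` and `-check` all select the same switch, while `-ch` selects nothing.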
This is the short description of the program that is running.
Press <rtn> for more:
Syntax: % programname [-INfile=]GenEMBL:Humhb*
Required Parameters: None
Local Data Files: None
Optional Parameters:
-OUTfile=FileName copy file(s)-sequence(s) into one file
Add what to the command line ? -pro
PROGRAMNAME what sequence(s) ?
etc.
One point to note about arguments for E/GCG programmes: arguments
that are database entries [actually from E/GCG data libraries]
may be given in upper- and/or lower-case because E/GCG itself
is "case-insensitive". E/GCG programmes are run
under the UNIX environment, though, and UNIX is a
"case-sensitive" operating system. Therefore, if an
argument is a UNIX file with one or more upper-case letters, it must be typed
with its upper-case letter(s).
Exercise 1: map a sequence
Exercise 2: edit a resource file; re-map a sequence
Exercise 3: configure the graphics display; plot a sequence map with mapplot
The generic E/GCG programme
As you saw with the GCG programmes fetch, translate,
reformat and fromstaden, most E/GCG programmes are called
like UNIX commands. You type the programme name, a flag or two (optional), and
an argument or two (sometimes optional) at the UNIX prompt, press
<RETURN>, and follow directions. Most E/GCG
"commands" expect one or more arguments specifying the names of
files or database entries to act on. And many E/GCG commands accept
flags (called "switches") to modify their behaviour.
prompt> programname argument1 argument2 -switch1 -switch2
It is usually two lines long and fairly terse.
End (* 516 *) ?
Reverse (* No *) ?
B) Second option
knowing what you are accepting)
prompt> programname -che
It is usually two lines long and fairly terse.
-DOCLines=6 copies only the first 6 lines of documentation.
-NOMONitor suppresses the screen monitor
-PROtein input sequence is protein
Input sequence specification: answering the -INfile switch
With the exception of the sequence exchange programmes and a few others,
E/GCG programmes only recognise E/GCG format sequence files or entries in
databases that have been converted to E/GCG data
libraries.
Output sequence file specification: answering the -OUTfile switch
E/GCG programmes usually suggest a default name for their output file. It is
best to select a name that has an extension reminding you of the programme that
created the file, and this is what E/GCG attempts with the default suggestion.
For example, DNAsequence23.fra could be the filename of the result
from passing a nucleotide sequence through frames. Often, the output
file of one programme is the input file for another; accepting E/GCG's default
file extension for output files can save typing in subsequent steps.
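The naming pattern seen in the prompts of this section (someseq.pgmnm, hsfau.map, hsef2.pep) can be imitated with a small helper. This is a sketch of the apparent convention, not E/GCG's own code: the function name `default_output_name` is hypothetical, and the rule "drop any data library prefix and the old extension, then append the programme's extension" is inferred from those examples only.

```python
from pathlib import PurePosixPath


def default_output_name(input_spec: str, extension: str) -> str:
    """Build a default output file name from an input specification.

    Assumed rule: strip a data library prefix such as 'ge:', strip the
    existing file extension, and append the programme's own extension.
    """
    entry = input_spec.split(":", 1)[-1]   # 'ge:someseq' -> 'someseq'
    stem = PurePosixPath(entry).stem       # 'hsfau.ge_pr' -> 'hsfau'
    return f"{stem}.{extension}"


print(default_output_name("ge:someseq", "pgmnm"))  # someseq.pgmnm
print(default_output_name("hsfau.ge_pr", "map"))   # hsfau.map
print(default_output_name("hsef2.ge_pr", "pep"))   # hsef2.pep
```

Because the extension records which programme wrote the file, a later step can pick its input by extension alone.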
Mapping a sequence with map
map is a versatile programme that finds restriction enzyme sites
in a sequence. As with most E/GCG programmes,
it accepts sequence data as its default input, and can be run with zero
to many switches. These switches can modify the behaviour of map
in useful ways, and we'll explore some of these modifications with the
sequences fetched in the
Sequences Databases Exercise 2.
In addition to the files or data library entries you specify, map accesses a file describing a vast number of commercially available restriction enzymes to determine what sites it can seek. This extra input file is normally read in from a central, hidden part of the system. We will fetch this file, too, and modify it to reflect our enzyme freezer stock, budget, and available vector sites.
prompt> map
Map displays both strands of a DNA sequence with restriction sites shown
above the sequence and possible protein translations shown below.
(Linear) MAP of what sequence ? hsfau.ge_pr
Begin (* 1 *) ?
End (* 518 *) ?
Select the enzymes: Type nothing or "*" to get all enzymes. Type "?"
for help on which enzymes are available and how to select them.
Enzyme(* * *):
What protein translations do you want:
a) frame 1 b) frame 2 c) frame 3 d) frame 4 e) frame 5 f) frame 6
t)hree forward frames s)ix frames o)pen frames only
n)o protein translation q)uit
Please select (capitalize for 3-letter) (* t *):
What should I call the output file (* hsfau.map *) ?
prompt> more hsfau.map
(Linear) MAP of: hsfau check: 2981 from: 1 to: 518
LOCUS HSFAU 518 bp RNA PRI 23-SEP-1993
DEFINITION H.sapiens fau mRNA.
ACCESSION X65923
KEYWORDS fau gene.
SOURCE human.
ORGANISM Homo sapiens
. . .
With 209 enzymes: *
October 26, 1995 15:21 ..
[vertical enzyme-name labels marking the cut sites not reproduced]
TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC
1 ---------+---------+---------+---------+---------+---------+ 60
AAGGAGAAAGAGCTGAGGTAGAAGCGCCATCGACCCTGGCGGCAAGTCAGCGGTTATACG
a F L F L D S I F A V A G T A V Q S P I C -
b S S F S T P S S R * L G P P F S R Q Y A -
c P L S R L H L R G S W D R R S V A N M Q -
[several pages deleted]
Enzymes that do cut:
AceIII AciI AflII AluI ApaI AscI AvaII BanII BbsI BbvI BccI BcefI BmgI
BpmI Bpu1102I BsaJI BsaXI BscGI BsiEI BsiHKAI BslI BsmFI BsoFI Bsp1286I
BsrI BsrDI BsrFI BssHII BstEII Bsu36I Cac8I CviJI CviRI DdeI DpnI DrdII
EaeI EciI EcoO109I EcoRII FauI FokI GdiII HaeI HaeII HaeIII HhaI Hin4I
HincII HinfI HphI MaeII MaeIII MboII MnlI MscI MseI MspI MwoI NciI NlaIII
NlaIV NspI PleI Psp1406I RsaI Sau96I Sau3AI ScrFI SfaNI SphI TaqI TauI
ThaI TseI Tsp45I Tsp509I TspRI Tth111II UbaCI
Enzymes that do not cut:
AatII AccI AflIII AhdI AlwI AlwNI ApaBI ApaLI ApoI AvaI AvrII BaeI BamHI
BanI Bce83I BcgI BcgI BclI BfaI BfiI BglI BglII BplI Bpu10I BsaI BsaAI
BsaBI BsaHI BsaWI BsbI BseRI BsgI BsmI BsmAI BsmBI Bsp24I Bsp24I BspEI
BspGI BspLU11I BspMI BsrBI BsrGI BssSI Bst1107I BstXI BstYI CjeI CjeI
CjePI CjePI ClaI DraI DraIII DrdI DsaI EagI EarI Eco47III Eco57I EcoNI
EcoRI EcoRV FseI FspI HgaI HgiEII HindIII HpaI KpnI MluI MmeI MslI MspA1I
MunI NarI NcoI NdeI NgoAIV NheI NotI NruI NsiI NspV PacI Pfl1108I PflMI
PinAI PmeI PmlI PshAI Psp5II PstI PvuI PvuII RcaI RleAI RsrII SacI SacII
SalI SanDI SapI ScaI SexAI SfcI SfiI SgfI SgrAI SmaI SnaBI SpeI SrfI
Sse8387I Sse8647I SspI StuI StyI SunI SwaI TaqII TaqII TfiI Tth111I VspI
XbaI XcmI XhoI XmnI
prompt>
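To see how the lines of the map output relate to each other, the first 60 bases of hsfau can be complemented and translated by hand. The Python sketch below is an independent illustration using the standard genetic code, not map's own algorithm; the names `FRAGMENT`, `complement` and `translate_frame` are hypothetical. Frames b and c come out one residue shorter than in the map output because their last codon runs past base 60.

```python
# First 60 bases of hsfau, copied from the map output above.
FRAGMENT = "TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC"

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def complement(seq: str) -> str:
    """Return the complementary strand, base by base (not reversed)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))


def translate_frame(seq: str, offset: int) -> str:
    """Translate complete codons only, starting at 0-based `offset`."""
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


print(complement(FRAGMENT))
for label, offset in zip("abc", range(3)):
    print(label, translate_frame(FRAGMENT, offset))
```

The complement reproduces the second strand printed under the ruler, and the three translations agree with frames a, b and c.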
prompt> fetch data:enzyme.dat
prompt> map hsfau.ge_pr -dat=enzyme.dat -out=hsfau2.map
prompt> more hsfau2.map
prompt> map -che
prompt> map hsfau.ge_pr -dat=enzyme.dat -out=hsfau3.map -minc=2 -maxc=3
prompt> more hsfau3.map
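The effect of the last two switches can be imitated in Python. This sketch rests on an assumption that this text does not spell out: that -minc and -maxc set the minimum and maximum number of cuts an enzyme may make and still be reported. The function names are hypothetical, the recognition sites are the well-known ones for five common enzymes, and the test sequence is only the first 60 bases of hsfau, so the counts differ from those for the full entry.

```python
import re

# First 60 bases of hsfau, copied from the map output earlier in this exercise.
FRAGMENT = "TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC"

# Palindromic recognition sites, so searching one strand is sufficient.
SITES = {
    "AluI": "AGCT",
    "TaqI": "TCGA",
    "ThaI": "CGCG",
    "HaeIII": "GGCC",
    "EcoRI": "GAATTC",
}


def count_sites(seq: str, site: str) -> int:
    """Count occurrences of `site` in `seq`, including overlapping ones."""
    return len(re.findall(f"(?={site})", seq))


def select_enzymes(seq: str, sites: dict, min_cuts: int, max_cuts: int) -> dict:
    """Keep the enzymes whose number of sites lies within [min_cuts, max_cuts]."""
    counts = {name: count_sites(seq, site) for name, site in sites.items()}
    return {name: n for name, n in counts.items() if min_cuts <= n <= max_cuts}


print(select_enzymes(FRAGMENT, SITES, 1, 1))  # enzymes that cut exactly once
print(select_enzymes(FRAGMENT, SITES, 2, 3))  # the range used with map above
```

On this short fragment three of the five enzymes cut once and none cuts two or three times. On the full sequence a narrow range like this thins the map down to the enzymes most useful for sub-cloning.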
prompt> setplot
+---------------------> displaying all of 10 option(s) <---------------------+
|psf   postscript - sent to file: homedir:graf.ps                            |
|epsf  eps postscript - sent to file: homedir:graf.eps                       |
|hpg   hp laser with hpgl - sent to file: homedir:graf.hp                    |
|xcol  x windows colour graphics - for x-windows terminal                    |
|xmon  x windows monochr. graphics - for x-windows terminal                  |
|vt340 vt340 graphics - for a vt340 terminal                                 |
|vt241 vt241 graphics - for a vt241 terminal                                 |
|tek   versaterm tektronix 4105 graphics on your terminal                   |
|dec   declaser 5100 postscript/pcl/hpgl printer at biobase                  |
|qms   qms colorscript210 ps printer at biobase (14 kr./pg)                 |
|                                                                            |
|                                                                            |
+----------------------------------------------------------------------------+
enter a command. choices are:
<up-arrow> and <down-arrow> scroll the list
<return> makes GCG use the selected device
Q quits without doing anything
C creates and edits a new device (you can't delete from the site file)
V views the selection (use C to edit a copy)
prompt> mapplot hsfau.ge_pr -dat=enzyme.dat -minc=2 -maxc=3
This final output might show possibilities for sub-cloning most of hsfau with only one enzyme. Can you sub-clone a fragment that is only coding sequence? Which open reading frame(s) is (are) used? Where is this information shown in the original sequence file? (Hint!) Are "hsef2.ge_pr" or "hsht.ge_pr" better or worse prospects for sub-cloning with your reduced enzyme list?
prompt> eextractpeptide hsfau3.map -out=hsfau3.pep
prompt> more hsfau3.pep
Given that we know the coding regions for these three example sequences, let's translate them properly into proteins. For quick reference, the coding regions of these three sequences follow:
data library entry | filename | coding sequence |
---|---|---|
ge:hsef2 | hsef2.ge_pr | 1 .. 2577 |
ge:hsfau | hsfau.ge_pr | 57 .. 458 |
ge:hsht | hsht.ge_pr | 128 .. 1420 |
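Before answering the Begin and End prompts it can help to check the arithmetic of these ranges. The Python sketch below assumes the coordinates are 1-based and inclusive, as the Begin (* 1 *) default suggests; the names `CODING_REGIONS`, `cds_length` and `cds_slice` are hypothetical.

```python
# Coding regions from the table above: (begin, end), assumed 1-based and inclusive.
CODING_REGIONS = {
    "hsef2.ge_pr": (1, 2577),
    "hsfau.ge_pr": (57, 458),
    "hsht.ge_pr": (128, 1420),
}


def cds_length(begin: int, end: int) -> int:
    """Number of bases in an inclusive 1-based range."""
    return end - begin + 1


def cds_slice(begin: int, end: int) -> slice:
    """Convert an inclusive 1-based range into a 0-based Python slice."""
    return slice(begin - 1, end)


for name, (begin, end) in CODING_REGIONS.items():
    codons, leftover = divmod(cds_length(begin, end), 3)
    print(f"{name}: {cds_length(begin, end)} bases, {codons} codons, {leftover} left over")
```

All three ranges divide evenly into codons, which is a quick sanity check that the Begin and End values were copied correctly.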
prompt> translate hsef2.ge_pr
TRANSLATE translates nucleotide sequences into peptide sequences.
Begin (* 1 *) ?
End (* 3075 *) ? 2577
Reverse (* No *) ?
Range begins ATGGT and ends TGTAG. Is this correct (* Yes *) ?
That is done, now would you like to:
A) Add another exon from this sequence
B) Add another exon from a new sequence
C) Translate and then add more genes from this sequence
D) Translate and then add more genes from a new sequence
W) Translate assembly and write everything into a file
Please choose one (* W *):
What should I call the output file (* hsef2.pep *) ?