Gribskov:Programming
From Purdue Genomics Database Facility
UNIX Information
- Unix_how to
- pico editor
Projects
- Summer 2008 perl programming
- Histogram Project A warmup project for OO perl and programming tools
- GFF Project GFF related project
Server Transition
All services are moving from sapporo to gweb. Please keep a record here of what you are doing so we don't conflict. To see where a service is currently running, type
nslookup <domain_name> e.g. % nslookup plantsp.genomics.purdue.edu
This will tell you which machine the server is currently running on.
- Note that /cbio/server points at completely different directories on sapporo and gweb. Make sure you are changing the right one.
- gweb is running apache 2, sapporo apache 1.3; this may cause some differences
- on gweb config files are in /opt/csw/apache2/conf/extra
currently running on sapporo 'All need to be moved'
- Bugs after switching production database to gweb version
- PlantsP and PlantsUBQ
- Program died: 'DataBase::connect_secure - Cannot connect (plants3:gweb.genomics.purdue.edu:3306): Access denied for user 'plants3_secure'@'sapporo.genomics.purdue.edu' (using password: YES)'
- PlantsT
- Program died: 'Object->readWholeFile: Cannot read file '/cbio/server/lib/perl/KModules/Data/config.plantst.genomics.purdue.edu': No such file or directory '
- PlantsP and PlantsUBQ
- current directories and configuration
- plantsp,plantsubq
- plantsp.genomics.purdue.edu
- plantsubq.genomics.purdue.edu
- database:plants3
- KModules path:/ver3/perl/
- testing
- plantsp.genomics.purdue.edu:8081
- plantsubq.genomics.purdue.edu:8081
- database:plants3_test
- KModules path:/ver3-test/perl/
- plantst-production
- URL: plantst.genomics.purdue.edu
- database: plants3
- KModules path:/server/lib/perl/
- plantst - testing
- URL: plantst.genomics.purdue.edu:8085
- database:plants3_test
- KModules path:/ver3/perl/
- selaginella-production
- URL:selaginella.genomics.purdue.edu
- database: plants3
- KModules path:/server/lib/perl/
- selaginella-testing
- URL:selaginella.genomics.purdue.edu:8081
- database: plants3_test
- KModules path:/ver3-test/perl
- maase
- ?
- plantsp,plantsubq
- do we need any of these? someone please check
currently running on gweb or (www)
- wiki.genomics.purdue.edu done
- targets.genomics.purdue.edu in progress
- targets-dev in progress
- xselaginella in progress
- xmaase.genomics.purdue.edu (defined but not running)
- xplantsp in progress
- BLAST function broken; soap service working at this moment.
- Scripts still using DataBase module:
- plantsp/cgi-bin: peptides_displaydetail.cgi, peptides_search.cgi, protein_dumper.cgi
- plantsp/cgi-bin/secure: add2favorites.cgi, add2favorites_confirm.cgi, annotate.cgi, annotate_db_upload.cgi, blastfiles_manage.cgi, discuss-getsingle.cgi, discuss.cgi, discuss_confirm.cgi, edit.cgi, exp_annot.cgi, exp_annot_types.cgi, exp_info_retrieve.cgi, lexiedit.cgi, loadblastfile.pl, peptides_bgprocess.pl, peptides_blast.cgi, peptides_blast_bioperl.cgi, peptides_displaydetail.cgi, peptides_main.cgi, peptides_search.cgi, peptides_upload.cgi, sqlquery.cgi, user_update.cgi
- protein feature redraw on detail page
- protein sequence display on detail page
- motif search - delete feature function broken
- style update needed for motif search and feature scan pages
- search by uid function broken for some uids (e.g. 11013, 37645)
- Result of Search: select any item from dropdown list and click "make it so", all have errors
- DNA/RNA page: click on DNA fasta generates error message
- feature motif search page, delete and exclude option works
- add2favorites.cgi modified to work
- chromosome draw
- clustalW
- Change "Get Sequence" on the search result page to "Get DNA Sequence" and "Get Protein Sequence"
current bugs
- all - userpreferences interests doesn't update (Ying)
- xplantsubq - final testing (Ying)
- xplantst
These servers are the place to bring up the new version. When working, we will repoint the production symbols here.
- amigo scripts (Hao/Doug)
- myristoylation prediction
- Myrist Predictor not working (Hao)
- documentation/help links in myrist page are dead, for example (Ying)
- BLAST returns no result (Hao)
- GeneGC (Gribskov)(Hao)
- geneGC with DNA seq input return error: use QCGI:tmpImage which is not in QCGI.pm (Hao)
- gene gc produces no result, for example (Hao)
- GC content Graph not shown: fixed (Ying)
- feature search
- Click on feature legend of protein feature map, return error; if redraw, click on feature legend, return a different error, for example (Hao)
- Comment: For feature search result page, no redraw function. Is it possible to use the same script as general search? These features are stored in files, not in DB. There is no one unique id to identify which feature is selected.
- boxes in protein features drawing are not clickable (Hao)
- details page
- details page has "type:MSTR" just under logo box (Ying)
- Protein.pm: delete $option->{dbh} ? what is this for?
- currently commented out, causes problems with some scripts, possibly details.cgi, see Protein.pm line 113 (Gribskov)
- Add Title (e.g., UID) to the detail page (Ying)
- Detailed page has a weird display on IE (Gribskov)
- family links return list of all genes, modified search.cgi (Ying)
- mutant page temporarily inactivated - no new information has been loaded since 2005 (Ying)
- External resource: not showing, shows but needs roll-over information
- Formatting: Overflow of protein annotation, for example
- blockEST: sub blockEST from ObjectToHtml.pm (at the end of the DNA/RNA detail page) is not working properly, for example
- Motif search
Prosite motif search doesn't work (sample query g-x-g-x-x-g)(Ying)
- lexisearch
unexpanded cvs tags in lexisearch, probably needs to be committed(Hao)
- new papers
- new papers is not running - probably needs cron job started. Doug started the cron job (Ying)
- the paper list uses original NCBI style sheet. Fixed (Ying)
- Add "new paper from collaborators" link to home page (Ying)
- Annotation page
- Style sheet: Fixed (Ying)
- News
- Remove plantsp news -> what to add to fill the page? (Ying)
Configuration notes for new plants resources
- new servers for the time being, have the same names as the old ones with an x prepended. For instance, old:plantsp.genomics.purdue.edu, new:xplantsp.genomics.purdue.edu
- CVS
- After moving code to gweb (by cvs update), check that release tag rel-3-0 has been added to the sapporo version. You must move first, then add tag, or you will get the tagged version on gweb. this keeps new code checked into cvs on gweb from affecting current production versions of sapporo
- some directories have not been updated to point at the new repository. this can cause some CVS errors. check the CVS/Root file in each directory and make sure it reads /cbio/cvs (you can edit this file with any editor)
- (Gribskov 09:52, 16 October 2007 (EDT)) I know how to fix the tags if you make a mistake. Don't panic.
- production versions go in /cbio/server/ver3/...
- Be Aware: /cbio is a different file system on gweb than on sapporo; changes to one do not affect the other
- FASTA databases go in /cbio/data/sequence/fasta
- Be Aware: /cbio/data is a different file system on gweb than on sapporo; changes to one do not affect the other
- the BlastDB table in the database now lets you specify the path
- tmp files go in /cbio/server/tmp - all databases can use the same tmp dir. this should be the only writable directory. It is highly desirable that scripts that write files take care of cleaning up old files with appropriate retention times
- perl libraries are in /cbio/server/ver3/perl hierarchy. trying to move only the ones we use.
- Kmodules - /cbio/server/ver3/perl/Kmodules
- Config files - /cbio/server/ver3/perl/Kmodules/Data
- Other perl libraries, e.g. BlastObj.pm - /cbio/server/ver3/perl/Perl_lib. This is new
- changing the directory for the other perl libraries requires this directory to be added to the perl path in the Apache configuration. If you get package not found errors, and the package is in /cbio/server/ver3/perl/Perl_lib let Doug or me know and we will look at the apache configuration
- Apache: gweb uses apache 2.2.4, sapporo uses 1.3.33. I haven't noticed any difference, but some could crop up
- logs are still in /var/log/httpd/xxx_error_log and xxx_access_log. Be Aware: the gweb logs are separate from the sapporo logs. If debugging, make sure you are looking at the right ones.
- apache config is in /opt/csw/apache/conf/extra. Not all servers have been set up. if you need a new one, contact Doug or me.
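The note above asks scripts that write into the shared tmp directory to clean up old files with an appropriate retention time. A minimal sketch of one way to do that follows; the subroutine name cleanOldFiles and the 7-day retention in the usage comment are assumptions for illustration, not existing site code.

```perl
use strict;
use warnings;

# Remove plain files in $tmp_dir that were last modified more than
# $max_age_days ago. Returns the number of files removed.
# (Hypothetical helper; name and interface are illustrative only.)
sub cleanOldFiles {
    my ( $tmp_dir, $max_age_days ) = @_;
    my $removed = 0;

    opendir( my $dh, $tmp_dir ) or die "Cannot open $tmp_dir: $!";
    foreach my $name ( readdir $dh ) {
        my $path = "$tmp_dir/$name";
        next unless -f $path;                    # skip directories and special files
        next unless -M $path > $max_age_days;    # -M: age in days since last modification
        if ( unlink $path ) {
            $removed++;
        }
    }
    closedir $dh;

    return $removed;
}

# Example call (7 days is an assumed retention time):
# my $count = cleanOldFiles( '/cbio/server/tmp', 7 );
```

Note that under taint mode (-T) the names returned by readdir are tainted and would need to be validated before unlink will accept them.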
plan & progress
Maase (Yatcilla)
- maase - yatcilla
- checked scripts, libraries, and web pages into CVS; tried to omit data - Gribskov 07:11, 18 October 2007 (EDT)
- put release tag on sapporo version - Gribskov 07:12, 18 October 2007 (EDT)
- cleaned out a lot of apparently unused stuff on gweb: temp files, duplicate versions of code - Gribskov 07:11, 18 October 2007 (EDT)
Targets (Gribskov)
- targets.genomics.purdue.edu
- move server to gweb: done Gribskov
- copy database from sapporo to gweb
- targets-dev
Selaginella (Gribskov)
- start with selaginella - Gribskov 11:44, 26 September 2007 (EDT)
- update/commit all cvs changes: done - Gribskov 14:04, 26 September 2007 (EDT)
- checkout cvs on gweb - Gribskov 17:08, 26 September 2007 (EDT), Gribskov 14:04, 26 September 2007 (EDT)
- reconfigure cvs to new directory structure (started Gribskov 17:08, 26 September 2007 (EDT), Gribskov 14:04, 26 September 2007 (EDT))
- this will probably not be a good idea for the plantsx databases, they will need branch
- leave database on sapporo - work on copy
- need to branch cvs for KModules: done Gribskov 18:04, 26 September 2007 (EDT) (started Gribskov 11:44, 26 September 2007 (EDT))
- i thought this would be hard, but it appears easy, we'll see: done, not too hard (Gribskov 17:15, 1 October 2007 (EDT))
- build cvs/selaginella changes directly into cvs: done (Gribskov 17:15, 1 October 2007 (EDT))
- updated KModules: QCGI, Object, ObjectToHtml
- updated cgi:blast_tmpl_s.cgi, signup.cgi
- everything is working except blast and registration, which use KModules; blast needs executable to test
- need blast web service - Hao 17:15, 1 October 2007 (EDT)
- made branch for plantsp/cgi-bin and cgi-bin/secure Gribskov 17:15, 1 October 2007 (EDT)
PlantsP (Gribskov/Li)
transferred to xplantsp and branched
- plantsp/templates
- plantsp/includes
- plantsp/cgi-bin, plantsp/cgi-bin/secure
- plantsp/html
- plantsp/html/secure - i don't think we need this, did not branch Gribskov 11:10, 16 October 2007 (EDT)
tasks
- plantsp (production)
- update/commit all cvs changes - Ying 14:46, 26 September 2007 (EDT)
- KModules: done - Ying 15:32, 26 September 2007 (EDT)
- cgi-bin/secure/.cgi: done - Ying 16:54, 26 September 2007 (EDT), except cvsweb.cgi (unresolved conflict) and annotate_db_upload.cgi; resolved conflicts Gribskov 10:55, 16 October 2007 (EDT)
- cgi-bin/ - Ying started 9:32, 27 September 2007 (EDT)
- no entry in CVS
- feature_new.cgi - usage?
- feature_scan.cgi - There is another copy in /cgi-bin/fscan/ currently being used. Usage of this one?
- go.cgi - There is another copy in /cgi-bin/secure/. One used for log-in users, the other for users not logging-in. Keep one copy in CVS, the other one will be copied after checking out from repository (according to Hao).
- showBait.cgi - usage? not used, deleted Gribskov 11:25, 16 October 2007 (EDT)
- testdraw.cgi - archive? committed Gribskov 11:20, 16 October 2007 (EDT)
- url_validate.cgi - no longer used (according to Hao), moved to archive, Ying 15:47, 28 September 2007 (EDT)
- everything else: done - Ying 15:50, 28 September 2007 (EDT)
- template/ - Ying started 14:22, 28 September 2007 (EDT); done Gribskov 11:15, 16 October 2007 (EDT)
- template.plantsx.top, template.plantsx.bottom, search.tmpl - no longer used, moved to archive - Ying 15:22, 28 September 2007 (EDT)
- gap_edit.tmpl: usage? I'm pretty sure this belongs to a very dangerous script we used to have that edited gene models. I deleted it. Gribskov 10:59, 16 October 2007 (EDT)
- everything else: done - Ying 15:42, 28 September 2007 (EDT)
- include/: done - Ying 16:02, 28 September 2007 (EDT)
- html/: done except the following three types of files - Ying 10:22, 2 October 2007 (EDT)
- automatically generated files (paper.html, *_block etc.)
- files no longer in use (used by archived files), but need double check
- projects/ (changed location of the directory from /plantsp/ to /plantsp/html/): done except .user and .htaccess files - Ying 10:34, 3 October 2007 (EDT)
- microarray/: done - Ying 10:56, 2 October 2007 (EDT)
- myrist/: done - Ying 10:58, 2 October 2007 (EDT)
- docs/ - moved /ver3/plantsp/docs/* and /ver3/plantsp/plantsp/docs/* (overlap) to /ver3/support/doc/: done - Ying 14:50, 5 October 2007 (EDT)
- KModules - added rel-3-0 tag on sapporo production site (done Gribskov 20:06, 26 September 2007 (EDT))
- KModules checkout cvs on gweb (done Gribskov 20:06, 26 September 2007 (EDT))
- checkout cgi-bin on gweb: done last week - Gribskov 08:48, 16 October 2007 (EDT)
- draw_xsome is locally merged
KModules Updates (09:22, 16 October 2007 (EDT))
I made quite a few changes to QCGI and Object.pm, and to the config files. Major points:
- QCGI now inherits correctly from CGI.pm, changes to code:
old
$some_param = $qcgi->{CGI}->param( $option );
print $qcgi->{CGI}->header;
new
$some_param = $qcgi->param( $option );
print $qcgi->header;
- QCGI accessor functions are now autoloaded. Please avoid using direct access into QCGI class variables
old
$dbName = $qcgi->{siteName};
$mask = $qcgi->{'dbMask'};
new
$dbName = $qcgi->siteName;
$mask = $qcgi->dbMask;
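For readers unfamiliar with autoloaded accessors, the sketch below shows one common way Perl's AUTOLOAD mechanism can provide them. It uses a hypothetical SiteConfig class and is not the actual QCGI implementation, which may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical class, NOT the real QCGI: illustrates autoloaded accessors.
package SiteConfig;

our $AUTOLOAD;

sub new {
    my ( $class, %fields ) = @_;
    return bless { %fields }, $class;
}

# Perl calls AUTOLOAD for any method that is not defined explicitly.
sub AUTOLOAD {
    my ( $self, @args ) = @_;
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;               # strip the package prefix, leaving the method name
    return if $name eq 'DESTROY';    # object destruction is not an accessor
    die "No such attribute: $name" unless ref $self && exists $self->{$name};
    $self->{$name} = $args[0] if @args;    # with an argument, act as a setter
    return $self->{$name};
}

package main;

my $conf = SiteConfig->new(
    siteName  => 'PlantsP',
    siteStyle => '/includes/Xstyle.css',
);
$conf->siteStyle( '/includes/plantsp.css' );
print $conf->siteName, " uses ", $conf->siteStyle, "\n";
```

Going through the accessor rather than reaching into the hash means a misspelled attribute name fails loudly instead of silently returning undef.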
- rationalized page headers and footers; it's still not perfect, but everything is now handled in QCGI -- you no longer have to call CGI->header
old
print $qcgi->{CGI}->header;
print $qcgi->pageHeader( "$qcgi->{siteName} Fasta" );
print $qcgi->pageTop();
print "<body>\n";
...
print "</body>\n";
print "</html>\n";
print $qcgi->pageBottom();
new
print $qcgi->pageHeader( "$site Lexicon" ); # see note 1
print "\n\n";
print $qcgi->pageTop( {class=>"page_narrow light1 border2" } ); # see note 2
...
print qq{ </div> <!-- closes pagebox -->\n\n}; # see note 3
print $qcgi->pageBottom();
print $qcgi->pageFooter( {update=>$update} ); # see note 4
1. the parameter sent to pageHeader is the title. This was formerly sent to pageTop. This should probably be changed to a hash reference to allow more options
2. pageTop now takes a hash reference as its parameter; right now only the class option is supported. class specifies a string that will be applied to a div tag after the body tag; the body style is completely set in the stylesheet and does not include a width
3. right now you need to close the div opened in pageTop manually. This seems like it should be in pageBottom to be analogous to pageTop; I'm not sure why I did it like this, maybe a back-compatibility issue
4. added pageFooter to be analogous to pageHeader. The Footer closes the page, adds an update message if specified, and adds HTML and CSS validation links. Please try to make the final versions of pages validate (also helps find errors)
- the update message is created by the code below, usually at the very beginning of the program. Note that you must use single quotes to prevent the perl interpreter from interpolating (or trying and failing) the CVS tags.
my $update = 'Last update by $Author:$ on $Date:$';
- Back-compatibility with stylesheets.
- in old QCGI, the stylesheet was hard-wired to /includes/Xstyle.css
- the stylesheet is now specified in config with, e.g., siteStyle = /includes/plantsp.css
- During port to gweb and updated modules
- leave the old Xstyle.css in place
- make a new style sheet with the name of the site (e.g., plantsp.css). Need more information on how the stylesheet works
- Add code like the following to CGI scripts as you update. This will allow the databases that have not been updated and still point at plantsp to continue using their old styles. When the port is complete we'll have to remove this.
my $site = $qcgi->siteName;
#TODO this code temporarily overrides the style sheet in config
if ( $site =~ /PlantsP/i ) {
    $qcgi->siteStyle( "/includes/plantsp.css" );
}
plantsubq (Li)
plantst (pending)
Web services (Jiang)
need webservices for
- thhmm
- tmpred
- blast
- features??
CVS
We use CVS (Concurrent Versions System) for source code control and group programming.
- Our CVS repository is /cbio/cvs. Include the following in your .cshrc file
setenv CVSROOT /cbio/cvs
- in each directory, check that the file CVS/Repository contains /cbio/cvs not the previous /cbio/server/lib/CVS_Repository
setenv CVS_RSH ssh
- web access: http://www.genomics.purdue.edu/local/cvsweb/cvsweb.cgi
- if using Eclipse, open window/team/CVS/Password Management and add the following definitions. "user" means your UNIX user name on sapporo
- Location=:extssh:user@sapporo.genomics.purdue.edu:/cbio/cvs/CVS_Repository
- Username=user
- CVS home
- HTML reference, PDF reference
- Full CVS manual - wikified version of the complete manual
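Checking the CVS administrative files directory by directory gets tedious in a large checkout. The sketch below walks a working copy and lists every CVS/Root file that does not contain the expected repository path. The subroutine name findStaleRoots is hypothetical, and the sketch assumes a local repository root such as /cbio/cvs; a checkout made over ssh would record a longer root string.

```perl
use strict;
use warnings;
use File::Find;

# Return the paths of all CVS/Root files under $top whose first line
# is not $expected_root. (Hypothetical helper for illustration.)
sub findStaleRoots {
    my ( $top, $expected_root ) = @_;
    my @stale;

    find(
        sub {
            # Regex: the current directory is named CVS, either on its own
            # or as the last component of a longer path.
            return unless $_ eq 'Root' && $File::Find::dir =~ m{(?:^|/)CVS$};

            open( my $fh, '<', $_ ) or return;
            my $root = <$fh>;
            close $fh;
            $root = '' unless defined $root;
            chomp $root;

            push @stale, $File::Find::name if $root ne $expected_root;
        },
        $top
    );

    return @stale;
}

# Example call from the top of a checked-out module:
# print "$_\n" foreach findStaleRoots( '.', '/cbio/cvs' );
```

Any file it reports can then be corrected with an editor, as described in the configuration notes.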
Branching
- apply branch as far up the module as possible (easier) or directory by directory(hard)
- cvs tag -b rel-3-0 (current plantsp production 18:13, 26 September 2007 (EDT))
- cvs update -r rel-3-0
- cvs commit (check here for more info)
Telling CVS to ignore files
CVS normally expects to be managing all the files in a directory (with some standard assumptions -- it will ignore files ending in ~, files starting with #, files ending in .o or .a, files named core, and a variety of other standard exclusions intended to avoid temporary or generated files; see info cvs for a full list). Often that isn't what you want, though. The standard practice is only to put source files under CVS control; any file that can be programmatically and automatically generated from the other files probably shouldn't be under revision control itself. (For programming projects, this generally means any file that can be built automatically from the Makefile.)
To tell CVS to completely ignore a file, put the name of the file in a file named .cvsignore in that directory. CVS will completely ignore any files listed in that file, treating them as if they didn't exist. You can also put wildcards into .cvsignore, using standard wildcard shell syntax; for example, a line of *.elc will tell CVS to ignore all files ending in .elc.
Similarly, CVS will normally expect to be managing all subdirectories of a directory, but if a directory name matches a filename or wildcard expression in .cvsignore CVS will ignore it.
Another use of .cvsignore is to tell CVS it does need to manage files it normally wouldn't. For example, CVS normally ignores files ending in .a because they're generally libraries built as the result of compiling sources, and therefore are generated files. If you have a file ending in .a that you want CVS to track, put:
!
in .cvsignore in that directory. That single exclamation point will clear the default list of ignored files. (You may have to, after the !, list some of the standard patterns like *~ or #* again if you want CVS to ignore them still, since ! clears out all patterns of files to ignore.)
Perl Style guide
A style guide, while some people find it constraining, eases your life as a programmer by making it easier to read your co-workers' code. If you can more easily read and understand code, you will less often be forced to re-create the same code. Style is also important when using source code control systems such as CVS, RCS, or SCCS because it minimizes the number of diffs that represent purely formatting changes.
Please contribute your ideas for discussion.
Documentation
Header elements
Every Perl file should contain the following
- Standard executable and pragmas
- Name of script/package/module
- use CVS $Id:$ if possible; consider using CVS for anything you may reuse over time
- Statement of function
- Major Revisions
- use CVS $Log:$; previously we've had this at the top, but I'm thinking it should be at the bottom because at the top it makes it hard to search for function names
Standard executable and pragmas
#!/cbio/server/software/bin/perl -Tw    # for web-server scripts
#!/usr/bin/perl -w                      # uses system perl; OK for other scripts
use strict;
# taint mode is enabled by the -T switch above, not by a pragma
- Taint should be used for CGI scripts (use #!perl -T)
- Scripts should compile without warnings under perl -w (use #!perl -w)
- Strict should always be used
- use #!/cbio/server/software/bin/perl for server scripts. This allows us to point to something other than the most recent version of perl if there are back-compatibility issues after a perl update
Name of script/package/module
Even though it may be obvious, this can be a help when searching for particular programs. Include the name as a plain text comment or include the cvs keyword $Id:$
#!/cbio/server/software/bin/perl -Tw
#------------------------------------------------------------------------------
# $Id:$          # preferred, or
# myscript.cgi:
#
# description of what this code does
#------------------------------------------------------------------------------
use strict;

<Your code here>
...
<at end of file>
#------------------------------------------------------------------------------
# $Log:$         # preferred, or
# list of revisions
#------------------------------------------------------------------------------
Statement of function
Include a concise statement of what the program does. Include any necessary input files and if appropriate what programs create them. Explain any command line switches or options. For scripts that are more complicated, such as CGI scripts that call themselves, provide a description of the important control parameters.
Usage
A usage statement should be provided in every function or subroutine. In addition, the header of the file should summarize all of the functions/subroutines/methods in the file. For stand-alone programs, the command line argument -h should return a usage statement. Use mnemonic file names and you shouldn’t need to explain the usage much, but do not hesitate to provide explanations.
# USAGE
# findseq.pl < sequence.fasta >list_of_sequences.text
Programs should reserve the -h switch for a usage message (which should also be displayed on input errors). Use the variable $USAGE for a usage string
use Getopt::Std;
our $opt_h;        # set by getopts when -h is given
getopts( 'h' );
my $USAGE = "findseq.pl < sequence.fasta >list_of_sequences.text\n";
if ( $opt_h ) {
    print $USAGE;
    exit 0;
}
Major Revisions
CVS will keep a revision log for your code. Each time you commit, the comments you write in CVS will be added if you include the $Log:$ CVS tag (be careful not to write a CVS tag in the comment; this gets crazy fast).
# $Log:$
For Perl packages, it is sometimes useful to provide the system variable $VERSION so that programs can check for the version of a package when they include it. This can also be generated from CVS as follows
my $REVISION = '$Revision: 1.33 $';
$REVISION =~ s/\$//g;
my ( $VERSION ) = $REVISION =~ /Revision: ([\d.]*) /;
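For the version to be visible to programs that include the package, $VERSION has to be a package variable (declared with our) rather than a lexical. The following is a minimal sketch under that assumption, using ProcessText as a made-up package name and Perl's built-in VERSION method to perform the check.

```perl
use strict;
use warnings;

# Hypothetical package, for illustration only.
package ProcessText;

# Single quotes keep Perl from interpolating the CVS keyword.
my $REVISION = '$Revision: 1.33 $';

# Regex: capture the run of digits and dots that follows "Revision:".
our ( $VERSION ) = $REVISION =~ /Revision:\s*([\d.]+)/;

package main;

# UNIVERSAL::VERSION dies if the package version is lower than requested.
my $ok = eval { ProcessText->VERSION( 1.20 ); 1 };
print "ProcessText version $ProcessText::VERSION ",
    ( $ok ? "is" : "is not" ), " recent enough\n";
```

One caveat: Perl compares versions as decimal numbers, so CVS revision 1.9 would be treated as newer than 1.10. Checks like this are only reliable while the minor revision keeps the same number of digits.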
HTML pages
It is important that we can trace every HTML file, and in many cases portions of the file to their source. This can greatly speed up debugging when problems occur.
Use comments to mark the beginning and ending of the HTML block, and give the location of the source file.
<!-- template/bogus.html start -->
<!-- $Id:$ -->
…
<!-- template/bogus.html end -->
CVS keywords: see [1]
Some useful cvs keywords
- $Id:$ - the name of the file without path - use this at the beginning of html pages
- $Revision:$ - CVS revision
- $Author:$ - Last editing author
- use a string like $update = 'Last update $Date:$ by $Author:$'; to get a printable string with update information. You must use single quotes to prevent the perl interpreter interpolating the CVS tags (which look like perl variables)
- $Date:$ - Last edited date
- $Log:$ - revision comments - usually this should not be in HTML files, but should always be in CGI scripts.
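Once a file has been committed, CVS expands each keyword in place, so the update string contains the keyword wrappers as well as the values. The sketch below shows one possible way to strip the wrappers before printing; the expanded date and author are made-up values, and the substitution is a suggestion rather than something QCGI is known to do.

```perl
use strict;
use warnings;

# Made-up example of what the string looks like after CVS expands the keywords.
# Single quotes stop Perl from treating $Date and $Author as variables.
my $update = 'Last update $Date: 2007/10/16 13:22:05 $ by $Author: gribskov $';

# Regex: a '$', the keyword name and a colon, then capture the value
# (minimal match) up to the closing '$', dropping surrounding whitespace.
( my $printable = $update ) =~ s/\$\w+:\s*(.*?)\s*\$/$1/g;

print "$printable\n";
```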
Variable and Functions
- Always use lexical variables (defined by my), and explicitly pass all variables to subroutines.
- use underlines in variable names to separate words
- use upper/lower case in function names to separate words, begin with lowercase
- Packages begin with a capital letter
package ProcessText;      # package name begins with capital
my $line_len = 0;         # a variable: lexical, underscores between words
sub getLineLen { ... }    # functions use upper/lower case, beginning with lowercase
my $DEFAULT_LEN = 0;      # use all uppercase for constant parameters
Style
- Use cuddled braces
- include space inside braces/parentheses in logical tests
- include space between braces/parentheses and preceding/following text
foreach my $residue ( @peptide ) {
    # some code here;
}
# is preferred over
foreach my $residue ( @peptide )
{
    # some code here;
}
- indent 4 spaces, use spaces not tabs (your editor will handle this for you)
foreach my $sequence ( @list ) {
    my $sequence_clean = removeJunk( $sequence );
}
- include a comment explaining every regular expression
- use spaces to line up key/value pairs in hashes
- use => instead of , in hashes
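The last three points can be illustrated together. The sketch below uses a made-up lookup table and a made-up FASTA header line; only the layout conventions come from the guide.

```perl
use strict;
use warnings;

# Key/value pairs lined up with spaces, using => rather than commas.
my %one_letter = (
    alanine    => 'A',
    glycine    => 'G',
    tryptophan => 'W',
);

# Hypothetical FASTA header line, for illustration only.
my $header = '>seq1 example protein';

# Regex: '>' at the start of the line, then capture the sequence ID
# (everything up to the first whitespace character).
my ( $seq_id ) = $header =~ /^>(\S+)/;

foreach my $name ( sort keys %one_letter ) {
    print "$name => $one_letter{$name}\n";
}
print "id: $seq_id\n";
```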
Of Interest
Ajax for bioinformatics
