Nov 19, 2008

Notes on SAS coding style

Guidelines for Coding of SAS® Programs
--by Thomas J. Winn Jr., Texas State Auditor’s Office, Austin, TX

The SAS language resembles a scripting language, but with more flexibility and fewer restrictions. That is sometimes useful, but as a programmer I am often confused by programs written in SAS, even when the underlying problem is not complex.

Luckily, I found a paper by Thomas J. Winn Jr. that offers ideas which are both practical and elegant, though most of them are simply the basic rules of ordinary programming. And law is freedom.

The following material is copied from the paper.

“This paper presents a set of guidelines that could be used for writing SAS code that is clear, efficient, and easy to maintain.”
Just because you can write SAS code in a particular way does not mean that you should do so.

NAMING
• In naming, avoid cuteness, single-letter names, and names that too closely resemble one another.
• Names should be unique, short, and descriptive – in that order of importance.
• If longer names are needed, underscores may be used to separate words, in order to enhance readability.
• If a user-defined format applies to only one variable, then name the format with a readily-recognizable form of the variable-name plus the suffix FMT.
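
As a minimal sketch of the format-naming guideline, the example below uses the SASHELP.CLASS sample data set that ships with SAS. The format name AGEFMT and the cut-off at 13 are illustrative assumptions, not taken from the paper.

```sas
/* The format applies only to AGE, so it is named AGEFMT (variable name + FMT). */
proc format;
   value agefmt
      low -< 13  = 'Under 13'
      13 - high  = '13 and over';
run;

/* Use the format when tabulating the one variable it was built for. */
proc freq data=sashelp.class;
   tables age;
   format age agefmt.;
run;
```

Because the format name echoes the variable name, a reader who sees `agefmt.` in a FORMAT statement can guess which variable it belongs to without searching for the PROC FORMAT step.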

READABILITY & APPEARANCE
♦ Insert a blank line between SAS program steps; that is, before each DATA or PROC step.
♦ Be consistent with your indentation increments.
♦ Indent all statements in a logical grouping by the same amount.
♦ Left-justify all OPTIONS, DATA, PROC, and RUN statements. Indent all of the statements within a DATA or PROC step.
♦ Indent conditional blocks and DO groups, and do it consistently. The logic will be easier to follow.
♦ Align each END statement with its corresponding DO statement. This will make it easier to verify that they match.
♦ Remember to preface major blocks of code with explanatory comments.
♦ Consider inserting PAGE statements to force the SAS Log to begin tracing the execution of new modules on a new page.
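
The layout rules above might look like this in practice. This is only a sketch: the task (flagging tall students in SASHELP.CLASS), the threshold of 65, and the output data set name are assumptions made for illustration.

```sas
/* Module 1: flag tall students (hypothetical task and threshold). */
data work.class_flagged;
   length size_group $ 5;       /* long enough for both 'Tall' and 'Other' */
   set sashelp.class;
   if height > 65 then do;
      tall_flag  = 1;
      size_group = 'Tall';
   end;                         /* END lines up with its IF ... DO */
   else do;
      tall_flag  = 0;
      size_group = 'Other';
   end;
run;

/* Start the log for the next module on a new page. */
page;

/* Module 2: report the flagged observations. */
proc print data=work.class_flagged;
   var name height tall_flag size_group;
run;
```

DATA, PROC, and RUN start in the first column, everything inside a step is indented by one increment, and each DO group adds one more.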

REUSABILITY
Since most of the operations of the SAS macro facility are carried out in the background, debugging macros can sometimes be fairly mysterious.
• Write code that can be re-used, with different parameters. Keyword parameters are preferable to positional parameters, because they are less likely to be specified incorrectly.
• Write the code you use repeatedly as a macro, and then, instead of repeating your code, invoke the macro.
• Avoid using global macro variables.
• If a macro is used by more than one program, put it into an AUTOCALL macro library.
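
A small sketch of a reusable macro with keyword parameters follows. The macro name PRINT_SUBSET and its parameters are hypothetical and chosen only to illustrate the guidelines; SASHELP.CLASS is the sample data set shipped with SAS.

```sas
/* Hypothetical reusable macro. Keyword parameters carry defaults,       */
/* so a caller names only the values that differ from the defaults.      */
%macro print_subset(data=, where=1, obs=10);
   %local title_text;           /* local, so nothing leaks into global scope */
   %let title_text = Up to &obs rows of &data;

   title "&title_text";
   proc print data=&data (obs=&obs);
      where &where;
   run;
   title;                       /* clear the title afterwards */
%mend print_subset;

/* Invoke the macro instead of repeating the PROC PRINT code. */
%print_subset(data=sashelp.class, where=age > 12)
%print_subset(data=sashelp.class, obs=5)
```

With keyword parameters the two calls can supply arguments in any order and skip the ones they do not need, which is harder to get wrong than counting positions.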

EFFICIENCY
♦ Avoid jumping to statement labels with GO TO statements, or with LINK and RETURN statements.
♦ If possible, replace logic which jumps between subroutines with DO ... END and IF ... THEN ... ELSE ... logic.
♦ End every DATA and PROC (except PROC SQL) step with a RUN statement.
♦ End every PROC SQL step with a QUIT statement.
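
The following sketch shows one way to express branching with IF ... THEN ... ELSE and DO ... END rather than statement labels, and how each step is closed explicitly. The variable STATUS and the age cut-off are assumptions for illustration.

```sas
data work.class_checked;
   length status $ 5;
   set sashelp.class;
   /* Structured alternative to "if age < 13 then go to young;" */
   if age < 13 then do;
      status = 'Young';
   end;
   else do;
      status = 'Teen';
   end;
run;                            /* DATA step ends with RUN */

proc sql;
   select status, count(*) as n
      from work.class_checked
      group by status;
quit;                           /* PROC SQL ends with QUIT */
```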

Place most of the non-executable statements in a DATA step before all of the executable statements.
In particular, place variable attribute and other declarative statements near to the top of the DATA step, and ahead of the executable statements.

• In a DATA step, place most of the non-executable statements before the executable statements – exceptions include the DROP or KEEP statements, which may be placed after the executable statements.
• Define INPUT and PUT variables one per line, using @ pointer control.
• Screen data for unusual circumstances.
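
A possible shape for such a DATA step is sketched below. The data set, variable names, column positions, and the plausibility limits for weight are all made up for the example.

```sas
data work.visits;
   /* Declarative (non-executable) statements first. */
   length patient_id $ 4 flag $ 5;
   format visit_date date9.;
   infile datalines truncover;

   /* One variable per line, each located with an @ column pointer. */
   input
      @1  patient_id  $4.
      @6  visit_date  yymmdd10.
      @17 weight_kg   5.
   ;

   /* Screen for unusual values (limits are assumptions). */
   if missing(weight_kg) or weight_kg <= 0 or weight_kg > 300
      then flag = 'CHECK';
   else flag = 'OK';
   datalines;
P001 2008-11-03  72.5
P002 2008-11-04 -10.0
P003 2008-11-05
;
run;
```

In this sample the second record has a negative weight and the third has none, so both would be flagged for review instead of flowing silently into later steps.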

Reduce the number of times the data are read:

♦ Minimize the number of passes through the data,
♦ Minimize the number of DATA steps,
♦ Read and store only the data that are needed,
♦ Sort the data only when it is absolutely necessary.
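
As a rough illustration of these four points, the sketch below does its subsetting, variable selection, and derivation in a single DATA step, and then summarizes by group without sorting. The output names and the unit conversion are assumptions.

```sas
/* One pass: subset rows, read only the needed variables, derive a value. */
data work.teens (keep=sex height_cm);
   set sashelp.class (keep=sex age height where=(age >= 13));
   height_cm = height * 2.54;
run;

/* CLASS groups the data without a prior PROC SORT;  */
/* a BY statement would require sorted input.        */
proc means data=work.teens n mean;
   class sex;
   var height_cm;
run;
```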

• When you read in an external file, use pointer controls, informats, or column specifications in the INPUT statement, to read only those fields you actually need.
• Store only the variables you need by using DROP or KEEP statements, or the DROP= or KEEP= data set options (eliminate variables from the output data set which are needed only during DATA step execution, and not afterward).
• When only one condition can be true for a given observation, use IF ... THEN ...ELSE ... statements (or a SELECT group), instead of a series of IF ... THEN ... statements without ELSE statements (In a sequence of IF-THEN statements without the ELSE, the SAS System will check each condition for every observation).
• When using a series of IF ... THEN ... ELSE ... statements, list the conditions in descending order of probability. This will save CPU time.
• Use the LENGTH statement to reduce the storage space for variables in SAS data sets.
• Minimize workspace usage by using the DELETE statement in a PROC DATASETS step, to eliminate temporary data sets that are no longer needed by the program.
• Use the IN operator instead of a series of multiple logical OR operators.
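
Several of the points above can be combined in one short sketch. The data set names, the age groupings, the ratio threshold, and the assumption about which condition is most likely are all illustrative.

```sas
/* Temporary data set, needed only as an intermediate step. */
data work.class_tmp;
   set sashelp.class (keep=name age height weight);
run;

data work.class_grouped (drop=ratio);
   length age_group $ 7 heavy 3;      /* 3 bytes are enough for a 0/1 flag */
   set work.class_tmp;
   ratio = weight / height;           /* used only during the step, then dropped */
   heavy = (ratio > 1.6);

   /* Mutually exclusive conditions, the assumed most likely one first. */
   /* IN replaces "age = 13 or age = 14 or age = 15".                   */
   if age in (13, 14, 15) then age_group = 'Middle';
   else if age < 13 then age_group = 'Younger';
   else age_group = 'Older';
run;

/* Free workspace once the temporary data set is no longer needed. */
proc datasets library=work nolist;
   delete class_tmp;
run;
quit;
```

Once the first true condition is found, the ELSE branches are skipped for that observation, which is where the ordering by probability pays off.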
