2007-07-16

Accessing Windows PowerShell from SAS

Windows PowerShell, previously Microsoft Shell or MSH (codenamed Monad) is an extensible command line interface (CLI) shell and scripting language product developed by Microsoft.—Wikipedia

PowerShell provides ‘easy’ access to Windows via a command line, and can be used for many things. I wanted to find the total disk space and amount free which we can do like this:
get-wmiobject -class win32_logicaldisk
Outputs:


DeviceID : A:
DriveType : 2
ProviderName :
FreeSpace :
Size :
VolumeName :

DeviceID : C:
DeviceID : C:
DriveType : 3
ProviderName :
FreeSpace : 221132865536
Size : 293552017408
VolumeName :
PowerShell uses STDIN and STDOUT, which is how we can pipe it into SAS. Note adding the filter to restrict to local, non-removable, drives:
filename diskinfo pipe "powershell get-wmiobject -class win32_logicaldisk -filter 'DriveType=3'";

data _null_;
infile diskinfo;
input;
put _infile_;
run;
Which yields (note the 2 leading blank lines):


DeviceID : C:
DriveType : 3
ProviderName :
FreeSpace : 221132881920
Size : 293552017408
VolumeName :
PowerShell can reformat output, but I coded SAS to read it as-is:
data _null_;
infile diskinfo firstobs=3 truncover;
input
@':' +1 DeviceID $1.
/
/
/ @':' +1 FreeSpace 32.
/ @':' +1 Size 32.
/ @':' +1 VolumeName $32.
/;

put _all_;
run;
Which gives us variables holding the total size of the C drive (in bytes) and free space:
DeviceID=C FreeSpace=221132861440 Size=293552017408 VolumeName=  _ERROR_=0 _N_=1

2007-07-13

Being a SAS Data Warehouse Administrator

Somebody asked me what made a SAS data warehouse programmer different from a normal SAS programmer—was it the amount of SAS knowledge I had?

The short answer was no, it’s all about experience. Here’s a rough longer answer based on my data warehousing experience at 4 different organisations:
  • Strong base SAS and macro is essential, this is what the majority of data warehouse jobs use.

  • Also important is a strong knowledge of how SAS works internally. What gets written to the log in the following sample, and why?
    data _null_;
    log 'hi';
    set work.test(obs=1);
    run;
  • A general knowledge of SAS and the ability to learn more as needed: e.g. I also use ODS and Graph on a regular basis.

  • General knowledge: operating system knowledge (mainframe/Unix/Windows), SQL, reading from different formats (CSV, XML…), writing to various formats (Excel, HTML, XML…), version control, backup…

  • Awareness of machine resources and scheduling

  • Understand the big picture: can you justify creating an index (space and time vs. potential usage & savings)? Should you create another data set for new data or combine with an existing one? When should you create a macro for a common routine vs. a pre-summarised data set?

  • Consistency: users need to be able to transfer their knowledge from one part of the warehouse to another. Don’t make it hard for them.

  • Work within the existing warehouse setup: if the naming standards are CamelCase (my preference), don’t create Underscore_Names.

  • It’s about doing it right

  • Documentation. The ‘d’ word! I try to create clear and commented code, easy-to-use data sets, and document what I’ve done. My goal is to make myself redundant (which hasn’t worked!)

  • Communication: essential within your team, to users, and management. Sometimes users and/or managers will have an exact requirement but it should be implemented in a different way to fit the overall picture—you need to spot these and be able to explain why.

  • After hours work: users don’t like losing the warehouse to maintenance during working hours!

2007-07-07

Bridge to Nowhere^h^h^h Terabithia

You have not stumbled into Narnia—don’t believe the trailer.

This is a coming of age movie, and as that succeeds well. However it is NOT a fantasy movie as portrayed in the trailer: the fantasy elements only make up about 10 minutes of movie time.

Take your pre-teens, not your younger kids looking for a good fantasy.