*TWO WAYS OF READING RAW DATA:
INTERNAL RAW DATA / EXTERNAL RAW DATA;
*1. INTERNAL RAW DATA
Example as follows;
DATA USPRESIDENTS;
INPUT PRESIDENT $ PARTY $ NUMBER;*a '$' after a variable name indicates that the variable is character type;
*DATALINES must be the last statement before the data lines, which are structured as one observation per line;
DATALINES;
Adams F 2
Lincoln R 16
Grant R 18
Kennedy D 35
;
run;*the semicolon that ends the data lines must be on a line by itself;
*2. EXTERNAL RAW DATA
Examples as follows;
* (1) INFILE YOUR DATA;
DATA USPRESIDENTS;
INFILE 'C:\MYRAWDATA\PRESIDENT.dat' LRECL = 2000;*INFILE gives the path of the raw data file. SAS assumes a default maximum record length for external files (256 in older Windows releases, larger in recent ones). Use LRECL= to set the record length when your records are longer than the default;
INPUT PRESIDENT $ PARTY $ NUMBER;
RUN;
* (2) LIST INPUT(A PREFERRED NAME 'READ ROW');
DATA TEST1;
INPUT NAME $ SCORE1 SCORE2 SCORE3 SCORE4;
DATALINES;
Lucky 2.3 1.9 . 3.0
Spot 4.6 2.5 3.1 .5
Tue 7.1 . 3.8
1.5
;
RUN;*By default, if the INPUT statement has more variables than there are values on the data line, SAS goes to the next data line to read the remaining values;
* You can use PROC PRINT to check the result;
PROC PRINT DATA = TEST1;
TITLE "SAS DATA SET 'TEST1'";
RUN;
* (3) READ COLUMN;
DATA TEST2;
INPUT NAME $ 1-7 SCORE1 8-11 SCORE2 12-15 SCORE3 16-19 SCORE4 20-23;*column input needs the values aligned in fixed columns;
DATALINES;
Lucky B 2.3 1.9 2.3 3.0
Spot D  4.6 2.5 3.1 0.5
Tue K   7.1 1.1 3.8 1.5
;
RUN;
* (4) READING RAW DATA NOT IN STANDARD FORMAT;
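* An example of nonstandard raw data (dates, and numbers that contain commas):
01/01/60 1,002
01/03/60 2,012
02/01/60 4,336
Such values are read with informats. The general pattern of formatted input, a variable name followed by an informat, looks like this;
INPUT NAME $10. AGE 3. HEIGHT 5.1 BIRTHDAY MMDDYY10.;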
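/* A minimal sketch that reads the three sample lines of nonstandard data
   with informats. The data set name SALES and the variable names SALEDATE
   and AMOUNT are hypothetical, chosen only for illustration. */
DATA SALES;
INPUT SALEDATE MMDDYY8. +1 AMOUNT COMMA5.; /* 8-column date, skip one column, 5-column number with a comma */
FORMAT SALEDATE MMDDYY10.;                 /* display the stored date value as a readable date */
DATALINES;
01/01/60 1,002
01/03/60 2,012
02/01/60 4,336
;
RUN;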
* Another example. Suppose that you have raw data laid out in fixed columns like this:
ALLICA GROSSMAN  13 C 10-28-2012 7.8 2.3 5.6 4.6
GRANDA SMITH     18 D 10-30-2013 5.6 1.2 5.6 7.7
...
You can then read the raw data with;
DATA TEST3;
INFILE 'A PATH';
INPUT NAME $16. AGE 3. +1 TYPE $1. +1 DATE MMDDYY10. (SCORE1 SCORE2 SCORE3 SCORE4) (4.1);
RUN;
* +1 skips one column. The informat that follows a parenthesized variable list must itself be in parentheses;
* (IMPORTANT!) SUMMARY OF SELECTED INFORMATS;
* This list is incomplete. It covers the informats I use most in my daily work;
/*
1) CHARACTER
   $CHAR4.     reads character data and keeps leading and trailing blanks
   $UPCASE4.   converts character data to uppercase
   $4.         reads character data and trims leading blanks
2) DATE, TIME AND DATETIME
   DATE9.      05DEC2018      DATE7.    05DEC18
   MMDDYY10.   12/05/2018 or 12052018
   TIME11.     00:10:12.58    TIME5.    00:10
3) NUMERIC
   COMMA5.1    removes commas and $, and converts a leading '(' into a minus sign '-'
   COMMAX5.    like COMMA5. but with the roles of the comma and the period reversed
   PERCENT4.   converts percentages to numbers
   5.1         reads standard numeric data
*/
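/* A minimal sketch that tries a few of the informats listed above through
   the INPUT function, so no raw data file is needed. The variable names
   D, N and P are hypothetical. */
DATA _NULL_;
D = INPUT('05DEC2018', DATE9.);  /* stored as a SAS date value */
N = INPUT('(1,234)', COMMA7.);   /* parentheses are read as a minus sign, so N is -1234 */
P = INPUT('25%', PERCENT4.);     /* the percentage is converted to a proportion, so P is 0.25 */
PUT D= DATE9. N= P=;             /* writes the three values to the SAS log */
RUN;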
* (5) MIXED INPUT STYLE;
* An example is shown below, with the park name in columns 1-22 and the acreage starting in column 40:
Yellowstone           ID/MT/WY 1872    4,605,493
EVERGLADES            FL 1934          1,398,800
Yosemite              CA 1864            760,917
Great Smoky Mountains NC/TN 1926         520,269
Wolf Trap Farm        VA 1966                130
...
Then you can write the INPUT statement as;
INPUT parkname $ 1-22 State $ Year @40 Acreage COMMA9.;
*parkname is read with column input, State and Year with list input (row read), and Acreage with formatted input;
*@40 moves the pointer to column 40;
*other uses of @n: 1. skipping over unneeded variables, 2. reading a variable twice with different informats;
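/* A minimal runnable sketch of the mixed input style described above. The
   data set name PARKS is hypothetical, and the data lines are padded so that
   the park name fills columns 1-22 and the acreage starts in column 40. */
DATA PARKS;
INPUT parkname $ 1-22 State $ Year @40 Acreage COMMA9.;
DATALINES;
Yellowstone           ID/MT/WY 1872    4,605,493
EVERGLADES            FL 1934          1,398,800
Yosemite              CA 1864            760,917
Great Smoky Mountains NC/TN 1926         520,269
Wolf Trap Farm        VA 1966                130
;
RUN;
PROC PRINT DATA = PARKS; /* check that Acreage was read without the commas */
RUN;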
* (6) READING MESSY RAW DATA;
* Suppose that you have messy data like this:
My dog Sam Breed: Rottweiler Vet Bills: $478
You can put @'characters' in front of a variable name in the INPUT statement to position the pointer;
INPUT @'Breed:' DOGBREED :$20.;
* @'Breed:' moves the pointer to just after the text Breed: before DOGBREED is read;
* $20. lets DOGBREED hold up to 20 characters (the default length is 8);
* the colon in :$20. makes SAS stop reading at the first blank or at the end of the data line;
* Results of the three variants;
INPUT @'Breed:' DOGBREED $;     /* result is Rottweil */
INPUT @'Breed:' DOGBREED $20.;  /* result is Rottweiler Vet Bill */
INPUT @'Breed:' DOGBREED :$20.; /* result is Rottweiler */
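/* A minimal runnable sketch of the @'characters' pointer with the colon
   modifier. The data set name DOGS and the second variable VETBILLS are
   hypothetical additions for illustration. */
DATA DOGS;
INPUT @'Breed:' DOGBREED :$20.    /* jump past the text Breed: and read up to the next blank */
      @'Bills:' VETBILLS :COMMA8.; /* jump past the text Bills: and read the amount without the $ */
DATALINES;
My dog Sam Breed: Rottweiler Vet Bills: $478
;
RUN;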
* (7) READING MULTIPLE LINES OF RAW DATA PER OBSERVATION;
* Suppose that you have raw data like this, with three lines per city:
Nome AK
55 33
78 23
Miami FL
34 23
32 55
...
You can read this raw data with;
DATA TEST;
INFILE 'example1.dat';
INPUT city $ state $
/ Normalhigh Normallow /*a '/' tells SAS to go to the next line of raw data*/
#3 recordhigh recordlow /*'#3' also moves the pointer, here to the third line of the observation*/
;
RUN;
* (8) READING MULTIPLE OBSERVATIONS PER LINE OF RAW DATA;
* Suppose that you have raw data like this:
Nome AK 55 33 Miami FL 78
12 Raleigh NC . 35
...
You can read this raw data with;
DATA TEST;
INFILE 'example1.dat';
INPUT CITY $ STATE $ NORMALRAIN MEANDAYRAIN @@; /*----The double trailing '@@' holds the current line across iterations of the DATA step, so SAS keeps reading observations from it until it runs out of data or reaches an INPUT statement that does not end with '@@'----*/
RUN;
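/* A minimal runnable sketch of the double trailing @@ using in-stream data
   instead of an external file. The data set name RAINFALL is hypothetical.
   The third value for Miami wraps onto the second data line, and the step
   should still produce one observation per city. */
DATA RAINFALL;
INPUT CITY $ STATE $ NORMALRAIN MEANDAYRAIN @@;
DATALINES;
Nome AK 55 33 Miami FL 78
12 Raleigh NC . 35
;
RUN;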
* (9) READING PART OF THE RAW DATA;
* Suppose that you have raw data like this:
MALE 78 JACK
FEMALE 34 JULY
MALE 34 JASON
FEMALE 12 MARIA
...
and that you only want to read the records for males;
DATA INFO_MALE;
INFILE 'A PATH';
INPUT SEX $ @;/*--the trailing '@' holds the current line so that the IF statement below can run before the rest of the line is read--*/
IF SEX = 'FEMALE' THEN DELETE;
INPUT AGE NAME $;
RUN;
* the hold ends either at the end of the current DATA step iteration, or at a subsequent INPUT statement that does not end with '@';
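/* A minimal runnable sketch of the trailing @ using in-stream data. The data
   set name MALES_ONLY is hypothetical. Only the MALE records should reach
   the second INPUT statement and be kept. */
DATA MALES_ONLY;
INPUT SEX $ @;                  /* read SEX and hold the line */
IF SEX = 'FEMALE' THEN DELETE;  /* drop the record before reading the rest */
INPUT AGE NAME $;               /* read the remaining values from the held line */
DATALINES;
MALE 78 JACK
FEMALE 34 JULY
MALE 34 JASON
FEMALE 12 MARIA
;
RUN;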
* TO BE CONTINUED;