SAS learning_1: Reading data (mainly internal raw data)

* TWO WAYS OF READING RAW DATA:
INTERNAL RAW DATA / EXTERNAL RAW DATA;

* 1. INTERNAL RAW DATA
An example follows;

DATA USPRESIDENTS;
	INPUT PRESIDENT $ PARTY $ NUMBER; * a '$' after a variable name marks it as a character variable;
	* DATALINES must be the last statement of the step, and the data start on the next line;
	DATALINES;
Adams F 2
Lincoln R 16
Grant R 18
Kennedy D 35
;
RUN;
* The lone semicolon that ends the data must sit on a line of its own, with no other text;

* 2. EXTERNAL RAW DATA
Examples follow;


* (1) INFILE YOUR DATA;

DATA USPRESIDENTS;
	INFILE 'C:\MYRAWDATA\PRESIDENT.dat' LRECL = 2000; * Give the path of the raw data file. In some operating environments SAS assumes that external files have a record length of 256 or less, so use LRECL= to specify a longer record length when the data lines are longer;
	INPUT PRESIDENT $ PARTY $ NUMBER;
RUN;

* (2) LIST INPUT (values separated by at least one space, called 'row read' in these notes);

DATA TEST1;
	INPUT NAME $ SCORE1 SCORE2 SCORE3 SCORE4;
	DATALINES;
Lucky 2.3 1.9 . 3.0
Spot 4.6 2.5 3.1 .5
Tue 7.1 . 3.8
1.5
;
RUN; * By default, SAS goes to the next data line to read more values if the INPUT statement has more variables than the data line has values. Here the last value for Tue is taken from the line that holds only 1.5;

* You can use PROC PRINT to check the result;
PROC PRINT DATA = TEST1;
    TITLE "SAS DATA SET 'TEST1'";
RUN;

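* (3) COLUMN INPUT;

DATA TEST2;
	INPUT NAME $ 1-7 SCORE1 9-11 SCORE2 13-15 SCORE3 17-19 SCORE4 21-23; * each variable is read from a fixed range of columns, so the values must line up in those columns;
	DATALINES;
Lucky B 2.3 1.9 2.3 3.0
Spot D  4.6 2.5 3.1 0.5
Tue K   7.1 1.1 3.8 1.5
;
RUN;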

* (4) READING RAW DATA NOT IN STANDARD FORMAT;
* an example of nonstandard raw data (dates, and numbers that contain commas);

01/01/60 1,002
01/03/60 2,012
02/01/60 4,336

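* Nonstandard data are read with formatted input, where an informat follows each variable name. The statement below only illustrates the general syntax and is not written for the three lines above;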

INPUT NAME $10. AGE 3. HEIGHT 5.1 BIRTHDAY MMDDYY10.;
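
* A minimal sketch, not from the original notes, of how the three nonstandard lines above could be read. The data set name SALES and the variable names SALEDATE and AMOUNT are made up for illustration;

DATA SALES;
	INPUT SALEDATE MMDDYY8. +1 AMOUNT COMMA5.; * MMDDYY8. reads the date in columns 1-8, +1 skips the blank, COMMA5. strips the comma from the number;
	FORMAT SALEDATE DATE9.; * print the stored SAS date value in a readable form;
	DATALINES;
01/01/60 1,002
01/03/60 2,012
02/01/60 4,336
;
RUN;

PROC PRINT DATA = SALES;
RUN;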

* Another example. Suppose you have raw data like this;

ALLICA GROSSMAN  13 C 10-28-2012 7.8 2.3 5.6 4.6
GRANDA SMITH     18 D 10-30-2013 5.6 1.2 5.6 7.7
...

* You can then read the raw data with;

DATA TEST3;
	INFILE 'A PATH';
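	INPUT NAME $16. AGE 3. +1 TYPE $1. +1 DATE MMDDYY10. (SCORE1 SCORE2 SCORE3 SCORE4) (4.1); * the informat for a group of variables also goes in parentheses. The values must sit in fixed columns for this statement to work;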
RUN;
* +1 represents skipping one column;

* (IMPORTANT!) SUMMARY OF SELECTED INFORMATS;


* 1) character;
$CHAR4.      /*reads character data without trimming leading or trailing blanks*/
$UPCASE4. /*converts character data to uppercase*/
$4.                /*reads character data and trims leading blanks*/

* 2) DATE TIME AND DATETIME;
DATE9. 05DEC2018 | DATE7. 05DEC18;
MMDDYY10. 12/05/2018 | MMDDYY8. 12/05/18;
TIME8. 00:10:12 | TIME5. 00:10;

* This list is incomplete. It only covers the informats I use most in my daily work;

* 3) NUMERIC;
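COMMA5.1   /*REMOVES COMMAS AND $, AND CONVERTS A LEFT PARENTHESIS '(' INTO A MINUS SIGN '-'*/
COMMAX5.   /*LIKE COMMAw.d BUT WITH THE ROLES OF COMMA AND PERIOD REVERSED: REMOVES PERIODS AND TREATS THE COMMA AS THE DECIMAL POINT*/
PERCENT4.  /*CONVERTS PERCENTS TO NUMBERS*/
5.1                 /*READS STANDARD NUMERIC DATA*/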
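
* A quick way to try an informat without a raw data file is the INPUT function in a DATA _NULL_ step. This is a sketch added for illustration, and the variable names are made up. Check the values that PUT writes to the log;

DATA _NULL_;
	AMOUNT = INPUT('1,002', COMMA5.);   * commas are removed;
	LOSS   = INPUT('(500)', COMMA5.);   * parentheses are read as a minus sign;
	RATE   = INPUT('45%', PERCENT4.);   * the percent is converted to a number;
	CODE   = INPUT('abcd', $UPCASE4.);  * the character value is converted to uppercase;
	PUT AMOUNT= LOSS= RATE= CODE=;
RUN;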
 

* (5) MIXED INPUT STYLES;
* An example is shown below;
Yellowstone           ID/MT/WY 1872    4,605,493
EVERGLADES            FL 1934          1,398,800
Yosemite              CA 1864            760,917
Great Smoky Mountains NC/TN 1926         520,269
Wolf Trap Farm        VA 1966                130
...
* Then you can write the INPUT statement as;

INPUT parkname $ 1-22 State $ Year @40 Acreage COMMA9.;

* parkname is read with column input, State and Year with list input (row read), and Acreage with formatted input;
* @40 tells SAS to move the pointer to column 40;
* @n is also useful to 1. skip over unneeded variables and 2. read a variable twice using different informats;
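
* A runnable sketch of the mixed-style statement above, with the five sample lines supplied through DATALINES. The data set name PARKS is made up. The values must keep their original columns, because parkname and Acreage are read by position;

DATA PARKS;
	INPUT parkname $ 1-22 State $ Year @40 Acreage COMMA9.;
	DATALINES;
Yellowstone           ID/MT/WY 1872    4,605,493
EVERGLADES            FL 1934          1,398,800
Yosemite              CA 1864            760,917
Great Smoky Mountains NC/TN 1926         520,269
Wolf Trap Farm        VA 1966                130
;
RUN;

PROC PRINT DATA = PARKS;
RUN;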

* (6) READING MESSY RAW DATA;
* Suppose you have a messy data line like this;
My dog Sam Breed: Rottweiller Vet Bills: $478

* You can put @'character string' in front of a variable name in the INPUT statement, which moves the pointer to the column right after that string;

INPUT  @'Breed:' DOGBREED :$20.;

* @'Breed:' positions the pointer right after the text 'Breed:' before SAS reads the variable DOGBREED. The string must match the data exactly;
* $20. makes SAS read 20 columns for DOGBREED (the default length of a character variable is 8);
* :$20. the colon ':' makes SAS stop reading when it meets a space or the end of the data line, while still allowing a length of up to 20;

INPUT  @'Breed:' DOGBREED $; * result is Rottweil;
INPUT  @'Breed:' DOGBREED $20.; * result is Rottweiller Vet Bil;
INPUT  @'Breed:' DOGBREED :$20.; * result is Rottweiller;
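
* A runnable sketch of the colon-modifier version, added for illustration. The data set name DOGS is made up. Note that the quoted search string has to match the text of the data line exactly, so it is spelled 'Breed:' here;

DATA DOGS;
	INPUT @'Breed:' DOGBREED :$20.;
	DATALINES;
My dog Sam Breed: Rottweiller Vet Bills: $478
;
RUN;

PROC PRINT DATA = DOGS;
RUN;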

* (7) READING MULTIPLE LINES OF RAW DATA PER OBSERVATION;
* Suppose you have raw data like this;
Nome AK
55 33
78 23
Miami FL
34 23
32 55
...
* You can read this raw data with;

DATA TEST;
	INFILE 'example1.dat';
	INPUT city $ state $
	/ Normalhigh Normallow /* '/' moves the pointer to the next line of raw data */
	#3 recordhigh recordlow /* '#3' moves the pointer to the third line of raw data for this observation */
	;
RUN;
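
* The same idea can be tried without an external file by putting the sample lines after DATALINES. This is only a sketch, and the data set name TEMPERATURES is made up;

DATA TEMPERATURES;
	INPUT city $ state $
	/ Normalhigh Normallow
	#3 recordhigh recordlow;
	DATALINES;
Nome AK
55 33
78 23
Miami FL
34 23
32 55
;
RUN;
* Each group of three data lines becomes one observation, so this step creates two observations;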

* (8) READING MULTIPLE OBSERVATIONS PER LINE OF RAW DATA;
* Suppose you have raw data like this;
Nome AK 55 33 Miami FL 78
12 Raleigh NC . 35
...
* You can read this raw data with;

DATA TEST;
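	INFILE 'example1.dat';
	INPUT CITY $ STATE $ NORMALRAIN MEANDAYRAIN @@; /* the variables follow the order of the data: city first, then state. The double trailing '@@' holds the current line of raw data even when the DATA step starts a new iteration, so SAS keeps reading observations from it until it runs out of data or reaches an INPUT statement that does not end with '@@' */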
RUN;
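
* A runnable sketch of the double trailing @@ with the two sample lines supplied through DATALINES. The data set name RAINFALL is made up, and the variables are listed in the order in which they appear in the data (city first, then state);

DATA RAINFALL;
	INPUT CITY $ STATE $ NORMALRAIN MEANDAYRAIN @@;
	DATALINES;
Nome AK 55 33 Miami FL 78
12 Raleigh NC . 35
;
RUN;
* The two data lines yield three observations: Nome, Miami and Raleigh. The last value for Miami comes from the start of the second line, and the period is read as a missing value for Raleigh;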

* (9) READING PART OF THE RAW DATA;
* Suppose you have raw data like this;
MALE 78 JACK
FEMALE 34 JULY
MALE 34 JASON
FEMALE 12 MARIA
...
* Suppose that you only want to read the information for males, so;

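DATA INFO_MALE;
	INFILE 'A PATH'; /*-- placeholder for the path of the raw data file --*/
	INPUT SEX $ @; /*-- the trailing '@' holds the current line of raw data, so the IF statement below can be executed before the rest of the line is read --*/
	IF SEX = 'FEMALE' THEN DELETE;
	INPUT AGE NAME $;
RUN;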

* The trailing '@' holds the line until either the current iteration of the DATA step ends, or a subsequent INPUT statement without a trailing '@' is executed;
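
* A runnable sketch of the same step with the four sample lines supplied through DATALINES instead of an external file. It is added for illustration only;

DATA INFO_MALE;
	INPUT SEX $ @; * read SEX only and hold the line;
	IF SEX = 'FEMALE' THEN DELETE; * drop the line before the remaining values are read;
	INPUT AGE NAME $; * read the remaining values from the held line;
	DATALINES;
MALE 78 JACK
FEMALE 34 JULY
MALE 34 JASON
FEMALE 12 MARIA
;
RUN;
* Only the two MALE lines are kept, so the data set holds the observations for JACK and JASON;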

 

 

LATER TO BE CONTINUED....
