SAS Interview Questions and Answers
Last updated on 24th Oct 2020, Blog, Interview Question
Statistical Analysis System, commonly known as SAS, is an integrated suite of software used for operations such as data management, predictive, prescriptive and descriptive analysis, quality improvement, business analysis and application development.
SAS combines a large number of customizable components with an extensive programming language to perform data analysis and data transformation tasks. It is platform-independent and runs on common operating systems such as Linux and Windows.
1. Describe 5 Ways To Do A “Table Lookup” In SAS?
- Match Merging,
- Direct Access,
- Format Tables,
- Arrays,
- PROC SQL.
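As an illustration of the format-table approach, here is a minimal sketch. It assumes the SASHELP.CLASS sample data set is available; the format name $sexfmt and the variable sex_label are hypothetical names chosen for the example.

```sas
/* Build a lookup table as a character format (name is illustrative) */
proc format;
    value $sexfmt 'F' = 'Female'
                  'M' = 'Male';
run;

/* Look up each code by applying the format with the PUT function */
data looked_up;
    set sashelp.class;
    length sex_label $6;
    sex_label = put(sex, $sexfmt.);
run;
```

A format lookup like this avoids sorting either table, which is one reason it is often suggested for large data sets.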
2. What Are Some Good SAS Programming Practices For Processing Very Large Data Sets?
Test on a sample first, using the OBS= option or a subsetting condition; comment out code that is not needed; and use DATA _NULL_ when no output data set is required.
3. Under What Circumstances Would You Code A Select Construct Instead Of If Statements?
A SELECT construct is a good choice when one of several mutually exclusive conditions has to be chosen, for example:
- data exam;
- set exam;
- length result $4; /* result is an illustrative variable name */
- select;
- when (physics gt 60) result='pass';
- when (math gt 100) result='pass';
- when (english eq 50) result='pass';
- otherwise result='fail';
- end;
- run;
4. What Is The One Statement To Set The Criteria Of Data That Can Be Coded In Any Step?
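5. What Is The Effect Of The Options Statement Errors=1?
The ERRORS= system option sets the maximum number of observations for which SAS prints complete data-error messages in the log. With ERRORS=1, full details are printed only for the first observation that has a data error. This is separate from the automatic _ERROR_ variable, which has a value of 1 if there is an error in the data for the current observation and 0 if there is not.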
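6. What Do The SAS Log Messages “Numeric Values Have Been Converted To Character” Mean? What Are The Implications?
It means that SAS automatically converted a numeric value to character because it was used where a character value is required, for example as an argument to a character function. The conversion uses the BEST12. format, so the result can contain leading blanks or lose precision; use the PUT function to control the conversion explicitly.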
7. Why Is A Stop Statement Needed For The Point= Option On A Set Statement?
Because POINT= reads only the specified observations, SAS cannot detect an end-of-file condition as it would if the file were being read sequentially.
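8. How Do You Control The Number Of Observations And/Or Variables Read Or Written?
Use the FIRSTOBS= and OBS= options to control which observations are processed, and the KEEP= and DROP= options (or statements) to control which variables are read or written.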
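A short sketch of these options used together, assuming the SASHELP.CLASS sample data set is available; the output data set name subset is illustrative.

```sas
/* Read only observations 5 through 10, and only two variables */
data subset;
    set sashelp.class(firstobs=5 obs=10 keep=name age);
run;
```

Note that OBS= gives the number of the last observation to process, not a count, so FIRSTOBS=5 with OBS=10 reads six observations.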
9. Approximately What Date Is Represented By The SAS Date Value Of 730?
SAS date values count days from 1 January 1960 (day 0), so 730 is about two years later: 31 December 1961.
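The date behind a SAS date value can be checked by writing it with a date format. A minimal sketch (the variable names are illustrative):

```sas
data _null_;
    d = 730;
    origin = 0;
    put d= date9.;        /* writes d=31DEC1961 to the log */
    put origin= date9.;   /* writes origin=01JAN1960 to the log */
run;
```

Because 1960 was a leap year, day 366 is 1 January 1961 and day 731 is 1 January 1962, which puts day 730 on the last day of 1961.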
10. Does SAS ‘Translate’ (Compile) Or Does It ‘Interpret’?
It compiles: each DATA step is compiled in full and then executed.
11. What Does The Run Statement Do?
RUN marks the end of a step: when SAS reaches it, the preceding DATA or PROC step is compiled and executed. If a step is immediately followed by another DATA or PROC statement, that statement also ends the previous step, so the RUN statement can be omitted between steps. It is still needed after the last step.
12. Why Is SAS Considered Self-Documenting?
SAS is considered self-documenting because, when it creates a data set, it stores descriptive information in the data set itself, such as the date and time of creation, the number of variables and their labels. You can view this information with PROC CONTENTS.
13. What Is The Difference Between Functions And Procs That Calculate The Same Simple Descriptive Statistics?
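Functions are used inside a DATA step and calculate a statistic across the variables of a single observation, whereas procedures calculate statistics across all the observations of a data set and can write the results to a new data set.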
14. What Is A Method For Assigning First.var And Last.var To The By Groupvariable On Unsorted Data?
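FIRST. and LAST. variables require a BY statement, which normally requires sorted data. If the data are grouped but not in sorted order, add the NOTSORTED option to the BY statement; otherwise, sort the data first with PROC SORT.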
15. How Do You Debug And Test Your SAS Programs?
First, look in the log for errors, warnings and, in some cases, notes; you can also use the DATA step debugger.
16. What Areas Of SAS Are You Most Interested In?
Base SAS, SAS/STAT, SAS/GRAPH and SAS/ETS.
17. What Versions Of SAS Have You Used (On Which Platforms)?
SAS 9.1.3, 9.0 and 8.2 on Windows and UNIX; SAS 7 and 6.12.
18. What Are Some Problems You Might Encounter In Processing Missing Values? In Data Steps? Arithmetic? Comparisons? Functions? Classifying Data?
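Arithmetic on a missing value produces a missing value (and a note in the log), while functions such as SUM and MEAN ignore missing arguments. In comparisons, a numeric missing value is smaller than any number. Most SAS statistical procedures exclude observations with any missing variable values from an analysis.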
19. How Would You Create A Data Set With 1 Observation And 30 Variables From A Data Set With 30 Observations And 1 Variable?
Using PROC TRANSPOSE.
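A minimal sketch of this reshaping. The input data set long, its variable x and the prefix val are hypothetical names made up for the example.

```sas
/* Hypothetical input: 30 observations of one variable, x */
data long;
    do i = 1 to 30;
        x = i * 10;
        output;
    end;
    drop i;
run;

/* Result: one observation with 30 variables, val1-val30 */
proc transpose data=long out=wide(drop=_name_) prefix=val;
    var x;
run;
```

Without PREFIX=, the new variables would get the default names COL1 through COL30.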
20. What Is The Difference Between Functions And Procs That Calculate The Same Simple Descriptive Statistics?
Procedures have a wider scope: they work across all observations of a data set, and the results can be sent to a different data set. Functions work within the current observation of the DATA step.
21. If You Were Told To Create Many Records From One Record, Show How You Would Do This Using Array And With Proc Transpose?
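With an array: declare an array over the variables in the record, then loop over it with a DO loop and write one observation per element with an OUTPUT statement. With PROC TRANSPOSE: list the variables in a VAR statement.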
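Both approaches can be sketched as follows. The data set wide and the variables id, q1-q3, item and value are hypothetical names used only for illustration.

```sas
/* Hypothetical input: one record per id with three measurements */
data wide;
    input id q1 q2 q3;
    datalines;
1 10 20 30
2 40 50 60
;
run;

/* Array approach: write one output record per array element */
data long_array;
    set wide;
    array q{3} q1-q3;
    do item = 1 to 3;
        value = q{item};
        output;
    end;
    keep id item value;
run;

/* PROC TRANSPOSE approach: one record per VAR variable within each id */
proc transpose data=wide out=long_tr(rename=(col1=value)) name=item;
    by id;
    var q1-q3;
run;
```

The array version gives full control over what is written to each record, while PROC TRANSPOSE needs less code when a plain reshape is enough.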
22. What Are _numeric_ And _character_ And What Do They Do?
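_NUMERIC_ and _CHARACTER_ are special variable-list names that refer to all numeric and all character variables, respectively, that are currently defined in the data set. They can be used wherever a variable list is accepted, for example to read, write or process all such variables.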
23. How Would You Create Multiple Observations From A Single Observation?
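Use the double trailing @@ on the INPUT statement to read several observations from one raw data line, or use an OUTPUT statement inside a DO loop to write several observations from one input observation.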
24. For What Purpose Would You Use The Retain Statement?
The RETAIN statement is used to hold the values of variables across iterations of the DATA step. Normally, variables created in the DATA step are reset to missing at the start of each iteration.
What is the order of evaluation of the operators + - * / ** ()?
Parentheses first, then **, then * and /, then + and -.
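25. How Could You Generate Test Data With No Input Data?
Use a DATA step with no input data set: generate values in a DO loop with OUTPUT statements, or use DATA _NULL_ with PUT statements to write test records to a file.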
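One possible sketch of generating test data without any input. The data set name test, the seed and the distribution parameters are arbitrary choices for illustration.

```sas
/* Ten test records with a random score; no SET, MERGE or INPUT needed */
data test;
    call streaminit(42);                  /* fixed seed so the data are reproducible */
    do id = 1 to 10;
        score = rand('normal', 50, 10);   /* mean 50, standard deviation 10 */
        output;
    end;
run;
```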
26. What Can You Learn From The SAS Log When Debugging?
It shows the execution of the whole program and its logic. It also shows each error with a line number so that you can find it and edit the program.
27. What Is The Purpose Of _error_?
_ERROR_ is an automatic variable that flags data errors during DATA step execution. It has only two values: 1 if an error occurred in the current observation and 0 if not.
28. How Can You Put A “Trace” In Your Program?
By using ODS TRACE ON.
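29. How Does SAS Handle Missing Values In: Assignment Statements, Functions, A Merge, An Update, Sort Order, Formats, Procs?
In an assignment statement, a missing value on the right-hand side is assigned as missing. In sort order, the ordinary numeric missing value (.) is the second smallest value: it sorts after the special missing value ._ and before .A through .Z, all of which sort below any number.
30. How Do You Test For Missing Values?
Compare the variable with a missing value (. for numeric, ' ' for character) in a subsetting construct such as IF-THEN/ELSE, WHERE or SELECT, or use the MISSING function.
31. How Are Numeric And Character Missing Values Represented Internally?
Character missing values are represented as a blank and numeric missing values as a period (.).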
32. Which Date Function Advances A Date, Time Or Datetime Value By A Given Interval?
INTNX.
33. In The Flow Of Data Step Processing, What Is The First Action In A Typical Data Step?
When you submit a DATA step, SAS first compiles it (creating the input buffer and the program data vector, PDV) and then executes it to create the new SAS data set:
- Compilation Phase
- Execution Phase.
34. What Are SAS/ACCESS And SAS/CONNECT?
SAS/ACCESS lets SAS read and write data held in external databases such as Oracle, SQL Server and MS Access.
SAS/CONNECT lets a SAS session connect to a SAS session on another machine, such as a remote server.
35. What Is The Purpose Of Using The N=PS Option?
The N=PS option creates a buffer in memory that is large enough to store PAGESIZE (PS) lines and enables a page to be formatted in any order before it is printed.
36. What Are The Scrubbing Procedures In SAS?
PROC SORT with the NODUPKEY option, because it eliminates observations with duplicate BY values.
37. What Are The New Features Included In The New Version Of SAS, I.e. SAS 9.1.3?
The main advantages of version 9 are faster execution of applications and centralized access to data and support.
Many changes were made in version 9 compared with version 8. The following are a few:
SAS version 9 supports format and informat names longer than 8 characters, which is not possible in version 8.
Numeric format names can be up to 32 characters long in version 9, compared with 8 in version 8.
Character format names can be up to 31 characters long in version 9.
Numeric informat names can be up to 31 characters long in version 9, compared with 8 in version 8.
Character informat names can be up to 30 characters long in version 9.
Three new informats are available in version 9 to convert various date, time and datetime forms of data into a SAS date, time or datetime value:
- ANYDTDTE. – Converts to a SAS date value.
- ANYDTTME. – Converts to a SAS time value.
- ANYDTDTM. – Converts to a SAS datetime value.
The CALL SYMPUTX routine is added in version 9. It creates a macro variable at execution time in the DATA step by
trimming leading and trailing blanks and
automatically converting a numeric value to character.
A new ODS option (COLUMNS=) is included to create multiple columns in the output.
38. What Differences Did You Find Among Versions 6, 8 And 9 Of SAS?
The SAS 9 Architecture is fundamentally different from any prior version of SAS. In the SAS 9 architecture, SAS relies on a new component, the Metadata Server, to provide an information layer between the programs and the data they access. Metadata, such as security permissions for SAS libraries and where the various SAS servers are running, are maintained in a common repository.
39. What Has Been Your Most Common Programming Mistake?
Missing semicolons, not checking the log after submitting a program, not using debugging techniques, and not using FSVIEW often enough.
40. Name Several Ways To Achieve Efficiency In Your Program?
Efficiency and performance strategies can be classified into 5 different areas:
- CPU time
- Data Storage
- Elapsed time
- Input/Output
- Memory
CPU time and elapsed time are the baseline measurements.
41. What Other SAS Products Have You Used And Consider Yourself Proficient In Using?
Data _NULL_ statement, Proc Means, Proc Report, Proc tabulate, Proc freq and Proc print, Proc Univariate etc.
42. What Is The Significance Of The ‘OF’ In X=sum (of A1-A4, A6, A9);
If we don’t use the OF keyword, the argument list might not be interpreted as we expect. Without OF, the function above calculates a1 minus a4, plus a6 and a9, and not the sum of a1 through a4 plus a6 and a9. The same is true for the MEAN function.
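The difference is easy to see with small values. A minimal sketch (the variable values and the names with_of and without_of are made up for the example):

```sas
data _null_;
    a1 = 1; a2 = 2; a3 = 3; a4 = 4; a6 = 6; a9 = 9;
    with_of    = sum(of a1-a4, a6, a9);   /* 1+2+3+4+6+9 = 25 */
    without_of = sum(a1-a4, a6, a9);      /* (1-4)+6+9   = 12 */
    put with_of= without_of=;
run;
```

Both statements run without error, which is why a forgotten OF is easy to miss.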
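43. What Do The PUT And INPUT Functions Do?
The INPUT function reads a character value using an informat; it is typically used to convert character values to numeric values.
- For INPUT: INPUT(source, informat).
The PUT function writes a value using a format and always returns a character value; it is typically used to convert numeric values to character values.
- For PUT: PUT(source, format).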
44. Which Date Function Advances A Date, Time Or Datetime Value By A Given Interval?
INTNX: INTNX function advances a date, time, or datetime value by a given interval, and returns a date, time, or datetime value.
INTCK: INTCK(interval,start-of-period,end-of-period) is an interval function that counts the number of intervals between two given SAS dates, Time and/or datetime.
DATETIME () returns the current date and time of day.
DATDIF (date,date,basis): returns the number of days between two dates.
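A short sketch of these interval functions. The dates and variable names are arbitrary and chosen only to make the behaviour visible.

```sas
data _null_;
    start = '15JAN2020'd;
    /* INTNX aligns to the beginning of the interval by default */
    next_month = intnx('month', start, 1);           /* 01FEB2020 */
    same_day   = intnx('month', start, 1, 'same');   /* 15FEB2020 */
    /* INTCK counts interval boundaries crossed, not elapsed time */
    months = intck('month', '31JAN2020'd, '01FEB2020'd);        /* 1 */
    days   = datdif('01JAN2020'd, '31JAN2020'd, 'act/act');     /* 30 */
    put next_month= date9. same_day= date9. months= days=;
run;
```

The INTCK result shows why it should not be read as "months elapsed": one day apart still counts as one month boundary.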
45. What Do The Mod And Int Function Do? What Do The Pad And Dim Functions Do?
MOD: MOD(value, modulo) returns the remainder after value is divided by modulo, where modulo is a non-zero constant or numeric variable.
INT: It returns the integer portion of a numeric value truncating the decimal portion.
PAD: it pads each record with blanks so that all data lines have the same length. It is used in the INFILE statement. It is useful only when missing data occurs at the end of the record.
CATX: concatenate character strings, removes leading and trailing blanks and inserts separators.
SCAN: it returns a specified word from a character value. Scan function assigns a length of 200 to each target variable.
SUBSTR: extracts a substring or replaces character values.
Extraction of a substring: middle_initial=substr(middle_name,1,1);
Replacing character values: substr(phone,1,3)='433';
If the SUBSTR function is on the left side of an assignment statement, it replaces the contents of the character variable.
TRIM: trims the trailing blanks from the character values.
SCAN vs. SUBSTR: SCAN extracts words within a value that is marked by delimiters. SUBSTR extracts a portion of the value by stating the specific location. It is best used when we know the exact position of the substring to extract from a character value.
46. How Might You Use Mod And Int On Numeric To Mimic Substr On Character Strings?
The first argument to the MOD function is numeric and the second is a non-zero numeric; the result is the remainder when argument-1 is divided by argument-2. The INT function takes only one argument and returns the integer portion of the argument, truncating the decimal portion. Note that the argument can be an expression. In the example below, X holds the first three digits of A, Y the last three, and Z the third and fourth.
- DATA NEW ;
- A = 123456 ;
- X = INT( A/1000 ) ;
- Y = MOD( A, 1000 ) ;
- Z = MOD( INT( A/100 ), 100 ) ;
- PUT A= X= Y= Z= ;
- RUN ;
47. In Array Processing, What Does The Dim Function Do?
DIM: It returns the number of elements in the array. When we use the DIM function as the stop value of an iterative DO statement, we do not have to re-specify the stop value if the dimension of the array changes.
48. How Would You Determine The Number Of Missing Or Nonmissing Values In Computations?
To determine the number of missing values that are excluded in a computation, use the NMISS function.
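- data _null_;
- m = . ;
- y = 4 ;
- z = 0 ;
- n_count = N(m, y, z); /* number of non-missing values */
- nmiss_count = NMISS(m, y, z); /* number of missing values */
- put n_count= nmiss_count=;
- run;
The above program results in n_count = 2 (number of non-missing values) and nmiss_count = 1 (number of missing values).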
49. Do You Need To Know If There Are Any Missing Values?
The MISSING function simply returns 1 if its argument is missing and 0 if it is not. If you need to know how many missing values you have, use
- num_missing=NMISS(field1,field2,field3);
You can also find the number of non-missing values with
- non_missing=N(field1,field2,field3);
50. What Is The Difference Between: X=a+b+c+d; And X=sum (of A, B, C, D);?
Is anyone wondering why you wouldn’t just use total=field1+field2+field3;
51. First, How Do You Want Missing Values Handled?
The SUM function returns the sum of the non-missing values. If you use addition, the result is missing if any of the fields is missing. Which one is appropriate depends on your needs.
However, there is an advantage to using the SUM function even if you want the result to be missing. If you have more than a couple of fields, you can often use shortcuts in writing the field names. If your fields are not numbered sequentially but are stored together in the program data vector, you can use a double-dash list: total=SUM(of firstfield--lastfield); where firstfield and lastfield stand for the first and last variables in the range. Just make sure you remember the “of” and the double dash, or your code will run but you won’t get your intended results.
MEAN is another function that calculates differently from the written-out formula if you have missing values.
There is a field containing a date. It needs to be displayed in the format “ddmmyy” if it’s before 1975, “dd mon ccyy” if it’s after 1985, and as ‘Disco Years’ if it’s between 1975 and 1985.
52. What Is The Difference Between Calculating The ‘Mean’ Using The MEAN Function And PROC MEANS?
By default PROC MEANS calculates summary statistics such as N, mean, standard deviation, minimum and maximum for each variable across observations, whereas the MEAN function computes only the mean of its arguments within an observation.
53. What Are Some Differences Between Proc Summary And Proc Means?
PROC MEANS prints its results in the output window by default; you can suppress this with the NOPRINT option and send the results to a data set with the OUTPUT OUT= statement. PROC SUMMARY produces no printed output by default: you must supply an OUTPUT statement, or add the PRINT option to see the results.
54. Which Data Set Is The Controlling Data Set In The Merge Statement?
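No data set is inherently the controlling one. In a match-merge, all observations from every data set are read, and for variables with the same name the value from the data set listed last on the MERGE statement overwrites the earlier ones. To make one data set control which observations are kept, use the IN= data set option with a subsetting IF.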
55. How Do The In= Variables Improve The Capability Of A Merge?
What if you want to keep in the output data set of a merge only the matches (only those observations to which both input data sets contribute)? SAS sets up special temporary variables, called the IN= variables, so that you can do this and more. Here is what you have to do: signal to SAS on the MERGE statement that you need the IN= variables for the input data set(s), and then use the IN= variables in the DATA step appropriately. So, to keep only the matches in a match-merge, ask for the IN= variables and use them:
- data three;
- merge one(in=x) two(in=y); /* x and y are your choices of names for the IN= variables for data sets one and two respectively */
- by id;
- if x=1 and y=1; /* keep only observations found in both one and two */
- run;
56. What Techniques And/or Procs Do You Use For Tables?
Proc Freq, Proc univariate, Proc Tabulate & Proc Report.
57. Do You Prefer Proc Report Or Proc Tabulate? Why?
I prefer PROC REPORT unless I have to create cross-tabulation tables, because it gives me many options to modify the look of my table (for example, the WIDTH= option changes the width of each column), whereas PROC TABULATE cannot produce some of the things I need. For example, TABULATE doesn’t produce n (%) in the desired format.
58. What Is The Difference Between Nodup And Nodupkey Options?
NODUP compares all the variables in our dataset while NODUPKEY compares just the BY variables.
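A small sketch of the two options side by side. The data set visits and its variables are hypothetical; NODUPRECS is the full spelling of NODUP.

```sas
/* Hypothetical input: id 1 has an exact duplicate record and a second visit */
data visits;
    input id visit $;
    datalines;
1 A
1 A
1 B
2 A
;
run;

/* NODUPRECS (alias NODUP): drops a record identical to the previous one; 3 records remain */
proc sort data=visits out=no_dup_recs noduprecs;
    by id;
run;

/* NODUPKEY: keeps the first record for each BY value; 2 records remain */
proc sort data=visits out=no_dup_keys nodupkey;
    by id;
run;
```

Because NODUPRECS only compares each record with the one written before it, duplicates that are not adjacent after sorting can survive; sorting by all variables avoids that.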
59. What Is The Main Difference Between Rename And Label?
1. The LABEL statement can be used in both PROC and DATA steps, whereas the RENAME statement is used only in DATA steps.
2. If we rename a variable, the old name will be lost but if we label a variable its short name (old name) exists along with its descriptive name.
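60. What Is Enterprise Guide? What Is The Use Of It?
SAS Enterprise Guide is a point-and-click Windows client for SAS. It provides a graphical interface for tasks such as importing text files, querying data and running analyses, and it generates the SAS code for you (it comes free with Base SAS version 9.0).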
61. What Are Input Dataset And Output Dataset Options?
Input data set options include OBS=, FIRSTOBS=, WHERE= and IN=; output data set options include COMPRESS= and REUSE=. Options available for both input and output data sets include KEEP=, DROP= and RENAME=.
62. How Can You Create A Zero Observation Dataset?
Create a data set by using the LIKE clause, for example:
- proc sql;
- create table latha.emp like oracle.emp;
- quit;
Here the LIKE clause causes the structure of the existing table to be copied to the new table, which results in an empty table.
To run a program stored in a SAS file, write the following in the editor window:
- %include 'path of the sas file';
- run;
In a non-windowing environment there is no need for the RUN statement.
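63. How Can You Import A .csv File Into SAS?
A CSV file is a plain-text file (it can be created in an editor such as Notepad) with the variable names in the first row. Use PROC IMPORT to read it:
- proc import datafile='E:age.csv'
- out=sarath dbms=csv replace;
- getnames=yes; /* first row holds the variable names */
- run;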
64. What Is The Use Of Proc Sql?
PROC SQL is a powerful tool in SAS that combines the functionality of DATA and PROC steps. PROC SQL can sort, summarize, subset, join (merge) and concatenate data sets, create new variables, and print the results or create a new data set, all in one step. In some cases PROC SQL uses fewer resources than the equivalent DATA and PROC steps. Joining tables in PROC SQL does not require the data to be sorted first, which is required for a DATA step merge.
65. What Is SAS/GRAPH?
SAS/GRAPH software creates and delivers accurate, high-impact visuals that enable decision makers to gain a quick understanding of critical business issues.
66. Why Is A Stop Statement Needed For The Point= Option On A Set Statement?
When you use the POINT= option, you must include a STOP statement to stop DATA step processing, programming logic that checks for an invalid value of the POINT= variable, or both. Because POINT= reads only those observations that are specified in the DO statement, SAS cannot read an end-of-file indicator as it would if the file were being read sequentially. Because reading an end-of-file indicator ends a DATA step automatically, failure to substitute another means of ending the DATA step when you use POINT= can cause the DATA step to go into a continuous loop.
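A minimal sketch of direct access with POINT= and STOP, assuming the SASHELP.CLASS sample data set is available. The names every_other, i and total are illustrative.

```sas
/* Read every second observation by observation number */
data every_other;
    do i = 1 to total by 2;
        set sashelp.class point=i nobs=total;   /* NOBS= is set at compile time */
        output;
    end;
    stop;   /* no end-of-file is reached, so end the step explicitly */
run;
```

The POINT= variable i is temporary and is not written to the output data set.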
67. How to sort in descending order?
Use DESCENDING keyword in PROC SORT code. The example below shows the use of the descending keyword.
- PROC SORT DATA=auto; BY DESCENDING engine ; RUN ;
68. How to convert a numeric variable to a character variable?
You must create a differently-named variable using the PUT function.
The example below shows the use of the PUT function.
- charvar=put(numvar, 7.) ;
69. How to convert a character variable to a numeric variable?
You must create a differently-named variable using the INPUT function.
The example below shows the use of the INPUT function.
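A minimal sketch of the conversion, using the same charvar and numvar names as the PUT example; the value and the 8. informat are arbitrary choices.

```sas
data _null_;
    charvar = '1234.5';
    numvar = input(charvar, 8.);   /* read the text with a numeric informat */
    put numvar=;                   /* writes numvar=1234.5 to the log */
run;
```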
Single dash: It is used to specify consecutively numbered variables. A1-A3 implies A1, A2 and A3.
Double dash: It is used to specify variables based on the order of the variables as they appear in the file, regardless of the name of the variable. A1--A3 implies all the variables from A1 to A3 in the order they appear in the data set.
Example: The order of variables in a data set: ID Name A1 A2 C1 A3
So using A1-A3 would return A1 A2 A3. A1--A3 would return A1 A2 C1 A3.
70. Difference between PROC MEANS and PROC SUMMARY?
1. Proc MEANS by default produces printed output in the OUTPUT window whereas Proc SUMMARY does not. Inclusion of the PRINT option on the Proc SUMMARY statement will output results to the output window.
2. Omitting the VAR statement in PROC MEANS analyses all the numeric variables, whereas omitting the VAR statement in PROC SUMMARY produces a simple count of observations.
71. Can PROC MEANS analyze ONLY the character variables?
No, Proc Means requires at least one numeric variable.
72. How does the SUBSTR function work?
The SUBSTR function is used to extract substring from a character variable.
The SUBSTR function has three arguments:
SUBSTR ( character variable, starting point to begin reading the variable, number of characters to read from the starting point)
There are two basic applications of the SUBSTR function:
RIGHT SIDE APPLICATION
- data _null_ ;
- phone='(312) 555-1212' ;
- area_cd=substr(phone, 2, 3) ;
- put area_cd=;
- run ;
Result : In the log window, it writes area_cd=312 .
LEFT SIDE APPLICATION
It is used to change just a few characters of a variable.
- data _null_ ;
- phone='(312) 555-1212' ;
- substr(phone, 2, 3)='773' ;
- put phone=; run ;
Result : The variable PHONE has been changed from (312) 555-1212 to (773) 555-1212.
73. Difference between CEIL and FLOOR functions?
The ceil function returns the smallest integer greater than/equal to the argument whereas the floor returns the greatest integer less than/equal to the argument.
For example : ceil(4.4) returns 5 whereas floor(4.4) returns 4.
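The behaviour with negative arguments is a common follow-up question. A short sketch (variable names are illustrative):

```sas
data _null_;
    up       = ceil(4.4);     /*  5 */
    down     = floor(4.4);    /*  4 */
    neg_up   = ceil(-4.4);    /* -4 */
    neg_down = floor(-4.4);   /* -5 */
    put up= down= neg_up= neg_down=;
run;
```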
74. Difference between SET and MERGE?
SET concatenates the data sets, whereas MERGE matches the observations of the data sets.
75. How to do a Matched Merge and output only observations found in both files?
Use the IN= data set option in the MERGE statement. It creates a temporary variable for each data set that tracks whether that data set contributed to the current observation, so you can select which observations go to the new data set.
- data reading;
- merge file1(in=infile1) file2(in=infile2);
- by id;
- if infile1 and infile2;
- run;
76. How to do a Matched Merge and output consisting of observations in file1 but not in file2, or in file2 but not in file1?
- data reading;
- merge file1(in=infile1) file2(in=infile2);
- by id;
- if infile1 ne infile2;
- run;
77. How to do Matched Merge and output consisting of observations from only file1?
- data reading;
- merge file1(in=infile1) file2(in=infile2);
- by id;
- if infile1;
- run;
78. How do I create a data set with observations=100, mean 0 and standard deviation 1?
- data reading;
- do i=1 to 100;
- temp=0 + rannor(1) * 1; /* mean + standard normal draw * standard deviation */
- output;
- end;
- run;
- proc means data=reading mean stddev;
- var temp;
- run;
79. How to label values and use it in PROC FREQ?
Use PROC FORMAT to set up a format.
- proc format;
- value score 0 - 100='100-'
- 101 - 200='101+';
- run;
- proc freq data=reading;
- tables outdata;
- format outdata score.;
- run;
80. How to use arrays to recode a set of variables?
Recode the set of questions: Q1,Q2,Q3…Q20 in the same way: if the variable has a value of 6 recode it to SAS missing.
- data reading;
- set outdata;
- array Q(20) Q1-Q20;
- do i=1 to 20;
- if Q(i)=6 then Q(i)=.;
- end;
- run;
81. How to use arrays to recode all the numeric variables?
Use the _NUMERIC_ variable list and the DIM function with the array.
- data reading;
- set outdata;
- array Q(*) _numeric_;
- do i=1 to dim(Q);
- if Q(i)=6 then Q(i)=.;
- end;
- run;
Note : DIM returns the number of elements in array Q.
82. How to calculate mean for a variable by group?
Suppose Q1 is a numeric variable and Age a grouping variable. You wish to compute the mean for Q1 by Age.
- PROC MEANS DATA=READING;
- VAR Q1;
- CLASS AGE;
- RUN;
83. How to generate cross tabulation?
Use PROC FREQ code.
- PROC FREQ DATA=auto;
- TABLES A*B ;
- RUN ;
SAS will produce a table of A by B.
84. How to generate detailed summary statistics?
Use PROC UNIVARIATE code.
- PROC UNIVARIATE DATA=READING;
- CLASS Age;
- VAR Q1;
- RUN;
Note : Q1 is a numeric variable and Age a grouping variable.
85. How to count missing values for numeric variables?
Use PROC MEANS with the NMISS option.
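A minimal sketch, assuming the SASHELP.HEART sample data set (which contains some missing values) is available:

```sas
/* N and NMISS for every numeric variable; no VAR statement is needed */
proc means data=sashelp.heart n nmiss;
run;
```

Adding a VAR statement would restrict the report to the listed numeric variables.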
86. How to count missing values for all variables?
- proc format;
- value $missfmt ' '='Missing' other='Not Missing';
- value missfmt .='Missing' other='Not Missing';
- run;
- proc freq data=one;
- format _CHAR_ $missfmt.;
- tables _CHAR_ / missing missprint nocum nopercent;
- format _NUMERIC_ missfmt.;
- tables _NUMERIC_ / missing missprint nocum nopercent;
- run;
87. Describe the ways in which you can create macro variables
There are 5 ways to create macro variables:
- %LET statement
- Iterative %DO statement
- Call Symput
- Proc SQL INTO clause
- Macro Parameters.
88. Use of CALL SYMPUT
CALL SYMPUT puts the value from a dataset into a macro variable.
- proc means data=test;
- var x;
- output out=testmean mean=xbar;
- run;
- data _null_;
- set testmean;
- call symput("xbarmac",xbar);
- run;
- %put mean of x is &xbarmac;
89. What are SYMGET and SYMPUT?
SYMPUT puts the value from a dataset into a macro variable whereas
SYMGET gets the value from the macro variable to the dataset.
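Both directions can be sketched as follows, assuming the SASHELP.CLASS sample data set is available. The macro variable names cutoff and n_students and the data set name teens are illustrative, and CALL SYMPUTX is used here in place of CALL SYMPUT because it trims blanks.

```sas
/* Data set -> macro variable */
data _null_;
    set sashelp.class end=last;
    if last then call symputx('n_students', _n_);
run;
%put Number of students: &n_students;

/* Macro variable -> data set */
%let cutoff = 13;
data teens;
    set sashelp.class;
    limit = input(symget('cutoff'), 8.);   /* SYMGET returns a character value */
    if age >= limit;
run;
```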
90. How to count the number of intervals between two given SAS dates?
INTCK(interval,start-of-period,end-of-period) is an interval function that counts the number of intervals between two given SAS dates, Time and/or datetime.
91. Difference between SCAN and SUBSTR?
SCAN extracts words within a value that is marked by delimiters. SUBSTR extracts a portion of the value by stating the specific location. It is best used when we know the exact position of the substring to extract from a character value.
92. The following data step executes:
- Data strings;
- Text1="MICKEY MOUSE & DONALD DUCK";
93. When grouping is in effect, can the WHERE clause be used in PROC SQL to subset data?
WHERE can still be used to filter individual rows before they are grouped, but it cannot refer to summary statistics. To subset groups, the HAVING clause must be used; the expression in the HAVING clause typically contains summary statistics.
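A short sketch of how the two clauses divide the work, assuming the SASHELP.CLASS sample data set is available; the thresholds are arbitrary.

```sas
proc sql;
    select sex, avg(height) as mean_height
    from sashelp.class
    where age >= 12              /* row filter, applied before grouping */
    group by sex
    having avg(height) > 60;     /* group filter, applied to the summary statistic */
quit;
```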
94. How to use IF THEN ELSE in PROC SQL?
PROC SQL has no IF-THEN/ELSE statement; use a CASE expression instead.
- PROC SQL;
- SELECT WEIGHT,
- CASE
- WHEN WEIGHT BETWEEN 0 AND 50 THEN 'LOW'
- WHEN WEIGHT BETWEEN 51 AND 70 THEN 'MEDIUM'
- WHEN WEIGHT BETWEEN 71 AND 100 THEN 'HIGH'
- ELSE 'VERY HIGH'
- END AS NEW_WEIGHT FROM HEALTH;
- QUIT;
95. How to remove duplicates using PROC SQL?
- Proc SQL noprint;
- Create Table inter.merged1 as
- Select distinct * from inter.reading;
- Quit;
100. How to count unique values by a grouping variable?
You can use PROC SQL with COUNT(DISTINCT variable_name) to determine the number of unique values for a column.
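A minimal sketch, assuming the SASHELP.CLASS sample data set is available; the column alias n_ages is illustrative.

```sas
/* Number of distinct ages within each sex */
proc sql;
    select sex, count(distinct age) as n_ages
    from sashelp.class
    group by sex;
quit;
```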