You can use PROC COMPARE in SAS to quickly identify the similarities and differences between two datasets.
This procedure uses the following basic syntax:
proc compare base=data1 compare=data2; run;
The following example shows how to use this procedure in practice.
Example: Using Proc Compare in SAS
Suppose we have the following two datasets in SAS:
/*create datasets*/
data data1;
input team $ points rebounds;
datalines;
A 25 10
B 18 4
C 18 7
D 24 12
E 27 11
;
run;
data data2;
input team $ points;
datalines;
A 25
B 18
F 27
G 21
H 20
;
run;
/*view datasets*/
proc print data=data1;
run;
proc print data=data2;
run;
We can use the following PROC COMPARE step to find the similarities and differences between the two datasets:
/*compare the two datasets*/
proc compare
base=data1
compare=data2;
run;
This will produce three tables in the output:
Table 1: A Summary of Both Datasets
The first table shows a brief summary of each dataset, including:
1. The number of variables (NVar) and observations (NObs) in each dataset.
- Data1 has 3 variables and 5 observations
- Data2 has 2 variables and 5 observations
2. The number of variables in common between the two datasets.
- Data1 and Data2 have 2 variables in common (team and points)
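The variable and observation counts reported in this first table can be cross-checked independently of PROC COMPARE. The following is a minimal sketch that assumes both datasets live in the WORK library, as they do when created by the DATA steps above:

```sas
/* sketch: look up the number of observations and variables for each dataset */
/* assumes data1 and data2 are in the WORK library */
proc sql;
    select memname, nobs, nvar
    from dictionary.tables
    where libname = 'WORK'
      and memname in ('DATA1', 'DATA2');  /* names are stored in uppercase */
quit;
```

For the example data this should list 5 observations for each dataset, with 3 variables for DATA1 and 2 for DATA2.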
Table 2: A Summary of the Number of Differences in Values
The second table summarizes the number of differences in values between the two datasets.
The most interesting part of this output is located at the end of the table where we can see a summary of differences between the variables:
- The team variable has 3 observations with different values.
- The points variable has 3 observations with different values. The max difference is 9.
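To see where these counts come from, one option is to pair the observations by position yourself, which is how PROC COMPARE matches rows when no ID statement is given. The sketch below is only an illustration; the dataset name check and the variables team_base, points_base, team_comp, points_comp and diff are made up for this example:

```sas
/* sketch: pair observations by position and compute compare-minus-base differences */
/* check, team_base, points_base, team_comp, points_comp and diff are illustrative names */
data check;
    merge data1(keep=team points rename=(team=team_base points=points_base))
          data2(rename=(team=team_comp points=points_comp));
    /* no BY statement, so row 1 is paired with row 1, row 2 with row 2, and so on */
    diff = points_comp - points_base;
    /* keep only the rows where at least one of the two variables differs */
    if team_base ne team_comp or points_base ne points_comp;
run;

proc print data=check;
run;
```

With the example data, this keeps observations 3 through 5, with differences of 9, -3 and -7 in points, so the largest absolute difference is 9.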
Table 3: The Actual Differences Between Observations
The third table shows the actual differences between the observations in the two datasets.
The first part of this table shows the differences in the team variable between the two datasets.
- For example, in data1 the third observation has a value of C for team while in data2 the third observation has a value of F.
The second part shows the differences in the points variable between the two datasets.
- For example, in data1 the third observation has a value of 18 for points while in data2 the third observation has a value of 27. The difference between the two values is 9.
These three tables give us a complete understanding of the differences between the two datasets.
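If you would rather work with the differences as data than read them from the printed report, PROC COMPARE can also write them to an output dataset. The following is a sketch rather than a required step; the output dataset name diffs is an arbitrary choice:

```sas
/* sketch: write the unequal observations to a dataset instead of printing a report */
/* the output dataset name diffs is arbitrary */
proc compare base=data1 compare=data2
             out=diffs      /* dataset that receives the comparison results */
             outnoequal     /* skip observations whose compared values are all equal */
             outbase        /* include the row from the base dataset */
             outcomp        /* include the row from the comparison dataset */
             outdif         /* include a row holding the differences */
             noprint;       /* suppress the printed tables */
run;

proc print data=diffs;
run;
```

The _TYPE_ and _OBS_ variables in the output dataset identify what each row holds and which observation it came from.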
Note that if you only want to compare the two datasets on one specific variable, you can add a VAR statement:
/*compare the differences between the datasets only for 'points' variable*/
proc compare
base=data1
compare=data2;
var points;
run;
This will produce the same three tables as before, but restricted to the points variable.
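All of the comparisons above pair observations by position, which is why team C in data1 is compared against team F in data2. If the intent is to compare rows that belong to the same team, an ID statement can be used instead. This is a sketch under the assumption that team uniquely identifies each row; the sort steps are included because the ID statement expects both datasets to be sorted by the ID variable:

```sas
/* sketch: match observations by team instead of by position */
/* assumes team uniquely identifies a row in each dataset */
proc sort data=data1; by team; run;
proc sort data=data2; by team; run;

proc compare base=data1 compare=data2
             listobs;   /* list observations found in only one of the datasets */
    id team;
run;
```

With the example data, only teams A and B appear in both datasets, so only those two rows are compared value by value; the remaining teams are reported as present in just one dataset.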
Note: The complete documentation for PROC COMPARE is available in the official SAS documentation.
Additional Resources
The following tutorials explain how to perform other common tasks in SAS:
How to Use Proc Summary in SAS
How to Use Proc Tabulate in SAS
How to Use Proc Rank in SAS