1) (15pts) Change to the directory containing the data (use the command “cd”), open
the file data2.dta, and describe the variables it contains using the commands:
a. describe
b. codebook
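A minimal sketch of this step is shown below. The folder path is a placeholder (an assumption, not part of the assignment) and should be replaced with the location of your own copy of the file.

```stata
* Placeholder path (assumption): replace with the folder that holds data2.dta
cd "C:/path/to/your/folder"

* Load the dataset, discarding any data currently in memory
use data2.dta, clear

* Variable names, storage types, display formats and labels
describe

* Per-variable detail: range, number of unique values, missing values, value labels
codebook
```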
2) (15pts) Choose two variables that caught your attention and that can be
related to each other (in other words, with a meaningful economic
relationship), and motivate your choice. One of these two variables MUST be
categorical (e.g. male and female; marital status; east or west location; etc.) and
one MUST be continuous (e.g. income, earnings). Summarize these two
variables using the appropriate command(s). Please use a combination of two
variables that has not been analysed during the tutorials; a
replication of any exercise performed during the tutorials will not be accepted.
3) (40pts) For the continuous variable:
a. Show the histogram and the kernel density estimate.
b. Perform a normality test on the variable as well as on the logarithm of
the variable. Are they normally distributed? Both? If not, are they left- or
right-skewed?
c. Transform the continuous variable into a categorical variable with a
maximum of 10 categories (you may also choose fewer than 10). Show the
histogram and the table of the new categorical variable (each row should
report the mean of the continuous variable within that category). For example, if your continuous
variable is Wage, you could call the new categorical one “Wage_Cat”.
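One possible way to approach parts a to c is sketched below. The variable name wage is hypothetical (data2.dta may use a different name), the skewness/kurtosis test is only one of several normality tests available in Stata, and splitting into deciles is just one way to form the categories.

```stata
* "wage" is a hypothetical variable name: substitute your own continuous variable

* a. Histogram and kernel density estimate (with a normal density overlaid)
histogram wage
kdensity wage, normal

* b. Skewness/kurtosis normality test on the level and on the logarithm
* ln() returns missing for non-positive values
generate ln_wage = ln(wage)
sktest wage ln_wage
* The detailed summary reports skewness: positive = right-skewed, negative = left-skewed
summarize wage ln_wage, detail

* c. Ten categories of roughly equal size (deciles), coded 1 to 10
xtile Wage_Cat = wage, nquantiles(10)
histogram Wage_Cat, discrete
* One row per category, with the mean, standard deviation and frequency of wage
tabulate Wage_Cat, summarize(wage)
```

Equal-width bins are an equally valid choice for part c; deciles are used here only because they guarantee that no category is empty.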
4) (4pts) Cross-tabulate the frequencies (with row, column and cell percentages) of the
categorical variable you have just created (e.g. Wage_Cat) against the categorical
variable you selected in point (2).
5) (4pts) Cross-tabulate the mean of the continuous variable for each category of the
variable you have just created (e.g. Wage_Cat) against the categorical variable you
selected in point (2).
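Points (4) and (5) could be sketched as follows. The names sex and wage are hypothetical stand-ins for the categorical and continuous variables chosen in point (2), and Wage_Cat is the example name from point (3).

```stata
* "sex" and "wage" are hypothetical names: substitute the variables you chose

* Point (4): frequencies with row, column and cell percentages
tabulate Wage_Cat sex, row column cell

* Point (5): mean of the continuous variable in each cell of the same table
tabulate Wage_Cat sex, summarize(wage) means
```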
6) (12pts) How would you find the conditional and marginal probabilities
of the two categorical variables from your results in point
(4)? Explain in detail, referring to the concepts of joint and
conditional probability, with numerical examples from the table.
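As a reminder of the notation, the relationships between the three kinds of probability can be written as below. Here n_ij is the count in row i and column j of the cross-tabulation and N is the total number of observations; the numbers in the example are invented for illustration and do not come from data2.dta.

```latex
% Joint probability: cell percentage
P(A_i \cap B_j) = \frac{n_{ij}}{N}

% Marginal probabilities: row total or column total divided by N
P(A_i) = \frac{\sum_j n_{ij}}{N}, \qquad P(B_j) = \frac{\sum_i n_{ij}}{N}

% Conditional probabilities: column percentage and row percentage
P(A_i \mid B_j) = \frac{P(A_i \cap B_j)}{P(B_j)} = \frac{n_{ij}}{\sum_i n_{ij}}, \qquad
P(B_j \mid A_i) = \frac{P(A_i \cap B_j)}{P(A_i)} = \frac{n_{ij}}{\sum_j n_{ij}}

% Invented example: N = 200, n_{11} = 36, row total 50, column total 120
P(A_1 \cap B_1) = \tfrac{36}{200} = 0.18, \quad
P(A_1) = \tfrac{50}{200} = 0.25, \quad
P(B_1) = \tfrac{120}{200} = 0.60

P(A_1 \mid B_1) = \tfrac{36}{120} = 0.30, \quad
P(B_1 \mid A_1) = \tfrac{36}{50} = 0.72
```

In the invented example the joint probability equals either conditional probability times the corresponding marginal: 0.30 x 0.60 = 0.72 x 0.25 = 0.18.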
7) (10pts) Now take two new continuous variables from the dataset and compute
their covariance and their correlation. Show their scatter plot: is the correlation
positive, zero or negative? Is it high or low? Why?
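A minimal sketch of the commands for point (7) is given below, with x and y as hypothetical variable names. The correlation is the covariance divided by the product of the two standard deviations, so it always lies between -1 and 1 regardless of the units of measurement.

```stata
* "x" and "y" are hypothetical names: substitute two continuous variables

* Variance-covariance matrix
correlate x y, covariance

* Correlation matrix
correlate x y

* Scatter plot with a fitted regression line to show the direction of the relationship
twoway (scatter y x) (lfit y x)
```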