life priced,do not fight,peace no war
Demand Estimation by Using Regression Analysis
Regression Analysis is a statistical method used to establish a relationship between a variable (the Dependent Variable) and the other factors that affect it (the Independent Variables).
This relationship can be expressed in functional form:
$$Q = a_0 + a_1 A + a_2 B + a_3 C$$
Demand estimation for a product or service using regression analysis is important in the business world, especially to corporate executives and managers, because it enables them to make reasonable forecasts for their goods and services in the near future. Managers can narrow down the factors that are important in influencing their sales and thereby formulate appropriate strategies or policies to achieve their management objectives.
The actual process of Regression Analysis can be very complex, but it can be summarized in FOUR important steps:
- Model Specification: Set the objective and identify the important variables that influence the dependent variable.
- Data Collection: Collect data for all the specified variables.
- Choice of a Functional Form: e.g. a linear or non-linear form.
- Estimation and Interpretation of Results.
1. Model Specification
If we want to study the factors affecting the demand for automobiles (Qx) in the country, we must identify the most important variables that are believed to affect the demand for automobiles, e.g.:
a) Price of the automobile (Px)
b) Per capita income (Yc)
c) Size of the working population (L)
d) Rate of interest (I), etc.
$$Q_x = f(P_x, Y_c, L, I, \ldots)$$
2. Data Collection on the Variables
Two types of data:
a) Time-Series Data
Data are collected for each variable over time (yearly, quarterly, monthly, daily, etc.).
b) Cross-Sectional Data
Data are collected for the same time period but from different sections or geographical areas of the society.
The type of data to be used depends on the availability of data:
a) Primary data: data collected from the field through market surveys, sampling, etc.
b) Secondary data: data published by a relevant authority, such as the Statistical Department, Economic Reports, etc.
3. Specifying the Form of the Equation
i) The simplest model to deal with, and often also the most realistic, is the linear model, e.g.:
$$Q_x = a_0 + a_1 P_x + a_2 Y_c + a_3 L + a_4 I + \dots + e$$
where a0, a1, …, a4 are the parameters (coefficients) to be estimated and e is the disturbance term (error term).
ii) Non-Linear Model
Sometimes a non-linear form may fit the data better than a linear equation, e.g. the power function:
$$Q_x = a_0 \, P_x^{\alpha_1} \, Y_c^{\alpha_2} \, L^{\alpha_3} \, I^{\alpha_4}$$
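Because taking natural logarithms turns the power function into a linear one, ln Qx = ln a0 + α1 ln Px + α2 ln Yc + α3 ln L + α4 ln I, it can be estimated by ordinary least squares, and each exponent α can be read directly as an elasticity. The minimal Python sketch below keeps only one regressor (price) for brevity and uses hypothetical, noise-free data generated from Qx = 500·Px^(-1.5); the helper name `fit_power_function` and all numbers are illustrative assumptions, not taken from these notes.

```python
import math


def fit_power_function(prices, quantities):
    """Estimate a0 and alpha1 in Q = a0 * P**alpha1.

    Taking logs gives ln Q = ln a0 + alpha1 * ln P, which is linear,
    so ordinary least squares is applied to the logged data.
    """
    xs = [math.log(p) for p in prices]
    ys = [math.log(q) for q in quantities]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    alpha1 = sxy / sxx                      # slope of the log-log line = price elasticity
    a0 = math.exp(y_bar - alpha1 * x_bar)   # convert the intercept ln a0 back to a0
    return a0, alpha1


# Hypothetical, noise-free data generated from Q = 500 * P**(-1.5)
prices = [1.0, 2.0, 4.0, 5.0, 8.0]
quantities = [500 * p ** -1.5 for p in prices]

a0, alpha1 = fit_power_function(prices, quantities)
print(f"a0 = {a0:.2f}, alpha1 = {alpha1:.2f}")  # a0 = 500.00, alpha1 = -1.50
```

With real data the error term would not vanish, so the estimates would only approximate the true parameters.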
4. Testing the (Econometric) Results
To evaluate the regression results, several statistics are examined:
a) The sign of each estimated coefficient must be checked to see if it conforms to what is expected on theoretical grounds.
b) The coefficient of determination, R²
c) t-tests on the coefficients
d) The Durbin-Watson statistic, etc.
e) The F-statistic (F-stat)
Note: The statistical procedure for solving multiple regression problems can be very complicated. Fortunately, many computer software packages are available for this purpose; for example, TSP (Time-Series Processor) or SPSS can be used to solve our problems.
REGRESSION ANALYSIS
It describes the way in which one variable is related to
another. Regression analysis derives an equation that can be used to estimate
the unknown values of one variable on the basis of known values of another
variable.
(a) Simple Regression Analysis
$$Y = a + bX$$
where Y is sales volume and X is advertising expenditure.
Example 1 (taken from ECO556 Manual, Table 4.1, page 136)

| Year | Sales, Y (million dollars) | Advertising Expenditure, X (million dollars) |
|---|---|---|
| 1997 | 44 | 10 |
| 1998 | 58 | 13 |
| 1999 | 48 | 11 |
| 2000 | 46 | 12 |
| 2001 | 42 | 11 |
| 2002 | 60 | 15 |
| 2003 | 52 | 12 |
| 2004 | 54 | 13 |
| 2005 | 56 | 14 |
| 2006 | 40 | 9 |
The result from the computer printout:
LS // Dependent variable is SAL
SMPL range 1986 - 1995
Number of observations: 10

| Variable | Coefficient | Std. Error | T-Stat | 2-Tail Sig. |
|---|---|---|---|---|
| C | 7.6000000 | 6.332345 | 1.2001912 | 0.264 |
| ADV | 3.5333333 | 0.5222813 | 6.751919 | 0.000 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.851212 | Mean of dependent var | 50.00000 |
| Adjusted R-squared | 0.832614 | S.D. of dependent var | 6.992059 |
| S.E. of regression | 2.860653 | Sum of squared resid | 65.46667 |
| Durbin-Watson stat | 1.224915 | F-statistic | 45.76782 |
| Log likelihood | -23.58417 | | |
$$\hat{Y} = \hat{a} + \hat{b}X \quad\Rightarrow\quad \hat{Y} = 7.6 + 3.53X$$
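The estimates in the printout can be reproduced by hand with the least-squares formulas b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² and a = Ȳ − bX̄. A minimal Python sketch, assuming the Table 4.1 figures are entered in year order; `simple_ols` is an illustrative helper name, not part of any package:

```python
# Example 1 data (Table 4.1), both series in million dollars
sales = [44, 58, 48, 46, 42, 60, 52, 54, 56, 40]        # Y
advertising = [10, 13, 11, 12, 11, 15, 12, 13, 14, 9]   # X


def simple_ols(x, y):
    """Return the least-squares estimates (a, b) for Y = a + bX."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b = sum of cross-deviations / sum of squared X deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar   # the fitted line passes through the point of means
    return a, b


a, b = simple_ols(advertising, sales)
print(f"Y-hat = {a:.2f} + {b:.2f}X")  # Y-hat = 7.60 + 3.53X
```

Here Σ(X − X̄)(Y − Ȳ) = 106 and Σ(X − X̄)² = 30, so b = 106/30 ≈ 3.5333 and a = 50 − 3.5333 × 12 = 7.6, matching the C and ADV coefficients in the printout.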
(b) Multiple Regression Analysis
$$Y = a_1 + b_1 X_1 + b_2 X_2$$
where Y is sales volume and a1 is the intercept;
X1 is advertising expenditure and b1 = ΔY/ΔX1 is the marginal effect of advertising on sales;
X2 is the price of the product and b2 = ΔY/ΔX2 is the marginal effect of price on sales.
Example 2 (taken from ECO556 Manual, Table 4.3, page 141)

| Year | Sales, Y (million dollars) | Advertising Expenditure, X1 (million dollars) | Price, X2 (million dollars) |
|---|---|---|---|
| 1997 | 44 | 10 | 1 |
| 1998 | 58 | 13 | 1.2 |
| 1999 | 48 | 11 | 2 |
| 2000 | 46 | 12 | 1.8 |
| 2001 | 42 | 11 | 2.1 |
| 2002 | 60 | 15 | 0.8 |
| 2003 | 52 | 12 | 1.4 |
| 2004 | 54 | 13 | 2.0 |
| 2005 | 56 | 14 | 1.5 |
| 2006 | 40 | 9 | 1.0 |
The result from the computer printout:
LS // Dependent variable is SAL
SMPL range 1986 - 1995
Number of observations: 10

| Variable | Coefficient | Std. Error | T-Stat | 2-Tail Sig. |
|---|---|---|---|---|
| C | 11.60403 | 6.9633945 | 1.6665152 | 0.140 |
| ADV | 3.4936051 | 0.5078770 | 6.8788413 | 0.000 |
| P | -2.3836921 | 1.9495316 | -1.2226999 | 0.261 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.877397 | Mean of dependent var | 50.00000 |
| Adjusted R-squared | 0.842367 | S.D. of dependent var | 6.992059 |
| S.E. of regression | 2.776058 | Sum of squared resid | 53.94549 |
| Durbin-Watson stat | 1.41 | F-statistic | 25.04734 |
$$\hat{Y} = \hat{a}_1 + \hat{b}_1 X_1 + \hat{b}_2 X_2 \quad\Rightarrow\quad \hat{Y} = 11.60 + 3.49X_1 - 2.38X_2$$
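With two independent variables, the least-squares estimates solve the normal equations, written in deviations from the means (lowercase letters denote deviations): Σx1y = b1Σx1² + b2Σx1x2 and Σx2y = b1Σx1x2 + b2Σx2², after which a1 = Ȳ − b1X̄1 − b2X̄2. The Python sketch below solves this 2×2 system with Cramer's rule using the Table 4.3 data. It is one way to reproduce the printout by hand, not necessarily how the software computes it, and the helper names are illustrative.

```python
# Example 2 data (Table 4.3), in year order 1997-2006
sales = [44, 58, 48, 46, 42, 60, 52, 54, 56, 40]         # Y
advertising = [10, 13, 11, 12, 11, 15, 12, 13, 14, 9]    # X1
price = [1, 1.2, 2, 1.8, 2.1, 0.8, 1.4, 2.0, 1.5, 1.0]   # X2


def mean(values):
    return sum(values) / len(values)


def dev_cross(u, v):
    """Sum of products of deviations from the means: sum((u - u_bar) * (v - v_bar))."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))


def two_regressor_ols(x1, x2, y):
    """Least-squares estimates (a1, b1, b2) for Y = a1 + b1*X1 + b2*X2."""
    s11, s22, s12 = dev_cross(x1, x1), dev_cross(x2, x2), dev_cross(x1, x2)
    s1y, s2y = dev_cross(x1, y), dev_cross(x2, y)
    # Normal equations:  s11*b1 + s12*b2 = s1y
    #                    s12*b1 + s22*b2 = s2y
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det   # Cramer's rule
    b2 = (s11 * s2y - s12 * s1y) / det
    a1 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return a1, b1, b2


a1, b1, b2 = two_regressor_ols(advertising, price, sales)
print(f"a1 = {a1:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")  # a1 = 11.60, b1 = 3.49, b2 = -2.38
```

The slopes agree with the printed ADV and P coefficients to about six decimal places, and the intercept (about 11.6046) rounds to the 11.60 used in the fitted equation.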
Evaluation of Results (Computer Printouts)
These are the important statistical results that should be interpreted:
- The sign of each estimated coefficient
- Coefficient of determination (R2)
- Standard error of estimate (Se)
- The t-statistics (t-stats)
- The F-statistics (F-stats)
Interpretation:
a. The sign of each estimated coefficient must be checked to see if it conforms to what is expected on theoretical grounds.
From Example 1:
$$\hat{Y} = 7.6 + 3.53X$$
The estimated coefficient on advertising is positive (+3.53), so it conforms to economic theory. Since both variables are measured in million dollars, each additional $1 million spent on advertising (X) increases sales (Y) by about $3.53 million.
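The sign check can be automated by recording the sign that theory predicts for each coefficient and comparing it with the estimate. A small Python sketch, assuming a positive expected sign for advertising and a negative one for price (the law of demand); the dictionary and function names are illustrative, and the coefficients are copied from the two printouts:

```python
# Expected signs from demand theory (assumed): advertising raises sales,
# a higher price lowers them.
EXPECTED_SIGNS = {"ADV": +1, "P": -1}


def check_signs(estimates, expected=EXPECTED_SIGNS):
    """Map each variable to True if its estimate has the expected (non-zero) sign."""
    return {name: coef * expected[name] > 0 for name, coef in estimates.items()}


example1 = {"ADV": 3.5333333}
example2 = {"ADV": 3.4936051, "P": -2.3836921}
print(check_signs(example1))  # {'ADV': True}
print(check_signs(example2))  # {'ADV': True, 'P': True}
```

For Example 2 both signs conform: advertising raises sales and price lowers them. A coefficient of exactly zero is reported as not conforming.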
b. Coefficient of Determination (R²)
The value of R² ranges from 0 to 1.
R² = 0: none of the variation in the dependent variable is explained by the independent variables.
R² = 1: all of the variation in the dependent variable is explained by the variation in the independent variables.
R² = 0.85 (Example 1): 85% of the variation in the dependent variable (sales) is explained by the variation in the independent variable, advertising expenditure. The other 15% cannot be explained by the regression. This may be due to the omission of some important independent variables.
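R² compares the unexplained variation (the sum of squared residuals) with the total variation of sales around its mean: R² = 1 − Σ(Y − Ŷ)² / Σ(Y − Ȳ)². The printout also reports the adjusted R², 1 − (1 − R²)(n − 1)/(n − k), which penalizes additional coefficients. A minimal Python sketch for Example 1, using only the standard language:

```python
# Example 1 data (Table 4.1)
sales = [44, 58, 48, 46, 42, 60, 52, 54, 56, 40]
advertising = [10, 13, 11, 12, 11, 15, 12, 13, 14, 9]
n, k = len(sales), 2   # k = number of estimated coefficients (a and b)

# Least-squares line Y-hat = a + bX
x_bar, y_bar = sum(advertising) / n, sum(sales) / n
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(advertising, sales))
     / sum((x - x_bar) ** 2 for x in advertising))
a = y_bar - b * x_bar
fitted = [a + b * x for x in advertising]

sse = sum((y - f) ** 2 for y, f in zip(sales, fitted))   # unexplained variation
sst = sum((y - y_bar) ** 2 for y in sales)                # total variation
r_squared = 1 - sse / sst
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k)
print(f"R-squared = {r_squared:.6f}, adjusted R-squared = {adj_r_squared:.6f}")
# R-squared = 0.851212, adjusted R-squared = 0.832614
```

Both values match the R-squared and Adjusted R-squared lines of the Example 1 printout.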
c. Standard Error of Estimate (Se)
It is a measure of the dispersion of data points around the line of best fit (the regression line). Actual points do not lie on the regression line but are dispersed above and below it, so the value predicted by the regression line is subject to error. Se measures the probable error in the predicted value.
For example, in Table 4.1, when advertising expenditure is $9 million, sales are $40 million. Using the regression result, predicted sales are 7.6 + 3.53(9) = $39.37 million, so the predicted value has an error of $0.63 million.
The standard error of estimate can be calculated using the following formula:
$$S_e = \sqrt{\frac{\sum_{t=1}^{n} (Y_t - \hat{Y}_t)^2}{n - k}}$$
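As a check on the formula, the Python sketch below recomputes Se for Example 1 as √(Σ(Yt − Ŷt)² / (n − k)) with n = 10 observations and k = 2 estimated coefficients; the result should match the "S.E. of regression" line (2.860653) in the printout.

```python
import math

# Example 1 data (Table 4.1)
sales = [44, 58, 48, 46, 42, 60, 52, 54, 56, 40]
advertising = [10, 13, 11, 12, 11, 15, 12, 13, 14, 9]
n, k = len(sales), 2

# Fit Y-hat = a + bX by least squares
x_bar, y_bar = sum(advertising) / n, sum(sales) / n
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(advertising, sales))
     / sum((x - x_bar) ** 2 for x in advertising))
a = y_bar - b * x_bar

# Sum of squared residuals, then divide by the degrees of freedom n - k
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(advertising, sales))
se = math.sqrt(sse / (n - k))
print(f"Se = {se:.6f}")  # Se = 2.860653
```

Dividing by n − k rather than n corrects for the k coefficients already estimated from the same data.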
Se is useful for estimating the range within which the dependent variable will lie at a specified probability. At 95% probability, the dependent variable will lie in the interval:
$$\hat{Y} \pm t_{n-k} \cdot S_e$$
where Ŷ is the predicted value of the dependent variable based on the regression, and t with n − k degrees of freedom (df) is the critical value from Student's t distribution (2.5% in each tail); n is the number of observations and k is the number of coefficients estimated.
Example: Se = 2.8. The 95% interval for sales when advertising expenditure (X) = 9 and Ŷ = 39.37, with t = 2.306 for n − k = 8 df, is:
$$\hat{Y} \pm t_{n-k} \cdot S_e = 39.37 \pm (2.306)(2.8) = 39.37 \pm 6.457$$
Thus, at 95% confidence, when advertising expenditure is $9 million, sales will range from $32.913 million to $45.827 million.
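The interval arithmetic can be sketched in Python with the same rounded inputs (Ŷ = 7.6 + 3.53 × 9, t = 2.306, Se = 2.8). This reproduces the notes' simplified interval Ŷ ± t·Se; the exact prediction interval for a single new observation is somewhat wider because it also allows for uncertainty in the estimated coefficients.

```python
y_hat = 7.6 + 3.53 * 9   # point prediction from Y-hat = 7.6 + 3.53X at X = 9
t_crit = 2.306           # Student's t, n - k = 8 df, 2.5% in each tail (from the t table)
se = 2.8                 # standard error of estimate, rounded from the printout

margin = t_crit * se
lower, upper = y_hat - margin, y_hat + margin
print(f"{y_hat:.2f} +/- {margin:.3f} -> [{lower:.3f}, {upper:.3f}]")
# 39.37 +/- 6.457 -> [32.913, 45.827]
```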
d. T-Statistics
The t-statistic is used in a t-test to determine whether there is a significant relationship between the dependent variable and each of the independent variables.
To do this test, we need the standard error of the coefficient (Sb) to calculate the t value. We then compare the calculated t value with the critical t value from the Student's t distribution table.
The t value is calculated by dividing the estimated coefficient (b) by its standard error (Sb):
$$t_{\text{calc}} = \frac{\hat{b}}{S_{\hat{b}}} = \frac{3.53}{0.52} = 6.79$$
To find the critical value from the Student's t distribution table: n − k = 10 − 2 = 8 df, and at the 95% confidence level (5% significance, two-tailed), t critical = 2.306.
Since t computed (6.79) > t critical (2.306), advertising expenditure is statistically significant in explaining the variations in sales at the 95% confidence level.
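The t-test can be written as a small Python function. This sketch uses the unrounded coefficients and standard errors from the printouts, so Example 1 gives t ≈ 6.77 rather than the 6.79 obtained from the rounded 3.53/0.52; the conclusion is unchanged. It also tests each independent variable in Example 2, where n − k = 10 − 3 = 7 df and the 5% two-tailed critical value is 2.365. The function name `t_test` is illustrative.

```python
def t_test(coefficient, std_error, t_critical):
    """Return the calculated t value and whether |t| exceeds the critical value."""
    t_calc = coefficient / std_error
    return t_calc, abs(t_calc) > t_critical


# Example 1: n - k = 10 - 2 = 8 df, 5% two-tailed critical value 2.306
t_adv, sig_adv = t_test(3.5333333, 0.5222813, 2.306)
print(f"Example 1  ADV: t = {t_adv:.2f}, significant = {sig_adv}")

# Example 2: n - k = 10 - 3 = 7 df, 5% two-tailed critical value 2.365
for name, coef, std_err in [("ADV", 3.4936051, 0.5078770), ("P", -2.3836921, 1.9495316)]:
    t_calc, significant = t_test(coef, std_err, 2.365)
    print(f"Example 2  {name}: t = {t_calc:.2f}, significant = {significant}")
```

In Example 2 the price coefficient (|t| ≈ 1.22) is not significant at the 5% level, consistent with its 2-Tail Sig. of 0.261 in the printout.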
Note: if there is more than one independent variable, the significance of each independent variable must be tested.
e. Durbin-Watson Statistic
It indicates the presence or absence of autocorrelation, a problem that can arise in regression analysis with time-series data.
There are 3 situations in which autocorrelation or multicollinearity problems can arise:
- When independent variables are interrelated or duplicated
- When independent variables have been mis-specified
- When important independent variables are missing
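The Durbin-Watson statistic is computed from the regression residuals e_t, taken in time order: DW = Σ(e_t − e_{t−1})² / Σe_t². Values near 2 suggest no first-order autocorrelation, while values well below 2 point toward positive autocorrelation; a formal decision needs the lower and upper bounds from a Durbin-Watson table. A minimal Python sketch that recomputes the Example 1 value:

```python
# Example 1 data (Table 4.1), in year order 1997-2006
sales = [44, 58, 48, 46, 42, 60, 52, 54, 56, 40]
advertising = [10, 13, 11, 12, 11, 15, 12, 13, 14, 9]
n = len(sales)

# Fit Y-hat = a + bX by least squares
x_bar, y_bar = sum(advertising) / n, sum(sales) / n
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(advertising, sales))
     / sum((x - x_bar) ** 2 for x in advertising))
a = y_bar - b * x_bar

# Residuals kept in time order (the statistic depends on the ordering)
residuals = [y - (a + b * x) for x, y in zip(advertising, sales)]

# DW = sum of squared successive differences / sum of squared residuals
numerator = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, n))
denominator = sum(e ** 2 for e in residuals)
dw = numerator / denominator
print(f"Durbin-Watson = {dw:.6f}")  # Durbin-Watson = 1.224915
```

The result matches the "Durbin-Watson stat" line of the Example 1 printout.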
f. F-statistics
It is another test of overall explanatory
power of regression analysis. (Refer pg 147 manual)
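The F-statistic can be computed from R²: F = [R²/(k − 1)] / [(1 − R²)/(n − k)], where k counts all estimated coefficients including the intercept. It tests whether all slope coefficients are jointly zero, so the calculated F is compared with the critical F for (k − 1, n − k) degrees of freedom (about 5.32 for (1, 8) and 4.74 for (2, 7) at the 5% level). A short Python sketch using the R² values from the two printouts; `f_statistic` is an illustrative name:

```python
def f_statistic(r_squared, n, k):
    """Overall F test: (R^2 / (k - 1)) / ((1 - R^2) / (n - k)).

    n = number of observations, k = number of estimated coefficients
    (including the intercept), so k - 1 is the number of slope coefficients.
    """
    return (r_squared / (k - 1)) / ((1 - r_squared) / (n - k))


f1 = f_statistic(0.851212, n=10, k=2)   # Example 1: intercept + ADV
f2 = f_statistic(0.877397, n=10, k=3)   # Example 2: intercept + ADV + P
print(f"Example 1: F = {f1:.2f}; Example 2: F = {f2:.2f}")
# Example 1: F = 45.77; Example 2: F = 25.05
```

Both values agree with the printed F-statistics (45.76782 and 25.04734) up to the rounding of R², and both exceed their 5% critical values.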
----end of short notes on demand estimation----