Demand Estimation by Using Regression Analysis


Regression analysis is a statistical method used to establish the relationship between one variable (the dependent variable) and the other factors that affect it (the independent variables).

This relationship can be expressed as a functional form:

Q = a0 + a1 A + a2 B + a3 C

Demand estimation for a product or service using regression analysis is important in the business world, especially to corporate executives and managers, because it enables them to make reasonable forecasts of demand for their goods and services in the near future. Managers can narrow down the factors that matter most in influencing their sales and then formulate appropriate strategies or policies to achieve their management objectives.

The actual process of regression analysis can be very complex, but it can be summarized in FOUR important steps:
  1. Model specification: set the objective and identify the important variables that influence the dependent variable.
  2. Data collection: gather data for all the variables specified.
  3. Choice of functional form, e.g. linear or non-linear.
  4. Estimation and interpretation of results.

1. Model Specification
If we want to study the factors affecting the demand for automobiles (Qx) in a country, we must identify the most important variables believed to affect that demand, e.g.
a)  Price of the automobile (Px)
b)  Per capita income (Yc)
c)  Size of the working population (L)
d)  Rate of interest (I), etc.
Qx = f(Px, Yc, L, I, …)


2. Data Collection on the Variables
There are two types of data:
a)  Time-series data
    Data are collected for each variable over time (yearly, quarterly, monthly, daily, etc.).

b)  Cross-sectional data
    Data are collected for the same time period but from different sections or geographical areas of society.

The type of data used depends on what is available. Data also differ by source:
a)  Primary data: data collected from the field through market surveys, sampling, etc.

b)  Secondary data: data published by a relevant authority, such as a statistics department, economic reports, etc.

3. Specifying the Form of the Equation
i)  Linear model
    The simplest model to deal with, and often also the most realistic, is the linear model.

e.g.   Qx = a0 + a1 Px + a2 Yc + a3 L + a4 I + … + e
a0, a1, …, a4 are parameters (coefficients) to be estimated
e = disturbance term or error term

ii) Non-linear model
    Sometimes a non-linear form may fit the data better than a linear equation.

$$Q_x = a_0 \, P_x^{\alpha_1} \, Y_c^{\alpha_2} \, L^{\alpha_3} \, I^{\alpha_4}$$          (Power Function)
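
A power function can still be estimated with linear regression once it is transformed. As a sketch of the standard approach (the original notes show no error term for this form, so none is shown here), taking natural logarithms of both sides gives an equation that is linear in the parameters:

```latex
\ln Q_x = \ln a_0 + \alpha_1 \ln P_x + \alpha_2 \ln Y_c + \alpha_3 \ln L + \alpha_4 \ln I
```

In this log-linear form, each exponent α is the elasticity of demand with respect to the corresponding variable.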

4. Testing the (Econometric) Results
To evaluate the regression results, several statistics are examined:
a)  The sign of each estimated coefficient, which must be checked to see if it conforms to what is expected on theoretical grounds
b)  Coefficient of determination (R2)
c)  t-tests on the coefficients
d)  Durbin-Watson statistic, etc.
e)  The F-statistic

Note: The statistical procedures for solving multiple regression problems can be very complicated. Fortunately, many software packages are available for the purpose, e.g. TSP (Time Series Processor) or SPSS.

REGRESSION ANALYSIS

Regression analysis describes the way in which one variable is related to another. It derives an equation that can be used to estimate the unknown values of one variable on the basis of known values of another variable.

(a) Simple Regression Analysis

                  Y  = a + bX     where Y is sales volume & X is advertising expenditure

Example 1

(Taken from ECO556 Manual Table 4.1, page 136 )

Year    Sales (Y)            Advertising Expenditure (X)
        (million dollars)    (million dollars)
1997    44                   10
1998    58                   13
1999    48                   11
2000    46                   12
2001    42                   11
2002    60                   15
2003    52                   12
2004    54                   13
2005    56                   14
2006    40                    9

The result from the computer printout:

LS // Dependent variable is SAL
SMPL range 1986 - 1995
Number of observations 10

Variable    Coefficient    Std. Error    T-Stat       2-Tail Sig.
C           7.6000000      6.332345      1.2001912    0.264
ADV         3.5333333      0.5222813     6.751919     0.000

R-squared                   0.851212                Mean of dependent var     50.00000
Adjusted R-squared   0.832614                 S.D of dependent var        6.992059
S.E. of regression       2.860653                 Sum of squared resid        65.46667
Durbin-Watson stat   1.224915                 F-statistic                           45.76782
Log likelihood          -23.58417               

$$\hat{Y} = \hat{a} + \hat{b}X$$

=>  $$\hat{Y} = 7.6 + 3.53X$$
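
The coefficients in the printout can be checked by hand with the ordinary least-squares formulas. The following is a minimal sketch in plain Python using the Table 4.1 data; the function name `simple_ols` and the variable names are illustrative and not part of the original notes or of TSP/SPSS.

```python
# Table 4.1 data (million dollars), copied from Example 1.
sales = [44, 58, 48, 46, 42, 60, 52, 54, 56, 40]        # Y
advertising = [10, 13, 11, 12, 11, 15, 12, 13, 14, 9]   # X


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares, in deviations from the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx           # slope
    a = y_bar - b * x_bar     # intercept
    return a, b


a_hat, b_hat = simple_ols(advertising, sales)
print(f"Y-hat = {a_hat:.2f} + {b_hat:.2f} X")
```

Run on this data, the slope and intercept agree with the ADV and C coefficients reported in the printout.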


(b) Multiple Regression Analysis

                  Y = a1 + b1 X1 + b2 X2

where Y is sales volume,
      a1 is the intercept,
      X1 is advertising expenditure, and b1 = ∂Y/∂X1 is the marginal effect of advertising on sales,
      X2 is the price of the product, and b2 = ∂Y/∂X2 is the marginal effect of price on sales.

 

Example 2

(Taken from ECO556 Manual Table 4.3, page 141 )

Year    Sales (Y)            Advertising Expenditure (X1)    Price (X2)
        (million dollars)    (million dollars)               (million dollars)
1997    44                   10                              1.0
1998    58                   13                              1.2
1999    48                   11                              2.0
2000    46                   12                              1.8
2001    42                   11                              2.1
2002    60                   15                              0.8
2003    52                   12                              1.4
2004    54                   13                              2.0
2005    56                   14                              1.5
2006    40                    9                              1.0

The result from the computer printout:

LS // Dependent variable is SAL
SMPL range 1986 - 1995
Number of observations 10

Variable    Coefficient    Std. Error    T-Stat        2-Tail Sig.
C           11.60403       6.9633945     1.6665152     0.140
ADV         3.4936051      0.5078770     6.8788413     0.000
P           -2.3836921     1.9495316     -1.2226999    0.261

R-squared                   0.877397                Mean of dependent var     50.00000
Adjusted R-squared   0.842367                S.D of dependent var         6.992059
S.E. of regression       2.776058                Sum of squared resid         53.94549
Durbin-Watson stat    1.41                        F-statistic                           25.04734 

$$\hat{Y} = \hat{a}_1 + \hat{b}_1 X_1 + \hat{b}_2 X_2$$

=>  $$\hat{Y} = 11.60 + 3.49X_1 - 2.38X_2$$
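
With two independent variables, the least-squares coefficients solve a pair of normal equations. The sketch below solves them directly in plain Python for the Table 4.3 data; the helper names (`mean`, `cross`, `ols_two_regressors`) are hypothetical and chosen only for this illustration, and a statistics package would normally do this step.

```python
# Table 4.3 data, copied from Example 2.
sales = [44, 58, 48, 46, 42, 60, 52, 54, 56, 40]               # Y
advertising = [10, 13, 11, 12, 11, 15, 12, 13, 14, 9]          # X1
price = [1, 1.2, 2, 1.8, 2.1, 0.8, 1.4, 2.0, 1.5, 1.0]         # X2


def mean(values):
    return sum(values) / len(values)


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))


def ols_two_regressors(x1, x2, y):
    """Solve the normal equations for y = a1 + b1*x1 + b2*x2."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2          # zero if x1 and x2 are perfectly collinear
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a1 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return a1, b1, b2


a1_hat, b1_hat, b2_hat = ols_two_regressors(advertising, price, sales)
print(f"Y-hat = {a1_hat:.2f} + {b1_hat:.2f} X1 + ({b2_hat:.2f}) X2")
```

The estimates match the printout to the two decimal places used in the fitted equation: a positive coefficient on advertising and a negative coefficient on price.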


Evaluation of Results (Computer Printouts)

The following important statistical results should be interpreted:

  1. The sign of each estimated coefficient
  2. Coefficient of determination (R2)
  3. Standard error of estimate (Se)
  4. The t-statistics (t-stats)
  5. The F-statistics (F-stats)

Interpretation :
a.         The sign of each estimated coefficient must be checked to see if it conforms to what is expected on theoretical grounds.

            From Example 1:       $$\hat{Y} = 7.6 + 3.53X$$

            The estimated coefficient on advertising is positive (+3.53), so it conforms to economic theory. Each additional $1 million of advertising expenditure (X) is estimated to raise sales (Y) by $3.53 million.

b.         Coefficient of determination (R2)
            The value of R2 ranges from 0 to 1.

R2 = 0           (none of the variation in the dependent variable is explained by the independent variables)
R2 = 1           (all of the variation in the dependent variable is explained by the variation in the independent variables)
R2 = 0.85      (85% of the variation in the dependent variable is explained by the variation in the independent variable, advertising expenditure. The other 15% cannot be explained by the regression. This may be due to the omission of some important independent variables.)
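
The R2 value in the Example 1 printout can be reproduced from the data. This is a minimal sketch that assumes the usual definition, R2 = 1 - (sum of squared residuals) / (total sum of squares), which the notes do not state explicitly; the coefficients are taken from the printout.

```python
# Table 4.1 data and the fitted coefficients from the Example 1 printout.
sales = [44, 58, 48, 46, 42, 60, 52, 54, 56, 40]        # Y
advertising = [10, 13, 11, 12, 11, 15, 12, 13, 14, 9]   # X
a_hat, b_hat = 7.6, 3.5333333

fitted = [a_hat + b_hat * x for x in advertising]
y_bar = sum(sales) / len(sales)

# Unexplained variation (residuals) and total variation around the mean.
ss_resid = sum((y - f) ** 2 for y, f in zip(sales, fitted))
ss_total = sum((y - y_bar) ** 2 for y in sales)

r_squared = 1 - ss_resid / ss_total
print(f"Sum of squared resid = {ss_resid:.5f}, R-squared = {r_squared:.6f}")
```

The residual sum of squares and R2 computed this way agree with the "Sum of squared resid" and "R-squared" entries of the printout.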




c.         Standard error of estimate (Se)
It is a measure of the dispersion of the data points around the line of best fit (the regression line). The actual points do not lie on the regression line but are scattered above and below it, so any value predicted by the regression line is subject to error. Se measures the probable size of that error in the predicted value.

For example, in Table 4.1, when advertising expenditure is $9 million, actual sales are $40 million, whereas the regression predicts sales of $39.37 million. The predicted value therefore contains an error.
The standard error of estimate can be calculated using the following formula:

$$S_e = \sqrt{\frac{\sum_{t=1}^{n} (Y_t - \hat{Y}_t)^2}{n - k}}$$

Se is useful for estimating the range within which the dependent variable will lie at a specified probability. At 95% probability, the dependent variable will lie in the predicted interval:

$$\hat{Y} \pm t_{n-k} \cdot S_e$$

where Ŷ is the predicted value of the dependent variable based on the regression, and n - k is the degrees of freedom (df), used to look up the critical value from Student's t distribution; n is the number of observations and k is the number of coefficients estimated.






Example:
Se = 2.8. For the 95% confidence interval of sales when advertising expenditure (X) = 9, so that Ŷ = 39.37:
            Ŷ ± t(n - k) * Se
 =>  39.37 ± (2.306)(2.8)
 =>  39.37 ± 6.457
Thus, at the 95% confidence level, when advertising expenditure is $9 million, sales range from $32.913 million to $45.827 million.
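
The same interval can be computed step by step. This sketch takes the sum of squared residuals from the Example 1 printout and the critical value 2.306 from the t table as given in the notes; it uses the unrounded Se (about 2.86) rather than the rounded 2.8 above, so the interval comes out slightly wider.

```python
import math

n, k = 10, 2                  # observations and estimated coefficients
ss_resid = 65.46667           # "Sum of squared resid" from the printout
t_crit = 2.306                # t-table value for 8 df at 95%, as used in the notes

# Standard error of estimate: square root of SSR divided by degrees of freedom.
se = math.sqrt(ss_resid / (n - k))

# Point prediction for advertising expenditure of 9, using the rounded equation.
y_hat = 7.6 + 3.53 * 9

margin = t_crit * se
lower, upper = y_hat - margin, y_hat + margin
print(f"Se = {se:.4f}, interval = {lower:.3f} to {upper:.3f}")
```

Because the critical value is looked up rather than computed, a different confidence level only requires changing `t_crit`.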

d.    t-Statistics
The t-statistic is used in the t-test to determine whether there is a significant relationship between the dependent variable and each independent variable. To carry out the test, we need the standard error of the coefficient (Sb) to calculate the t value. We then compare the calculated t value with the critical t value from the Student's t distribution table.

The t value is calculated by dividing the estimated coefficient (b) by Sb:

$$t = \frac{\hat{b}}{S_{\hat{b}}}$$

i.e.   calculated t = 3.53 / 0.52 = 6.79

To find the critical value from the Student's t distribution table:
            n - k = 10 - 2 = 8 df; at the 95% confidence level, critical t = 2.306
Since the calculated t (6.79) > the critical t (2.306), advertising expenditure is statistically significant in explaining the variation in sales at the 95% confidence level.
Note: if there is more than one independent variable, the significance of each independent variable must be tested.
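
The standard error of the slope is not derived in these notes. As a sketch, assuming the standard simple-regression formula Sb = Se / sqrt(Σ(X - X̄)²), the t-test for Example 1 can be reproduced as follows; Se and the slope are taken from the printout, and 2.306 is the table value quoted above.

```python
import math

advertising = [10, 13, 11, 12, 11, 15, 12, 13, 14, 9]   # X from Table 4.1
se = 2.860653          # "S.E. of regression" from the printout
b_hat = 3.5333333      # ADV coefficient from the printout
t_crit = 2.306         # t-table value for 8 df at 95%

x_bar = sum(advertising) / len(advertising)
s_xx = sum((x - x_bar) ** 2 for x in advertising)

s_b = se / math.sqrt(s_xx)       # standard error of the slope
t_calc = b_hat / s_b             # calculated t value
significant = abs(t_calc) > t_crit

print(f"Sb = {s_b:.4f}, t = {t_calc:.2f}, significant: {significant}")
```

With unrounded inputs the ratio is about 6.77 instead of the 6.79 obtained from the rounded figures 3.53 and 0.52; the conclusion of the test is the same.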








e.  Durbin-Watson Statistic

     It indicates the presence or absence of autocorrelation, a problem that can arise in regression analysis with time-series data.

There are 3 situations in which autocorrelation or multicollinearity problems can arise:
·        When independent variables are interrelated or duplicated
·        When independent variables have been misspecified
·        When important independent variables are missing
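
The notes do not give the formula for this statistic. As a sketch, assuming the usual definition DW = Σ(e_t - e_{t-1})² / Σ e_t², where e_t are the regression residuals in time order, the value in the Example 1 printout can be reproduced like this:

```python
# Table 4.1 data in time order and the fitted coefficients from the printout.
sales = [44, 58, 48, 46, 42, 60, 52, 54, 56, 40]        # Y
advertising = [10, 13, 11, 12, 11, 15, 12, 13, 14, 9]   # X
a_hat, b_hat = 7.6, 3.5333333

# Residual for each year: actual sales minus fitted sales.
residuals = [y - (a_hat + b_hat * x) for x, y in zip(advertising, sales)]

# Squared changes between consecutive residuals, relative to squared residuals.
numerator = sum(
    (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
)
denominator = sum(e ** 2 for e in residuals)

durbin_watson = numerator / denominator
print(f"Durbin-Watson stat = {durbin_watson:.6f}")
```

The statistic lies between 0 and 4; values near 2 suggest no first-order autocorrelation, while values well below 2 point towards positive autocorrelation.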


f.  F-statistic
   It is another test of the overall explanatory power of the regression. (Refer to page 147 of the manual.)
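
The manual's own treatment is not reproduced in these notes. As a sketch, assuming the common formula F = (R2 / (k - 1)) / ((1 - R2) / (n - k)), the F-statistics in both printouts can be recovered from their R2 values; the function name `f_statistic` is illustrative.

```python
def f_statistic(r_squared, n, k):
    """F-statistic from R-squared, n observations and k estimated coefficients."""
    explained = r_squared / (k - 1)            # explained variation per regressor
    unexplained = (1 - r_squared) / (n - k)    # unexplained variation per df
    return explained / unexplained


# R-squared values taken from the two printouts.
f_simple = f_statistic(0.851212, n=10, k=2)      # Example 1: constant + ADV
f_multiple = f_statistic(0.877397, n=10, k=3)    # Example 2: constant + ADV + P

print(f"Example 1 F = {f_simple:.2f}, Example 2 F = {f_multiple:.2f}")
```

Both results agree with the F-statistic entries in the printouts. The calculated F is then compared with the critical value from an F table with (k - 1, n - k) degrees of freedom.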


----end of short notes on demand estimation----
 
