Maximum Entropy Autoregressive Conditional Heteroskedasticity Model*
Anil K. Bera
Department of Economics, University of Illinois
1206 S. Sixth, Champaign, IL, 61820
abera@uiuc.edu
and
Sung Yong Park
Department of Economics, University of Illinois
1206 S. Sixth, Champaign, IL, 61820
sungpark@uiuc.edu
[ Preliminary paper ]
* We thank the participants of the Conference on Recent Developments in the Theory, Method, and Application of Information and Entropy Econometrics at American University, Sep. 19-21, 2003, and of the 13th Annual Meeting of the Midwest Econometrics Group at the University of Missouri, Oct. 17-18, 2003, for helpful comments and discussions. Correspondence to: Anil K. Bera, Department of Economics, University of Illinois, 1206 S. 6th Street, Champaign, IL 61820, U.S.A.; email: anil@fisher.econ.uiuc.edu
Abstract
The maximum entropy (ME) approach is based on the efficient use of available information. As is well known, many of the standard families of distributions can be derived from the ME principle. One purpose of this paper is to show how we can extract the information functional from the data; for example, if the information, which takes the form of estimating equations, is given as arithmetic moment conditions, the resulting density has a very attractive form. We propose a criterion for selecting side conditions in ME problems and apply it to formulate an autoregressive conditional heteroskedasticity (ARCH) model. The procedure is illustrated with an application to leptokurtic and negatively skewed financial data.
Keywords: Maximum entropy density, Semiparametric density, Density estimation, ARCH, GARCH.
1 Introduction
The second law of thermodynamics states that, since there is an inherent tendency for disorder to increase, the entropy of the physical universe increases constantly. Jaynes (1979) pointed out that a particular measure can be defined on the space of probability distributions such that a distribution of higher entropy represents greater disorder, and is smoother and more probable. If there is no prior information, the uniform distribution maximizes the entropy. Thus maximizing entropy is equivalent to minimizing a certain distance measure between the uniform distribution and the subject distribution. Measures or pseudo-distance functions of this kind have been provided in the statistics literature [Kullback and Leibler (1961), Renyi (1960), Cressie and Read (1984), among others]. If one maximizes this measure subject to certain constraints, which possibly contain some unknown parameters, then one obtains a very probable distribution and consistent estimates of the parameters. Generally, this is called the information theoretic approach.
In this paper we use the maximum entropy density as the conditional density function in ARCH-type models. The maximum entropy density (MED) is obtained by maximizing Shannon's (1948) entropy measure subject to known moment conditions. It is well known that the MED is the least biased distribution given the known constraints. By choosing different sequences of side constraint functionals, we obtain various flexible MED functions. We show some useful characterizations of the MED in the continuous case. Although the MED has such a flexible functional form, there has been relatively little research on it in the economics and econometrics literature. Since no analytic solution is available when there are more than two moment constraints, the problem must be solved by iterative numerical nonlinear optimization. Valuable applications of the MED in the economics literature include Zellner and Highfield (1988), Buchen and Kelly (1996), Ormoneit and White (1999), Rockinger and Jondeau (2002) and Wu (2003).
Since Engle's (1982) pioneering work and its generalization by Bollerslev (1986), ARCH-type models have been widely used to explain the behavior of financial time series. Numerous studies have extended ARCH-type models in two directions. The first extension concentrates on generalizing the conditional variance function to nonlinear forms. Geweke (1986) and Milhøj (1987) suggested the logarithmic ARCH model to eliminate the non-negativity restrictions on the parameters of the conditional variance function. Nelson (1991) proposed the exponential GARCH (EGARCH) model to explain the "leverage effect". Higgins and Bera (1992) proposed a nonlinear ARCH (NARCH) model and showed that the logarithmic ARCH model is a special case of NARCH. The second extension deals with the form of the conditional density. Various ARCH-type models with non-normal conditional density functions have been proposed to explain the leptokurtic behavior of the unconditional density. Bollerslev (1987) used the Student-t distribution. However, Hsieh (1989) found that the GARCH(1,1) model with the Student-t distribution could not capture the excess kurtosis of daily returns on the British pound and the Japanese yen. Nelson (1991) employed the generalized error distribution (GED) with the EGARCH model but found that the estimated GED did not have thick enough tails to capture the unconditional leptokurtosis. Skewed conditional densities have also been used to explain the skewness of the unconditional density. Lee and Tse (1991) used a distribution based on the first three terms of the Gram-Charlier series. Engle and González-Rivera (1991) adopted a nonparametric conditional density, and Premaratne and Bera (2000) used the Pearson type-IV distribution as the conditional density function. If we impose certain moment side conditions, we can obtain the normal, Student-t, GED and Pearson type-IV distributions from the maximum entropy formalism. In this sense, the proposed maximum entropy ARCH (MEARCH) model is a very general one. Rockinger and Jondeau (2002) applied the MED to the GARCH model. Since they considered the first four arithmetic moment conditions, the resulting MED has the form of a quartic exponential distribution. However, the quartic exponential distribution cannot capture heavy tail behavior as well as the Student-t distribution does.
The MEARCH model is closely related to other moment-based estimation methods such as the generalized method of moments (GMM) and maximum empirical likelihood (MEL) estimation. All of these can be considered within the estimating function (EF) approach.
The purpose of this paper is twofold. First, we present the characterization of the continuous MED and show how the moment equations capture the behavior of skewed and leptokurtic financial time series in ARCH-type models. Second, we introduce an estimation procedure for the MEARCH model and suggest moment selection criteria based on the Rao score test.
The rest of the paper is organized as follows. Section 2 summarizes recent information-theoretic approaches. Section 3 presents some basic characteristics of the MED, together with the characterization problem and a numerical procedure for the MED. In Section 4, we propose an ARCH-type model under the ME conditional density, with estimation and a moment selection test. Section 5 presents simple empirical applications to daily NYSE returns with specific moment equations that generate skewed and heavy-tailed distributions. Concluding remarks follow in Section 6.
2 Information Theoretic Approaches
Consider an estimator based on minimization of the Cressie-Read power divergence (CRPD) measure (Cressie and Read (1984)). For a fixed scalar parameter $\lambda$, let us define
\[ I_\lambda(p, q) = \frac{1}{\lambda(\lambda+1)} \sum_{i=1}^{n} p_i \left[ \left( \frac{p_i}{q_i} \right)^{\lambda} - 1 \right], \]
where $p = (p_1, p_2, \cdots, p_n)'$ and $q = (q_1, q_2, \cdots, q_n)'$. We refer to the first distribution in the argument list, $p$, and the second, $q$, as the subject distribution and the reference distribution, respectively. Cressie and Read (1984) used $I_\lambda(p, q)$ as a measure of the discrepancy between $p$ and $q$. $I_\lambda(p, q)$ provides a very rich class of divergence measures. If we choose $p = n^{-1}\mathbf{1}$, where $\mathbf{1}$ is an $n \times 1$ vector of ones, and $q = \pi$, then as $\lambda \to 0$ the limit of $I_\lambda(p, q)$ can be obtained by L'Hôpital's rule:
\[ \lim_{\lambda \to 0} I_\lambda(n^{-1}\mathbf{1}, \pi) = KL(n^{-1}\mathbf{1}, \pi) = -n^{-1} \sum_{i=1}^{n} \ln(\pi_i) - \ln(n), \]
which is the negative log empirical likelihood (EL) objective function except for an additive constant, where KL is the Kullback-Leibler information [see Owen (2001), p. 35, Imbens (1993) and Qin and Lawless (1994) for MEL estimation]. As $\lambda \to -1$, we have
\[ \lim_{\lambda \to -1} I_\lambda(n^{-1}\mathbf{1}, \pi) = KL(\pi, n^{-1}\mathbf{1}) = \sum_{i=1}^{n} \pi_i \ln \pi_i + \ln(n), \]
which is the same as the negative of the maximum entropy empirical likelihood (MEEL) [Mittelhammer, Judge and Miller (2000)] or exponential tilting (ET) [Golan and Judge (1996), Kitamura and Stutzer (1997), Imbens, Spady and Johnson (1998), and Choi, Hall and Presnell (2000)] objective function, except for a constant. If $\lambda = -2$, $I_{\lambda=-2}(n^{-1}\mathbf{1}, \pi)$ yields the objective function of maximum log Euclidean likelihood (MLEL), which is closely related to the generalized method of moments (GMM) estimator proposed by Hansen, Heaton and Yaron (1996):
\[ I_{\lambda=-2}(n^{-1}\mathbf{1}, \pi) = \frac{1}{2} \sum_{i=1}^{n} \frac{1}{n} \left[ n^2 \pi_i^2 - 1 \right]. \]
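The three special cases above can be checked numerically. The following is a minimal sketch, assuming NumPy; the function names `crpd` and `kl` and the weight vector `pi` are hypothetical choices made only for illustration, and the two limits are approximated by evaluating the divergence very close to $\lambda = 0$ and $\lambda = -1$.

```python
import numpy as np

def crpd(p, q, lam):
    """Cressie-Read power divergence I_lambda(p, q), for lam not in {0, -1}."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sum(p * ((p / q) ** lam - 1.0)) / (lam * (lam + 1.0))

def kl(p, q):
    """Kullback-Leibler information KL(p, q) = sum_i p_i ln(p_i / q_i)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

n = 5
uniform = np.full(n, 1.0 / n)                     # empirical weights n^{-1} 1
pi = np.array([0.10, 0.15, 0.20, 0.25, 0.30])     # illustrative weights

el_limit = crpd(uniform, pi, 1e-6)                # lambda -> 0  (EL case)
et_limit = crpd(uniform, pi, -1.0 + 1e-6)         # lambda -> -1 (MEEL / ET case)
euclid = crpd(uniform, pi, -2.0)                  # lambda = -2  (Euclidean case)

print(el_limit, -np.mean(np.log(pi)) - np.log(n))
print(et_limit, np.sum(pi * np.log(pi)) + np.log(n))
print(euclid, 0.5 * np.sum((n ** 2 * pi ** 2 - 1.0) / n))
```

Each printed pair should agree up to the small error introduced by approximating the limit.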
We consider the estimator which minimizes the CRPD measure with respect to the empirical distribution and satisfies the moment equations $E[\psi_j(x, \theta)] = 0$ for $j = 1, 2, \cdots, N$ and the normalization constraint $\sum_{i=1}^{n} \pi_i = 1$, where $\theta \in \Theta$; i.e., we solve the optimization problem
\[ \min_{\pi, \theta} \lim_{\lambda \to c} I_\lambda(n^{-1}\mathbf{1}, \pi) \quad \text{s.t.} \quad \sum_{i=1}^{n} \pi_i \psi_j(x_i, \theta) = 0, \quad \sum_{i=1}^{n} \pi_i = 1, \quad j = 1, 2, \cdots, N. \]
The objective function in the minimization problem depends on the constant $c$. In the over-identified case the role of $\pi$ is similar to that of the weight matrix $\Delta^{-1} = [E(\psi(Z, \theta) \cdot \psi(Z, \theta)')]^{-1}$ in the GMM estimator, in the sense that both $\pi$ and $\Delta^{-1}$ give efficient weights for estimating $\theta$. However, since one does not have information about $\Delta^{-1}$, in the 2-step GMM estimator $\hat{\Delta}^{-1}$ has to be estimated from an inefficient estimate $\hat{\theta}$ obtained with an arbitrary weight matrix. In contrast, $\hat{\pi}$ is chosen so as to minimize the CRPD measure. Thus, the $\hat{\pi}$-weighted estimate would be more efficient in finite samples. If the side conditions are accurate, the MEL estimator is equivalent to the 2-step GMM estimator up to order $O_p(N^{-1/2})$, and the limiting distributions of the MEL and MEEL estimates of $\theta$ are identical under certain regularity conditions [see Imbens (1997)].
In the following section we focus on the problem of minimizing the CRPD measure in the case $\lambda \to -1$, i.e., the MEEL or ET estimation case, under known side conditions. The solution of this problem, well known in the information theory literature, is the maximum entropy distribution (MED), whose exponential tilting parameters are the Lagrange multipliers of our constrained optimization problem.
3 Maximum entropy density
The maximum entropy density is obtained by maximizing Shannon's (1948) entropy measure, which is the continuous version of the negative of $\lim_{\lambda \to -1} I_\lambda(n^{-1}\mathbf{1}, \pi)$ except for the constant $\ln(n)$,
\[ \max_f H(f) = -\int f(x) \log f(x)\,dx \tag{3.1} \]
satisfying
\[ \int \phi_j(x) f(x)\,dx = \mu_j, \quad j = 0, 1, \cdots, N, \]
with the $\mu_j$ having known values (the case $j = 0$, with $\phi_0(x) = 1$ and $\mu_0 = 1$, is the normalization constraint).
The problem in (3.1) is a constrained optimization problem subject to the given side conditions and can be represented by the following Lagrangian,
\[ L = -\int f(x) \log f(x)\,dx - \sum_{j=0}^{N} \lambda_j \left[ \int \phi_j(x) f(x)\,dx - \mu_j \right]. \tag{3.2} \]
The solution of problem (3.2) can be obtained by simple calculus of variations (the constant arising from differentiating $f \log f$ is absorbed into $\lambda_0$),
\[ f(x) = \exp\left[ -\sum_{j=0}^{N} \lambda_j \phi_j(x) \right], \tag{3.3} \]
where $\lambda_0$ is determined by the normalization constraint $\int f(x)\,dx = 1$. Thus, $\lambda_0$ can be expressed in terms of the remaining Lagrangian multipliers,
\[ \lambda_0 = \log\left[ \int \exp\left[ -\sum_{j=1}^{N} \lambda_j \phi_j(x) \right] dx \right]. \tag{3.4} \]
Let $\Omega(\lambda) = \exp\{\lambda_0\}$, where $\lambda$ stands for the vector $(\lambda_1, \lambda_2, \cdots, \lambda_N)'$. Equation (3.3) together with equation (3.4) can be written as
\[ f(x) = \frac{1}{\Omega(\lambda)} \exp\left[ -\sum_{j=1}^{N} \lambda_j \phi_j(x) \right]. \tag{3.5} \]
$\Omega(\cdot)$ is known as the "partition function" that converts the relative probabilities to absolute probabilities [see Golan, Judge and Miller (1996), p. 23]. One can define a potential $\Gamma(\lambda_1, \lambda_2, \cdots, \lambda_N)$ through the Legendre transformation,
\[ \Gamma(\lambda_1, \lambda_2, \cdots, \lambda_N) = \ln \Omega(\lambda) + \sum_{j=1}^{N} \lambda_j \mu_j. \tag{3.6} \]
The Lagrangian multipliers, $\lambda$, are determined by the $N$ simultaneous equations
\[ \frac{\partial \Gamma}{\partial \lambda_j} = 0 \iff -\frac{\partial \ln \Omega}{\partial \lambda_j} = \mu_j, \quad j = 1, \cdots, N. \]
The uniqueness of the ME solution, if it exists, follows from the convexity of $\Gamma(\cdot)$:
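\[ \frac{\partial^2 \Gamma}{\partial \lambda_j \partial \lambda_k} = \Sigma_{[j,k]} = \int \phi_j(x) \phi_k(x) \exp\left[ -\sum_{m=0}^{N} \lambda_m \phi_m(x) \right] dx - \int \phi_j(x) \exp\left[ -\sum_{m=0}^{N} \lambda_m \phi_m(x) \right] dx \cdot \int \phi_k(x) \exp\left[ -\sum_{m=0}^{N} \lambda_m \phi_m(x) \right] dx, \]
where $\Sigma_{[j,k]}$ is the $(j,k)$ element of the variance-covariance matrix of the moment functions under $f$. Because this variance-covariance matrix is positive definite (provided the $\phi_j$ are linearly independent), $\Gamma(\cdot)$ is a convex function of $\lambda$.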
The moment conditions can be interpreted as known prior information. Given the prior information, we can obtain the least biased distribution function by the maximum entropy principle. Suppose we have no prior information, which means no moment condition except for the normalization constraint; then the solution is the uniform distribution. If we have one additional piece of information about the state, say $\int x f(x)\,dx = \mu_1 > 0$, then the solution takes the form $f(x \mid \lambda_1) = \lambda_1 \exp[-\lambda_1 x]$, where $x \in [0, \infty)$. If we have another piece of information, say $\int x^2 f(x)\,dx = \mu_2$, with $x \in (-\infty, \infty)$, then the solution is the normal distribution.
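As a worked check of the one-moment case above (a sketch: the support $[0, \infty)$ and the single constraint $\int x f(x)\,dx = \mu_1$ are taken from the text; the closed-form entropy in the last line is the standard result for the exponential density):

```latex
\begin{aligned}
f(x) &= \exp[-\lambda_0 - \lambda_1 x], \qquad x \in [0,\infty), \\
e^{\lambda_0} &= \int_0^\infty e^{-\lambda_1 x}\,dx = \frac{1}{\lambda_1}
  \;\Rightarrow\; f(x \mid \lambda_1) = \lambda_1 e^{-\lambda_1 x}, \\
\mu_1 &= \int_0^\infty x\,\lambda_1 e^{-\lambda_1 x}\,dx = \frac{1}{\lambda_1}
  \;\Rightarrow\; \lambda_1 = \frac{1}{\mu_1}, \\
H(f) &= -\int_0^\infty f(x)\log f(x)\,dx = \lambda_0 + \lambda_1 \mu_1 = 1 + \log \mu_1 .
\end{aligned}
```

The multiplier is therefore pinned down by the moment value alone, and the maximized entropy increases with $\mu_1$.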
All solutions are functions of the Lagrangian multipliers. As in most optimization problems, the Lagrangian multipliers represent the "marginal contribution (shadow price)" of each constraint to the objective value. For example, suppose $\hat{\lambda}_2$ is estimated to be close to 0. Then the moment constraint $\int x^2 f(x)\,dx = \mu_2$ contributes little to the objective value. Consequently, the Lagrangian multipliers reflect the information content of each constraint. It can be shown that [Dixit (1976)]
\[ \hat{\lambda}_j = \frac{\partial H(\hat{f})}{\partial \mu_j}, \quad \text{where } \hat{f} = \arg\max H(f). \]
That is, the multipliers give the rate of change of the maximum attainable value with respect to a change in the constraints. If the $j$th moment constraint in (3.1) is not binding, that is, the $j$th moment condition does not act as a constraint, then $\hat{\lambda}_j$ should be very close to zero. If the information in the moment condition is useless, then the change in the optimal value is very close to zero.
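The shadow-price interpretation can be illustrated with a small numerical check. This is a sketch under stated assumptions: it uses the zero-mean normal density, for which the entropy $\frac{1}{2}\ln(2\pi e \mu_2)$ and the multiplier $\lambda_2 = 1/(2\mu_2)$ on $x^2$ are known in closed form; the function names and the value of `mu2` are hypothetical.

```python
import numpy as np

def normal_entropy(mu2):
    """Entropy of the zero-mean normal density with second moment mu2."""
    return 0.5 * np.log(2.0 * np.pi * np.e * mu2)

def lambda2(mu2):
    """Multiplier on x^2 when f(x) = exp(-lambda_0 - lambda_2 x^2) is N(0, mu2)."""
    return 1.0 / (2.0 * mu2)

mu2 = 1.7       # illustrative second moment
step = 1e-6     # step of the central finite difference

# Rate of change of the maximized entropy with respect to the constraint value.
dH_dmu2 = (normal_entropy(mu2 + step) - normal_entropy(mu2 - step)) / (2.0 * step)

print(dH_dmu2, lambda2(mu2))
```

The finite-difference derivative of the maximized entropy should match the multiplier, in line with $\hat{\lambda}_j = \partial H(\hat{f}) / \partial \mu_j$.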
Although the potential $\Gamma(\lambda)$ has a unique minimum whenever one exists, a solution may not exist for certain given $\mu_j$. Mead and Papanicolaou (1984) showed that, in the arithmetic moment case, the necessary and sufficient condition for a MED to exist is complete monotonicity of the moment sequence $\{\mu_j, j = 0, 1, 2, \cdots\}$ over the finite interval $[0, 1]$, which is known as the Hausdorff moment problem.
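A finite version of the complete monotonicity condition can be checked directly. The sketch below assumes the usual definition, $(-1)^k \Delta^k \mu_j \ge 0$ with $\Delta \mu_j = \mu_{j+1} - \mu_j$; the function name and the two example sequences are hypothetical. Passing the check on a truncated sequence is only a necessary condition, not a proof that the MED exists.

```python
import numpy as np

def is_completely_monotone(moments, tol=1e-12):
    """Check (-1)^k Delta^k mu_j >= 0 for every difference order available."""
    diff = np.asarray(moments, dtype=float)
    sign = 1.0
    while diff.size > 0:
        if np.any(sign * diff < -tol):
            return False
        diff = np.diff(diff)    # forward difference mu_{j+1} - mu_j
        sign = -sign
    return True

uniform_moments = [1.0 / (j + 1) for j in range(8)]   # moments of U[0, 1]
bad_moments = [1.0, 0.5, 0.6, 0.2]                    # mu_2 > mu_1 is impossible on [0, 1]

print(is_completely_monotone(uniform_moments))
print(is_completely_monotone(bad_moments))
```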
3.1 Characterization problem
The maximum entropy distribution has a very flexible functional form. By choosing a sequence of known functions $\phi_j(x)$ for $j = 0, 1, 2, \cdots, N$, we can generate a sequence of flexible maximum entropy density functions. Many well-known families of distributions can be obtained as special cases of the maximum entropy density function. It is known that the normal distribution maximizes the entropy among continuous univariate distributions on $(-\infty, \infty)$ with given mean and variance [see Rao (2002), p. 163]. A similar property holds for the multivariate normal distribution. Kagan, Linnik and Rao (1973) gave a list of well-known distributions, including the Beta, Gamma, Exponential and Laplace distributions, which maximize the entropy subject to certain constraints. Gokhale (1975) gave characterizations of the normal, double exponential and Cauchy families in the univariate case and of the Dirichlet and Wishart families in the multivariate case. In particular, Cobb, Koppstein and Chen (1983) (CKC) derived a general class of multimodal density functions within a unified framework, the stochastic catastrophe model. Since the system in the stochastic catastrophe model behaves as if it moves towards the points of lowest potential, it has a property similar to the maximum entropy principle. They proposed four major types of cusp probability density function. The general functional form is
\[ f_k(x \mid \beta) = \xi(\beta) \exp\left( -\int \left[ \frac{g_k(x)}{V(x)} \right] dx \right), \]
where $g_k(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k$, $k > 0$, and $V(x)$ takes one of four forms:
Type N: $V(x) = 1$, $-\infty < x < \infty$;
Type G: $V(x) = x$, $0 < x < \infty$;
Type I: $V(x) = x^2$, $0 < x < \infty$;
Type B: $V(x) = x(1 - x)$, $0 < x < 1$.
The shape of the density is determined by the polynomial $g(x)$, which is called the shape polynomial. If the orders of the polynomials $g$ and $V$ are one and two respectively, then the equation represents the Pearson system. Zellner and Highfield (1988) showed that the CKC (1983) distributions can be derived as maximum entropy distributions under specified moment conditions [see Table 1].
Table 1: Maximum entropy density characterization in CKC (1983)
No. | Type | Side constraints
1 | $N_k$ | $\int x^r f(x)\,dx = \mu_r$, $r = 0, 1, \cdots, k+1$, $\mu_0 = 1$.
2 | $G_k$ | $\int x^r f(x)\,dx = \mu_r$, $r = 0, 1, \cdots, k$; $\int \log x\, f(x)\,dx = c$.
3 | $I_k$ | $\int x^r f(x)\,dx = \mu_r$, $r = 0, 1, \cdots, k-1$; $\int \log x\, f(x)\,dx = c_1$; $\int (1/x) f(x)\,dx = c_2$.
4 | $B_k$ | $\int x^r f(x)\,dx = \mu_r$, $r = 0, 1, \cdots, k-1$; $\int \log x\, f(x)\,dx = c_1$; $\int \log(1-x) f(x)\,dx = c_2$.
Table 2 shows well-known types of distribution functions which can be generated by certain given moment conditions. We can interpret the MED in an information-theoretic way: by imposing prior moment conditions which are inherent in the data at hand, we generate the least biased distribution function. There are some links among the listed MEDs in terms of $\phi_j(x)$ in the moment equations. Consider the moment functionals $x^2$, $\ln(r^2 + x^2)$ and $\ln(1 + x^2)$, which represent the degree of dispersion in the normal, Student-t and Cauchy distributions, respectively, where $r^2$ is the degrees of freedom of the Student-t distribution. If there is no moment condition other than the normalization constraint, the resulting MED is the uniform distribution. Thus, the uniform distribution maximizes entropy if we do not have prior information. As we impose moment constraints, the entropy becomes lower than that of the uniform distribution.
[Figure 1: Moment functions $\phi_j(x)$ representing dispersion. Plot title: "Functions representing dispersion and excess kurtosis"; horizontal axis $x$, vertical axis $\phi(x)$; legend entries: x1, x^2, log(4+x^2), log(10+x^2), log(1+x^2), |x|^(0.5), |x|^(1.3).]
Figure 1 shows $\phi_j(x)$ for the normal, Student-t and Cauchy distributions. We can interpret $\phi_j$ as a pseudo weight function in the sense that, for a given value $x$ of the random variable, the information entering $f(x)$ is $\phi_j(x)$ multiplied by $\lambda_j$. Thus, at the point $x_1$, $\phi_j(x)$ in the normal moment constraint penalizes the event $x_1$ more than those of the Student-t and Cauchy distributions do in attaining the maximum value of entropy under the constraints. This is why the tails of the Student-t and Cauchy distributions are heavier than those of the normal distribution. On the other hand, $\log x$, $\log(1 - x)$ and $\arctan(x/r)$ represent measures of skewness.
[Figure 2: Moment functions $\phi_j(x)$ representing skewness. Plot title: "Functions representing skewness"; horizontal axis $x$, vertical axis $\phi(x)$; legend entries: log(x/(1-x)), arctan(x/4), arctan(x/10), (log(x))^2.]
$E[\log x] = \Gamma'(a)/\Gamma(a)$ is used as the moment condition on $\phi_j(x) = \log x$ for the gamma distribution, and the $\chi^2_\nu$ distribution is the special case of the gamma distribution in which $E[\log x] = \Gamma'(\nu/2)/\Gamma(\nu/2) + \log 2$ for $0 \le x < \infty$. If we combine the two moment conditions on $E[\log x]$ and $E[\log(1 - x)]$, then the beta distribution is generated on the range $0 \le x \le 1$. $\arctan(x/r)$ is the measure of skewness in the Pearson type-IV distribution. Figure 2 shows that these $\phi_j(x)$ have an asymmetric functional form, whereas the $\phi_j(x)$ in Figure 1 have a symmetric one. This difference explains, intuitively, why the $\phi_j$ in Figure 1 measure kurtosis and the $\phi_j(x)$ in Figure 2 measure skewness. Premaratne and Bera (2001) used $\arctan(x)$ to test asymmetry in leptokurtic distributions. Indeed, $\arctan(x)$ for $-\infty < x < \infty$, $\log x$ for $0 \le x < \infty$ and $\log(r^2 + x^2)$ would give more robust test statistics than the classical skewness and kurtosis parameters, in the sense of Huber (1980).
Table 2: Characterization of maximum entropy density
Type | Side constraints | Resulting distribution, $f(x) =$ | Distribution form, $f(x) =$
Uniform | (none) | $\exp[-\lambda_0]$ | $\frac{1}{b-a}$, $a \le x \le b$ $(b > a)$
Exp. | $\int x f(x)\,dx = m$ $(m > 0)$ | $\exp[-\lambda_0 - \lambda_1 x]$ | $\frac{1}{m} \exp\left[-\frac{x}{m}\right]$, $0 \le x < \infty$
Normal | $\int x f(x)\,dx = \mu$; $\int (x - \mu)^2 f(x)\,dx = \sigma^2$ | $\exp[-\lambda_0 - \lambda_1 x - \lambda_2 x^2]$ | $\frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$, $-\infty < x < \infty$
Log Normal | $\int \log x\, f(x)\,dx = \mu$; $\int (\log x - \mu)^2 f(x)\,dx = \sigma^2$ | $\exp[-\lambda_0 - \lambda_1 \log x - \lambda_2 (\log x)^2]$ | $\frac{1}{x\sigma\sqrt{2\pi}} \exp\left[-\frac{(\log x-\mu)^2}{2\sigma^2}\right]$, $0 < x < \infty$
Gen. Exp. | $\int x^i f(x)\,dx = \mu_i$, $i = 1, 2, \cdots, N$ | $\exp\left[-\sum_{n=0}^{N} \lambda_n x^n\right]$ | $\exp\left[-\sum_{n=0}^{N} \lambda_n x^n\right]$, $-\infty < x < \infty$
Double Exp. | $\int |x - \mu| f(x)\,dx = \sigma^2$ | $\exp[-\lambda_0 - \lambda_1 |x - \mu|]$ | $C(\theta) \exp[-|x - \mu| / \sigma^2]$, $-\infty < x < \infty$
Gamma | $\int x f(x)\,dx = a$ $(0 < a < \infty)$; $\int \log x\, f(x)\,dx = \frac{\Gamma'(a)}{\Gamma(a)}$ | $\exp[-\lambda_0 - \lambda_1 x - \lambda_2 \log x]$ | $\frac{1}{\Gamma(a)} e^{-x} x^{a-1}$, $0 < x < \infty$
Chi-square ($\nu$ df) | $\int x f(x)\,dx = \nu$; $\int \log x\, f(x)\,dx = \frac{\Gamma'(\nu/2)}{\Gamma(\nu/2)} + \log 2$ | $\exp[-\lambda_0 - \lambda_1 x - \lambda_2 \log x]$ | $\frac{1}{2^{\nu/2}\Gamma(\nu/2)} e^{-x/2} x^{\nu/2 - 1}$, $0 < x < \infty$
Weibull | $\int x^a f(x)\,dx = 1$ $(0 < a < \infty)$ | $\exp[-\lambda_0 - \lambda_1 x^a - \lambda_2 \log x]$ | $a x^{a-1} \exp[-x^a]$, $0 < x < \infty$
GED | $\int |x|^\nu f(x)\,dx = c(\nu)$ | $\exp[-\lambda_0 - \lambda_1 |x|^\nu]$ | $C(\nu) \exp[-|x|^\nu]$, $-\infty < x < \infty$
Beta | $\int \log x\, f(x)\,dx = \frac{\Gamma'(a)}{\Gamma(a)} - \frac{\Gamma'(a+b)}{\Gamma(a+b)}$; $\int \log(1-x) f(x)\,dx = \frac{\Gamma'(b)}{\Gamma(b)} - \frac{\Gamma'(a+b)}{\Gamma(a+b)}$ | $\exp[-\lambda_0 - \lambda_1 \log x - \lambda_2 \log(1-x)]$ | $\frac{1}{B(a,b)} x^{a-1} (1-x)^{b-1}$, $0 < x < 1$
Cauchy | $\int \log(1 + x^2) f(x)\,dx = 2 \log 2$ | $\exp[-\lambda_0 - \lambda_1 \log(1 + x^2)]$ | $\frac{1}{\pi(1 + x^2)}$, $-\infty < x < \infty$
Student-t | $\int \log(r^2 + x^2) f(x)\,dx = \log(r^2) + \frac{\Gamma'((1+r^2)/2)}{\Gamma((1+r^2)/2)} - \frac{\Gamma'(r^2/2)}{\Gamma(r^2/2)}$ | $\exp[-\lambda_0 - \lambda_1 \log(r^2 + x^2)]$ | $\frac{\Gamma[(r^2+1)/2]}{\sqrt{\pi r^2}\,\Gamma(r^2/2)} \frac{1}{(1 + x^2/r^2)^{(r^2+1)/2}}$, $-\infty < x < \infty$
Pearson type IV | $\int \tan^{-1}(x/r) f(x)\,dx = c_1(r)$; $\int \log(r^2 + x^2) f(x)\,dx = c_2(r)$ | $\exp[-\lambda_0 - \lambda_1 \tan^{-1}(x/r) - \lambda_2 \log(r^2 + x^2)]$ | $K \left(1 + \frac{x^2}{r^2}\right)^{-m} \exp[\delta \tan^{-1}(x/r)]$, $-\infty < x < \infty$
Gen. Student-t | $\int x^{i-2} f(x)\,dx = \mu_i$, $i = 3, 4, \cdots, k$; $\int \tan^{-1}(x/r) f(x)\,dx = c_1(r)$; $\int \log(r^2 + x^2) f(x)\,dx = c_2(r)$ | $\exp\left[-\lambda_0 - \lambda_1 \tan^{-1}(x/r) - \lambda_2 \log(r^2 + x^2) - \sum_{i=3}^{k} \lambda_i x^{i-2}\right]$ |
Gen. log normal | $\int x^{i-2} f(x)\,dx = \mu_i$, $i = 3, 4, \cdots, k$; $\int \log x\, f(x)\,dx = c_1(r)$; $\int (\log x)^2 f(x)\,dx = c_2(r)$ | $\exp\left[-\lambda_0 - \lambda_1 \log x - \lambda_2 (\log x)^2 - \sum_{i=3}^{k} \lambda_i x^{i-2}\right]$ |
The last two distributions in Table 2, the generalized Student-t and generalized log-normal distributions, were proposed by Lye and Martin (1983), who generalized the CKC (1983) distribution by extending $g(x)$ and $V(x)$ to include more general functions such as logarithms, trigonometric expressions and so on.
3.2 Numerical implementation
There are only a few studies of optimization procedures for the ME problem in the econometrics literature. Zellner and Highfield (1988) solved for the MED, a quartic exponential distribution, by taking a Taylor expansion in $\lambda$ and using the Newton method to optimize the objective function. Ryu (1990) used moment recursions of the quartic exponential distribution to solve for the MED. Ormoneit and White (1999) applied the same method as Zellner and Highfield (1988) but proposed a more accurate numerical integration scheme. Rockinger and Jondeau (2002) used Gauss-Legendre quadrature in the optimization procedure and calculated numerically the boundary of skewness and kurtosis within which the MED exists. Wu (2003) proposed a sequential updating method which incorporates the moment constraints into the calculation from lower to higher moments and updates the density estimates. However, all of the above numerical methods were developed for the arithmetic moment case. Our procedure is in the same spirit as these numerical procedures, but we deal with the general moment case of the ME problem.
The solution of the ME problem in the general case is given by (3.3). The Lagrange multipliers, $\lambda$, are calculated by solving the following nonlinear equations.
\[ G_n(\lambda) = \int \phi_n(x) \exp\left[ -\sum_{m=0}^{N} \lambda_m \phi_m(x) \right] dx = \mu_n, \quad n = 0, \cdots, N. \tag{3.7} \]
If we substitute (3.4) for $\lambda_0$ in (3.7), we can reduce the dimension from $N + 1$ to $N$:
\[ G_n(\lambda) = \int \phi_n(x) \left[ \frac{\exp\left\{ -\sum_{m=1}^{N} \lambda_m \phi_m(x) \right\}}{\int \exp\left\{ -\sum_{m=1}^{N} \lambda_m \phi_m(x) \right\} dx} \right] dx = \mu_n, \quad n = 1, \cdots, N. \tag{3.8} \]
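These equations can be solved by the Newton method, which consists of expanding $G_n(\lambda)$ in a Taylor series, dropping the higher-order terms and solving the resulting linear system iteratively. The linear system from the first-order Taylor expansion around the initial value $\lambda^{0}$ (a vector of starting values, not to be confused with the multiplier $\lambda_0$) is given by
\[ G_n(\lambda) \cong G_n(\lambda^{0}) + (\lambda - \lambda^{0}) \left[ \frac{\partial G_n(\lambda)}{\partial \lambda_k} \right]_{(\lambda = \lambda^{0})} = \mu_n, \quad n, k = 1, \cdots, N. \tag{3.9} \]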
(3.9) can be written in vector and matrix form as
\[ H \delta = \xi \tag{3.10} \]
where $\delta$ and $\xi$ are defined by
\[ \delta = \lambda - \lambda^{0}, \qquad \xi = \left[ \mu_1 - G_1(\lambda^{0}), \cdots, \mu_N - G_N(\lambda^{0}) \right]' \]
and the element in the $n$th row and $k$th column of the matrix $H$ is
\[ H_{[n,k]} = \int \phi_n(x) \phi_k(x) \exp\left[ -\sum_{m=0}^{N} \lambda_m \phi_m(x) \right] dx - \int \phi_n(x) \exp\left[ -\sum_{m=0}^{N} \lambda_m \phi_m(x) \right] dx \cdot \int \phi_k(x) \exp\left[ -\sum_{m=0}^{N} \lambda_m \phi_m(x) \right] dx. \tag{3.11} \]
Starting from initial choices for $\lambda_1, \cdots, \lambda_N$, the updated values $\lambda^{0}$ are obtained from
\[ \lambda^{0} = \lambda - H^{-1} \xi \tag{3.12} \]
where $H$ is a positive definite matrix and $H^{-1}$ is its inverse.
To calculate the elements of the matrix $H$ and the vector $\xi$ we need a method of numerical integration. We adopt Gauss-Legendre quadrature. Consider the case in which we want to calculate $\Omega(\lambda)$,
\[ \Omega(\lambda) = \int_{l}^{u} \exp\left\{ -\sum_{n=1}^{N} \lambda_n \phi_n(x) \right\} dx. \tag{3.13} \]
Define $Z = [2x - (u + l)] / (u - l)$ and substitute $x = \frac{1}{2}[Z(u - l) + (u + l)]$ in (3.13). Then (3.13) is written as follows:
\[ \int_{-1}^{1} \frac{(u - l)}{2} \exp\left\{ -\sum_{n=1}^{N} \lambda_n \cdot \phi_n\left( \frac{1}{2} \left( Z(u - l) + (u + l) \right) \right) \right\} dZ. \tag{3.14} \]
Let $\hat{Z}_{node}$ and $W$ denote the vector of nodes and the vector of weights of the Gauss-Legendre quadrature, respectively, and let $\hat{Z}_{new} \equiv \frac{1}{2}(\hat{Z}_{node}(u - l) + (u + l))$. Then (3.14) becomes
\[ \Omega(\lambda) = \frac{(u - l)}{2} \left[ W' \exp\left\{ -\phi(\hat{Z}_{new}) \lambda \right\} \right]. \tag{3.15} \]
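Equation (3.15) can be sketched in a few lines. The example below is only illustrative: it assumes NumPy's `numpy.polynomial.legendre.leggauss` for the nodes and weights, and the function name `partition_function`, the domain $[-10, 10]$ and the choice $\phi = (x, x^2)$ with $\lambda = (0, 0.5)$ are hypothetical. With those values the kernel is that of the standard normal, so $\Omega(\lambda)$ should be close to $\sqrt{2\pi}$.

```python
import numpy as np

def partition_function(lam, phi, lower, upper, n_nodes=50):
    """Omega(lambda) on [lower, upper] by Gauss-Legendre quadrature, eq. (3.15)."""
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)   # nodes on [-1, 1]
    z_new = 0.5 * (nodes * (upper - lower) + (upper + lower))   # map to [lower, upper]
    phi_mat = np.column_stack([f(z_new) for f in phi])          # one column per phi_n
    return 0.5 * (upper - lower) * (weights @ np.exp(-phi_mat @ lam))

phi = [lambda x: x, lambda x: x ** 2]     # moment functions phi_1, phi_2
lam = np.array([0.0, 0.5])                # multipliers of the standard normal kernel
omega = partition_function(lam, phi, -10.0, 10.0)

print(omega, np.sqrt(2.0 * np.pi))
```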
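Now consider the integration problem for $G_n(\lambda)$,
\[ G_n(\lambda) = \int_{l}^{u} \phi_n(x) \left[ \frac{\exp\left\{ -\sum_{m=1}^{N} \lambda_m \phi_m(x) \right\}}{\int_{l}^{u} \exp\left\{ -\sum_{m=1}^{N} \lambda_m \phi_m(x) \right\} dx} \right] dx. \tag{3.16} \]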
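Using $\hat{Z}_{node}$ and $W$, (3.16) is written as follows:
\[ G(\lambda) = \frac{1}{\Omega(\lambda)} \frac{(u - l)}{2} \left[ W' \left[ \phi(\hat{Z}_{new}) \odot \exp\left\{ -\phi(\hat{Z}_{new}) \lambda \right\} \right] \right] \tag{3.17} \]
where $G(\lambda)$ is $[G_1(\lambda), \cdots, G_N(\lambda)]'$ and $\odot$ is the element-by-element multiplication operator, applied to each column of $\phi(\hat{Z}_{new})$.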
To estimate the MED we propose the following procedure:
1. Set the domain on which the density will be defined, $[l, u]$.
2. Using $x = \frac{1}{2}[Z(u - l) + (u + l)]$, change the domain into $[-1, 1]$. Use a 50-point Gaussian quadrature with nodes $\hat{Z}^{j}_{node} \in [-1, 1]$ and weights $W_j$, where $j = 1, \cdots, 50$.
3. At the first step, $t = 0$, set the initial values $\lambda^{0}$ to $[0, \cdots, 0]$.
4. At step $t$, calculate $\xi^{(t)}$ and $H^{(t)}$ by numerical integration in (3.9) and (3.11).
5. Let $\delta^{(t)}$ be the solution to $H^{(t)} \delta^{(t)} = \xi^{(t)}$.
6. Update the vector of Lagrange multipliers: $\lambda^{(t)} = \lambda^{(t-1)} - \delta^{(t)}$.
7. Set $t = t + 1$ and go to step 4 until $\delta^{(t)}$ becomes appropriately small (use 1e-10 as the stopping rule).
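The seven steps above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes NumPy, plain undamped Newton steps, and hypothetical names (`fit_med`, `phi`, `mu`). The example recovers the standard normal from the constraints $E[x] = 0$ and $E[x^2] = 1$ on the assumed domain $[-10, 10]$, where the multipliers should be close to $(0, 0.5)$ and $\lambda_0$ close to $\frac{1}{2}\ln(2\pi)$.

```python
import numpy as np

def fit_med(phi, mu, lower, upper, n_nodes=50, tol=1e-10, max_iter=200):
    """Newton iterations for lambda_1..lambda_N; returns (lambda, lambda_0)."""
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)   # step 2
    x = 0.5 * (nodes * (upper - lower) + (upper + lower))
    w = 0.5 * (upper - lower) * weights
    phi_mat = np.column_stack([f(x) for f in phi])              # nodes x N matrix
    mu = np.asarray(mu, dtype=float)
    lam = np.zeros(len(phi))                                    # step 3
    for _ in range(max_iter):
        kernel = w * np.exp(-phi_mat @ lam)
        dens = kernel / kernel.sum()        # normalized mass at each node
        g = phi_mat.T @ dens                # G_n(lambda), eqs. (3.8) and (3.17)
        xi = mu - g                         # step 4
        h = (phi_mat * dens[:, None]).T @ phi_mat - np.outer(g, g)   # eq. (3.11)
        delta = np.linalg.solve(h, xi)      # step 5
        lam = lam - delta                   # step 6
        if np.max(np.abs(delta)) < tol:     # step 7
            break
    lam0 = np.log(np.sum(w * np.exp(-phi_mat @ lam)))           # eq. (3.4)
    return lam, lam0

phi = [lambda x: x, lambda x: x ** 2]
lam, lam0 = fit_med(phi, [0.0, 1.0], -10.0, 10.0)
print(lam, lam0)
```

Since $\partial G_n / \partial \lambda_k = -H_{[n,k]}$, subtracting $\delta = H^{-1}\xi$ is the Newton step for (3.8), which is the sign convention of (3.12) and step 6. For moment sets that are harder to match than this one, a step-halving safeguard may be needed; none is included here.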
4 Maximum entropy GARCH model
Various GARCH models with non-normal conditional density functions have been proposed to explain the leptokurtic behavior of the unconditional density. Bollerslev (1987) used the Student-t distribution as the conditional distribution in the GARCH model. However, Hsieh (1989) found that the GARCH(1,1) model with a conditional Student-t distribution could not capture the daily returns on the British pound and the Japanese yen. Nelson (1991) employed the generalized error distribution (GED) with the exponential GARCH model but found that the estimated GED did not have thick enough tails to capture the unconditional leptokurtosis. On the other hand, skewed conditional densities have also been used to explain the skewness of the unconditional density. Lee and Tse (1991) used a distribution based on the first three terms of the Gram-Charlier series. Engle and González-Rivera (1991) adopted a nonparametric conditional density, and Premaratne and Bera (2000) used the Pearson type-IV distribution as the conditional density function. The Pearson type-IV distribution is derived from the maximum entropy principle under the side conditions $E[\tan^{-1}(x/r)] = c_1(r)$ and $E[\log(r^2 + x^2)] = c_2(r)$. Thus, we can incorporate the Pearson type-IV distribution within the MED formulation. Rockinger and Jondeau (2002) applied the MED to the GARCH model. Since they considered the first four arithmetic moment conditions, the resulting MED has the form of a quartic exponential distribution. However, the quartic exponential distribution cannot capture heavy tail behavior as well as the Student-t distribution does.
Let us consider the following model:
\[ y_t = m_t(x_t; \zeta) + \epsilon_t, \quad t = 1, 2, \cdots, T \tag{4.1} \]
with $\epsilon_t \mid \psi_{t-1} \sim f(0, h_t)$. Here $f$ is the unknown density function of $\epsilon_t$ conditional on the set of past information $\psi_{t-1}$, $m_t(\cdot)$ is the conditional mean function, $x_t$ is a $K \times 1$ vector of exogenous variables, $\zeta$ is a vector of parameters and $h_t$ is a random variable parameterized as $h_t = \alpha_0 + \sum_{j=1}^{p} \alpha_j \epsilon^2_{t-j} + \sum_{j=1}^{q} \beta_j h_{t-j}$.
The MED is the least biased density function and minimizes the Kullback-Leibler information criterion under correctly specified moment conditions. We can write the conditional density function in the general MED form as follows,
\[ f(\eta_t) = \exp\left[ -\sum_{i=0}^{N} \lambda_i \phi_i\left( \frac{y_t - x_t'\zeta}{\sqrt{h_t}} \right) \right], \tag{4.2} \]
where $\eta_t = (y_t - x_t'\zeta)/\sqrt{h_t}$ and $f(\eta_t)$ satisfies $E[\phi_j(\eta_t)] = 0$. The quasi-log density function at time $t$ is given by
\[ l^{Q}_t(\theta) = -\sum_{i=0}^{N} \lambda_i \phi_i\left( \frac{y_t - x_t'\zeta}{\sqrt{h_t}} \right) - \frac{1}{2} \ln h_t, \tag{4.3} \]
where $\theta = (\alpha, \beta, \zeta, \lambda) \in \Theta$. Given this, the quasi-log likelihood function is $l^{Q}(\theta) = \sum_{t=1}^{T} l^{Q}_t(\theta)$.
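The quasi-log likelihood in (4.3) can be sketched as below. The sketch rests on assumptions not made in the text: presample values of $\epsilon^2_t$ and $h_t$ are set to the sample mean of $\epsilon^2_t$, the conditional mean is passed in directly as `mean`, and the names `garch_variance` and `quasi_loglik` are hypothetical. As a sanity check, the choice $\phi = (1, \eta^2)$ with $\lambda = (\frac{1}{2}\ln 2\pi, \frac{1}{2})$ should reproduce the Gaussian log likelihood.

```python
import numpy as np

def garch_variance(eps, alpha0, alpha, beta):
    """h_t = alpha0 + sum_j alpha_j eps_{t-j}^2 + sum_j beta_j h_{t-j}."""
    eps = np.asarray(eps, dtype=float)
    alpha = np.atleast_1d(np.asarray(alpha, dtype=float))
    beta = np.atleast_1d(np.asarray(beta, dtype=float))
    presample = np.mean(eps ** 2)     # assumption: used for all presample terms
    h = np.empty(eps.size)
    for t in range(eps.size):
        h_t = alpha0
        for j, a in enumerate(alpha, start=1):
            h_t += a * (eps[t - j] ** 2 if t - j >= 0 else presample)
        for j, b in enumerate(beta, start=1):
            h_t += b * (h[t - j] if t - j >= 0 else presample)
        h[t] = h_t
    return h

def quasi_loglik(y, mean, alpha0, alpha, beta, lam, phi):
    """Sum over t of -sum_i lambda_i phi_i(eta_t) - 0.5 ln h_t, eq. (4.3)."""
    eps = np.asarray(y, dtype=float) - mean
    h = garch_variance(eps, alpha0, alpha, beta)
    eta = eps / np.sqrt(h)
    tilt = sum(l * f(eta) for l, f in zip(lam, phi))
    return np.sum(-tilt - 0.5 * np.log(h))

y = np.sin(np.arange(50.0))                                   # illustrative series
phi = [lambda e: np.ones_like(e), lambda e: e ** 2]           # phi_0 = 1, phi_1 = eta^2
lam = [0.5 * np.log(2.0 * np.pi), 0.5]
ll = quasi_loglik(y, 0.0, 0.1, [0.1], [0.8], lam, phi)
print(ll)
```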
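The MEARCH approach is closely related to other semi-nonparametric ARCH approaches. First, the first order conditions of the log quasi-likelihood yield the quasi-score functions:
\[ \frac{\partial l^{Q}(\theta)}{\partial \alpha} = \frac{1}{2} \sum_{t=1}^{T} \sum_{i=0}^{N} \lambda_i \phi_i'(\eta_t) \frac{\partial h_t / \partial \alpha}{h_t^{3/2}} (y_t - x_t'\zeta) - \frac{1}{2} \sum_{t=1}^{T} \frac{1}{h_t} \frac{\partial h_t}{\partial \alpha} = 0 \tag{4.4} \]
\[ \frac{\partial l^{Q}(\theta)}{\partial \beta} = \frac{1}{2} \sum_{t=1}^{T} \sum_{i=0}^{N} \lambda_i \phi_i'(\eta_t) \frac{\partial h_t / \partial \beta}{h_t^{3/2}} (y_t - x_t'\zeta) - \frac{1}{2} \sum_{t=1}^{T} \frac{1}{h_t} \frac{\partial h_t}{\partial \beta} = 0 \tag{4.5} \]
\[ \frac{\partial l^{Q}(\theta)}{\partial \zeta} = \sum_{t=1}^{T} \sum_{i=0}^{N} \lambda_i \phi_i'(\eta_t) \frac{\partial x_t'\zeta / \partial \zeta}{\sqrt{h_t}} = 0 \tag{4.6} \]
\[ \frac{\partial l^{Q}(\theta)}{\partial \lambda_i} = -T \frac{\partial \lambda_0}{\partial \lambda_i} - \sum_{t=1}^{T} \phi_i(\eta_t) = 0 \tag{4.7} \]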
If the underlying conditional density is correctly specified, then equations (4.4) - (4.7) are optimal estimating functions (EF) [see Godambe (1960)]. Li and Turtle (2000) proposed a semiparametric ARCH model within the EF approach, and the optimal EFs for the GARCH model are given by
\[ \ell^{*}_1 = -\sum_{t=1}^{T} \frac{\partial h_t / \partial \delta}{h_t^2 (\gamma_{2t} + 2 - \gamma_{1t}^2)}\, g_{2t} \tag{4.8} \]
\[ \ell^{*}_2 = -\sum_{t=1}^{T} \frac{\partial x_t'\zeta / \partial \zeta}{h_t}\, g_{1t} + \sum_{t=1}^{T} \frac{h_t^{1/2} \gamma_{1t}\, \partial x_t'\zeta / \partial \zeta - \partial h_t / \partial \zeta}{h_t^2 (\gamma_{2t} + 2 - \gamma_{1t}^2)}\, g_{2t} \tag{4.9} \]
where $\delta = (\alpha, \beta)$ and
\[ g_{1t} = y_t - x_t'\zeta, \]
\[ g_{2t} = (y_t - x_t'\zeta)^2 - h_t - \gamma_{1t} h_t^{1/2} (y_t - x_t'\zeta), \]
\[ \gamma_{1t} = \frac{E[(y_t - x_t'\zeta)^3 \mid \psi_{t-1}]}{h_t^{3/2}}, \]
\[ \gamma_{2t} = \frac{E[(y_t - x_t'\zeta)^4 \mid \psi_{t-1}]}{h_t^{2}}. \]
Equations (4.8) and (4.9) are in fact the same as the efficient generalized method of moments (GMM) moment conditions, which are attainable with optimal instrumental variables. There is no a priori assumption about the conditional density in the EF approach. Under the conditional normality assumption, equations (4.8) and (4.9) are equivalent to the first order conditions of Engle (1982) up to a sign change. The quasi-score functions, equations (4.4) - (4.7), should likewise be equivalent to those of Engle (1982) under conditional normality. The gain from Li and Turtle's (2000) approach is that we do not have to assume any conditional density. In practice, however, $\gamma_{1t}$ and $\gamma_{2t}$ must be specified in some way. Consider the admittedly extreme case of a conditional Cauchy density. The parameters of the GARCH model cannot be estimated consistently by the EF approach in this case, because the third and fourth conditional moments do not exist. The MEARCH approach is more robust in this sense, since the only moment condition we need is $E[\log(1 + x^2)] = c_2$.
Secondly, the extremum estimation problem can be represented as a MEEL estimation procedure:
$$ \max_{\pi_i, \theta} \; -\sum_{i=1}^{n} \pi_i \ln \pi_i \quad \text{s.t.} \quad \sum_{i=1}^{n} \pi_i \psi(h(\eta, \theta)) = 0, \quad \sum_{i=1}^{n} \pi_i = 1, \qquad (4.10) $$
where $\psi(\cdot)$ is a vector of quasi-score functions. Imbens et al. (1998) argued that the influence function of the exponential tilting estimator stays bounded where that of EL becomes unbounded, even if $\psi(\cdot)$ is bounded. Since the ME estimates incorporate the additional moment equation information, the $\hat{\pi}_i$-weighted estimates would be more efficient in finite samples. Under certain regularity conditions the MEEL estimators are consistent and asymptotically normal, i.e.,
$$ n^{1/2} (\hat{\theta}_{MEEL} - \theta_0) \xrightarrow{d} N(0, \Sigma) $$
where
$$ \Sigma = E\left[ \frac{\partial h(\phi(\eta), \theta)}{\partial \theta} \Big|_{\theta_0} \right]^{-1} E\left[ h(\phi(\eta), \theta_0) \, h(\phi(\eta), \theta_0)' \right] E\left[ \frac{\partial h(\phi(\eta), \theta)}{\partial \theta} \Big|_{\theta_0} \right]^{-1} $$
which is equivalent to the asymptotic distribution of the quasi maximum likelihood estimator. Imbens (1997) showed that under certain regularity conditions, the limiting distributions of the MEEL and MEL estimators for $\theta$ are the same.
Finally, when we add more side moment conditions in (4.10), $\hat{\pi}$ becomes more wiggly. In this sense the maximum entropy mechanism works as a curve smoother. If we add more plausible moments in (4.10), $\hat{\pi}$ will be similar to a nonparametric density estimate. Since Engle and González-Rivera (1991) used a nonparametric conditional density, their approach can be considered within our framework.
4.1 Estimation
Although maximum likelihood (ML) estimation is conceptually different from the ME procedure, it is mathematically equivalent to the ME procedure when we replace the known moments by consistent estimates. Since distributions in the exponential family have a unique ML solution, the ME solution is also unique, if it exists. Thus, we replace the known moments by the sample moments, $\frac{1}{T} \sum_{t=1}^{T} \phi_j(x_t; \gamma)$, and maximize the quasi-likelihood function based on the estimated MED. The quasi-log likelihood function, (4.3), can be represented as follows:
$$ \ell_Q(\theta) = \sum_{t=1}^{T} \log f\!\left( \frac{\varepsilon_t}{h_t^{1/2}} \right) - \frac{1}{2} \sum_{t=1}^{T} \log h_t, \qquad (4.11) $$
Since the standardized residual follows the MED in our model, the quasi-likelihood function is given by
$$ \ell_Q(\theta) = -\sum_{t=1}^{T} \sum_{j=0}^{N} \lambda_j \phi_j\!\left( \frac{\varepsilon_t}{h_t^{1/2}} \right) - \frac{1}{2} \sum_{t=1}^{T} \log h_t, \qquad (4.12) $$
where $\phi_j(\cdot)$ is the associated moment function. We optimize the objective function, (4.10), by the following procedure.
1. Obtain initial consistent estimates of $\theta$ by an appropriate QMLE and form the standardized residuals, $\hat{\eta}_t = \varepsilon_t / h_t^{1/2}$, from those estimates.
2. Calculate $\frac{1}{T} \sum_{t=1}^{T} \phi_j(\hat{\eta}_t, \gamma)$ and use these estimates as the known moments in the ME problem.
3. Estimate the MED by the proposed numerical algorithm and fix this MED as the conditional density function, $f$, in (4.11). Iterate (4.12), evaluating the quasi-log likelihood function until convergence.
4. With the standardized residuals from step 3, repeat steps 2 and 3 until each $\lambda_j$ converges.
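Steps 1 and 2 can be sketched as follows, assuming the standard GARCH(1,1) recursion $h_t = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 h_{t-1}$ and the two moment functions used in the empirical section. The function names, the start-up value for $h_1$, and all numbers are illustrative assumptions, not the authors' implementation; the QMLE fit itself is not shown.

```python
import math

def garch11_variance(resid, alpha0, alpha1, beta1):
    """h_t = alpha0 + alpha1 * e_{t-1}^2 + beta1 * h_{t-1}."""
    h = [sum(e * e for e in resid) / len(resid)]  # assumed start-up: sample variance
    for e_prev in resid[:-1]:
        h.append(alpha0 + alpha1 * e_prev ** 2 + beta1 * h[-1])
    return h

def sample_moments(eta, gamma):
    """Sample analogues of E[arctan(x/gamma)] and E[log(gamma^2 + x^2)]."""
    T = len(eta)
    c1 = sum(math.atan(x / gamma) for x in eta) / T
    c2 = sum(math.log(gamma ** 2 + x ** 2) for x in eta) / T
    return c1, c2

# Made-up residuals and parameter values, standing in for QMLE output.
resid = [0.004, -0.011, 0.002, 0.007, -0.003, -0.009, 0.005, 0.001]
h = garch11_variance(resid, alpha0=1e-6, alpha1=0.1, beta1=0.8)

eta = [e / math.sqrt(h_t) for e, h_t in zip(resid, h)]  # step 1: standardized residuals
c1, c2 = sample_moments(eta, gamma=2.0)                 # step 2: moments fed to the ME problem
```

Steps 3 and 4 would wrap these two functions in an outer loop that re-estimates the multipliers and the GARCH parameters in turn.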
A range of numerical optimization routines can be used. We adopted the Broyden, Fletcher, Goldfarb and Shanno (BFGS) algorithm. For computational convenience, the derivatives are computed numerically.
Our estimates share the asymptotic properties of the QMLE, namely consistency and asymptotic normality [White (1982), Bollerslev and Wooldridge (1988)]:
$$ \hat{\theta}_{QMLE} \in \arg\max_{\theta \in \Theta} \left[ l_{Qt}(\theta; \eta) \right], $$
The limit distribution is given by
$$ \sqrt{T} \left( \hat{\theta}_{QMLE} - \theta_0 \right) \xrightarrow{d} N\left( 0, \; A_T^{0\,-1} B_T^0 A_T^{0\,-1} \right), $$
where $\theta_0$ is the quasi-true parameter and
$$ A_T^0 = -T^{-1} \sum_{t=1}^{T} E\left( \frac{\partial^2 l_{Qt}(\theta_0)}{\partial \theta \, \partial \theta'} \right) $$
$$ B_T^0 = T^{-1} \sum_{t=1}^{T} E\left( \frac{\partial l_{Qt}(\theta_0)}{\partial \theta} \frac{\partial l_{Qt}(\theta_0)}{\partial \theta'} \right) $$
The matrices $A_T^0$ and $B_T^0$ are consistently estimated by
$$ \hat{A}_T = -T^{-1} \sum_{t=1}^{T} E\left( \frac{\partial^2 l_{Qt}(\hat{\theta}_{QMLE})}{\partial \theta \, \partial \theta'} \,\Big|\, \psi_{t-1} \right) $$
$$ \hat{B}_T = T^{-1} \sum_{t=1}^{T} E\left( \frac{\partial l_{Qt}(\hat{\theta}_{QMLE})}{\partial \theta} \frac{\partial l_{Qt}(\hat{\theta}_{QMLE})}{\partial \theta'} \,\Big|\, \psi_{t-1} \right) $$
If the conditional density function is correctly specified, then we have the well-known asymptotic normality result,
$$ \sqrt{T} \left( \hat{\theta}_{QMLE} - \theta_0 \right) \xrightarrow{d} N\left( 0, \; B_T^{0\,-1} \right) $$
4.2 Moment selection test
Suppose we are interested in whether our prior is informative. Since the multipliers give the rate of change of the maximum attainable value with respect to a change in the constraints, $\lambda_j$ should be very close to zero if its associated moment equation does not convey any valuable information. Imbens et al. (1998) provided a Lagrange multiplier test of the validity of moment equations in the over-identified case. Imbens et al. (1997) and Kitamura and Stutzer (1997) also provided various tests of the validity of moment equations. Our test statistic is based on the Rao-Score (RS) test. The score test is based on
$$ S(\theta_j) = \frac{\partial \ell_Q(\theta)}{\partial \lambda_j} = T \cdot E_f[\phi_j(x_t; \gamma)] - \sum_{t=1}^{T} \phi_j(x_t; \gamma), \qquad j = 1, 2, \cdots, N, \qquad (4.13) $$
which, under the null $H_0: \theta_j = 0$ where $\theta_j \in \{(\lambda_j)_{j=1}^{N}\}$, reduces to
$$ S(\theta_j) \big|_{\theta_j = 0} = T \cdot \delta_j - \sum_{t=1}^{T} \phi_j(x_t; \gamma) \qquad (4.14) $$
where
$$ \delta_j = \frac{\int \phi_j(x_t; \gamma) \exp\left[ -\sum_{i \in \{1, 2, \cdots, N\} \setminus \{j\}} \lambda_i \phi_i(x_t; \gamma) \right] dx_t}{\int \exp\left[ -\sum_{i \in \{1, 2, \cdots, N\} \setminus \{j\}} \lambda_i \phi_i(x_t; \gamma) \right] dx_t} \qquad (4.15) $$
We again adopted Gauss-Legendre quadrature to perform the numerical integration. The RS test statistic is given by
$$ RS = T \cdot \frac{T_n^2(\hat{\theta})}{\hat{V}}, \qquad T_n(\theta) = \delta_j - \frac{1}{T} \sum_{t=1}^{T} \phi_j(x_t; \gamma) \qquad (4.16) $$
where $\hat{V}$ is a consistent estimator of the asymptotic variance of $\sqrt{T} \cdot T_n(\hat{\theta})$. Under the null hypothesis, RS is asymptotically distributed as $\chi^2_1$. Pierce (1982) provides a useful result for calculating the asymptotic variance of $\sqrt{T} \cdot T_n(\hat{\theta})$:
$$ V\left[ \sqrt{T} \cdot T_n(\hat{\theta}) \right] = T \cdot V[T_n(\theta)] - \lim_{n \to \infty} E\left[ \frac{\partial T_n(\theta)}{\partial \theta} \right] V(\sqrt{T} \cdot \hat{\theta}) \lim_{n \to \infty} E\left[ \frac{\partial T_n(\theta)}{\partial \theta} \right]' $$
In many cases $V(\sqrt{T} \cdot \hat{\theta})$ is a block diagonal matrix and $\lim_{n \to \infty} E\left[ \partial T_n(\theta) / \partial \theta \right]$ contains zero elements. However, since $\phi_j(x_t; \gamma)$ is determined by the given moment conditions, it is quite likely that $V(\sqrt{T} \cdot \hat{\theta})$ is not block diagonal and that $\lim_{n \to \infty} E\left[ \partial T_n(\theta) / \partial \theta \right]$ contains few zero elements. This makes calculating $V[\sqrt{T} \cdot T_n(\hat{\theta})]$ complicated [see the appendix for the derivation of $V[\sqrt{T} \cdot T_n(\hat{\theta})]$]. Thus, we estimate the variance of $\sqrt{T} \cdot T_n(\hat{\theta})$ using the bootstrap method.
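The bootstrap step can be sketched along the following lines. This is only one plausible reading: it resamples the standardized residuals with replacement, holds $\delta_j$ and $\gamma$ fixed across replications, and does not re-estimate $\theta$; the paper does not spell out these details. The function names, the simulated residuals, and the value $\delta_j = 0$ (the mean of an odd moment function under a symmetric null density) are all assumptions made for illustration.

```python
import math
import random

def t_n(eta, phi_j, delta_j):
    """T_n = delta_j - (1/T) * sum_t phi_j(eta_t), as in (4.16)."""
    return delta_j - sum(phi_j(x) for x in eta) / len(eta)

def bootstrap_rs(eta, phi_j, delta_j, n_boot=1000, seed=0):
    """Return (RS, V_hat), with V_hat the bootstrap variance of sqrt(T) * T_n."""
    rng = random.Random(seed)
    T = len(eta)
    draws = []
    for _ in range(n_boot):
        sample = [eta[rng.randrange(T)] for _ in range(T)]  # resample with replacement
        draws.append(math.sqrt(T) * t_n(sample, phi_j, delta_j))
    mean = sum(draws) / n_boot
    v_hat = sum((d - mean) ** 2 for d in draws) / (n_boot - 1)
    rs = T * t_n(eta, phi_j, delta_j) ** 2 / v_hat
    return rs, v_hat

# Simulated stand-in for standardized residuals (not the EWR data).
gen = random.Random(1)
eta = [gen.gauss(0.0, 1.0) for _ in range(200)]

rs, v_hat = bootstrap_rs(eta, lambda x: math.atan(x / 2.0), delta_j=0.0, n_boot=200)
```

The resulting `rs` would then be compared with a $\chi^2_1$ critical value.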
5 Empirical illustration
We consider the daily returns of the NYSE (equally weighted returns (EWR)) from the CRSP database for Oct. 1990 - Dec. 1996, a total of 1,771 observations. The data are plotted in Figure 3 and the summary statistics are presented in Table 3. The horizontal line in Figure 3 represents the sample mean of the data. The sample kurtosis and skewness are 7.611827 and −0.943375 respectively, and the Jarque and Bera (JB) normality test statistic is 1832.157. This indicates strong non-normality. Figure 3 also indicates a high degree of conditional heteroskedasticity. To explain such behavior of stock return data, we need a model that captures the dynamic higher moment structure and the distributional characteristics simultaneously.
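As a quick arithmetic check, the reported JB statistic can be reproduced from the sample size, skewness and kurtosis using the usual formula $JB = \frac{T}{6}\left( S^2 + \frac{(K - 3)^2}{4} \right)$. The function name is illustrative.

```python
def jarque_bera(T, skewness, kurtosis):
    """JB = T/6 * (S^2 + (K - 3)^2 / 4)."""
    return T / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)

# Sample size, skewness and kurtosis as reported in the text.
jb = jarque_bera(1771, -0.943375, 7.611827)
```

The result agrees with the reported 1832.157 up to rounding, far above any conventional $\chi^2_2$ critical value.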
Table 3: Summary statistics

Number of observations    1771
Mean                      0.0013
Median                    0.0018
Standard Deviation        0.0052
Skewness                  -0.9434
Kurtosis                  7.6118
Max                       0.0236
Min                       -0.0332
Q(12)                     368.55
Q(24)                     452.28
Q(36)                     476.49
JB normality test         1832.157
Figure 3: Equally weighted returns (EWR) data [time-series plot; x-axis: Time, y-axis: EWR]

For illustration, we estimated a MEARCH model in which the moment conditions are given by $E_f\left[ \arctan\left( x_t / \gamma \right) \right] = c_1(\gamma)$ and $E_f\left[ \log(\gamma^2 + x_t^2) \right] = c_2(\gamma)$. These two moment conditions yield a GARCH model generated by the Pearson type-IV distribution. Estimates of the GARCH(1,1) model under conditional normality, the Student-t distribution and the MED are given in Table 4. The standard errors, obtained from "robust" covariance estimates, are given in parentheses. It is clear that the MED-GARCH(1,1) model is better in terms of log-likelihood values. Nishii (1988) showed that the Akaike information criterion (AIC) is not a consistent model selection criterion when the true model is unspecified. On the other hand, the Schwarz information criterion (SIC) satisfies the necessary condition for a consistent model selection criterion [see Theorem 5 in Nishii (1988)]. The MED-GARCH(1,1) model has the lowest SIC value. We can regard the difference in log-likelihood values between the Normal-GARCH(1,1) model and the Student-t-GARCH(1,1) model, 101.94, as the gain from incorporating excess kurtosis. The difference in log-likelihood values between Student-t-GARCH(1,1) and MED-GARCH(1,1), 21.55, can be thought of as the gain from the skewed density function. The estimated degrees of freedom, 4.6662, in the Student-t-GARCH(1,1) model support the view that the conditional normal model might not explain the highly leptokurtic behavior. In the MED-GARCH(1,1) model, $\gamma^2 = 4.3397$ is the counterpart of the degrees of freedom in the Student-t-GARCH(1,1) model, and the two values are quite similar.
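The information criteria in Table 4 can be recomputed from the reported log-likelihoods with $AIC = -2\ell + 2k$ and $SIC = -2\ell + k \ln T$. The parameter counts $k$ (5, 6 and 9) and the effective sample size $T = 1770$ (one observation lost to the AR(1) lag) are not stated in the text; they are inferred here because they reproduce the reported values, and should be read as assumptions.

```python
import math

def aic(loglik, k):
    return -2.0 * loglik + 2.0 * k

def sic(loglik, k, T):
    return -2.0 * loglik + k * math.log(T)

T = 1770  # assumed effective sample size
# (log-likelihood from Table 4, assumed number of estimated parameters)
models = {
    "Normal": (7044.248, 5),
    "Student-t": (7146.188, 6),
    "MED": (7167.738, 9),
}

table = {name: (aic(ll, k), sic(ll, k, T)) for name, (ll, k) in models.items()}

# Log-likelihood gains discussed in the text.
gain_kurtosis = models["Student-t"][0] - models["Normal"][0]
gain_skewness = models["MED"][0] - models["Student-t"][0]
```

Under these assumptions the MED model has the lowest AIC and SIC despite carrying three more parameters than the Student-t model.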
We can detect skewed and leptokurtic behavior through the Lagrange multipliers $\lambda_1$ and $\lambda_2$. If the underlying conditional distribution is free of skewness, then $\lambda_1$, which is associated with the moment function $\arctan(x_t / \gamma)$, should be close to zero. If there is no further excess kurtosis in the conditional distribution, then $\lambda_2$, which corresponds to $\log(\gamma^2 + x_t^2)$, should be close to zero. To test the validity of the moment equations we performed the RS test using bootstrap methods. 1,770 observations are resampled from $\hat{\eta}_t = \hat{\varepsilon}_t / \sqrt{\hat{h}_t}$, 1,000 times. The RS test statistics are 4.08 and 13.47 for $\lambda_1$ and $\lambda_2$ respectively. Therefore, neither moment equation is redundant. This implies that leptokurtic and skewed behavior are present at the same time.
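To make the comparison with the $\chi^2_1$ reference distribution explicit, the p-values of the two RS statistics can be computed from the identity $P(\chi^2_1 > s) = \mathrm{erfc}(\sqrt{s/2})$. The function name is illustrative.

```python
import math

def chi2_1_pvalue(stat):
    """Upper-tail probability of a chi-square(1) variable."""
    return math.erfc(math.sqrt(stat / 2.0))

p_lambda1 = chi2_1_pvalue(4.08)    # skewness moment, arctan(x / gamma)
p_lambda2 = chi2_1_pvalue(13.47)   # kurtosis moment, log(gamma^2 + x^2)
```

Both p-values fall below 5%, although the evidence for the skewness moment is much weaker than for the kurtosis moment.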
Figure 4 presents the conditional densities for all the models. The MED-GARCH(1,1) model explains excess kurtosis and skewness at the same time. The figure reveals a strong asymmetry in the data along with excess kurtosis. The skewness is due to negative realizations. The MED-GARCH(1,1) model explains tail behavior better than the Student-t-GARCH(1,1) model. There is a large body of research in finance on conditional value at risk (VaR). For VaR analysis, a parametric model might explain tail behavior better than a nonparametric model, provided that the model is correctly specified. In the extreme, as more and more moment constraints are added, the conditional density of the MEARCH model becomes more like a nonparametric density. We might interpret the MEARCH model as one with a smooth and sensible conditional density function, generated by extracting plausible information from the data, so that our MED approach could provide improved estimates.
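The estimated MED can be evaluated directly from the Table 4 estimates, assuming the sign convention of (4.12) with $\phi_0 = 1$, so that $f(x) = \exp\left[ -\lambda_0 - \lambda_1 \arctan(x/\gamma) - \lambda_2 \log(\gamma^2 + x^2) \right]$. The sketch below integrates this density over the real line with a midpoint rule after the substitution $x = \gamma \tan u$; this substitution is a convenience chosen here, not the Gauss-Legendre scheme used in the paper, and the function names are illustrative.

```python
import math

# MED-GARCH(1,1) distribution parameters from Table 4.
GAMMA, LAM0, LAM1, LAM2 = 2.0832, -2.9124, 0.7001, 2.6829

def med_density(x):
    """f(x) = exp(-lam0 - lam1 * arctan(x / gamma) - lam2 * log(gamma^2 + x^2))."""
    return math.exp(-LAM0 - LAM1 * math.atan(x / GAMMA)
                    - LAM2 * math.log(GAMMA ** 2 + x ** 2))

def integrate_real_line(func, lower_u=-math.pi / 2, upper_u=math.pi / 2, n=20000):
    """Midpoint rule in u after substituting x = gamma * tan(u)."""
    du = (upper_u - lower_u) / n
    total = 0.0
    for k in range(n):
        u = lower_u + (k + 0.5) * du
        x = GAMMA * math.tan(u)
        total += func(x) * GAMMA / math.cos(u) ** 2 * du  # dx = gamma * sec^2(u) du
    return total

total_mass = integrate_real_line(med_density)               # should be close to one
left_mass = integrate_real_line(med_density, upper_u=0.0)   # mass on x < 0
```

If the convention assumed above is the right one, `total_mass` comes out close to one, and more than half of the mass lies on negative values, in line with the negative skewness discussed in the text.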
Figure 4: Comparison of the MED conditional density of EWR to a normal density N(0, 1) and to Student-t with 4.6663 df: dotted line, normal; dashed line, Student-t; solid line, MED [plot; x-axis: Return, y-axis: conditional density]
Table 4: GARCH estimates with three kinds of densities

                  Normal-GARCH(1,1)   Student-t-GARCH(1,1)   MED-GARCH(1,1)
GARCH(1,1)
  α_0             3.954e-06           2.664e-06              1.541e-06
                  (7.999e-07)         (7.493e-07)            (3.900e-07)
  α_1             0.1694              0.1874                 0.1070
                  (0.0277)            (0.0360)               (0.0197)
  β_1             0.6595              0.7102                 0.6828
                  (0.0537)            (0.0543)               (0.0537)
AR(1)
  ζ_0             0.0010              0.0012                 0.0024
                  (0.0001)            (9.845e-05)            (9.978e-05)
  ζ_1             0.3283              0.3042                 0.2839
                  (0.0264)            (0.0239)               (0.0235)
Distribution Parameters
  γ                                   4.6662                 2.0832
                                      (0.4954)               (0.1293)
  λ_0                                                        -2.9124
  λ_1                                                        0.7001
  λ_2                                                        2.6829
Lik               7044.248            7146.188               7167.738
AIC               -14078.496          -14280.376             -14317.476
SIC               -14051.102          -14247.504             -14268.167
Figure 5: Conditional variance estimated from MED-GARCH(1,1) [time-series plot; x-axis: Time, y-axis: Conditional variance]
6 Concluding remarks and future research
In this paper, we have presented a generalization of the GARCH model that incorporates the MED as the conditional density function. We characterized the MED and discussed which moment functions might be used for skewed and heavy-tailed distributions. In the empirical application, we selected one case that can account for the skewed and leptokurtic behavior of stock return data, and showed that the MEARCH model is quite useful in explaining the behavior of financial time series. The moment selection procedure was performed by testing the Lagrange multipliers with the Rao-Score test. Many other moment equations, or mixtures of the given moment equations, could be chosen to generate a general and plausible conditional density. We plan to use other moment conditions to build a more general MEARCH model.
Appendix
A.1. Derivation of moment selection test based on the Rao-Score test.
Consider the maximum entropy density,
$$ f(x_t, \theta) = C(\theta) \exp\left[ -\sum_{i=1}^{N} \lambda_i \phi_i(x_t; \gamma) \right], $$
which satisfies $E_f h(x_t; \theta) = E_f[\phi(x_t; \gamma) - \mu(\gamma)] = 0$ where $\theta = \{\lambda, \gamma\} \in \Theta$. Then the corresponding log-likelihood function can be written as follows,
$$ l(\theta) = T \ln C(\theta) - \sum_{i=1}^{N} \lambda_i \sum_{t=1}^{T} \phi_i(x_t; \gamma), $$
where $C(\theta)^{-1} = \int \exp\left[ -\sum_{i=1}^{N} \lambda_i \phi_i(x_t; \gamma) \right] dx$.
The score function is
$$ S(\theta_j) = \frac{\partial l(\theta)}{\partial \theta_j} = T \frac{\partial \ln C(\theta)}{\partial \theta_j} - \sum_{t=1}^{T} \phi_j(x_t; \gamma), $$
where $\theta_j = \lambda_j$, $\theta_j \neq \gamma_k \in \gamma$.
Since $\int f(x_t) \, dx = 1$,
$$ \frac{\partial C(\theta)}{\partial \theta_j} \int \exp\left[ -\sum_{i=1}^{N} \lambda_i \phi_i(x_t; \gamma) \right] dx - C(\theta) \int \exp\left[ -\sum_{i=1}^{N} \lambda_i \phi_i(x_t; \gamma) \right] \phi_j(x_t; \gamma) \, dx = 0. \qquad (6.1) $$
Then, dividing both sides of equation (6.1) by $C(\theta)$, we get
$$ \frac{\partial \ln C(\theta)}{\partial \theta_j} = \frac{\int \exp\left[ -\sum_{i=1}^{N} \lambda_i \phi_i(x_t; \gamma) \right] \phi_j(x_t; \gamma) \, dx}{\int \exp\left[ -\sum_{i=1}^{N} \lambda_i \phi_i(x_t; \gamma) \right] dx} = E_f(\phi_j(x_t; \gamma)). $$
Evaluating the above equation under the null, $\theta_j = 0$ where $\theta_j \in \{(\lambda_j)_{j=1}^{N}\}$,
$$ \frac{\partial \ln C(\theta)}{\partial \theta_j} \Big|_{\theta_j = 0} = \frac{\int \exp\left[ -\sum_{i \in \{1, 2, \cdots, N\} \setminus \{j\}} \lambda_i \phi_i(x_t; \gamma) \right] \phi_j(x_t; \gamma) \, dx}{\int \exp\left[ -\sum_{i \in \{1, 2, \cdots, N\} \setminus \{j\}} \lambda_i \phi_i(x_t; \gamma) \right] dx} \equiv \delta_j. $$
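Thus, under the null the score function can be represented as follows,
$$ S(\theta_j) = T \delta_j - \sum_{t=1}^{T} \phi_j(x_t; \gamma). $$
Again, $\partial \ln C(\theta) / \partial \theta_j$ can be rewritten to obtain the second derivative as follows,
$$ \frac{\partial \ln C(\theta)}{\partial \theta_j} = C(\theta) \int \exp\left[ -\sum_{i=1}^{N} \lambda_i \phi_i(x_t; \gamma) \right] \phi_j(x_t; \gamma) \, dx. $$
Then, the second derivative of $\ln C(\theta)$ is
$$ \frac{\partial^2 \ln C(\theta)}{\partial \theta_j \partial \theta_l} = \frac{\partial C(\theta)}{\partial \theta_l} \int \exp\left[ -\sum_{i=1}^{N} \lambda_i \phi_i(x_t; \gamma) \right] \phi_j(x_t; \gamma) \, dx - C(\theta) \int \exp\left[ -\sum_{i=1}^{N} \lambda_i \phi_i(x_t; \gamma) \right] \phi_j(x_t; \gamma) \phi_l(x_t; \gamma) \, dx, \qquad (6.2) $$
Equation (6.2) can be represented as
$$ \frac{\partial^2 \ln C(\theta)}{\partial \theta_j \partial \theta_l} = C(\theta) \left[ \frac{\partial \ln C(\theta)}{\partial \theta_l} \int \exp\left[ -\sum_{i=1}^{N} \lambda_i \phi_i(x_t; \gamma) \right] \phi_j(x_t; \gamma) \, dx - \int \exp\left[ -\sum_{i=1}^{N} \lambda_i \phi_i(x_t; \gamma) \right] \phi_j(x_t; \gamma) \phi_l(x_t; \gamma) \, dx \right] $$
$$ = \frac{\partial \ln C(\theta)}{\partial \theta_l} E_f[\phi_j(x_t; \gamma)] - E_f[\phi_j(x_t; \gamma) \phi_l(x_t; \gamma)], \qquad (6.3) $$
where $\theta_j, \theta_l \in \{(\lambda_i)_{i=1}^{N}\}$.
Then, we can represent the second derivative of $\ln C(\theta)$ as follows,
$$ \frac{\partial^2 \ln C(\theta)}{\partial \theta_j \partial \theta_l} \Big|_{\theta_j = 0} = \delta_l \delta_j - \delta_{jl} $$
where
$$ \delta_l = \frac{\partial \ln C(\theta)}{\partial \theta_l} \Big|_{\theta_j = 0} = \frac{\int \exp\left[ -\sum_{i \in \{1, 2, \cdots, N\} \setminus \{j\}} \lambda_i \phi_i(x_t; \gamma) \right] \phi_l(x_t; \gamma) \, dx}{\int \exp\left[ -\sum_{i \in \{1, 2, \cdots, N\} \setminus \{j\}} \lambda_i \phi_i(x_t; \gamma) \right] dx} $$
$$ \delta_{jl} = \frac{\int \exp\left[ -\sum_{i \in \{1, 2, \cdots, N\} \setminus \{j\}} \lambda_i \phi_i(x_t; \gamma) \right] \phi_j(x_t; \gamma) \phi_l(x_t; \gamma) \, dx}{\int \exp\left[ -\sum_{i \in \{1, 2, \cdots, N\} \setminus \{j\}} \lambda_i \phi_i(x_t; \gamma) \right] dx} $$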
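The second derivative of the log-likelihood function with respect to $\theta_j \in \{(\lambda_j)_{j=1}^{N}\}$ under the null can be represented by the second derivative of $\ln C(\theta)$,
$$ \frac{\partial^2 l(\theta)}{\partial \theta_j \partial \theta_l} \Big|_{\theta_j = 0} = T \cdot \frac{\partial^2 \ln C(\theta)}{\partial \theta_j \partial \theta_l} \Big|_{\theta_j = 0} = T \cdot (\delta_l \delta_j - \delta_{lj}) $$
where $\theta_j, \theta_l \in \{(\lambda_i)_{i=1}^{N}\}$.
Let $\gamma = (\gamma_1, \cdots, \gamma_K)$ and consider the following equations,
$$ \frac{\partial l(\theta)}{\partial \gamma_j} = T \cdot \frac{\partial \ln C(\theta)}{\partial \gamma_j} - \sum_{i=1}^{N} \lambda_i \sum_{t=1}^{T} \frac{\partial \phi_i(x_t; \gamma)}{\partial \gamma_j} $$
$$ \frac{\partial^2 l(\theta)}{\partial \gamma_j \partial \gamma_l} = T \cdot \frac{\partial^2 \ln C(\theta)}{\partial \gamma_j \partial \gamma_l} - \sum_{i=1}^{N} \lambda_i \sum_{t=1}^{T} \frac{\partial^2 \phi_i(x_t; \gamma)}{\partial \gamma_j \partial \gamma_l} $$
$$ \frac{\partial^2 l(\theta)}{\partial \theta_j \partial \gamma_l} = \frac{\partial^2 l(\theta)}{\partial \gamma_l \partial \theta_j} = T \cdot \frac{\partial^2 \ln C(\theta)}{\partial \theta_j \partial \gamma_l} - \sum_{t=1}^{T} \frac{\partial \phi_j(x_t; \gamma)}{\partial \gamma_l} $$
where $\theta_j \in \{(\lambda_i)_{i=1}^{N}\}$.
Under the null,
$$ \frac{\partial l(\theta)}{\partial \gamma_j} \Big|_{\theta_j = 0} = T \cdot \frac{\partial \ln C(\theta)}{\partial \gamma_j} \Big|_{\theta_j = 0} - \sum_{i \in \{1, 2, \cdots, N\} \setminus \{j\}} \lambda_i \sum_{t=1}^{T} \frac{\partial \phi_i(x_t; \gamma)}{\partial \gamma_j} $$
$$ \frac{\partial^2 l(\theta)}{\partial \gamma_j \partial \gamma_l} \Big|_{\theta_j = 0} = T \cdot \frac{\partial^2 \ln C(\theta)}{\partial \gamma_j \partial \gamma_l} \Big|_{\theta_j = 0} - \sum_{i \in \{1, 2, \cdots, N\} \setminus \{j\}} \lambda_i \sum_{t=1}^{T} \frac{\partial^2 \phi_i(x_t; \gamma)}{\partial \gamma_j \partial \gamma_l} $$
$$ \frac{\partial^2 l(\theta)}{\partial \theta_j \partial \gamma_l} \Big|_{\theta_j = 0} = \frac{\partial^2 l(\theta)}{\partial \gamma_l \partial \theta_j} \Big|_{\theta_j = 0} = T \cdot \frac{\partial^2 \ln C(\theta)}{\partial \theta_j \partial \gamma_l} \Big|_{\theta_j = 0} - \sum_{t=1}^{T} \frac{\partial \phi_j(x_t; \gamma)}{\partial \gamma_l} $$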
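Then, the information matrix evaluated under the null hypothesis, $H_0: \theta (= \lambda_j) = 0$, can be written in the following form:
$$ I(\theta) = E \begin{bmatrix} V_{\lambda\lambda} & V_{\lambda\gamma} \\ V_{\gamma\lambda} & V_{\gamma\gamma} \end{bmatrix} \qquad (6.4) $$
where $V_{\lambda\lambda}$ is an $N \times N$ matrix with elements $V_{\lambda_i \lambda_j} = -\frac{\partial^2 l(\theta)}{\partial \lambda_i \partial \lambda_j}$ for $i, j = 1, 2, \cdots, N$; $V_{\lambda\gamma}$ and $V_{\gamma\lambda}$ are $N \times K$ and $K \times N$ matrices respectively, with $V_{\lambda_i \gamma_j} = V_{\gamma_j \lambda_i} = -\frac{\partial^2 l(\theta)}{\partial \lambda_i \partial \gamma_j}$ for $i = 1, 2, \cdots, N$ and $j = 1, 2, \cdots, K$; and $V_{\gamma\gamma}$ is a $K \times K$ matrix with $V_{\gamma_i \gamma_j} = -\frac{\partial^2 l(\theta)}{\partial \gamma_i \partial \gamma_j}$ for $i, j = 1, 2, \cdots, K$.
To see the impact of the nuisance parameter vector $\gamma$, let us use the result of Pierce (1982). For a statistic $T_n(\cdot)$ depending on the parameter vector $\gamma$, the asymptotic variances of $T_n(\theta)$ and $T_n(\hat{\theta})$, where $\hat{\theta}$ is an efficient estimator of $\theta$, are related by
$$ Var\left[ \sqrt{n} \, T_n(\hat{\theta}) \right] = Var\left[ \sqrt{n} \, T_n(\theta) \right] - \lim_{n \to \infty} E\left[ \frac{\partial T_n(\theta)}{\partial \theta} \right]' Var\left[ \sqrt{n} \, \hat{\theta} \right] \lim_{n \to \infty} E\left[ \frac{\partial T_n(\theta)}{\partial \theta} \right] $$
In our notation,
$$ T_n(\hat{\theta}) = \frac{1}{T} S(\hat{\theta}) = \delta_j - \frac{1}{T} \sum_{t=1}^{T} \phi_j(x_t; \hat{\gamma}), \qquad (6.5) $$
$V(T_n(\theta))$ can be replaced by the $j$th diagonal element of $V_{\lambda\lambda}$ in $I(\theta)$. $E\left[ \partial T_n(\theta) / \partial \theta \right]$ can be written as a $1 \times (N + K)$ vector,
$$ E\left[ \frac{\partial T_n(\theta)}{\partial \theta} \right] = E\left[ \frac{\partial \delta_j}{\partial \lambda_1}, \cdots, \frac{\partial \delta_j}{\partial \lambda_N}, \; \frac{\partial \delta_j}{\partial \gamma_1} - \frac{1}{T} \sum \frac{\partial \phi_j}{\partial \gamma_1}, \cdots, \frac{\partial \delta_j}{\partial \gamma_K} - \frac{1}{T} \sum \frac{\partial \phi_j}{\partial \gamma_K} \right] \qquad (6.6) $$
The Rao-Score test statistic is given by
$$ RS = T \cdot \frac{T_n^2(\hat{\theta})}{\hat{V}}, $$
where $\hat{V}$ is a consistent estimator of the asymptotic variance of $\sqrt{T} \cdot T_n(\hat{\theta})$. Since the derivation of $V\left[ \sqrt{T} \cdot T_n(\hat{\theta}) \right]$ becomes complicated, we estimated the variance of $T_n(\hat{\theta})$ by the bootstrap method.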
References
Agmon, N., Alhassid, Y. and Levine, R.D., 1981, "An algorithm for determining the Lagrange parameters in the maximum entropy formalism", The Maximum Entropy Formalism, Cambridge, MA: MIT Press, 207-209.
Aroian, L.A., 1948, "The fourth degree exponential distribution function", Annals of Mathematical Statistics, 19, 589-592.
Bera, A.K. and Bilias, Y., 2002, "The MM, ME, ML, EL, EF and GMM approaches to estimation: a synthesis", Journal of Econometrics, 107, 51-86.
Bera, A.K. and Ghosh, A., 2001, "Neyman's smooth test and its applications in Econometrics", Handbook of Applied Econometrics and Statistical Inference, 177-230.
Bera, A.K. and Higgins, M.L., 1993, "ARCH models: properties, estimation and testing", Journal of Economic Surveys, 7, 305-366.
Bera, A.K. and Kim, S., 2002, "Testing constancy of correlation and other specification of the BGARCH model with an application to international equity returns", Journal of Empirical Finance, 9, 171-195.
Bera, A.K. and Lee, S., 1993, "Information matrix test, parameter heterogeneity and ARCH: a synthesis", Review of Economic Studies, 60, 229-240.
Bera, A.K. and Ullah, A., 1991, "Rao's score test in econometrics", Journal of Quantitative Economics, 7(2), 189-220.
Bollerslev, T., 1986, "Generalized autoregressive conditional heteroskedasticity", Journal of Econometrics, 31, 307-327.
Bollerslev, T., 1987, "A conditionally heteroskedastic time series model for speculative prices and rates of return", Review of Economics and Statistics, 69, 542-547.
Bollerslev, T. and Wooldridge, J.M., 1988, "Quasi-maximum likelihood estimation of dynamic models with time varying covariance", Econometric Reviews, 11, 143-172.
Buchen, P. and Kelly, M., 1996, "The maximum entropy distribution of an asset inferred from option prices", Journal of Financial and Quantitative Analysis, 31(1), 143-159.
Chamberlain, G., 1987, "Asymptotic efficiency in estimation with conditional moment restrictions", Journal of Econometrics, 34, 305-334.
Choi, E., Hall, P. and Presnell, B., 2000, "Rendering parametric procedures more robust by empirically tilting the model", Biometrika, 87, 453-465.
Cobb, L., Koppstein, P. and Chen, N.H., 1983, "Estimation and moment recursion relations for multimodal distributions of the exponential family", Journal of the American Statistical Association, 78, 124-130.
Cressie, N. and Read, T., 1984, "Multinomial goodness-of-fit tests", Journal of the Royal Statistical Society, Ser. B, 46, 440-464.
Dixit, A.K., 1976, Optimization in Economic Theory, Oxford University Press.
Engle, R.F., 1982, "Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation", Econometrica, 50, 987-1007.
Engle, R.F. and González-Rivera, G., 1991, "Semiparametric ARCH models", Journal of Business and Economic Statistics, 9, 345-359.
Frontini, N. and Tagliani, A., 1994, "Maximum entropy in the finite Stieltjes and Hamburger moment problem", Journal of Mathematical Physics, 35(12), 6748-6756.
Godambe, V.P., 1960, "An optimal property of regular maximum likelihood estimation", The Annals of Mathematical Statistics, 31, 1208-1212.
Godambe, V.P. and Heyde, C.C., 1987, "Quasi-likelihood and optimal estimation", International Statistical Review, 55, 231-244.
Gokhale, D.V., 1975, "Maximum entropy characterization of some distributions", Statistical Distribution in Scientific Work, Vol 3, 299-304.
Golan, A. and Judge, G.G., 1996, "A maximum entropy approach to empirical likelihood estimation and inference", ARE Working Paper, University of California, 34.
Golan, A., Judge, G. and Miller, D., 1996, Maximum Entropy Econometrics: Robust Estimation with Limited Data, Wiley.
Hansen, B.E., 1994, "Autoregressive conditional density estimation", International Economic Review, 35, 705-730.
Hansen, L.P., 1982, "Large sample properties of generalized method of moments estimators", Econometrica, 50, 1029-1054.
Hansen, L.P., Heaton, J. and Yaron, A., 1996, "Finite sample properties of some alternative GMM estimators", Journal of Business and Economic Statistics, 14(3), 262-280.
Hsieh, D.A., 1989, "Modelling heteroscedasticity in daily foreign exchange rate", Journal of Business and Economic Statistics, 7, 307-317.
Higgins, M.L. and Bera, A.K., 1992, "A class of nonlinear ARCH models", International Economic Review, 33, 137-158.
Huber, P.J., 1980, Robust Statistics, Wiley, New York.
Imbens, G.W., 1993, "A new approach to generalized method of moments estimation", Harvard Institute of Economic Research Working Paper 1633.
Imbens, G.W., 1997, "One-step estimators for over-identified generalized method of moments models", Review of Economic Studies, 64, 359-383.
Imbens, G.W., Spady, R.H. and Johnson, P., 1998, "Information theoretic approaches to inference in moment condition models", Econometrica, 66, 333-357.
Imbens, G.W., 2002, "Generalized method of moments and empirical likelihood", Journal of Business and Economic Statistics, 20(4), 493-506.
Jaynes, E., 1979, "Concentration of distributions", in R. Rosenkrantz (ed.), E. Jaynes: Papers on Probability, Statistics and Statistical Physics, Dordrecht, Reidel.
Kagan, A.M., Linnik, Y.V. and Rao, C.R., 1973, Characterization Problems in Mathematical Statistics, Wiley, New York.
Kendall, M.G. and Stuart, A., 1977, The Advanced Theory of Statistics, Volume 1, Griffin, London.
Kitamura, Y. and Stutzer, M., 1997, "An information-theoretic alternative to generalized method of moments estimation", Econometrica, 65, 861-874.
Kullback, S. and Leibler, R.A., 1951, "On information and sufficiency", Annals of Mathematical Statistics, 22, 79-86.
Lee, T.K.Y. and Tse, Y.K., 1991, "Term structure of interest rates in the Singapore Asian dollar market", Journal of Applied Econometrics, 6, 143-152.
Li, D.X. and Turtle, H.J., 2000, "Semiparametric ARCH models: An estimating function approach", Journal of Business and Economic Statistics, 18, 174-186.
Lisman, J.H.C. and van Zuylen, M.C.A., 1972, "Note on the generation of most probable frequency", Statistica Neerlandica, 26, 19-23.
Lye, J.N. and Martin, V.L., 1993, "Robust estimation, nonnormalities, and generalized exponential distributions", Journal of the American Statistical Association, 421, 261-267.
Matz, A.W., 1978, "Maximum likelihood parameter estimation for the quartic exponential distribution", Technometrics, 20, 475-484.
Milhøj, A., 1985, "The moment structure of ARCH processes", Scandinavian Journal of Statistics, 12, 281-292.
Mittelhammer, R.C., Judge, G.G. and Miller, D.J., 2000, Econometric Foundations, Cambridge.
Mead, L.R. and Papanicolaou, N., 1984, "Maximum entropy in the problem of moments", Journal of Mathematical Physics, 25(8), 2404-2417.
Nishii, R., 1988, "Maximum likelihood principle and model selection when the true model is unspecified", Journal of Multivariate Analysis, 27(2), 392-403.
Nelson, D.B., 1991, "Conditional heteroskedasticity in asset returns: A new approach", Econometrica, 51, 347-370.
Neyman, J., 1937, "'Smooth test' for goodness of fit", Skandinavisk Aktuarietidskrift, 20, 150-199.
Neyman, J. and Pearson, E.S., 1936, "Contribution to the theory of testing statistical hypothesis 1: Unbiased critical regions of type A and A_1", Statistical Research Memoirs, 1, 1-37.
Ord, J.K., 1972, Families of Frequency Distributions, New York, Hafner.
Ormoneit, D. and White, H., 1999, "An efficient algorithm to compute maximum entropy densities", Econometric Reviews, 18(2), 127-140.
Owen, A., 2001, Empirical Likelihood, London, Chapman and Hall.
Premaratne, G. and Bera, A.K., 2000, "Modelling asymmetry and excess kurtosis in stock return data", Working paper, University of Illinois at Urbana-Champaign.
Premaratne, G. and Bera, A.K., 2001, "A test for asymmetry with leptokurtic financial data", Working paper, University of Illinois at Urbana-Champaign.
Qin, J. and Lawless, J., 1994, "Generalized estimating equations", The Annals of Statistics, 20, 300-325.
Rao, C.R., 2002, Linear Statistical Inference and Its Application, New York.
Renyi, A., 1960, "On measures of entropy and information", Proceedings of the Fourth Berkeley Symposium on Mathematics, Statistics, and Probability, Vol I, 547.
Rich, R.W., Raymond, J. and Butler, J.S., 1991, "Generalized instrumental variables estimation of autoregressive conditional heteroskedastic models", Economics Letters, 35, 179-185.
Rockinger, M. and Jondeau, E., 2002, "Entropy densities with an application to autoregressive conditional skewness and kurtosis", Journal of Econometrics, 106, 119-142.
Ryu, H.K., 1990, "Orthonormal basis and maximum entropy estimation of probability density and regression functions", Doctoral Dissertation, Department of Economics, University of Chicago, IL.
Shannon, C.E., 1948, "The mathematical theory of communication", Bell System Technical Journal, July-Oct.; reprinted in: C.E. Shannon and W. Weaver, The mathematical theory of communication (University of Illinois Press, Urbana, IL), 3-91.
Stuart, A. and Ord, J.K., 1994, Kendall's Advanced Theory of Statistics, Vol 1: Distribution Theory, Edward Arnold, London.
Weiss, A.A., 1986, "Asymptotic theory for ARCH models: Estimation and testing", Econometric Theory, 2, 107-131.
White, H., 1982, "Maximum likelihood estimation of misspecified models", Econometrica, 50, 1-25.
Wu, X., 2003, "Calculation of maximum entropy densities with application to income distribution", Journal of Econometrics, 115, 347-354.
Zellner, A. and Highfield, A.R., 1988, "Calculation of maximum entropy distribution and approximation of marginal posterior distributions", Journal of Econometrics, 37, 195-209.
Zellner, A., 1998, "On order invariance of maximum entropy procedures", Mimeo.
Zellner, A., 1991, "Bayesian methods and entropy in economics and econometrics", Maximum Entropy and Bayesian Methods, Kluwer, Amsterdam, 17-31.