
Maximum Entropy Autoregressive Conditional Heteroskedasticity Model ∗

Anil K. Bera

Department of Economics, University of Illinois

1206 S. Sixth, Champaign, IL, 61820

abera@uiuc.edu

and

Sung Yong Park

Department of Economics, University of Illinois

1206 S. Sixth, Champaign, IL, 61820

sungpark@uiuc.edu

[ Preliminary paper ]

∗

We thank the participants of the Conference on Recent Developments in the Theory, Method, and Application of Information and Entropy Econometrics at American University, Sep. 19-21, 2003, and the 13th Annual Meeting of the Midwest Econometrics Group at the University of Missouri, Oct. 17-18, 2003, for helpful comments and discussions. Correspondence to: Anil K. Bera, Department of Economics, University of Illinois, 1206 S. 6th Street, Champaign, IL 61820, U.S.A.; email: anil@fisher.econ.uiuc.edu

Abstract

The maximum entropy (ME) approach is based on the efficient use of available information. As is well known, many of the standard families of distributions can be derived from the ME principle. One purpose of this paper is to show how the information functional can be extracted from the data; for example, if the information, in the form of estimating equations, is given as arithmetic moment conditions, the resulting density has a very attractive form. We propose a criterion for selecting side conditions in ME problems and apply it to formulate an autoregressive conditional heteroskedasticity (ARCH) model. The procedure is illustrated with an application to leptokurtic and negatively skewed financial data.

Keywords: Maximum entropy density, Semiparametric density, Density estimation, ARCH, GARCH.

Introduction

The second law of thermodynamics states that, since there is an inherent tendency for disorder to increase, the entropy of the physical universe increases constantly. Jaynes (1979) pointed out that a particular measure can be defined on the space of probability distributions such that a distribution of higher entropy represents greater disorder, is smoother, or is more probable. If there is no prior information, the uniform distribution maximizes the entropy. Thus, maximizing the entropy is equivalent to minimizing a certain distance measure between the uniform distribution and the subject distribution. Measures or pseudo-distance functions of this kind have been provided in the statistics literature [Kullback and Leibler (1961), Renyi (1960), Cressie and Read (1984), among others]. If one maximizes the entropy measure subject to certain constraints, which may contain unknown parameters, one obtains a highly probable distribution and consistent estimates of the parameters. This approach is generally called the information-theoretic approach.

In this paper we use the maximum entropy density as the conditional density function in ARCH-type models. The maximum entropy density (MED) is obtained by maximizing Shannon’s (1948) entropy measure subject to known moment conditions. It is well known that the MED is the least biased distribution given the known constraints. By choosing different sequences of side-constraint functionals, we obtain a variety of flexible MED functions. We present some useful characterizations of the MED in the continuous case. Although the MED has such a flexible functional form, there has been little research on it in the economics and econometrics literature. Since no analytic solution is available in general when there are more than two moment constraints, the problem must be solved by iterative numerical nonlinear optimization. Valuable applications of the MED in the economics literature include Zellner and Highfield (1988), Buchen and Kelly (1996), Ormoneit and White (1999), Rockinger and Jondeau (2002) and Wu (2003).

Since Engle’s (1982) pioneering work and its generalization by Bollerslev (1986), ARCH-type models have been widely used to explain the behavior of financial time series. Numerous studies have extended ARCH-type models in two directions. The first extension has concentrated on generalizing the conditional variance function to nonlinear forms. Geweke (1986) and Milhøj (1987) suggested the logarithmic ARCH model to eliminate the non-negativity restrictions on the parameters of the conditional variance function. Nelson (1991) proposed the exponential GARCH (EGARCH) model to explain the ‘leverage effect’. Higgins and Bera (1992) proposed a nonlinear ARCH (NARCH) model and showed that the logarithmic ARCH model is a special case of NARCH. The second extension deals with the form of the conditional density. Various ARCH-type models with non-normal conditional density functions have been proposed to explain the leptokurtic behavior of the unconditional density. Bollerslev (1987) used the Student-t distribution. However, Hsieh (1989) found that the GARCH(1,1) model with the Student-t distribution could not capture the excess kurtosis of daily returns on the British pound and the Japanese yen. Nelson (1991) employed the generalized error distribution (GED) with the EGARCH model but found that the estimated GED did not have thick enough tails to capture the unconditional leptokurtosis. Skewed conditional densities have also been used to explain the skewness of the unconditional density. Lee and Tse (1991) used a distribution based on the first three terms of the Gram-Charlier series. Engle and González-Rivera (1991) adopted a nonparametric conditional density, and Premaratne and Bera (2000) used the Pearson type-IV distribution as the conditional density function. If we impose certain moment side conditions, we can obtain the normal, Student-t, GED and Pearson type-IV distributions by the maximum entropy formalism. In this sense, the proposed MEARCH model is a very general one. Rockinger and Jondeau (2002) applied the MED to the GARCH model. Since they considered the first four arithmetic moment conditions, the resulting MED has the form of the quartic exponential distribution. However, the quartic exponential distribution cannot adequately capture heavy-tailed behavior of the kind exhibited by the Student-t distribution.

The MEARCH model is closely related to other moment-based estimation methods such as the generalized method of moments (GMM) and maximum empirical likelihood (MEL) estimation. All of these methods can be considered within the estimating function (EF) approach.

The purpose of this paper is twofold. First, we present the characterization of the continuous MED and show how the moment equations capture the behavior of skewed and leptokurtic financial time series in ARCH-type models. Second, we introduce an estimation procedure for the MEARCH model and suggest a moment selection criterion based on the Rao-Score test.

The rest of the paper is organized as follows. In Section 2, a summary of recent information-theoretic approaches is provided. In Section 3, some basic characteristics of the MED are presented, together with the characterization problem and a numerical procedure for computing the MED. In Section 4, we propose an ARCH-type model under the ME conditional density, with its estimation and a moment selection test. Section 5 shows simple empirical applications to the daily returns of the NYSE with specific moment equations that generate skewed and heavy-tailed distributions. Concluding remarks follow in Section 6.

Information Theoretic Approaches

Consider estimators based on minimization of the Cressie-Read power divergence (CRPD) measure (Cressie and Read (1984)). For a fixed scalar parameter $\lambda$, let us define

$$I_\lambda(p, q) = \frac{1}{\lambda(\lambda + 1)} \sum_{i=1}^{n} p_i \left[ \left( \frac{p_i}{q_i} \right)^{\lambda} - 1 \right],$$

where $p = (p_1, p_2, \cdots, p_n)'$ and $q = (q_1, q_2, \cdots, q_n)'$. We refer to the first distribution in the argument list, $p$, and the second, $q$, as the subject distribution and the reference distribution, respectively. Cressie and Read (1984) used $I_\lambda(p, q)$ as a measure of the discrepancy between $p$ and $q$. $I_\lambda(p, q)$ provides a very rich class of divergence measures. If we choose $p = n^{-1}\mathbf{1}_n$, where $\mathbf{1}_n$ is an $n \times 1$ vector of ones, and $q = \pi$, then, as $\lambda \to 0$, the limit of $I_\lambda(p, q)$ follows from L’Hospital’s rule,

$$\lim_{\lambda \to 0} I_\lambda(n^{-1}\mathbf{1}_n, \pi) = KL(n^{-1}\mathbf{1}_n, \pi) = -\frac{1}{n}\sum_{i=1}^{n} \ln(\pi_i) - \ln(n),$$

which is the negative log empirical likelihood (EL) objective function except for an additive constant, where $KL$ is the Kullback-Leibler information [see Owen (2001), p. 35, Imbens (1993) and Qin and Lawless (1994) for MEL estimation]. As $\lambda \to -1$, we have

$$\lim_{\lambda \to -1} I_\lambda(n^{-1}\mathbf{1}_n, \pi) = KL(\pi, n^{-1}\mathbf{1}_n) = \sum_{i=1}^{n} \pi_i \ln(\pi_i) + \ln(n),$$

which is the same as the negative of the maximum entropy empirical likelihood (MEEL) [Mittelhammer, Judge and Miller (2000)] or exponential tilting (ET) estimation [Golan and Judge (1996), Kitamura and Stutzer (1997), Imbens, Spady and Johnson (1998), and Choi, Hall and Presnell (2000)] objective function except for a constant. If $\lambda = -2$, $I_{-2}(n^{-1}\mathbf{1}_n, \pi)$ yields the objective function of maximum log Euclidean likelihood (MLEL), which is closely related to the generalized method of moments (GMM) estimator proposed by Hansen, Heaton and Yaron (1996),

$$I_{-2}(n^{-1}\mathbf{1}_n, \pi) = \frac{1}{2} \sum_{i=1}^{n} \frac{1}{n}\left( n^2 \pi_i^2 - 1 \right).$$

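We consider the estimator that minimizes the CRPD measure with respect to the empirical distribution and satisfies the moment equations $E_\pi[g_j(x, \theta)] = 0$ for $j = 1, \cdots, N$, and the normalization constraint $\sum_{i=1}^{n} \pi_i = 1$, where $\theta \in \Theta$; i.e., we solve the optimization problem

$$\min_{\pi, \theta} \; \lim_{\lambda \to \lambda_0} I_\lambda(n^{-1}\mathbf{1}_n, \pi) \quad \text{s.t.} \quad E_\pi[g_j(x, \theta)] = 0, \quad j = 1, \cdots, N.$$

The objective function of the minimization problem depends on the constant $\lambda_0$.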

In the over-identified case the role of $\pi$ is similar to that of the weight matrix $\Delta^{-1} = [E(g(Z, \theta) g(Z, \theta)')]^{-1}$ in the GMM estimator, in the sense that both $\pi$ and $\Delta^{-1}$ give efficient weights for estimating $\theta$. However, since one does not have information about $\Delta^{-1}$, in the 2-step GMM estimator $\hat{\Delta}^{-1}$ has to be estimated from an inefficient estimate $\tilde{\theta}$ obtained with an arbitrary weight matrix. In contrast, $\hat{\pi}$ is chosen in a way that minimizes the CRPD measure. Thus, the $\hat{\pi}$-weighted estimate would be more efficient in finite samples. If the side conditions are accurate, the MEL estimator is equivalent to the 2-step GMM estimator up to order $O_p(n^{-1/2})$, and the limiting distributions of the MEL and MEEL estimates of $\theta$ are identical under certain regularity conditions [see Imbens (1997)].
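The limiting cases of the CRPD measure discussed above can be checked numerically. The following is a minimal sketch, assuming the definition $I_\lambda(p, q) = [\lambda(\lambda+1)]^{-1}\sum_i p_i[(p_i/q_i)^\lambda - 1]$; the function names and the probability vector `weights` are hypothetical and chosen only for illustration.

```python
import math


def cressie_read(p, q, lam):
    """Power divergence I_lam(p, q); valid for lam not equal to 0 or -1."""
    total = sum(pi * ((pi / qi) ** lam - 1.0) for pi, qi in zip(p, q))
    return total / (lam * (lam + 1.0))


def kullback_leibler(p, q):
    """KL(p, q) = sum_i p_i ln(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


n = 4
empirical = [1.0 / n] * n          # the vector n^{-1} 1_n
weights = [0.1, 0.2, 0.3, 0.4]     # hypothetical probability weights pi

# lam close to 0 approximates the empirical likelihood case.
el_limit = cressie_read(empirical, weights, 1e-6)
# lam close to -1 approximates the exponential tilting (MEEL) case.
et_limit = cressie_read(empirical, weights, -1.0 + 1e-6)
# lam = -2 gives a Euclidean-type objective.
euclidean = cressie_read(empirical, weights, -2.0)
euclidean_direct = sum((n * w - 1.0) ** 2 for w in weights) / (2.0 * n)

print(el_limit, kullback_leibler(empirical, weights))
print(et_limit, kullback_leibler(weights, empirical))
print(euclidean, euclidean_direct)
```

Each printed pair should agree up to a small numerical error, which illustrates that the order of the two arguments of $KL$ is what separates the EL case from the ET case.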

In the preceding section we focused on the problem of minimizing the CRPD measure in the case $\lambda \to -1$, the MEEL or ET estimation case, under known side conditions. The solution of this problem is, as is well known in the information theory literature, the maximum entropy distribution (MED), whose exponential tilting parameters are the Lagrange multipliers of our constrained optimization problem.

Maximum entropy density

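The maximum entropy density is obtained by maximizing Shannon’s (1948) entropy measure, which is the continuous version of the negative of $\lim_{\lambda \to -1} I_\lambda(n^{-1}\mathbf{1}_n, \pi)$ except for a constant, $\ln(n)$,

$$\max_{f} H(f) = -\int f(x) \log f(x)\,dx \tag{3.1}$$

satisfying

$$\int \phi_j(x) f(x)\,dx = \mu_j, \quad j = 0, \cdots, N,$$

with the $\mu_j$ having known values.

Problem (3.1) is a mathematical optimization problem subject to given side conditions and can be represented by the following Lagrangian,

$$\mathcal{L} = -\int f(x) \log f(x)\,dx - \sum_{j=0}^{N} \lambda_j \left[ \int \phi_j(x) f(x)\,dx - \mu_j \right]. \tag{3.2}$$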

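The solution of problem (3.2) can be obtained by simple calculus of variations (absorbing the constant from the first-order condition into $\lambda_0$),

$$f(x) = \exp\left[ -\sum_{j=0}^{N} \lambda_j \phi_j(x) \right], \tag{3.3}$$

where $\lambda_0$ is calculated from the normalization constraint $\int f(x)\,dx = 1$. Thus, $\lambda_0$ can be expressed in terms of the remaining Lagrange multipliers,

$$\lambda_0 = \log \left\{ \int \exp\left[ -\sum_{j=1}^{N} \lambda_j \phi_j(x) \right] dx \right\}. \tag{3.4}$$

Let $\Omega(\lambda) = \exp\{\lambda_0\}$, where $\lambda$ stands for the vector $(\lambda_1, \lambda_2, \cdots, \lambda_N)'$. Equation (3.3) together with equation (3.4) can be written as

$$f(x) = \frac{1}{\Omega(\lambda)} \exp\left[ -\sum_{j=1}^{N} \lambda_j \phi_j(x) \right]. \tag{3.5}$$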

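$\Omega(\lambda)$ is known as the “partition function” that converts the relative probabilities to absolute probabilities [see Golan, Judge and Miller (1996), p. 23]. One can define a potential $\Gamma(\lambda_1, \lambda_2, \cdots, \lambda_N)$ through the Legendre transformation,

$$\Gamma(\lambda_1, \lambda_2, \cdots, \lambda_N) = \ln \Omega(\lambda) + \sum_{j=1}^{N} \lambda_j \mu_j. \tag{3.6}$$

The Lagrange multipliers, $\lambda$, are determined by the $N$ simultaneous equations,

$$\frac{\partial \Gamma}{\partial \lambda_j} = 0 \;\Rightarrow\; -\frac{\partial \ln \Omega}{\partial \lambda_j} = \mu_j, \quad j = 1, \cdots, N.$$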

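We can see that the uniqueness of the ME solution, if it exists, is guaranteed by the convexity of $\Gamma(\lambda)$,

$$\frac{\partial^2 \Gamma}{\partial \lambda_j \partial \lambda_k} = \Sigma_{[j,k]} = \frac{\int \phi_j(x) \phi_k(x) \exp\left[ -\sum_{i=1}^{N} \lambda_i \phi_i(x) \right] dx}{\Omega(\lambda)} - \frac{\int \phi_j(x) \exp\left[ -\sum_{i=1}^{N} \lambda_i \phi_i(x) \right] dx}{\Omega(\lambda)} \cdot \frac{\int \phi_k(x) \exp\left[ -\sum_{i=1}^{N} \lambda_i \phi_i(x) \right] dx}{\Omega(\lambda)},$$

where $\Sigma_{[j,k]}$ is the $(j,k)$ element of the variance-covariance matrix of the functions $\phi_j(x)$. Because a variance-covariance matrix is positive semidefinite, and positive definite when the $\phi_j(x)$ are linearly independent, $\Gamma(\lambda)$ is a convex function of $\lambda$.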

The moment conditions can be interpreted as known prior information. Given prior information, we can obtain the least biased distribution function by the maximum entropy principle. Suppose we have no prior information, which means no moment condition except for the normalization constraint; then the solution is the uniform distribution. If we have one additional piece of information about the state, say $\int x f(x)\,dx = \mu_1 > 0$, then the solution takes the form $f(x) = \frac{1}{\mu_1} \exp[-x/\mu_1]$ where $x \in [0, \infty)$. If we have another additional piece of information, say $\int x^2 f(x)\,dx = \mu_2$ with $x \in (-\infty, \infty)$, then the solution will be the normal distribution.
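The claim that the normal distribution is the entropy maximizer under a mean and variance constraint can be illustrated with closed-form differential entropies. The sketch below compares three densities with the same standard deviation; the function names are hypothetical, and the entropy formulas are the standard textbook expressions rather than anything specific to this paper.

```python
import math


def entropy_normal(sigma):
    """Differential entropy of N(mu, sigma^2)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


def entropy_laplace(sigma):
    """Differential entropy of a Laplace density with variance sigma^2."""
    scale = sigma / math.sqrt(2.0)      # variance of Laplace is 2 * scale^2
    return 1.0 + math.log(2.0 * scale)


def entropy_uniform(sigma):
    """Differential entropy of a uniform density with variance sigma^2."""
    width = sigma * math.sqrt(12.0)     # variance of uniform is width^2 / 12
    return math.log(width)


sigma = 1.0
entropies = {
    "normal": entropy_normal(sigma),
    "laplace": entropy_laplace(sigma),
    "uniform": entropy_uniform(sigma),
}
for name, value in entropies.items():
    print(name, round(value, 4))
```

With the variance held fixed, the normal entry is the largest of the three, in line with the characterization above.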
All solutions are functions of the Lagrange multipliers. As in most optimization problems, the Lagrange multipliers represent the “marginal contribution (shadow price)” of each constraint to the objective value. For example, suppose $\hat{\lambda}_j$ is estimated to be close to 0. Then the moment constraint $\int \phi_j(x) f(x)\,dx = \mu_j$ contributes little to the objective value. Consequently, the Lagrange multipliers reflect the information content of each constraint. It can be shown that [Dixit (1976)]

$$\frac{\partial H(\hat{f})}{\partial \mu_j} = \hat{\lambda}_j, \quad \text{where } \hat{f} = \arg\max_{f} H(f).$$

That is, the multipliers give the rate of change of the maximum attainable value with respect to a change in the constraints. If the $j$th moment constraint in (3.1) is already satisfied without being imposed, so that the $j$th moment condition does not act as a binding constraint, then $\hat{\lambda}_j$ should be very close to zero. If the information in the moment condition is useless, then the change in the optimal value is very close to zero.
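The shadow-price interpretation can be verified in the one case with a simple closed form, the exponential density on $[0, \infty)$ with a mean constraint. The following is a small sketch under that assumption; `max_entropy_exponential` is a hypothetical helper, and the multipliers follow the sign convention $f(x) = \exp[-\lambda_0 - \lambda_1 x]$.

```python
import math


def max_entropy_exponential(mean):
    """ME density on [0, inf) with E[x] = mean: f(x) = exp(-lam0 - lam1 * x)."""
    lam1 = 1.0 / mean
    lam0 = math.log(mean)
    # H = -E[log f] = lam0 + lam1 * E[x] = 1 + ln(mean)
    entropy = lam0 + lam1 * mean
    return lam0, lam1, entropy


mean = 2.0
step = 1e-5
_, lam1, _ = max_entropy_exponential(mean)
_, _, h_up = max_entropy_exponential(mean + step)
_, _, h_down = max_entropy_exponential(mean - step)

# Central finite difference of the maximized entropy with respect to the moment.
dH_dmu = (h_up - h_down) / (2.0 * step)
print(dH_dmu, lam1)
```

The finite-difference derivative of the maximized entropy with respect to the mean should match the multiplier $\lambda_1 = 1/\mu_1$.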
Although the potential $\Gamma(\lambda)$ has a unique minimizer whenever one exists, a solution may not exist for certain $\mu_j$. Mead and Papanicolaou (1984) showed that, in the arithmetic moments case, the necessary and sufficient condition for the existence of an MED is complete monotonicity of the moment sequence $\{\mu_j, j = 0, 1, \cdots\}$ over the finite interval $[0, 1]$, which is known as the Hausdorff moment problem.

3.1

Characterization problem

The maximum entropy distribution has a very flexible functional form. By choosing a sequence of known functions $\phi_j(x)$ for $j = 0, \cdots, N$, we can generate a sequence of flexible maximum entropy density functions. Many well-known families of distributions can be obtained as special cases of the maximum entropy density. It is known that the normal distribution maximizes the entropy among continuous univariate distributions on $(-\infty, \infty)$ with given mean and variance [see Rao (2002), p. 163]. A similar property holds for the multivariate normal distribution. Kagan, Linnik and Rao (1973) gave a list of well-known distributions, including the Beta, Gamma, Exponential and Laplace distributions, which maximize the entropy subject to certain constraints. Gokhale (1975) gave characterizations of the normal, double exponential and Cauchy families in the univariate case and of the Dirichlet and Wishart families in the multivariate case. In particular, Cobb, Koppstein and Chen (1983) [CKC (1983)] derived a general class of multimodal density functions within a unified framework based on the stochastic catastrophe model. Since the system in a stochastic catastrophe model behaves as if it moves towards the points of lowest potential, it shares a property with the maximum entropy principle. They proposed four major types of cusp probability density function. The general functional form is

$$f(x) = \xi(a) \exp\left\{ -\int \frac{g(x)}{v(x)}\,dx \right\},$$

where $g(x) = a_0 + a_1 x + \cdots + a_k x^k$, $k > 0$, and $v(x)$ takes one of four forms: Type N, $v(x) = 1$, $-\infty < x < \infty$; Type G, $v(x) = x$, $0 < x < \infty$; Type I, $v(x) = x^2$, $0 < x < \infty$; and Type B, $v(x) = x(1 - x)$, $0 < x < 1$. The shape of the density is determined by the polynomial $g(x)$, which is called the shape polynomial. If the orders of the polynomials $g(x)$ and $v(x)$ are one and two, respectively, then the equation represents the Pearson system. Zellner and Highfield (1988) showed that the CKC (1983) distribution can be derived from the maximum entropy distribution under specified moment conditions [see Table 1].

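Table 1: Maximum entropy density characterization in CKC (1983)

Type   Side constraints (each integral is set to a known value)
N      $\int x^i f(x)\,dx$, $i = 0, \cdots, k + 1$
G      $\int x^i f(x)\,dx$, $i = 0, \cdots, k$;  $\int \log(x) f(x)\,dx$
I      $\int x^i f(x)\,dx$, $i = 0, \cdots, k - 1$;  $\int \log(x) f(x)\,dx$;  $\int x^{-1} f(x)\,dx$
B      $\int x^i f(x)\,dx$, $i = 0, \cdots, k - 1$;  $\int \log(x) f(x)\,dx$;  $\int \log(1 - x) f(x)\,dx$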

Table 2 shows well-known types of distribution that can be generated from given moment conditions. We can interpret the MED in an information-theoretic way: by imposing prior moment conditions that are inherent in the data at hand, we generate the least biased distribution function. There are some links among the listed MEDs in terms of the function $\phi(x)$ in the moment equation. Consider the moment functions $x^2$, $\ln(\nu + x^2)$ and $\ln(1 + x^2)$, which represent the degree of dispersion in the normal, Student-t and Cauchy distributions, respectively, where $\nu$ is the degrees of freedom of the Student-t distribution. If there is no moment condition other than the normalization constraint, the resulting MED is the uniform distribution. Thus, the uniform distribution maximizes entropy when we have no prior information. As we impose moment constraints, the entropy becomes lower than that of the uniform distribution.

Figure 1: Moment functions $\phi(x)$ representing dispersion. [Plot titled "Functions representing dispersion and excess kurtosis", showing x^2, log(4+x^2), log(10+x^2), log(1+x^2), |x|^(0.5) and |x|^(1.3) against x.]

Figure 1 shows $\phi(x)$ in the normal, Student-t and Cauchy distribution cases. We can interpret $\phi(x)$ as a pseudo weight function, in the sense that, for a given random variable $x$, the information in $f(x)$ is weighted by multiplying it by $\phi(x)$. Thus, $\phi(x) = x^2$ in the normal moment constraint penalizes an extreme event $x$ more heavily than the functions in the Student-t and Cauchy cases do, in order to attain the maximum value of entropy under the constraints. This explains why the tails of the Student-t and Cauchy distributions are heavier than those of the normal distribution. On the other hand, $\log x$, $\log(1 - x)$ and $\arctan(x/r)$ represent measures of skewness.

Figure 2: Moment functions $\phi(x)$ representing skewness. [Plot titled "Functions representing skewness", showing log(x/(1−x)), arctan(x/4), arctan(x/10) and (log(x))^2 against x.]

$\log x$, with $E[\log x] = \Gamma'(a)/\Gamma(a)$, is used as $\phi(x)$ in the gamma distribution, and the $\chi^2$ distribution is a special case of the gamma distribution when $E[\log x] = \Gamma'(\nu/2)/\Gamma(\nu/2) + \log 2$ for $0 \le x < \infty$. If we combine the two moment conditions on $E[\log x]$ and $E[\log(1 - x)]$, then the beta distribution is generated on the range $0 \le x \le 1$. $\arctan(x/r)$ is the measure of skewness in the Pearson type-IV distribution. Figure 2 shows that $\phi(x)$ there has an asymmetric functional form, whereas $\phi(x)$ in Figure 1 has a symmetric functional form. This difference explains, intuitively, why $\phi(x)$ in Figure 1 represents a measure of kurtosis and $\phi(x)$ in Figure 2 represents a measure of skewness. Premaratne and Bera (2001) used $\arctan(x)$ to test for asymmetry in leptokurtic distributions. Indeed, $\arctan(x)$ for $-\infty < x < \infty$, $\log x$ for $0 \le x < \infty$, and $\log(1 + x^2)$ would be more robust test statistics than the classical skewness and kurtosis parameters in the sense of Huber (1980).

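Table 2: Characterization of maximum entropy densities

Each entry lists the side constraints, the resulting maximum entropy density $f(x)$, and the familiar distribution form with its support.

Uniform
  Side constraints: none (normalization only)
  Resulting density: $\exp[-\lambda_0]$
  Distribution form: $1/(b - a)$, $a \le x \le b$

Exponential
  Side constraints: $\int x f(x)\,dx = m$, $m > 0$
  Resulting density: $\exp[-\lambda_0 - \lambda_1 x]$
  Distribution form: $\frac{1}{m} \exp[-x/m]$, $0 \le x < \infty$

Normal
  Side constraints: $\int x f(x)\,dx = \mu$, $\int (x - \mu)^2 f(x)\,dx = \sigma^2$
  Resulting density: $\exp[-\lambda_0 - \lambda_1 x - \lambda_2 x^2]$
  Distribution form: $\frac{1}{\sqrt{2\pi}\sigma} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$, $-\infty < x < \infty$

Log normal
  Side constraints: $\int \log x\, f(x)\,dx = \mu$, $\int (\log x - \mu)^2 f(x)\,dx = \sigma^2$
  Resulting density: $\exp[-\lambda_0 - \lambda_1 \log x - \lambda_2 (\log x)^2]$
  Distribution form: $\frac{1}{\sqrt{2\pi}\sigma x} \exp\left[-\frac{(\log x - \mu)^2}{2\sigma^2}\right]$, $0 < x < \infty$

Generalized exponential
  Side constraints: $\int x^n f(x)\,dx = \mu_n$, $n = 1, \cdots, N$
  Resulting density: $\exp\left[-\sum_{n=0}^{N} \lambda_n x^n\right]$
  Distribution form: $\exp\left[-\sum_{n=0}^{N} \lambda_n x^n\right]$, $-\infty < x < \infty$

Double exponential
  Side constraints: $\int |x - \mu| f(x)\,dx = \sigma$
  Resulting density: $\exp[-\lambda_0 - \lambda_1 |x - \mu|]$
  Distribution form: $\frac{1}{2\sigma} \exp[-|x - \mu|/\sigma]$, $-\infty < x < \infty$

Gamma
  Side constraints: $\int x f(x)\,dx = a$, $a > 0$; $\int \log x\, f(x)\,dx = \Gamma'(a)/\Gamma(a)$
  Resulting density: $\exp[-\lambda_0 - \lambda_1 x - \lambda_2 \log x]$
  Distribution form: $\frac{1}{\Gamma(a)} x^{a-1} \exp[-x]$, $0 < x < \infty$

Chi-square
  Side constraints: $\int x f(x)\,dx = \nu$; $\int \log x\, f(x)\,dx = \Gamma'(\nu/2)/\Gamma(\nu/2) + \log 2$
  Resulting density: $\exp[-\lambda_0 - \lambda_1 x - \lambda_2 \log x]$
  Distribution form: $\frac{1}{2^{\nu/2}\Gamma(\nu/2)} x^{\nu/2 - 1} \exp[-x/2]$, $0 < x < \infty$

Weibull
  Side constraints: $\int x^a f(x)\,dx = 1$, $a > 0$; $\int \log x\, f(x)\,dx = -\gamma/a$, where $\gamma$ is Euler's constant
  Resulting density: $\exp[-\lambda_0 - \lambda_1 x^a - \lambda_2 \log x]$
  Distribution form: $a x^{a-1} \exp[-x^a]$, $0 < x < \infty$

GED
  Side constraints: $\int |x|^{\nu} f(x)\,dx$ set to a known value
  Resulting density: $\exp[-\lambda_0 - \lambda_1 |x|^{\nu}]$
  Distribution form: $\frac{\nu \lambda_1^{1/\nu}}{2\Gamma(1/\nu)} \exp[-\lambda_1 |x|^{\nu}]$, $-\infty < x < \infty$

Beta
  Side constraints: $\int \log x\, f(x)\,dx = \Gamma'(a)/\Gamma(a) - \Gamma'(a+b)/\Gamma(a+b)$; $\int \log(1 - x) f(x)\,dx = \Gamma'(b)/\Gamma(b) - \Gamma'(a+b)/\Gamma(a+b)$
  Resulting density: $\exp[-\lambda_0 - \lambda_1 \log x - \lambda_2 \log(1 - x)]$
  Distribution form: $\frac{1}{B(a,b)} x^{a-1} (1 - x)^{b-1}$, $0 < x < 1$

Cauchy
  Side constraints: $\int \log(1 + x^2) f(x)\,dx = 2 \log 2$
  Resulting density: $\exp[-\lambda_0 - \lambda_1 \log(1 + x^2)]$
  Distribution form: $\frac{1}{\pi (1 + x^2)}$, $-\infty < x < \infty$

Student-t
  Side constraints: $\int \log(\nu + x^2) f(x)\,dx = \log \nu + \Gamma'(\tfrac{\nu+1}{2})/\Gamma(\tfrac{\nu+1}{2}) - \Gamma'(\tfrac{\nu}{2})/\Gamma(\tfrac{\nu}{2})$
  Resulting density: $\exp[-\lambda_0 - \lambda_1 \log(\nu + x^2)]$
  Distribution form: $\frac{\Gamma[(\nu+1)/2]}{\sqrt{\pi\nu}\,\Gamma(\nu/2)} \left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2}$, $-\infty < x < \infty$

Pearson type-IV
  Side constraints: $\int \tan^{-1}(x/r) f(x)\,dx$ and $\int \log(r^2 + x^2) f(x)\,dx$, each set to a known value
  Resulting density: $\exp[-\lambda_0 - \lambda_1 \tan^{-1}(x/r) - \lambda_2 \log(r^2 + x^2)]$
  Distribution form: proportional to $(r^2 + x^2)^{-\lambda_2} \exp[-\lambda_1 \tan^{-1}(x/r)]$, $-\infty < x < \infty$

Generalized Student-t
  Side constraints: $\int \tan^{-1}(x/r) f(x)\,dx$, $\int \log(r^2 + x^2) f(x)\,dx$ and $\int x^i f(x)\,dx$, $i = 1, \cdots, k$, each set to a known value
  Resulting density: $\exp\left[-\lambda_0 - \lambda_1 \tan^{-1}(x/r) - \lambda_2 \log(r^2 + x^2) - \sum_{i=1}^{k} \lambda_{i+2} x^i\right]$, $-\infty < x < \infty$

Generalized log normal
  Side constraints: $\int \log x\, f(x)\,dx$, $\int (\log x)^2 f(x)\,dx$ and $\int x^i f(x)\,dx$, $i = 1, \cdots, k$, each set to a known value
  Resulting density: $\exp\left[-\lambda_0 - \lambda_1 \log x - \lambda_2 (\log x)^2 - \sum_{i=1}^{k} \lambda_{i+2} x^i\right]$, $0 < x < \infty$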

The last two distributions in Table 2, the generalized Student-t and generalized log-normal distributions, were proposed by Lye and Martin (1983), who generalized the CKC (1983) distribution by extending $g(x)$ and $v(x)$ to include more general functions such as logarithmic and trigonometric expressions.

3.2

Numerical implementation

There are a few studies of optimization procedures for the ME problem in the econometrics literature. Zellner and Highfield (1988) solved for the MED, which in their case is the quartic exponential distribution, by taking a Taylor expansion in $\lambda$ and using Newton's method to optimize the objective function. Ryu (1990) used the moment recursions of the quartic exponential distribution to solve for the MED. Ormoneit and White (1999) applied the same method as Zellner and Highfield (1988) but proposed a more accurate way to perform the numerical integration. Rockinger and Jondeau (2002) used Gauss-Legendre quadrature in the optimization procedure and numerically calculated the skewness and kurtosis boundary within which the MED exists. Wu (2003) proposed a sequential updating method that incorporates the moment constraints into the calculation from lower to higher moments and updates the density estimates. However, all of the above numerical methods are used in the arithmetic moment case. Our procedure is in the same spirit as these numerical procedures, but we deal with the general moment case of the ME problem.
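The solution of the ME problem in the general case is given by (3.3). The Lagrange multipliers, $\lambda$, are calculated by solving the following nonlinear equations,

$$G_n(\lambda) = \int \phi_n(x) \exp\left[ -\sum_{k=0}^{N} \lambda_k \phi_k(x) \right] dx = \mu_n, \quad n = 0, \cdots, N. \tag{3.7}$$

If we substitute $\lambda_0$ in (3.7) by (3.4), we can reduce $N + 1$ dimensions to $N$,

$$G_n(\lambda) = \int \phi_n(x) \, \frac{\exp\left[ -\sum_{k=1}^{N} \lambda_k \phi_k(x) \right]}{\int \exp\left[ -\sum_{k=1}^{N} \lambda_k \phi_k(x) \right] dx} \, dx = \mu_n, \quad n = 1, \cdots, N. \tag{3.8}$$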

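These equations can be solved by Newton's method, which consists of expanding $G_n(\lambda)$ in a Taylor series, dropping the higher-order terms and solving the resulting linear system iteratively. The linear system obtained from the first-order Taylor expansion around the initial value $\lambda^0$ is given by

$$G_n(\lambda) \cong G_n(\lambda^0) + \sum_{k=1}^{N} (\lambda_k - \lambda_k^0) \left. \frac{\partial G_n(\lambda)}{\partial \lambda_k} \right|_{\lambda = \lambda^0} = \mu_n, \quad n, k = 1, \cdots, N. \tag{3.9}$$

(3.9) can be written in vector and matrix form as

$$\mathbf{G} \delta = v, \tag{3.10}$$

where $\delta = \lambda^0 - \lambda$ and $v = [\mu_1 - G_1(\lambda^0), \cdots, \mu_N - G_N(\lambda^0)]'$, and the element in the $n$th row and $k$th column of the matrix $\mathbf{G}$ is

$$\mathbf{G}_{[n,k]} = -\left. \frac{\partial G_n(\lambda)}{\partial \lambda_k} \right|_{\lambda = \lambda^0} = \frac{\int \phi_n(x) \phi_k(x) \exp\left[ -\sum_{i=1}^{N} \lambda_i^0 \phi_i(x) \right] dx}{\Omega(\lambda^0)} - \frac{\int \phi_n(x) \exp\left[ -\sum_{i=1}^{N} \lambda_i^0 \phi_i(x) \right] dx}{\Omega(\lambda^0)} \cdot \frac{\int \phi_k(x) \exp\left[ -\sum_{i=1}^{N} \lambda_i^0 \phi_i(x) \right] dx}{\Omega(\lambda^0)}. \tag{3.11}$$

Starting from initial choices for $\lambda_1^0, \cdots, \lambda_N^0$, the updated $\lambda$ are defined by

$$\lambda = \lambda^0 - \mathbf{G}^{-1} v, \tag{3.12}$$

where $\mathbf{G}$ is a positive definite matrix and $\mathbf{G}^{-1}$ is its inverse.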

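To calculate the values of the matrix $\mathbf{G}$ and the vector $v$ we need a particular method of numerical integration. We adopt Gauss-Legendre quadrature. Consider the calculation of $\Omega(\lambda)$ on the domain $[l, u]$,

$$\Omega(\lambda) = \int_{l}^{u} \exp\left( -\sum_{n=1}^{N} \lambda_n \phi_n(x) \right) dx. \tag{3.13}$$

Define $\tilde{x} = [2x - (u + l)]/(u - l)$ and substitute $x$ in (3.13) by $\frac{1}{2}[\tilde{x}(u - l) + (u + l)]$. Then (3.13) is written as

$$\Omega(\lambda) = \int_{-1}^{1} \frac{(u - l)}{2} \exp\left( -\sum_{n=1}^{N} \lambda_n \phi_n\left( \tfrac{1}{2}\left[ \tilde{x}(u - l) + (u + l) \right] \right) \right) d\tilde{x}. \tag{3.14}$$

Denote by $\tilde{x}_{node}$ and $w$ the node vector and weight vector of the Gauss-Legendre quadrature, respectively, and let $\tilde{x}_{new} \equiv \frac{1}{2}(\tilde{x}_{node}(u - l) + (u + l))$. Then (3.14) becomes

$$\Omega(\lambda) = \frac{(u - l)}{2} \, w' \exp\left( -\sum_{n=1}^{N} \lambda_n \phi_n(\tilde{x}_{new}) \right). \tag{3.15}$$

Consider the integration problem for $G_n(\lambda)$,

$$G_n(\lambda) = \int \phi_n(x) \, \frac{\exp\left[ -\sum_{k=1}^{N} \lambda_k \phi_k(x) \right]}{\int \exp\left[ -\sum_{k=1}^{N} \lambda_k \phi_k(x) \right] dx} \, dx. \tag{3.16}$$

Using $\tilde{x}_{node}$ and $w$, (3.16) is written as

$$G_n(\lambda) = \frac{1}{\Omega(\lambda)} \frac{(u - l)}{2} \, w' \left\{ \phi_n(\tilde{x}_{new}) * \exp\left( -\sum_{k=1}^{N} \lambda_k \phi_k(\tilde{x}_{new}) \right) \right\}, \quad n = 1, \cdots, N, \tag{3.17}$$

where $G(\lambda)$ is $[G_1(\lambda), \cdots, G_N(\lambda)]'$ and $*$ is the element-by-element multiplication operator.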
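The change of variable and the quadrature sum for $\Omega(\lambda)$ can be sketched in a few lines. The code below is only an illustration: the nodes and weights are computed with a textbook Newton iteration on the Legendre polynomial, the domain $[-10, 10]$ is an assumption, and the helper names are hypothetical. It evaluates the partition function for $\phi_1(x) = x$, $\phi_2(x) = x^2$ with $\lambda = (0, 0.5)$, where the exact value on the whole real line is $\sqrt{2\pi}$.

```python
import math


def gauss_legendre(m):
    """Nodes and weights of the m-point Gauss-Legendre rule on [-1, 1]."""
    nodes, weights = [], []
    for i in range(1, m + 1):
        x = math.cos(math.pi * (i - 0.25) / (m + 0.5))   # initial guess
        for _ in range(100):
            p0, p1 = 1.0, x
            for k in range(2, m + 1):                    # Legendre recurrence
                p0, p1 = p1, ((2 * k - 1) * x * p1 - (k - 1) * p0) / k
            dp = m * (x * p1 - p0) / (x * x - 1.0)       # derivative of P_m
            dx = p1 / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def partition_function(lam, phis, lower, upper, points=50):
    """Quadrature approximation of Omega(lambda) on [lower, upper]."""
    nodes, weights = gauss_legendre(points)
    half = 0.5 * (upper - lower)
    total = 0.0
    for t, w in zip(nodes, weights):
        x_new = half * t + 0.5 * (upper + lower)         # map [-1, 1] to [l, u]
        exponent = -sum(l * phi(x_new) for l, phi in zip(lam, phis))
        total += w * math.exp(exponent)
    return half * total


phis = [lambda x: x, lambda x: x * x]
omega = partition_function([0.0, 0.5], phis, -10.0, 10.0)
print(omega, math.sqrt(2.0 * math.pi))
```

In this example the truncation of the domain at $\pm 10$ is negligible, so the two printed values should be very close.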

To estimate the MED we propose the following procedure:

1. Set the domain on which the density will be defined, $[l, u]$.

2. Using $x = \frac{1}{2}[\tilde{x}(u - l) + (u + l)]$, change the domain into $[-1, 1]$. Use an $M = 50$ point Gaussian quadrature such that $\tilde{x}_{node,m} \in [-1, 1]$, with weights $w_m$, where $m = 1, \cdots, 50$.

3. At the first step, $i = 0$, set the initial values of $\lambda$ as $[0, \cdots, 0]'$.

4. At step $i$, calculate $v^{(i)}$ and $\mathbf{G}^{(i)}$ by numerical integration in (3.9) and (3.11).

5. Let $\delta^{(i)}$ be the solution to $\mathbf{G}^{(i)} \delta^{(i)} = v^{(i)}$.

6. Update the vector of Lagrange multipliers, $\lambda^{(i+1)} = \lambda^{(i)} - \delta^{(i)}$.

7. Set $i = i + 1$ and go to step 4 until $\delta^{(i)}$ becomes appropriately small (use 1e-10 as the stopping rule).
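The steps above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the sign convention $f(x) \propto \exp[-\sum_n \lambda_n \phi_n(x)]$, uses the covariance matrix of the moment functions as the Newton matrix, and all function names, the domain $[-10, 10]$ and the test moments are hypothetical.

```python
import math


def gauss_legendre(m):
    """Nodes and weights of the m-point Gauss-Legendre rule on [-1, 1]."""
    nodes, weights = [], []
    for i in range(1, m + 1):
        x = math.cos(math.pi * (i - 0.25) / (m + 0.5))
        for _ in range(100):
            p0, p1 = 1.0, x
            for k in range(2, m + 1):
                p0, p1 = p1, ((2 * k - 1) * x * p1 - (k - 1) * p0) / k
            dp = m * (x * p1 - p0) / (x * x - 1.0)
            dx = p1 / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def fit_med(phis, mu, lower, upper, points=50, tol=1e-10, max_iter=100):
    """Newton iteration for the multipliers lambda_1, ..., lambda_N."""
    nodes, weights = gauss_legendre(points)                       # step 2
    half = 0.5 * (upper - lower)
    xs = [half * t + 0.5 * (upper + lower) for t in nodes]
    ws = [half * w for w in weights]
    vals = [[phi(x) for x in xs] for phi in phis]                 # phi_n at each node
    n = len(phis)
    lam = [0.0] * n                                               # step 3
    for _ in range(max_iter):
        kern = [w * math.exp(-sum(lam[j] * vals[j][q] for j in range(n)))
                for q, w in enumerate(ws)]
        omega = sum(kern)
        dens = [k / omega for k in kern]                          # weight times density
        g = [sum(d * vals[j][q] for q, d in enumerate(dens)) for j in range(n)]
        cov = [[sum(d * vals[j][q] * vals[k][q] for q, d in enumerate(dens))
                - g[j] * g[k] for k in range(n)] for j in range(n)]  # step 4
        v = [mu[j] - g[j] for j in range(n)]
        delta = solve_linear(cov, v)                              # step 5
        lam = [lam[j] - delta[j] for j in range(n)]               # step 6
        if max(abs(d) for d in delta) < tol:                      # step 7
            break
    return lam


# Mean 0 and second moment 1 should give approximately lambda = (0, 0.5).
lam_hat = fit_med([lambda x: x, lambda x: x * x], [0.0, 1.0], -10.0, 10.0)
print(lam_hat)
```

With a mean of zero and a second moment of one, the recovered multipliers should be close to those of the standard normal density. A pure Newton step is used here for brevity; a damped step may be needed for less well-behaved moment sets.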

Maximum entropy GARCH model

Various GARCH models under the assumption of a non-normal conditional density function have been proposed to explain the leptokurtic behavior of the unconditional density. Bollerslev (1987) used the Student-t distribution as the conditional distribution in the GARCH model. However, Hsieh (1989) found that the GARCH(1,1) model with the Student-t distribution as the conditional distribution could not capture the excess kurtosis of daily returns on the British pound and the Japanese yen. Nelson (1991) employed the generalized error distribution (GED) with the exponential GARCH model but found that the estimated GED did not have thick enough tails to capture the unconditional leptokurtosis. On the other hand, skewed conditional densities have also been used to explain the skewness of the unconditional density. Lee and Tse (1991) used a distribution based on the first three terms of the Gram-Charlier series. Engle and González-Rivera (1991) adopted a nonparametric conditional density, and Premaratne and Bera (2000) used the Pearson type-IV distribution as the conditional density function. The Pearson type-IV distribution is derived from the maximum entropy principle under side conditions on $E[\tan^{-1}(x/r)]$ and $E[\log(r^2 + x^2)]$. Thus, we can incorporate the Pearson type-IV distribution within the MED formulation. Rockinger and Jondeau (2002) applied the MED to the GARCH model. Since they considered the first four arithmetic moment conditions, the resulting MED has the form of the quartic exponential distribution. However, the quartic exponential distribution cannot adequately capture heavy-tailed behavior of the kind exhibited by the Student-t distribution.
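Let us consider the following model:

$$y_t = m(x_t; \zeta) + \epsilon_t, \quad t = 1, \cdots, T, \tag{4.1}$$

with $\epsilon_t | \mathcal{F}_{t-1} \sim f(0, h_t)$, where $f$ is the unknown density function of $\epsilon_t$ conditional on the set of past information $\mathcal{F}_{t-1}$, $m(\cdot)$ is the conditional mean function, $x_t$ is a $K \times 1$ vector of exogenous variables, $\zeta$ is a vector of parameters and $h_t$ is a random variable parameterized as

$$h_t = \alpha_0 + \sum_{j=1}^{p} \alpha_j \epsilon_{t-j}^2 + \sum_{j=1}^{q} \beta_j h_{t-j}.$$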
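The conditional variance recursion can be made concrete with a short sketch. The function below is an illustration only: the name `garch_variance`, the start-up value `h0` used for pre-sample squared shocks and variances, and the numerical inputs are all assumptions, not part of the model specification above.

```python
def garch_variance(eps, alpha0, alpha, beta, h0):
    """h_t = alpha0 + sum_j alpha_j * eps_{t-j}^2 + sum_j beta_j * h_{t-j}."""
    h = []
    for t in range(len(eps)):
        value = alpha0
        for j in range(1, len(alpha) + 1):
            # Pre-sample squared shocks are replaced by h0 (an assumed convention).
            value += alpha[j - 1] * (eps[t - j] ** 2 if t - j >= 0 else h0)
        for j in range(1, len(beta) + 1):
            # Pre-sample variances are also set to h0.
            value += beta[j - 1] * (h[t - j] if t - j >= 0 else h0)
        h.append(value)
    return h


shocks = [1.0, -2.0, 0.5]            # hypothetical residuals
variances = garch_variance(shocks, alpha0=0.1, alpha=[0.2], beta=[0.7], h0=1.0)
print(variances)
```

Large past shocks raise the next period's variance through the $\alpha_j$ terms, while the $\beta_j$ terms carry the previous variance forward.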

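The MED is the least biased density function and minimizes the Kullback-Leibler information criterion under correctly specified moment conditions. We can write the conditional density function in the general MED form as follows,

$$f\left( \frac{\epsilon_t}{\sqrt{h_t}} \right) = \exp\left[ -\sum_{n=0}^{N} \lambda_n \phi_n\left( \frac{\epsilon_t}{\sqrt{h_t}} \right) \right], \tag{4.2}$$

where $\phi_n(\cdot)$ satisfies $E[\phi_n(\cdot)] = 0$. The quasi-log density function at time $t$ is given by

$$l_t(\theta) = -\sum_{n=0}^{N} \lambda_n \phi_n\left( \frac{\epsilon_t}{\sqrt{h_t}} \right) - \frac{1}{2} \log h_t, \tag{4.3}$$

where $\theta = (\alpha, \beta, \zeta, \lambda)' \in \Theta$. Given this, the quasi-log likelihood function is

$$l(\theta) = \sum_{t=1}^{T} l_t(\theta).$$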

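The MEARCH approach is closely related to other semi-nonparametric ARCH approaches. First, the first-order conditions of the quasi-log likelihood yield the quasi-score functions. Writing $\phi_n'(\cdot)$ for the derivative of $\phi_n(\cdot)$,

$$\frac{\partial l(\theta)}{\partial \alpha} = \sum_{t} \left[ \frac{1}{2} \sum_{n} \lambda_n \phi_n'\left( \frac{\epsilon_t}{\sqrt{h_t}} \right) \frac{\partial h_t}{\partial \alpha} \frac{\epsilon_t}{h_t^{3/2}} - \frac{1}{2 h_t} \frac{\partial h_t}{\partial \alpha} \right] = 0, \tag{4.4}$$

$$\frac{\partial l(\theta)}{\partial \beta} = \sum_{t} \left[ \frac{1}{2} \sum_{n} \lambda_n \phi_n'\left( \frac{\epsilon_t}{\sqrt{h_t}} \right) \frac{\partial h_t}{\partial \beta} \frac{\epsilon_t}{h_t^{3/2}} - \frac{1}{2 h_t} \frac{\partial h_t}{\partial \beta} \right] = 0, \tag{4.5}$$

$$\frac{\partial l(\theta)}{\partial \zeta} = \sum_{t} \sum_{n} \lambda_n \phi_n'\left( \frac{\epsilon_t}{\sqrt{h_t}} \right) \frac{\partial m(x_t; \zeta)}{\partial \zeta} \frac{1}{\sqrt{h_t}} = 0, \tag{4.6}$$

$$\frac{\partial l(\theta)}{\partial \lambda_n} = \sum_{t} \left[ -\frac{\partial \lambda_0}{\partial \lambda_n} - \phi_n\left( \frac{\epsilon_t}{\sqrt{h_t}} \right) \right] = 0, \quad n = 1, \cdots, N. \tag{4.7}$$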

If the underlying conditional density is correctly specified, then equations (4.4)-(4.7) are optimal estimating functions (EF) [see Godambe (1960)]. Li and Turtle (2000) proposed a semiparametric ARCH model within the EF approach, and the optimal EFs for the GARCH model are given by

∗

−

∂h

∂δ

(

+ 2

−

)

(4.8)

∗

−

∂x

∂ζ

∂x

∂ζ

−

∂h

∂ζ

(

+ 2

−

)

(4.9)

where

= (

α, β

) and

−

= (

−

)

−

(

−

)

[(

−

)

−

]

[(

−

)

−

]

Equations (4.8) and (4.9) are in fact the same as the efficient generalized method of moments (GMM) moment conditions, which are attainable with optimal instrumental variables. There is no a priori assumption about the conditional density in the EF approach. Under the conditional normality assumption, equations (4.8) and (4.9) are equivalent to the first-order conditions of Engle (1982) up to a sign change. The quasi-score functions, equations (4.4)-(4.7), should likewise be equivalent to those of Engle (1982) under conditional normality. The gain of Li and Turtle's (2000) approach is that we do not have to assume any conditional density. However, in practice, the higher-order conditional moments that enter (4.8) and (4.9) must be specified in some way. Let us consider the, admittedly extreme, conditional Cauchy density case. The parameters of the GARCH model cannot be estimated consistently by the EF approach in the conditional Cauchy case. The MEARCH approach is more robust in this sense, since the only moment condition we need is one on $E[\log(1 + x^2)]$.

Secondly, the extremum estimation problem can be represented by the MEEL estimation procedure,

$$\max_{\pi, \theta} \; -\sum_{t=1}^{T} \pi_t \ln \pi_t \quad \text{s.t.} \quad \sum_{t=1}^{T} \pi_t h(\eta_t, \theta) = 0, \quad \sum_{t=1}^{T} \pi_t = 1, \tag{4.10}$$

where $h(\cdot)$ is a vector of quasi-score functions. Imbens et al. (1998) argued that the influence function of the exponential tilting estimator stays bounded, whereas that of EL becomes unbounded even if $h(\cdot)$ is bounded. Since the ME estimates incorporate the additional moment equation information, the $\hat{\pi}$-weighted estimates would be more efficient in finite samples. Under certain regularity conditions the MEEL estimators are consistent and asymptotically normal, i.e.,

$$\sqrt{T}(\hat{\theta}_{MEEL} - \theta_0) \xrightarrow{d} N(0, \Sigma),$$

where

$$\Sigma = \left\{ E\left[ \frac{\partial h(\eta, \theta)}{\partial \theta'} \right]' \left( E\left[ h(\eta, \theta) h(\eta, \theta)' \right] \right)^{-1} E\left[ \frac{\partial h(\eta, \theta)}{\partial \theta'} \right] \right\}^{-1},$$

which is equivalent to the asymptotic properties of quasi maximum likelihood estimators. Imbens (1997) showed that, under certain regularity conditions, the limiting distributions of the MEEL and MEL estimators of $\theta$ are the same.

Finally, when we add more side moment conditions to (4.10), $\hat{f}$ becomes more wiggly. In this sense the maximum entropy mechanism works as a curve smoother. If we add more plausible moments to (4.10), $\hat{f}$ will be similar to a nonparametric density estimate. Since Engle and González-Rivera (1991) used a nonparametric conditional density, their approach can be considered within our framework.

4.1

Estimation

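Although maximum likelihood (ML) estimation is conceptually different from the ME procedure, it is mathematically equivalent to the ME procedure when the known moments are replaced by consistent estimates. Since distributions in the exponential family have a unique ML solution, the ME solution is also unique, if it exists. Thus, we replace the known moments by the sample moments, $\frac{1}{T}\sum_{t=1}^{T} \phi_n(\hat{\eta}_t; \gamma)$, where $\hat{\eta}_t$ denotes the standardized residual, and maximize the quasi-likelihood function based on the estimated MED. The quasi-log likelihood function, (4.3), can be represented as follows:

$$l(\theta) = \sum_{t=1}^{T} \log f\left( \frac{\epsilon_t}{\sqrt{h_t}} \right) - \frac{1}{2} \sum_{t=1}^{T} \log h_t. \tag{4.11}$$

Since the standardized residual follows the MED in our model, the quasi-likelihood function is given by

$$l(\theta) = -\sum_{t=1}^{T} \sum_{n=0}^{N} \lambda_n \phi_n\left( \frac{\epsilon_t}{\sqrt{h_t}} \right) - \frac{1}{2} \sum_{t=1}^{T} \log h_t, \tag{4.12}$$

where $\phi_n(\cdot)$ is the associated moment function. We optimize the objective function, (4.10), by the following procedure.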

1. Obtain initial consistent estimates of the parameters by an appropriate QMLE and form the standardized residuals, $\hat{\eta}_t$, from those estimates.

2. Calculate the sample moments of $\phi_n(\hat{\eta}_t, \gamma)$ and use these estimates as the known moments in the ME problem.

3. Estimate the MED by the proposed numerical algorithm and fix this MED as the conditional density function, $f$, in (4.11). Iterate (4.12), evaluating the quasi-log likelihood function until convergence.

4. With the standardized residuals from step 3, repeat steps 2 and 3 until each estimate converges.
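Step 3 requires evaluating the quasi-log likelihood for a fixed MED. A minimal sketch of that evaluation is given below, assuming the form $l(\theta) = \sum_t [-\lambda_0 - \sum_n \lambda_n \phi_n(\epsilon_t/\sqrt{h_t}) - \frac{1}{2}\log h_t]$; the function name and the inputs are hypothetical. As a sanity check, choosing $\phi_1(x) = x^2$, $\lambda_1 = 1/2$ and $\lambda_0 = \frac{1}{2}\log(2\pi)$ reproduces the Gaussian log likelihood.

```python
import math


def quasi_loglik(eps, h, lam0, lam, phis):
    """Quasi-log likelihood of residuals eps with conditional variances h."""
    total = 0.0
    for e, ht in zip(eps, h):
        z = e / math.sqrt(ht)                                   # standardized residual
        tilt = sum(l * phi(z) for l, phi in zip(lam, phis))
        total += -lam0 - tilt - 0.5 * math.log(ht)
    return total


residuals = [0.5, -1.0, 2.0]          # hypothetical residuals
variances = [1.0, 2.0, 1.5]           # hypothetical conditional variances

# Normal special case: phi(x) = x^2, lambda_1 = 1/2, lambda_0 = 0.5 * log(2 * pi).
med_value = quasi_loglik(residuals, variances,
                         0.5 * math.log(2.0 * math.pi), [0.5], [lambda x: x * x])
gaussian_value = sum(-0.5 * math.log(2.0 * math.pi * ht) - e * e / (2.0 * ht)
                     for e, ht in zip(residuals, variances))
print(med_value, gaussian_value)
```

In an actual estimation this function would be passed, with the variance recursion inside it, to a numerical optimizer such as BFGS.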

A range of numerical optimization routines can be used. We adopted the Broyden, Fletcher, Goldfarb and Shanno (BFGS) algorithm. For computational convenience, the derivatives are computed numerically.
Our estimates share the asymptotic properties of the QMLE, namely consistency and asymptotic normality [White (1982), Bollerslev and Wooldridge (1988)].

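$$ \hat{\theta}_{QMLE} \in \arg\max_{\theta \in \Theta} \sum_{t=1}^{T} l_t(\theta) $$

The limit distribution is given by

$$ \sqrt{T}\,(\hat{\theta}_{QMLE} - \theta_0) \xrightarrow{d} N(0, A^{-1} B A^{-1}) $$

where $\theta_0$ is the quasi-true parameter and

$$ A = -T^{-1}\sum_{t=1}^{T} E\left[\frac{\partial^2 l_t(\theta_0)}{\partial\theta\,\partial\theta'}\right], \qquad B = T^{-1}\sum_{t=1}^{T} E\left[\frac{\partial l_t(\theta_0)}{\partial\theta}\frac{\partial l_t(\theta_0)}{\partial\theta'}\right] $$

The matrices $A$ and $B$ are consistently estimated by

$$ \hat{A} = -T^{-1}\sum_{t=1}^{T} \frac{\partial^2 l_t(\hat{\theta}_{QMLE})}{\partial\theta\,\partial\theta'}, \qquad \hat{B} = T^{-1}\sum_{t=1}^{T} \frac{\partial l_t(\hat{\theta}_{QMLE})}{\partial\theta}\frac{\partial l_t(\hat{\theta}_{QMLE})}{\partial\theta'} $$

If the conditional density function is correctly specified, so that it coincides with the true density,
then we have the well-known asymptotic normality result,

$$ \sqrt{T}\,(\hat{\theta}_{QMLE} - \theta_0) \xrightarrow{d} N(0, B^{-1}) $$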
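To illustrate how a "robust" covariance of this kind can be computed with numerical derivatives, the following sketch works through a scalar toy problem. It is an assumption-laden example, not the model of this paper: the quasi-likelihood is a unit-variance normal for a location parameter, the covariance is assumed to have the usual sandwich form A^{-1} B A^{-1}, and all function names and data are hypothetical.

```python
def loglik_t(theta, x):
    """Quasi-log-likelihood of one observation (unit-variance normal, constant dropped)."""
    return -0.5 * (x - theta) ** 2

def score_t(theta, x, eps=1e-4):
    """Central-difference first derivative of loglik_t with respect to theta."""
    return (loglik_t(theta + eps, x) - loglik_t(theta - eps, x)) / (2.0 * eps)

def hessian_t(theta, x, eps=1e-3):
    """Central-difference second derivative of loglik_t with respect to theta."""
    return (loglik_t(theta + eps, x) - 2.0 * loglik_t(theta, x)
            + loglik_t(theta - eps, x)) / eps ** 2

# Toy data, for illustration only.
data = [0.3, -1.2, 0.8, 2.5, -0.4, 0.1, -2.0, 1.1]
T = len(data)
theta_hat = sum(data) / T  # maximizer of this quasi-likelihood

A_hat = -sum(hessian_t(theta_hat, x) for x in data) / T
B_hat = sum(score_t(theta_hat, x) ** 2 for x in data) / T
sandwich = B_hat / A_hat ** 2  # scalar version of A^{-1} B A^{-1}
print(A_hat, B_hat, sandwich)
```

In this toy case A is 1 and the sandwich variance reduces to the sample variance of the data, which is what one expects when the unit-variance assumption of the quasi-likelihood is wrong.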

4.2

Moment selection test

Suppose we are interested in whether our prior is informative. Since the
multipliers give the rate of change of the maximum attainable value with
respect to a change in the constraints, $\lambda_i$ should be very close to zero if
its associated moment equation does not convey any valuable information.
Imbens et al. (1998) provided a Lagrange multiplier test of the validity
of moment equations in the over-identified case. Imbens et al.
(1997) and Kitamura and Stutzer (1997) also provided various tests of the
validity of moment equations. Our test statistic is based on the Rao score (RS)
test. The score test is based on

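$$ s(\lambda_i) = \frac{\partial \ell(\theta)}{\partial \lambda_i} = T\,E[\phi_i(x;\gamma)] - \sum_{t=1}^{T}\phi_i(x_t;\gamma), \qquad i = 1, \cdots, N, \tag{4.13} $$

which, under the null, $\lambda_j = 0$ where $\lambda_j \in \{\lambda_1, \cdots, \lambda_N\}$, reduces to

$$ s(\lambda_j) = T\delta_j - \sum_{t=1}^{T}\phi_j(x_t;\gamma) \tag{4.14} $$

where

$$ \delta_j = \frac{\int \phi_j(x;\gamma)\exp\left[-\sum_{i\in\{1,\cdots,N\}\setminus\{j\}}\lambda_i\phi_i(x;\gamma)\right]dx}{\int \exp\left[-\sum_{i\in\{1,\cdots,N\}\setminus\{j\}}\lambda_i\phi_i(x;\gamma)\right]dx} \tag{4.15} $$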
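The ratio of integrals in (4.15) is the expectation of one moment function under the density implied by the remaining constraints, and it can be evaluated by Gauss-Legendre quadrature. The sketch below is a self-contained illustration under stated assumptions: the integration range [-10, 10], the 80-point rule, the helper names and the example constraint (a single term 0.5 x^2, which gives a standard normal kernel) are all hypothetical choices, not those used in the paper.

```python
import math

def gauss_legendre(n):
    """Nodes and weights of the n-point Gauss-Legendre rule on [-1, 1]."""
    nodes, weights = [], []
    for k in range(1, n + 1):
        x = math.cos(math.pi * (k - 0.25) / (n + 0.5))  # starting guess for the k-th root
        for _ in range(100):
            p_prev, p = 1.0, x  # Legendre polynomials P_0 and P_1
            for j in range(2, n + 1):  # three-term recurrence up to P_n
                p_prev, p = p, ((2 * j - 1) * x * p - (j - 1) * p_prev) / j
            dp = n * (x * p - p_prev) / (x * x - 1.0)  # derivative of P_n at x
            step = p / dp
            x -= step  # Newton update
            if abs(step) < 1e-14:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights

def integrate(f, a, b, n=80):
    """Integral of f over [a, b] with an n-point Gauss-Legendre rule."""
    nodes, weights = gauss_legendre(n)
    half, mid = 0.5 * (b - a), 0.5 * (a + b)
    return half * sum(w * f(mid + half * x) for x, w in zip(nodes, weights))

def delta(phi_j, other_terms, a=-10.0, b=10.0):
    """E[phi_j(x)] under the density proportional to exp(-sum lam_i * phi_i(x))."""
    def kernel(x):
        return math.exp(-sum(lam * phi(x) for lam, phi in other_terms))
    numerator = integrate(lambda x: phi_j(x) * kernel(x), a, b)
    denominator = integrate(kernel, a, b)
    return numerator / denominator

# Hypothetical remaining constraint: 0.5 * x^2, i.e. a standard normal kernel.
gaussian_terms = [(0.5, lambda x: x * x)]
delta_atan = delta(math.atan, gaussian_terms)            # odd function under a symmetric density
delta_square = delta(lambda x: x * x, gaussian_terms)    # second moment of the standard normal
print(delta_atan, delta_square)
```

With a symmetric kernel the expectation of arctan(x) is zero, and the expectation of x^2 recovers the unit variance, which gives a simple check on the quadrature.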

We again used Gauss-Legendre quadrature to perform the numerical integration.
The RS test statistic is given by

$$ RS = \frac{T(\hat{\theta})^2}{\hat{V}}, \qquad T(\theta) = T\delta_j - \sum_{t=1}^{T}\phi_j(x_t;\gamma) \tag{4.16} $$

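where $\hat{V}$ is the consistent estimator of the asymptotic variance of $\sqrt{T}\,T(\hat{\theta})$.
Under the null hypothesis, RS is distributed as $\chi^2_1$ asymptotically. Pierce
(1982) provides a useful result for calculating the asymptotic variance of $\sqrt{T}\,T(\hat{\theta})$:

$$ Var\left[\sqrt{T}\,T(\hat{\theta})\right] = Var\left[\sqrt{T}\,T(\theta)\right] - \lim_{T\to\infty} E\left[\frac{\partial T(\theta)}{\partial\theta}\right] Var\left(\sqrt{T}\,\hat{\theta}\right) \lim_{T\to\infty} E\left[\frac{\partial T(\theta)}{\partial\theta}\right]' $$

In many cases $Var(\sqrt{T}\,\hat{\theta})$ is a block diagonal matrix and $\lim_{T\to\infty} E[\partial T(\theta)/\partial\theta]$ contains
zero elements. However, since $f(x;\theta)$ is determined by the
given moment conditions, it is likely that $Var(\sqrt{T}\,\hat{\theta})$ is not block
diagonal and that $\lim_{T\to\infty} E[\partial T(\theta)/\partial\theta]$ does not contain many zero
elements. This makes calculating $Var[\sqrt{T}\,T(\hat{\theta})]$ complicated [see the
appendix for the derivation of $Var[\sqrt{T}\,T(\hat{\theta})]$]. Thus, we estimate the variance of
$\sqrt{T}\,T(\hat{\theta})$ using the bootstrap method.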
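A bootstrap estimate of the variance of the score statistic can be sketched as below. This is only one possible reading of the procedure, under assumptions that the text does not spell out: the statistic is scaled by 1/sqrt(T), the quantity delta is held fixed across resamples, observations are resampled independently with replacement, and the data, seed and function names are hypothetical.

```python
import math
import random

def score_statistic(z, phi, delta):
    """T*delta - sum_t phi(z_t): the score for one multiplier under the null."""
    return len(z) * delta - sum(phi(x) for x in z)

def bootstrap_variance(z, phi, delta, n_boot=1000, seed=0):
    """Variance of the score statistic scaled by 1/sqrt(T), over resamples of z."""
    rng = random.Random(seed)
    T = len(z)
    draws = []
    for _ in range(n_boot):
        resample = [z[rng.randrange(T)] for _ in range(T)]  # draw with replacement
        draws.append(score_statistic(resample, phi, delta) / math.sqrt(T))
    mean = sum(draws) / n_boot
    return sum((d - mean) ** 2 for d in draws) / (n_boot - 1)

# Simulated stand-in for the standardized residuals (illustration only).
rng = random.Random(1)
z = [rng.gauss(0.0, 1.0) for _ in range(500)]
delta_null = 0.0  # expectation of arctan(x) under a symmetric null density

v_hat = bootstrap_variance(z, math.atan, delta_null)
rs = (score_statistic(z, math.atan, delta_null) / math.sqrt(len(z))) ** 2 / v_hat
print(v_hat, rs)
```

Under these assumptions the bootstrap variance should be close to the sample variance of arctan(z_t), which offers a quick sanity check before the statistic is compared with a chi-square critical value.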

Empirical illustration

We consider the daily returns of the NYSE (equally weighted returns (EWR))
from the CRSP database for October 1990 to December 1996, a total of 1,771 observations.
The data are plotted in Figure 3 and the summary statistics
are presented in Table 3. The horizontal line in Figure 3 represents
the sample mean of the data. The sample kurtosis and skewness
are 7.611827 and −0.943375, respectively, and the Jarque and Bera (JB) normality
test statistic is 1832.157. This indicates strong non-normality. Figure 3
also indicates a high degree of conditional heteroskedasticity. To explain such
behavior of stock return data, we need a model that
captures the dynamic higher-moment structure and the distributional characteristics
simultaneously.
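The reported JB statistic can be checked against the sample skewness and kurtosis with the standard formula JB = n/6 [S^2 + (K - 3)^2 / 4]. The short sketch below does this arithmetic; the function name is ours and the inputs are the figures quoted in the text.

```python
def jarque_bera(n, skewness, kurtosis):
    """Jarque-Bera statistic from sample size, skewness and (raw) kurtosis."""
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)

jb = jarque_bera(1771, -0.943375, 7.611827)
print(round(jb, 3))
```

The result agrees with the value reported in Table 3 to about three decimal places, and it is far above the 5% critical value of a chi-square distribution with two degrees of freedom (5.99).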

Table 3: Summary statistics

Number of observations    1771
Mean                      0.0013
Median                    0.0018
Standard Deviation        0.0052
Skewness                  -0.9434
Kurtosis                  7.6118
Max                       0.0236
Min                       -0.0332
Q(12)                     368.55
Q(24)                     452.28
Q(36)                     476.49
JB normality test         1832.157

Figure 3: Equally weighted returns (EWR) data (horizontal axis: Time; vertical axis: EWR)

For the purpose of empirical illustration, we estimated a MEARCH model in which
the moment conditions are given by arctan(·) and log(·). These two moment
conditions yield a GARCH model that is generated by the Pearson type-IV
distribution. Estimates of the GARCH(1,1) model
under conditional normality, the Student-t distribution and the MED are given
in Table 4. The standard errors, which are obtained from “robust” covariance
estimates, are given in parentheses. It is clear that the MED-GARCH(1,1)
model is better in terms of log-likelihood values. Nishii (1988) showed that
the Akaike information criterion (AIC) is not a consistent model selection criterion
when the true model is unspecified. On the other hand, the Schwarz information
criterion (SIC) satisfies the necessary condition to be a consistent model
selection criterion [see Theorem 5 in Nishii (1988)]. The MED-GARCH(1,1) model
has the lowest SIC value. We could consider the difference in log-likelihood
values between the Normal-GARCH(1,1) model and the Student-t-GARCH(1,1)
model, 101.94, as the increment from incorporating the excess kurtosis. The
difference in likelihood values between Student-t-GARCH(1,1) and
MED-GARCH(1,1), 21.55, could be thought of as the gain from the skewed density
function. The estimated degrees of freedom, 4.6662, in the Student-t-GARCH(1,1)
model support the view that the conditional normal model might not explain highly
leptokurtic behavior. In the MED-GARCH(1,1) model, the parameter associated
with the degrees of freedom in the Student-t GARCH(1,1) model is 4.3397, which is quite
similar.
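The log-likelihood differences and the information criteria quoted here can be reproduced from the log-likelihood values in Table 4. The sketch below assumes AIC = -2L + 2k and SIC = -2L + k log(n); the parameter counts k = 5, 6 and 9 and the effective sample size n = 1770 are not stated in the text and are inferred here because they reproduce the tabulated values.

```python
import math

def aic(loglik, k):
    """Akaike information criterion."""
    return -2.0 * loglik + 2.0 * k

def sic(loglik, k, n):
    """Schwarz information criterion."""
    return -2.0 * loglik + k * math.log(n)

n_obs = 1770  # assumed effective sample size (one observation lost to the AR(1) lag)
models = {  # log-likelihood from Table 4, assumed number of parameters
    "Normal": (7044.248, 5),
    "Student-t": (7146.188, 6),
    "MED": (7167.738, 9),
}

criteria = {name: (aic(ll, k), sic(ll, k, n_obs)) for name, (ll, k) in models.items()}
gain_kurtosis = models["Student-t"][0] - models["Normal"][0]
gain_skewness = models["MED"][0] - models["Student-t"][0]

for name, (a, s) in criteria.items():
    print(name, round(a, 3), round(s, 3))
print(round(gain_kurtosis, 2), round(gain_skewness, 2))
```

Under these assumptions the MED model has the smallest AIC and SIC even though it carries three more parameters than the Student-t model.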
We can detect skewed and leptokurtic behavior through the Lagrange multipliers
of the two moment functions. If the underlying conditional distribution is free
of skewness, then the multiplier associated with the moment function arctan(·)
should be close to zero. If there is no further excess kurtosis in the conditional
distribution, then the multiplier that corresponds to log(·) should be close to zero.
To test the validity of the moment equations we performed the RS test by applying
bootstrap methods: 1,770 observations are resampled from the standardized
residuals 1,000 times. The RS test statistics are 4.08 and 13.47 for the two
multipliers, respectively. Therefore, neither of the moment equations is redundant.
This implies that there is leptokurtic and skewed behavior at the same time.
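The two reported RS statistics can be converted into p-values with a few lines of code. The sketch assumes that each statistic is referred to a chi-square distribution with one degree of freedom, since each test concerns a single multiplier; the function name is ours.

```python
import math

def chi2_1_pvalue(stat):
    """Upper-tail probability of a chi-square(1) variable: P(Z^2 > stat)."""
    return math.erfc(math.sqrt(stat / 2.0))

p_first = chi2_1_pvalue(4.08)    # multiplier of the arctan moment function
p_second = chi2_1_pvalue(13.47)  # multiplier of the log moment function
print(p_first, p_second)
```

Both p-values fall below 0.05 under this assumption, which is consistent with the conclusion that neither moment equation is redundant; the evidence is much stronger for the second multiplier than for the first.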
Figure 4 presents the conditional densities for all the models. The MED-GARCH(1,1)
model explains excess kurtosis and skewness at the same time. The figure reveals
strong asymmetry in the data along with excess kurtosis. The
skewness is due to negative realizations. The MED-GARCH(1,1) model explains
tail behavior better than the Student-t GARCH(1,1) model. In the finance
literature there is a large body of research on conditional value at risk (VaR).
For VaR analysis, a parametric model might explain
tail behavior better than a nonparametric model, under the assumption that the
model is correctly specified. In the limit, as more and more moment constraints
are added, the conditional density of the MEARCH model becomes more like a nonparametric
density. We might interpret the MEARCH model as having a smooth
and sensible conditional density function, generated by extracting
plausible information from the data, and our MED approach could provide
improved estimates.

Figure 4: Comparison of the MED conditional density of EWR to a normal
density N(0, 1) and to Student-t with 4.6662 df: · · ·, normal; − − −, Student-t; solid line, MED
(horizontal axis: Return; vertical axis: conditional density)

Table 4: GARCH estimates with three kinds of densities

                          Normal-GARCH(1,1)   Student-t-GARCH(1,1)   MED-GARCH(1,1)

GARCH(1,1)
                          3.954e-06           2.664e-06              1.541e-06
                          (7.999e-07)         (7.493e-07)            (3.900e-07)
                          0.1694              0.1874                 0.1070
                          (0.0277)            (0.0360)               (0.0197)
                          0.6595              0.7102                 0.6828
                          (0.0537)            (0.0543)               (0.0537)

AR(1)
                          0.0010              0.0012                 0.0024
                          (0.0001)            (9.845e-05)            (9.978e-05)
                          0.3283              0.3042                 0.2839
                          (0.0264)            (0.0239)               (0.0235)

Distribution Parameters
                                              4.6662                 2.0832
                                              (0.4954)               (0.1293)
                                                                     -2.9124
                                                                     0.7001
                                                                     2.6829

Lik                       7044.248            7146.188               7167.738
AIC                       -14078.496          -14280.376             -14317.476
SIC                       -14051.102          -14247.504             -14268.167

Figure 5: Conditional variance estimated from MED-GARCH(1,1) (horizontal axis: Time; vertical axis: Conditional variance)

Concluding remarks and future research

In this paper, we have presented a generalization of the GARCH model that
incorporates the MED as the conditional density function. We characterized the MED and
discussed which moment functions might be used for skewed and heavy-tailed
distributions. In the empirical application, we selected one case that can
account for the skewed and leptokurtic behavior of stock return data and showed
that the MEARCH model is quite useful in explaining the behavior of financial time
series. The moment selection procedure was performed by testing the Lagrange
multipliers with the Rao score test. Many other moment equations,
or mixtures of given moment equations, could be chosen to generate a general
and plausible conditional density. We plan to use other moment conditions
to build a more general MEARCH model.

Appendix

A.1.

Derivation of moment selection test based on the Rao-Score test.

Consider the maximum entropy density,

$$ f(x, \theta) = C(\theta)\exp\left[-\sum_{i=1}^{N}\lambda_i\phi_i(x;\gamma)\right] $$

which satisfies

(

;

) =

[

(

;

)

−

(

)] = 0 where

{

λ, γ

} ∈

Θ. Then the

corresponding log-likelihood function can be written as follows,

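$$ l(\theta) = T\ln C(\theta) - \sum_{t=1}^{T}\sum_{i=1}^{N}\lambda_i\phi_i(x_t;\gamma) $$

where

$$ \ln C(\theta) = -\ln\int\exp\left[-\sum_{i=1}^{N}\lambda_i\phi_i(x;\gamma)\right]dx $$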

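The score function is

$$ s(\theta_i) = \frac{\partial l(\theta)}{\partial\theta_i} = T\frac{\partial\ln C(\theta)}{\partial\theta_i} - \sum_{t=1}^{T}\phi_i(x_t;\gamma) $$

where $\theta_i \in \lambda$. Since $C(\theta)\int\exp\left[-\sum_{i=1}^{N}\lambda_i\phi_i(x;\gamma)\right]dx = 1$,

$$ \frac{\partial C(\theta)}{\partial\theta_i}\int\exp\left[-\sum_{k=1}^{N}\lambda_k\phi_k(x;\gamma)\right]dx - C(\theta)\int\exp\left[-\sum_{k=1}^{N}\lambda_k\phi_k(x;\gamma)\right]\phi_i(x;\gamma)\,dx = 0 \tag{6.1} $$

Then, dividing both sides of equation (6.1) by $C(\theta)$, we get

$$ \frac{\partial\ln C(\theta)}{\partial\theta_i} = \frac{\int\exp\left[-\sum_{k=1}^{N}\lambda_k\phi_k(x;\gamma)\right]\phi_i(x;\gamma)\,dx}{\int\exp\left[-\sum_{k=1}^{N}\lambda_k\phi_k(x;\gamma)\right]dx} $$

Evaluating the above equation under the null, $\lambda_j = 0$ where $\lambda_j \in \lambda$,

$$ \frac{\partial\ln C(\theta)}{\partial\theta_j} = \frac{\int\exp\left[-\sum_{i\in\{1,\cdots,N\}\setminus\{j\}}\lambda_i\phi_i(x;\gamma)\right]\phi_j(x;\gamma)\,dx}{\int\exp\left[-\sum_{i\in\{1,\cdots,N\}\setminus\{j\}}\lambda_i\phi_i(x;\gamma)\right]dx} \equiv \delta_j $$

Thus, under the null the score function can be represented as follows,

$$ s(\theta_j) = T\delta_j - \sum_{t=1}^{T}\phi_j(x_t;\gamma) $$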

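Again, $\partial\ln C(\theta)/\partial\theta_i$ can be rewritten to obtain the second derivative as follows,

$$ \frac{\partial\ln C(\theta)}{\partial\theta_i} = C(\theta)\int\exp\left[-\sum_{k=1}^{N}\lambda_k\phi_k(x;\gamma)\right]\phi_i(x;\gamma)\,dx. $$

Then, the second derivative of $\ln C(\theta)$ is

$$ \frac{\partial^2\ln C(\theta)}{\partial\theta_i\partial\theta_j} = \frac{\partial C(\theta)}{\partial\theta_j}\int\exp\left[-\sum_{k=1}^{N}\lambda_k\phi_k(x;\gamma)\right]\phi_i(x;\gamma)\,dx - C(\theta)\int\exp\left[-\sum_{k=1}^{N}\lambda_k\phi_k(x;\gamma)\right]\phi_i(x;\gamma)\phi_j(x;\gamma)\,dx, \tag{6.2} $$

Equation (6.2) can be represented as

$$ \frac{\partial^2\ln C(\theta)}{\partial\theta_i\partial\theta_j} = C(\theta)\left\{\frac{\partial\ln C(\theta)}{\partial\theta_j}\int\exp\left[-\sum_{k=1}^{N}\lambda_k\phi_k(x;\gamma)\right]\phi_i(x;\gamma)\,dx - \int\exp\left[-\sum_{k=1}^{N}\lambda_k\phi_k(x;\gamma)\right]\phi_i(x;\gamma)\phi_j(x;\gamma)\,dx\right\} $$

$$ = \frac{\partial\ln C(\theta)}{\partial\theta_j}E[\phi_i(x;\gamma)] - E[\phi_i(x;\gamma)\phi_j(x;\gamma)] \tag{6.3} $$

where $\theta_i, \theta_j \in \lambda$. Then, under the null, we can represent the second derivative of $\ln C(\theta)$ as follows,

$$ \frac{\partial^2\ln C(\theta)}{\partial\theta_i\partial\theta_j} = \delta_i\delta_j - \delta_{ij} $$

where

$$ \delta_i = \frac{\partial\ln C(\theta)}{\partial\theta_i} = \frac{\int\exp\left[-\sum_{k\in\{1,\cdots,N\}\setminus\{j\}}\lambda_k\phi_k(x;\gamma)\right]\phi_i(x;\gamma)\,dx}{\int\exp\left[-\sum_{k\in\{1,\cdots,N\}\setminus\{j\}}\lambda_k\phi_k(x;\gamma)\right]dx}, $$

$$ \delta_{ij} = \frac{\int\exp\left[-\sum_{k\in\{1,\cdots,N\}\setminus\{j\}}\lambda_k\phi_k(x;\gamma)\right]\phi_i(x;\gamma)\phi_j(x;\gamma)\,dx}{\int\exp\left[-\sum_{k\in\{1,\cdots,N\}\setminus\{j\}}\lambda_k\phi_k(x;\gamma)\right]dx} $$

The second derivative of the log-likelihood function with respect to $\theta_i \in \lambda$ under the
null can be represented by the second derivative of $\ln C(\theta)$,

$$ \frac{\partial^2 l(\theta)}{\partial\theta_i\partial\theta_j} = T\frac{\partial^2\ln C(\theta)}{\partial\theta_i\partial\theta_j} = T(\delta_i\delta_j - \delta_{ij}) $$

where $\theta_i, \theta_j \in \lambda$.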

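Let $\gamma = (\gamma_1, \cdots, \gamma_K)$ and consider the following equations,

$$ \frac{\partial l(\theta)}{\partial\gamma_k} = T\frac{\partial\ln C(\theta)}{\partial\gamma_k} - \sum_{t=1}^{T}\sum_{i=1}^{N}\lambda_i\frac{\partial\phi_i(x_t;\gamma)}{\partial\gamma_k} $$

$$ \frac{\partial^2 l(\theta)}{\partial\gamma_k\partial\gamma_m} = T\frac{\partial^2\ln C(\theta)}{\partial\gamma_k\partial\gamma_m} - \sum_{t=1}^{T}\sum_{i=1}^{N}\lambda_i\frac{\partial^2\phi_i(x_t;\gamma)}{\partial\gamma_k\partial\gamma_m} $$

$$ \frac{\partial^2 l(\theta)}{\partial\theta_i\partial\gamma_k} = \frac{\partial^2 l(\theta)}{\partial\gamma_k\partial\theta_i} = T\frac{\partial^2\ln C(\theta)}{\partial\theta_i\partial\gamma_k} - \sum_{t=1}^{T}\frac{\partial\phi_i(x_t;\gamma)}{\partial\gamma_k} $$

where $\theta_i \in \lambda$. Under the null,

$$ \frac{\partial l(\theta)}{\partial\gamma_k} = T\frac{\partial\ln C(\theta)}{\partial\gamma_k} - \sum_{t=1}^{T}\sum_{i\in\{1,\cdots,N\}\setminus\{j\}}\lambda_i\frac{\partial\phi_i(x_t;\gamma)}{\partial\gamma_k} $$

$$ \frac{\partial^2 l(\theta)}{\partial\gamma_k\partial\gamma_m} = T\frac{\partial^2\ln C(\theta)}{\partial\gamma_k\partial\gamma_m} - \sum_{t=1}^{T}\sum_{i\in\{1,\cdots,N\}\setminus\{j\}}\lambda_i\frac{\partial^2\phi_i(x_t;\gamma)}{\partial\gamma_k\partial\gamma_m} $$

$$ \frac{\partial^2 l(\theta)}{\partial\theta_i\partial\gamma_k} = \frac{\partial^2 l(\theta)}{\partial\gamma_k\partial\theta_i} = T\frac{\partial^2\ln C(\theta)}{\partial\theta_i\partial\gamma_k} - \sum_{t=1}^{T}\frac{\partial\phi_i(x_t;\gamma)}{\partial\gamma_k} $$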

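Then, the information matrix evaluated under the null hypothesis, $\lambda_j = 0$, can
be written in the following form,

$$ I(\theta) = \begin{bmatrix} I_{\lambda\lambda} & I_{\lambda\gamma} \\ I_{\gamma\lambda} & I_{\gamma\gamma} \end{bmatrix} \tag{6.4} $$

where $I_{\lambda\lambda}$ is an $N \times N$ matrix whose elements are given by $-\partial^2 l(\theta)/\partial\lambda_i\partial\lambda_j$, for $i, j = 1, \cdots, N$;
$I_{\lambda\gamma}$ and $I_{\gamma\lambda}$ are $N \times K$ and $K \times N$ matrices, respectively, with elements $-\partial^2 l(\theta)/\partial\lambda_i\partial\gamma_j$, for $i = 1, \cdots, N$
and $j = 1, \cdots, K$; and $I_{\gamma\gamma}$ is a $K \times K$ matrix with elements $-\partial^2 l(\theta)/\partial\gamma_i\partial\gamma_j$, for $i, j = 1, \cdots, K$.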

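To see the impact of the nuisance parameter vector $\gamma$, let us use the result of Pierce (1982).
For a statistic $T(\theta)$ depending on a parameter vector $\theta$, the asymptotic variances of $T(\theta)$
and $T(\hat{\theta})$, where $\hat{\theta}$ is an efficient estimator of $\theta$, are related by

$$ Var\left[\sqrt{T}\,T(\hat{\theta})\right] = Var\left[\sqrt{T}\,T(\theta)\right] - \lim_{T\to\infty}E\left[\frac{\partial T(\theta)}{\partial\theta}\right] Var\left[\sqrt{T}\,\hat{\theta}\right] \lim_{T\to\infty}E\left[\frac{\partial T(\theta)}{\partial\theta}\right]' $$

In our notation,

$$ T(\hat{\theta}) = s(\hat{\theta}) = T\delta_j - \sum_{t=1}^{T}\phi_j(x_t;\hat{\gamma}) \tag{6.5} $$

$Var(\sqrt{T}\,T(\theta))$ can be replaced by the $j$-th diagonal element of $I_{\lambda\lambda}(\theta)$. $E[\partial T(\theta)/\partial\theta]$ can be written
as a $1 \times (N + K)$ vector,

$$ E\left[\frac{\partial T(\theta)}{\partial\theta}\right] = E\left[\frac{\partial\delta_j}{\partial\lambda_1}, \cdots, \frac{\partial\delta_j}{\partial\lambda_N}, \frac{\partial\delta_j}{\partial\gamma_1} - \frac{\partial\phi_j}{\partial\gamma_1}, \cdots, \frac{\partial\delta_j}{\partial\gamma_K} - \frac{\partial\phi_j}{\partial\gamma_K}\right] \tag{6.6} $$

Since the Rao score test statistic is given by $RS = T(\hat{\theta})^2/\hat{V}$, where $\hat{V}$ is the consistent estimator
of the asymptotic variance of $\sqrt{T}\,T(\hat{\theta})$, the derivation of $Var[\sqrt{T}\,T(\hat{\theta})]$
becomes complicated. Thus, we estimated the variance of $T(\hat{\theta})$ by the bootstrap
method.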

References

Agmon, N., Alhassid, Y. and Levine, R. D.

, 1981, “An algorithm for

determining the Lagrange parameters in the maximum entropy formalism”,

The Maximum Entropy Formalism,

Cambridge, MA: MIT Press, 207-209.

Aroian, L.A.

, 1948, “The fourth degree exponential distribution function”,

Annals of Mathematical Statistics,

19, 589-592.

Bera, A.K. and Bilias, Y., 2002, “The MM, ME, ML, EL, EF and GMM approaches to estimation: a synthesis”, Journal of Econometrics, 107, 51-86.

Bera, A.K. and Ghosh, A., 2001, “Neyman’s smooth test and its applications in Econometrics”, Handbook of Applied Econometrics and Statistical Inference, 177-230.

Bera, A.K. and Higgins, M.L., 1993, “ARCH models: properties, estimation and testing”, Journal of Economic Surveys, 7, 305-366.

Bera, A.K. and Kim, S., 2002, “Testing constancy of correlation and other specification of the BGARCH model with an application to international equity returns”, Journal of Empirical Finance, 9, 171-195.

Bera, A.K. and Lee, S., 1993, “Information matrix test, parameter heterogeneity and ARCH: a synthesis”, Review of Economic Studies, 60, 229-240.

Bera, A.K. and Ullah, A., 1991, “Rao’s score test in econometrics”, Journal of Quantitative Economics, 7(2), 189-220.

Bollerslev, T.

, 1986, “Generalized autoregressive conditional heteroskedas-

ticity”, Journal of Econometrics, 31, 307-327.

Bollerslev, T.

, 1987, “A conditionally heteroskedasticity time series model

for speculative prices and rates of return”,

Review of Economics and Statis-

tics,

69, 542-547

Bollerslev, T. and Wooldridge, J.M., 1988, “Quasi-maximum likelihood estimation of dynamic models with time varying covariance”, Econometric Reviews, 11, 143-172.

Buchen, P. and Kelly, M.,

1996, “The maximum entropy distribution of

an asset inferred from option prices”,

Journal of Financial and Quantitative

Analysis,

31(1), 143-159.

Chamberlain, G.

, 1987, “Asymptotic efficiency in estimation with condi-

tional moment restrictions”,

Journal of Econometrics,

34, 305-334.

Choi, E., Hall, P. and Presnell, B.

, 2000, “Rendering parametric

procedures more robust by empirical tilting the model”,

Biometrika,

87,

453-465.

Cobb, L., Koppstein, P. and Chen, N.H., 1983, “Estimation and moment recursion relations for multimodal distributions of the exponential family”, Journal of the American Statistical Association, 78, 124-130.

Cressie, N. and Read, T., 1984, “Multinomial goodness-of-fit tests”, Journal of the Royal Statistical Society. Ser B, 46, 440-464.

Dixit, A.K.

, 1976, Optimization in Economic Theory, Oxford University

Press.

Engle, R.F.

, 1982, “Autoregressive conditional heteroskedasticity with es-

timates of the variance of United Kingdom inflation”,

Econometrica,

50,

987-1007.

Engle, R.F. and González-Rivera, G., 1991, “Semiparametric ARCH models”, Journal of Business and Economic Statistics, 9, 345-359.

Frontini, N. and Tagliani, A., 1994, “Maximum entropy in the finite Stieltjes and Hamburger moment problem”, Journal of Mathematical Physics, 35(12), 6748-6756.

Godambe, V.P., 1960, “An optimal property of regular maximum likelihood estimation”, The Annals of Mathematical Statistics, 31, 1208-1212.

Godambe, V.P. and Heyde, C.C., 1987, “Quasi-likelihood and optimal estimation”, International Statistical Review, 55, 231-244.

Gokhale, D.V.

, 1975, “Maximum entropy characterization of some distri-

butions”, Statistical Distribution in Scientific Work, Vol 3, 299-304.

Golan, A. and Judge, G.G.

, 1996, “A maximum entropy approach to

empirical likelihood estimation and inference”, ARE Working Paper, Uni-

versity of California, 34.

Golan, A., Judge, G. and Miller, D.

, 1996, Maximum Entropy Econo-

metrics Robust estimation with limited data, Wiley.

Hansen, B.E.

, 1994, “Autoregressive conditional density estimation”,

In-

ternational Economic Review,

35, 705-730.

Hansen, L.P.

, 1982, “Large sample properties of generalized method of

moments estimator”,

Econometrica,

50, 1029-1054.

Hansen, L.P., Heaton, J. and Yaron, A

, 1996, “Finite sample prop-

erties of some alternative GMM estimators”,

Journal of Business and Eco-

nomic Statistics,

14(3), 262-280.

Hsieh, D.A.

, 1989, “Modelling heteroscedasticity in daily foreign exchange

rate”,

Journal of Business and Economic Statistics,

7, 307-317.

Higgins, M.L. and Bera, A.K.

, 1992, “A class of nonlinear ARCH mod-

els”,

International Economic Review,

33, 137-158.

Huber, P.J.

, 1980, Robust Statistics, Wiley, New York.

Imbens, G.W., 1993, “A new approach to generalized method of moments estimation”, Harvard Institute of Economic Research Working Paper 1633.

Imbens, G.W., 1997, “One-step estimators for over-identified generalized method of moments models”, Review of Economic Studies, 64, 359-383.

Imbens, G.W., Spady, R.H. and Johnson, P., 1998, “Information theoretic approaches to inference in moment condition models”, Econometrica, 66, 333-357.

Imbens, G.W., 2002, “Generalized method of moments and empirical likelihood”, Journal of Business and Economic Statistics, 20(4), 493-506.

Jaynes, E.

, 1979, “Concentration of distributions”, R. Rosenkrantz. E.

Jaynes: Paper on Probability, Statistics and Statistical Physics, Dordrecht,

Reidel.

Kagan, A.M, Linik, Y.V. and Rao, C.R.

, 1973, Characterization Prob-

lems in Mathematical Statistics, Wiley, New York.

Kendall, M.G. and Stuart, A., 1977, The Advanced Theory of Statistics, Volume 1, Griffin, London.

Kitamura, Y. and Stutzer, M.

, 1997, “An information-theoretic alter-

native to generalized method of moments estimation.”,

Econometrica,

65,

861-874.

Kullback, S and Leibler, R.A.

, 1951, “On information and sufficiency”,

Annals of Mathematical Statistics,

22, 79-86.

Lee, T.K.Y. and Tse, Y.K., 1991, “Term structure of interest rates in the Singapore Asian dollar market”, Journal of Applied Econometrics, 143-152.

Li, D.X. and Turtle, H.J.

,2000, “Semiparametric ARCH models: An es-

timating function approach”,

Journal of Business and Economic Statistics,

18, 174-186.

Lisman, J.H.C. and van Zuylen, M.C.A.

, 1972, “Note on the generation

of most probable frequency”,

Statistica Neerlandica,

26, 19-23.

Lye, J.N. and Martin, V.L.

, 1993, “Robust estimation, nonnormalities,

and generalized exponential distributions”,

Journal of the American Sta-

tistical Association,

421, 261-267.

Matz, A.W.

, 1978, “Maximum likelihood parameter estimation for the

quartic exponential distribution”,

Technometrics,

20, 475-484.

Milhøj, A., 1985, “The moment structure of ARCH processes”, Scandinavian Journal of Statistics, 12, 281-292.

Mittelhammer, R.C., Judge, G.G. and Miller, D.J.

, 2000, Econo-

metric Foundation, Cambridge.

Mead, L.R. and Papanicolaou, N.

, 1984, “Maximum entropy in the

problem of moments”,

Journal of Mathematical Physics,

25(8), 2404-2417.

Nishii, R.

, 1988, “Maximum likelihood principle and model selection when

the true model is unspecified”,

Journal of Multivariate Analysis,

27(2), 392-

403.

Nelson, D.B., 1991, “Conditional heteroskedasticity in asset returns: A new approach”, Econometrica, 59, 347-370.

Neyman, J., 1937, “‘Smooth test’ for goodness of fit”, Skandinavisk Aktuarietidskrift, 20, 150-199.

Neyman, J. and Pearson, E.S., 1936, “Contribution to the theory of testing statistical hypothesis 1: Unbiased critical regions of type A and type A1”, Statistical Research Memoirs, 1, 1-37.

Ord, J.K.

, 1972, Families of Frequency Distributions, New York, Hafner.

Ormoneit, D. and White, H.

, 1999, “An efficient algorithm to compute

maximum entropy densities”,

Econometric Review,

18(2), 127-140.

Owen, A.

2001. Empirical Likelihood, London, Chapman and Hall.

Premaratne, G. and Bera, A.K.

, 2000, “Modelling asymmetry and

excess kurtosis in stock return data.”, Working paper, University of Illinois

at Urbana-Champaign.

Premaratne, G. and Bera, A.K.

, 2001, “A test for asymmetry with

leptokurtic financial data.”, Working paper, University of Illinois at Urbana-

Champaign.

Qin, J. and Lawless, J

, 1994, “Generalized estimating equations”,

The

Annals of Statistics,

20, 300-325.

Rao, C.R.

, 2002, Linear Statistical Inference and Its Application. New

York.

Renyi, A., 1960, “On measures of entropy and information”, Proceedings of the Fourth Berkeley Symposium on Mathematics, Statistics, and Probability, Vol I, 547.

Rich, R.W., Raymond, J. and Butler, J.S.

, 1991, “Generalized in-

strumental variables estimation of autoregressive conditional heteroskedastic

models”,

Economics letters,

35, 179-185.

Rockinger, M. and Jondeau, E.

, 2002, “Entropy densities with an ap-

plication to autoregressive conditional skewness and kurtosis”,

Journal of

Econometrics,

106, 119-142.

Ryu, H.K.

, 1990, “Orthonormal basis and maximum entropy estimation

of probability density and regression functions”, Doctoral Dissertation, De-

partment of Economics, University of Chicago, IL.

Shannon, C.E.

, 1948, “The mathematical theory of communication”,

Bell

System Technical Journal,

July-Oct.; reprinted in: C.E. Shannon and W.

Weaver, The mathematical theory of communication (University of Illinois
Press, Urbana, IL), 3-91.

Stuart, A. and J.K. Ord

, 1994, Kendall’s Advanced Theory of Statistics

Vol 1: Distribution Theory, Edward Arnold, London.

Weiss, A.A

, 1986, “Asymptotic theory for ARCH models: Estimation and

testing”,

Econometric Theory,

2, 107-131.

White, H.

, 1982, “Maximum likelihood estimation of misspecified models”,

Econometrica,

50, 1-25.

Wu, X.

, 2003, “Calculation of maximum entropy densities with application

to income distribution”,

Journal of Econometrics,

115, 347-354.

Zellner, A. and Highfield, A.R., 1988, “Calculation of maximum entropy distribution and approximation of marginal posterior distributions”, Journal of Econometrics, 37, 195-209.

Zellner, A., 1998, “On order invariance of maximum entropy procedures”, Mimeo.

Zellner, A., 1991, “Bayesian methods and entropy in economics and econometrics”, Maximum Entropy and Bayesian Methods, Kluwer, Amsterdam, 17-31.