Estimation
On this page the estimation process of the ruspy package is documented. The structure is the following: The first part documents in detail which format is required on the input data and how you can easily access this for the original data. Then we explain how the initialization dictionary specified, before explaining the different estimation steps and methods.
The input data
The estimation package works with a pandas.DataFrame as input. Each observation in the DataFrame has to be indexed by the “Bus_ID” and the “period” of observation. These two identifiers have to be combined by the pandas.MultiIndex. Furthermore the DataFrame has to include the following information as columns:
Variable 
Description 

state 
Discretized mileage state of the bus. 
decision 
Containing 0 for decision of maintenance and 1 for replacement. 
usage 
Last month mileage usage as discretized state increase. 
If you want to replicate Rust (1987) you can download the raw data from John Rust`s website and prepare it yourself it the desired manner. Or you can use the functions in zurcherdata which are provided by the OpenSourceEconomics community and tailored to the ruspy package.
The estimation process
The estimation process is not directly implemented in ruspy. The package only contains
the likelihood functions and in case of MPEC also the constraints. For the
(minimization) maximization of the (negative) loglikelihood function, an external
optimization library has to be used. Hence ruspy is a so called model package.
OpenSourceEconomics offers several of these model packages, all for different models.
More information can be found on our homepage. The central
function to get the criterion function, it’s derivative and if applicable the constraint
is the get_criterion_function
function. It’s source code can be found here:

This function specifies the criterion function with its derivative, transition probabilites (for NXFP and MPEC) as well as the contraint function with its derivative (for MPEC). 
The returns of the function will be explained below. First, the second input besides the input data The input data, the initialization dictionary is documented:
Estimation initialization dictionary
The initialization dictionary contains model, method and optional algorithmic specific information. The information on theses three categories is are saved under the dictionary keys method, model_specifications and alg_details. The keys method and model_specifications are mandatory. In the following the entries saved under the three keys is explained
Under the key model_specifications a subdictionary has to be provided, with the following mandatory keys:
discount_factor : (float) The discount factor. See Discount factor for details.
num_states : (int) The size of the state space as integer.
maint_cost_func : (string) Name of the cost function. See Maintenance cost function for details.
cost_scale : (float) The scale for the maintenance costs. See Scale for details.
Under the key method the method of estimation has to be specified as a (string): Ruspy supports the following keys: “NFXP”, “NFXP_BHHH” or “MPEC”.
If “NFXP” or “NFXP_BHHH” are chosen as method, then the additional subdictionairy alg_details can be used to specify options for the fixed point algorithm. See Algorithmic details for the possible keys and the default values.
Before explaining the cost parameter estimation with the likelihood function from
get_criterion_function
, the transition probability estimation is documented. This
estimation can be completely separated from the cost parameter estimation.
Transition probability estimation
The functions for estimating the transition probabilities can be found in
estimation.estimation_transitions
. The main function, which coordinates this process
is:
Estimating the transition proabilities. 
So far, there is only the pooled transition estimation from Rust (1987) implemented. The function filters missing values from the usage data from the DataFrame and then counts, how often each increase occurs. With this transition count the loglikelihood function for the transition estimation can be constructed. Note that this is the loglikelihood function of a multinomial distribution:
where \(a_i\) is the number of occurrences for an increase by \(i\) states and \(p_i\) their probability. Note that the minus is introduced, such that a maximization of the likelihood corresponds to a minimization of this function. The corresponding function in the code is:

Sum the individual negative loglikelihood. 
The estimate_transitions
function does not minimize loglike_trans
directly.
Instead ruspy uses the formula for estimating a multinomial distribution and then
calculates the likelihood value from the estimate. Therefore we don’t provide standard
errors of the transitions probabilities at the moment.
The collected results of the transition estimation are collected in a dictionary
described below and returned to the function get_criterion_function
in which
then the respective criterion function for the cost parameter estimation is
specified. The transition result is also returned as the second output from
get_criterion_function
.
Transition results
The dictionary containing the transition estimation results has the following keys:
fun : (numpy.float) Loglikelihood of transition estimation.
x : (numpy.array) Estimated transition probabilities.
trans_count : (numpy.array) Counted state increases for each array index.
So far only a pooled estimation of the transitions is possible. Hence, ruspy
uses the estimated probabilities to construct a transition matrix with the same
nonzero probabilities in each row. This function is:

Creating the transition matrix with the assumption, that in every row the state increases have the same probability. 
The transition matrix is then used for the cost parameter estimation irrespective of using NFXP or MPEC.
Cost parameter estimation
The cost parameters are now estimated differently for NFXP, NFXP_BHHH and MPEC.
get_criterion_function
returns independet of the specified method two objects.
A dictionary of functions and the transition results. Only the keys and the functions
in the function dictionary are different by each method and described below. Note,
that all inputs are fixed for the functions in function dictionary dependent on the
specifications given in the initialization dictionary and the functions only take the
cost parameters as only input.
NFXP
The cost parameters for the NFXP are estimated by minimizing the negative
loglikelihood. The criterion function as well as its analytical derivative
are returned by the function get_criterion_function
in a dictonary with keys
“criterion_function” and “criterion_derivative”. Their source code can be found
in ruspy.estimation.nfxp
:

sums the individual negative log likelihood contributions for algorithms such as the LBFGSB. 

sums up the Jacobian to obtain the gradient of the negative log likelihood function needed for algorithm such as the LBFGSB. 
The minimization of the criterion function is not directly implemented in the
ruspy package, so an minimization routine is needed. In the provided
tutorials, we use the minimize function from the
estimagic library.
Beside the criterion function and its derivative, an algorithm used for optimization
has to be entered and a first guess of the cost params can be provided as inputs
of the minimize
function. Note, again that only the cost parameters are needed in
the minimization, as all other inputs of the functions are fixed.
Depending on the form of the cost functions, the params argument is a vector of
length num_params
, i.e. if we specify a linear cost function in
the initialization dictionary, there are two cost parameters, which are \(RC\)
and \(\theta_1\), respectively. For any other cost function see ref:cost_func.
In the minimization procedure the optimizer calls the likelihood functions and its derivative with different cost parameters. Together with the constant held arguments, the expected value is calculated by fixed point algorithm. Double calculation of the same fixed point is avoided by the following function:

A auxiliary function, which allows the loglikelihood function as well as its derivative to share the same fixed point and to avoid the need to execute the computation double. 
NFXP_BHHH
The cost parameter estimation for “NFXP_BHHH” is similar to the one for
“NFXP” by using the individual log likelihood contributions of a bus at each
time period. The criterion function as well as its analytical derivative
are returned by the function get_criterion_function
in a dictonary with keys
“criterion_function” and “criterion_derivative”. Their source code can also be found
in ruspy.estimation.nfxp
:

This is the individual logliklihood function for the estimation of the cost parameters needed for the BHHH optimizer. 
This is the Jacobian of the individual log likelihood function of the cost parameter estimation with respect to all cost parameters needed for the BHHH. 
The BHHH is a quasiNewton method, which uses the individual likelihood contributions instead of their sum. You can find a BHHH implementation in the overview of estimagic algorithms. Everything else is the same as in the NFXP implementation using the sum of the likelihood contributions.
MPEC
In the case of MPEC there the expected value fixed point is not calculated for a set of
cost parameters and instead the fixed point mapping is implemented as a constraint.
We provide besides the criterion function and its derivative, also the constraint and
its derivative via the get_criterion_function
. They are returned in a dictionary of
functions with keys “criterion_function”, “criterion_derivative”, “constraint” and
“constraint_derivative”. The source code of the four functions can be found in
ruspy.estimation.mpec
:

Calculate the negative partial log likelihood for MPEC depending on cost parameters as well as the discretized expected values. 
Computing the analytical gradient of the objective function for MPEC. 


Calculate the constraint of MPEC. 

Calculating the analytical Jacobian of the MPEC constraint. 
For estimating the model, one can use the optimizers for nonlinear constraint
optimizers implemented in estimagic.
The minimize
function of estimagic takes the criterion function, its derivative,
the constraint function and its derivative as inputs. The constraint can be given to the
minimize
function via a dictionairy under the argument constraint
(see constraint argument) Note that the starting values params
for MPEC
consist of the cost parameters and starting values for the \(EV\) fixed point. The
array has therefore a length of \(num\_states + num\_params\).
Imagine the grid size is 90 and we have linear cost which means there are two cost parameters. Then the first 90 values are the starting values for the expected values in order of increasing state. The last two elements are \(RC\) and \(\theta_1\), respectively.
Auxiliary objects
State matrix
A \(num\_obs \times num\_states\) dimensional bool numpy.array containing a single TRUE in each row at the column in which the bus was in that observation. It is used in the matrix multiplication of the likelihood function. It is created by

This function constructs a auxiliary matrix for the loglikelihood of the cost parameters. 
Decision Matrix
A \(num\_obs \times 2\) dimensional numpy array containing 1 in the first row for maintaining and 1 in the second for replacement. It is used in the matrix multiplication of the likelihood function.
Demonstration
In the tutorials are two demonstration jupyter notebooks of the cost estimation process. The replication notebook allows to easily experiment with the methods described here as well as the implied demand function. The notebook can also be downloaded from the tutorials folder of the repository. If you have have everything setup, then it should be easy to run it. For a more advanced set up have a look at the replication of Iskhakov et al. (2016).