Case Study: Recidivism

Co-authors

Prerequisites

Outcomes

  • See an end-to-end data science exercise

  • Application of regression

# Uncomment following line to install on colab
#! pip install fiona geopandas xgboost gensim folium pyLDAvis descartes
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

from sklearn import (
    linear_model, metrics, neural_network, pipeline, preprocessing, model_selection
)

%matplotlib inline

Introduction to Recidivism

Recidivism is the tendency for an individual who has previously committed a crime to commit another crime in the future.

One key input to a judge’s sentencing decision is how likely a given convict is to re-offend, or recidivate.

In an effort to assist the legal system with sentencing guidelines, data scientists have attempted to predict an individual’s risk of recidivism from known observables.

Some are concerned that this process may exhibit prejudice, either through biased inputs or through statistical discrimination.

For example,

  1. Biased inputs: Imagine that a judge often writes harsher sentences to people of a particular race or gender. If an algorithm is trained to reproduce the sentences of this judge, the bias will be propagated by the algorithm.

  2. Statistical discrimination: Imagine that two variables (say race and income) are correlated, and one of them (say income) drives the risk of recidivism. If income is unobserved, then an otherwise unbiased method would discriminate based on race, even if race says nothing about recidivism after controlling for income (see the simulation sketch below).
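
To make the statistical discrimination mechanism concrete, here is a minimal simulation sketch. All names and parameters here are made up for illustration: risk depends only on income, income is correlated with a group indicator, and a regression that omits income attributes income's effect to the group.

import numpy as np
from sklearn import linear_model

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data: a binary group indicator correlated with income,
# and a risk outcome that depends on income only
group = rng.integers(0, 2, size=n)
income = 1.0 * group + rng.normal(size=n)
risk = -0.5 * income + rng.normal(size=n)

# With income included, the group coefficient is (nearly) zero
full = linear_model.LinearRegression().fit(np.column_stack([group, income]), risk)
print("group coefficient, income included:", full.coef_[0])

# With income omitted, the group coefficient picks up income's effect
omit = linear_model.LinearRegression().fit(group.reshape(-1, 1), risk)
print("group coefficient, income omitted:", omit.coef_[0])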

This has given rise to serious discussions about the moral obligations data scientists have to those who are affected by their tools.

We will not take a stance today on our moral obligations, but we believe this is an important precursor to any statistical work with public policy applications.

One predictive tool used by various courts in the United States is called COMPAS (Correctional Offender Management Profiling for Alternative Sanctions).

We will be following a Pro Publica article that analyzes the output of COMPAS.

The findings of the article include:

  • Black defendants were often predicted to be at a higher risk of recidivism than they actually were.

  • White defendants were often predicted to be less risky than they were.

  • Black defendants were twice as likely as white defendants to be misclassified as being a higher risk of violent recidivism.

  • Even when controlling for prior crimes, future recidivism, age, and gender, black defendants were 77 percent more likely to be assigned higher risk scores than white defendants.

Data Description

The authors of this article filed a public records request with the Broward County Sheriff’s office in Florida.

Luckily for us, they did a significant amount of the legwork, which is described in this methodology article.

We download the data below.

data_url = "https://raw.githubusercontent.com/propublica/compas-analysis"
data_url += "/master/compas-scores-two-years.csv"

df = pd.read_csv(data_url)
df.head()
id name first last compas_screening_date sex dob age age_cat race ... v_decile_score v_score_text v_screening_date in_custody out_custody priors_count.1 start end event two_year_recid
0 1 miguel hernandez miguel hernandez 2013-08-14 Male 1947-04-18 69 Greater than 45 Other ... 1 Low 2013-08-14 2014-07-07 2014-07-14 0 0 327 0 0
1 3 kevon dixon kevon dixon 2013-01-27 Male 1982-01-22 34 25 - 45 African-American ... 1 Low 2013-01-27 2013-01-26 2013-02-05 0 9 159 1 1
2 4 ed philo ed philo 2013-04-14 Male 1991-05-14 24 Less than 25 African-American ... 3 Low 2013-04-14 2013-06-16 2013-06-16 4 0 63 0 1
3 5 marcu brown marcu brown 2013-01-13 Male 1993-01-21 23 Less than 25 African-American ... 6 Medium 2013-01-13 NaN NaN 1 0 1174 0 0
4 6 bouthy pierrelouis bouthy pierrelouis 2013-03-26 Male 1973-01-22 43 25 - 45 Other ... 1 Low 2013-03-26 NaN NaN 2 0 1102 0 0

5 rows × 53 columns

We summarize some of the variables that we will use.

  • first: An individual’s first name

  • last: An individual’s last name

  • sex: An individual’s sex

  • age: An individual’s age

  • race: An individual’s race. It takes values of Caucasian, Hispanic, African-American, Native American, Asian, or Other

  • priors_count: Number of previous arrests

  • decile_score: The COMPAS risk score

  • two_year_recid: Whether the individual had been jailed for a new crime within the next two years

Descriptive Statistics

The first thing we do with our data is to drop any classes without “enough” observations.

One of our focuses will be on inter-race differences in scores and recidivism, so we only keep data on races with at least 500 observations in our data set.

Be aware, though, that even a seemingly (or genuinely) benign "technical" decision like this one can perpetuate inequality by exclusion.

For example, Asians are a small minority, so they’re not really present in the data, and therefore they’re absent from the policy discussion — we have no inferential knowledge on how COMPAS scores work for them.

race_count = df.groupby(["race"])["name"].count()
at_least_500 = list(race_count[race_count > 500].index)
print("The following races have at least 500 observations:", at_least_500)
df = df.loc[df["race"].isin(at_least_500), :]
The following races have at least 500 observations: ['African-American', 'Caucasian', 'Hispanic']

Next, we explore the remaining data using plots and tables.

Age, Sex, and Race

Let’s look at how the dataset is broken down into age, sex, and race.

def create_groupcount_barplot(df, group_col, figsize, **kwargs):
    "call df.groupby(group_col), then count number of records and plot"
    counts = df.groupby(group_col)["name"].count().sort_index()

    fig, ax = plt.subplots(figsize=figsize)
    # plot explicitly on the axes we created rather than relying on the
    # implicit current axes
    counts.plot(kind="bar", ax=ax, **kwargs)

    ax.spines["right"].set_visible(False)
    ax.spines["top"].set_visible(False)
    ax.set_xlabel("")
    ax.set_ylabel("")

    return fig, ax
age_cs = ["Less than 25", "25 - 45", "Greater than 45"]
df["age_cat"] = pd.Categorical(df["age_cat"], categories=age_cs, ordered=True)
fig, ax = create_groupcount_barplot(df, "age_cat", (14, 8), color="DarkBlue", rot=0)
[Figure: bar chart of counts by age category]
sex_cs = ["Female", "Male"]
df["sex"] = pd.Categorical(df["sex"], categories=sex_cs, ordered=True)
create_groupcount_barplot(df, "sex", (6, 8), color="DarkBlue", rot=0)
[Figure: bar chart of counts by sex]
race_cs = ["African-American", "Caucasian", "Hispanic"]
df["race"] = pd.Categorical(df["race"], categories=race_cs, ordered=True)
create_groupcount_barplot(df, "race", (12, 8), color="DarkBlue", rot=0)
[Figure: bar chart of counts by race]

From this, we learn that our population is mostly male, mostly between 25 and 45 years old, and predominantly African-American or Caucasian.

Recidivism

We now look into how recidivism is split across groups.

recid = df.groupby(["age_cat", "sex", "race"])["two_year_recid"].mean().unstack(level="race")
recid
race                       African-American  Caucasian  Hispanic
age_cat          sex
Less than 25     Female            0.449704   0.310345  0.411765
                 Male              0.645806   0.541254  0.536364
25 - 45          Female            0.382278   0.423948  0.333333
                 Male              0.533074   0.433699  0.375000
Greater than 45  Female            0.227273   0.239766  0.217391
                 Male              0.425101   0.289157  0.216667

In the table, we see that the young have higher recidivism rates than the old, except among Caucasian females, for whom the 25 - 45 age group has the highest rate.

Also, African-American males are at a particularly high risk of recidivism even as they get older.

Risk Scores

Each individual in the dataset was assigned a decile_score ranging from 1 to 10.

This score represents the perceived risk of recidivism with 1 being the lowest risk and 10 being the highest.

We show a bar plot of all decile scores below.

create_groupcount_barplot(df, "decile_score", (12, 8), color="DarkBlue", rot=0)
[Figure: bar chart of counts by decile score]

How do these scores differ by race?

dfgb = df.groupby("race")
race_count = df.groupby("race")["name"].count()

fig, ax = plt.subplots(3, figsize=(14, 8))

for (i, race) in enumerate(["African-American", "Caucasian", "Hispanic"]):
    (
        (dfgb
            .get_group(race)
            .groupby("decile_score")["name"].count() / race_count[race]
        )
        .plot(kind="bar", ax=ax[i], color="#353535")
    )
    ax[i].set_ylabel(race)
    ax[i].set_xlabel("")
    # set equal y limit for visual comparison
    ax[i].set_ylim(0, 0.32)

fig.suptitle("Score Frequency by Race")
[Figure: Score Frequency by Race]

While the score distributions for Caucasians and Hispanics are concentrated at low values, African-Americans are almost equally likely to receive any score.

Risk Scores and Recidivism

Now we can explore the relationship between the risk score and actual two-year recidivism.

The first measure we look at is the frequency of recidivism by decile score – these numbers tell us what percentage of people assigned a particular risk score committed a new crime within two years of being released.

df.groupby("decile_score")["two_year_recid"].mean()
decile_score
1     0.220392
2     0.309112
3     0.375887
4     0.426593
5     0.478723
6     0.564228
7     0.590988
8     0.681363
9     0.698795
10    0.770889
Name: two_year_recid, dtype: float64

Let’s also look at the correlation.

df[["decile_score", "two_year_recid"]].corr()
                decile_score  two_year_recid
decile_score        1.000000        0.346797
two_year_recid      0.346797        1.000000

As the risk score increases, the percentage of people committing a new crime does as well, with a positive correlation (~0.35).

This is good news – it means that the score is producing at least some signal about an individual’s recidivism risk.

One of the key critiques from Pro Publica, though, was that the inaccuracies were nonuniform — that is, the tool was systematically wrong about certain populations.

Let’s now separate the correlations by race and see what happens.

recid_rates = df.pivot_table(index="decile_score", columns="race", values="two_year_recid")

recid_rates
race          African-American  Caucasian  Hispanic
decile_score
1                     0.228643   0.208517  0.244898
2                     0.302799   0.313019  0.318584
3                     0.419075   0.340659  0.313953
4                     0.459740   0.396491  0.346154
5                     0.482192   0.460581  0.538462
6                     0.559896   0.572165  0.567568
7                     0.592500   0.615385  0.470588
8                     0.682451   0.719298  0.500000
9                     0.707895   0.693878  0.550000
10                    0.793706   0.703125  0.666667

Or, in plotted form,

fig, ax = plt.subplots(3, sharex="all")

for (i, _race) in enumerate(["African-American", "Caucasian", "Hispanic"]):
    _rr_vals = recid_rates[_race].values

    ax[i].bar(np.arange(1, 11), _rr_vals, color="#c60000")
    ax[i].bar(np.arange(1, 11), 1 - _rr_vals, bottom=_rr_vals, color="#353535")
    ax[i].set_ylabel(_race)
    ax[i].spines["left"].set_visible(False)
    ax[i].spines["right"].set_visible(False)
    ax[i].spines["top"].set_visible(False)
    ax[i].spines["bottom"].set_visible(False)
    ax[i].yaxis.tick_right()
    ax[i].xaxis.set_ticks_position("none")

fig.suptitle("Recidivism Rates by Race")
[Figure: Recidivism Rates by Race]

Regression

In what follows, we will be doing something slightly different from what was done in the Pro Publica article.

First, we will explore what happens when we try to predict the COMPAS risk scores using the observable data that we have.

Second, we will use binary probability models to predict whether an individual is at risk of recidivism.

We will do this first using the COMPAS risk scores, and then afterwards we will try to write our own model based on raw observables, like age, race and sex.

Preprocessing

We would like to use some features that are inherently non-numerical such as sex, age group, and race in our model.

Before we can do that, we need to encode these string values as numerical values so our machine learning algorithms can understand them – this is what an econometrician would call creating dummy variables.

sklearn can automatically do this for us using OneHotEncoder.

Essentially, we make one column for each possible value of a categorical variable and then we set just one of these columns equal to a 1 if the observation has that column’s category, and set all other columns to 0.

Let’s do an example.

Imagine we have the array below.

sex = np.array([["Male"], ["Female"], ["Male"], ["Male"], ["Female"]])

The way to encode this would be to create the array below.

sex_encoded = np.array([
    [0.0, 1.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
    [1.0, 0.0]
])

Using sklearn it would be:

ohe = preprocessing.OneHotEncoder(sparse=False)
sex_ohe = ohe.fit_transform(sex)

# This should show all 0s!
sex_ohe - sex_encoded
array([[0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.]])

We will use this encoding trick below as we create our data.

Predicting COMPAS Scores

First, we create the X and y inputs and put them into a manageable format.

We encode the categorical variables using the OneHotEncoder described above, and then merge that with the non-categorical data.

Finally, we split the data into training and validation (test) subsets.

def prep_data(df, continuous_variables, categories, y_var, test_size=0.15):

    ohe = preprocessing.OneHotEncoder(sparse=False)

    y = df[y_var].values
    X = np.zeros((y.size, 0))

    # Add continuous variables, if any exist
    if len(continuous_variables) > 0:
        X = np.hstack([X, df[continuous_variables].values])

    if len(categories) > 0:
        X = np.hstack([X, ohe.fit_transform(df[categories])])

    X_train, X_test, y_train, y_test = model_selection.train_test_split(
        X, y, test_size=test_size, random_state=42
    )

    return X_train, X_test, y_train, y_test

As we proceed, our goal will be to see which variables are most important for predicting the COMPAS scores.

As we estimate these models, one of our metrics for success will be mean absolute error (MAE).
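
As a quick reference, MAE is just the average absolute gap between predictions and outcomes. A minimal check with made-up numbers, verifying against sklearn's implementation:

import numpy as np
from sklearn import metrics

y_true = np.array([1, 4, 7, 10])
y_pred = np.array([2, 4, 5, 9])

# mean absolute error by hand: mean(|y - yhat|) = 1.0 here
print(np.mean(np.abs(y_true - y_pred)))
print(metrics.mean_absolute_error(y_true, y_pred))  # same value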

def fit_and_report_maes(mod, X_train, X_test, y_train, y_test, y_transform=None, y_inv_transform=None):
    if y_transform is not None:
        mod.fit(X_train, y_transform(y_train))
    else:
        mod.fit(X_train, y_train)

    yhat_train = mod.predict(X_train)
    yhat_test = mod.predict(X_test)

    if y_transform is not None:
        yhat_train = y_inv_transform(yhat_train)
        yhat_test = y_inv_transform(yhat_test)

    return dict(
        mae_train=metrics.mean_absolute_error(y_train, yhat_train),
        mae_test=metrics.mean_absolute_error(y_test, yhat_test)
    )

Let’s begin with a simple linear model which uses just prior arrests.

X_train, X_test, y_train, y_test = prep_data(
    df, ["priors_count"], [], "decile_score"
)

fit_and_report_maes(linear_model.LinearRegression(), X_train, X_test, y_train, y_test)
{'mae_train': 2.162527833108664, 'mae_test': 2.191754484529134}

This simple model obtains a MAE of about 2 for both the test data and training data.

This means, on average, that our model can predict the COMPAS score (which ranges from 1-10) within about 2 points.

While the MAE is about 2, knowing what the errors on our prediction model look like is often very useful.

Below, we create a histogram which shows the distribution of these errors. In our case, we take the difference between predicted value and actual value, so a positive value means that we overpredicted the COMPAS score and a negative value means we underpredicted it.

lr_model = linear_model.LinearRegression()
lr_model.fit(X_train, y_train)

yhat_train = lr_model.predict(X_train)
yhat_test = lr_model.predict(X_test)

fig, ax = plt.subplots(1, 2, figsize=(12, 4), sharey="all")

ax[0].hist(yhat_train - y_train, density=True)
ax[0].set_title("Training Data")
ax[1].hist(yhat_test - y_test, density=True)
ax[1].set_title("Test Data")
[Figure: histograms of prediction errors on training and test data]

In both cases, the long left tails of errors suggest the existence of relevant features which would improve our model.

The first thing we might consider investigating is whether there are non-linearities in how the number of priors enters the COMPAS score.

First, we try using polynomial features in our exogenous variables.

X_train, X_test, y_train, y_test = prep_data(
    df, ["priors_count"], [], "decile_score"
)

# Transform data to include quadratic terms
pf = preprocessing.PolynomialFeatures(2, include_bias=False)
X_train = pf.fit_transform(X_train)
X_test = pf.transform(X_test)  # apply the transform fit on the training data

fit_and_report_maes(linear_model.LinearRegression(), X_train, X_test, y_train, y_test)
{'mae_train': 2.120405801755292, 'mae_test': 2.1179838134597335}

We don’t see a very significant increase in performance, so we also try using log on the endogenous variables.

X_train, X_test, y_train, y_test = prep_data(
    df, ["priors_count"], [], "decile_score"
)

fit_and_report_maes(
    linear_model.LinearRegression(), X_train, X_test, y_train, y_test,
    y_transform=np.log, y_inv_transform=np.exp
)
{'mae_train': 2.2550821558610115, 'mae_test': 2.3332184125647917}

Still no improvement… The next natural step is to add more features to our regression.

X_train, X_test, y_train, y_test = prep_data(
    df, ["priors_count"], ["age_cat", "race", "sex"], "decile_score"
)

fit_and_report_maes(linear_model.LinearRegression(), X_train, X_test, y_train, y_test)
{'mae_train': 1.8076563650603426, 'mae_test': 1.8277010173497898}

By allowing for indicator variables on age, race, and sex, we are able to slightly improve the MAE. The errors also seem to have a less extreme tail.

X_train, X_test, y_train, y_test = prep_data(
    df, ["priors_count"], ["age_cat", "race", "sex"], "decile_score"
)

lr_model = linear_model.LinearRegression()
lr_model.fit(X_train, y_train)

yhat_train = lr_model.predict(X_train)
yhat_test = lr_model.predict(X_test)

fig, ax = plt.subplots(1, 2, figsize=(12, 4), sharey="all")

ax[0].hist(yhat_train - y_train, density=True)
ax[0].set_title("Training Data")
ax[1].hist(yhat_test - y_test, density=True)
ax[1].set_title("Test Data")
[Figure: histograms of prediction errors on training and test data]

The coefficients are listed below:

names = [
    # OneHotEncoder orders each variable's categories alphabetically, so the
    # age indicators come out as "25 - 45", "Greater than 45", "Less than 25"
    "priors_count", "25 - 45", "Greater than 45", "Less than 25",
    "African-American", "Caucasian", "Hispanic", "Female", "Male"
]
for (_name, _coef) in zip(names, lr_model.coef_):
    print(_name, ": ", _coef)
priors_count :  0.27999230496845784
25 - 45 :  -0.10262755807946655
Greater than 45 :  -1.7189018948181352
Less than 25 :  1.8215294528975976
African-American :  0.6102522379830796
Caucasian :  -0.10156939689442249
Hispanic :  -0.5086828410886558
Female :  0.042935824274662374
Male :  -0.04293582427466229

What stands out to you about these coefficients?

Exercise

See exercise 1 in the exercise list.

Binary Probability Models

Binary probability models are used to model “all or nothing” outcomes, like the occurrence of an event.

Their output is the probability that an event of interest occurs.

With this probability in hand, the researcher chooses an acceptable cutoff (perhaps 0.5) above which the event is predicted to occur.
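
As a small sketch of that last step (the probabilities here are made up), turning predicted probabilities into all-or-nothing predictions is a single comparison against the cutoff:

import numpy as np

predicted_probs = np.array([0.2, 0.45, 0.55, 0.9])  # hypothetical model output
cutoff = 0.5
predicted_event = (predicted_probs > cutoff).astype(int)
print(predicted_event)  # [0 0 1 1]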

Note

Binary probability models can be thought of as a special case of classification.

In classification, we are given a set of features and asked to predict one of a finite number of discrete labels.

We will learn more about classification in an upcoming lecture!

In our example, we will be interested in how the COMPAS scores do at predicting recidivism and how their ability to predict depends on race or sex.

To assist us in evaluating the performance of various models, we will use a new tool called the confusion matrix.

Scikit-learn knows how to compute this and also provides a good description of what is computed.

Let’s see what they have to say.

help(metrics.confusion_matrix)
Help on function confusion_matrix in module sklearn.metrics._classification:

confusion_matrix(y_true, y_pred, *, labels=None, sample_weight=None, normalize=None)
    Compute confusion matrix to evaluate the accuracy of a classification.
    
    By definition a confusion matrix :math:`C` is such that :math:`C_{i, j}`
    is equal to the number of observations known to be in group :math:`i` and
    predicted to be in group :math:`j`.
    
    Thus in binary classification, the count of true negatives is
    :math:`C_{0,0}`, false negatives is :math:`C_{1,0}`, true positives is
    :math:`C_{1,1}` and false positives is :math:`C_{0,1}`.
    
    Read more in the :ref:`User Guide <confusion_matrix>`.
    
    Parameters
    ----------
    y_true : array-like of shape (n_samples,)
        Ground truth (correct) target values.
    
    y_pred : array-like of shape (n_samples,)
        Estimated targets as returned by a classifier.
    
    labels : array-like of shape (n_classes), default=None
        List of labels to index the matrix. This may be used to reorder
        or select a subset of labels.
        If ``None`` is given, those that appear at least once
        in ``y_true`` or ``y_pred`` are used in sorted order.
    
    sample_weight : array-like of shape (n_samples,), default=None
        Sample weights.
    
        .. versionadded:: 0.18
    
    normalize : {'true', 'pred', 'all'}, default=None
        Normalizes confusion matrix over the true (rows), predicted (columns)
        conditions or all the population. If None, confusion matrix will not be
        normalized.
    
    Returns
    -------
    C : ndarray of shape (n_classes, n_classes)
        Confusion matrix whose i-th row and j-th
        column entry indicates the number of
        samples with true label being i-th class
        and predicted label being j-th class.
    
    See Also
    --------
    ConfusionMatrixDisplay.from_estimator : Plot the confusion matrix
        given an estimator, the data, and the label.
    ConfusionMatrixDisplay.from_predictions : Plot the confusion matrix
        given the true and predicted labels.
    ConfusionMatrixDisplay : Confusion Matrix visualization.
    
    References
    ----------
    .. [1] `Wikipedia entry for the Confusion matrix
           <https://en.wikipedia.org/wiki/Confusion_matrix>`_
           (Wikipedia and other references may use a different
           convention for axes).
    
    Examples
    --------
    >>> from sklearn.metrics import confusion_matrix
    >>> y_true = [2, 0, 2, 2, 0, 1]
    >>> y_pred = [0, 0, 2, 2, 0, 2]
    >>> confusion_matrix(y_true, y_pred)
    array([[2, 0, 0],
           [0, 0, 1],
           [1, 0, 2]])
    
    >>> y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
    >>> y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
    >>> confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"])
    array([[2, 0, 0],
           [0, 0, 1],
           [1, 0, 2]])
    
    In the binary case, we can extract true positives, etc as follows:
    
    >>> tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0]).ravel()
    >>> (tn, fp, fn, tp)
    (0, 2, 1, 1)
def report_cm(mod, X_train, X_test, y_train, y_test):
    return dict(
        cm_train=metrics.confusion_matrix(y_train, mod.predict(X_train)),
        cm_test=metrics.confusion_matrix(y_test, mod.predict(X_test))
    )

We will start by fitting a logistic regression that uses only decile_score as a feature and then examine how the confusion matrices differ by race and sex.

from patsy import dmatrices

groups = [
    "overall", "African-American", "Caucasian", "Hispanic", "Female", "Male"
]

# labels follow the flattened confusion matrix order:
# [true negative, false positive, false negative, true positive]
ind = [
    "Portion_of_NoRecid_and_LowRisk", "Portion_of_NoRecid_and_HighRisk",
    "Portion_of_Recid_and_LowRisk", "Portion_of_Recid_and_HighRisk"
]

fmla = "two_year_recid ~ C(decile_score)"
y, X = dmatrices(fmla, df)
X_train, X_test, y_train, y_test, df_train, df_test = model_selection.train_test_split(
    X, y.reshape(-1), df, test_size=0.25, random_state=42
)

decile_mod = linear_model.LogisticRegression(solver="lbfgs").fit(X_train, y_train)

def cm_tables(pred, y, df):
    output = pd.DataFrame(index=ind, columns=groups)
    for group in groups:
        if group in ["African-American", "Caucasian", "Hispanic"]:
            subset = (df.race == group)
        elif group in ["Female", "Male"]:
            subset = (df.sex == group)
        else:
            subset = np.full(y.shape, True)

        y_sub = y[subset]
        pred_sub = pred[subset]

        cm = metrics.confusion_matrix(y_sub, pred_sub)

        # Convert counts into shares of the group
        total = cm.sum()
        vals = np.array(cm/total)
        output.loc[:, group] = vals.reshape(-1)

    def cond_probs(col, axis):
        # reshape the flattened confusion matrix and normalize along axis
        d = int(np.sqrt(len(col)))
        pcm = np.array(col).reshape(d, d)
        pcm = pcm / pcm.sum(axis=axis, keepdims=True)
        return pcm.reshape(-1)

    given_outcome = output.copy()
    given_outcome.index = ["P(LowRisk|NoRecid)", "P(HighRisk|NoRecid)", "P(LowRisk|Recid)", "P(HighRisk|Recid)"]
    given_outcome = given_outcome.apply(lambda c: cond_probs(c, 1))

    given_pred = output.copy()
    given_pred.index = ["P(NoRecid|LowRisk)", "P(NoRecid|HighRisk)", "P(Recid|LowRisk)", "P(Recid|HighRisk)"]
    given_pred = given_pred.apply(lambda c: cond_probs(c, 0))
    return output, given_outcome, given_pred

output, given_outcome, given_pred = cm_tables(decile_mod.predict(X_test),
                                              y_test, df_test)
output
                                  overall  African-American  Caucasian  Hispanic    Female      Male
Portion_of_NoRecid_and_LowRisk   0.361815          0.263270   0.475000  0.522581  0.438040  0.342222
Portion_of_NoRecid_and_HighRisk  0.191514          0.230361   0.136667  0.167742  0.207493  0.187407
Portion_of_Recid_and_LowRisk     0.152033          0.140127   0.168333  0.161290  0.141210  0.154815
Portion_of_Recid_and_HighRisk    0.294638          0.366242   0.220000  0.148387  0.213256  0.315556

output contains the shares of true negatives, false positives, false negatives, and true positives within each group.

What do you see?

The joint probabilities (of prediction and outcome given race or sex) in the above table are a bit hard to interpret.

Conditional probabilities can be easier to think about.

Let’s look at the probability of outcomes given the prediction as well as race or sex.

given_pred
                      overall  African-American  Caucasian  Hispanic    Female      Male
P(NoRecid|LowRisk)   0.704128          0.652632   0.738342  0.764151  0.756219  0.688525
P(NoRecid|HighRisk)  0.393939          0.386121   0.383178  0.530612  0.493151  0.372607
P(Recid|LowRisk)     0.295872          0.347368   0.261658  0.235849  0.243781  0.311475
P(Recid|HighRisk)    0.606061          0.613879   0.616822  0.469388  0.506849  0.627393

As you can see, the distribution of outcomes conditional on predictions does not vary too much with race.

Moreover, if anything, it discriminates in favor of African-Americans.

The algorithm does appear to overpredict recidivism for women compared to men.

This is an important concern.

We will not discuss it too much though because (1) we will see below that when fairness is looked at in another way, women are favored over men, and (2) the company that produces COMPAS also produces a separate questionnaire and risk score designed only for women.

False Positive and Negative Rates

What if we flip this around and look at the distributions of predictions conditional on outcomes?

Why look at these probabilities?

One reason is that, in law, it is traditionally considered far worse to punish the innocent than to let the guilty go free. This idea goes back at least to the 1760s and Blackstone's ratio.

It is better that ten guilty persons escape than that one innocent suffer. - William Blackstone

Blackstone’s ratio says that we should be particularly concerned about P(HighRisk | NoRecid).

This probability is also called the false positive rate.
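
In confusion matrix terms, this is FP / (FP + TN). A small check with made-up labels, using the ravel pattern from the help text above:

import numpy as np
from sklearn import metrics

y_true = np.array([0, 0, 0, 1, 1])  # 0 = NoRecid, 1 = Recid
y_pred = np.array([1, 0, 0, 1, 0])  # 0 = LowRisk, 1 = HighRisk

tn, fp, fn, tp = metrics.confusion_matrix(y_true, y_pred).ravel()
print("P(HighRisk|NoRecid), the false positive rate:", fp / (fp + tn))  # 1/3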

given_outcome
                      overall  African-American  Caucasian  Hispanic    Female      Male
P(LowRisk|NoRecid)   0.653887          0.533333   0.776567  0.757009  0.678571  0.646154
P(HighRisk|NoRecid)  0.346113          0.466667   0.223433  0.242991  0.321429  0.353846
P(LowRisk|Recid)     0.340369          0.276730   0.433476  0.520833  0.398374  0.329134
P(HighRisk|Recid)    0.659631          0.723270   0.566524  0.479167  0.601626  0.670866

Now we see some large disparities by race in the false positive rate (and false negative rate). This is one of the main findings of the Pro Publica article.

In response to Pro Publica, Northpointe, the company that produces COMPAS, argued that COMPAS is not biased because the probabilities of outcomes conditional on predictions (like P(NoRecid|LowRisk)) are approximately equal across races [recidDMB16].

Following [recidKLL+17], we will call a prediction algorithm with this property well-calibrated.

Being well-calibrated is one criterion for fairness of a prediction algorithm.

Pro Publica's critique focuses on a different criterion – that the probability of predicted categories conditional on true outcomes should be equal across groups (i.e. P(HighRisk|NoRecid) should be equal across races).

[recidKLL+17] calls a prediction algorithm with this property balanced.
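
To see that the two criteria are genuinely different, consider a toy example with made-up confusion matrices for two groups: both groups have the same P(Recid|HighRisk), so the prediction is well-calibrated in the sense above, yet their false positive rates differ, so it is not balanced.

import numpy as np

# Made-up counts; rows are true outcomes (NoRecid, Recid),
# columns are predictions (LowRisk, HighRisk)
cm_A = np.array([[50, 50],
                 [10, 90]])
cm_B = np.array([[90, 10],
                 [50, 18]])

for name, cm in [("A", cm_A), ("B", cm_B)]:
    (tn, fp), (fn, tp) = cm
    print(name,
          " P(Recid|HighRisk):", round(tp / (tp + fp), 3),
          " P(HighRisk|NoRecid):", round(fp / (fp + tn), 3))
# Both groups share P(Recid|HighRisk) of about 0.643, but their
# false positive rates are 0.5 and 0.1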

Visualizing Calibration and Balance

We can get a slightly more detailed look at calibration and balance by recognizing that prediction algorithms typically compute a predicted probability, not just a discrete predicted outcome.

The predicted outcome will typically be assigned to the category with the highest predicted probability.
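
For sklearn classifiers, this relationship between predict_proba and predict can be checked directly. A minimal sketch on a made-up binary problem:

import numpy as np
from sklearn import linear_model

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
mod = linear_model.LogisticRegression().fit(X, y)

probs = mod.predict_proba(X)  # one column of probabilities per class
labels = mod.predict(X)       # discrete predicted classes

# predict returns the class with the highest predicted probability
print(np.allclose(labels, probs.argmax(axis=1)))  # True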

We can examine calibration graphically by plotting P(recidivism | predicted probability).

import scipy

def calibration_plot(pred, y, df, bins=20):
    fig, ax = plt.subplots(3, 2, figsize=(12, 6), sharey=True, sharex=True)
    for (g, group) in enumerate(groups):
        if group in ["African-American", "Caucasian", "Hispanic"]:
            subset = (df.race == group)
        elif group in ["Female", "Male"]:
            subset = (df.sex == group)
        else:
            subset = np.full(y.shape, True)
        _ax = ax[np.unravel_index(g, ax.shape)]
        y_sub = y[subset]
        pred_sub = pred[subset]
        # binned averages of observed recidivism and their standard errors
        mu, edges, n = scipy.stats.binned_statistic(pred_sub, y_sub, 'mean', bins=bins)
        se, edges, n = scipy.stats.binned_statistic(
            pred_sub, y_sub, lambda x: np.std(x)/np.sqrt(len(x)), bins=bins
        )
        midpts = (edges[0:-1] + edges[1:]) / 2
        _ax.errorbar(midpts, mu, yerr=1.64*se, fmt='o')
        _ax.set_title(group)
        _ax.set_ylabel("Observed recidivism")
        _ax.set_xlabel("Predicted P(recidivism)")
        x = np.linspace(*_ax.get_xlim())
        _ax.plot(x, x)  # 45 degree line
        _ax.set_xlim(0.0, 1.0)
    fig.tight_layout()
    return fig, ax

calibration_plot(decile_mod.predict_proba(X_test)[:, 1],
                 df_test["two_year_recid"],
                 df_test);
[Figure: calibration plots by race and sex]

This figure is one way to visualize how well-calibrated these predictions are.

The dots are binned averages of observed recidivism, conditional on predicted recidivism being in some range.

The error bars represent a 90% confidence interval (the code uses ±1.64 standard errors).

A perfectly calibrated prediction would have these dots all lie along the 45 degree line.

For dots below the 45 degree line, the algorithm is overpredicting recidivism.

Exercise

See exercise 2 in the exercise list.

The algorithm appears fairly well-calibrated.

It does not seem to be making systematic errors in one direction based on any particular race – but it does appear to systematically overpredict recidivism for females compared to males.

Now, let’s create a figure to examine balance.

Balance is about the distribution of predictions conditional on outcomes, so we will plot histograms of predicted probabilities conditional on realized outcomes.

import seaborn as sns

def balance_hist_plot(pred, y, df, bins=20):
    fig, ax = plt.subplots(3, 2, figsize=(12, 6), sharey=True, sharex=True)
    for (g, group) in enumerate(groups):
        if group in ["African-American", "Caucasian", "Hispanic"]:
            subset = (df.race == group)
        elif group in ["Female", "Male"]:
            subset = (df.sex == group)
        else:
            subset = np.full(y.shape, True)
        _ax = ax[np.unravel_index(g, ax.shape)]
        y_sub = y[subset]
        pred_sub = pred[subset]
        # sns.distplot is deprecated; histplot is its replacement for
        # plain histograms
        sns.histplot(pred_sub[y_sub == 0], bins=bins, stat="density", ax=_ax,
                     label="No recidivate")
        sns.histplot(pred_sub[y_sub == 1], bins=bins, stat="density", ax=_ax,
                     label="Yes recidivate", color="C1")
        _ax.set_xlabel("Predicted Probability")
        _ax.set_title(group)

    plt.legend()
    fig.tight_layout()
    return fig, ax

balance_hist_plot(decile_mod.predict_proba(X_test)[:, 1],
                  df_test["two_year_recid"],
                  df_test);
[Figure: histograms of predicted probabilities by outcome, race, and sex]

This figure is somewhat useful, but it does not depict balance especially clearly, so let's try something else.

To get false positive and false negative rates, we must assign the predicted probabilities to outcomes.

The most common choice would be to predict recidivism if the predicted probability is greater than 0.5.

However, if we want to adjust the false positive and false negative rates, we might want to choose some other threshold and predict recidivism if the predicted probability exceeds this threshold.

Different thresholds will lead to different false negative and false positive rates, so let’s plot these rates as functions of the threshold.

def balance_threshold_plot(pred, y, df, bins=20):
    fig, ax = plt.subplots(2, 2, figsize=(12, 6), sharey=True, sharex=True)
    x = np.linspace(min(pred), max(pred), bins)
    # get colors defined by theme
    colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]
    for (g, group) in enumerate(groups):
        if group in ["African-American", "Caucasian", "Hispanic"]:
            subset = (df.race == group)
            r = 0
        elif group in ["Female", "Male"]:
            subset = (df.sex == group)
            r = 1
        else:
            continue
        y_sub = y[subset]
        pred_sub = pred[subset]

        # false negative and false positive rates at each threshold,
        # with standard errors for the shaded bands
        fn = np.array([np.mean(pred_sub[y_sub == 1] < xi) for xi in x])
        c1 = sum(y_sub == 1)
        sen = np.sqrt(fn*(1 - fn)/c1)
        fp = np.array([np.mean(pred_sub[y_sub == 0] > xi) for xi in x])
        c0 = sum(y_sub == 0)
        sep = np.sqrt(fp*(1 - fp)/c0)

        _ax = ax[r, 0]
        _ax.plot(x, fn, color=colors[g])
        _ax.fill_between(x, fn - 1.64*sen, fn + 1.64*sen, alpha=0.25, color=colors[g])
        _ax.annotate(group, (x[bins//7*g], fn[bins//7*g]), color=colors[g])
        _ax.set_ylabel("False +/- Rate")
        _ax.set_xlabel("Threshold")
        _ax.set_title("False Negative Rate")

        _ax = ax[r, 1]
        _ax.plot(x, fp, color=colors[g])
        _ax.fill_between(x, fp - 1.64*sep, fp + 1.64*sep, alpha=0.25, color=colors[g])
        _ax.set_xlabel("Threshold")
        _ax.set_title("False Positive Rate")

    fig.tight_layout()
    return fig, ax

balance_threshold_plot(decile_mod.predict_proba(X_test)[:, 1],
                       df_test["two_year_recid"],
                       df_test);
[Figure: false negative and false positive rates as functions of the threshold, by group]

From this, we can more easily see the balance problem — regardless of which threshold we choose, African-Americans will have a higher false positive rate than Caucasians.

We have seen that COMPAS scores are well-calibrated conditional on race, but not balanced.

Can we create an alternative prediction that is both well-calibrated and balanced?

Creating an Alternative Prediction

As a starting exercise, let’s predict recidivism using the variables in this dataset other than race and COMPAS score.

Almost all variables in this data are categorical.

Any function of categorical variables can be represented as a linear function of indicator variables and their interactions.
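
A small patsy illustration of this point, with made-up categorical data: interacting two categorical variables expands into an intercept, indicator columns, and their products.

import pandas as pd
from patsy import dmatrix

toy = pd.DataFrame({
    "a": ["x", "x", "y", "y"],
    "b": ["u", "v", "u", "v"],
})

# C(a)*C(b) expands to indicators for a and b plus their interaction
X = dmatrix("C(a) * C(b)", toy, return_type="dataframe")
print(X.columns.tolist())
# ['Intercept', 'C(a)[T.y]', 'C(b)[T.v]', 'C(a)[T.y]:C(b)[T.v]']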

Given that linearity in indicators does not impose any substantive restriction here, a penalized linear model like lasso seems like a good choice for prediction.

To keep the computation time reasonable, we do not include all interaction and indicator terms here.

To ensure that predicted probabilities are between 0 and 1, we fit a logistic regression with an \(\ell_1\) penalty.

from sklearn import model_selection, linear_model
from patsy import dmatrices

# charge_desc has many values with only one observation. We combine
# these rare descriptions into a single "other" category. This could
# be improved upon by looking at the text of the descriptions and
# combining similar ones.
df.c_charge_desc = df.c_charge_desc.fillna("")
df["charge_cat"] = df.c_charge_desc
cnt = df.c_charge_desc.value_counts()[df.c_charge_desc]
cnt.index = df.index
df.loc[cnt < 10, "charge_cat"] = "other"
df.charge_cat = df.charge_cat.astype('category')
df.sex = df.sex.astype('category')

fmla = "two_year_recid ~ sex*(age + juv_fel_count + juv_misd_count + juv_other_count + C(priors_count) + c_charge_degree + charge_cat)"

y, X = dmatrices(fmla, df)
print("There are {} features".format(X.shape[1]))
X_train, X_test, y_train, y_test, df_train, df_test = model_selection.train_test_split(
    X, pd.Series(y.reshape(-1), index=df.index), df, test_size=0.25, random_state=42
)

lasso_mod = linear_model.LogisticRegressionCV(
    cv=5, verbose=False,  # liblinear's iteration log is long and uninformative
    Cs=10, penalty='l1',
    max_iter=100,
    scoring="neg_log_loss",
    solver="liblinear"
).fit(X_train, y_train)
There are 260 features
iter  35  #CD cycles 13
iter  36  #CD cycles 97
iter  37  #CD cycles 34
iter  38  #CD cycles 74
iter  39  #CD cycles 1
WARNING: reaching max number of inner iterations
iter  40  #CD cycles 1000
=========================
optimization finished, #iter = 40
Objective value = 3038641.736251
#nonzeros/#features = 249/261
[LibLinear]iter   1  #CD cycles 1
iter   2  #CD cycles 4
iter   3  #CD cycles 1
iter   4  #CD cycles 7
iter   5  #CD cycles 2
iter   6  #CD cycles 1
iter   7  #CD cycles 7
iter   8  #CD cycles 2
iter   9  #CD cycles 1
iter  10  #CD cycles 10
iter  11  #CD cycles 12
iter  12  #CD cycles 16
iter  13  #CD cycles 17
iter  14  #CD cycles 1
iter  15  #CD cycles 94
iter  16  #CD cycles 23
iter  17  #CD cycles 15
iter  18  #CD cycles 10
iter  19  #CD cycles 6
iter  20  #CD cycles 33
iter  21  #CD cycles 2
iter  22  #CD cycles 4
iter  23  #CD cycles 1
iter  24  #CD cycles 112
iter  25  #CD cycles 66
iter  26  #CD cycles 26
iter  27  #CD cycles 25
iter  28  #CD cycles 4
iter  29  #CD cycles 4
iter  30  #CD cycles 1
WARNING: reaching max number of inner iterations
iter  31  #CD cycles 1000
WARNING: reaching max number of inner iterations
iter  32  #CD cycles 1000
iter  33  #CD cycles 347
iter  34  #CD cycles 94
iter  35  #CD cycles 24
iter  36  #CD cycles 127
iter  37  #CD cycles 102
iter  38  #CD cycles 5
iter  39  #CD cycles 129
iter  40  #CD cycles 13
iter  41  #CD cycles 18
iter  42  #CD cycles 20
iter  43  #CD cycles 64
iter  44  #CD cycles 3
iter  45  #CD cycles 2
iter  46  #CD cycles 13
iter  47  #CD cycles 40
iter  48  #CD cycles 6
iter  49  #CD cycles 7
iter  50  #CD cycles 36
iter  51  #CD cycles 29
iter  52  #CD cycles 31
iter  53  #CD cycles 10
iter  54  #CD cycles 28
iter  55  #CD cycles 3
iter  56  #CD cycles 5
iter  57  #CD cycles 3
iter  58  #CD cycles 27
=========================
optimization finished, #iter = 58
Objective value = 23521809.207024
#nonzeros/#features = 252/261
[LibLinear]iter   1  #CD cycles 1
=========================
optimization finished, #iter = 1
Objective value = 0.282249
#nonzeros/#features = 1/261
[LibLinear]iter   1  #CD cycles 1
iter   2  #CD cycles 3
iter   3  #CD cycles 1
iter   4  #CD cycles 1
iter   5  #CD cycles 1
iter   6  #CD cycles 1
iter   7  #CD cycles 1
iter   8  #CD cycles 6
iter   9  #CD cycles 3
iter  10  #CD cycles 1
iter  11  #CD cycles 6
=========================
optimization finished, #iter = 11
Objective value = 2.163670
#nonzeros/#features = 2/261
[LibLinear]iter   1  #CD cycles 1
iter   2  #CD cycles 5
iter   3  #CD cycles 1
iter   4  #CD cycles 4
iter   5  #CD cycles 1
iter   6  #CD cycles 10
iter   7  #CD cycles 1
iter   8  #CD cycles 5
iter   9  #CD cycles 1
iter  10  #CD cycles 6
iter  11  #CD cycles 1
iter  12  #CD cycles 9
iter  13  #CD cycles 2
iter  14  #CD cycles 1
iter  15  #CD cycles 8
=========================
optimization finished, #iter = 15
Objective value = 16.644976
#nonzeros/#features = 2/261
[LibLinear]iter   1  #CD cycles 1
iter   2  #CD cycles 1
iter   3  #CD cycles 12
iter   4  #CD cycles 1
iter   5  #CD cycles 11
iter   6  #CD cycles 1
iter   7  #CD cycles 5
iter   8  #CD cycles 3
iter   9  #CD cycles 1
iter  10  #CD cycles 4
iter  11  #CD cycles 11
iter  12  #CD cycles 3
iter  13  #CD cycles 3
iter  14  #CD cycles 1
iter  15  #CD cycles 14
iter  16  #CD cycles 8
iter  17  #CD cycles 92
iter  18  #CD cycles 5
iter  19  #CD cycles 28
iter  20  #CD cycles 7
iter  21  #CD cycles 26
iter  22  #CD cycles 7
iter  23  #CD cycles 4
iter  24  #CD cycles 5
iter  25  #CD cycles 17
iter  26  #CD cycles 2
iter  27  #CD cycles 2
iter  28  #CD cycles 13
iter  29  #CD cycles 2
iter  30  #CD cycles 1
iter  31  #CD cycles 59
iter  32  #CD cycles 38
iter  33  #CD cycles 25
iter  34  #CD cycles 8
iter  35  #CD cycles 17
iter  36  #CD cycles 1
iter  37  #CD cycles 140
=========================
optimization finished, #iter = 37
Objective value = 125.655760
#nonzeros/#features = 15/261
[LibLinear]iter   1  #CD cycles 1
iter   2  #CD cycles 3
iter   3  #CD cycles 1
iter   4  #CD cycles 8
iter   5  #CD cycles 4
iter   6  #CD cycles 1
iter   7  #CD cycles 14
iter   8  #CD cycles 1
iter   9  #CD cycles 1
iter  10  #CD cycles 201
iter  11  #CD cycles 1
iter  12  #CD cycles 111
iter  13  #CD cycles 1
iter  14  #CD cycles 17
iter  15  #CD cycles 20
iter  16  #CD cycles 1
iter  17  #CD cycles 47
=========================
optimization finished, #iter = 17
Objective value = 921.272524
#nonzeros/#features = 71/261
[LibLinear]iter   1  #CD cycles 1
iter   2  #CD cycles 3
iter   3  #CD cycles 1
iter   4  #CD cycles 12
iter   5  #CD cycles 1
iter   6  #CD cycles 12
iter   7  #CD cycles 1
iter   8  #CD cycles 36
iter   9  #CD cycles 1
iter  10  #CD cycles 29
iter  11  #CD cycles 1
iter  12  #CD cycles 98
iter  13  #CD cycles 2
iter  14  #CD cycles 3
iter  15  #CD cycles 217
iter  16  #CD cycles 3
iter  17  #CD cycles 4
iter  18  #CD cycles 8
iter  19  #CD cycles 3
iter  20  #CD cycles 1
iter  21  #CD cycles 313
iter  22  #CD cycles 1
iter  23  #CD cycles 37
iter  24  #CD cycles 32
iter  25  #CD cycles 85
=========================
optimization finished, #iter = 25
Objective value = 6693.204425
#nonzeros/#features = 188/261
[LibLinear]iter   1  #CD cycles 1
iter   2  #CD cycles 4
iter   3  #CD cycles 1
iter   4  #CD cycles 9
iter   5  #CD cycles 1
iter   6  #CD cycles 9
iter   7  #CD cycles 3
iter   8  #CD cycles 1
iter   9  #CD cycles 12
iter  10  #CD cycles 1
iter  11  #CD cycles 92
iter  12  #CD cycles 13
iter  13  #CD cycles 7
iter  14  #CD cycles 1
iter  15  #CD cycles 130
iter  16  #CD cycles 5
iter  17  #CD cycles 68
iter  18  #CD cycles 1
iter  19  #CD cycles 379
iter  20  #CD cycles 2
iter  21  #CD cycles 21
iter  22  #CD cycles 43
iter  23  #CD cycles 2
iter  24  #CD cycles 8
iter  25  #CD cycles 13
iter  26  #CD cycles 22
iter  27  #CD cycles 10
iter  28  #CD cycles 2
iter  29  #CD cycles 4
iter  30  #CD cycles 1
iter  31  #CD cycles 749
=========================
optimization finished, #iter = 31
Objective value = 50375.114878
#nonzeros/#features = 246/261
[LibLinear]iter   1  #CD cycles 1
iter   2  #CD cycles 2
iter   3  #CD cycles 2
iter   4  #CD cycles 1
iter   5  #CD cycles 7
iter   6  #CD cycles 1
iter   7  #CD cycles 10
iter   8  #CD cycles 1
iter   9  #CD cycles 11
iter  10  #CD cycles 2
iter  11  #CD cycles 2
iter  12  #CD cycles 3
iter  13  #CD cycles 10
iter  14  #CD cycles 5
iter  15  #CD cycles 3
iter  16  #CD cycles 1
iter  17  #CD cycles 4
iter  18  #CD cycles 44
iter  19  #CD cycles 1
iter  20  #CD cycles 469
iter  21  #CD cycles 1
WARNING: reaching max number of inner iterations
iter  22  #CD cycles 1000
iter  23  #CD cycles 866
iter  24  #CD cycles 210
iter  25  #CD cycles 88
iter  26  #CD cycles 1
WARNING: reaching max number of inner iterations
iter  27  #CD cycles 1000
iter  28  #CD cycles 898
=========================
optimization finished, #iter = 28
Objective value = 387051.570217
#nonzeros/#features = 253/261
[LibLinear]iter   1  #CD cycles 1
iter   2  #CD cycles 2
iter   3  #CD cycles 2
iter   4  #CD cycles 1
iter   5  #CD cycles 7
iter   6  #CD cycles 1
iter   7  #CD cycles 11
iter   8  #CD cycles 1
iter   9  #CD cycles 12
iter  10  #CD cycles 5
iter  11  #CD cycles 6
iter  12  #CD cycles 1
iter  13  #CD cycles 45
iter  14  #CD cycles 12
iter  15  #CD cycles 21
iter  16  #CD cycles 10
iter  17  #CD cycles 1
iter  18  #CD cycles 431
iter  19  #CD cycles 78
iter  20  #CD cycles 68
iter  21  #CD cycles 2
iter  22  #CD cycles 26
iter  23  #CD cycles 154
iter  24  #CD cycles 12
iter  25  #CD cycles 7
iter  26  #CD cycles 10
iter  27  #CD cycles 3
iter  28  #CD cycles 12
iter  29  #CD cycles 8
iter  30  #CD cycles 8
iter  31  #CD cycles 1
WARNING: reaching max number of inner iterations
iter  32  #CD cycles 1000
iter  33  #CD cycles 998
iter  34  #CD cycles 86
iter  35  #CD cycles 13
iter  36  #CD cycles 10
iter  37  #CD cycles 171
=========================
optimization finished, #iter = 37
Objective value = 2992215.810882
#nonzeros/#features = 254/261
[LibLinear]iter   1  #CD cycles 1
iter   2  #CD cycles 3
iter   3  #CD cycles 1
iter   4  #CD cycles 7
iter   5  #CD cycles 2
iter   6  #CD cycles 1
iter   7  #CD cycles 11
iter   8  #CD cycles 2
iter   9  #CD cycles 1
iter  10  #CD cycles 14
iter  11  #CD cycles 5
iter  12  #CD cycles 1
iter  13  #CD cycles 49
iter  14  #CD cycles 37
iter  15  #CD cycles 1
iter  16  #CD cycles 348
iter  17  #CD cycles 4
iter  18  #CD cycles 3
iter  19  #CD cycles 23
iter  20  #CD cycles 76
iter  21  #CD cycles 74
iter  22  #CD cycles 6
iter  23  #CD cycles 8
iter  24  #CD cycles 66
iter  25  #CD cycles 1
WARNING: reaching max number of inner iterations
iter  26  #CD cycles 1000
iter  27  #CD cycles 677
iter  28  #CD cycles 99
iter  29  #CD cycles 99
iter  30  #CD cycles 2
iter  31  #CD cycles 7
iter  32  #CD cycles 4
iter  33  #CD cycles 254
iter  34  #CD cycles 22
iter  35  #CD cycles 103
iter  36  #CD cycles 32
iter  37  #CD cycles 9
iter  38  #CD cycles 138
iter  39  #CD cycles 26
iter  40  #CD cycles 20
iter  41  #CD cycles 7
iter  42  #CD cycles 4
iter  43  #CD cycles 35
iter  44  #CD cycles 6
iter  45  #CD cycles 8
iter  46  #CD cycles 2
iter  47  #CD cycles 9
iter  48  #CD cycles 3
iter  49  #CD cycles 16
iter  50  #CD cycles 3
iter  51  #CD cycles 8
iter  52  #CD cycles 62
iter  53  #CD cycles 10
iter  54  #CD cycles 5
iter  55  #CD cycles 7
iter  56  #CD cycles 20
iter  57  #CD cycles 1
WARNING: reaching max number of inner iterations
iter  58  #CD cycles 1000
iter  59  #CD cycles 848
=========================
optimization finished, #iter = 59
Objective value = 23161327.040475
#nonzeros/#features = 253/261
[LibLinear]iter   1  #CD cycles 1
=========================
optimization finished, #iter = 1
Objective value = 0.282249
#nonzeros/#features = 1/261
[LibLinear]iter   1  #CD cycles 1
iter   2  #CD cycles 4
iter   3  #CD cycles 1
=========================
optimization finished, #iter = 3
Objective value = 2.163172
#nonzeros/#features = 1/261
[LibLinear]iter   1  #CD cycles 1
iter   2  #CD cycles 1
iter   3  #CD cycles 12
iter   4  #CD cycles 1
iter   5  #CD cycles 3
iter   6  #CD cycles 1
iter   7  #CD cycles 10
iter   8  #CD cycles 4
iter   9  #CD cycles 1
iter  10  #CD cycles 4
iter  11  #CD cycles 1
iter  12  #CD cycles 10
iter  13  #CD cycles 1
iter  14  #CD cycles 11
=========================
optimization finished, #iter = 14
Objective value = 16.656407
#nonzeros/#features = 2/261
[LibLinear]iter   1  #CD cycles 1
iter   2  #CD cycles 1
iter   3  #CD cycles 11
iter   4  #CD cycles 1
iter   5  #CD cycles 9
iter   6  #CD cycles 1
iter   7  #CD cycles 7
iter   8  #CD cycles 6
iter   9  #CD cycles 1
iter  10  #CD cycles 41
iter  11  #CD cycles 70
iter  12  #CD cycles 4
iter  13  #CD cycles 16
iter  14  #CD cycles 13
iter  15  #CD cycles 1
iter  16  #CD cycles 9
iter  17  #CD cycles 29
iter  18  #CD cycles 1
iter  19  #CD cycles 56
iter  20  #CD cycles 37
iter  21  #CD cycles 23
iter  22  #CD cycles 1
iter  23  #CD cycles 151
iter  24  #CD cycles 1
iter  25  #CD cycles 248
=========================
optimization finished, #iter = 25
Objective value = 125.373820
#nonzeros/#features = 14/261
[LibLinear]iter   1  #CD cycles 1
iter   2  #CD cycles 3
iter   3  #CD cycles 1
iter   4  #CD cycles 9
iter   5  #CD cycles 3
iter   6  #CD cycles 1
iter   7  #CD cycles 15
iter   8  #CD cycles 1
iter   9  #CD cycles 63
iter  10  #CD cycles 1
iter  11  #CD cycles 99
iter  12  #CD cycles 66
iter  13  #CD cycles 3
iter  14  #CD cycles 18
iter  15  #CD cycles 8
iter  16  #CD cycles 1
iter  17  #CD cycles 178
iter  18  #CD cycles 22
iter  19  #CD cycles 8
iter  20  #CD cycles 5
iter  21  #CD cycles 3
iter  22  #CD cycles 16
iter  23  #CD cycles 6
iter  24  #CD cycles 2
iter  25  #CD cycles 7
iter  26  #CD cycles 2
iter  27  #CD cycles 1
iter  28  #CD cycles 42
iter  29  #CD cycles 2
iter  30  #CD cycles 12
iter  31  #CD cycles 48
iter  32  #CD cycles 45
iter  33  #CD cycles 7
iter  34  #CD cycles 19
iter  35  #CD cycles 5
iter  36  #CD cycles 13
iter  37  #CD cycles 1
iter  38  #CD cycles 107
=========================
optimization finished, #iter = 38
Objective value = 913.979522
#nonzeros/#features = 70/261
[LibLinear]iter   1  #CD cycles 1
iter   2  #CD cycles 2
iter   3  #CD cycles 1
iter   4  #CD cycles 10
iter   5  #CD cycles 1
iter   6  #CD cycles 22
iter   7  #CD cycles 2
iter   8  #CD cycles 1
iter   9  #CD cycles 109
iter  10  #CD cycles 5
iter  11  #CD cycles 2
iter  12  #CD cycles 5
iter  13  #CD cycles 15
iter  14  #CD cycles 10
iter  15  #CD cycles 1
iter  16  #CD cycles 138
iter  17  #CD cycles 2
iter  18  #CD cycles 13
iter  19  #CD cycles 20
iter  20  #CD cycles 4
iter  21  #CD cycles 2
iter  22  #CD cycles 5
iter  23  #CD cycles 2
iter  24  #CD cycles 2
iter  25  #CD cycles 1
iter  26  #CD cycles 368
iter  27  #CD cycles 1
iter  28  #CD cycles 115
iter  29  #CD cycles 54
iter  30  #CD cycles 28
iter  31  #CD cycles 6
iter  32  #CD cycles 19
iter  33  #CD cycles 30
iter  34  #CD cycles 15
iter  35  #CD cycles 9
iter  36  #CD cycles 13
iter  37  #CD cycles 36
iter  38  #CD cycles 15
iter  39  #CD cycles 6
iter  40  #CD cycles 2
iter  41  #CD cycles 2
iter  42  #CD cycles 2
iter  43  #CD cycles 3
iter  44  #CD cycles 3
iter  45  #CD cycles 1
iter  46  #CD cycles 121
=========================
optimization finished, #iter = 46
Objective value = 6632.705149
#nonzeros/#features = 188/261
[LibLinear]iter   1  #CD cycles 1
iter   2  #CD cycles 2
iter   3  #CD cycles 2
iter   4  #CD cycles 1
iter   5  #CD cycles 6
iter   6  #CD cycles 2
iter   7  #CD cycles 1
iter   8  #CD cycles 13
iter   9  #CD cycles 2
iter  10  #CD cycles 2
iter  11  #CD cycles 1
iter  12  #CD cycles 122
iter  13  #CD cycles 9
iter  14  #CD cycles 11
iter  15  #CD cycles 1
iter  16  #CD cycles 297
iter  17  #CD cycles 19
iter  18  #CD cycles 4
iter  19  #CD cycles 9
iter  20  #CD cycles 2
iter  21  #CD cycles 4
iter  22  #CD cycles 3
iter  23  #CD cycles 1
iter  24  #CD cycles 156
iter  25  #CD cycles 284
iter  26  #CD cycles 3
iter  27  #CD cycles 1
iter  28  #CD cycles 81
iter  29  #CD cycles 99
iter  30  #CD cycles 118
iter  31  #CD cycles 24
iter  32  #CD cycles 1
iter  33  #CD cycles 996
=========================
optimization finished, #iter = 33
Objective value = 49991.024312
#nonzeros/#features = 244/261
[LibLinear]iter   1  #CD cycles 1
iter   2  #CD cycles 3
iter   3  #CD cycles 1
iter   4  #CD cycles 8
iter   5  #CD cycles 1
iter   6  #CD cycles 13
iter   7  #CD cycles 1
iter   8  #CD cycles 76
iter   9  #CD cycles 20
iter  10  #CD cycles 24
iter  11  #CD cycles 11
iter  12  #CD cycles 1
iter  13  #CD cycles 204
iter  14  #CD cycles 50
iter  15  #CD cycles 22
iter  16  #CD cycles 26
iter  17  #CD cycles 2
iter  18  #CD cycles 22
iter  19  #CD cycles 3
iter  20  #CD cycles 7
iter  21  #CD cycles 7
iter  22  #CD cycles 12
iter  23  #CD cycles 2
iter  24  #CD cycles 2
iter  25  #CD cycles 3
iter  26  #CD cycles 3
iter  27  #CD cycles 3
iter  28  #CD cycles 1
iter  29  #CD cycles 373
iter  30  #CD cycles 57
iter  31  #CD cycles 4
iter  32  #CD cycles 2
iter  33  #CD cycles 2
iter  34  #CD cycles 8
iter  35  #CD cycles 73
iter  36  #CD cycles 2
iter  37  #CD cycles 8
iter  38  #CD cycles 5
iter  39  #CD cycles 11
iter  40  #CD cycles 3
iter  41  #CD cycles 10
iter  42  #CD cycles 2
iter  43  #CD cycles 2
iter  44  #CD cycles 12
iter  45  #CD cycles 4
iter  46  #CD cycles 1
iter  47  #CD cycles 577
iter  48  #CD cycles 160
iter  49  #CD cycles 432
iter  50  #CD cycles 44
iter  51  #CD cycles 139
iter  52  #CD cycles 53
iter  53  #CD cycles 10
iter  54  #CD cycles 5
iter  55  #CD cycles 17
iter  56  #CD cycles 45
iter  57  #CD cycles 47
iter  58  #CD cycles 103
iter  59  #CD cycles 12
iter  60  #CD cycles 132
iter  61  #CD cycles 2
iter  62  #CD cycles 13
iter  63  #CD cycles 82
iter  64  #CD cycles 2
=========================
optimization finished, #iter = 64
Objective value = 384339.748682
#nonzeros/#features = 255/261
[LibLinear]iter   1  #CD cycles 1
iter   2  #CD cycles 3
iter   3  #CD cycles 1
iter   4  #CD cycles 7
iter   5  #CD cycles 2
iter   6  #CD cycles 1
iter   7  #CD cycles 14
iter   8  #CD cycles 2
iter   9  #CD cycles 2
iter  10  #CD cycles 1
iter  11  #CD cycles 65
iter  12  #CD cycles 5
iter  13  #CD cycles 3
iter  14  #CD cycles 32
iter  15  #CD cycles 13
iter  16  #CD cycles 3
iter  17  #CD cycles 6
iter  18  #CD cycles 15
iter  19  #CD cycles 4
iter  20  #CD cycles 7
iter  21  #CD cycles 2
iter  22  #CD cycles 4
iter  23  #CD cycles 1
iter  24  #CD cycles 94
iter  25  #CD cycles 38
iter  26  #CD cycles 17
iter  27  #CD cycles 14
iter  28  #CD cycles 12
iter  29  #CD cycles 3
iter  30  #CD cycles 2
iter  31  #CD cycles 5
iter  32  #CD cycles 7
iter  33  #CD cycles 3
iter  34  #CD cycles 2
iter  35  #CD cycles 3
iter  36  #CD cycles 18
iter  37  #CD cycles 4
iter  38  #CD cycles 6
iter  39  #CD cycles 10
iter  40  #CD cycles 4
iter  41  #CD cycles 6
iter  42  #CD cycles 2
iter  43  #CD cycles 1
iter  44  #CD cycles 279
iter  45  #CD cycles 15
iter  46  #CD cycles 7
iter  47  #CD cycles 86
iter  48  #CD cycles 21
iter  49  #CD cycles 133
iter  50  #CD cycles 27
iter  51  #CD cycles 11
iter  52  #CD cycles 2
iter  53  #CD cycles 21
iter  54  #CD cycles 1
WARNING: reaching max number of inner iterations
iter  55  #CD cycles 1000
iter  56  #CD cycles 627
iter  57  #CD cycles 14
iter  58  #CD cycles 49
iter  59  #CD cycles 5
iter  60  #CD cycles 10
iter  61  #CD cycles 73
iter  62  #CD cycles 27
iter  63  #CD cycles 14
iter  64  #CD cycles 4
iter  65  #CD cycles 135
iter  66  #CD cycles 153
iter  67  #CD cycles 2
iter  68  #CD cycles 2
iter  69  #CD cycles 9
iter  70  #CD cycles 5
iter  71  #CD cycles 1
WARNING: reaching max number of inner iterations
iter  72  #CD cycles 1000
=========================
optimization finished, #iter = 72
Objective value = 2971749.158650
#nonzeros/#features = 256/261
[LibLinear]iter   1  #CD cycles 1
iter   2  #CD cycles 4
iter   3  #CD cycles 1
iter   4  #CD cycles 3
iter   5  #CD cycles 2
iter   6  #CD cycles 3
iter   7  #CD cycles 1
iter   8  #CD cycles 11
iter   9  #CD cycles 1
iter  10  #CD cycles 88
iter  11  #CD cycles 21
iter  12  #CD cycles 3
iter  13  #CD cycles 22
iter  14  #CD cycles 13
iter  15  #CD cycles 4
iter  16  #CD cycles 1
iter  17  #CD cycles 113
iter  18  #CD cycles 22
iter  19  #CD cycles 30
iter  20  #CD cycles 29
iter  21  #CD cycles 17
iter  22  #CD cycles 5
iter  23  #CD cycles 17
iter  24  #CD cycles 9
iter  25  #CD cycles 10
iter  26  #CD cycles 11
iter  27  #CD cycles 7
iter  28  #CD cycles 15
iter  29  #CD cycles 3
iter  30  #CD cycles 5
iter  31  #CD cycles 2
iter  32  #CD cycles 5
iter  33  #CD cycles 7
iter  34  #CD cycles 1
iter  35  #CD cycles 44
iter  36  #CD cycles 173
iter  37  #CD cycles 17
iter  38  #CD cycles 32
iter  39  #CD cycles 21
iter  40  #CD cycles 55
iter  41  #CD cycles 5
iter  42  #CD cycles 21
iter  43  #CD cycles 13
iter  44  #CD cycles 2
iter  45  #CD cycles 30
iter  46  #CD cycles 8
iter  47  #CD cycles 52
iter  48  #CD cycles 15
iter  49  #CD cycles 3
iter  50  #CD cycles 1
WARNING: reaching max number of inner iterations
iter  51  #CD cycles 1000
iter  52  #CD cycles 498
iter  53  #CD cycles 132
iter  54  #CD cycles 10
iter  55  #CD cycles 11
iter  56  #CD cycles 38
iter  57  #CD cycles 16
iter  58  #CD cycles 87
iter  59  #CD cycles 5
iter  60  #CD cycles 15
iter  61  #CD cycles 4
=========================
optimization finished, #iter = 61
Objective value = 23004524.715121
#nonzeros/#features = 256/261
[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:    7.3s finished

Let’s look at the regularization parameter chosen and the non-zero coefficients.

# plots illustrating regularization parameter choice
scores = lasso_mod.scores_[1.0].mean(axis=0)
logpenalties = np.log(lasso_mod.Cs_)
nnonzero = (np.abs(lasso_mod.coefs_paths_[1.0]) > 1e-6).sum(axis=2).mean(axis=0)
colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]
fig, ax1 = plt.subplots()
ax1.plot(logpenalties, scores, color=colors[0])
ax1.set_ylabel("CV log likelihood", color=colors[0])
ax1.set_xlabel("-log(penalty)")
ax1.tick_params('y', colors=colors[0])
ax2 = ax1.twinx()
ax2.plot(logpenalties, nnonzero, color=colors[1])
ax2.set_ylabel("nonzero coefficients", color=colors[1])
ax2.tick_params('y', colors=colors[1])
ax2.grid(visible=False);  # the `b=` argument was renamed `visible=` in Matplotlib 3.5
[Figure: CV log likelihood (left axis) and number of nonzero coefficients (right axis) versus -log(penalty)]

Let’s also look at the nonzero coefficients. We should be careful about interpreting these, since relatively strong assumptions are needed for lasso to produce consistent coefficient estimates.

Note

Lasso gives accurate predictions under weaker assumptions than needed for consistent coefficient estimates.

# table of nonzero coefficients
coef = pd.DataFrame(index=X.design_info.column_names, columns=["Value"])
coef.Value = np.transpose(lasso_mod.coef_)
# count and display coefficients with magnitude above a small tolerance
print(sum(np.abs(coef.Value) > 1.0e-8))
with pd.option_context('display.max_rows', None):
    display(coef[np.abs(coef.Value) > 1.0e-8])
82
Value
Intercept 0.034562
sex[T.Male] 0.724336
C(priors_count)[T.2] 0.325187
C(priors_count)[T.3] 0.612248
C(priors_count)[T.4] 0.749704
C(priors_count)[T.5] 0.732791
C(priors_count)[T.6] 0.792307
C(priors_count)[T.7] 1.032687
C(priors_count)[T.8] 1.322572
C(priors_count)[T.9] 1.286944
C(priors_count)[T.10] 1.318460
C(priors_count)[T.11] 1.236914
C(priors_count)[T.12] 0.985709
C(priors_count)[T.13] 1.528307
C(priors_count)[T.14] 0.406049
C(priors_count)[T.15] 0.781426
C(priors_count)[T.16] 1.231386
C(priors_count)[T.17] 0.371057
C(priors_count)[T.18] 1.524271
C(priors_count)[T.19] 1.143294
C(priors_count)[T.20] 0.116828
C(priors_count)[T.21] 1.298536
C(priors_count)[T.22] 1.676826
C(priors_count)[T.23] 1.332186
C(priors_count)[T.25] 0.481942
C(priors_count)[T.26] 0.526392
c_charge_degree[T.M] -0.228168
charge_cat[T.Aggrav Battery w/Deadly Weapon] -0.152874
charge_cat[T.Aggravated Assault W/dead Weap] -0.457750
charge_cat[T.Aggravated Battery] -0.426220
charge_cat[T.Battery] -0.008872
charge_cat[T.Battery on Law Enforc Officer] 0.312835
charge_cat[T.Burglary Unoccupied Dwelling] -0.151922
charge_cat[T.Criminal Mischief] 0.441666
charge_cat[T.DUI Level 0.15 Or Minor In Veh] -0.043785
charge_cat[T.DUI Property Damage/Injury] -0.957441
charge_cat[T.Driving License Suspended] -0.062539
charge_cat[T.Driving Under The Influence] -0.141177
charge_cat[T.Felony Driving While Lic Suspd] 0.182810
charge_cat[T.Felony Petit Theft] 0.697405
charge_cat[T.Grand Theft (Motor Vehicle)] 0.134825
charge_cat[T.Operating W/O Valid License] -0.084301
charge_cat[T.Petit Theft] 0.574941
charge_cat[T.Petit Theft $100- $300] 0.019334
charge_cat[T.Poss Pyrrolidinovalerophenone] 2.135473
charge_cat[T.Poss3,4 Methylenedioxymethcath] -0.186720
charge_cat[T.Possession of Cannabis] -0.394881
charge_cat[T.Possession of Cocaine] 0.595590
charge_cat[T.Viol Injunct Domestic Violence] 0.135645
charge_cat[T.Viol Pretrial Release Dom Viol] 0.231013
charge_cat[T.arrest case no charge] -0.181091
charge_cat[T.other] -0.196711
sex[T.Male]:C(priors_count)[T.1] 0.034431
sex[T.Male]:C(priors_count)[T.3] 0.087797
sex[T.Male]:C(priors_count)[T.6] 0.036040
sex[T.Male]:C(priors_count)[T.10] 0.200290
sex[T.Male]:C(priors_count)[T.14] 0.709745
sex[T.Male]:C(priors_count)[T.16] 0.076917
sex[T.Male]:C(priors_count)[T.17] 1.115131
sex[T.Male]:C(priors_count)[T.26] 0.381267
sex[T.Male]:c_charge_degree[T.M] 0.163765
sex[T.Male]:charge_cat[T.Aggravated Battery / Pregnant] 0.024316
sex[T.Male]:charge_cat[T.Burglary Conveyance Unoccup] 0.516827
sex[T.Male]:charge_cat[T.Corrupt Public Servant] 0.717912
sex[T.Male]:charge_cat[T.Crimin Mischief Damage $1000+] 0.044232
sex[T.Male]:charge_cat[T.Driving License Suspended] -0.000923
sex[T.Male]:charge_cat[T.Driving Under The Influence] -0.035057
sex[T.Male]:charge_cat[T.Driving While License Revoked] -0.094700
sex[T.Male]:charge_cat[T.False Ownership Info/Pawn Item] 0.480690
sex[T.Male]:charge_cat[T.Felony Battery w/Prior Convict] -0.096456
sex[T.Male]:charge_cat[T.Felony Driving While Lic Suspd] 0.114954
sex[T.Male]:charge_cat[T.Grand Theft in the 3rd Degree] 0.277302
sex[T.Male]:charge_cat[T.Petit Theft] 0.199618
sex[T.Male]:charge_cat[T.Possession of Cannabis] -0.051146
sex[T.Male]:charge_cat[T.Resist/Obstruct W/O Violence] 0.272859
sex[T.Male]:charge_cat[T.Viol Injunct Domestic Violence] 0.325807
age -0.024240
sex[T.Male]:age -0.014782
juv_fel_count 0.302101
sex[T.Male]:juv_misd_count 0.181027
juv_other_count 0.149146
sex[T.Male]:juv_other_count -0.019165

Now, let’s look at calibration and balance using similar tables and figures as we did above.

output, given_outcome, given_pred = cm_tables(
    lasso_mod.predict(X_test),
    y_test,
    df_test
)
display(output)
display(given_pred)
display(given_outcome)

calibration_plot(lasso_mod.predict_proba(X_test)[:, 1], y_test, df_test)
balance_threshold_plot(lasso_mod.predict_proba(X_test)[:, 1], y_test, df_test);
overall African-American Caucasian Hispanic Female Male
Portion_of_NoRecid_and_LowRisk 0.411903 0.332272 0.490000 0.593548 0.596542 0.364444
Portion_of_Recid_and_LowRisk 0.141426 0.161359 0.121667 0.096774 0.048991 0.165185
Portion_of_NoRecid_and_HighRisk 0.190925 0.182590 0.201667 0.200000 0.250720 0.175556
Portion_of_Recid_and_HighRisk 0.255745 0.323779 0.186667 0.109677 0.103746 0.294815
overall African-American Caucasian Hispanic Female Male
P(NoRecid|LowRisk) 0.683284 0.645361 0.708434 0.747967 0.704082 0.674897
P(NoRecid|HighRisk) 0.356083 0.332604 0.394595 0.468750 0.320755 0.359098
P(Recid|LowRisk) 0.316716 0.354639 0.291566 0.252033 0.295918 0.325103
P(Recid|HighRisk) 0.643917 0.667396 0.605405 0.531250 0.679245 0.640902
overall African-American Caucasian Hispanic Female Male
P(LowRisk|NoRecid) 0.744409 0.673118 0.801090 0.859813 0.924107 0.688112
P(HighRisk|NoRecid) 0.255591 0.326882 0.198910 0.140187 0.075893 0.311888
P(LowRisk|Recid) 0.427441 0.360587 0.519313 0.645833 0.707317 0.373228
P(HighRisk|Recid) 0.572559 0.639413 0.480687 0.354167 0.292683 0.626772
[Figures: calibration plot and balance-threshold plot for the lasso model]

As with the COMPAS score, our predictions are well-calibrated, but the false negative and false positive rates are not well balanced across racial groups.

Exercise

See exercise 3 in the exercise list.

Regularizing to Maximize Balance

Trying to improve balance through ad hoc modifications would be difficult.

Let’s try to do it more systematically.

We usually select models and choose regularization to minimize prediction errors.

We can just as well select models and regularization parameters to optimize some other criteria.

Let’s choose the regularization parameter for lasso to maximize balance.

# define a custom CV criterion to maximize
def balance_scorer(y_true, prob, df, weights):
    # restrict the demographic data to the observations in this CV fold
    df_cv = df.loc[y_true.index.values, :]
    b = df_cv.race == "African-American"
    w = df_cv.race == "Caucasian"
    y_pred = 1 * (prob > 0.5)
    # false positive and false negative rates by group
    fprb = np.mean(y_pred[(y_true == 0) & b])
    fprw = np.mean(y_pred[(y_true == 0) & w])
    fnrb = np.mean(y_pred[(y_true == 1) & b] == 0)
    fnrw = np.mean(y_pred[(y_true == 1) & w] == 0)
    # penalize the FPR gap, the FNR gap, and (optionally) poor fit
    return (-weights[0] * (fprb - fprw)**2
            - weights[1] * (fnrb - fnrw)**2
            - weights[2] * metrics.log_loss(y_true, prob, normalize=True))

score_params = {"df": df_train, "weights": [10.0, 1.0, 0.0]}
scorer = metrics.make_scorer(balance_scorer, **score_params, needs_proba=True)
grid_cv = model_selection.GridSearchCV(
    estimator=linear_model.LogisticRegression(penalty="l1",
                                              max_iter=100,
                                              solver="liblinear"),
    scoring=scorer,
    cv=5,
    param_grid={'C': np.exp(np.linspace(-10, 10, 10))},
    return_train_score=True,
    verbose=True,
    refit=True,
)

balance_mod = grid_cv.fit(X_train, y_train)
Fitting 5 folds for each of 10 candidates, totalling 50 fits
# plots illustrating regularization parameter choice
def grid_cv_plot(mod, ylabel=""):
    scores = mod.cv_results_["mean_test_score"]
    Cdict = mod.cv_results_["params"]
    logpenalties = np.log([d['C'] for d in Cdict])
    colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]
    fig, ax1 = plt.subplots()
    ax1.plot(logpenalties, scores, color=colors[0])
    ax1.set_ylabel(ylabel, color=colors[0])
    ax1.set_xlabel("-log(penalty)")
    ax1.tick_params('y', colors=colors[0]);
grid_cv_plot(balance_mod, "CV balance score")
[Figure: CV balance score versus -log(penalty)]

We can achieve perfect balance by making the regularization parameter very large.

Unfortunately, this makes all the predictions identical, so they are no longer useful.
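
We can confirm this directly (a quick check using standard GridSearchCV attributes on the fitted balance_mod):

# which penalty was selected, and how many distinct classes are predicted?
print(balance_mod.best_params_)
print(np.unique(balance_mod.best_estimator_.predict(X_test)))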

output, given_outcome, given_pred = cm_tables(
    balance_mod.best_estimator_.predict(X_test),
    y_test,
    df_test
)
display(output)
display(given_pred)
display(given_outcome)
/tmp/ipykernel_3702/1996610596.py:44: RuntimeWarning: invalid value encountered in true_divide
  pcm = pcm/pcm.sum(axis=axis, keepdims=True)
overall African-American Caucasian Hispanic Female Male
Portion_of_NoRecid_and_LowRisk 0.553329 0.493631 0.611667 0.690323 0.645533 0.52963
Portion_of_Recid_and_LowRisk 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000
Portion_of_NoRecid_and_HighRisk 0.446671 0.506369 0.388333 0.309677 0.354467 0.47037
Portion_of_Recid_and_HighRisk 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000
overall African-American Caucasian Hispanic Female Male
P(NoRecid|LowRisk) 0.553329 0.493631 0.611667 0.690323 0.645533 0.52963
P(NoRecid|HighRisk) NaN NaN NaN NaN NaN NaN
P(Recid|LowRisk) 0.446671 0.506369 0.388333 0.309677 0.354467 0.47037
P(Recid|HighRisk) NaN NaN NaN NaN NaN NaN
overall African-American Caucasian Hispanic Female Male
P(LowRisk|NoRecid) 1.0 1.0 1.0 1.0 1.0 1.0
P(HighRisk|NoRecid) 0.0 0.0 0.0 0.0 0.0 0.0
P(LowRisk|Recid) 1.0 1.0 1.0 1.0 1.0 1.0
P(HighRisk|Recid) 0.0 0.0 0.0 0.0 0.0 0.0

What if we change our CV scoring function to care about both prediction and balance?

score_params = {"df": df_train, "weights": [10.0, 1.0, 5.0]}
grid_cv.set_params(scoring=metrics.make_scorer(balance_scorer, **score_params, needs_proba=True))
bf_mod = grid_cv.fit(X_train, y_train)
grid_cv_plot(bf_mod, "CV balance & fit")

output, given_outcome, given_pred = cm_tables(
    bf_mod.best_estimator_.predict(X_test),
    y_test,
    df_test
)
display(output)
display(given_pred)
display(given_outcome)
calibration_plot(bf_mod.best_estimator_.predict_proba(X_test)[:, 1], y_test, df_test)
balance_threshold_plot(bf_mod.best_estimator_.predict_proba(X_test)[:, 1], y_test, df_test);
Fitting 5 folds for each of 10 candidates, totalling 50 fits
overall African-American Caucasian Hispanic Female Male
Portion_of_NoRecid_and_LowRisk 0.404243 0.325902 0.483333 0.574194 0.556196 0.365185
Portion_of_Recid_and_LowRisk 0.149087 0.167728 0.128333 0.116129 0.089337 0.164444
Portion_of_NoRecid_and_HighRisk 0.190336 0.181529 0.201667 0.200000 0.242075 0.177037
Portion_of_Recid_and_HighRisk 0.256335 0.324841 0.186667 0.109677 0.112392 0.293333
overall African-American Caucasian Hispanic Female Male
P(NoRecid|LowRisk) 0.679881 0.642259 0.705596 0.741667 0.696751 0.673497
P(NoRecid|HighRisk) 0.367733 0.340517 0.407407 0.514286 0.442857 0.359223
P(Recid|LowRisk) 0.320119 0.357741 0.294404 0.258333 0.303249 0.326503
P(Recid|HighRisk) 0.632267 0.659483 0.592593 0.485714 0.557143 0.640777
overall African-American Caucasian Hispanic Female Male
P(LowRisk|NoRecid) 0.730564 0.660215 0.790191 0.831776 0.861607 0.689510
P(HighRisk|NoRecid) 0.269436 0.339785 0.209809 0.168224 0.138393 0.310490
P(LowRisk|Recid) 0.426121 0.358491 0.519313 0.645833 0.682927 0.376378
P(HighRisk|Recid) 0.573879 0.641509 0.480687 0.354167 0.317073 0.623622
[Figures: CV balance & fit score versus -log(penalty); calibration plot; balance-threshold plot]

Exercise

See exercise 4 in the exercise list.

Tradeoffs are Inevitable

We could try to tweak our predictions further to improve balance.

However, motivated in part by this COMPAS example, [recidKLL+17] proved that whenever groups differ in their base rates of recidivism, it is impossible for any nondegenerate prediction algorithm to be both perfectly balanced and well-calibrated.

Improvements in balance necessarily make calibration worse.
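
One way to see the tension (a sketch in our own notation, using only confusion-matrix identities; this is not the exact argument of [recidKLL+17]): for any group, the false positive rate, false negative rate, positive predictive value, and recidivism base rate $p$ satisfy

$$
FPR = \frac{p}{1-p} \cdot \frac{1 - PPV}{PPV} \cdot (1 - FNR)
$$

Calibration pushes $PPV$ to be similar across groups. The identity then implies that if two groups have different base rates $p$, they cannot simultaneously have equal false positive rates and equal false negative rates.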

References

recidDMB16

William Dieterich, Christina Mendoza, and Tim Brennan. COMPAS risk scales: demonstrating accuracy equity and predictive parity. Northpointe Inc., 2016.

recidKLL+17

Jon Kleinberg, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan. Human Decisions and Machine Predictions. The Quarterly Journal of Economics, 133(1):237–293, 2017. URL: https://dx.doi.org/10.1093/qje/qjx032, doi:10.1093/qje/qjx032.

Exercises

Exercise 1

Can you develop a model that performs better at mimicking their risk scores?

(back to text)

Exercise 2

We made our calibration plot using a held-out test sample. What do you think would happen if we made the calibration plot using the training sample? Check and see.

# Create calibration plot using training data
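
A minimal sketch, assuming the fitted lasso_mod and the X_train, y_train, and df_train objects defined above:

# in-sample analogue of the test-sample calibration plot above
calibration_plot(lasso_mod.predict_proba(X_train)[:, 1], y_train, df_train)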

(back to text)

Exercise 3

Try to improve balance and/or calibration by creating an alternative prediction.

# Fit your prediction model and plot calibration and balance
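
One possible starting point (a sketch, not a definitive answer; it assumes the cm_tables, calibration_plot, and balance_threshold_plot helpers and the train/test objects defined above, and simply swaps the l1 penalty for l2):

# an alternative prediction: ridge-penalized logistic regression
alt_mod = linear_model.LogisticRegressionCV(cv=5, penalty="l2").fit(X_train, y_train)
output, given_outcome, given_pred = cm_tables(alt_mod.predict(X_test), y_test, df_test)
display(output)
calibration_plot(alt_mod.predict_proba(X_test)[:, 1], y_test, df_test)
balance_threshold_plot(alt_mod.predict_proba(X_test)[:, 1], y_test, df_test);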

(back to text)

Exercise 4

Modify the cross-validation scoring function to see how it affects calibration and balance.
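
For example (one of many possible choices; this sketch reuses balance_scorer, grid_cv, and grid_cv_plot from above), you could shift weight from the false positive rate gap to the false negative rate gap:

# reweighted scorer: weights = [FPR gap, FNR gap, log loss]
score_params = {"df": df_train, "weights": [1.0, 10.0, 5.0]}
grid_cv.set_params(scoring=metrics.make_scorer(balance_scorer, **score_params, needs_proba=True))
alt_bal_mod = grid_cv.fit(X_train, y_train)
grid_cv_plot(alt_bal_mod, "CV balance & fit (reweighted)")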

(back to text)