Case Study: Recidivism#
Co-authors
Prerequisites
Outcomes
- See an end-to-end data science exercise
- Application of regression
# Uncomment following line to install on colab
#! pip install fiona geopandas xgboost gensim folium pyLDAvis descartes
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn import (
linear_model, metrics, neural_network, pipeline, preprocessing, model_selection
)
%matplotlib inline
Introduction to Recidivism#
Recidivism is the tendency for an individual who has previously committed a crime to commit another crime in the future.
One key input to a judge’s sentencing decision is how likely a given convict is to re-offend, or recidivate.
In an effort to assist the legal system with sentencing guidelines, data scientists have attempted to predict an individual’s risk of recidivism from known observables.
Some are concerned that this process may exhibit prejudice, either through biased inputs or through statistical discrimination.
For example,

- Biased inputs: Imagine that a judge often hands down harsher sentences to people of a particular race or gender. If an algorithm is trained to reproduce the sentences of this judge, the bias will be propagated by the algorithm.
- Statistical discrimination: Imagine that two variables (say race and income) are correlated, and one of them (say income) is correlated with the risk of recidivism. If income is unobserved, then an otherwise unbiased method would discriminate based on race, even if race has nothing to say about recidivism after controlling for income (see the short simulation sketch below).
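To make the statistical discrimination channel concrete, here is a minimal simulation sketch. The data here are entirely made up (not the COMPAS dataset): the true risk depends only on income, income is correlated with a group indicator, and a regression that omits income attributes predictive power to the group.

```python
import numpy as np
from sklearn import linear_model

rng = np.random.default_rng(0)
n = 100_000
group = rng.integers(0, 2, size=n)             # binary group indicator
income = rng.normal(size=n) - 0.5 * group      # income is correlated with group
risk = 1 / (1 + np.exp(income))                # true risk depends ONLY on income
recid = (rng.random(n) < risk).astype(float)

# Omitting income: the group indicator soaks up income's predictive power
short_reg = linear_model.LinearRegression().fit(group.reshape(-1, 1), recid)
# Including income: the group coefficient is close to zero
long_reg = linear_model.LinearRegression().fit(np.column_stack([group, income]), recid)
print("group coefficient, income omitted :", short_reg.coef_[0])
print("group coefficient, income included:", long_reg.coef_[0])
```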
This has given rise to serious discussions about the moral obligations data scientists have to those who are affected by their tools.
We will not take a stance today on our moral obligations, but we believe this is an important precursor to any statistical work with public policy applications.
One predictive tool used by various courts in the United States is called COMPAS (Correctional Offender Management Profiling for Alternative Sanctions).
We will be following a Pro Publica article that analyzes the output of COMPAS.
The findings of the article include:

- Black defendants were often predicted to be at a higher risk of recidivism than they actually were.
- White defendants were often predicted to be less risky than they actually were.
- Black defendants were twice as likely as white defendants to be misclassified as being at a higher risk of violent recidivism.
- Even when controlling for prior crimes, future recidivism, age, and gender, black defendants were 77 percent more likely to be assigned higher risk scores than white defendants.
Data Description#
The authors of this article filed a public records request with the Broward County Sheriff’s office in Florida.
Luckily for us, they did a significant amount of the legwork which is described in this methodology article.
We download the data below.
data_url = "https://raw.githubusercontent.com/propublica/compas-analysis"
data_url += "/master/compas-scores-two-years.csv"
df = pd.read_csv(data_url)
df.head()
| | id | name | first | last | compas_screening_date | sex | dob | age | age_cat | race | ... | v_decile_score | v_score_text | v_screening_date | in_custody | out_custody | priors_count.1 | start | end | event | two_year_recid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | miguel hernandez | miguel | hernandez | 2013-08-14 | Male | 1947-04-18 | 69 | Greater than 45 | Other | ... | 1 | Low | 2013-08-14 | 2014-07-07 | 2014-07-14 | 0 | 0 | 327 | 0 | 0 |
| 1 | 3 | kevon dixon | kevon | dixon | 2013-01-27 | Male | 1982-01-22 | 34 | 25 - 45 | African-American | ... | 1 | Low | 2013-01-27 | 2013-01-26 | 2013-02-05 | 0 | 9 | 159 | 1 | 1 |
| 2 | 4 | ed philo | ed | philo | 2013-04-14 | Male | 1991-05-14 | 24 | Less than 25 | African-American | ... | 3 | Low | 2013-04-14 | 2013-06-16 | 2013-06-16 | 4 | 0 | 63 | 0 | 1 |
| 3 | 5 | marcu brown | marcu | brown | 2013-01-13 | Male | 1993-01-21 | 23 | Less than 25 | African-American | ... | 6 | Medium | 2013-01-13 | NaN | NaN | 1 | 0 | 1174 | 0 | 0 |
| 4 | 6 | bouthy pierrelouis | bouthy | pierrelouis | 2013-03-26 | Male | 1973-01-22 | 43 | 25 - 45 | Other | ... | 1 | Low | 2013-03-26 | NaN | NaN | 2 | 0 | 1102 | 0 | 0 |
5 rows × 53 columns
We summarize some of the variables that we will use.
first
: An individual's first name

last
: An individual's last name

sex
: An individual's sex

age
: An individual's age

race
: An individual's race. It takes values of Caucasian, Hispanic, African-American, Native American, Asian, or Other

priors_count
: Number of previous arrests

decile_score
: The COMPAS risk score

two_year_recid
: Whether the individual was jailed for a new crime within the next two years
Descriptive Statistics#
The first thing we do with our data is to drop any classes without “enough” observations.
One of our focuses will be on inter-race differences in scores and recidivism, so we only keep data on races with at least 500 observations in our data set.
Be aware, though, that even a seemingly (or genuinely) benign "technical" decision like this can perpetuate inequality by exclusion.
For example, Asian defendants are a small minority, so they are effectively absent from the data and therefore from the policy discussion: we have no inferential knowledge of how COMPAS scores work for them.
race_count = df.groupby(["race"])["name"].count()
at_least_500 = list(race_count[race_count > 500].index)
print("The following race have at least 500 observations:", at_least_500)
df = df.loc[df["race"].isin(at_least_500), :]
The following race have at least 500 observations: ['African-American', 'Caucasian', 'Hispanic']
Next, we explore the remaining data using plots and tables.
Age, Sex, and Race#
Let’s look at how the dataset is broken down into age, sex, and race.
def create_groupcount_barplot(df, group_col, figsize, **kwargs):
"call df.groupby(group_col), then count number of records and plot"
counts = df.groupby(group_col)["name"].count().sort_index()
fig, ax = plt.subplots(figsize=figsize)
counts.plot(kind="bar", **kwargs)
ax.spines["right"].set_visible(False)
ax.spines["top"].set_visible(False)
ax.set_xlabel("")
ax.set_ylabel("")
return fig, ax
age_cs = ["Less than 25", "25 - 45", "Greater than 45"]
df["age_cat"] = pd.Categorical(df["age_cat"], categories=age_cs, ordered=True)
fig, ax = create_groupcount_barplot(df, "age_cat", (14, 8), color="DarkBlue", rot=0)

sex_cs = ["Female", "Male"]
df["sex"] = pd.Categorical(df["sex"], categories=sex_cs, ordered=True)
create_groupcount_barplot(df, "sex", (6, 8), color="DarkBlue", rot=0)
(<Figure size 600x800 with 1 Axes>, <AxesSubplot: >)

race_cs = ["African-American", "Caucasian", "Hispanic"]
df["race"] = pd.Categorical(df["race"], categories=race_cs, ordered=True)
create_groupcount_barplot(df, "race", (12, 8), color="DarkBlue", rot=0)
(<Figure size 1200x800 with 1 Axes>, <AxesSubplot: >)

From this, we learn that our population is mostly between 25 and 45 years old, mostly male, and mostly African-American or Caucasian.
Recidivism#
We now look into how recidivism is split across groups.
recid = df.groupby(["age_cat", "sex", "race"])["two_year_recid"].mean().unstack(level="race")
recid
| age_cat | sex | African-American | Caucasian | Hispanic |
|---|---|---|---|---|
| Less than 25 | Female | 0.449704 | 0.310345 | 0.411765 |
| Less than 25 | Male | 0.645806 | 0.541254 | 0.536364 |
| 25 - 45 | Female | 0.382278 | 0.423948 | 0.333333 |
| 25 - 45 | Male | 0.533074 | 0.433699 | 0.375000 |
| Greater than 45 | Female | 0.227273 | 0.239766 | 0.217391 |
| Greater than 45 | Male | 0.425101 | 0.289157 | 0.216667 |
In the table, we see that the young have higher recidivism rates than the old, except among Caucasian females.
Also, African-American males are at a particularly high risk of recidivism even as they get older.
Risk Scores#
Each individual in the dataset was assigned a decile_score ranging from 1 to 10.
This score represents the perceived risk of recidivism, with 1 being the lowest risk and 10 the highest.
We show a bar plot of all decile scores below.
create_groupcount_barplot(df, "decile_score", (12, 8), color="DarkBlue", rot=0)
(<Figure size 1200x800 with 1 Axes>, <AxesSubplot: >)

How do these scores differ by race?
dfgb = df.groupby("race")
race_count = df.groupby("race")["name"].count()
fig, ax = plt.subplots(3, figsize=(14, 8))
for (i, race) in enumerate(["African-American", "Caucasian", "Hispanic"]):
(
(dfgb
.get_group(race)
.groupby("decile_score")["name"].count() / race_count[race]
)
.plot(kind="bar", ax=ax[i], color="#353535")
)
ax[i].set_ylabel(race)
ax[i].set_xlabel("")
# set equal y limit for visual comparison
ax[i].set_ylim(0, 0.32)
fig.suptitle("Score Frequency by Race")
Text(0.5, 0.98, 'Score Frequency by Race')

While Caucasians and Hispanics both see the majority of their score distribution on low values, African-Americans are almost equally likely to receive any score.
Risk Scores and Recidivism#
Now we can explore the relationship between the risk score and actual two year recidivism.
The first measure we look at is the frequency of recidivism by decile score – these numbers tell us what percentage of people assigned a particular risk score committed a new crime within two years of being released.
df.groupby("decile_score")["two_year_recid"].mean()
decile_score
1 0.220392
2 0.309112
3 0.375887
4 0.426593
5 0.478723
6 0.564228
7 0.590988
8 0.681363
9 0.698795
10 0.770889
Name: two_year_recid, dtype: float64
Let’s also look at the correlation.
df[["decile_score", "two_year_recid"]].corr()
| | decile_score | two_year_recid |
|---|---|---|
| decile_score | 1.000000 | 0.346797 |
| two_year_recid | 0.346797 | 1.000000 |
As the risk score increases, the percentage of people committing a new crime does as well, with a positive correlation (~0.35).
This is good news – it means that the score is producing at least some signal about an individual’s recidivism risk.
One of the key critiques from Pro Publica, though, was that the inaccuracies were nonuniform — that is, the tool was systematically wrong about certain populations.
Let’s now separate the correlations by race and see what happens.
recid_rates = df.pivot_table(index="decile_score", columns="race", values="two_year_recid")
recid_rates
| decile_score | African-American | Caucasian | Hispanic |
|---|---|---|---|
| 1 | 0.228643 | 0.208517 | 0.244898 |
| 2 | 0.302799 | 0.313019 | 0.318584 |
| 3 | 0.419075 | 0.340659 | 0.313953 |
| 4 | 0.459740 | 0.396491 | 0.346154 |
| 5 | 0.482192 | 0.460581 | 0.538462 |
| 6 | 0.559896 | 0.572165 | 0.567568 |
| 7 | 0.592500 | 0.615385 | 0.470588 |
| 8 | 0.682451 | 0.719298 | 0.500000 |
| 9 | 0.707895 | 0.693878 | 0.550000 |
| 10 | 0.793706 | 0.703125 | 0.666667 |
Or, in plotted form,
fig, ax = plt.subplots(3, sharex="all")
for (i, _race) in enumerate(["African-American", "Caucasian", "Hispanic"]):
_rr_vals = recid_rates[_race].values
ax[i].bar(np.arange(1, 11), _rr_vals, color="#c60000")
ax[i].bar(np.arange(1, 11), 1 - _rr_vals, bottom=_rr_vals, color="#353535")
ax[i].set_ylabel(_race)
ax[i].spines["left"].set_visible(False)
ax[i].spines["right"].set_visible(False)
ax[i].spines["top"].set_visible(False)
ax[i].spines["bottom"].set_visible(False)
ax[i].yaxis.tick_right()
ax[i].xaxis.set_ticks_position("none")
fig.suptitle("Recidivism Rates by Race")
Text(0.5, 0.98, 'Recidivism Rates by Race')

Regression#
In what follows, we will be doing something slightly different than what was done in the Pro Publica article.
First, we will explore what happens when we try to predict the COMPAS risk scores using the observable data that we have.
Second, we will use binary probability models to predict whether an individual is at risk of recidivism.
We will do this first using the COMPAS risk scores, and then afterwards we will try to write our own model based on raw observables, like age, race and sex.
Preprocessing#
We would like to use some features that are inherently non-numerical such as sex, age group, and race in our model.
Before we can do that, we need to encode these string values as numerical values so our machine learning algorithms can understand them; an econometrician would call this creating dummy variables.
sklearn can do this for us automatically using OneHotEncoder.
Essentially, we make one column for each possible value of a categorical variable and then we set just one of these columns equal to a 1 if the observation has that column’s category, and set all other columns to 0.
Let’s do an example.
Imagine we have the array below.
sex = np.array([["Male"], ["Female"], ["Male"], ["Male"], ["Female"]])
The way to encode this would be to create the array below.
sex_encoded = np.array([
[0.0, 1.0],
[1.0, 0.0],
[0.0, 1.0],
[0.0, 1.0],
[1.0, 0.0]
])
Using sklearn, it would be:
ohe = preprocessing.OneHotEncoder(sparse=False)
sex_ohe = ohe.fit_transform(sex)
# This should show all 0s!
sex_ohe - sex_encoded
array([[0., 0.],
[0., 0.],
[0., 0.],
[0., 0.],
[0., 0.]])
We will use this encoding trick below as we create our data.
Predicting COMPAS Scores#
First, we put the X and y inputs into a manageable format.
We encode the categorical variables using the OneHotEncoder described above, and then merge the result with the non-categorical data.
Finally, we split the data into training and validation (test) subsets.
def prep_data(df, continuous_variables, categories, y_var, test_size=0.15):
ohe = preprocessing.OneHotEncoder(sparse=False)
y = df[y_var].values
X = np.zeros((y.size, 0))
# Add continuous variables, if any exist
if len(continuous_variables) > 0:
X = np.hstack([X, df[continuous_variables].values])
if len(categories) > 0:
X = np.hstack([X, ohe.fit_transform(df[categories])])
X_train, X_test, y_train, y_test = model_selection.train_test_split(
X, y, test_size=test_size, random_state=42
)
return X_train, X_test, y_train, y_test
As we proceed, our goal will be to see which variables are most important for predicting the COMPAS scores.
As we estimate these models, one of our metrics for success will be mean absolute error (MAE).
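For reference, with outcomes \(y_i\) and predictions \(\hat{y}_i\) for \(i = 1, \dots, N\), the mean absolute error is

$$
MAE = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|
$$

which is what metrics.mean_absolute_error computes in the helper below.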
def fit_and_report_maes(mod, X_train, X_test, y_train, y_test, y_transform=None, y_inv_transform=None):
if y_transform is not None:
mod.fit(X_train, y_transform(y_train))
else:
mod.fit(X_train, y_train)
yhat_train = mod.predict(X_train)
yhat_test = mod.predict(X_test)
if y_transform is not None:
yhat_train = y_inv_transform(yhat_train)
yhat_test = y_inv_transform(yhat_test)
return dict(
mae_train=metrics.mean_absolute_error(y_train, yhat_train),
mae_test=metrics.mean_absolute_error(y_test, yhat_test)
)
Let’s begin with a simple linear model which uses just prior arrests.
X_train, X_test, y_train, y_test = prep_data(
df, ["priors_count"], [], "decile_score"
)
fit_and_report_maes(linear_model.LinearRegression(), X_train, X_test, y_train, y_test)
{'mae_train': 2.162527833108664, 'mae_test': 2.191754484529134}
This simple model obtains an MAE of about 2 for both the training and test data.
This means that, on average, our model's prediction of the COMPAS score (which ranges from 1-10) is off by about 2 points.
Beyond the headline MAE, it is often useful to know what the errors of our prediction model look like.
Below, we create a histogram which shows the distribution of these errors. We take the difference between the predicted and actual values, so a positive value means that we overpredicted the COMPAS score and a negative value means we underpredicted it.
lr_model = linear_model.LinearRegression()
lr_model.fit(X_train, y_train)
yhat_train = lr_model.predict(X_train)
yhat_test = lr_model.predict(X_test)
fig, ax = plt.subplots(1, 2, figsize=(12, 4), sharey="all")
ax[0].hist(yhat_train - y_train, density=True)
ax[0].set_title("Training Data")
ax[1].hist(yhat_test - y_test, density=True)
ax[1].set_title("Test Data")
Text(0.5, 1.0, 'Test Data')

In both cases, the long left tails of the errors suggest that there are relevant features which, if included, would improve our model.
The first thing we might consider investigating is whether there are non-linearities in how the number of priors enters the COMPAS score.
First, we try using polynomial features in our exogenous variables.
X_train, X_test, y_train, y_test = prep_data(
df, ["priors_count"], [], "decile_score"
)
# Transform data to quadratic
pf = preprocessing.PolynomialFeatures(2, include_bias=False)
X_train = pf.fit_transform(X_train)
X_test = pf.fit_transform(X_test)
fit_and_report_maes(linear_model.LinearRegression(), X_train, X_test, y_train, y_test)
{'mae_train': 2.120405801755292, 'mae_test': 2.1179838134597335}
We don't see a significant improvement in performance, so we also try taking the log of the endogenous variable.
X_train, X_test, y_train, y_test = prep_data(
df, ["priors_count"], [], "decile_score"
)
fit_and_report_maes(
linear_model.LinearRegression(), X_train, X_test, y_train, y_test,
y_transform=np.log, y_inv_transform=np.exp
)
{'mae_train': 2.2550821558610115, 'mae_test': 2.3332184125647917}
Still no improvement… The next natural thing is to add more features to our regression.
X_train, X_test, y_train, y_test = prep_data(
df, ["priors_count"], ["age_cat", "race", "sex"], "decile_score"
)
fit_and_report_maes(linear_model.LinearRegression(), X_train, X_test, y_train, y_test)
{'mae_train': 1.8076563650603426, 'mae_test': 1.8277010173497898}
By allowing for indicator variables on age, race, and sex, we are able to slightly improve the MAE. The errors also seem to have a less extreme tail.
X_train, X_test, y_train, y_test = prep_data(
df, ["priors_count"], ["age_cat", "race", "sex"], "decile_score"
)
lr_model = linear_model.LinearRegression()
lr_model.fit(X_train, y_train)
yhat_train = lr_model.predict(X_train)
yhat_test = lr_model.predict(X_test)
fig, ax = plt.subplots(1, 2, figsize=(12, 4), sharey="all")
ax[0].hist(yhat_train - y_train, density=True)
ax[0].set_title("Training Data")
ax[1].hist(yhat_test - y_test, density=True)
ax[1].set_title("Test Data")
Text(0.5, 1.0, 'Test Data')

The coefficients are listed below:
names = [
"priors_count", "Less than 25", "25-45", "Greater than 45", "African-American",
"Caucasian", "Hispanic", "Female", "Male"
]
for (_name, _coef) in zip(names, lr_model.coef_):
print(_name, ": ", _coef)
priors_count : 0.27999230496845784
Less than 25 : -0.10262755807946655
25-45 : -1.7189018948181352
Greater than 45 : 1.8215294528975976
African-American : 0.6102522379830796
Caucasian : -0.10156939689442249
Hispanic : -0.5086828410886558
Female : 0.042935824274662374
Male : -0.04293582427466229
What stands out to you about these coefficients?
Exercise
See exercise 1 in the exercise list.
Binary Probability Models#
Binary probability models are used to model “all or nothing” outcomes, like the occurrence of an event.
Their output is the probability that an event of interest occurs.
With this probability in hand, the researcher chooses an acceptable cutoff (perhaps 0.5) above which the event is predicted to occur.
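As a minimal sketch of that last step (the probabilities below are made up for illustration):

```python
import numpy as np

predicted_probs = np.array([0.15, 0.48, 0.52, 0.90])  # hypothetical model output
cutoff = 0.5
predicted_event = (predicted_probs > cutoff).astype(int)
print(predicted_event)  # [0 0 1 1]
```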
Note
Binary probability models can be thought of as a special case of classification.
In classification, we are given a set of features and asked to predict one of a finite number of discrete labels.
We will learn more about classification in an upcoming lecture!
In our example, we will be interested in how the COMPAS scores do at predicting recidivism and how their ability to predict depends on race or sex.
To assist us in evaluating the performance of various models we will use a new metric called the confusion matrix.
Scikit-learn knows how to compute this metric and also provides a good description of what is computed.
Let’s see what they have to say.
help(metrics.confusion_matrix)
Help on function confusion_matrix in module sklearn.metrics._classification:
confusion_matrix(y_true, y_pred, *, labels=None, sample_weight=None, normalize=None)
Compute confusion matrix to evaluate the accuracy of a classification.
By definition a confusion matrix :math:`C` is such that :math:`C_{i, j}`
is equal to the number of observations known to be in group :math:`i` and
predicted to be in group :math:`j`.
Thus in binary classification, the count of true negatives is
:math:`C_{0,0}`, false negatives is :math:`C_{1,0}`, true positives is
:math:`C_{1,1}` and false positives is :math:`C_{0,1}`.
Read more in the :ref:`User Guide <confusion_matrix>`.
Parameters
----------
y_true : array-like of shape (n_samples,)
Ground truth (correct) target values.
y_pred : array-like of shape (n_samples,)
Estimated targets as returned by a classifier.
labels : array-like of shape (n_classes), default=None
List of labels to index the matrix. This may be used to reorder
or select a subset of labels.
If ``None`` is given, those that appear at least once
in ``y_true`` or ``y_pred`` are used in sorted order.
sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
.. versionadded:: 0.18
normalize : {'true', 'pred', 'all'}, default=None
Normalizes confusion matrix over the true (rows), predicted (columns)
conditions or all the population. If None, confusion matrix will not be
normalized.
Returns
-------
C : ndarray of shape (n_classes, n_classes)
Confusion matrix whose i-th row and j-th
column entry indicates the number of
samples with true label being i-th class
and predicted label being j-th class.
See Also
--------
ConfusionMatrixDisplay.from_estimator : Plot the confusion matrix
given an estimator, the data, and the label.
ConfusionMatrixDisplay.from_predictions : Plot the confusion matrix
given the true and predicted labels.
ConfusionMatrixDisplay : Confusion Matrix visualization.
References
----------
.. [1] `Wikipedia entry for the Confusion matrix
<https://en.wikipedia.org/wiki/Confusion_matrix>`_
(Wikipedia and other references may use a different
convention for axes).
Examples
--------
>>> from sklearn.metrics import confusion_matrix
>>> y_true = [2, 0, 2, 2, 0, 1]
>>> y_pred = [0, 0, 2, 2, 0, 2]
>>> confusion_matrix(y_true, y_pred)
array([[2, 0, 0],
[0, 0, 1],
[1, 0, 2]])
>>> y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
>>> y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
>>> confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"])
array([[2, 0, 0],
[0, 0, 1],
[1, 0, 2]])
In the binary case, we can extract true positives, etc as follows:
>>> tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0]).ravel()
>>> (tn, fp, fn, tp)
(0, 2, 1, 1)
def report_cm(mod, X_train, X_test, y_train, y_test):
return dict(
cm_train=metrics.confusion_matrix(y_train, mod.predict(X_train)),
cm_test=metrics.confusion_matrix(y_test, mod.predict(X_test))
)
We start with a logistic regression that uses only decile_score as a feature, and then examine how the confusion matrices differ by race and sex.
from patsy import dmatrices
groups = [
"overall", "African-American", "Caucasian", "Hispanic", "Female", "Male"
]
ind = [
"Portion_of_NoRecid_and_LowRisk", "Portion_of_Recid_and_LowRisk",
"Portion_of_NoRecid_and_HighRisk", "Portion_of_Recid_and_HighRisk"
]
fmla = "two_year_recid ~ C(decile_score)"
y,X = dmatrices(fmla, df)
X_train, X_test, y_train, y_test, df_train, df_test = model_selection.train_test_split(
X,y.reshape(-1),df, test_size=0.25, random_state=42
)
decile_mod = linear_model.LogisticRegression(solver="lbfgs").fit(X_train,y_train)
def cm_tables(pred, y, df):
output = pd.DataFrame(index=ind, columns=groups)
for group in groups:
if group in ["African-American", "Caucasian", "Hispanic"]:
subset=(df.race==group)
elif group in ["Female", "Male"]:
subset=(df.sex==group)
else:
subset=np.full(y.shape, True)
y_sub = y[subset]
pred_sub = pred[subset]
cm = metrics.confusion_matrix(y_sub, pred_sub)
# Compute fraction for which the guess is correct
total = cm.sum()
vals = np.array(cm/total)
output.loc[:, group] = vals.reshape(-1)
def cond_probs(col, axis):
d=int(np.sqrt(len(col)))
pcm = np.array(col).reshape(d,d)
pcm = pcm/pcm.sum(axis=axis, keepdims=True)
return(pcm.reshape(-1))
given_outcome = output.copy()
given_outcome.index = ["P(LowRisk|NoRecid)","P(HighRisk|NoRecid)","P(LowRisk|Recid)","P(HighRisk|Recid)"]
given_outcome=given_outcome.apply(lambda c: cond_probs(c,1))
given_pred = output.copy()
given_pred.index = ["P(NoRecid|LowRisk)","P(NoRecid|HighRisk)","P(Recid|LowRisk)","P(Recid|HighRisk)"]
given_pred=given_pred.apply(lambda c: cond_probs(c,0))
return(output,given_outcome, given_pred)
output, given_outcome, given_pred =cm_tables(decile_mod.predict(X_test),
y_test, df_test)
output
| | overall | African-American | Caucasian | Hispanic | Female | Male |
|---|---|---|---|---|---|---|
| Portion_of_NoRecid_and_LowRisk | 0.361815 | 0.263270 | 0.475000 | 0.522581 | 0.438040 | 0.342222 |
| Portion_of_Recid_and_LowRisk | 0.191514 | 0.230361 | 0.136667 | 0.167742 | 0.207493 | 0.187407 |
| Portion_of_NoRecid_and_HighRisk | 0.152033 | 0.140127 | 0.168333 | 0.161290 | 0.141210 | 0.154815 |
| Portion_of_Recid_and_HighRisk | 0.294638 | 0.366242 | 0.220000 | 0.148387 | 0.213256 | 0.315556 |
output contains information on the percent of true negatives, false negatives, false positives, and true positives.
What do you see?
The joint probabilities (of prediction and outcome given race or sex) in the above table are a bit hard to interpret.
Conditional probabilities can be easier to think about.
Let’s look at the probability of outcomes given the prediction as well as race or sex.
given_pred
| | overall | African-American | Caucasian | Hispanic | Female | Male |
|---|---|---|---|---|---|---|
| P(NoRecid\|LowRisk) | 0.704128 | 0.652632 | 0.738342 | 0.764151 | 0.756219 | 0.688525 |
| P(NoRecid\|HighRisk) | 0.393939 | 0.386121 | 0.383178 | 0.530612 | 0.493151 | 0.372607 |
| P(Recid\|LowRisk) | 0.295872 | 0.347368 | 0.261658 | 0.235849 | 0.243781 | 0.311475 |
| P(Recid\|HighRisk) | 0.606061 | 0.613879 | 0.616822 | 0.469388 | 0.506849 | 0.627393 |
As you can see, the distribution of outcomes conditional on predictions does not vary too much with race.
Moreover, if anything, it discriminates in favor of African-Americans.
The algorithm does appear to overpredict recidivism for women compared to men.
This is an important concern.
We will not discuss it too much though because (1) we will see below that when fairness is looked at in another way, women are favored over men, and (2) the company that produces COMPAS also produces a separate questionnaire and risk score designed only for women.
False Positive and Negative Rates#
What if we flip this around and look at the distributions of predictions conditional on outcomes?
Why look at these probabilities?
One reason is that, in law, punishing the innocent has traditionally been viewed as far worse than letting the guilty go free. This idea goes back at least to 1760 and Blackstone's ratio.
It is better that ten guilty persons escape than that one innocent suffer. -William Blackstone
Blackstone’s ratio says that we should be particularly concerned about P(HighRisk | NoRecid).
This probability is also called the false positive rate.
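In terms of the confusion matrix counts (TP, FP, TN, FN), the false positive rate and its counterpart, the false negative rate, are

$$
P(\text{HighRisk} \mid \text{NoRecid}) = \frac{FP}{FP + TN},
\qquad
P(\text{LowRisk} \mid \text{Recid}) = \frac{FN}{FN + TP}
$$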
given_outcome
| | overall | African-American | Caucasian | Hispanic | Female | Male |
|---|---|---|---|---|---|---|
| P(LowRisk\|NoRecid) | 0.653887 | 0.533333 | 0.776567 | 0.757009 | 0.678571 | 0.646154 |
| P(HighRisk\|NoRecid) | 0.346113 | 0.466667 | 0.223433 | 0.242991 | 0.321429 | 0.353846 |
| P(LowRisk\|Recid) | 0.340369 | 0.276730 | 0.433476 | 0.520833 | 0.398374 | 0.329134 |
| P(HighRisk\|Recid) | 0.659631 | 0.723270 | 0.566524 | 0.479167 | 0.601626 | 0.670866 |
Now we see some large disparities by race in the false positive rate (and false negative rate). This is one of the main findings of the Pro Publica article.
In response to Pro Publica, Northpointe, the company that produces COMPAS, argued that COMPAS is not biased because the probabilities of outcomes conditional on predictions (like P(NoRecid|LowRisk)) are approximately equal across races [recidDMB16].
Following [recidKLL+17], we will call a prediction algorithm with this property well-calibrated.
Being well-calibrated is one criterion for fairness of a prediction algorithm.
Pro Publica's critique focuses on a different criterion – that the probability of predicted categories conditional on true outcomes should be equal across groups (i.e. P(HighRisk|NoRecid) should be equal across races).
[recidKLL+17] calls a prediction algorithm with this property balanced.
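In symbols, writing \(Y\) for the realized outcome, \(S\) for the predicted category, and \(g\) for a group (race or sex), the two criteria can be written, roughly, as

$$
\underbrace{P(Y \mid S, g) = P(Y \mid S)}_{\text{well-calibrated}}
\qquad \text{and} \qquad
\underbrace{P(S \mid Y, g) = P(S \mid Y)}_{\text{balanced}}
$$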
Visualizing Calibration and Balance#
We can get a slightly more detailed look at calibration and balance by recognizing that prediction algorithms typically compute a predicted probability, not just a discrete predicted outcome.
The predicted outcome will typically be assigned to the category with the highest predicted probability.
We can examine calibration graphically by plotting P(recidivism | predicted probability).
import scipy.stats
def calibration_plot(pred, y, df, bins=20):
fig,ax = plt.subplots(3,2, figsize=(12,6), sharey=True, sharex=True)
for (g,group) in enumerate(groups):
if group in ["African-American", "Caucasian", "Hispanic"]:
subset=(df.race==group)
elif group in ["Female", "Male"]:
subset=(df.sex==group)
else:
subset=np.full(y.shape,True)
_ax = ax[np.unravel_index(g, ax.shape)]
y_sub = y[subset]
pred_sub = pred[subset]
mu, edges, n=scipy.stats.binned_statistic(pred_sub,y_sub,'mean',bins=bins)
se, edges,n=scipy.stats.binned_statistic(pred_sub,y_sub,
lambda x: np.std(x)/np.sqrt(len(x)),bins=bins)
midpts = (edges[0:-1]+edges[1:])/2
_ax.errorbar(midpts, mu, yerr=1.64*se, fmt='o')
_ax.set_title(group)
_ax.set_ylabel("Observed recidivism")
_ax.set_xlabel("Predicted P(recidivism)")
x = np.linspace(*_ax.get_xlim())
_ax.plot(x, x)
_ax.set_xlim(0.0,1.0)
fig.tight_layout()
return(fig,ax)
calibration_plot(decile_mod.predict_proba(X_test)[:,1],
df_test["two_year_recid"],
df_test);

This figure is one way to visualize how well-calibrated these predictions are.
The dots are binned averages of observed recidivism, conditional on predicted recidivism being in some range.
The error bars represent a 90% confidence interval.
A perfectly calibrated prediction would have these dots all lie along the 45 degree line.
For dots below the 45 degree line, the algorithm is overpredicting recidivism.
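Concretely, for each bin \(b\) the plotted interval is the normal approximation

$$
\bar{y}_b \pm 1.64 \, \frac{\hat{\sigma}_b}{\sqrt{n_b}}
$$

where \(\bar{y}_b\) is the bin's observed recidivism rate, \(\hat{\sigma}_b\) the within-bin standard deviation, \(n_b\) the number of observations in the bin, and 1.64 the standard normal quantile that gives a two-sided 90% interval.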
Exercise
See exercise 2 in the exercise list.
The algorithm appears fairly well-calibrated.
It does not seem to be making systematic errors in one direction based on any particular race, but it does appear to systematically overpredict recidivism for females compared to males.
Now, let’s create a figure to examine balance.
Balance is about the distribution of predictions conditional on outcomes, so we will plot histograms of predicted probabilities conditional on realized outcomes.
import seaborn as sns
def balance_hist_plot(pred, y, df, bins=20):
fig,ax = plt.subplots(3,2, figsize=(12,6), sharey=True, sharex=True)
for (g,group) in enumerate(groups):
if group in ["African-American", "Caucasian", "Hispanic"]:
subset=(df.race==group)
elif group in ["Female", "Male"]:
subset=(df.sex==group)
else:
subset=np.full(y.shape,True)
_ax = ax[np.unravel_index(g, ax.shape)]
y_sub = y[subset]
pred_sub = pred[subset]
sns.distplot(pred_sub[y_sub==0], hist=True, bins=bins, kde=False, ax=_ax,
label="No recidivate", norm_hist=True, axlabel="Predicted Probability")
sns.distplot(pred_sub[y_sub==1], hist=True, bins=bins, kde=False, ax=_ax,
label="Yes recidivate", norm_hist=True, axlabel="Predicted Probability")
_ax.set_title(group)
plt.legend()
fig.tight_layout()
return(fig,ax)
balance_hist_plot(decile_mod.predict_proba(X_test)[:,1],
df_test["two_year_recid"],
df_test);

This figure is somewhat useful, but it does not depict balance especially clearly, so let's try something else.
To get false positive and false negative rates, we must assign the predicted probabilities to outcomes.
The most common choice would be to predict recidivism if the predicted probability is greater than 0.5.
However, if we want to adjust the false positive and false negative rates, we might want to choose some other threshold and predict recidivism if the predicted probability exceeds this threshold.
Different thresholds will lead to different false negative and false positive rates, so let’s plot these rates as functions of the threshold.
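As a quick numeric check before plotting, a sketch like the following (using decile_mod, X_test, and y_test as defined above) shows how moving the threshold trades off the two error rates:

```python
probs = decile_mod.predict_proba(X_test)[:, 1]
for tau in [0.3, 0.5, 0.7]:
    pred = probs > tau
    fpr = np.mean(pred[y_test == 0])    # non-recidivists labeled HighRisk
    fnr = np.mean(~pred[y_test == 1])   # recidivists labeled LowRisk
    print(f"threshold={tau:.1f}: FPR={fpr:.2f}, FNR={fnr:.2f}")
```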
def balance_threshold_plot(pred, y, df, bins=20):
fig,ax = plt.subplots(2,2, figsize=(12,6), sharey=True,
sharex=True)
x = np.linspace(min(pred), max(pred), bins)
# get colors defined by theme
colors=plt.rcParams["axes.prop_cycle"].by_key()["color"]
for (g, group) in enumerate(groups):
if group in ["African-American", "Caucasian", "Hispanic"]:
subset=(df.race==group)
r = 0
elif group in ["Female", "Male"]:
subset=(df.sex==group)
r = 1
else:
continue
y_sub = y[subset]
pred_sub = pred[subset]
_ax = ax[r,0]
fn = np.array([np.mean(pred_sub[y_sub==1]<xi) for xi in x])
c1 = sum(y_sub==1)
sen = np.sqrt(fn*(1-fn)/c1)
fp = np.array([np.mean(pred_sub[y_sub==0]>xi) for xi in x])
c0 = sum(y_sub==0)
sep = np.sqrt(fp*(1-fp)/c0)
p=_ax.plot(x, fn, color=colors[g])
_ax.fill_between(x, fn-1.64*sen, fn+1.64*sen, alpha=0.25, color=colors[g])
_ax.annotate(group, (x[bins//7*g], fn[bins//7*g]), color=colors[g])
_ax.set_ylabel("False +/- Rate")
_ax.set_xlabel("Threshold")
_ax.set_title("False Negative Rate")
_ax = ax[r,1]
p=_ax.plot(x, fp, color=colors[g])
_ax.fill_between(x, fp-1.64*sep, fp+1.64*sep, alpha=0.25, color=colors[g])
_ax.set_xlabel("Threshold")
_ax.set_title("False Positive Rate")
fig.tight_layout()
return(fig,ax)
balance_threshold_plot(decile_mod.predict_proba(X_test)[:,1],
df_test["two_year_recid"],
df_test);

From this, we can more easily see the balance problem — regardless of which threshold we choose, African-Americans will have a higher false positive rate than Caucasians.
We have seen that COMPAS scores are well-calibrated conditional on race, but not balanced.
Can we create an alternative prediction that is both well-calibrated and balanced?
Creating an Alternative Prediction#
As a starting exercise, let’s predict recidivism using the variables in this dataset other than race and COMPAS score.
Almost all variables in this data are categorical.
Any function of categorical variables can be represented as a linear function of indicator variables and their interactions.
Given that linearity in indicators does not impose any substantive restriction here, a penalized linear model like lasso seems like a good choice for prediction.
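To see why, note that for a categorical variable \(x\) taking values in \(\{c_1, \dots, c_K\}\), any function \(f\) can be written as a linear combination of indicators,

$$
f(x) = \sum_{k=1}^{K} f(c_k) \, \mathbf{1}\{x = c_k\},
$$

and the same argument applies to interactions of several categorical variables.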
To keep the computation time reasonable, we do not include all interaction and indicator terms here.
To ensure that predicted probabilities are between 0 and 1, we fit a logistic regression with an \(\ell_1\) penalty.
from sklearn import model_selection, linear_model
from patsy import dmatrices
# c_charge_desc has many values with only one observation; we
# combine these rare descriptions into a single "other" category.
# This could be improved upon by looking at the text of the
# descriptions and combining similar ones.
df.c_charge_desc = df.c_charge_desc.fillna("")
df["charge_cat"] = df.c_charge_desc
cnt = df.c_charge_desc.value_counts()[df.c_charge_desc]
cnt.index = df.index
df.loc[cnt<10,"charge_cat"] = "other"
df.charge_cat = df.charge_cat.astype('category')
df.sex = df.sex.astype('category')
fmla = "two_year_recid ~ sex*(age + juv_fel_count + juv_misd_count + juv_other_count + C(priors_count) + c_charge_degree + charge_cat)"
y,X = dmatrices(fmla, df)
print("There are {} features".format(X.shape[1]))
X_train, X_test, y_train, y_test, df_train, df_test = model_selection.train_test_split(
X,pd.Series(y.reshape(-1),index=df.index),df, test_size=0.25, random_state=42
)
lasso_mod = linear_model.LogisticRegressionCV(
    cv=5, verbose=False, Cs=10, penalty='l1', max_iter=100,
    scoring="neg_log_loss", solver="liblinear"
).fit(X_train, y_train)
There are 260 features
iter 31 #CD cycles 21
iter 32 #CD cycles 65
iter 33 #CD cycles 4
iter 34 #CD cycles 6
iter 35 #CD cycles 14
iter 36 #CD cycles 1
WARNING: reaching max number of inner iterations
iter 37 #CD cycles 1000
iter 38 #CD cycles 814
iter 39 #CD cycles 68
iter 40 #CD cycles 540
iter 41 #CD cycles 49
iter 42 #CD cycles 73
iter 43 #CD cycles 38
iter 44 #CD cycles 206
iter 45 #CD cycles 30
iter 46 #CD cycles 41
iter 47 #CD cycles 9
iter 48 #CD cycles 72
iter 49 #CD cycles 17
iter 50 #CD cycles 25
iter 51 #CD cycles 44
iter 52 #CD cycles 38
iter 53 #CD cycles 9
iter 54 #CD cycles 95
iter 55 #CD cycles 14
iter 56 #CD cycles 2
iter 57 #CD cycles 66
=========================
optimization finished, #iter = 57
Objective value = 23521794.004158
#nonzeros/#features = 252/261
[LibLinear]iter 1 #CD cycles 1
=========================
optimization finished, #iter = 1
Objective value = 0.282249
#nonzeros/#features = 1/261
[LibLinear]iter 1 #CD cycles 1
iter 2 #CD cycles 2
iter 3 #CD cycles 1
iter 4 #CD cycles 1
iter 5 #CD cycles 1
iter 6 #CD cycles 1
iter 7 #CD cycles 3
iter 8 #CD cycles 1
iter 9 #CD cycles 6
iter 10 #CD cycles 1
iter 11 #CD cycles 4
=========================
optimization finished, #iter = 11
Objective value = 2.163670
#nonzeros/#features = 2/261
[LibLinear]iter 1 #CD cycles 1
iter 2 #CD cycles 1
iter 3 #CD cycles 8
iter 4 #CD cycles 1
iter 5 #CD cycles 11
iter 6 #CD cycles 2
iter 7 #CD cycles 1
iter 8 #CD cycles 8
iter 9 #CD cycles 2
iter 10 #CD cycles 1
iter 11 #CD cycles 3
iter 12 #CD cycles 1
iter 13 #CD cycles 7
iter 14 #CD cycles 1
iter 15 #CD cycles 8
iter 16 #CD cycles 1
iter 17 #CD cycles 8
=========================
optimization finished, #iter = 17
Objective value = 16.644976
#nonzeros/#features = 2/261
[LibLinear]iter 1 #CD cycles 1
iter 2 #CD cycles 3
iter 3 #CD cycles 3
iter 4 #CD cycles 1
iter 5 #CD cycles 1
iter 6 #CD cycles 9
iter 7 #CD cycles 3
iter 8 #CD cycles 1
iter 9 #CD cycles 4
iter 10 #CD cycles 6
iter 11 #CD cycles 1
iter 12 #CD cycles 18
iter 13 #CD cycles 2
iter 14 #CD cycles 2
iter 15 #CD cycles 5
iter 16 #CD cycles 1
iter 17 #CD cycles 30
iter 18 #CD cycles 23
iter 19 #CD cycles 12
iter 20 #CD cycles 6
iter 21 #CD cycles 5
iter 22 #CD cycles 10
iter 23 #CD cycles 11
iter 24 #CD cycles 12
iter 25 #CD cycles 4
iter 26 #CD cycles 10
iter 27 #CD cycles 11
iter 28 #CD cycles 4
iter 29 #CD cycles 6
iter 30 #CD cycles 3
iter 31 #CD cycles 9
iter 32 #CD cycles 6
iter 33 #CD cycles 1
iter 34 #CD cycles 40
iter 35 #CD cycles 93
iter 36 #CD cycles 12
iter 37 #CD cycles 9
iter 38 #CD cycles 12
iter 39 #CD cycles 5
iter 40 #CD cycles 14
iter 41 #CD cycles 10
iter 42 #CD cycles 2
iter 43 #CD cycles 1
iter 44 #CD cycles 74
=========================
optimization finished, #iter = 44
Objective value = 125.655760
#nonzeros/#features = 15/261
[LibLinear]iter 1 #CD cycles 1
iter 2 #CD cycles 3
iter 3 #CD cycles 1
iter 4 #CD cycles 4
iter 5 #CD cycles 5
iter 6 #CD cycles 1
iter 7 #CD cycles 19
iter 8 #CD cycles 1
iter 9 #CD cycles 11
iter 10 #CD cycles 1
iter 11 #CD cycles 5
iter 12 #CD cycles 3
iter 13 #CD cycles 2
iter 14 #CD cycles 3
iter 15 #CD cycles 9
iter 16 #CD cycles 2
iter 17 #CD cycles 1
iter 18 #CD cycles 250
iter 19 #CD cycles 3
iter 20 #CD cycles 1
iter 21 #CD cycles 35
iter 22 #CD cycles 2
iter 23 #CD cycles 2
iter 24 #CD cycles 1
iter 25 #CD cycles 44
=========================
optimization finished, #iter = 25
Objective value = 921.272524
#nonzeros/#features = 71/261
[LibLinear]iter 1 #CD cycles 1
iter 2 #CD cycles 3
iter 3 #CD cycles 1
iter 4 #CD cycles 10
iter 5 #CD cycles 1
iter 6 #CD cycles 11
iter 7 #CD cycles 1
iter 8 #CD cycles 27
iter 9 #CD cycles 6
iter 10 #CD cycles 1
iter 11 #CD cycles 53
iter 12 #CD cycles 1
iter 13 #CD cycles 68
iter 14 #CD cycles 6
iter 15 #CD cycles 1
iter 16 #CD cycles 132
iter 17 #CD cycles 75
iter 18 #CD cycles 13
iter 19 #CD cycles 20
iter 20 #CD cycles 17
iter 21 #CD cycles 2
iter 22 #CD cycles 17
iter 23 #CD cycles 2
iter 24 #CD cycles 6
iter 25 #CD cycles 13
iter 26 #CD cycles 12
iter 27 #CD cycles 14
iter 28 #CD cycles 6
iter 29 #CD cycles 1
iter 30 #CD cycles 138
=========================
optimization finished, #iter = 30
Objective value = 6693.204470
#nonzeros/#features = 188/261
[LibLinear]iter 1 #CD cycles 1
iter 2 #CD cycles 4
iter 3 #CD cycles 1
iter 4 #CD cycles 7
iter 5 #CD cycles 1
iter 6 #CD cycles 12
iter 7 #CD cycles 1
iter 8 #CD cycles 19
iter 9 #CD cycles 9
iter 10 #CD cycles 1
iter 11 #CD cycles 93
iter 12 #CD cycles 26
iter 13 #CD cycles 1
iter 14 #CD cycles 200
iter 15 #CD cycles 38
iter 16 #CD cycles 34
iter 17 #CD cycles 4
iter 18 #CD cycles 19
iter 19 #CD cycles 2
iter 20 #CD cycles 2
iter 21 #CD cycles 2
iter 22 #CD cycles 3
iter 23 #CD cycles 3
iter 24 #CD cycles 1
iter 25 #CD cycles 166
iter 26 #CD cycles 57
iter 27 #CD cycles 18
iter 28 #CD cycles 8
iter 29 #CD cycles 27
iter 30 #CD cycles 28
iter 31 #CD cycles 14
iter 32 #CD cycles 10
iter 33 #CD cycles 138
iter 34 #CD cycles 15
iter 35 #CD cycles 18
iter 36 #CD cycles 5
iter 37 #CD cycles 6
iter 38 #CD cycles 6
iter 39 #CD cycles 1
iter 40 #CD cycles 894
=========================
optimization finished, #iter = 40
Objective value = 50375.113680
#nonzeros/#features = 247/261
[LibLinear]iter 1 #CD cycles 1
iter 2 #CD cycles 3
iter 3 #CD cycles 1
iter 4 #CD cycles 5
iter 5 #CD cycles 4
iter 6 #CD cycles 2
iter 7 #CD cycles 1
iter 8 #CD cycles 10
iter 9 #CD cycles 2
iter 10 #CD cycles 1
iter 11 #CD cycles 2
iter 12 #CD cycles 12
iter 13 #CD cycles 2
iter 14 #CD cycles 1
iter 15 #CD cycles 57
iter 16 #CD cycles 47
iter 17 #CD cycles 10
iter 18 #CD cycles 3
iter 19 #CD cycles 3
iter 20 #CD cycles 15
iter 21 #CD cycles 1
iter 22 #CD cycles 270
iter 23 #CD cycles 149
iter 24 #CD cycles 12
iter 25 #CD cycles 12
iter 26 #CD cycles 80
iter 27 #CD cycles 38
iter 28 #CD cycles 1
WARNING: reaching max number of inner iterations
iter 29 #CD cycles 1000
iter 30 #CD cycles 437
iter 31 #CD cycles 106
iter 32 #CD cycles 2
iter 33 #CD cycles 45
iter 34 #CD cycles 7
iter 35 #CD cycles 100
iter 36 #CD cycles 5
iter 37 #CD cycles 2
iter 38 #CD cycles 4
iter 39 #CD cycles 378
=========================
optimization finished, #iter = 39
Objective value = 387054.378788
#nonzeros/#features = 252/261
[LibLinear]iter 1 #CD cycles 1
iter 2 #CD cycles 3
iter 3 #CD cycles 1
iter 4 #CD cycles 7
iter 5 #CD cycles 3
iter 6 #CD cycles 2
iter 7 #CD cycles 1
iter 8 #CD cycles 7
iter 9 #CD cycles 1
iter 10 #CD cycles 15
iter 11 #CD cycles 2
iter 12 #CD cycles 7
iter 13 #CD cycles 3
iter 14 #CD cycles 4
iter 15 #CD cycles 1
iter 16 #CD cycles 36
iter 17 #CD cycles 20
iter 18 #CD cycles 7
iter 19 #CD cycles 6
iter 20 #CD cycles 3
iter 21 #CD cycles 9
iter 22 #CD cycles 15
iter 23 #CD cycles 6
iter 24 #CD cycles 14
iter 25 #CD cycles 6
iter 26 #CD cycles 9
iter 27 #CD cycles 7
iter 28 #CD cycles 4
iter 29 #CD cycles 4
iter 30 #CD cycles 6
iter 31 #CD cycles 2
iter 32 #CD cycles 6
iter 33 #CD cycles 2
iter 34 #CD cycles 4
iter 35 #CD cycles 2
iter 36 #CD cycles 6
iter 37 #CD cycles 3
iter 38 #CD cycles 8
iter 39 #CD cycles 1
iter 40 #CD cycles 513
iter 41 #CD cycles 33
iter 42 #CD cycles 39
iter 43 #CD cycles 7
iter 44 #CD cycles 5
iter 45 #CD cycles 86
iter 46 #CD cycles 3
iter 47 #CD cycles 71
iter 48 #CD cycles 3
iter 49 #CD cycles 19
iter 50 #CD cycles 7
iter 51 #CD cycles 1
WARNING: reaching max number of inner iterations
iter 52 #CD cycles 1000
iter 53 #CD cycles 429
iter 54 #CD cycles 69
iter 55 #CD cycles 271
iter 56 #CD cycles 149
iter 57 #CD cycles 119
iter 58 #CD cycles 1
WARNING: reaching max number of inner iterations
iter 59 #CD cycles 1000
WARNING: reaching max number of inner iterations
[LibLinear]iter 60 #CD cycles 1000
=========================
optimization finished, #iter = 60
Objective value = 2992163.872093
#nonzeros/#features = 253/261
iter 1 #CD cycles 1
iter 2 #CD cycles 3
iter 3 #CD cycles 2
iter 4 #CD cycles 1
iter 5 #CD cycles 6
iter 6 #CD cycles 2
iter 7 #CD cycles 1
iter 8 #CD cycles 11
iter 9 #CD cycles 1
iter 10 #CD cycles 6
iter 11 #CD cycles 6
iter 12 #CD cycles 3
iter 13 #CD cycles 1
iter 14 #CD cycles 33
iter 15 #CD cycles 11
iter 16 #CD cycles 18
iter 17 #CD cycles 4
iter 18 #CD cycles 4
iter 19 #CD cycles 4
iter 20 #CD cycles 2
iter 21 #CD cycles 16
iter 22 #CD cycles 1
iter 23 #CD cycles 265
iter 24 #CD cycles 110
iter 25 #CD cycles 20
iter 26 #CD cycles 5
iter 27 #CD cycles 45
iter 28 #CD cycles 1
WARNING: reaching max number of inner iterations
iter 29 #CD cycles 1000
WARNING: reaching max number of inner iterations
iter 30 #CD cycles 1000
iter 31 #CD cycles 11
iter 32 #CD cycles 94
iter 33 #CD cycles 124
iter 34 #CD cycles 197
iter 35 #CD cycles 60
iter 36 #CD cycles 11
iter 37 #CD cycles 29
iter 38 #CD cycles 4
iter 39 #CD cycles 3
iter 40 #CD cycles 77
iter 41 #CD cycles 106
iter 42 #CD cycles 2
iter 43 #CD cycles 17
iter 44 #CD cycles 2
iter 45 #CD cycles 2
iter 46 #CD cycles 79
iter 47 #CD cycles 5
=========================
optimization finished, #iter = 47
Objective value = 23161765.713181
#nonzeros/#features = 254/261
[LibLinear]iter 1 #CD cycles 1
=========================
optimization finished, #iter = 1
Objective value = 0.282249
#nonzeros/#features = 1/261
[LibLinear]iter 1 #CD cycles 1
iter 2 #CD cycles 1
=========================
optimization finished, #iter = 2
Objective value = 2.163172
#nonzeros/#features = 1/261
[LibLinear]iter 1 #CD cycles 1
iter 2 #CD cycles 5
iter 3 #CD cycles 1
iter 4 #CD cycles 3
iter 5 #CD cycles 1
iter 6 #CD cycles 8
iter 7 #CD cycles 2
iter 8 #CD cycles 2
iter 9 #CD cycles 1
iter 10 #CD cycles 4
iter 11 #CD cycles 1
iter 12 #CD cycles 6
iter 13 #CD cycles 1
iter 14 #CD cycles 10
iter 15 #CD cycles 3
iter 16 #CD cycles 2
iter 17 #CD cycles 1
iter 18 #CD cycles 2
iter 19 #CD cycles 1
iter 20 #CD cycles 11
=========================
optimization finished, #iter = 20
Objective value = 16.656407
#nonzeros/#features = 2/261
[LibLinear]iter 1 #CD cycles 1
iter 2 #CD cycles 4
iter 3 #CD cycles 1
iter 4 #CD cycles 7
iter 5 #CD cycles 1
iter 6 #CD cycles 9
iter 7 #CD cycles 1
iter 8 #CD cycles 10
iter 9 #CD cycles 1
iter 10 #CD cycles 45
iter 11 #CD cycles 17
iter 12 #CD cycles 5
iter 13 #CD cycles 7
iter 14 #CD cycles 34
iter 15 #CD cycles 5
iter 16 #CD cycles 2
iter 17 #CD cycles 12
iter 18 #CD cycles 1
iter 19 #CD cycles 4
iter 20 #CD cycles 7
iter 21 #CD cycles 17
iter 22 #CD cycles 2
iter 23 #CD cycles 8
iter 24 #CD cycles 5
iter 25 #CD cycles 2
iter 26 #CD cycles 10
iter 27 #CD cycles 5
iter 28 #CD cycles 2
iter 29 #CD cycles 9
iter 30 #CD cycles 6
iter 31 #CD cycles 7
iter 32 #CD cycles 12
iter 33 #CD cycles 4
iter 34 #CD cycles 1
iter 35 #CD cycles 65
iter 36 #CD cycles 12
iter 37 #CD cycles 24
iter 38 #CD cycles 29
iter 39 #CD cycles 17
iter 40 #CD cycles 7
iter 41 #CD cycles 17
iter 42 #CD cycles 5
iter 43 #CD cycles 5
iter 44 #CD cycles 5
iter 45 #CD cycles 3
iter 46 #CD cycles 7
iter 47 #CD cycles 4
iter 48 #CD cycles 3
iter 49 #CD cycles 4
iter 50 #CD cycles 1
iter 51 #CD cycles 26
iter 52 #CD cycles 6
iter 53 #CD cycles 18
iter 54 #CD cycles 24
=========================
optimization finished, #iter = 54
Objective value = 125.373822
#nonzeros/#features = 14/261
[LibLinear]iter 1 #CD cycles 1
iter 2 #CD cycles 1
iter 3 #CD cycles 11
iter 4 #CD cycles 4
iter 5 #CD cycles 1
iter 6 #CD cycles 16
iter 7 #CD cycles 1
iter 8 #CD cycles 35
iter 9 #CD cycles 1
iter 10 #CD cycles 70
iter 11 #CD cycles 3
iter 12 #CD cycles 1
iter 13 #CD cycles 273
iter 14 #CD cycles 59
iter 15 #CD cycles 5
iter 16 #CD cycles 4
iter 17 #CD cycles 3
iter 18 #CD cycles 4
iter 19 #CD cycles 1
iter 20 #CD cycles 75
iter 21 #CD cycles 45
iter 22 #CD cycles 22
iter 23 #CD cycles 1
iter 24 #CD cycles 129
=========================
optimization finished, #iter = 24
Objective value = 913.979524
#nonzeros/#features = 70/261
[LibLinear]iter 1 #CD cycles 1
iter 2 #CD cycles 3
iter 3 #CD cycles 1
iter 4 #CD cycles 4
iter 5 #CD cycles 3
iter 6 #CD cycles 1
iter 7 #CD cycles 16
iter 8 #CD cycles 1
iter 9 #CD cycles 97
iter 10 #CD cycles 15
iter 11 #CD cycles 2
iter 12 #CD cycles 1
iter 13 #CD cycles 154
iter 14 #CD cycles 10
iter 15 #CD cycles 1
iter 16 #CD cycles 381
iter 17 #CD cycles 12
iter 18 #CD cycles 3
iter 19 #CD cycles 3
iter 20 #CD cycles 10
iter 21 #CD cycles 9
iter 22 #CD cycles 2
iter 23 #CD cycles 10
iter 24 #CD cycles 1
iter 25 #CD cycles 176
iter 26 #CD cycles 30
iter 27 #CD cycles 8
iter 28 #CD cycles 16
iter 29 #CD cycles 13
iter 30 #CD cycles 4
iter 31 #CD cycles 3
iter 32 #CD cycles 3
iter 33 #CD cycles 2
iter 34 #CD cycles 1
iter 35 #CD cycles 171
=========================
optimization finished, #iter = 35
Objective value = 6632.705162
#nonzeros/#features = 188/261
[LibLinear]iter 1 #CD cycles 1
iter 2 #CD cycles 2
iter 3 #CD cycles 1
iter 4 #CD cycles 8
iter 5 #CD cycles 1
iter 6 #CD cycles 15
iter 7 #CD cycles 2
iter 8 #CD cycles 1
iter 9 #CD cycles 133
iter 10 #CD cycles 8
iter 11 #CD cycles 10
iter 12 #CD cycles 1
iter 13 #CD cycles 226
iter 14 #CD cycles 7
iter 15 #CD cycles 2
iter 16 #CD cycles 5
iter 17 #CD cycles 4
iter 18 #CD cycles 5
iter 19 #CD cycles 4
iter 20 #CD cycles 3
iter 21 #CD cycles 3
iter 22 #CD cycles 2
iter 23 #CD cycles 8
iter 24 #CD cycles 29
iter 25 #CD cycles 3
iter 26 #CD cycles 1
iter 27 #CD cycles 439
iter 28 #CD cycles 23
iter 29 #CD cycles 1
iter 30 #CD cycles 460
iter 31 #CD cycles 1
iter 32 #CD cycles 599
=========================
optimization finished, #iter = 32
Objective value = 49991.031555
#nonzeros/#features = 244/261
[LibLinear]iter 1 #CD cycles 1
iter 2 #CD cycles 3
iter 3 #CD cycles 1
iter 4 #CD cycles 6
iter 5 #CD cycles 1
iter 6 #CD cycles 13
iter 7 #CD cycles 1
iter 8 #CD cycles 108
iter 9 #CD cycles 12
iter 10 #CD cycles 1
iter 11 #CD cycles 292
iter 12 #CD cycles 39
iter 13 #CD cycles 8
iter 14 #CD cycles 2
iter 15 #CD cycles 5
iter 16 #CD cycles 7
iter 17 #CD cycles 5
iter 18 #CD cycles 6
iter 19 #CD cycles 4
iter 20 #CD cycles 8
iter 21 #CD cycles 1
iter 22 #CD cycles 509
iter 23 #CD cycles 57
iter 24 #CD cycles 103
iter 25 #CD cycles 8
iter 26 #CD cycles 8
iter 27 #CD cycles 6
iter 28 #CD cycles 3
iter 29 #CD cycles 1
iter 30 #CD cycles 902
iter 31 #CD cycles 10
iter 32 #CD cycles 237
iter 33 #CD cycles 15
iter 34 #CD cycles 4
iter 35 #CD cycles 10
iter 36 #CD cycles 4
iter 37 #CD cycles 4
iter 38 #CD cycles 64
iter 39 #CD cycles 7
iter 40 #CD cycles 56
iter 41 #CD cycles 1
WARNING: reaching max number of inner iterations
iter 42 #CD cycles 1000
WARNING: reaching max number of inner iterations
iter 43 #CD cycles 1000
=========================
optimization finished, #iter = 43
Objective value = 384337.472655
#nonzeros/#features = 254/261
[LibLinear]iter 1 #CD cycles 1
iter 2 #CD cycles 3
iter 3 #CD cycles 1
iter 4 #CD cycles 8
iter 5 #CD cycles 1
iter 6 #CD cycles 14
iter 7 #CD cycles 1
iter 8 #CD cycles 87
iter 9 #CD cycles 10
iter 10 #CD cycles 34
iter 11 #CD cycles 1
iter 12 #CD cycles 119
iter 13 #CD cycles 21
iter 14 #CD cycles 4
iter 15 #CD cycles 3
iter 16 #CD cycles 6
iter 17 #CD cycles 4
iter 18 #CD cycles 3
iter 19 #CD cycles 17
iter 20 #CD cycles 19
iter 21 #CD cycles 20
iter 22 #CD cycles 11
iter 23 #CD cycles 1
iter 24 #CD cycles 101
iter 25 #CD cycles 345
iter 26 #CD cycles 1
WARNING: reaching max number of inner iterations
iter 27 #CD cycles 1000
iter 28 #CD cycles 803
iter 29 #CD cycles 8
iter 30 #CD cycles 10
iter 31 #CD cycles 95
iter 32 #CD cycles 187
iter 33 #CD cycles 11
iter 34 #CD cycles 20
iter 35 #CD cycles 8
iter 36 #CD cycles 13
iter 37 #CD cycles 1
WARNING: reaching max number of inner iterations
iter 38 #CD cycles 1000
=========================
optimization finished, #iter = 38
Objective value = 2971751.337309
#nonzeros/#features = 255/261
[LibLinear]iter 1 #CD cycles 1
iter 2 #CD cycles 4
iter 3 #CD cycles 1
iter 4 #CD cycles 6
iter 5 #CD cycles 2
iter 6 #CD cycles 1
iter 7 #CD cycles 10
iter 8 #CD cycles 2
iter 9 #CD cycles 1
iter 10 #CD cycles 43
iter 11 #CD cycles 55
iter 12 #CD cycles 12
iter 13 #CD cycles 6
iter 14 #CD cycles 10
iter 15 #CD cycles 1
iter 16 #CD cycles 43
iter 17 #CD cycles 23
iter 18 #CD cycles 64
iter 19 #CD cycles 24
iter 20 #CD cycles 47
iter 21 #CD cycles 21
iter 22 #CD cycles 27
iter 23 #CD cycles 4
iter 24 #CD cycles 6
iter 25 #CD cycles 7
iter 26 #CD cycles 5
iter 27 #CD cycles 5
iter 28 #CD cycles 12
iter 29 #CD cycles 12
iter 30 #CD cycles 1
iter 31 #CD cycles 166
iter 32 #CD cycles 70
iter 33 #CD cycles 52
iter 34 #CD cycles 10
iter 35 #CD cycles 47
iter 36 #CD cycles 17
iter 37 #CD cycles 31
iter 38 #CD cycles 11
iter 39 #CD cycles 4
iter 40 #CD cycles 17
iter 41 #CD cycles 19
iter 42 #CD cycles 4
iter 43 #CD cycles 24
iter 44 #CD cycles 5
iter 45 #CD cycles 16
iter 46 #CD cycles 22
iter 47 #CD cycles 18
iter 48 #CD cycles 3
iter 49 #CD cycles 5
iter 50 #CD cycles 4
iter 51 #CD cycles 5
iter 52 #CD cycles 1
WARNING: reaching max number of inner iterations
iter 53 #CD cycles 1000
iter 54 #CD cycles 324
iter 55 #CD cycles 43
iter 56 #CD cycles 28
iter 57 #CD cycles 6
iter 58 #CD cycles 70
iter 59 #CD cycles 2
iter 60 #CD cycles 20
iter 61 #CD cycles 56
iter 62 #CD cycles 46
iter 63 #CD cycles 66
iter 64 #CD cycles 77
iter 65 #CD cycles 1
WARNING: reaching max number of inner iterations
iter 66 #CD cycles 1000
=========================
optimization finished, #iter = 66
Objective value = 23004120.828086
#nonzeros/#features = 255/261
[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 7.0s finished
Let’s look at the regularization parameter chosen and the non-zero coefficients.
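If we just want the selected penalty itself, we can print it directly; this one-line check assumes lasso_mod is the fitted cross-validated lasso (LogisticRegressionCV) from above.
# C_ holds the regularization strength chosen by cross-validation (one entry per class)
print(lasso_mod.C_)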
# plots illustrating regularization parameter choice
scores=lasso_mod.scores_[1.0].mean(axis=0)
logpenalties=np.log(lasso_mod.Cs_)
nnonzero=(np.abs(lasso_mod.coefs_paths_[1.0])>1e-6).sum(axis=2).mean(axis=0)
colors=plt.rcParams["axes.prop_cycle"].by_key()["color"]
fig, ax1 = plt.subplots()
ax1.plot(logpenalties,scores, color=colors[0])
ax1.set_ylabel("CV log likelihood", color=colors[0])
ax1.set_xlabel("-log(penalty)")
ax1.tick_params('y', colors=colors[0])
ax2 = ax1.twinx()
ax2.plot(logpenalties,nnonzero, color=colors[1])
ax2.set_ylabel("nonzero coefficients", color=colors[1])
ax2.tick_params('y', colors=colors[1])
ax2.grid(False);

Let’s also look at the nonzero coefficients. We should be careful about interpreting these, since relatively strong assumptions are needed for lasso to produce consistent coefficient estimates.
Note
Lasso gives accurate predictions under weaker assumptions than needed for consistent coefficient estimates.
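One symptom of this is that the selected variables can change across cross-validation folds; the short check below is a sketch that reuses the coefs_paths_ attribute already used above and is not part of the original analysis.
# nonzero-coefficient counts, one row per CV fold and one column per penalty value;
# differences across rows indicate unstable variable selection
fold_nonzero = (np.abs(lasso_mod.coefs_paths_[1.0]) > 1e-6).sum(axis=2)
print(fold_nonzero)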
# table of nonzero coefficients
coef = pd.DataFrame(index = X.design_info.column_names, columns=["Value"])
coef.Value = np.transpose(lasso_mod.coef_)
print(sum(np.abs(coef.Value)>1.0e-8))
with pd.option_context('display.max_rows', None):
    display(coef[np.abs(coef.Value)>1.0e-8])
81
Value | |
---|---|
sex[T.Male] | 0.724575 |
C(priors_count)[T.2] | 0.325190 |
C(priors_count)[T.3] | 0.612226 |
C(priors_count)[T.4] | 0.749692 |
C(priors_count)[T.5] | 0.732777 |
C(priors_count)[T.6] | 0.792317 |
C(priors_count)[T.7] | 1.032678 |
C(priors_count)[T.8] | 1.322562 |
C(priors_count)[T.9] | 1.286934 |
C(priors_count)[T.10] | 1.318435 |
C(priors_count)[T.11] | 1.236906 |
C(priors_count)[T.12] | 0.985704 |
C(priors_count)[T.13] | 1.528308 |
C(priors_count)[T.14] | 0.406007 |
C(priors_count)[T.15] | 0.781431 |
C(priors_count)[T.16] | 0.131662 |
C(priors_count)[T.17] | 0.371005 |
C(priors_count)[T.18] | 1.524261 |
C(priors_count)[T.19] | 1.143306 |
C(priors_count)[T.20] | 0.116828 |
C(priors_count)[T.21] | 1.298520 |
C(priors_count)[T.22] | 1.676819 |
C(priors_count)[T.23] | 1.332181 |
C(priors_count)[T.25] | 0.481923 |
C(priors_count)[T.26] | 0.229653 |
c_charge_degree[T.M] | -0.228174 |
charge_cat[T.Aggrav Battery w/Deadly Weapon] | -0.152888 |
charge_cat[T.Aggravated Assault W/dead Weap] | -0.457763 |
charge_cat[T.Aggravated Battery] | -0.426239 |
charge_cat[T.Battery] | -0.008881 |
charge_cat[T.Battery on Law Enforc Officer] | 0.312822 |
charge_cat[T.Burglary Unoccupied Dwelling] | -0.151930 |
charge_cat[T.Criminal Mischief] | 0.441646 |
charge_cat[T.DUI Level 0.15 Or Minor In Veh] | -0.043810 |
charge_cat[T.DUI Property Damage/Injury] | -0.957457 |
charge_cat[T.Driving License Suspended] | -0.062542 |
charge_cat[T.Driving Under The Influence] | -0.141248 |
charge_cat[T.Felony Driving While Lic Suspd] | 0.182793 |
charge_cat[T.Felony Petit Theft] | 0.697390 |
charge_cat[T.Grand Theft (Motor Vehicle)] | 0.134826 |
charge_cat[T.Operating W/O Valid License] | -0.084294 |
charge_cat[T.Petit Theft] | 0.574963 |
charge_cat[T.Petit Theft $100- $300] | 0.019326 |
charge_cat[T.Poss Pyrrolidinovalerophenone] | 2.135469 |
charge_cat[T.Poss3,4 Methylenedioxymethcath] | -0.186717 |
charge_cat[T.Possession of Cannabis] | -0.394867 |
charge_cat[T.Possession of Cocaine] | 0.595590 |
charge_cat[T.Viol Injunct Domestic Violence] | 0.135584 |
charge_cat[T.Viol Pretrial Release Dom Viol] | 0.231026 |
charge_cat[T.arrest case no charge] | -0.181103 |
charge_cat[T.other] | -0.196719 |
sex[T.Male]:C(priors_count)[T.1] | 0.034422 |
sex[T.Male]:C(priors_count)[T.3] | 0.087813 |
sex[T.Male]:C(priors_count)[T.6] | 0.036024 |
sex[T.Male]:C(priors_count)[T.10] | 0.200309 |
sex[T.Male]:C(priors_count)[T.14] | 0.709787 |
sex[T.Male]:C(priors_count)[T.16] | 1.176643 |
sex[T.Male]:C(priors_count)[T.17] | 1.115193 |
sex[T.Male]:C(priors_count)[T.26] | 0.678018 |
sex[T.Male]:c_charge_degree[T.M] | 0.163773 |
sex[T.Male]:charge_cat[T.Aggravated Battery / Pregnant] | 0.024302 |
sex[T.Male]:charge_cat[T.Burglary Conveyance Unoccup] | 0.516816 |
sex[T.Male]:charge_cat[T.Corrupt Public Servant] | 0.717907 |
sex[T.Male]:charge_cat[T.Crimin Mischief Damage $1000+] | 0.044230 |
sex[T.Male]:charge_cat[T.Driving License Suspended] | -0.000916 |
sex[T.Male]:charge_cat[T.Driving Under The Influence] | -0.034986 |
sex[T.Male]:charge_cat[T.Driving While License Revoked] | -0.094708 |
sex[T.Male]:charge_cat[T.False Ownership Info/Pawn Item] | 0.480690 |
sex[T.Male]:charge_cat[T.Felony Battery w/Prior Convict] | -0.096459 |
sex[T.Male]:charge_cat[T.Felony Driving While Lic Suspd] | 0.114968 |
sex[T.Male]:charge_cat[T.Grand Theft in the 3rd Degree] | 0.277290 |
sex[T.Male]:charge_cat[T.Petit Theft] | 0.199594 |
sex[T.Male]:charge_cat[T.Possession of Cannabis] | -0.051178 |
sex[T.Male]:charge_cat[T.Resist/Obstruct W/O Violence] | 0.272849 |
sex[T.Male]:charge_cat[T.Viol Injunct Domestic Violence] | 0.325852 |
age | -0.024234 |
sex[T.Male]:age | -0.014789 |
juv_fel_count | 0.302099 |
sex[T.Male]:juv_misd_count | 0.181025 |
juv_other_count | 0.149167 |
sex[T.Male]:juv_other_count | -0.019193 |
Now, let’s look at calibration and balance using similar tables and figures as we did above.
output, given_outcome, given_pred = cm_tables(
    lasso_mod.predict(X_test),
    y_test,
    df_test
)
display(output)
display(given_pred)
display(given_outcome)
calibration_plot(lasso_mod.predict_proba(X_test)[:,1],y_test, df_test)
balance_threshold_plot(lasso_mod.predict_proba(X_test)[:,1],y_test, df_test);
overall | African-American | Caucasian | Hispanic | Female | Male | |
---|---|---|---|---|---|---|
Portion_of_NoRecid_and_LowRisk | 0.413082 | 0.334395 | 0.490000 | 0.593548 | 0.602305 | 0.364444 |
Portion_of_Recid_and_LowRisk | 0.140247 | 0.159236 | 0.121667 | 0.096774 | 0.043228 | 0.165185 |
Portion_of_NoRecid_and_HighRisk | 0.190925 | 0.182590 | 0.201667 | 0.200000 | 0.250720 | 0.175556 |
Portion_of_Recid_and_HighRisk | 0.255745 | 0.323779 | 0.186667 | 0.109677 | 0.103746 | 0.294815 |
overall | African-American | Caucasian | Hispanic | Female | Male | |
---|---|---|---|---|---|---|
P(NoRecid|LowRisk) | 0.683902 | 0.646817 | 0.708434 | 0.747967 | 0.706081 | 0.674897 |
P(NoRecid|HighRisk) | 0.354167 | 0.329670 | 0.394595 | 0.468750 | 0.294118 | 0.359098 |
P(Recid|LowRisk) | 0.316098 | 0.353183 | 0.291566 | 0.252033 | 0.293919 | 0.325103 |
P(Recid|HighRisk) | 0.645833 | 0.670330 | 0.605405 | 0.531250 | 0.705882 | 0.640902 |
overall | African-American | Caucasian | Hispanic | Female | Male | |
---|---|---|---|---|---|---|
P(LowRisk|NoRecid) | 0.746539 | 0.677419 | 0.801090 | 0.859813 | 0.933036 | 0.688112 |
P(HighRisk|NoRecid) | 0.253461 | 0.322581 | 0.198910 | 0.140187 | 0.066964 | 0.311888 |
P(LowRisk|Recid) | 0.427441 | 0.360587 | 0.519313 | 0.645833 | 0.707317 | 0.373228 |
P(HighRisk|Recid) | 0.572559 | 0.639413 | 0.480687 | 0.354167 | 0.292683 | 0.626772 |


As with the COMPAS score, our predictions are well-calibrated, but the false negative and false positive rates are not well balanced across racial groups.
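To attach a number to that imbalance, the short sketch below computes the false positive rate by race directly; it assumes lasso_mod, X_test, y_test, and df_test are as constructed above and share the same row order, and the values should roughly match the P(HighRisk|NoRecid) row of the last table.
# hedged check: false positive rate of the lasso predictions, by race
pred = lasso_mod.predict(X_test)
y_arr = np.asarray(y_test).ravel()
for race in ["African-American", "Caucasian"]:
    no_recid = (df_test.race == race).to_numpy() & (y_arr == 0)
    print(race, "false positive rate:", pred[no_recid].mean())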
Exercise
See exercise 3 in the exercise list.
Regularizing to Maximize Balance#
Trying to improve balance through ad hoc modifications will be difficult.
Let’s try to do it more systematically.
We usually select models and choose regularization to minimize prediction errors.
We can just as well select models and regularization parameters to optimize some other criteria.
Let’s choose the regularization parameter for lasso to maximize balance.
# define a custom CV criterion to maximize
def balance_scorer(y_true, prob, df, weights):
    df_cv = df.loc[y_true.index.values, :]
    b = df_cv.race == "African-American"
    w = df_cv.race == "Caucasian"
    y_pred = 1*(prob > 0.5)
    # false positive and false negative rates by race
    fprb = np.mean(y_pred[(y_true == 0) & b])
    fprw = np.mean(y_pred[(y_true == 0) & w])
    fnrb = np.mean(y_pred[(y_true == 1) & b] == 0)
    fnrw = np.mean(y_pred[(y_true == 1) & w] == 0)
    # reward small differences in false positive/negative rates across races and,
    # depending on the weights, a good fit as measured by log loss
    return(-weights[0]*(fprb - fprw)**2 +
           -weights[1]*(fnrb - fnrw)**2 +
           -weights[2]*(metrics.log_loss(y_true, prob, normalize=True)))
score_params = {"df": df_train, "weights": [10.0, 1.0, 0.0]}
scorer = metrics.make_scorer(balance_scorer, **score_params, needs_proba=True)
grid_cv = model_selection.GridSearchCV(
    estimator=linear_model.LogisticRegression(penalty="l1",
                                              max_iter=100,
                                              solver="liblinear"),
    scoring=scorer,
    cv=5,
    param_grid={'C': np.exp(np.linspace(-10, 10, 10))},
    return_train_score=True,
    verbose=True,
    refit=True,
)
balance_mod=grid_cv.fit(X_train,y_train)
Fitting 5 folds for each of 10 candidates, totalling 50 fits
# plots illustrating regularization parameter choice
def grid_cv_plot(mod, ylabel=""):
    scores = mod.cv_results_["mean_test_score"]
    Cdict = mod.cv_results_["params"]
    logpenalties = np.log([d['C'] for d in Cdict])
    colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]
    fig, ax1 = plt.subplots()
    ax1.plot(logpenalties, scores, color=colors[0])
    ax1.set_ylabel(ylabel, color=colors[0])
    ax1.set_xlabel("-log(penalty)")
    ax1.tick_params('y', colors=colors[0]);
grid_cv_plot(balance_mod,"CV balance score")

We can achieve perfect balance by making the regularization parameter very large.
Unfortunately, doing so makes every prediction identical, so the predictions are no longer useful.
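A quick way to confirm this is to print the penalty that cross-validation selected and the distinct predicted classes on the test sample; this is a sketch using the fitted balance_mod from above.
# hedged check: the selected C should be very small (a heavy penalty), and the refit
# model should assign the same class to every test observation
print(balance_mod.best_params_)
print(np.unique(balance_mod.best_estimator_.predict(X_test)))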
output, given_outcome, given_pred = cm_tables(
    balance_mod.best_estimator_.predict(X_test),
    y_test,
    df_test
)
display(output)
display(given_pred)
display(given_outcome)
overall | African-American | Caucasian | Hispanic | Female | Male | |
---|---|---|---|---|---|---|
Portion_of_NoRecid_and_LowRisk | 0.553329 | 0.493631 | 0.611667 | 0.690323 | 0.645533 | 0.52963 |
Portion_of_Recid_and_LowRisk | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00000 |
Portion_of_NoRecid_and_HighRisk | 0.446671 | 0.506369 | 0.388333 | 0.309677 | 0.354467 | 0.47037 |
Portion_of_Recid_and_HighRisk | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00000 |
overall | African-American | Caucasian | Hispanic | Female | Male | |
---|---|---|---|---|---|---|
P(NoRecid|LowRisk) | 0.553329 | 0.493631 | 0.611667 | 0.690323 | 0.645533 | 0.52963 |
P(NoRecid|HighRisk) | NaN | NaN | NaN | NaN | NaN | NaN |
P(Recid|LowRisk) | 0.446671 | 0.506369 | 0.388333 | 0.309677 | 0.354467 | 0.47037 |
P(Recid|HighRisk) | NaN | NaN | NaN | NaN | NaN | NaN |
overall | African-American | Caucasian | Hispanic | Female | Male | |
---|---|---|---|---|---|---|
P(LowRisk|NoRecid) | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
P(HighRisk|NoRecid) | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
P(LowRisk|Recid) | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
P(HighRisk|Recid) | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
What if we change our CV scoring function to care about both prediction and balance?
score_params = {"df": df_train, "weights": [10.0, 1.0, 5.0]}
grid_cv.set_params(scoring=metrics.make_scorer(balance_scorer, **score_params, needs_proba=True))
bf_mod=grid_cv.fit(X_train,y_train)
grid_cv_plot(bf_mod,"CV balance & fit")
output, given_outcome, given_pred = cm_tables(
    bf_mod.best_estimator_.predict(X_test),
    y_test,
    df_test
)
display(output)
display(given_pred)
display(given_outcome)
calibration_plot(bf_mod.best_estimator_.predict_proba(X_test)[:,1],y_test, df_test)
balance_threshold_plot(bf_mod.best_estimator_.predict_proba(X_test)[:,1],y_test, df_test);
Fitting 5 folds for each of 10 candidates, totalling 50 fits
overall | African-American | Caucasian | Hispanic | Female | Male | |
---|---|---|---|---|---|---|
Portion_of_NoRecid_and_LowRisk | 0.405421 | 0.328025 | 0.483333 | 0.574194 | 0.561960 | 0.365185 |
Portion_of_Recid_and_LowRisk | 0.147908 | 0.165605 | 0.128333 | 0.116129 | 0.083573 | 0.164444 |
Portion_of_NoRecid_and_HighRisk | 0.189747 | 0.181529 | 0.200000 | 0.200000 | 0.239193 | 0.177037 |
Portion_of_Recid_and_HighRisk | 0.256924 | 0.324841 | 0.188333 | 0.109677 | 0.115274 | 0.293333 |
overall | African-American | Caucasian | Hispanic | Female | Male | |
---|---|---|---|---|---|---|
P(NoRecid|LowRisk) | 0.681188 | 0.643750 | 0.707317 | 0.741667 | 0.701439 | 0.673497 |
P(NoRecid|HighRisk) | 0.365357 | 0.337662 | 0.405263 | 0.514286 | 0.420290 | 0.359223 |
P(Recid|LowRisk) | 0.318812 | 0.356250 | 0.292683 | 0.258333 | 0.298561 | 0.326503 |
P(Recid|HighRisk) | 0.634643 | 0.662338 | 0.594737 | 0.485714 | 0.579710 | 0.640777 |
overall | African-American | Caucasian | Hispanic | Female | Male | |
---|---|---|---|---|---|---|
P(LowRisk|NoRecid) | 0.732694 | 0.664516 | 0.790191 | 0.831776 | 0.870536 | 0.689510 |
P(HighRisk|NoRecid) | 0.267306 | 0.335484 | 0.209809 | 0.168224 | 0.129464 | 0.310490 |
P(LowRisk|Recid) | 0.424802 | 0.358491 | 0.515021 | 0.645833 | 0.674797 | 0.376378 |
P(HighRisk|Recid) | 0.575198 | 0.641509 | 0.484979 | 0.354167 | 0.325203 | 0.623622 |



Exercise
See exercise 4 in the exercise list.
Tradeoffs are Inevitable#
We could try to tweak our predictions further to improve balance.
However, motivated in part by this COMPAS example, [recidKLL+17] proved that it is impossible for any prediction algorithm to be both perfectly balanced and well-calibrated.
Improvements in balance necessarily make calibration worse.
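To get some intuition for why a tension must exist, here is a stylized back-of-the-envelope calculation; it is an illustration under simplifying assumptions, not the formal argument in [recidKLL+17].
For a classifier thresholded at some cutoff, counting true and false positives in a group with recidivism rate p gives FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR), so if two groups have different base rates, holding precision (PPV) and the false negative rate (FNR) equal across groups forces their false positive rates apart.
# stylized illustration with hypothetical numbers (not estimates from the data):
# equal PPV and FNR across groups with different base rates implies unequal FPR
def implied_fpr(p, ppv, fnr):
    # FPR implied by base rate p, precision ppv, and false negative rate fnr
    return p / (1 - p) * (1 - ppv) / ppv * (1 - fnr)

for p in [0.5, 0.4]:
    print("base rate", p, "-> implied FPR", round(implied_fpr(p, ppv=0.65, fnr=0.35), 3))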
References#
- recidDMB16
William Dieterich, Christina Mendoza, and Tim Brennan. COMPAS Risk Scales: Demonstrating Accuracy Equity and Predictive Parity. Northpointe Inc., 2016.
- recidKLL+17(1,2,3)
Jon Kleinberg, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan. Human Decisions and Machine Predictions. The Quarterly Journal of Economics, 133(1):237–293, 2017. URL: https://dx.doi.org/10.1093/qje/qjx032, doi:10.1093/qje/qjx032.
Exercises#
Exercise 1#
Can you develop a model that performs better at mimicking the COMPAS risk scores?
Exercise 2#
We made our calibration plot using a held-out test sample. What do you think would happen if we made the calibration plot using the training sample? Check and see.
# Create calibration plot using training data
Exercise 3#
Try to improve balance and/or calibration by creating an alternative prediction.
# Fit your prediction model and plot calibration and balance
Exercise 4#
Modify the cross-validation scoring function to see how it affects calibration and balance.