Graduation Research Notes 2: Fairness Definitions in Machine Learning

This section summarizes fairness definitions and approaches from the week 1 references, as well as from the references listed in the paper <Fairness Definitions Explained>. The content of this section will be updated whenever new findings or thoughts appear.

Note: Since the definitions are collected from different sources, the examples and case analyses are not consistent across the following definitions.

1. Naïve approach – Fairness through blindness/Fairness through unawareness --- Naive

Concept: A classifier satisfies this definition if no sensitive attributes are explicitly used in the training or decision-making process. [Ref5, def4.2] This naive approach to fairness comes from the intuition that an algorithm cannot discriminate if it simply does not look at protected attributes such as race, color, religion, gender, disability, or family status. (Ref1 and Ref3, "Data as a social mirror", also mention this concept.)

Representation: X: all attributes except the sensitive attributes
d: prediction result (0 or 1 for binary classification)
Xi = Xj -> di = dj

The problem with this approach is that "it fails due to the existence of redundant encodings. There are almost always ways of predicting unknown protected attributes from other seemingly innocuous features." For example, for users of a shopping website, one can still predict a user's gender from their shopping history (a user who always buys clothes is more likely to be female) even though the gender attribute has been removed.
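One quick way to see the redundant-encoding problem in practice is to try to recover the removed sensitive attribute from the remaining features. Below is a minimal sketch on synthetic data (the column names and data-generating process are my own illustrative assumptions, not taken from any of the referenced papers):

```python
# Sketch: even after dropping the sensitive column, the remaining features
# may still predict it (redundant encoding). Synthetic data for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
gender = rng.integers(0, 2, size=n)                   # hypothetical binary gender flag
clothes_purchases = rng.poisson(lam=3 + 4 * gender)   # correlated with gender
age = rng.integers(18, 70, size=n)                    # uncorrelated with gender

X_blind = np.column_stack([clothes_purchases, age])   # gender column removed
X_tr, X_te, g_tr, g_te = train_test_split(X_blind, gender, random_state=0)

proxy = LogisticRegression().fit(X_tr, g_tr)
print("accuracy of recovering gender from 'blind' features:", proxy.score(X_te, g_te))
# Well above 50% here, i.e. the "unaware" feature set still leaks gender.
```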

Example: It has been proposed that gender-related features could be removed together with the gender feature itself. However, many gender-related features are not obvious to researchers. Identifying gender-related features is, again, subjective work, and the selection of these features could itself be biased.

One experiment: In the paper <Fairness Definitions Explained>, they claimed that they removed all features related to the sensitive attributes and trained on this new dataset. The classifier is fair if the classification outcomes are the same for applicants i and j who have the same attributes.

Conclusion: 1) Moritz's claim on this concept: "In fact, when the protected attribute is correlated with a particular classification outcome, this is precisely what we should expect. There is no principled way to tell at which point such a correlation is worrisome and in what cases it is acceptable." Therefore, one cannot simply remove sensitive attributes and the features correlated with them, because they carry useful predictive information. 2) If one removed all features related to the sensitive attributes, the accuracy would drop, as expected. Removing features / training with unawareness is not a good approach.

2. Demographic Parity[Ref1]/Group Fairness[Ref5.def3.1.1] --- Not good

Concept: Demographic parity requires that a decision (such as accepting or denying a loan application) be independent of the protected attribute. [Ref1] Equivalently, a classifier satisfies this definition if subjects in both the protected and unprotected groups have equal probability of being assigned to the positive predicted class. [Ref5.def3.1]

Representation: G: gender attribute (m: male; f: female) (Let gender be sensitive feature)
d: prediction result (0 or 1 for binary classification)
P(d=1|G=m) = P(d=1|G=f) [Ref5]
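As a direct reading of this formula, the check below computes P(d=1|G=m) and P(d=1|G=f) from arrays of predictions and group labels (the function name and the toy data are my own illustrative choices):

```python
import numpy as np

def demographic_parity_gap(d, g):
    """Return |P(d=1|G=m) - P(d=1|G=f)| for binary predictions d
    and a gender array g containing 'm' / 'f'."""
    d, g = np.asarray(d), np.asarray(g)
    return abs(d[g == "m"].mean() - d[g == "f"].mean())

# Toy usage: the 100-male / 10-female scenario discussed in problem 1 below.
d = np.array([1] * 90 + [0] * 10 + [1] * 9 + [0] * 1)
g = np.array(["m"] * 100 + ["f"] * 10)
print(demographic_parity_gap(d, g))   # 0.0 -> parity holds despite the group imbalance
```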

Problem1: Demographic parity doesn’t ensure fairness[Ref1] / The sample size disparity[Ref3]

Let me take an example to illustrate this: suppose there are 100 male applicants and 10 female applicants, and 90 male applicants are predicted as positive (d=1) while 9 female applicants are predicted as positive (d=1). Then the fraction of applicants predicted as positive is the same for both groups (90/100 = 9/10 = 90%). However, only 9 female applicants receive positive results, compared to 90 male applicants.

The above scenario can arise naturally when there is less training data available about a minority group. Therefore, as Moritz claimed, "I would argue that there's a general tendency for automated decisions to favor those who belong to the statistically dominant groups."

Problem2: Demographic parity cripples machine learning[Ref1] / The competition between accuracy and fairness

In real situations, "the target variable Y usually has some positive or negative correlation with membership in the protected group A. This isn't by itself a cause for concern as interests naturally vary from one group to another." This statement says that in many cases it is reasonable that P(d=1|G=m) != P(d=1|G=f), since different outcomes are expected from different groups. Thus simply adopting demographic parity as a general measure of fairness is misaligned with the fundamental goal of achieving higher prediction accuracy. The general trend is that if a classifier is forced to satisfy this notion of fairness (i.e., to fit minority groups), its accuracy decreases.

To conclude, demographic parity is not suitable as a general measure of fairness because demographic differences are expected and reasonable. Simply imposing this definition therefore does not improve fairness in any real, logical sense, and it also hurts accuracy.

Example of problem 2: When predicting medical conditions, the gender attribute is important to consider. Heart failure occurs more often in men than in women, so a difference between the prediction outputs of the two groups is expected. It is neither realistic nor desirable to prevent all correlation between the predicted outcome and group membership.

One experiment: In the paper <Fairness Definitions Explained>, they calculated the probability of a male being predicted as 1 and the probability of a female being predicted as 1, and checked whether these two probabilities are similar within a reasonable range.

Conclusion: This definition does not ensure an equal spread of positive results across groups (as explained in problem 1), and it cannot be applied to many real cases, since different outcomes across groups are expected.

3. Conditional statistical parity [Ref5.def3.1.2] -- An extension of group fairness (definition 2)

Concept: The principle is similar to definition 2, but the sampled groups are narrowed by filtering out the samples that do not satisfy the legitimate factors L.

Representation: G: gender attribute (m: male; f: female) (Let gender be sensitive feature)
d: prediction result (0 or 1 for binary classification)
L: legitimate factors, a subset of non-sensitive attributes.
P(d=1|L=l, G=m) = P(d=1|L=l, G=f) [Ref5]
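A sketch of how this conditional check could be computed: within each stratum defined by the legitimate factors L, compare the positive prediction rates of the two gender groups. The column names below (e.g. "credit_amount") are hypothetical, loosely following the legitimate factors mentioned in the experiment below:

```python
import pandas as pd

def conditional_parity_gaps(df, legit_cols, group_col="gender", pred_col="d"):
    """For each stratum l of the legitimate factors L, return
    |P(d=1|L=l,G=m) - P(d=1|L=l,G=f)|. Column names are hypothetical."""
    gaps = {}
    for l_value, stratum in df.groupby(legit_cols):
        rates = stratum.groupby(group_col)[pred_col].mean()
        if {"m", "f"} <= set(rates.index):
            gaps[l_value] = abs(rates["m"] - rates["f"])
    return gaps

# Toy usage with a single hypothetical legitimate factor.
df = pd.DataFrame({
    "gender":        ["m", "m", "f", "f", "m", "f"],
    "credit_amount": ["low", "low", "low", "low", "high", "high"],
    "d":             [1, 0, 1, 0, 1, 1],
})
print(conditional_parity_gaps(df, ["credit_amount"]))   # gap per credit_amount stratum
```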

Problems: Same as the problems of the group fairness definition.

One experiment: In paper <Fairness Definitions Explained>, the legitimate factors are chosen to be credit amount, credit history, employment and age.

When should this definition be applied? / Why do we need the legitimate factors L?

4. Confusion Matrix [Ref5. Def3.2]

Concept: A confusion matrix is a table with two rows and two columns that reports the number of false positives (FP), false negatives (FN), true positives (TP), and true negatives (TN). This specific table layout allows visualization of the performance of an algorithm. From the four basic counts (TP, FP, FN, TN), 8 more metrics can be derived.

[Figure: confusion matrix]

For the detailed meaning of these 12 metrics, please check Section 3 (Statistical Metrics) of the paper <Fairness Definitions Explained>, pages 2 and 3.
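As a quick reference for the rate-based metrics used in the definitions below, this sketch derives the common rates from the four basic counts with scikit-learn (the toy labels are illustrative):

```python
from sklearn.metrics import confusion_matrix

def basic_rates(y_true, y_pred):
    """Derive common rates from the four confusion-matrix counts."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "TPR": tp / (tp + fn),   # true positive rate (recall / sensitivity)
        "FPR": fp / (fp + tn),   # false positive rate
        "TNR": tn / (tn + fp),   # true negative rate (specificity)
        "FNR": fn / (fn + tp),   # false negative rate
        "PPV": tp / (tp + fp),   # positive predictive value (precision)
        "NPV": tn / (tn + fn),   # negative predictive value
    }

print(basic_rates([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```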

There are 7 fairness definitions and measurements defined with the help of the confusion matrix, which are explained in Section 3.2 of the paper <Fairness Definitions Explained>. However, only a few of these definitions are worth analyzing here.

Reference: Moritz Hardt, Eric Price, and Nati Srebro. 2016. Equality of Opportunity in Supervised Learning. In Advances in Neural Information Processing Systems

4.1 Equalized odds (Equal TPR and FPR)

Concept: Individuals, whether in the positive label group or the negative label group, should have a similar chance of being classified as positive, regardless of their sensitive feature.

Representation: d: prediction result (0 or 1 for binary classification)
G: gender attribute (m: male; f: female) (Let gender be sensitive feature)
Y: True label (0 or 1 for binary classification)
P(d=1|Y=y, G=f) = P(d=1|Y=y, G=m), y = 0 or 1
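A minimal sketch of how this condition could be checked on held-out data: compute P(d=1|Y=y, G=g) for each group and each y, and compare (the helper name and toy data are my own illustrative choices):

```python
import numpy as np

def equalized_odds_gaps(y_true, d, g):
    """Return the TPR gap (y=1) and FPR gap (y=0) between groups 'm' and 'f':
    |P(d=1|Y=y,G=m) - P(d=1|Y=y,G=f)| for y in {1, 0}."""
    y_true, d, g = map(np.asarray, (y_true, d, g))
    gaps = {}
    for y in (1, 0):
        rate = {grp: d[(g == grp) & (y_true == y)].mean() for grp in ("m", "f")}
        gaps["TPR gap" if y == 1 else "FPR gap"] = abs(rate["m"] - rate["f"])
    return gaps

# Toy usage: both gaps are 0 here, so equalized odds holds on this toy data.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
d      = [1, 0, 0, 1, 1, 0, 0, 1]
g      = ["m", "m", "m", "m", "f", "f", "f", "f"]
print(equalized_odds_gaps(y_true, d, g))
```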

4.2 Equalized opportunity (Equal TPR)

Concept: We are often mainly interested in the positive group (Y = 1), such as 'not defaulting on a loan', 'receiving a promotion', or 'being admitted'. This is a relaxation of equalized odds which restricts the condition to the positive outcome group. A classifier is fair under equalized opportunity if the fraction of individuals in the positive group who are predicted to be positive is the same, regardless of their sensitive feature. "This approach is the idea that individuals who qualify for a desirable outcome should have an equal chance of being correctly classified for this outcome." [Moritz]

Representation: d: prediction result (0 or 1 for binary classification)
G: gender attribute (m: male; f: female) (Let gender be sensitive feature)
Y: True label (0 or 1 for binary classification)
P(d=1|Y=1, G=f) = P(d=1|Y=1, G=m)

Conclusion: The above definitions, equalized odds and equalized opportunity (especially the latter), are very important concepts in model fairness. Their advantages and disadvantages are mentioned in the conclusion of the paper <Equality of Opportunity in Supervised Learning>.

Advantage: 1. This measure is applied in a post-processing manner (a minimal sketch of this idea is given after this list). Therefore, it is simpler and more efficient than pre-processing measures, such as fairness through unawareness, where sensitive attributes must be removed before training. This property also helps preserve privacy.

2. From the representation above, a better classifier (better accuracy and fairness under equalized odds/opportunity) can be built by collecting features that better capture the target, independently of their correlation with the protected attribute. Thus it is fully aligned with the central goal of building higher-accuracy classifiers.

3. This measure avoids the conceptual shortcomings of demographic parity.

Disadvantage: 1. Labeled data is not always available. Thus, this measure applies only to supervised learning. However, the broad success of supervised learning demonstrates that this requirement is met in many important applications.

2. Labeled data is not always reliable. The measurement of the target variable might itself be unreliable or biased. Thus, this measure is more trustworthy when the labeled dataset is more trustworthy.
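As promised above, here is a minimal sketch of the post-processing idea: choose a separate score threshold per group so that the groups' true positive rates match. This is my own simplification for illustration, not the exact derived-predictor construction from <Equality of Opportunity in Supervised Learning>:

```python
import numpy as np

def equal_opportunity_thresholds(scores, y_true, g, target_tpr=0.75):
    """For each group, pick the smallest score threshold whose TPR on the
    positives (Y=1) reaches at least target_tpr. Assumes every group has
    at least one positive example. Illustrative simplification only."""
    scores, y_true, g = map(np.asarray, (scores, y_true, g))
    thresholds = {}
    for grp in np.unique(g):
        pos_scores = np.sort(scores[(g == grp) & (y_true == 1)])[::-1]
        k = int(np.ceil(target_tpr * len(pos_scores)))   # positives to accept
        thresholds[grp] = pos_scores[k - 1]
    return thresholds

# Toy usage: each group ends up with TPR = 0.75 under its own threshold.
scores = [0.9, 0.7, 0.6, 0.4, 0.8, 0.5, 0.3, 0.2]
y_true = [1, 1, 1, 1, 1, 1, 1, 1]
g      = ["m", "m", "m", "m", "f", "f", "f", "f"]
print(equal_opportunity_thresholds(scores, y_true, g))   # {'f': 0.3, 'm': 0.6}
```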

5. Causal Discrimination [Ref5.def4.1]

Reference: Sainyam Galhotra, Yuriy Brun, and Alexandra Meliou. 2017. Fairness Testing: Testing Software for Discrimination. In Proc. of ESEC/FSE’17.

Concept: A classifier is fair under causal discrimination if it predicts the same classification result for any two subjects who have exactly the same attributes X (X: all attributes except the sensitive ones) but differ in their sensitive attributes.

Motivation: The work of <Fairness Testing: Testing Software for Discrimination> is motivated by the limitations of group discrimination.

Group discrimination: Fairness is satisfied with respect to an input characteristic (input feature) when the distributions of outputs for each group are similar. For example, if 40% of purple people are classified with a positive outcome, and 40% of green people are classified with a positive outcome, then the result is considered fair. However, if the 40% of purple people are chosen randomly while the 40% of green people are chosen from the top of the group with the most savings, this fairness notion cannot detect the situation.

To address the limitations of group discrimination above, Sainyam et al. suggested causal discrimination, which says that, to be fair with respect to a set of characteristics (sensitive features), the classifier must produce the same output for every two individuals who differ only in those sensitive features. For example, take two people who are identical except for the race feature (one green, one purple), switch their race feature, and see whether the outcome changes with the change of the sensitive feature. If the two outcomes are the same, then the classifier is fair under causal discrimination.

Representation: X: all attributes except the sensitive attributes
G: gender attribute (m: male; f: female)
d: prediction result (0 or 1 for binary classification)
(Xf = Xm ^ Gf != Gm) -> df = dm
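A minimal sketch of this test (a simplification of the idea behind the Fairness Testing paper, not its actual Themis tool): flip the binary sensitive column of each input and check whether the model's prediction changes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def causal_discrimination_rate(model, X, gender_col):
    """Fraction of inputs whose prediction changes when only the (binary)
    sensitive column is flipped. 0.0 means no causal discrimination was
    detected on this sample."""
    X = np.asarray(X, dtype=float)
    X_flipped = X.copy()
    X_flipped[:, gender_col] = 1 - X_flipped[:, gender_col]   # flip 0 <-> 1
    return float(np.mean(model.predict(X) != model.predict(X_flipped)))

# Toy usage on synthetic data (column 0 is a hypothetical binary gender flag).
rng = np.random.default_rng(0)
X = rng.random((500, 3))
X[:, 0] = rng.integers(0, 2, size=500)
y = (X[:, 1] + X[:, 2] > 1).astype(int)     # the label itself ignores gender
model = LogisticRegression().fit(X, y)
print(causal_discrimination_rate(model, X, gender_col=0))
```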

6. Fairness through awareness [Ref5.def4.3]

Reference: Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. 2012. Fairness Through Awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference.

Concept: This is an individual-based fairness notion. The central principle of this definition is that "two individuals who are similar with respect to a particular task should be classified similarly" [Fairness Through Awareness]. The similarity between individuals is defined by a distance metric. The choice of distance metric is assumed to be 'public and open to discussion and continual refinement', and of course this choice is essential to the result of fairness through awareness.

Representation: This fairness condition is formalized as a Lipschitz condition. In their approach, a classifier is a randomized mapping from individuals to outcomes, i.e., a mapping from individuals to distributions over outcomes. The Lipschitz mapping is defined below:

A mapping M: V -> Δ(A) satisfies the (D, d)-Lipschitz property if, for every pair of individuals x, y ∈ V, D(M(x), M(y)) <= d(x, y).

With the help of the Lipschitz mapping, the classifier is fair if the distance between the output distributions M(x) and M(y) is no larger than the distance d(x, y) between the individuals x and y.
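To make this condition concrete, the sketch below checks D(M(x), M(y)) <= d(x, y) on randomly sampled pairs, using total variation distance for D and plain Euclidean distance for d. Both metric choices are my own illustrative assumptions; the paper deliberately leaves the task-specific metric open.

```python
import numpy as np

def lipschitz_violations(probs, features, n_pairs=1000, seed=0):
    """Count sampled pairs (x, y) that violate D(M(x), M(y)) <= d(x, y),
    where probs[i] is the model's output distribution M(x_i).
    D: total variation distance; d: Euclidean distance (illustrative choices)."""
    rng = np.random.default_rng(seed)
    probs, features = np.asarray(probs), np.asarray(features)
    violations = 0
    for _ in range(n_pairs):
        i, j = rng.integers(0, len(probs), size=2)
        D = 0.5 * np.abs(probs[i] - probs[j]).sum()       # total variation distance
        d = np.linalg.norm(features[i] - features[j])     # task similarity metric
        if D > d:
            violations += 1
    return violations

# Toy usage: random feature vectors and softmax output distributions.
rng = np.random.default_rng(1)
features = rng.random((200, 4))
logits = features @ rng.random((4, 2))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)   # M(x)
print(lipschitz_violations(probs, features))
```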

Advantage: This definition gives an individual-based fairness metric, which fills the gap left by group fairness. Moreover, fairness through awareness can be extended to measure group fairness by imposing conditions on the similarity metric.

Related open questions: Three open questions are stated in the paper <Fairness Through Awareness>; see page 19 of the paper for details. I want to note one point from the first question in my research notes. It would be convenient if both individual samples i and j came from the same group; however, it can happen that samples i and j come from different groups (one from the protected group, the other from the unprotected group). When the second situation appears, "we may need human insight and domain information" to set the distance metric.
