WEEK 7: Bias and ethics
“(3) Objective.–The term ‘objective’, when used with respect to statistical activities, means accurate, clear, complete, and unbiased.” Sec. 3562, Coordination and oversight of policies, The Foundations for Evidence-Based Policy Making Act (1)
- Introduce the concepts of bias and fairness
The role of objectives and measures
In data science, two concepts have become increasingly important: bias and ethics. Understanding them is crucial for policy analysts working with data and artificial intelligence (AI). Bias refers to systematic errors that can distort analytical results. Ethics refers to a set of principles and rules that guide decision making; those should ideally be open and transparent. Because policy implementation inherently allocates resources from one group to another, someone gains and someone loses. And because there are no absolute criteria for what is fair and what is not, the National Academy of Public Administration has identified a clear mission and quantifiable metrics as the primary principles for public organizations tasked with enhancing policy outcomes. Indeed, both the corporate world and the public sector require a governance structure that is designed to achieve a clearly defined objective (2, 3) and that establishes the measures used to gauge success in achieving that objective (4).
Bias and data ethics in practice
Bias can creep in at every stage of data collection and analysis. In the early stages, the process by which data are generated may reflect existing inequalities, or data collection and processing may themselves be biased. The training datasets developed for machine learning, which by definition reflect a sample of the population, can be biased, as can the labelling of those datasets through human review (5, 6). The final decisions that are made can also be biased (7). These biases take several forms. Measurement bias, for instance, surfaces when the data collected do not accurately reflect the underlying concept, often because of issues in the measurement process itself or because the concept is complex (the measurement of employment is a good example (8)). Bias due to lack of representativeness can occur if the data collected are systematically different from the underlying population (311 calls about crime are likely to underrepresent poor neighborhoods). Survivorship bias, a subset of selection bias, considers only the “survivors” of a process and overlooks those who didn’t make it through (such as gender differences in publications (9)). Algorithmic bias is another critical issue; it occurs when the algorithms used for data processing or analysis are biased, often because of biased training data or assumptions in the algorithm’s design.
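One simple way to look for a lack of representativeness is to compare each group's share of the collected data with its share of the underlying population. The sketch below assumes a population benchmark is available; the group names, shares, and counts are hypothetical values invented for illustration, and `representation_ratios` is not part of any library.

```python
# Hypothetical benchmark: each group's share of the underlying population.
population_share = {"low_income": 0.30, "middle_income": 0.50, "high_income": 0.20}

# Hypothetical counts of records per group in the collected dataset.
sample_counts = {"low_income": 120, "middle_income": 520, "high_income": 360}


def representation_ratios(sample_counts, population_share):
    """Return sample share divided by population share for each group.

    A ratio near 1 means the group appears about as often as expected;
    a ratio well below 1 means the group is underrepresented in the data.
    """
    total = sum(sample_counts.values())
    return {
        group: (sample_counts[group] / total) / population_share[group]
        for group in population_share
    }


ratios = representation_ratios(sample_counts, population_share)
for group, ratio in ratios.items():
    print(f"{group}: representation ratio = {ratio:.2f}")
```

A ratio far from 1 does not by itself show that an analysis is biased, but it flags groups whose coverage deserves a closer look.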
Data ethics has been defined as “An area in AI ethics with focus on data practices with impact on people and society. It recommends responsible and sustainable data collection, processes, and storage practices. Additionally, it ensures ethical use of data” (10). It focuses on issues of ownership, data and confidentiality, consent, and transparency. Ethics issues are apparent in every aspect of policy analysis. For example, in machine learning, outcomes are typically predicted scores between zero and one, and a threshold on that score determines who is selected. In the public policy case, a high threshold will typically result in fewer people being identified as recipients of a particular policy, with lower taxpayer costs; a low threshold will do the reverse. How is that decision made, and by whom? Better data are necessary to produce more targeted policies, but that means a greater privacy risk. How can that decision be made, and by whom? Data on at-risk populations are important to better serve those populations, and to reduce the possibility of discrimination, but collecting them also introduces risks. Again, who makes those decisions?
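The effect of the threshold choice can be seen in a minimal sketch. The scores below are made-up values standing in for the output of some predictive model, and `count_selected` is a hypothetical helper written for this illustration.

```python
# Hypothetical predicted scores between zero and one, one per person.
scores = [0.05, 0.12, 0.33, 0.48, 0.51, 0.67, 0.72, 0.88, 0.91, 0.97]


def count_selected(scores, threshold):
    """Count the people whose predicted score is at or above the threshold."""
    return sum(1 for score in scores if score >= threshold)


# Raising the threshold shrinks the group identified as recipients.
for threshold in (0.3, 0.5, 0.8):
    selected = count_selected(scores, threshold)
    print(f"threshold={threshold:.1f}: {selected} of {len(scores)} selected")
```

The code can show how many people each threshold selects, but not which threshold is right; that remains a decision about costs, benefits, and who bears them.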
Frameworks for objective evidence-building
There is a long list of possible frameworks: contractual obligations, utilitarianism, rights-based, golden mean, veil of ignorance, and categorical imperatives, to cite just a few (10). Congress requires that federal statistical agencies generate and disseminate pertinent and timely information and carry out credible, accurate, and objective statistical activities. They are also tasked with safeguarding the trust of information providers by ensuring the confidentiality and exclusive statistical use of their responses (11). Numerous National Academies reports (6) have also discussed many of the issues related to framing these responsibilities. The Evidence Act provided a very concrete requirement that agencies balance the value derived from data (utility) for all groups against the protection of privacy (risk of reidentification). It also required that recommendations be made for a governance structure that would be open and transparent as to how such decisions would be made (10, 11). Navigating the landscape of data-driven decision-making in the public sector involves grappling with several complex issues. For instance, the concept of informed consent presents a paradox: it can be either comprehensive or comprehensible, and striking a balance between the two is challenging (7,8). The question of data ownership remains nebulous (2), further complicating the process. Privacy, another critical aspect, is not an absolute concept but a contextual one, with perceptions of privacy varying even with survey placement (9,11).
Balancing risk and utility
The Evidence Act makes it clear that agencies must define clear user needs and public benefits, use minimally intrusive data and tools, and provide access to data to ensure accountability and transparency in methods, while also ensuring security (2,12,13). The Act also emphasizes that access is central to developing trust in how data are combined and to identifying coverage issues, especially for underrepresented or vulnerable groups and particularly in the absence of common identifiers and usable metadata (14). Transparency and accountability are particularly critical in this context (15). The ACDEB report’s Recommendation 1.6 suggests adopting a risk-utility framework as the basis for standards on sensitivity levels, access tiers, and risk evaluations. Section 202(c) of the Evidence Act further encourages collaboration with non-Government entities, researchers, and the public to understand how data users value and use government data. It mandates that agencies engage the public in using public data assets and assist the public in expanding the use of these assets, thereby fostering a culture of openness and collaboration around public sector data.
Introduce a practical tool-kit for evaluating bias and fairness
Fairness, a concept central to ethical decision-making, can be complex to define and measure, particularly because it can be subjective and context-dependent. In essence, fairness in data science and AI refers to the idea that decisions made by these systems should be equitable and just, not favoring one group over another based on characteristics like gender, race, or ethnicity.
While it’s true that reasonable people can have differing opinions on what constitutes an ethical decision, there should be consensus on what needs to be measured and incorporated into decision-making processes. As Simon Winchester points out: “All life depends to some extent on measurement, and in the very earliest days of social organization a clear indication of advancement and sophistication was the degree to which systems of measurement had been established, codified, agreed to and employed” (16). In this context, tools like the Aequitas toolkit can be invaluable. This toolkit provides a framework for measuring the biases in any decisions that are made, offering a practical way to operationalize and quantify fairness.
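To make "measuring the biases in decisions" concrete, the sketch below computes a few group-level metrics by hand: the selection rate, the false positive rate, and the false negative rate for each group, plus a disparity ratio between groups. It is a hand-rolled illustration of the kind of quantities a bias audit reports, not the Aequitas API; the records, group labels, and function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical audit records: (group, decision, outcome).
# decision = 1 means the person was selected; outcome = 1 means the true label is positive.
records = [
    ("A", 1, 1), ("A", 1, 0), ("A", 0, 0), ("A", 0, 1), ("A", 1, 1),
    ("B", 0, 1), ("B", 0, 0), ("B", 1, 1), ("B", 0, 1), ("B", 0, 0),
]


def group_metrics(records):
    """Compute selection rate, false positive rate, and false negative rate per group."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, decision, outcome in records:
        if decision == 1:
            key = "tp" if outcome == 1 else "fp"
        else:
            key = "fn" if outcome == 1 else "tn"
        counts[group][key] += 1

    metrics = {}
    for group, c in counts.items():
        total = sum(c.values())
        positives = c["tp"] + c["fn"]  # people whose true outcome is positive
        negatives = c["fp"] + c["tn"]  # people whose true outcome is negative
        metrics[group] = {
            "selection_rate": (c["tp"] + c["fp"]) / total,
            "fpr": c["fp"] / negatives if negatives else None,
            "fnr": c["fn"] / positives if positives else None,
        }
    return metrics


metrics = group_metrics(records)

# Disparity: group B's selection rate relative to reference group A.
selection_disparity = metrics["B"]["selection_rate"] / metrics["A"]["selection_rate"]

for group, m in sorted(metrics.items()):
    print(group, m)
print(f"selection rate disparity (B relative to A): {selection_disparity:.2f}")
```

Which metric matters most depends on the policy: for example, a punitive intervention may call for attention to false positive rates, while an assistive one may call for attention to false negative rates.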
- Bias and Fairness, Chapter 11 in the textbook
- Why they’re worried, student paper for Data Science in Context
- Data Ethics, presentation to the Advisory Committee on Data for Evidence Building
- Aequitas: An open source bias toolkit (as much as you want to read)
- US Congress, editor. Foundations for Evidence-Based Policy Making Act of 2018. 115th Congress HR; 2018.
- Hand DJ. Aspects of data ethics in a changing world: Where are we now? Big Data. 2018;6(3):176-90.
- Paine LS, Srinivasan S. A guide to the big ideas and debates in corporate governance. Harvard Business Review. 2019:2-19.
- Karpoff JM. On a stakeholder model of corporate governance. Financial Management. 2021;50(2):321-43.
- Kerr S. On the folly of rewarding A, while hoping for B. Academy of Management Journal. 1975;18(4):769-83.
- National Academies of Sciences, Engineering, and Medicine. Federal statistics, multiple data sources, and privacy protection: next steps. 2018.
- Nissenbaum H. Contextual integrity up and down the data food chain. Theoretical Inquiries in Law. 2019;20(1):221-56.
- Nissenbaum H. A contextual approach to privacy online. Daedalus. 2011;140(4):32-48.
- Kreuter F, Haas G-C, Keusch F, Bähr S, Trappmann M. Collecting survey and smartphone sensor data with an app: Opportunities and challenges around privacy and informed consent. Social Science Computer Review. 2020;38(5):533-49.
- Japec L, Kreuter F, Berg M, Biemer P, Decker P, Lampe C, et al. AAPOR report on big data. American Association for Public Opinion Research. 2015.
- Sakshaug J, Tutz V, Kreuter F, editors. Placement, wording, and interviewers: Identifying correlates of consent to link survey and administrative data. Survey Research Methods; 2013.
- Hand DJ, Babb P, Zhang L-C, Allin P, Wallgren A, Wallgren B, et al. Statistical challenges of administrative and transaction data. Journal of the Royal Statistical Society Series A (Statistics in Society). 2018;181(3):555-605.
- Advisory Committee on Data for Evidence Building. Advisory Committee on Data for Evidence Building: Year 2 Report. Washington, DC; 2022.
- Chang W-Y, Garner M, Owen-Smith J, Weinberg B. A Linked Data Mosaic for Policy-Relevant Research on Science and Innovation: Value, Transparency, Rigor, and Community. Harvard Data Science Review. 2022.
- Romer P, Lane J. Interview With Paul Romer. Harvard Data Science Review. 2022;4(2).
- Winchester S. The perfectionists: how precision engineers created the modern world: HarperLuxe; 2018.
- Saleiro P, Kuester B, Hinkson L, London J, Stevens A, Anisfeld A, et al. Aequitas: A bias and fairness audit toolkit. arXiv preprint arXiv:1811.05577. 2018.