Week 1: Data Understanding & Management
“One third of a data model usually consists of common constructs that are applicable to most organizations, one third of a data model is usually industry specific, and one-third of the model is specific to the organization” (1)
- Learn more about the structure of administrative data
- Learn how to explore and manage data
- Understand the basics of data protection and the impact on data quality
Understand how to think about data
The potential to do new and exciting empirical analysis is constantly growing as a result of both new sources and new types of data (2, 3). Data are no longer just carefully produced by statistical agencies; they are also derived from a host of other sources, such as social media, credit card transactions, job listings, cameras, and cell phones, to cite just a few. Unfortunately, the potential to do really bad analysis is growing just as fast.
Education to workforce
The education-workforce use case (MultiState PostSecondary Report) is a high-priority issue for evidence-based policymaking (2). It is one of the highlighted use cases in the report of the Advisory Committee on Data for Evidence Building, which noted: “Unprecedented changes in labor markets have led to fundamental changes in skill demands. Both sets of changes underscore the need to strengthen the connection between employment services, post-secondary programs, and workforce outcomes. Building these links will help individuals decide what education paths best meet their needs and will encourage high return investments in skills that yield long-run economic security and mobility” (p. 13) (3).
There is a massive amount to learn about database management. We cannot hope to cover it all; this section is intended to identify the key ideas, make them concrete using the Kentucky (KY) example, and point to additional resources. A database is a collection of data about entities. It can be more or less structured, depending on the type of database, and it can include information about the relationships between entities. It is designed to get and link data from multiple sources, store the data so that it can be analyzed easily, quickly, and efficiently, and provide the results to other applications, people, and systems.
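To make the idea of entities and relationships concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and values are all hypothetical and invented for illustration; they are not the schema used in the KY example. The sketch stores two sources (graduates and quarterly wage records), links them on a shared person identifier, and returns a summary that could be handed to another application.

```python
import sqlite3


def build_example_db():
    """Create an in-memory database with two linked tables (all names and values are made up)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- One row per graduate (the education-side entity).
        CREATE TABLE graduates (
            person_id   TEXT PRIMARY KEY,
            institution TEXT,
            credential  TEXT,
            grad_year   INTEGER
        );
        -- One row per person per quarter (the workforce-side entity).
        -- The foreign key records the relationship between the two tables.
        CREATE TABLE wage_records (
            person_id TEXT,
            year      INTEGER,
            quarter   INTEGER,
            wages     REAL,
            FOREIGN KEY (person_id) REFERENCES graduates (person_id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO graduates VALUES (?, ?, ?, ?)",
        [
            ("p1", "College A", "Bachelor", 2020),
            ("p2", "College A", "Certificate", 2020),
            ("p3", "College B", "Bachelor", 2020),
        ],
    )
    conn.executemany(
        "INSERT INTO wage_records VALUES (?, ?, ?, ?)",
        [
            ("p1", 2021, 1, 10000.0),
            ("p1", 2021, 2, 11000.0),
            ("p2", 2021, 1, 8000.0),
            ("p3", 2021, 1, 9000.0),
            ("p3", 2022, 1, 9500.0),
        ],
    )
    conn.commit()
    return conn


def wages_by_credential(conn, year):
    """Link graduates to wage records and total wages per credential for one year."""
    query = """
        SELECT g.credential,
               COUNT(DISTINCT g.person_id) AS n_people,
               SUM(w.wages)                AS total_wages
        FROM graduates AS g
        JOIN wage_records AS w ON w.person_id = g.person_id
        WHERE w.year = ?
        GROUP BY g.credential
        ORDER BY g.credential
    """
    return conn.execute(query, (year,)).fetchall()


if __name__ == "__main__":
    conn = build_example_db()
    for row in wages_by_credential(conn, 2021):
        print(row)
```

The join on person_id is the step that corresponds to linking data from multiple sources. In practice the identifier would typically be hashed or otherwise protected before linkage, which this sketch does not attempt.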
Fixing ideas in the education-workforce use case
In the case of education to workforce, there are many complex processes that generate data, and it can easily turn into a giant plate of spaghetti. This is particularly true in education data, where there are many different institutions providing many different certifications, and individuals move in and out of programs and across institutions over time. For the purposes of this work, a very simplified data schema of the process is provided.
As discussed in the prework, the legal framework for data access requires ensuring that no individual’s information can be identified. As the Year 2 report of the Advisory Committee on Data for Evidence Building points out (p. 11), the education and workforce example can be used to show the privacy/quality tradeoff associated with data access. It also points out (p. 89) that while synthetic data can be used to protect privacy, there is a clear tradeoff between privacy and utility.
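The privacy/quality tradeoff can be illustrated with a small sketch. It uses small-cell suppression, a simpler disclosure-avoidance technique than the synthetic data discussed in the report, and is not the method used by any particular agency. The records, field names, and the threshold of 3 are all hypothetical; real suppression rules are set by the data provider.

```python
from collections import Counter


def toy_records():
    """Return a made-up list of completion records (institution, credential)."""
    cells = [
        ("A", "Bachelor", 4),
        ("A", "Certificate", 1),
        ("B", "Bachelor", 3),
        ("B", "Certificate", 2),
    ]
    return [
        {"institution": inst, "credential": cred}
        for inst, cred, n in cells
        for _ in range(n)
    ]


def suppressed_counts(records, keys, threshold):
    """Count records per cell and replace any count below the threshold with None."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}


def share_suppressed(table):
    """Fraction of cells in the table that were suppressed."""
    return sum(value is None for value in table.values()) / len(table)


if __name__ == "__main__":
    records = toy_records()
    # Coarse table: fewer cells, each with more people, so less is suppressed.
    coarse = suppressed_counts(records, ["credential"], threshold=3)
    # Fine table: more detail, but small cells must be hidden.
    fine = suppressed_counts(records, ["institution", "credential"], threshold=3)
    print(coarse, share_suppressed(coarse))
    print(fine, share_suppressed(fine))
```

With these toy records, the more detailed table loses half of its cells to suppression while the coarser one loses none: more protection for individuals in small groups comes at the cost of less usable detail.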
Exercise using the Jupyter notebook
- Census Bureau, Money Income in the United States (Income Measurement)
- Bureau of Labor Statistics, An Inventory of Employee Specific Data Collected on Unemployment Insurance Wage Records (pages 5-9)
- Hawley Josh, Lisa Neilson, Erin Joyce, Ethan Joseph, “ADRF Education and Workforce Connections Postsecondary Education Data Model Report,” working paper, pp. 12-16
Optional readings:
- Silverston L, Agnew P. The data model resource book: universal patterns for data modeling: John Wiley & Sons; 2009.
- Cunningham J, Hui A, Lane J, Putnam G. A Value-Driven Approach to Building Data Infrastructures: The example of the MidWest Collaborative. Harvard Data Science Review. 2021.
- Advisory Committee on Data for Evidence Building. Advisory Committee on Data for Evidence Building: Year 2 Report. Washington, DC; 2022.
- Kentucky Statistics. 2023 Multi State PostSecondary Report Technical Notes. 2023.
- Filippova A. Modern data spaghetti. Brave data decisions. Data work ROI. 2021. Available from: https://roundup.getdbt.com/p/data-spaghetti-brave-decisions-data-roi (accessed June 24, 2023).