Giving Voice to Digital Democracies

From Computer Laboratory Group Design Projects
Revision as of 09:40, 30 October 2020 by afb21 (talk | contribs)

GVDD is a project team based at CRASSH

Brief written, waiting to confirm client: Deliberative Social Media

Potential second project: Marcus Tomalin <mt126@cam.ac.uk> suggested:

How about one of the following?

  • Developing NLP-based Visualisation Tools for Data Statements -- i.e., taking the notion of data statements (e.g., Bender and Friedman 2018) as a starting point, develop a suite of NLP-based tools that would enable biases in language-based corpora to be displayed visually
  • Developing Interactive Data Statements -- i.e., taking the notion of data statements (e.g., Bender and Friedman 2018) as a starting point, develop an interactive version of a data statement that enables the person using the data to ask and receive answers to (a constrained set of) questions about the data

These are both ideas that we have discussed within the GVDD group, but we haven't focused on these specific tasks yet (mainly because we were unable to hire a coder over the summer).

Both these projects could be constrained in ways that made them approachable for students, but they could also become as complex as the students wished.

Feedback:

I have already been discussing the broad area of dataset bias with a research fellow at Microsoft Research Cambridge, who is looking at global cultural and economic bias in the training of machine vision systems. Are you familiar with “model cards”, which appear similar in intention to the “data statements” advocated in the Bender and Friedman paper? An application of the model card approach in response to the recent “white Obama” scandal is described here:

https://thegradient.pub/pulse-lessons/

I think there are a number of potential approaches to anticipating, illustrating and correcting dataset bias, but this is a fairly active research area, and I suspect that the specific application domain of the dataset may produce considerable differences in the most appropriate design responses. Is there a particular area (with publicly available datasets) that you think might be appropriate for computer science undergraduates to work on?