
Alquimics

The algorithm behind MICS

A comprehensive assessment of the impact of citizen science requires considering hundreds of input features (see our Questions page). And the analysis of hundreds of input features requires sophisticated techniques. MICS rethinks impact assessment in citizen science from the bottom up and puts technology and data at the service of people.

In MICS, data produced by the users of the platform belong to the users. MICS believes data should be considered a public good.

AI, particularly machine-learning algorithms, now helps make decisions on many issues, including impact assessment. But important decisions often happen in the dark, and the programs making them are often unaccountable. To arrive at their choices, machine-learning algorithms automatically build complex models from the data sets used as input for learning, so that even the people using them may not be able to explain why or how a conclusion was reached. They are a black box.

Software owners often claim that increasing transparency would risk revealing their intellectual property. This is not the case in MICS, where all the software is open source. But for many users, being open source is not the most critical feature; it might not even be necessary. If a project has a low positive impact, its participants do not necessarily care how the algorithm works. They want to know why the impact is low and to get some guidance on how to improve it. So the project implemented Alquimics, a new impact-assessment feature combining machine learning and a rule-based engine in MICS's open-source web application. Alquimics also includes personalised recommendations on how to improve impact.

Alquimics is a significant step forward in transparency and accountability at the intersection of citizen science and AI.

However, transparency in the impact-assessment process is only one side of the accountability coin. It gives some idea of why a decision has been made in a particular way, but it does not mean the decision is justified or legitimate. One recurring problem is a lack of transparency around what data are used to train impact-assessment algorithms in the first place. In many cases, algorithms build assumptions about citizens and their activities based on variables those citizens are unaware of. Organisations should be required to demonstrate that the data points they use to measure impact or make other decisions are relevant and lead to accurate predictions.

Under the GDPR, the European data protection regulation that came into force in 2018, everyone in the European Union has the right to access and amend any data an organisation holds about them. We also have the right to be forgotten. In MICS, the same applies to project data. As well as accessing project data, project coordinators can ask about and correct any assumptions MICS has made about their projects based on whatever content has been collected; this is the logical counterpart to the right to be forgotten. Project participants also have a say in whether and how their project is seen.

As mentioned earlier, Alquimics is the algorithm behind the MICS platform. It was created partly through handcrafting (a labour-intensive programming technique that involves writing explicit rules and templates) and partly through machine learning (a type of AI that learns to perform a task by analysing patterns in data).

From the start, the team has faced a defining question: Which parts of the platform's brain should be handcrafted and which should employ machine learning?

Handcrafting is the more traditional approach, in which scientists painstakingly write extensive sets of rules to guide the AI's understanding and assessments. Statistically driven machine-learning systems, by contrast, have the computer teach itself to assess impact by learning from data. Machine learning is the superior method for tackling so-called classification problems, in which neural networks find unifying patterns in noisy data. But it still has a long way to go when it comes to translating the input features that characterise citizen-science projects into an impact assessment. And the team, like the tech world at large, struggled to find the best balance between the two approaches.

To help Alquimics generate impact assessments automatically, the team had only nine complete instances of the roughly 200 input features at its disposal. So the MICS team wrote extensive assessment-guiding rules. The team created five "impact domains": environment, economy, governance, science, and society. The MICS system has been engineered to know the five domains' core elements, and the platform's brain has been divided into a committee of smaller rule-based algorithms, each with a speciality of its own, as sketched below.
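To make the committee idea concrete, here is a minimal Python sketch, not the actual MICS code: the feature names, weights, and rules are invented for illustration, and only two of the five domain specialists are shown.

```python
# A toy committee of rule-based domain specialists. All feature names and
# weights below are hypothetical; the real MICS rules are far more extensive.
from typing import Callable, Dict

Features = Dict[str, float]  # answers to the platform's questions, normalised to 0..1

def score_environment(f: Features) -> float:
    # Hypothetical handcrafted rule for the environment domain.
    return 0.6 * f.get("data_quality", 0.0) + 0.4 * f.get("monitoring_coverage", 0.0)

def score_society(f: Features) -> float:
    # Hypothetical handcrafted rule for the society domain.
    return 0.5 * f.get("participant_learning", 0.0) + 0.5 * f.get("community_engagement", 0.0)

# One specialised scorer per impact domain; economy, governance, and science
# would follow the same pattern.
COMMITTEE: Dict[str, Callable[[Features], float]] = {
    "environment": score_environment,
    "society": score_society,
}

def assess(features: Features) -> Dict[str, float]:
    """Run every domain specialist and return one score per impact domain."""
    return {domain: rule(features) for domain, rule in COMMITTEE.items()}

print(assess({"data_quality": 0.8, "monitoring_coverage": 0.5,
              "participant_learning": 0.9, "community_engagement": 0.4}))
```

Keeping each domain's rules in a separate specialist makes every score traceable to explicit, human-readable logic, which is exactly the transparency argument made above.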

In addition, the team developed a machine-learning approach: a neural network.

Assessing impact is especially tough for a machine-learning system, because there usually is no verifiably correct way to evaluate impact.
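For illustration only, here is how small the learning problem is; the sketch below trains a scikit-learn network on synthetic stand-ins for the nine projects and their roughly 200 features. The architecture and data are assumptions, not the MICS implementation.

```python
# A deliberately small neural network fitted to synthetic data standing in
# for the nine complete projects with ~200 input features each.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((9, 200))   # nine projects, ~200 features (synthetic stand-ins)
y = rng.random(9)          # stand-in impact scores; real labels are scarce

# With so few examples, a tiny hidden layer and strong regularisation are
# the only defensible choices, and the output is a rough evaluation at best.
net = MLPRegressor(hidden_layer_sizes=(8,), alpha=1.0, max_iter=5000,
                   random_state=0).fit(X, y)

print(net.predict(X[:1]))  # evaluate one project
```

Nine examples against 200 features is an extreme case of more unknowns than observations, which is why the neural network can only complement, not replace, the handcrafted rules.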

Neural networks work best when there is a clear goal, like winning the game of Go, that the system can reach by finding the optimal strategy through trial and error on a massive scale. Impact assessment has no such well-defined goal. In the rule-based system, the MICS team started by guessing how much to weight each input feature. With the help of the neural network's evaluation, the weights are then adjusted so that the two sides of the algorithm (rule-based and machine-learning-based) stay aligned, a middle-of-the-road approach to mixing rule-based programming and machine learning in one system.
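A minimal sketch of what such an alignment step could look like, assuming the rule-based score is a weighted sum of features and the neural network supplies a target evaluation; the update rule and all numbers are illustrative, not the MICS implementation.

```python
# Nudging handcrafted rule weights toward the neural network's evaluation.
import numpy as np

def align_weights(weights, features, nn_score, lr=0.1):
    """One gradient step on the squared error between the rule-based
    score (weights @ features) and the neural network's evaluation."""
    error = nn_score - float(weights @ features)
    return weights + lr * error * features

weights = np.array([0.3, 0.3, 0.3])   # initial handcrafted weight guesses
features = np.array([0.8, 0.2, 0.9])  # one project's inputs (hypothetical)
nn_score = 0.8                        # the neural network's evaluation

for _ in range(50):
    weights = align_weights(weights, features, nn_score)

print(weights, float(weights @ features))  # rule score now close to 0.8
```

In practice the adjustment would run over many projects at once, but the principle is the same: the handcrafted side keeps its explainable structure while the learned side gently corrects its calibration.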

In summary, the edge of the MICS platform over existing systems is the most extensive set of input features (features and values) used to date, together with an interface and an interaction that people will, we hope, enjoy.

The system not only asks questions about the input features but also provides interesting, conversational feedback, so users can learn more about their projects and how to improve them to achieve a more substantial impact.