One of iNaturalist's core goals is generating high-quality biodiversity data to advance science and conservation. We are launching some experiments to better understand the accuracy of these data. Here’s how they will work:
Step 1 Generate the sample
We draw a random sample of observations from the iNaturalist database.
Step 2 Find potential validators and distribute sample
We choose potential validators and distribute the sample among them, considering their past activity identifying observations on iNaturalist (more details in the FAQ below). We assign the same observation to multiple validators to increase the odds that a large fraction of the sample will be reviewed.
Step 3 Contact potential validators with subsamples, instructions, and deadlines
We send emails to each validator with a link to their subsample loaded in the iNaturalist Identify tool, instructions to identify each observation as finely as they can, and a deadline after which we will use the new identifications to assess the accuracy of the sample.
Step 4 Validators add new identifications to their subsamples
The instructions are for validators to add the finest identification they can to each observation. We’ve included the instructions in the FAQ below if you’re curious about the details. We know this means that some observations that are already Research Grade might get a flurry of redundant confirming identifications.
Step 5 Assess accuracy by comparing validator identifications to the previous identifications
The top-level statistic we are aiming to estimate is Accuracy: the percent of the sample that is correctly identified. We will do this by assuming that the new identifications added by validators are accurate and comparing them to the observation taxon (more details in the FAQ below) to classify observations as correct, incorrect, or uncertain. We use these classifications to calculate high and low estimates of accuracy.
If the sample size is large enough, we may be able to understand variation in accuracy by dividing up the sample by geography, taxonomic group, quality (research grade etc.), and other characteristics.
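To make the classification and the high/low estimates concrete, here is a minimal sketch in Python. It uses a simplified model where a taxon is a root-to-tip ancestry tuple; the function names and the handling of coarser IDs are our illustration, not iNaturalist code.

```python
def classify(obs_taxon, validator_taxon):
    """Classify one observation given a validator's ID.

    Taxa are modeled as root-to-tip ancestry tuples, e.g.
    ("Animalia", "Lepidoptera", "Vanessa atalanta"). This is a
    simplification: explicit coarser disagreements ("No, but...")
    would need an extra flag and count as incorrect.
    """
    if validator_taxon is None:
        return "uncertain"  # observation was never reviewed
    if validator_taxon[: len(obs_taxon)] == obs_taxon:
        return "correct"    # equal to or finer than the observation taxon
    if obs_taxon[: len(validator_taxon)] == validator_taxon:
        return "uncertain"  # coarser, non-disagreeing
    return "incorrect"      # a different branch of the tree


def accuracy_bounds(classifications):
    """Low/high accuracy estimates: uncertain observations widen the interval."""
    n = len(classifications)
    correct = classifications.count("correct")
    uncertain = classifications.count("uncertain")
    return correct / n, (correct + uncertain) / n
```

For example, a sample with one correct, one incorrect, and two uncertain observations yields a low estimate of 25% and a high estimate of 75%.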
Our first experiment
We’ll be piloting this protocol with our first experiment later this month (Experiment 1). We’ve already generated the sample (Step 1) and selected potential validators (Step 2). We plan on emailing the potential validators on January 17th (Step 3) with a deadline of January 31 to give validators two weeks to identify their subsamples (Step 4) before we share the results the first week of February (Step 5).
For this first experiment we generated a modest-sized sample of just 1,000 observations. We distributed it among 1,219 potential validators, attempting to assign each observation to at least 5 validators in order to increase the chance that it will be reviewed. Here are some characteristics of the observations in the sample from Experiment 1:
Thank you so much in advance if we contact you as a potential validator and you choose to participate. We couldn’t do this experiment without your help and expertise!
Frequently Asked Questions
How exactly are you selecting potential validators?
If an identifier had made at least 3 improving identifications on a taxon, we considered them qualified to validate that taxon. Improving identifications are the first suggestion of a taxon that the community subsequently agrees with.
For example, Identifier 1 adds an ID of Butterflies to an observation. If Identifier 2 later adds a leading ID of Red Admiral, Identifier 1's ID of Butterflies becomes an improving ID. If Identifier 3 later adds a supporting ID of Red Admiral, Identifier 2's ID of Red Admiral becomes an improving ID.
Note, we count both Identifiers 1 and 2 as having 1 improving ID on Butterflies (since Red Admiral is within Butterflies). Only Identifier 2 has an improving ID on Red Admiral.
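The example above can be sketched in code. This uses our simplified model (taxa as root-to-tip tuples, identifications in chronological order), not iNaturalist's actual implementation:

```python
def improving_ids(id_sequence):
    """Return (identifier, taxon) pairs whose ID counts as "improving".

    `id_sequence` is the chronological list of (identifier, taxon) pairs
    on one observation, with taxa as root-to-tip tuples. An ID is
    improving if it was the first suggestion of its taxon and a later ID
    by someone else is of that taxon or a taxon within it.
    """
    improving = []
    seen = set()
    for i, (who, taxon) in enumerate(id_sequence):
        if taxon in seen:
            continue            # not the first suggestion of this taxon
        seen.add(taxon)
        for later_who, later_taxon in id_sequence[i + 1:]:
            if later_who != who and later_taxon[: len(taxon)] == taxon:
                improving.append((who, taxon))
                break
    return improving


def improving_count(improving, taxon):
    """Count improving IDs on `taxon` or any taxon within it (per the FAQ,
    an improving Red Admiral ID also counts toward Butterflies)."""
    return sum(1 for _, t in improving if t[: len(taxon)] == taxon)
```

Running the Red Admiral example through this sketch gives two improving IDs counted toward Butterflies (Identifiers 1 and 2) and one toward Red Admiral (Identifier 2 only), matching the note above.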
How will I know if I was selected to be a validator?
You’ll receive an email from iNaturalist titled “Will you help us estimate the accuracy of iNaturalist observations?”
How large are the samples you’re sending to validators?
It varies. For Experiment 1, many validators are only being sent a single observation. No validator is being sent more than 100 observations.
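As an illustration of how a distribution under these two constraints could work (the actual assignment procedure isn't described beyond them, so this greedy approach is our assumption), each observation is offered to up to 5 of its qualified validators, and no validator's subsample grows beyond 100:

```python
import random


def assign(observations, qualified, per_obs=5, cap=100, seed=0):
    """Greedily assign each observation to up to `per_obs` qualified
    validators, capping each validator's subsample at `cap` observations.

    `qualified` maps an observation id to the set of validators deemed
    qualified for its taxon (hypothetical input; see the FAQ on how
    validators are selected).
    """
    rng = random.Random(seed)   # seeded for reproducibility
    load = {}                   # validator -> number of assigned observations
    subsamples = {}             # validator -> list of observation ids
    for obs in observations:
        pool = [v for v in qualified.get(obs, ()) if load.get(v, 0) < cap]
        rng.shuffle(pool)       # avoid always loading the same validators
        for v in pool[:per_obs]:
            load[v] = load.get(v, 0) + 1
            subsamples.setdefault(v, []).append(obs)
    return subsamples
```

Note that an observation whose qualified pool is smaller than 5 simply gets fewer validators, which is why many validators end up with a single observation.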
What if I can’t identify an observation in my sample?
Please add the finest identification you can based on the evidence in the observation. Even if it’s ‘Birds’ or even ‘Life’, that’s OK. We won’t learn anything from non-disagreeing identifications that are coarser than the observation taxon, but that’s fine. The only thing that will really hurt our assumptions is an incorrect identification.
What if an observation in my subset has no photo or there are other issues like missing locations?
We've excluded observations without media (photos or sounds), and observations voted down on the "Evidence of organism" data quality criterion, from the subsamples. Observations with other data quality issues, like missing locations, may be included. Please do your best to identify them despite the issues.
I don’t want to add confirming identifications to observations that are already research grade.
We realize that this can be undesirable - e.g. some identifiers like to preserve their reputation of not “piling on”. But we need new identifications on all observations to estimate accuracy, so we would appreciate your help with the assessment by adding a new identification in these cases. If it helps, feel free to mention that you’re participating in this experiment and link to this blog post in your identification remarks.
What happens if multiple validators of the same observation give different results?
If the multiple validators all agree (i.e. if their identifications are of the same taxon or one is a coarser non-disagreement to another), we will choose the finest taxon as the “correct” answer. If multiple validators disagree (i.e. if their identifications are on different branches or one is a coarser disagreement to another) we will investigate these conflicts on a case-by-case basis to decide how to proceed. We’re hoping these conflicts will be rare.
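A sketch of this combination rule, under the same simplified taxon model (root-to-tip tuples). Explicit coarser disagreements ("No, but…") aren't representable in this toy model and would need the extra flag iNaturalist stores with each identification:

```python
def resolve(validator_taxa):
    """Combine multiple validators' IDs on one observation.

    Returns the finest taxon if all IDs lie on one root-to-tip line
    (coarser IDs then count as non-disagreements), or None to flag a
    conflict for case-by-case review. Taxa are root-to-tip tuples.
    """
    finest = max(validator_taxa, key=len)
    for taxon in validator_taxa:
        if finest[: len(taxon)] != taxon:
            return None         # different branches: a real disagreement
    return finest
```

So a genus-level ID plus a species-level ID within that genus resolves to the species, while IDs in two different orders flag the observation for manual review.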
What if I’m 100% sure of the genus but only 90% sure of the species? Which should I identify it as?
Unfortunately, there’s a degree of subjectivity involved in an identifier's particular comfort level. For example, some identifiers need to see specific characters to feel confident enough to add a species-level identification. Others are more comfortable using things like location as a constraint (e.g. “I can’t positively rule out other members of this genus based on the photo, but there’s only one species that occurs here”). Please add the finest identification you feel comfortable with, but if you need a rule of thumb, add the finest identification that you think is 99.99% likely to be correct. In other words, if you think there’s less than a 0.01% chance that the out-of-range look-alike could have hitchhiked to the location, it's fine to choose the in-range species even if you can’t see the diagnostic character.
What assumptions are you making when you estimate accuracy from this experiment?
We’re assuming that the sample is representative of the entire iNaturalist dataset and that validators do not add incorrect identifications. The larger the sample, the stronger the first assumption becomes. We've only selected validators with at least 3 improving identifications for the respective taxon, but that doesn't mean they never misidentify that taxon. We've added redundancy by attempting to assign 5 validators to each observation, but some observations have no qualified validators, and we know we won't get a 100% response rate.
What happens if you don’t get much of the sample reviewed, either because no one participates or because no one can add correct identifications?
If we can’t get an observation in the sample reviewed, or if the validators can’t add an identification as fine as the previous observation taxon, we will code it as "Uncertain". We aren’t making any assumptions about uncertain observations, but they do widen the bounds on our estimates. The worst case scenario is that we have so many Uncertain observations that the bounds on our accuracy estimates are too broad to be useful (e.g. a low estimate of 50% and a high estimate of 100%).
What if I’ve already identified an observation in my subsample?
Please review your old identification; if it is still relevant, you can skip that observation. If you no longer think your older identification is correct, please add a new identification.
What if your sample size isn’t large enough to get robust estimates?
That’s possible. We won’t know until we see how much participation we get, what portion of Uncertain observations we’re left with, and how the community responds to this pilot. If the response is good, we can increase the sample size in future experiments. This will likely be necessary if we want robust estimates for under-represented regions or taxa (e.g. African fish).
How will I get to see the results of the experiment?
We’ll post a report summarizing the results of the experiment to the iNaturalist blog a week after the experiment deadline. We'll comment here with a link when it's up.
What instructions are you sending to validators?
We’ve copied them below in case you’re curious (note: instructions will have subsamples tailored to each validator; the 10 observations here are just an example):
Dear loarie,
Will you help us with a study to estimate the accuracy of iNaturalist observations?
- Identify all observations in the link below as finely as you can, even if they are Research Grade, and even if your finest ID is at a higher taxonomic level (even kingdom).
- If you see the “Potential Disagreement” popup…
- and you are confident it’s mis-ID’d, click the orange button,
- or if you’re uncertain beyond your ID, click the green button
- Do this by 2024-01-31
Here is the subset of 10 observations that we think you can identify based on your activity on iNat. Please add the finest ID you can to each of the observations before 2024-01-31.
We’ll calculate accuracy by comparing your ID to the Observation Taxon. You can skip observations where you’ve previously added an ID if that ID is still relevant. For more on how we will count agreements and disagreements, keep reading.
IDs equal to (A) or finer than (B) the Observation Taxon will be counted as Agreements.
IDs on different branches (C) or coarser than the Observation Taxon where you choose “No, but…” to the “Potential Disagreement” dialog (D) will be counted as Disagreements.
IDs coarser than the Observation Taxon where you choose “I don’t know but…” to the “Potential Disagreement” dialog (E) will be counted as Uncertain.
We’re so grateful for your help as an identifier on iNaturalist, and thank you very much for participating in this study. Please read this blog post to answer frequently asked questions about this experiment.
With gratitude,
The iNaturalist Team