Our first new Computer Vision Model (v2.10) for 2024 including 1,599 new taxa

We released a new computer vision model today. It has 83,622 taxa, up from 82,023. This new model (v2.10) was trained on data exported on November 26th, 2023.

Here's a graph of the model release schedule since early 2022 (segments extend from data export date to model release date) and how the number of species included in each model has increased over time.

Here is a sample of new species added to v2.10:

Posted on January 4, 2024, 20:19:43 UTC by loarie

Comments

For what reasons may a taxon with around 100 observations be excluded?

Posted by eleodesthermopolis about 1 month ago

No RG observations?
See the header 'we changed a few things about how we generate training data'
https://www.inaturalist.org/blog/63931-the-latest-computer-vision-model-updates

Posted by optilete about 1 month ago

In my case there are RG observations, the lowest being around 30. And I do believe that at least two had over 100 at the start of December.

Posted by eleodesthermopolis about 1 month ago

If the photos were all shot by the same observer or with the same camera/phone, that could be a reason: there needs to be diversity.

Posted by marina_gorbunova about 1 month ago

The inclusion threshold appears to be more closely linked to the number of images than the number of observations, perhaps specifically the number of images associated with verifiable (or potentially verifiable?) observations. Answers from iNat staff have been a little imprecise on this topic, but I get the impression that a taxon needs 200+ photos to become a candidate for CV model training.

Posted by rupertclayton about 1 month ago

@rupertclayton the minimum threshold, setting aside other factors such as the one Marina mentioned, is 100 photos

Posted by thebeachcomber about 1 month ago

I've also read that it's 100 photos, and that did previously appear to be the case, but for the current iteration and previous several, spot checking suggests to me that a larger number of photos seems to be required.

Here is the newly added plant with fewest observations (60):

https://www.inaturalist.org/taxa/284587-Aristolochia-nelsonii

And this seems to currently have 209 photos. Here's another one with 65 observations:

https://www.inaturalist.org/taxa/587108-Heliophila-lactea/browse_photos

That one has 164 photos. Conversely, several species that I've worked to add identifications for have not been added to the CV model even when they get to 120 photos. And those certainly had a very varied set of observers. I guess I'm just saying that 100 photos is not the reliable criterion that I interpreted it to be!

Posted by rupertclayton about 1 month ago

One thing to note is that the current number of photos is not equal to the number of photos when the latest model began training. In this case the data was exported on November 26th last year; how many photos did those two taxa have at that point in time? And indeed, how many photos did they have when the previous model was trained? It would be interesting to see the numbers from then.
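As a rough illustration of that point-in-time question, here is a minimal Python sketch that counts how many photos a taxon had on a given export date. The observation records are made-up placeholders, and the sketch assumes each photo can be dated by the creation date of its observation, so photos added later to an older observation would be miscounted.

```python
from datetime import date


def photos_as_of(observations, cutoff):
    """Sum photo counts for observations created on or before `cutoff`.

    `observations` is an iterable of (created_date, photo_count) pairs.
    """
    return sum(count for created, count in observations if created <= cutoff)


# Hypothetical records for one taxon: (observation creation date, photo count).
observations = [
    (date(2023, 9, 2), 3),
    (date(2023, 10, 20), 2),
    (date(2023, 11, 25), 4),
    (date(2023, 12, 10), 5),  # created after the export, so not in training data
]

export_date = date(2023, 11, 26)  # data export date for v2.10, per the post
print(photos_as_of(observations, export_date))  # 9
print(photos_as_of(observations, date.today()))  # 14 (the "current" count)
```

The gap between the two numbers is what makes today's photo count an unreliable guide to what the model actually saw.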

Posted by thebeachcomber about 1 month ago

A nice haul from southern Africa. 86 species of plants, although the tail end suggests 13 of these (6 if filtered for verifiable only) are garden plants or escapes.
These range from 277 observations (Solanum lichtensteinii) to 50 observations (Fenestraria rhopalophylla) - with 2 species over 200 observations, 2 over 150 observations, 6 over 100 observations, 14 over 80 observations, 19 over 70 observations, 22 over 60 observations, 4 over 50 observations. A real tail-ender of 20 observations for southern Africa was augmented by 91 from mainly West Africa.
Interpreting this is complicated by the cut-off being during the Great Southern Bioblitz (November 24 - 27; https://www.inaturalist.org/projects/great-southern-bioblitz-2023-southern-africa-umbrella) where lots of observations were made, many of which were not identified until the following week.

My two questions are:
-1. When are subspecies and varieties going to be included? Because the CV does not include these, these species are seldom identified further, despite these being crucial for taxonomy, conservation planning, red listing and environmental impact assessments.
We have 135 plant species (over 250 taxa) in southern Africa with over 200 observations identified at the subspecies and variety level, which only get ID'd at the species level (https://www.inaturalist.org/observations?hrank=subspecies&lrank=variety&place_id=113055&subview=map&verifiable=any&view=species&iconic_taxa=Plantae). OK, some of these only have one taxon in southern Africa, but some have half a dozen, although in some cases ~90% of observations are for one taxon. In total there are 1,845 plant species with RG subspecific IDs. Having the CV help suggest these would be a big boon to getting identifications done.
-2. I understand the issues with hybrids being intermediate and confusing the CV. However, is it possible to nominate specific ones that do not? So Safari Sunset has 1,442 observations - it is utterly distinctive from its parents Leucadendron laureolum and salignum - but the CV does not ID most of them even as Leucadendron, coming up with really spurious IDs in some cases. Almost all observations of this are cultivated, but it is the most planted cultivar by far, and Americans (mostly Californians - over 600 observations [way above the 100 cutoff!!]) insist on posting it on iNaturalist and getting the incorrect IDs.

Posted by tonyrebelo about 1 month ago

@thebeachcomber: You're correct that the number of photos as of the data export date is important. But now that the model training takes only about 6 weeks, this doesn't vary much. For the two taxa I cited, all the observations were added before October 31, 2023. If we assume that adding photos to existing observations is an insignificant factor, then the number of photos eligible for training model 2.10 was probably the same as today (209 and 164). The cutoff date for CV model 2.9 was October 15, 2023. As best I can tell, those two taxa had 208 and 159 photos respectively at that date. So, I'm still puzzled as to the inclusion criteria...

@tonyrebelo: I wholeheartedly support your proposal to make infrataxa and hybrids eligible to be included in CV.

I know that there were problems with how CV suggested some hybrid bird taxa, but I think these special cases don't justify a blanket exclusion of all hybrids. Many cultivated plants (and therefore, quite a lot of invasive species) are hybrid taxa, e.g. Crocosmia × crocosmiiflora. Lots of people upload photos of these hybrid plants in cultivation or growing wild. CV doesn't have the option to suggest the hybrid, so most observers select another species. Fixing this requires attention from at least one knowledgeable identifier, and in the many cases where the observer doesn't respond, it requires three opposing votes. So, there's a huge amount of work involved to ensure that iNat (and GBIF exports) have accurate data and it could mostly be prevented if the CV model was allowed to include hybrid taxa. How about a "Hybrid exclusion list"? This could be updated by curators or staff as appropriate and would define branches of the taxonomy below which hybrids would not be eligible for CV. So, adding "Aves" (taxon_id=3) would prevent CV from considering any hybrid bird taxa.
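A minimal Python sketch of how such a list might work. The function name, the rank label "hybrid", and the ancestor ID lists are all hypothetical and are not iNaturalist's actual implementation; only Aves (taxon_id=3) comes from the example above.

```python
# Hypothetical exclusion list: hybrids anywhere below these taxa are not
# eligible for CV. 3 = Aves, as in the example above.
HYBRID_EXCLUSION_ROOTS = {3}


def hybrid_eligible(rank, ancestor_ids, exclusion_roots=HYBRID_EXCLUSION_ROOTS):
    """Return True if a taxon passes the proposed hybrid exclusion rule.

    `rank` is the taxon's rank label and `ancestor_ids` lists the IDs of
    every taxon above it in the tree. Non-hybrids always pass, because the
    rule only restricts hybrids.
    """
    if rank != "hybrid":
        return True
    return exclusion_roots.isdisjoint(ancestor_ids)


# Made-up ancestor IDs for illustration (apart from 3 = Aves).
print(hybrid_eligible("hybrid", [1, 2, 3, 9001]))   # hybrid bird: False
print(hybrid_eligible("hybrid", [1, 8001, 8002]))   # hybrid plant: True
print(hybrid_eligible("species", [1, 2, 3, 9001]))  # bird species: True
```

Keeping the list as a set of branch roots means curators would only need to add one ID per problem group rather than flag hybrids one at a time.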

As to infrataxa, I think iNat is missing a real scientific opportunity. I think we're all aware that today's subspecies is tomorrow's species and vice versa. Taxonomists put a lot of work into aligning names with the various species concepts, but new data and perspectives mean that revisions are constant. No problem there, except that iNat makes infrataxa much less prominent for observers and identifiers, especially through the CV engine. Certainly, varieties and subspecies are often distinguished by small details, and there are lots of observations where the relevant details are not visible. But there are plenty of infrataxa that can be reliably distinguished in iNat observations and quite a few identifiers willing to do that. An identifier who helps distinguish two similar species with a few hundred observations can expect that CV will pick up the difference and suggest better IDs to future observers. The same exercise for subspecies is a perennial fight against new CV suggestions.

Posted by rupertclayton about 1 month ago

@loarie, Thanks a lot once again for this data log.

Posted by apseregin about 1 month ago

I also wonder about the requirements for the CV to 'unlearn' a taxon.
After cleaning up, one species now has 86 observations, with 30 of them RG.
Curious whether this will drop out in the next round

Posted by carnifex about 1 month ago

@rupertclayton I like the idea of an exclusion list. It could also apply to species (or organisms at any taxonomic level) that can't be identified from photos. It would greatly reduce the problems that Computer Vision can occasionally cause. I don't think I've seen a Feature Request for this on the forum, but if someone were to propose it, I'd vote for it.

Posted by deboas about 1 month ago

Always like checking the newly included species. Thanks again for the info!

Posted by ajott about 1 month ago

I really enjoy browsing the newly added species and looking at what I might have contributed to, thanks for including it.

Posted by brnhn about 1 month ago

The latest model seems to no longer recognise/suggest Melitaea athalia as being "seen nearby" in Kent, UK.
https://www.inaturalist.org/taxa/132875-Melitaea-athalia

Posted by bsteer 14 days ago

@bsteer: iNat failing to suggest Melitaea athalia as "expected nearby" is probably an artefact of the iNaturalist geomodel, rather than something specifically caused by the computer vision model. But you're right that the thresholded geomodel is not currently predicting Melitaea athalia in eastern Kent despite there being 91 iNaturalist observations in the Canterbury area.

@loarie: Is this discrepancy something you can feed into the process of tweaking future versions of the geomodel?

Posted by rupertclayton 14 days ago

Thanks – I remember seeing that and did wonder if that might alternatively be the cause. Hopefully there is some means by which we can feed back to correct the system.

Posted by bsteer 13 days ago

February update coming soon?

Posted by dianastuder 1 day ago

yes - v2.11 will be out this coming week or the following week

Posted by loarie 1 day ago

awesome

Posted by apseregin about 24 hours ago
