
Mining eBird Data – Comparing Species Count Records

Back in January we looked at the species counts by county from available eBird records. How does that data compare to other available records?

I’m not sure there is an “official” data set for this information, but the best I could come up with comes from the East Cascades Audubon Society (ECAS). Drilling into their Birding Sites page, one will find access to County Checklists. It is from these checklists that the data set was created. Thankfully there are only 36 counties in Oregon, because stripping data from a .pdf is notoriously difficult. (Note to editors: a .csv file would make this data more accessible to data junkies. Heck, even a .xls(x) file would be preferable.)

The official record keeper for Oregon is the Oregon Field Ornithologists, now the Oregon Birding Association. They do not keep records at the county level that I am aware of, and none of their data is accessible in any meaningful way for our purposes; again, it is all .pdf files.

The comparison between ECAS records and eBird records (as of January 2013) is presented below. There are three choropleths (one for ECAS records, one for eBird records, and one for the difference between the two) and one bar graph with all three metrics. The choropleths use a purple – blue – green color scale, low to high, running from 180 (just below the eBird minimum of 182) to 410 (the ECAS maximum). The choropleths were generated using ggplot2 in R, and the color scheme is from ColorBrewer. I threw in a fourth map with the county names.

Summary Statistics:

ECAS Records:

  • Min = 249
  • Mean = 318
  • Max = 410

eBird Records:

  • Min = 182
  • Mean = 254
  • Max = 349

ECAS – eBird (delta):

  • Min = 17
  • Mean = 64
  • Max = 105
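For anyone wanting to reproduce this kind of summary, here is a minimal Python sketch (not the R workflow used for the maps). The county names and totals are made up for illustration and are not the ECAS or eBird figures; `summarize` is a hypothetical helper.

```python
from statistics import mean

# Hypothetical species totals for three counties; NOT the real ECAS or eBird figures.
ecas = {"County A": 320, "County B": 290, "County C": 410}
ebird = {"County A": 260, "County B": 273, "County C": 349}


def summarize(values):
    """Return (min, rounded mean, max) for a collection of counts."""
    values = list(values)
    return min(values), round(mean(values)), max(values)


# The delta is ECAS minus eBird, computed county by county.
delta = {county: ecas[county] - ebird[county] for county in ecas}

for label, data in (("ECAS", ecas), ("eBird", ebird), ("delta", delta)):
    low, avg, high = summarize(data.values())
    print(f"{label}: min={low} mean={avg} max={high}")
```

Because the delta is taken per county, its min and max need not equal the difference between the ECAS and eBird extremes, which may fall in different counties. Only the mean carries over directly: in the figures above, 318 minus 254 is the delta mean of 64, while 249 minus 182 is 67, not the delta min of 17.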

Of course this leads one to speculate on the causes for the discrepancies.

The most obvious is the time frame embedded in the two data sets. I assume the ECAS records are based on historical records, although I don’t see California Condor listed in any of the county records, so I’m not sure how far back they go, and I could find no attribution for the source of the data set. eBird is a relatively new data set. Some effort has been made to enter historic records into eBird, but I’m not sure if Oregon has been part of that effort. Some existing eBird users have entered their data from the past, but I’m pretty sure that is not universal among users with Oregon data; I know that I haven’t done it. So there’s that.

The other thing to ponder is the variation in the delta statistic.  I suspect some of the same factors considered back in January are in play:

    1. Number of eBirders in the County
    2. eBirder interest in specific locations within a County
    3. Accessibility of the County
    4. Proximity of the County to large population centers
    5. Interest within a County by “hyper-active” eBirders
    6. Amount of eBird recorded observation time spent vs ECAS
    7. others?

So here are the data representations:

Mining eBird Data – continued.

Having compiled the weekly observation data to plot the growth of eBird usage in previous posts, we will now look at the composite weekly and monthly time series.

I took the median number of observations submitted to eBird for each week from 2007 through 2012.  I chose the median over the average to dampen the effect growth has on the distribution of this small sample size (n=6).
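To illustrate why the median is the safer choice here, the following Python sketch compares the two on six made-up yearly counts for a single calendar week. The numbers are hypothetical and simply grow quickly from 2007 to 2012, as a stand-in for rising eBird usage.

```python
from statistics import mean, median

# Hypothetical counts for the same calendar week in each year, 2007 through 2012.
# Made-up values that grow quickly; NOT real eBird submission counts.
week_by_year = {2007: 40, 2008: 55, 2009: 90, 2010: 150, 2011: 260, 2012: 520}

counts = list(week_by_year.values())

# The mean is pulled upward by the most recent, largest years.
print("mean:", mean(counts))

# With n=6 the median is the midpoint of the two middle values,
# so the large 2012 count has no influence on it.
print("median:", median(counts))
```

In this made-up series the mean lands above four of the six yearly values, while the median sits between the third and fourth.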

Below are four graphs: two for the weekly data and two for the monthly data. The first in each set is a time series plot and the second is a ranking plot in descending order.
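The two views in each set differ only in ordering. A small Python sketch of the ranking step, using made-up monthly values (not the real composite medians) and only four months for brevity:

```python
# Hypothetical composite (median) record counts per month; NOT real eBird values.
composite = {"Jan": 900, "May": 1800, "Sep": 600, "Dec": 1100}

# The time series keeps calendar order; the ranking sorts by value, highest first.
ranking = sorted(composite.items(), key=lambda item: item[1], reverse=True)

for rank, (month, records) in enumerate(ranking, start=1):
    print(rank, month, records)
```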

I’m not sure what real conclusions can be made, but speculation comes easy. I’ll leave that to you and only note that, while May was no surprise, September would not have been my guess for the month with the fewest records submitted.

Tom Auer has been looking at the same data for Rhode Island over on his blog.

He makes what I think is an important point:

“One of the goals of eBird is consistent sampling effort. This is why all checklists matter as long as they’re complete, including the five minute backyard counts in urban areas. If the only checklists that were ever submitted were from birding hotspots at the peak of migration when birders were chasing vagrants, we’d have a very skewed view of the bird world. Ideally, the number of checklists submitted by week would be the same throughout the year.”

So – get out there in September and October.

Also, while you’re at it, see if you can’t make it over to Wheeler, Gilliam or Baker Counties. They have the fewest records submitted, and I’m pretty sure there are birds there. In keeping with eBird’s call to “bird the road less traveled,” we’ll dive a little deeper into the county data in a future post.