Biodiversity: A Data Science Point of View

Recently I was able to analyze some data from the National Park Service on endangered species. This project had me using key Python functions to reshape data and plot graphs. For any good data scientist, being able to extract meaning from data is paramount, and as my quest to share knowledge grows, I hope these revelations point you to some helpful techniques to incorporate into your own journey.

For this project I imported pandas and matplotlib as my tools: pandas for its many data-manipulation functions, and matplotlib for the awesome graphs it provides.

I began by importing the two .csv files (one with the observations, the other with information on the species) and checking their column types with .dtypes. On the species table I renamed the ‘conservation_status’ column to something simpler: ‘status’. I then looked at the types of statuses available.
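A minimal sketch of that first step, assuming the species file has the columns category, scientific_name, common_names and conservation_status, and the observations file has scientific_name, park_name and observations (these names are assumptions, as are the made-up rows). In-memory strings stand in for the real files so the snippet runs on its own; with the actual data you would pass file paths to pd.read_csv.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the two CSV files (rows and statuses are made up).
# With the real data, pass the file paths to pd.read_csv instead.
species_csv = io.StringIO(
    "category,scientific_name,common_names,conservation_status\n"
    "Mammal,Ovis canadensis,Bighorn Sheep,Species of Concern\n"
    "Bird,Picoides borealis,Red-Cockaded Woodpecker,Endangered\n"
    "Mammal,Canis lupus,Gray Wolf,In Recovery\n"
    "Vascular Plant,Abies concolor,White Fir,\n"
)
observations_csv = io.StringIO(
    "scientific_name,park_name,observations\n"
    "Ovis canadensis,Park A,219\n"
    "Canis lupus,Park B,35\n"
)

species = pd.read_csv(species_csv)
observations = pd.read_csv(observations_csv)

# Inspect the column types of each table.
print(species.dtypes)
print(observations.dtypes)

# Rename the long column name to something shorter.
species = species.rename(columns={"conservation_status": "status"})

# List the distinct statuses; an empty field is read in as NaN.
print(species["status"].unique())
```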

I then created four DataFrames, one for each status, and reset each one's index with .reset_index() so the rows were numbered sequentially again. Next I made two lists: one with the number of species in each status, and a second with the matching status names, so I could plug these variables into a pie chart. I noticed that species without an at-risk status made up the vast majority (98%-ish), so I left them out to give the chart a better look at the at-risk species.
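Here is a rough sketch of that pie-chart step on a handful of made-up rows. The four status labels are assumed to match the ones in the data, and species with no status (NaN) are simply never selected, so they stay out of the chart. The Agg backend is set only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample rows; None marks species with no conservation status.
species = pd.DataFrame({
    "scientific_name": ["A", "B", "C", "D", "E", "F", "G"],
    "status": ["Endangered", "Threatened", "Species of Concern",
               "Species of Concern", "In Recovery", None, None],
})

statuses = ["Endangered", "Threatened", "Species of Concern", "In Recovery"]

# One filtered DataFrame per status; reset_index(drop=True) renumbers rows 0..n-1
# and discards the old index instead of keeping it as a column.
by_status = {
    s: species[species["status"] == s].reset_index(drop=True) for s in statuses
}

# Two parallel lists for the pie chart: counts and their labels.
counts = [len(by_status[s]) for s in statuses]
labels = statuses

fig, ax = plt.subplots()
ax.pie(counts, labels=labels, autopct="%1.1f%%")
ax.set_title("At-risk species by conservation status")
plt.close(fig)  # in a notebook you would call plt.show() instead
```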

Next I wanted to see which categories of species the table held, so I used the .unique() method and grouped the species by status. Since there were a lot of NaN statuses, I filled them in using the .fillna() method with the inplace=True parameter.
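A short sketch of those three calls together. The fill label "No Intervention" is a hypothetical placeholder (the post doesn't say which value was used), and the rows are made up. Passing a dict to fillna limits the fill to the status column.

```python
import pandas as pd

# Made-up sample rows; None marks a missing status.
species = pd.DataFrame({
    "category": ["Mammal", "Bird", "Mammal", "Vascular Plant", "Bird"],
    "scientific_name": ["A", "B", "C", "D", "E"],
    "status": ["Endangered", None, "Species of Concern", None, None],
})

# Distinct categories present in the table.
print(species["category"].unique())

# Fill missing statuses directly in the original DataFrame.
# "No Intervention" is a hypothetical placeholder label.
species.fillna({"status": "No Intervention"}, inplace=True)

# Count species for each (status, category) pair.
status_by_category = (
    species.groupby(["status", "category"])["scientific_name"]
    .count()
    .reset_index()
)
print(status_by_category)
```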

Before I go further, let me explain this parameter. When you alter a table with a method like .fillna(), pandas by default returns a new, modified copy and leaves the original table untouched. The default is inplace=False, so passing inplace=True modifies the original table instead, reducing the need to juggle multiple tables… unless that is your goal.
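A tiny sketch that makes the difference visible: the default call hands back a filled copy, while inplace=True changes the DataFrame itself and returns None. The fill label is a made-up placeholder.

```python
import pandas as pd

df = pd.DataFrame({"status": ["Endangered", None]})

# Default (inplace=False): fillna returns a new object and df is untouched.
filled = df.fillna("No Intervention")
print(df["status"].isna().sum())      # 1, df still has its missing value
print(filled["status"].isna().sum())  # 0, only the copy was filled

# inplace=True modifies df itself and returns None.
result = df.fillna("No Intervention", inplace=True)
print(result)                         # None
print(df["status"].isna().sum())      # 0
```

Reassigning (df = df.fillna(...)) reaches the same end state while still keeping a single table. Calling fillna with inplace=True on a selected column, as in df["status"].fillna(..., inplace=True), triggers a chained-assignment warning in recent pandas versions, so calling it on the DataFrame with a dict of column values is the safer pattern.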

The fewer tables the better, in my humble opinion.

I then grouped the species by status, counted the unique scientific_name values in each group, sorted by that count, and of course reset the index. From there I used a bar plot to visualize how many species fall into each conservation status.
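One way to sketch that grouping and bar plot, on made-up rows. After nunique() the scientific_name column holds the count per status, which is why the sort is on that column name.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample rows.
species = pd.DataFrame({
    "scientific_name": ["A", "B", "C", "D", "E", "F"],
    "status": ["Endangered", "Species of Concern", "Species of Concern",
               "Threatened", "Species of Concern", "Endangered"],
})

# Count unique species per status, sort by that count, renumber the rows.
protection_counts = (
    species.groupby("status")["scientific_name"]
    .nunique()
    .reset_index()
    .sort_values(by="scientific_name")
    .reset_index(drop=True)
)
print(protection_counts)

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(protection_counts["status"], protection_counts["scientific_name"])
ax.set_ylabel("Number of species")
ax.set_title("Species per conservation status")
plt.close(fig)  # use plt.show() in an interactive session
```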

Great, now my curiosity is getting the best of me, and I want to know whether each species is protected. I’ll use a lambda function to create a column called ‘is_protected’. I’ll then group the table by ‘category’ and ‘is_protected’ and count the unique values in the ‘scientific_name’ column for each group.
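A minimal sketch of the is_protected column, assuming missing statuses were earlier filled with the hypothetical placeholder "No Intervention", so "protected" here means "has any other status". The rows are made up.

```python
import pandas as pd

# Made-up sample rows; "No Intervention" is an assumed fill label for "no status".
species = pd.DataFrame({
    "category": ["Mammal", "Mammal", "Bird", "Bird", "Bird"],
    "scientific_name": ["A", "B", "C", "D", "E"],
    "status": ["Endangered", "No Intervention", "Species of Concern",
               "No Intervention", "No Intervention"],
})

# A species counts as protected when its status is anything but the placeholder.
species["is_protected"] = species["status"].apply(lambda s: s != "No Intervention")

# Unique species per (category, is_protected) pair.
category_counts = (
    species.groupby(["category", "is_protected"])["scientific_name"]
    .nunique()
    .reset_index()
)
print(category_counts)
```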

The grouped table is good, but it could be better. Using the .pivot() method I can reshape it into a more useful layout, with one row per category, and then add a column with the percentage of species protected.
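A sketch of the pivot and the percentage column, starting from a small long-format table shaped like the previous groupby output (the counts are made up). The new column names not_protected, protected and percent_protected are illustrative choices.

```python
import pandas as pd

# Made-up counts in the long format produced by the previous groupby.
category_counts = pd.DataFrame({
    "category": ["Bird", "Bird", "Mammal", "Mammal"],
    "is_protected": [False, True, False, True],
    "scientific_name": [2, 1, 1, 1],
})

# Reshape: one row per category, one column per is_protected value.
category_pivot = (
    category_counts.pivot(index="category", columns="is_protected",
                          values="scientific_name")
    .reset_index()
)
# After the pivot the columns are ["category", False, True]; give them names.
category_pivot.columns = ["category", "not_protected", "protected"]

category_pivot["percent_protected"] = (
    category_pivot["protected"]
    / (category_pivot["protected"] + category_pivot["not_protected"])
    * 100
)
print(category_pivot)
```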

From here I’m going to take a closer look at sheep, using the observations table. I’ve always liked their fluffy fur, and I think it’s funny that we wear it, so why not? I’ll create an ‘is_sheep’ column by applying a lambda function to the ‘common_names’ column.
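A minimal sketch of the is_sheep flag on a few sample rows. Note that a plant, sheep sorrel, also matches, which is exactly the kind of false positive the next step filters out.

```python
import pandas as pd

# Sample rows; the plant is included on purpose as a false positive.
species = pd.DataFrame({
    "category": ["Mammal", "Mammal", "Vascular Plant", "Mammal"],
    "scientific_name": ["Ovis aries", "Ovis canadensis",
                        "Rumex acetosella", "Canis lupus"],
    "common_names": ["Domestic Sheep", "Bighorn Sheep",
                     "Sheep Sorrel", "Gray Wolf"],
})

# True when the word "Sheep" appears anywhere in the common name.
species["is_sheep"] = species["common_names"].apply(lambda name: "Sheep" in name)
print(species[species["is_sheep"]])
```

The vectorized species["common_names"].str.contains("Sheep") gives the same result without a lambda, and handles missing names if you also pass na=False.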

Looking closer, I noticed that some non-mammal species have ‘Sheep’ in their common name, so I’ll make a DataFrame that keeps only the mammals. Next, I’ll use pd.merge() to join this new DataFrame with the observations table, producing a new table called ‘sheep_observation’.
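A sketch of the filter and merge, assuming the two tables share a scientific_name column and the observations table has park_name and observations columns (assumed names; the park names and counts are made up). pd.merge does an inner join by default, so only observations of the remaining sheep species survive.

```python
import pandas as pd

species = pd.DataFrame({
    "category": ["Mammal", "Mammal", "Vascular Plant"],
    "scientific_name": ["Ovis aries", "Ovis canadensis", "Rumex acetosella"],
    "is_sheep": [True, True, True],
})
# Made-up observation counts at hypothetical parks.
observations = pd.DataFrame({
    "scientific_name": ["Ovis aries", "Ovis canadensis",
                        "Rumex acetosella", "Ovis aries"],
    "park_name": ["Park A", "Park A", "Park B", "Park B"],
    "observations": [10, 5, 40, 7],
})

# Keep rows that are sheep-named AND mammals (drops sheep sorrel).
sheep_species = species[species["is_sheep"] & (species["category"] == "Mammal")]

# Inner join on the shared key: one row per matching observation record.
sheep_observation = pd.merge(sheep_species, observations, on="scientific_name")
print(sheep_observation)
```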

I can then group ‘sheep_observation’ by park and sum the observations, which shows how many times sheep were observed at each park. Once I have that, it’s one easy step to make a bar plot to visualize the results.
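A minimal sketch of the per-park totals and bar plot, again assuming the park_name and observations column names and using made-up rows.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Made-up merged rows, shaped like sheep_observation.
sheep_observation = pd.DataFrame({
    "scientific_name": ["Ovis aries", "Ovis aries", "Ovis canadensis"],
    "park_name": ["Park A", "Park B", "Park A"],
    "observations": [10, 7, 5],
})

# Total sheep observations per park.
obs_by_park = (
    sheep_observation.groupby("park_name")["observations"]
    .sum()
    .reset_index()
)
print(obs_by_park)

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(obs_by_park["park_name"], obs_by_park["observations"])
ax.set_ylabel("Number of observations")
ax.set_title("Observations of sheep per park")
plt.close(fig)  # use plt.show() in an interactive session
```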

Sometimes I feel like Python just makes it too easy. With some regular practice using pandas methods and pyplot, a garbled-up table can reveal a really interesting story.

Link on GitHub.




I’m a web developer and a data scientist by hobby. Yes, it can be a hobby. I blog about all things code.

Abel Garrido