After several discussions between PAWS and our wonderful Code for Philly volunteer team, coding is underway! The work is focused on building the initial infrastructure to host the data pipeline (setting up containers for the data-processing scripts and for the database), as well as processing scripts to ingest data from three sources (donors, volunteers, and adopters) and fuzzy-match records across the datasets. Additionally, we are setting up proof-of-concept APIs to allow end users to upload and download datasets. The current project focus and tasks can be found here.
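To give a flavor of the fuzzy-matching step, here is a minimal sketch using only Python's standard library. The record names, fields, and threshold are hypothetical illustrations, not the project's actual data or code; the real pipeline would compare more fields (email, address, phone) than just a name string.

```python
from difflib import SequenceMatcher

# Hypothetical sample records; real data would come from the donor and
# volunteer system exports.
donors = [{"name": "Jane A. Smith", "email": "jane@example.com"}]
volunteers = [{"name": "Jane Smith", "email": "jsmith@example.com"}]

def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized name strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def fuzzy_match(left, right, threshold=0.85):
    """Pair each left record with its best right-side match above threshold."""
    matches = []
    for rec in left:
        best = max(right, key=lambda r: similarity(rec["name"], r["name"]))
        score = similarity(rec["name"], best["name"])
        if score >= threshold:
            matches.append((rec["name"], best["name"], round(score, 2)))
    return matches

print(fuzzy_match(donors, volunteers))
```

In practice, a tunable threshold like the one above lets PAWS trade off false merges (two different people linked) against missed merges (one supporter counted twice), which is the central design decision in any cross-system matching step.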
PAWS Data Pipeline
WHO IS PAWS - As the city's largest animal rescue partner and no-kill animal shelter, PAWS is working to make Philadelphia a place where every healthy and treatable pet is guaranteed a home. Since its inception over 10 years ago, PAWS has rescued and placed 27,000+ animals in adoptive and foster homes, and has worked to prevent pet homelessness by providing 86,000+ low-cost spay/neuter services and affordable vet care to 227,000+ clinic patients. In 2018 alone, 3,584 animals were rescued and 36,871 clinic patients were served. PAWS is funded 100% through donations, with 91 cents of every dollar collected going directly to the animals. Therefore, PAWS' rescue work (including 3 shelters and all rescue and animal care programs), administration, and development efforts are coordinated by only about 70 staff members, complemented by over 1,500 volunteers.
DATA IS UNDERUTILIZED - Through this chain of operational and service activities, PAWS accumulates data regarding donations, adoptions, fosters, volunteers, merchandise sales, and event attendees (to name a few), each in its own system and/or manual (Google Sheet) tally. This vital data, which could drive insights, remains siloed and is usually difficult to extract, manipulate, and analyze. Taking all of this data, making it readily available, and drawing inferences through analysis can drive many benefits: PAWS operations can be better informed and use data-driven decisions to guide programs and maximize effectiveness; supporters can be further engaged by suggesting additional opportunities for involvement based upon pattern analysis; and multi-dimensional supporters can be consistently (and accurately) acknowledged for all the ways they support PAWS (e.g., a volunteer who donates and also fosters kittens), not to mention opportunities to further tap the potential of these enthusiastic supporters. And there are bound to be more leverage points as we get further into this project!
PROJECT MISSION - This project seeks to provide PAWS with an easy-to-use and easy-to-support tool to extract data from multiple source systems, confirm accuracy and appropriateness and process data where necessary (a data hygiene and wrangling step), and then load relevant data into one or more repositories to facilitate (1) a highly-accurate and rich 360-degree view of PAWS constituents (Salesforce is a likely candidate target system; already in use at PAWS) and (2) flexible ongoing data analysis and insights discovery (e.g. a data warehouse).
UPDATE FROM OCT 22 MEETUP
paws_data_pipeline is off to a great start! Thanks to all those who came out Tuesday night. Great contributions across the board.
THE TEAM AND WHAT WE NEED
We had great representation from people who can make it happen. We are in need of one or more program manager / project manager / project lead types who love to keep things organized! Karla and Chris will be involved every step of the way but lack the bandwidth to do week-by-week, task-by-task planning and tracking. CAN YOU HELP HERE??? IF SO… PLEASE JOIN THE SLACK CHANNEL AND MAKE YOURSELF KNOWN!
WHAT WE DID
After the kickoff presentation and some Q&A, we broke into two groups. One group focused on ideating around data extraction and creation of the data lake. The other group looked ahead to data cleansing, matching, and validation, and to linking data back into systems of record such as Salesforce. Key points from each discussion are below.
Data Lake - Identified our source systems, discussed extraction methods briefly, and considered data lake architecture and construction. Next steps are to dig into one or two data sources, figure out extraction methods (APIs are not widely available although .csv exports are), and inventory records, structure, and elements for movement into the data lake. Test out the process.
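Since CSV exports (rather than APIs) are the most widely available extraction method, a first "inventory the records, structure, and elements" step could look like the sketch below, using only Python's standard library. The column names and sample rows are hypothetical illustrations, not actual PAWS export data.

```python
import csv
import io

# Hypothetical CSV export; in practice this would be a file exported
# from one of the source systems (donor, volunteer, or adopter data).
export = io.StringIO(
    "Donor Name,Email,Amount\n"
    "Jane Smith,jane@example.com,50\n"
    "John Doe,john@example.com,25\n"
)

reader = csv.DictReader(export)
rows = list(reader)

# Inventory the structure before moving anything into the data lake:
# what fields exist, and how many records there are.
print("fields:", reader.fieldnames)
print("records:", len(rows))
```

A simple inventory like this, run against each source system's export, is enough to start comparing field names and formats across systems and to decide what normalization the data lake will need.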
Cleansing/Matching etc - Discussed the options for matching data across sources and making sense of the data from multiple systems (e.g., estimating the complexity of the cleaning process required and potential data-architecture challenges). Next steps are to 1) make a few data sources available (probably on the volunteers/donors integration stream) so the team can understand what variables exist and in what format, and 2) clarify data privacy issues for doing this work. Additionally, PAWS staff will assess whether there are additional data structures to be created (e.g. labels) as an alternative to the currently unstructured text notes.
We plan to meet next Tuesday, or possibly on November 5. Details on the next scheduled hack night are coming. We will communicate them through the Slack channel #paws_data_pipeline.
Some people requested a remote-connection option, perhaps connecting remotely each week and meeting in person every other week or once a month. This is still TBD. Thank you for the suggestions.
Stay tuned to the Slack channel for more!
AND… IF YOU ARE NOT INVOLVED YET AND WANT TO BE…. PLEASE JOIN UP!!!!
Chris & Karla