Some say New York is the best city in the world. Love it or hate it, NYC certainly doesn't go by unnoticed. Can we use data to shine light on your favorite (or least favorite) thing about NYC? We certainly can! So much open data available, so many interesting problems, so much you can contribute to!
Are you interested in the inner workings of the transit system? Ever wonder how many people enter and exit the subway stations every day? Or when and where delays are most likely to happen? Then this is the perfect data set for you! MTA Open Data has everything from daily turnstile data to lost and found item drop-off and pick-up details.
How did Uber and Lyft influence the market in NYC? How do tips scale depending on pickup time, trip duration or trip length? Or, most importantly, are there really no cabs available when it’s raining in the city? Use this dataset to search through various providers (yellow and green cabs, Uber, Lyft or other for-hire vehicles), pick-up and drop-off times, locations, trip distances, itemized fares and payment types.
Dive into this data set if you’re looking to add extra information to your analysis. For example, if you wonder whether people from wealthier neighborhoods use cabs more often, look no further!
Want to know where the rats congregate? Or which areas of the city are the noisiest? Then check out this data set, which lists all complaints made through 311 since 2010. Among other things, you can track changing trends or complaint response rates. Are complaints in some neighborhoods dealt with faster than in others?
What happens when you go to a hospital in NY state? Will you receive the same care? Will you be equally likely to get readmitted within 30 days? Let’s look at hospital discharge data from 2009 to 2013 to learn more.
Hot off the press, Uber just released anonymized data from over 2 billion trips.
Sunny day, 60 F, light summer breeze: what better than to hack the Citi Bike data set and discover new things about what/when/where New Yorkers like to ride?
The Article Search API contains over 2.8 million articles from 1981 until today. How do these relate to urban planning and open data? We're about to find out. The data comes in hard-to-access formats, so it's perfect if you're interested in scraping and data cleaning.
We have made the data being used by the visualization and modeling tracks easily accessible. Documentation of the data sets and how to access and download them can be found in our wiki. SQL, Python, and R examples of accessing the data can be found here.
We have stored all the data on Google Cloud and will be using Google's BigQuery platform to access and download the data to our local machines using SQL commands.
Access through Python
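As a rough sketch of what the Python route might look like, assuming the `google-cloud-bigquery` client library is installed and your Google Cloud credentials are already configured: the project, dataset, and table names below are hypothetical placeholders, not the hackathon's actual tables, and the two helper functions are illustrative names rather than part of any library.

```python
def build_preview_query(project: str, dataset: str, table: str, limit: int = 10) -> str:
    """Return a standard-SQL query that previews the first rows of a table."""
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    # Backticks quote the fully qualified table name in BigQuery standard SQL.
    return f"SELECT * FROM `{project}.{dataset}.{table}` LIMIT {int(limit)}"


def run_query(sql: str, billing_project: str):
    """Run a query on BigQuery and return the result as a pandas DataFrame.

    Assumes google-cloud-bigquery and pandas are installed and that
    credentials are available in the environment.
    """
    # Imported lazily so build_preview_query works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=billing_project)
    return client.query(sql).to_dataframe()


# Hypothetical names, for illustration only.
sql = build_preview_query("my-hackathon-project", "nyc_open_data", "complaints_311", limit=5)
print(sql)
```

With credentials in place, something like `df = run_query(sql, billing_project="my-hackathon-project")` followed by `df.head()` would pull the preview rows onto your local machine. Recent versions of the client may also need the `db-dtypes` package for `to_dataframe()`.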
Git / GitHub
We’ll be using Git and GitHub throughout the hackathon to keep your work in one central place. Make sure you’ve got Git installed and that you’re familiar with the basic Git commands (think: status, add, commit, push, pull).
Please make sure that Python or R (or both) is installed on your machine. Without them, the day will be much less fun.
Mac OS X
[Easy] One Python distribution with all the packages, even for Windows: Anaconda
If you want to use R and don't have RStudio installed yet, have a look:
|Friday, March 24th|
|6:30pm - 7:00pm||Snacks and Registration|
|7:00pm - 8:00pm||Opening remarks; Presentation: Quick tips for visualization of spatial data (in CARTO) by Ekaterina Levitskaya|
|8:00pm - 9:00pm||Break up into teams to set up systems and pick problems|
|9:00pm - 9:30pm||Wrap up with Snacks and Drinks|
|Saturday, March 25th|
|9:30am - 10:00am||Breakfast and Registration|
|10:00am - 11:00am||Presentation: Crowd Sourced Performance Evaluation of Urban Drainage Infrastructure by Sina Kashuk
Flooding, especially in cities, poses challenges for modeling and predicting drainage infrastructure performance because spatially resolved performance data are lacking, a problem that is particularly acute for older, legacy infrastructure. We used five years of NYC311 calls to identify the areas of the city most prone to sewer backup flooding and present a novel algorithm for calculating the spatial distribution of flooding complaints across NYC’s five boroughs.
Sina Kashuk is a Data Scientist at DataKind and an alumnus of the Insight Data Science Fellowship program, with experience in remote sensing and machine learning, as well as computer vision, time series and geospatial analysis. Although he started his data science career coding on his Commodore 64 at age 7, his Ph.D. is in Civil Engineering from NYU School of Engineering.|
|11:00am - 1:00pm||Work in teams|
|1:00pm - 1:30pm||Lunch|
|1:30pm - 2:00pm||Presentation: Using Public Data To Expand Awareness and Make a Difference by Maureen Teyssier
Enigma Public is the largest and most diverse repository of formatted public data around. I'll show a sample of projects that spread awareness of the world we live in, and even one project that has saved lives. Enigma Public datasets are available online for noncommercial use; this is an unbeatable playground for data scientists.
Maureen Teyssier is a Senior Data Scientist at Enigma and holds a PhD in Computational Astrophysics from Columbia. At Enigma, she helps organizations and individuals fuse, organize, and explore data to make smarter decisions.|
|2:00pm - 6:00pm||Work in teams|
|6:00pm - 7:00pm||Sharing your results: Teams show off cool algorithms they used, plots they made, and results they found|
|7:00pm - 7:30pm||Snacks and drinks|
Eszter Schoell is a Learning Scientist at O’Reilly Media. She received her Ph.D. in Cognitive Neuroscience from the University of Hamburg, Germany. After building up a graduate school in Hamburg and doing a stint as a research project manager, she realized that curating quality content to share knowledge is her calling and joined O’Reilly Media. She now thoroughly enjoys being on the production side again.
I am an astrophysicist turned data scientist. I work and teach at the NYU Center for Urban Science and Progress and at the NYU Center for Cosmology and Particle Physics. I study time series of light. In the urban environment, city lights enable sociological, ecological, and economic inference, while in astronomy my research focuses on cosmic explosions. I also have a second job as a professional boxer.
Iva is a Data Scientist at Macy’s where her role is to analyze customer behavior, pricing optimization, and the competitive landscape. Before joining Macy’s, Iva earned her Ph.D. in applied math from Columbia University. Her thesis focused on studying how waves (such as light) propagate through microstructures (such as fiber optic cables).
Laurence is a Senior Bioinformatics Analyst at the New York Genome Center. She is working on breast cancer and fighting the disease with Next-Generation Sequencing and Data Science. Before turning to bioinformatics, she earned a bachelor's and a master's degree in applied mathematics and a PhD in statistics at EPFL in Switzerland.
Sinziana recently joined Chase's Digital Intelligence team as a data scientist. Previously, she worked as a quantitative modeler in risk management under JPMorgan Chase's Consumer and Community Bank. Sinziana holds a Ph.D. in mathematics from the Courant Institute of Mathematical Sciences at New York University.
Ty is a Teaching Assistant at K2 Data Science. Previously, he worked as a data science and software engineering consultant. He started his career in distressed private equity.
Zuzanna recently earned her Ph.D. in Experimental Psychology from NYU and is currently a Fellow at Insight Data Science. She's passionate about open data, fixing the leaky pipeline in STEM, and hackathons. Her favorite data set? NYC Trees 🌳!
Through television and digital media platforms, Viacom and its brands connect with kids, youth and adults. MTV is the cultural home of the Millennial Generation. Nickelodeon is the number one brand for kids. Comedy Central is the number one brand in comedy. BET is the leading provider of content for African-Americans and all who celebrate black culture. Each of the Viacom Media Networks brands develops original content based on the deep insights and connections we cultivate with our fan base.
Bloomberg technology helps drive the world’s financial markets. We provide communications platforms, data, analytics, trading platforms, news and information for the world’s leading financial market participants. We deliver through our unrivaled software, digital platforms, mobile applications and state-of-the-art hardware developed by Bloomberg technologists for Bloomberg customers.
MLconf was created to host the thought leaders in Machine Learning and AI to discuss their most recent experience with applying techniques, tools, algorithms and methodologies to problems that occur when dealing with massive and noisy data. MLconf is independent of any outside company or university – it’s simply a conference organized to gather the Machine Learning communities in various cities to share knowledge and create an environment for the community to coalesce.
CUSP is an NYU university-wide center whose research and education programs are focused on urban informatics. Using NYC as its lab, and building from its home in the NYU Tandon School of Engineering, it integrates and applies NYU strengths in the natural, data, and social sciences to understand and improve cities throughout the world. CUSP offers a one-year MS degree in Applied Urban Science & Informatics.
Stack Overflow is a question and answer site for professional and enthusiast programmers.
by Federica Bianco, Zuzanna Kłyszejko and Sinziana Eckner
An incredible amount of data about NYC is openly available. And we all know that magic happens when smart creative people get their hands on data! This hack-a-thon is designed to help us better understand the city we live in and create solutions to improve equitability, productivity, resilience, and sustainability.
by Sinziana Eckner
We hacked, made new friends, learned how to use Carto and geospatial libraries, and made some awesome-looking maps that exposed great insights into various aspects of NYC open data! Check out this post for descriptions of the work that was done and pictures of the event.
by Friederike Schüür and Iva Horel
We <3 hackathons! People coming together using their skills for a good cause? That’s awesome! Right? Well, yes! Yes, it is. But implementation can ruin, or save, the best of ideas - and hackathons are a great idea. So let’s talk about implementation.