Smart Cities Hack

March 25th, 2017

Some say New York is the best city in the world. Love it or hate it, NYC certainly doesn't go unnoticed. Can we use data to shine light on your favorite (or least favorite) thing about NYC? We certainly can! There is so much open data available, so many interesting problems, and so much you can contribute to!

  • Learn: Learn about transportation and how city agencies operate by using openly available data sets.
  • Improve: Combine MTA, Uber and taxi data to explore traffic, parse 311 complaints and much more.
  • Collaborate: Work with others to achieve a common goal. Learn from the person next to you. Let's make our city even better!

MTA Open Data

Are you interested in the inner workings of the transit system? Ever wonder how many people enter and exit the subway stations every day? Or when and where delays are most likely to happen? Then this is the perfect data set for you! MTA Open Data has everything from daily turnstile data to lost and found item drop-off and pick-up details.

Learn More

TLC Trip Data

How did Uber and Lyft influence the market in NYC? How do tips scale depending on pickup time, trip duration or trip length? Or, most importantly, are there really no cabs available when it’s raining in the city? Use this dataset to search through various providers (yellow and green cabs, Uber, Lyft or other for-hire vehicles), pick-up and drop-off times, locations, trip distances, itemized fares and payment types.

Learn More
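
If you want a starting point for the tipping question above, here is a minimal sketch in plain Python. It assumes each trip is a dict with `pickup_datetime`, `fare_amount`, and `tip_amount` keys; those names and the three sample rows are made up for illustration and are not the actual TLC schema or real records.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Made-up rows for illustration only; not real TLC records.
# The column names are assumptions, not the actual TLC schema.
trips = [
    {"pickup_datetime": "2017-03-01 08:15:00", "fare_amount": 10.0, "tip_amount": 2.0},
    {"pickup_datetime": "2017-03-01 08:45:00", "fare_amount": 20.0, "tip_amount": 3.0},
    {"pickup_datetime": "2017-03-01 23:10:00", "fare_amount": 15.0, "tip_amount": 4.5},
]


def tip_rate_by_hour(rows):
    """Return {pickup hour: mean tip as a fraction of the fare}."""
    rates = defaultdict(list)
    for row in rows:
        if row["fare_amount"] <= 0:
            continue  # skip zero or negative fares to avoid dividing by zero
        hour = datetime.strptime(row["pickup_datetime"], "%Y-%m-%d %H:%M:%S").hour
        rates[hour].append(row["tip_amount"] / row["fare_amount"])
    return {hour: mean(values) for hour, values in sorted(rates.items())}


print(tip_rate_by_hour(trips))
```

Grouping by hour is only one choice; the same pattern works for trip duration or distance buckets once you swap the grouping key.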

Census Data

Dive into this data set if you’re looking to add extra information to your analysis. For example, if you wonder whether people from wealthier neighborhoods use cabs more often, look no further!

311 Complaint Data

Want to know where the rats congregate? Or which areas of the city are the noisiest? Then check out this data set, which lists all complaints made through 311 since 2010. Among other things, you can track changing trends or complaint response rates. Are complaints in some neighborhoods dealt with faster than in others?

Learn More
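
One way to approach the "dealt with faster" question above is to compare the median time between a complaint being opened and closed. The sketch below is only an illustration: the `borough`, `created`, and `closed` field names and the sample rows are assumptions made up for this example, not the real 311 schema or real complaints.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

FMT = "%Y-%m-%d %H:%M"

# Made-up rows for illustration only; not real 311 complaints.
# Field names are assumptions, not the actual 311 schema.
complaints = [
    {"borough": "BROOKLYN", "created": "2017-03-01 09:00", "closed": "2017-03-01 15:00"},
    {"borough": "BROOKLYN", "created": "2017-03-02 09:00", "closed": "2017-03-03 09:00"},
    {"borough": "QUEENS", "created": "2017-03-01 10:00", "closed": "2017-03-01 12:00"},
    {"borough": "QUEENS", "created": "2017-03-04 10:00", "closed": ""},  # still open
]


def median_hours_to_close(rows):
    """Return {borough: median hours between creation and closure}."""
    hours = defaultdict(list)
    for row in rows:
        if not row["closed"]:
            continue  # open complaints have no response time yet
        opened = datetime.strptime(row["created"], FMT)
        closed = datetime.strptime(row["closed"], FMT)
        hours[row["borough"]].append((closed - opened).total_seconds() / 3600)
    return {borough: median(values) for borough, values in hours.items()}


print(median_hours_to_close(complaints))
```

The median is used here instead of the mean because a handful of very old complaints would otherwise dominate the comparison.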


What happens when you go to a hospital in NY state? Will you receive the same care? Will you be equally likely to get readmitted within 30 days? Let’s look at hospital discharge data from 2009 to 2013 to learn more.

Learn More


Hot off the press, Uber just released anonymized data from over 2 billion trips.

Learn More

Citi Bike

Sunny day, 60°F, light summer breeze: what better than to hack on the Citi Bike data set and discover new things about what/when/where New Yorkers like to ride?

Learn More

New York Times Article Search

The Article Search API contains over 2.8 million articles from 1981 until today. How do these relate to urban planning and open data? We're about to find out. The data comes in hard-to-access formats, so it's perfect if you're interested in scraping and data cleaning.

Learn More

We have made the data being used by the visualization and modeling tracks easily accessible. Documentation of the data sets and how to access and download them can be found in our wiki. SQL, Python, and R examples of accessing the data can be found here.

Data Access Through BigQuery

We have stored all the data on Google Cloud and will be using Google's BigQuery platform to access and download the data to our local machines using SQL commands.

  • Access through Python

  • Access through R
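
As a rough idea of what the Python route can look like, here is a minimal sketch that assumes the `google-cloud-bigquery` client library (plus pandas) is installed and that you are logged in to Google Cloud. The table name `my-project.hackathon.trips` and the helper names `build_query` and `download` are hypothetical; use the tables documented in the wiki.

```python
def build_query(table, limit=1000):
    """Build a standard-SQL query that pulls a small sample from `table`."""
    return f"SELECT * FROM `{table}` LIMIT {int(limit)}"


def download(sql, project=None):
    """Run `sql` on BigQuery and return the result as a pandas DataFrame."""
    # Imported lazily so build_query works even without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)  # picks up your default credentials
    return client.query(sql).to_dataframe()


# Hypothetical table name; replace it with one listed in the wiki.
sql = build_query("my-project.hackathon.trips", limit=100)
print(sql)
# df = download(sql)  # uncomment once your credentials are set up
```

Starting with a small `LIMIT` keeps the download quick while you check that your setup works.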

Git / GitHub

We’ll be using Git and GitHub throughout the hackathon to keep your work in one central place. Make sure you’ve got Git installed and that you’re familiar with the basic Git commands (think: status, add, commit, push, pull).


Please make sure that Python or R (or both) are installed on your machine. Without them, the day will be much less fun.

  • Mac OS X

  • Windows

    • [Easy] One Python distribution with all the packages, even for Windows: Anaconda

  • Linux

    • [Easy] One Python distribution with all the packages: Anaconda

    • [Intermediate] Use the command line

If you want to use R, and you don't have R Studio installed yet, have a look:


                           Friday, March 24th
  6:30pm - 7:00pm      Snacks and Registration
  7:00pm - 8:00pm   Opening remarks
                    Presentation: Quick tips for visualization of spatial data (in CARTO) by Ekaterina Levitskaya
  8:00pm - 9:00pm   Break up into teams to set up systems and pick problems
  9:00pm - 9:30pm   Wrap up with Snacks and Drinks
                        Saturday, March 25th
   9:30am - 10:00am   Breakfast and Registration
  10:00am - 11:00am   Presentation: Crowd Sourced Performance Evaluation of Urban Drainage Infrastructure by Sina Kashuk

Flooding, especially in cities, poses challenges for modeling and predicting drainage infrastructure performance because spatially resolved performance data are lacking, a problem that is particularly acute for older, legacy infrastructure. We used five years of NYC311 calls to identify the areas of the city most prone to sewer backup flooding and presented a novel algorithm for calculating the spatial distribution of flooding complaints across NYC’s five boroughs.

Sina Kashuk is a Data Scientist at DataKind and an alumnus of the Insight Data Science Fellowship program, with experience in remote sensing and machine learning, as well as computer vision, time series and geospatial analysis.  Although he started his data science career coding on his Commodore 64 at age 7, his Ph.D. is in Civil Engineering from NYU School of Engineering.
  11:00am - 1:00pm   Work in teams
  1:00pm - 1:30pm   Lunch
  1:30pm - 2:00pm   Presentation: Using Public Data To Expand Awareness and Make a Difference by Maureen Teyssier

Enigma Public is the largest and most diverse repository of formatted public data around. I'll show a sample of projects that spread awareness of the world we live in, and even one project that has saved lives. Enigma Public datasets are available online for noncommercial use: this is an unbeatable playground for data scientists.

Maureen Teyssier is a Senior Data Scientist at Enigma and holds a PhD in Computational Astrophysics from Columbia. At Enigma, she helps organizations and individuals fuse, organize, and explore data to make smarter decisions.

  2:00pm - 6:00pm   Work in teams
  6:00pm - 7:00pm   Sharing your results: Teams show off cool algorithms they used, plots they made, and results they found
  7:00pm - 7:30pm   Snacks and drinks

Eszter D. Schoell

Learning Scientist

Eszter Schoell is a Learning Scientist at O’Reilly Media. She received her Ph.D. in Cognitive Neuroscience from the University of Hamburg, Germany. After building up a graduate school in Hamburg, Germany and doing a stint as a research project manager, she realized curating quality content to share knowledge is her calling and joined O’Reilly Media. She now thoroughly enjoys being on the production side again.

Federica Bianco

Research Scientist

I am an astrophysicist turned data scientist. I work and teach at the NYU Center for Urban Science and Progress and at the NYU Center for Cosmology and Particle Physics. I study time series of light. In the urban environment, city lights enable sociological, ecological, and economic inference, while in astronomy my research focuses on cosmic explosions. I also have a second job as a professional boxer.

Iva Horel

Data Scientist

Iva is a Data Scientist at Macy’s where her role is to analyze customer behavior, pricing optimization, and the competitive landscape. Before joining Macy’s, Iva earned her Ph.D. in applied math from Columbia University. Her thesis focused on studying how waves (such as light) propagate through microstructures (such as fiber optic cables).

Laurence de Torrente

Senior Bioinformatics Analyst

Laurence is a Senior Bioinformatics Analyst at the New York Genome Center. She is working on breast cancer and fighting the disease with Next-Generation Sequencing and Data Science. Before turning to bioinformatics, she earned Bachelor's and Master's degrees in applied mathematics and a PhD in statistics at EPFL in Switzerland.

Sinziana Eckner

Data Scientist

Sinziana recently joined Chase's Digital Intelligence team as a data scientist. Previously, she worked as a quantitative modeler in risk management under JPMorgan Chase's Consumer and Community Bank. Sinziana holds a Ph.D. in mathematics from the Courant Institute of Mathematical Sciences at New York University.

Ty Shaikh


Ty is a Teaching Assistant at K2 Data Science. Previously, he worked as a data science and software engineering consultant. He started his career in distressed private equity.

Zuzanna Kłyszejko

Data Scientist

Zuzanna is a fresh Ph.D. in Experimental Psychology from NYU and currently a Fellow at Insight Data Science. She's passionate about open data, hackathons, and fixing the leaky pipeline in STEM. Her favorite data set? NYC Trees 🌳!


Viacom

Through television and digital media platforms, Viacom and its brands connect with kids, youth and adults. MTV is the cultural home of the Millennial Generation. Nickelodeon is the number one brand for kids. Comedy Central is the number one brand in comedy.  BET is the leading provider of content for African-Americans and all who celebrate black culture. Each of the Viacom Media Networks brands develops original content based on the deep insights and connections we cultivate with our fan base.

Bloomberg

Bloomberg technology helps drive the world’s financial markets. We provide communications platforms, data, analytics, trading platforms, news and information for the world’s leading financial market participants. We deliver through our unrivaled software, digital platforms, mobile applications and state of the art hardware developed by Bloomberg technologists for Bloomberg customers.


MLconf

MLconf was created to host the thought leaders in Machine Learning and AI to discuss their most recent experience with applying techniques, tools, algorithms and methodologies to problems that occur when dealing with massive and noisy data. MLconf is independent of any outside company or university – it’s simply a conference organized to gather the Machine Learning communities in various cities to share knowledge and create an environment for the community to coalesce.


CUSP is an NYU university-wide center whose research and education programs are focused on urban informatics.  Using NYC as its lab, and building from its home in the NYU Tandon School of Engineering, it integrates and applies NYU strengths in the natural, data, and social sciences to understand and improve cities throughout the world.  CUSP offers a one-year MS degree in Applied Urban Science & Informatics.



Stack Overflow is a question and answer site for professional and enthusiast programmers.

Urban Science

by Federica Bianco, Zuzanna Kłyszejko and Sinziana Eckner


An incredible amount of data about NYC is openly available. And we all know that magic happens when smart, creative people get their hands on data! This hackathon is designed to help us better understand the city we live in and create solutions to improve equitability, productivity, resilience, and sustainability.

Read More

Smart Cities Hack Event Summary

by Sinziana Eckner


We hacked, made new friends, learned how to use CARTO and geospatial libraries, and made some awesome-looking maps that exposed great insights into various aspects of NYC open data! Check out this post for descriptions of the work that was done and pictures of the event.

Read More

Our 3 Tracks

by Friederike Schüür and Iva Horel


We <3 hackathons! People coming together using their skills for a good cause? That’s awesome! Right? Well, yes! Yes, it is. But implementation can ruin, or save, the best of ideas - and hackathons are a great idea. So let’s talk about implementation.

Read More

You should absolutely apply! Regardless of gender.
No, the goal of this event is to help you improve your skills. As a foundation, it would be great if you knew a little Python or R.
As part of this event, you’ll team up with 3-4 other data enthusiasts to answer a specific question and perform one step of the data analysis process (data wrangling, descriptive statistics, or modeling). We will have domain experts at the event who will help you with any conceptual or data-related questions you may have. Your teammates will have a variety of skills, so this is a great opportunity to learn and share what you know.
We will have experts in SQL, Python, R, and healthcare on hand to help.
We want to get to know you, your interests, and what you would like to learn from this event. In particular, we have broken down the data analysis process into three steps: (1) data wrangling, (2) descriptive statistics, and (3) modeling. You only need to answer the questions (1-3) that correspond to the step(s) you would like to work on during the event.
The Friday night before the event, we will host a kick-off session. You will meet your team and set up your computer with the tools you’ll need to access and analyze the data. It is important you attend this session to make sure that you are prepared for the Saturday event.
Stack Overflow is generously hosting us. Their address is 110 William St, 28th floor, New York, NY.
We will respond to all applicants. Some weeks, responses may be slower, but you will hear back from us.