# Waze data for urban interventions - Part II

This is Part II of a series of three posts. Check out Part I and Part III if you haven’t done so yet.

## Data Wrangling

Waze provides the data as a JSON file with dozens of jam objects. As described in Part I, each one of these objects contains information on the speed, length and other features of that particular jam line.

The problem, however, is that this is not how municipalities manage their traffic systems. Knowing that there is a traffic jam on Street X doesn’t help us if that street is 10 km long: the problem could be anywhere along it, so the information is of little use to the traffic engineering team. And yes, Waze does provide the exact coordinates of the jam line, but we need a meaningful way of summarizing this data into a standardized unit of calculation that allows us to build more sophisticated models.

In Joinville, in particular, our Geographical Information System divides the street network into street sections, which are subsections of streets that sit between two consecutive corners (see illustration in Figure 1). Our problem, then, is: how can we normalize Waze’s traffic information to this standardized traffic network unit?

## Enter Shapely and Geopandas

Before we tackle how we solved the problem, let’s digress a little to explain some basic concepts of GIS programming. In a nutshell, GIS is a way of representing maps through a combination of geometric objects carrying extra information (attributes). What we want to do here is convert Waze’s jam lines into a GIS representation, i.e. geometries that can be manipulated with operations such as intersects, contains, is contained by (within) and touches.
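To make these operations concrete, the sketch below uses the shapely library (introduced in the next paragraph) on a few made-up lines. The coordinates and variable names are hypothetical and only meant to show what each predicate returns.

```python
from shapely.geometry import LineString

# Hypothetical geometries on a flat grid (not real Joinville data)
section = LineString([(0, 0), (10, 0)])        # a street section
jam = LineString([(2, 0), (5, 0)])             # a jam lying inside the section
crossing = LineString([(4, -3), (4, 3)])       # a perpendicular street
next_section = LineString([(10, 0), (20, 0)])  # shares only an endpoint with "section"

relations = {
    "jam within section": jam.within(section),
    "section contains jam": section.contains(jam),
    "crossing intersects section": crossing.intersects(section),
    "crossing within section": crossing.within(section),
    "crossing touches section": crossing.touches(section),
    "next_section touches section": next_section.touches(section),
}

for name, value in relations.items():
    print(f"{name}: {value}")
```

`within` and `contains` mirror each other, while `touches` is true only when two geometries meet at their boundaries. That is why the adjacent section touches `section` but the crossing street, which cuts through its interior, does not.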

In Python, we use the shapely library to achieve that. Furthermore, we want to tabulate the data so that each jam line has its own row in a tidy dataframe, and the Pandas library helps us with that.

Finally, we want to be able to perform Pandas’ dataframe JOIN operations using geometrical relations instead of exact matches, such as matching every row in one table whose geometry intersects the geometry of a row in another table, or matching every row in one table whose geometry is entirely contained by the geometry of another table’s row. To achieve this, we use the GeoPandas library, which, among other things, provides a GeoDataFrame object that merges the functionality of both Pandas dataframes and Shapely objects.

So there it is! With this stack we’re finally able to accomplish what we wanted to do in the first place: cross-reference Waze’s data with the Municipality’s standard street grid. Once this is done, it doesn’t take much more than a GROUP BY operation to summarize the data and come up with a ranking of the most critical streets.

## The implementation

The first technical challenge in manipulating Waze’s data properly is the infrastructure to capture it, store it and retrieve it simply and quickly. This article assumes that the data is already sitting in a database; see Part III of this series for more information on how we set that up.

To achieve the cross-referencing of data mentioned in the previous topic, we need to perform the following steps (the code for all functions below can be found here):

1. Read from the database into a Pandas Dataframe (function `extract_df_jams()`);
2. Transform the dataframe into a georeferenced GeoPandas Dataframe (function `transform_geo_jams()`);
3. Read the GIS information from a CSV file into a Pandas Dataframe (function `wkt_to_df()`);
4. Transform the GIS dataframe into a georeferenced GeoPandas Dataframe, using the same coordinate system as Step 2 (function `transform_geo_sections()`);
5. Run a Spatial Join between the two GeoDataframes (function `allocate_jams()`). More information on this step below.
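Steps 2 to 4 could look roughly like the sketch below. It is not the code behind the functions listed above: the column names (`jam_id`, `speed`, `line`, `section_id`, `wkt`), the sample coordinates and the `EPSG:4326` coordinate system are assumptions made for illustration, as is the idea that each jam stores its line as a list of `x`/`y` points.

```python
import io

import geopandas as gpd
import pandas as pd
from shapely import wkt
from shapely.geometry import LineString

# Hypothetical jam records, shaped like rows already read from the database (Step 1)
df_jams = pd.DataFrame({
    "jam_id": [1, 2],
    "speed": [3.2, 1.5],
    "line": [
        [{"x": -48.85, "y": -26.30}, {"x": -48.84, "y": -26.30}],
        [{"x": -48.84, "y": -26.31}, {"x": -48.84, "y": -26.29}],
    ],
})

# Step 2: turn each list of points into a LineString and attach a coordinate system
jam_geometries = df_jams["line"].apply(
    lambda points: LineString([(p["x"], p["y"]) for p in points])
)
jams = gpd.GeoDataFrame(
    df_jams.drop(columns="line"), geometry=jam_geometries, crs="EPSG:4326"
)

# Step 3: a hypothetical CSV export of street sections with a WKT geometry column
csv_text = 'section_id,wkt\n10,"LINESTRING (-48.86 -26.30, -48.83 -26.30)"\n'
df_sections = pd.read_csv(io.StringIO(csv_text))

# Step 4: parse the WKT strings and reuse the coordinate system from Step 2
section_geometries = df_sections["wkt"].apply(wkt.loads)
network = gpd.GeoDataFrame(
    df_sections.drop(columns="wkt"), geometry=section_geometries, crs=jams.crs
)

print(jams)
print(network)
```

Keeping both GeoDataFrames in the same coordinate system matters because the spatial join in Step 5 compares raw coordinates.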

This last step is a little tricky. It’s called `allocate_jams` because we are essentially determining which street sections are impacted by a particular jam or, in other words, allocating each jam to the street sections it interferes with. But how do you do that?

The first obvious case is a long street section with a short traffic jam within it. Here, the jam is allocated to a street section if the jam line geometry is fully contained by the street section geometry, as illustrated by the figure below.

This operation is then applied to every street section and every jam line through a Spatial Join, implemented using the code below:

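```python
import geopandas as gpd

# "jams" is the GeoDataFrame built in Step 2
# "network" is the GeoDataFrame of street sections built in Step 4
# "within" keeps each jam whose geometry lies entirely inside a street section
# (GeoPandas older than 0.10 takes this argument as op= instead of predicate=)
gpd.sjoin(jams, network, how="left", predicate="within")
```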
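As a toy illustration of this first case, the sketch below runs the join on made-up data and then applies the group-by summary mentioned earlier. The identifiers (`section_id`, `jam_id`) and coordinates are hypothetical, and the `predicate` keyword assumes GeoPandas 0.10 or newer.

```python
import geopandas as gpd
from shapely.geometry import LineString

# Two hypothetical street sections laid end to end
network = gpd.GeoDataFrame(
    {"section_id": ["A", "B"]},
    geometry=[LineString([(0, 0), (10, 0)]), LineString([(10, 0), (20, 0)])],
)

# Three hypothetical jams, each shorter than the section it sits on
jams = gpd.GeoDataFrame(
    {"jam_id": [1, 2, 3]},
    geometry=[
        LineString([(2, 0), (5, 0)]),    # inside section A
        LineString([(6, 0), (9, 0)]),    # also inside section A
        LineString([(12, 0), (15, 0)]),  # inside section B
    ],
)

# Each jam is matched to the section that fully contains it
joined = gpd.sjoin(jams, network, how="left", predicate="within")

# Rank the sections by the number of jams allocated to them
jams_per_section = joined.groupby("section_id").size().sort_values(ascending=False)
print(jams_per_section)
```

With real data the ranking could just as well weight each jam by its length or delay; counting rows is only the simplest choice.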

There is the case, however, when the jam is longer than any single street section. In this case, the jam is allocated to all the street sections that it fully contains:

```python
# GeoPandas older than 0.10 takes this argument as op= instead of predicate=
gpd.sjoin(jams, network, how="left", predicate="contains")
```

In the above case, for simplicity, we are deliberately ignoring the fact that, in most cases, the beginning and end of a jam will only partially intersect a street section.
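The toy example below shows what this simplification means in practice: a jam that starts partway along one section and ends partway along another is allocated only to the section in between. All names and coordinates are hypothetical, and the `predicate` keyword assumes GeoPandas 0.10 or newer.

```python
import geopandas as gpd
from shapely.geometry import LineString

# Three hypothetical street sections laid end to end
network = gpd.GeoDataFrame(
    {"section_id": ["A", "B", "C"]},
    geometry=[
        LineString([(0, 0), (10, 0)]),
        LineString([(10, 0), (20, 0)]),
        LineString([(20, 0), (30, 0)]),
    ],
)

# One long jam covering the end of A, all of B and the start of C
jams = gpd.GeoDataFrame(
    {"jam_id": [1]},
    geometry=[LineString([(7, 0), (24, 0)])],
)

# Only sections fully contained by the jam are allocated
contained = gpd.sjoin(jams, network, how="left", predicate="contains")

# For comparison: every section the jam overlaps at all
overlapping = gpd.sjoin(jams, network, how="inner", predicate="intersects")

ignored = sorted(set(overlapping["section_id"]) - set(contained["section_id"]))
print("allocated:", contained["section_id"].tolist())
print("partially covered and ignored:", ignored)
```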

Finally, we’re still left with a number of jams that do not match either of the above described conditions. In these cases, we need to capture all street sections that intersect with the jam line, but ignore crossings and perpendicular streets. We can achieve this through the following steps:

1. Get the major direction of the jam line: North, South, East or West;
2. Get the major direction of each street section;
3. Maintain only the intersections in which the jam line and the street section have the same direction.
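```python
# "unallocated_jams" holds the jams not matched by the two joins above
allocated_jams = gpd.sjoin(unallocated_jams, network, how="inner", predicate="intersects")
# Flag the pairs in which the jam and the street section share the same major direction
allocated_jams["match_directions"] = allocated_jams.direction_jams == allocated_jams.direction_net
allocated_jams = allocated_jams[allocated_jams["match_directions"]]  # drop perpendicular streets
```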
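To give a rough idea of steps 1 to 3, one simple way to get a major direction is to compare the first and last points of a line and keep whichever axis changes the most. This is only a sketch under that assumption, not the project's implementation; the function name `major_direction` and the sample coordinates are hypothetical.

```python
def major_direction(coords):
    """Return "North", "South", "East" or "West" for a sequence of (x, y) points."""
    (x_start, y_start), (x_end, y_end) = coords[0], coords[-1]
    dx, dy = x_end - x_start, y_end - y_start
    # The axis with the largest displacement decides the major direction
    if abs(dx) >= abs(dy):
        return "East" if dx >= 0 else "West"
    return "North" if dy >= 0 else "South"


# Hypothetical jam line and the street sections it intersects
jam = [(0, 0), (4, 1), (9, 1)]
candidates = {
    "avenue": [(-2, 0), (12, 1)],      # runs alongside the jam
    "cross_street": [(5, -3), (5, 4)],  # perpendicular crossing
}

jam_direction = major_direction(jam)
matching = [
    name for name, coords in candidates.items()
    if major_direction(coords) == jam_direction
]
print(jam_direction, matching)
```

For shapely geometries the same function can be fed `list(line.coords)`.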

The implementation details of how to get the directions are beyond the scope of this post but are all documented here, in the function `allocate_jams()`.
