Posted in July 2019

Impact analysis of a new metro line in Amsterdam using automated data sources

A new metro line (the north-south line) was opened in Amsterdam in July 2018, adding significant capacity to the existing urban public transport network consisting of bus, tram and metro modes. The opening of the metro line was accompanied by changes to the existing bus and tram network, such as removal of duplicate routes and addition of feeder routes.

Traditionally, the impact of such a network change was measured either ex-ante or ex-post based on surveys or model forecasts (Vuk 2005; Knowles 1996; Engebretsen, Christiansen, and Strand 2017). With the availability of automated data sources such as smart card data, the exact impact on transit demand and service quality can now be measured. So far, however, this has been limited to analysing the changes in travel times and reliability at the trip level (Fu and Gu 2018), excluding transfers.
This research utilises smart card and AVL data to study the impact of the new line on travel patterns (passenger flows), travel times and reliability from a passenger perspective by considering journeys including transfers. The metrics are calculated at a stop-cluster level, which also enables a distributional analysis of the impacts. Such an ex-post analysis of a policy intervention or network change could be used to refine ex-ante demand prediction tools.

Check the Transit Data workshop contributions of Malvika Dixit: Presentation and Extended abstract

Forecasting bus ridership with trip planner usage data: a machine learning application

Currently, the public transport sector pays much attention to environmental impact, costs and traveler satisfaction. Good short-term demand forecasting models can help improve these performance indicators. They can help prevent denied boarding and overcrowding in buses by detecting insufficient capacity beforehand. They could be used to operate more economically by decreasing the frequency or the size of the bus if there is overcapacity. Moreover, they could help operators plan their buses for incidental occasions such as big public events, about which little information is known. Finally, they could be used to reliably inform travelers about current crowding.
This study investigates the usefulness of a new data source: the usage data of a trip planner. In the Netherlands, multiple trip planners are available to help users find the best (multimodal) journeys. These trip planners require a date, a time, an origin and a destination, which they use to construct multiple alternative journeys from which the user can choose. For this study, data from 9292 was used, the major trip planner in the Netherlands, which covers all public transport modes.
We developed one model for forecasting the number of people boarding and another for forecasting the number of people alighting at a given stop. These forecasts are defined at the vehicle-stop level. By summing the number of people boarding and subtracting the number of people alighting along the trip, the forecasted number of passengers on board after each stop is calculated.
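
The step from boarding and alighting forecasts to an on-board passenger count can be illustrated with a small sketch. It is not the code used in the study: the function name `load_after_each_stop`, the `initial_load` parameter and the example numbers are assumptions made for illustration only.

```python
from itertools import accumulate


def load_after_each_stop(boardings, alightings, initial_load=0.0):
    """Return the forecast number of passengers on board after each stop.

    boardings[i] and alightings[i] are the forecasts for the i-th stop of
    one vehicle trip, in the order the stops are served (hypothetical input
    layout, not taken from the study).
    """
    if len(boardings) != len(alightings):
        raise ValueError("boardings and alightings must have the same length")
    # Net change in passengers at each stop: people getting on minus off.
    net_change = [b - a for b, a in zip(boardings, alightings)]
    # accumulate(..., initial=...) yields the starting load first, so drop it.
    return list(accumulate(net_change, initial=initial_load))[1:]


# Example with made-up forecasts for a three-stop trip.
loads = load_after_each_stop([5, 3, 0], [0, 2, 6])
print(loads)
```

Because the boarding and alighting forecasts come from separate models, the running total can drift below zero or above vehicle capacity; how such cases are handled is a design decision that this sketch leaves open.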

We compare five different machine learning models: multiple linear regression, decision trees, random forests, neural networks and support vector regression with a radial basis function kernel. We compare these models against two simple rules: (1) predict the same number as last week, and (2) predict the historic average. The models are implemented with the scikit-learn library for Python. The data is stored in a PostgreSQL database.
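
The two simple rules can be sketched as follows. This is an illustrative reading, not the study's implementation: it assumes that `weekly_counts` holds the observed boardings for the same vehicle trip and stop on the same weekday in consecutive weeks, and it scores the rules with the mean absolute error, a metric chosen here for the example. All function names are hypothetical.

```python
from statistics import mean


def predict_last_week(history):
    """Rule 1: repeat the count observed one week earlier."""
    return history[-1]


def predict_historic_average(history):
    """Rule 2: predict the mean of all earlier observations."""
    return mean(history)


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted counts."""
    return mean([abs(a - p) for a, p in zip(actual, predicted)])


def evaluate_rule(rule, weekly_counts, warmup=1):
    """Score one-step-ahead forecasts of `rule`, walking forward in time.

    For each week after the warm-up period, the rule only sees the counts
    of the preceding weeks.
    """
    actual = weekly_counts[warmup:]
    predicted = [rule(weekly_counts[:i]) for i in range(warmup, len(weekly_counts))]
    return mean_absolute_error(actual, predicted)


# Made-up boardings for one stop over four consecutive weeks.
weekly_counts = [10, 12, 8, 10]
print(evaluate_rule(predict_last_week, weekly_counts))
print(evaluate_rule(predict_historic_average, weekly_counts))
```

A machine learning model is only worth its added complexity if it scores better than both rules on the same data.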
The trip planner datasets and the smart card dataset are merged and preprocessed. The resulting dataset is rather sparse: many stops have zero passengers boarding or alighting, or zero requests suggesting to do so. Therefore, we investigated whether subsampling is needed. From the datasets, useful data is selected and features are constructed. The features are standardized. Different numbers of features are tested; the features are selected by recursive elimination using a simple random forest model. Finally, the hyperparameters of the models are tuned and the optimal configurations are stored. The scores are validated using cross-validation.
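
The standardization, feature selection, tuning and validation steps could be chained in scikit-learn roughly as sketched below. This is a minimal sketch under stated assumptions, not the configuration used in the study: the data are synthetic, only one of the five models (support vector regression) is shown, and the parameter grid, the number of folds, the scoring metric and the helper name `build_search` are all illustrative choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def build_search(n_features_options=(2, 4), cv=3):
    """Grid search over the number of selected features and SVR settings."""
    pipeline = Pipeline([
        # Standardize every feature to zero mean and unit variance.
        ("scale", StandardScaler()),
        # Recursive feature elimination driven by a small random forest.
        ("select", RFE(RandomForestRegressor(n_estimators=20, random_state=0))),
        # Support vector regression with a radial basis function kernel.
        ("model", SVR(kernel="rbf")),
    ])
    param_grid = {
        "select__n_features_to_select": list(n_features_options),
        "model__C": [1.0, 10.0],
        "model__gamma": ["scale", 0.1],
    }
    # Every configuration is scored with k-fold cross-validation.
    return GridSearchCV(pipeline, param_grid, cv=cv,
                        scoring="neg_mean_absolute_error")


# Synthetic stand-in for the merged dataset: 6 features, 2 of them informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=120)

search = build_search()
search.fit(X, y)
best_params = search.best_params_
print(best_params, search.best_score_)
```

Placing the scaler and the feature selection inside the pipeline means both are refitted on the training part of every fold, which keeps information from the validation part out of the preprocessing. Whether the study arranged these steps in the same way is not stated above.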

Find more details in the following contributions by Jop van Roosmalen: Transit Data workshop presentation and MSc thesis

© 2011 TU Delft