rideshare demand prediction: a dispatch algorithm



introduction

indego, philadelphia's bike share system, has grown rapidly, expanding into new neighborhoods and adding stations within its existing service area to make bike share more convenient. since february 2022, the system has added 16 new stations, 328 docking points, and 250 electric bikes. however, the system's usefulness hinges on availability: each station needs bikes for riders to pick up and open docks for riders to return them.

to ensure consistent service, indego re-balances bikes across the network, predicting demand at each station and redistributing bikes accordingly. trucks carrying bikes are a familiar sight, shifting inventory from low-demand to high-demand areas. accurate demand predictions let indego move bikes before shortages occur, improving rider satisfaction. this report builds a predictive model to support this re-balancing effort.


data

this project relies on several datasets:
  1. philadelphia census data
  2. indego open ride data (fourth quarter of last year)
  3. weather data
  4. amenity data


exploratory analysis

the dataset spans 18 weeks of bike share trips; aggregated by station and hour, it contains over 490,000 observations. the trip data show clear daily patterns, with sharp peaks during the weekday morning and evening rush hours. weekends follow a different pattern: demand is spread more evenly through the day, without sharp peaks.

the demand for bikes is higher in central areas like university city, old city, and center city during the weekday rush hours. mapping bike usage confirms that the city’s core experiences greater bike activity on weekdays, while weekends see more evenly distributed demand.


accurate demand prediction requires a panel that includes every station/time combination, including those with no trips. initially, the indego data frame was incomplete: station-hours with no recorded trips were simply missing, so a model would never see zero-demand observations. to resolve this, i made sure every station and hour/day combination was represented, recording zero trips where none occurred. the result is a complete panel spanning 18 weeks of ride data: 490,728 space/time observations across 3,024 distinct hourly time units (18 weeks × 168 hours).
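
the panel-completion step can be sketched in python with pandas, assuming a trip table with hypothetical `station_id` and `start_time` columns (the report does not show its own code or column names). trips are counted per station-hour, then reindexed onto the full station × hour grid so that missing combinations become explicit zeros.

```python
import pandas as pd

def complete_panel(trips, stations, start, n_weeks):
    """Return one row per station per hour, with trip_count = 0 where no trips occurred."""
    # Hourly grid covering n_weeks * 168 hours from `start`.
    hours = pd.date_range(start=start, periods=n_weeks * 168, freq="60min")
    # Observed trips per station-hour; empty station-hours are simply absent here.
    counts = (
        trips.assign(interval=trips["start_time"].dt.floor("60min"))
        .groupby(["station_id", "interval"])
        .size()
        .rename("trip_count")
    )
    # Every station crossed with every hour.
    grid = pd.MultiIndex.from_product([stations, hours], names=["station_id", "interval"])
    # Reindex onto the full grid, turning missing combinations into explicit zeros.
    return counts.reindex(grid, fill_value=0).reset_index()

# Hypothetical station IDs and trips.
trips = pd.DataFrame({
    "station_id": [3005, 3005, 3010],
    "start_time": pd.to_datetime(["2022-09-05 08:10", "2022-09-05 08:45", "2022-09-05 17:20"]),
})
panel = complete_panel(trips, stations=[3005, 3010], start="2022-09-05", n_weeks=1)
```

with 2 stations and one week (168 hours), the panel has 336 rows even though only two station-hours saw any trips; the same logic yields 18 × 168 = 3,024 time units per station over the full 18-week window.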

holidays and weekends likely have a strong effect on bike demand. the dataset covers september through december, capturing holidays such as thanksgiving and christmas. to account for these effects, i created dummy variables indicating proximity to a holiday. i also added time lag variables (e.g., demand at the same station in the previous hour and on the previous day). these lags have strong predictive power: the one-hour lag (lagHour) has a pearson correlation of r = 0.90 with current demand.
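
a minimal pandas sketch of the lag and holiday features, assuming the complete hourly panel has hypothetical `station_id`, `interval`, and `trip_count` columns. the three holiday dummies (day before, day of, day after) are one possible reading of "proximity to holidays"; the report does not specify which offsets it used. lags are taken within each station, so one station's demand never leaks into another's.

```python
import numpy as np
import pandas as pd

def add_lags_and_holidays(panel, holidays):
    """Add per-station lag features and holiday-proximity dummies to a complete hourly panel."""
    # Lags shift by row, so they assume one row per station per hour with no gaps.
    out = panel.sort_values(["station_id", "interval"]).copy()
    by_station = out.groupby("station_id")["trip_count"]
    out["lag_hour"] = by_station.shift(1)    # same station, previous hour
    out["lag_2hours"] = by_station.shift(2)  # same station, two hours earlier
    out["lag_1day"] = by_station.shift(24)   # same station, same hour yesterday

    # Signed day offset between each row's date and every holiday (rows x holidays).
    dates = out["interval"].dt.normalize().to_numpy()
    hol = pd.to_datetime(holidays).to_numpy()
    offset = (dates[:, None] - hol[None, :]) / np.timedelta64(1, "D")
    out["holiday"] = (offset == 0).any(axis=1).astype(int)
    out["day_before_holiday"] = (offset == -1).any(axis=1).astype(int)
    out["day_after_holiday"] = (offset == 1).any(axis=1).astype(int)
    return out

# Synthetic demo: one station, 48 hours starting the day before Thanksgiving 2022.
demo = pd.DataFrame({
    "station_id": [1] * 48,
    "interval": pd.date_range("2022-11-23", periods=48, freq="60min"),
    "trip_count": range(48),
})
out = add_lags_and_holidays(demo, holidays=["2022-11-24"])
# Pearson r between demand and its one-hour lag (NaN pairs are skipped).
r_lag_hour = out["trip_count"].corr(out["lag_hour"])
```

because the lags shift by row position, they are only correct on a gap-free panel, which is why filling in empty station-hours has to come first.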

the serial autocorrelation analysis shows a clear trend: the correlation weakens with each additional hour of lag but recovers at the 1-day lag. this suggests that bike demand at a given hour closely tracks demand at the same hour on the previous day.
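
the lag-by-lag comparison can be reproduced with a short numpy function. the demand series here is synthetic (a 24-hour cycle plus noise), not indego data, so it illustrates only the shape of the pattern: correlation falls as the lag grows from 1 to 4 hours, then rises again at 24 hours. on the real panel, the same calculation would be run within each station.

```python
import numpy as np

def lag_correlations(series, lags):
    """Pearson r between an hourly series and itself shifted back by each positive lag (hours)."""
    x = np.asarray(series, dtype=float)
    return {k: float(np.corrcoef(x[k:], x[:-k])[0, 1]) for k in lags}

# Synthetic stand-in for one station: 18 weeks of hourly demand with a daily cycle plus noise.
rng = np.random.default_rng(0)
hours = np.arange(18 * 168)
demand = 5 + 4 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1, hours.size)
r = lag_correlations(demand, lags=[1, 2, 3, 4, 12, 24])
```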

for spatial autocorrelation, i focused on weeks 47 to 52 (nov 19 to dec 30), excluding week 53 because it contains only a single day (december 31). during this period, the thanksgiving and christmas holidays produced sharp dips in demand, underscoring the impact of major holidays on bike usage patterns.
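
two pieces of this step can be sketched in python. `week_of_year` counts 7-day blocks from january 1, an assumed convention that matches the dates above (nov 19 starts week 47, and december 31 alone forms week 53 in a non-leap year). the report does not name its spatial autocorrelation statistic; global moran's i, shown in `morans_i`, is one common choice, and the weights matrix (which stations count as neighbors) is a hypothetical input.

```python
import numpy as np
import pandas as pd

def week_of_year(times):
    """Week number as consecutive 7-day blocks from Jan 1 (assumed convention)."""
    return (times.dt.dayofyear - 1) // 7 + 1

def morans_i(values, weights):
    """Global Moran's I for one value per station and an n x n spatial weights matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    return (x.size / w.sum()) * (z @ w @ z) / (z @ z)

# Keeping weeks 47-52 would be: panel[week_of_year(panel["interval"]).between(47, 52)]
days = pd.Series(pd.to_datetime(["2022-11-18", "2022-11-19", "2022-12-30", "2022-12-31"]))
weeks = week_of_year(days)  # 46, 47, 52, 53

# Hypothetical: four stations in a row, each neighbor weighted equally (row-standardized).
w = [[0, 1, 0, 0],
     [0.5, 0, 0.5, 0],
     [0, 0.5, 0, 0.5],
     [0, 0, 1, 0]]
i_trend = morans_i([1, 2, 3, 4], w)        # 0.4: neighbors have similar values
i_alternating = morans_i([1, 5, 1, 5], w)  # -1.0: neighbors have opposite values
```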


regression modeling

i developed five linear regression models with different feature sets:
  1. reg1: uses only time and weather features.
  2. reg2: focuses solely on spatial features (station fixed effects).
  3. reg3: combines time and space features.
  4. reg4: adds time lag variables and holiday effects.
  5. reg5: incorporates amenities data (proximity to landmarks, markets, and universities).
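
a minimal sketch of how the nested specifications could be fit and compared, using ordinary least squares via `numpy.linalg.lstsq` and one-hot encoded station fixed effects. the column names are hypothetical, reg4 and reg5 are only described in a comment, and the demo data are synthetic (three stations with different baseline demand and a rush-hour bump), so the numbers say nothing about indego itself.

```python
import numpy as np
import pandas as pd

# Hypothetical column names; the report does not list its exact variables.
FEATURE_SETS = {
    "reg1": ["hour", "dotw", "temperature", "precipitation"],               # time + weather
    "reg2": ["station_id"],                                                  # station fixed effects
    "reg3": ["hour", "dotw", "temperature", "precipitation", "station_id"],  # time + space
}
# reg4 would add e.g. "lag_hour", "lag_1day", "holiday"; reg5 would add amenity distances.
CATEGORICAL = {"hour", "dotw", "station_id"}

def design_matrix(df, features):
    """One-hot encode categorical features (dropping one level each) and add an intercept."""
    cats = [f for f in features if f in CATEGORICAL]
    X = pd.get_dummies(df[features], columns=cats, drop_first=True) if cats else df[features]
    X = X.astype(float)
    X.insert(0, "intercept", 1.0)
    return X.to_numpy()

def holdout_mae(df, train_mask, features, target="trip_count"):
    """Fit OLS on the training rows and return the mean absolute error on the other rows."""
    X = design_matrix(df, features)
    y = df[target].to_numpy(dtype=float)
    train = np.asarray(train_mask, dtype=bool)
    beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    return float(np.mean(np.abs(X[~train] @ beta - y[~train])))

# Synthetic demo: 3 stations x 2 weeks of hourly data.
rng = np.random.default_rng(1)
times = pd.date_range("2022-09-05", periods=2 * 168, freq="60min")
demo = pd.DataFrame({"station_id": np.repeat([101, 102, 103], len(times)),
                     "interval": np.tile(times, 3)})
demo["hour"] = demo["interval"].dt.hour
demo["dotw"] = demo["interval"].dt.dayofweek
demo["temperature"] = rng.normal(10, 3, len(demo))
demo["precipitation"] = rng.exponential(0.1, len(demo))
station_effect = demo["station_id"].map({101: 0.0, 102: 5.0, 103: 10.0})
rush = demo["hour"].isin([8, 17]) * 6.0
demo["trip_count"] = 2 + station_effect + rush + rng.normal(0, 1, len(demo))

train_mask = demo["interval"] < pd.Timestamp("2022-09-12")  # week 1 trains, week 2 tests
results = {name: holdout_mae(demo, train_mask, feats) for name, feats in FEATURE_SETS.items()}
```

on this synthetic data, reg3 has the lowest mae: reg1 cannot see station differences and reg2 cannot see the rush-hour bump, which mirrors the idea that time and space features complement each other.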

the models with time lags (reg4) and amenity features (reg5) perform best, with markedly lower mean absolute error (mae) than reg1 to reg3. time lag features such as previous-hour demand have strong predictive power, indicating that current demand closely follows recent demand.


evaluation and accuracy

to assess model performance, i used a time series split: training on the first 13 weeks and testing on the following 5 weeks. cross-validation confirms the robustness of reg4, which consistently shows lower mae across test sets. errors are largest in high-traffic central areas, particularly during the pm rush, and the mae map shows errors concentrated along market street, where demand prediction could be improved.
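
the split and the per-station error summary behind an mae map can be sketched as below, assuming a hypothetical panel with `week`, `station_id`, `trip_count`, and `prediction` columns. the split is chronological (the earliest 13 weeks train, the last 5 test) rather than random, so the model is never trained on data from after the period it is tested on.

```python
import numpy as np
import pandas as pd

def time_series_split(panel, n_train_weeks=13, week_col="week"):
    """Chronological split: the earliest n_train_weeks weeks train, later weeks test."""
    weeks = np.sort(panel[week_col].unique())
    is_train = panel[week_col].isin(weeks[:n_train_weeks])
    return panel[is_train], panel[~is_train]

def mae_by_station(test, actual="trip_count", predicted="prediction"):
    """Mean absolute error per station, largest first (the values behind an mae map)."""
    abs_err = (test[actual] - test[predicted]).abs()
    return abs_err.groupby(test["station_id"]).mean().rename("mae").sort_values(ascending=False)

# Synthetic demo: 18 weeks (36-53) for two hypothetical stations.
demo = pd.DataFrame({
    "week": np.repeat(np.arange(36, 54), 2),
    "station_id": np.tile([3005, 3010], 18),
    "trip_count": 10.0,
})
demo["prediction"] = np.where(demo["station_id"] == 3005, 8.0, 9.5)
train, test = time_series_split(demo)
station_mae = mae_by_station(test)  # 3005 -> 2.0, 3010 -> 0.5
```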


interpreting the results

the analysis reveals a clear weekday demand pattern with two peaks, around 8:30 am and 5:00 pm. the pm rush consistently shows the highest demand, so bike allocation in the late afternoon needs the most attention. deploying additional trucks to redistribute bikes during this period could help meet demand and prevent shortages.
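
the peak times can be located by averaging weekday demand by time of day, sketched here in pandas with hypothetical `interval` and `trip_count` columns. the demo uses synthetic 15-minute bins, which is an assumption: the report's 8:30 am figure implies a sub-hourly interval, but the exact bin width is not stated.

```python
import pandas as pd

def _hhmm(minutes):
    """Format minutes since midnight as 'HH:MM'."""
    return f"{minutes // 60:02d}:{minutes % 60:02d}"

def weekday_peaks(panel):
    """Return the busiest weekday time of day before noon and after noon, plus the profile."""
    weekday = panel[panel["interval"].dt.dayofweek < 5]  # Monday=0 ... Friday=4
    minute_of_day = weekday["interval"].dt.hour * 60 + weekday["interval"].dt.minute
    by_tod = weekday.groupby(minute_of_day)["trip_count"].mean()
    am = int(by_tod.loc[:719].idxmax())  # midnight to 11:59
    pm = int(by_tod.loc[720:].idxmax())  # noon to 23:59
    return _hhmm(am), _hhmm(pm), by_tod

# Synthetic demo: one week of 15-minute bins with bumps at 08:30 and 17:00.
times = pd.date_range("2022-09-05", periods=7 * 96, freq="15min")
demo = pd.DataFrame({"interval": times})
tod = demo["interval"].dt.hour * 60 + demo["interval"].dt.minute
demo["trip_count"] = 1.0 + 5.0 * (tod == 510) + 8.0 * (tod == 1020)
am, pm, by_tod = weekday_peaks(demo)  # "08:30", "17:00"
```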


while adding time lags improved the model’s accuracy, seasonal effects weren’t fully captured. because the data covers fall and winter, the colder weather likely reduced overall demand. future models should consider seasonal factors and average monthly temperatures to better predict demand year-round.
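
one way to add the seasonal inputs suggested above, sketched in pandas with hypothetical column names: each row gets its month's average temperature plus a sine/cosine encoding of the day of year. these features only help once the training data span more than fall and winter; with september to december data alone, the model has no spring or summer months to learn from.

```python
import numpy as np
import pandas as pd

def add_seasonal_features(panel, weather):
    """Attach month, that month's average temperature, and a smooth day-of-year cycle."""
    out = panel.copy()
    monthly_temp = weather.groupby(weather["interval"].dt.month)["temperature"].mean()
    out["month"] = out["interval"].dt.month
    out["avg_month_temp"] = out["month"].map(monthly_temp)
    # Sine/cosine of day-of-year place Dec 31 next to Jan 1 (an assumed encoding choice).
    angle = 2 * np.pi * out["interval"].dt.dayofyear / 365.25
    out["season_sin"] = np.sin(angle)
    out["season_cos"] = np.cos(angle)
    return out

# Hypothetical hourly weather readings and panel rows.
weather = pd.DataFrame({
    "interval": pd.to_datetime(["2022-10-01 08:00", "2022-10-20 08:00", "2022-11-10 08:00"]),
    "temperature": [15.0, 17.0, 8.0],
})
panel = pd.DataFrame({"interval": pd.to_datetime(["2022-10-15 17:00", "2022-11-15 17:00"])})
features = add_seasonal_features(panel, weather)  # avg_month_temp: 16.0, 8.0
```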



cross-validation insights

cross-validation results show that models with time lag and amenity features generalize well across time periods and stations. reg4, with its strong temporal component, performed best, with substantially lower mae than the simpler models.




recommendations

based on this analysis, i recommend implementing a more dynamic re-balancing strategy that considers temporal demand patterns and adjusts for peak times, especially during the weekday pm rush. additional trucks could be deployed during high-demand periods to maintain bike availability.
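
since the title frames this work as a dispatch algorithm, here is a minimal sketch of how predictions could feed a re-balancing plan. it is a hypothetical greedy heuristic, not indego's actual procedure: each station's target is its predicted pickups plus a small buffer (capped at dock capacity), and trucks move bikes from the largest surpluses to the largest deficits. `Station`, `plan_moves`, `buffer`, and `truck_capacity` are illustrative names, and the sketch ignores travel distance and routing.

```python
from dataclasses import dataclass

@dataclass
class Station:
    station_id: int
    bikes: int               # bikes currently docked
    capacity: int            # total docks at the station
    predicted_demand: float  # predicted pickups over the planning window

def plan_moves(stations, buffer=2, truck_capacity=20):
    """Greedy re-balancing plan: list of (from_station, to_station, n_bikes) moves."""
    balance = {}
    for s in stations:
        target = min(s.capacity, round(s.predicted_demand) + buffer)
        balance[s.station_id] = s.bikes - target  # > 0 surplus, < 0 deficit
    # Largest surpluses and largest deficits first.
    donors = [[sid, b] for sid, b in sorted(balance.items(), key=lambda kv: -kv[1]) if b > 0]
    receivers = [[sid, -b] for sid, b in sorted(balance.items(), key=lambda kv: kv[1]) if b < 0]
    moves = []
    i = j = 0
    while i < len(donors) and j < len(receivers):
        n = min(donors[i][1], receivers[j][1], truck_capacity)  # at most one truckload per move
        moves.append((donors[i][0], receivers[j][0], n))
        donors[i][1] -= n
        receivers[j][1] -= n
        if donors[i][1] == 0:
            i += 1
        if receivers[j][1] == 0:
            j += 1
    return moves

stations = [
    Station(1, bikes=15, capacity=20, predicted_demand=3.0),
    Station(2, bikes=2, capacity=15, predicted_demand=10.0),
    Station(3, bikes=8, capacity=10, predicted_demand=4.0),
    Station(4, bikes=0, capacity=12, predicted_demand=1.0),
]
moves = plan_moves(stations)  # [(1, 2, 10), (3, 4, 2)]
```

in the demo, station 4 is still one bike short after the moves because total surplus (12) is smaller than total deficit (13); a fuller dispatcher would also weigh driving time between stations.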

incentives for users to return bikes to designated high-demand stations could also help balance the system. together, these measures could reduce the costs of under-prediction (which leads to empty stations and unmet demand) and over-prediction (which leaves bikes sitting idle), optimizing overall operations.
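
the trade-off between under- and over-prediction can be made explicit with an asymmetric cost, sketched below. the weights `c_under` and `c_over` are hypothetical; the report does not quantify either cost. under this piecewise-linear cost, the prediction that minimizes expected cost is the c_under / (c_under + c_over) quantile of demand rather than its mean, so a system that treats shortages as costlier than idle bikes should plan somewhat above the average forecast.

```python
import numpy as np

def rebalancing_cost(actual, predicted, c_under=3.0, c_over=1.0):
    """Asymmetric cost: each unmet pickup costs c_under, each idle bike costs c_over."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    under = np.maximum(actual - predicted, 0.0)  # shortages: demand above the prediction
    over = np.maximum(predicted - actual, 0.0)   # bikes left unused
    return float(np.sum(c_under * under + c_over * over))

cost = rebalancing_cost(actual=[5, 2], predicted=[3, 4])  # 3*2 + 1*2 = 8.0
```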

while the model performed well, there is room for improvement. future iterations should explore additional features, like nearby public transit stations or real-time weather changes, and address potential biases that may arise from seasonal and spatial variations.


conclusion

the bike share prediction model demonstrated strong performance, particularly with the inclusion of time lag and amenity features. while the pm rush remains the biggest challenge for demand forecasting, implementing a data-driven, dynamic re-balancing plan could significantly improve service. future work should refine feature selection and consider seasonal adjustments to enhance the model’s generalizability. ultimately, a well-predicted and efficiently re-balanced bike share system can better meet user needs and support philadelphia’s growing urban mobility network.


copyright @liuhaobing