Architecture Deep Dive

DATA INTELLIGENCE

How I transformed raw Nassau shipping records into actionable trade intelligence — uncovering cargo patterns, port performance, route bottlenecks, and seasonal logistics trends through end-to-end data analysis.

Python 3.10+ Pandas & NumPy Plotly & Seaborn Streamlit Dashboard Nassau Shipping Data

The Raw Data Problem

Nassau's shipping dataset contained thousands of unstructured voyage records spanning multiple routes, cargo types, vessel classes, and port pairs. Without structured analysis, the data was noise — no visibility into which routes were profitable, which cargo types caused delays, or which ports created bottlenecks.

The EDA-First Approach

I applied a systematic Exploratory Data Analysis pipeline using pandas for wrangling and plotly for interactive visualization. Rather than jumping to conclusions, the analysis was driven by distributional patterns, correlations, and time-series decomposition — letting the data surface its own story.

Nassau_Shipping_Analysis.ipynb
# Load and inspect the raw dataset
import pandas as pd
import plotly.express as px

df = pd.read_csv('nassau_shipping.csv')

# Parse dates and compute transit time
df['Departure Date'] = pd.to_datetime(df['Departure Date'])
df['Arrival Date']   = pd.to_datetime(df['Arrival Date'])

df['Transit Time'] = (
    df['Arrival Date'] - df['Departure Date']
).dt.days

# Identify delayed shipments
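df['Is Delayed'] = df['Status'] == 'Delayed'

route_analysis.py
# Route-level freight cost aggregation
# Expects `df` as prepared in Nassau_Shipping_Analysis.ipynb (including 'Is Delayed')
import plotly.express as px

route_stats = (
    df.groupby('Route')
    .agg(
        avg_cost=('Freight Cost ($)', 'mean'),
        total_volume=('Cargo Weight (tons)', 'sum'),
        # The mean of a boolean column is the share of delayed shipments
        delay_rate=('Is Delayed', 'mean'),
    )
    .sort_values('avg_cost', ascending=False)
    .reset_index()
)

# Bar height shows average cost; bar color encodes the delay rate
fig = px.bar(
    route_stats, x='Route', y='avg_cost',
    color='delay_rate', title='Route Cost vs Delay Rate'
)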

Route Intelligence

Using groupby aggregations, I computed per-route KPIs — average freight cost, total cargo volume, and delay rate. This revealed that just the top 3 routes account for over 60% of total cargo volume, exposing a critical concentration risk in Nassau's logistics network.
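The concentration figure can be reproduced with a short helper. This is a minimal sketch, assuming the same 'Route' and 'Cargo Weight (tons)' columns used in the aggregation code; the function name `top_n_volume_share` and the toy data are illustrative and not taken from the project.

```python
import pandas as pd


def top_n_volume_share(df: pd.DataFrame, n: int = 3) -> float:
    """Return the fraction of total cargo weight carried by the n busiest routes."""
    volume_by_route = df.groupby('Route')['Cargo Weight (tons)'].sum()
    return volume_by_route.nlargest(n).sum() / volume_by_route.sum()


# Toy data for illustration only, not the Nassau dataset
sample = pd.DataFrame({
    'Route': ['A', 'A', 'B', 'C', 'D', 'E'],
    'Cargo Weight (tons)': [40, 20, 15, 10, 10, 5],
})

share = top_n_volume_share(sample, n=3)
print(f"Top 3 routes carry {share:.0%} of total volume")
```

A share above a chosen threshold (the write-up reports more than 60% for the real data) is what flags the concentration risk.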

Seasonal Trend Modeling

A time-series decomposition of monthly shipment volumes revealed a pronounced Q3 peak (July–September), indicating seasonal demand surges. Rolling averages smoothed short-term noise, making the cyclical freight cost pattern clearly visible for business forecasting.
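The smoothing step can be sketched as follows. This covers only the monthly aggregation and rolling average, not a full seasonal decomposition (the write-up does not name the routine used for that). The 3-month window, the function name `monthly_volume_trend`, and the toy data are assumptions for illustration.

```python
import pandas as pd


def monthly_volume_trend(df: pd.DataFrame, window: int = 3) -> pd.DataFrame:
    """Aggregate cargo weight by departure month and add a rolling mean."""
    monthly = (
        df.set_index('Departure Date')['Cargo Weight (tons)']
        .resample('MS')  # 'MS' = calendar month, labeled by month start
        .sum()
    )
    return pd.DataFrame({
        'volume': monthly,
        # min_periods=1 keeps the first months instead of returning NaN
        'rolling_mean': monthly.rolling(window, min_periods=1).mean(),
    })


# Toy data for illustration only, not the Nassau dataset
sample = pd.DataFrame({
    'Departure Date': pd.to_datetime(
        ['2023-01-05', '2023-01-20', '2023-02-10', '2023-03-15']
    ),
    'Cargo Weight (tons)': [10, 20, 60, 30],
})

trend = monthly_volume_trend(sample)
print(trend)
```

A wider window smooths more aggressively but reacts more slowly to a genuine seasonal peak, so the window length is a judgment call.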

KEY INSIGHTS

Bulk Cargo Dominance

Bulk cargo constitutes the largest share of all shipments, driving the majority of port throughput volumes and shaping Nassau's overall freight cost structure.

Cost–Transit Correlation

Transit time shows a strong positive correlation with freight cost: longer routes tend to incur higher operating expenses, as confirmed by a scatter plot and the Pearson correlation coefficient.
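One way to compute that coefficient is with pandas' built-in `Series.corr`. This is a minimal sketch, assuming the 'Transit Time' column derived in the notebook and the 'Freight Cost ($)' column; the function name and the toy values are illustrative.

```python
import pandas as pd


def cost_transit_correlation(df: pd.DataFrame) -> float:
    """Pearson correlation between transit time (days) and freight cost."""
    return df['Transit Time'].corr(df['Freight Cost ($)'], method='pearson')


# Toy data for illustration only, not the Nassau dataset
sample = pd.DataFrame({
    'Transit Time': [2, 4, 6, 8],
    'Freight Cost ($)': [1000, 2100, 2900, 4000],
})

r = cost_transit_correlation(sample)
print(f"Pearson r = {r:.3f}")
```

Pearson's r measures linear association only, so it is worth reading alongside the scatter plot rather than on its own.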

Perishable Delay Risk

Perishable goods are delayed at twice the rate of dry bulk cargo, exposing a critical vulnerability in cold-chain logistics across Nassau's maritime routes.
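A comparison like this comes down to a grouped mean over the boolean delay flag. The sketch below assumes a 'Cargo Type' column and the category labels 'Perishable' and 'Dry Bulk'; neither name appears in the original code, so both are hypothetical.

```python
import pandas as pd


def delay_rate_by_cargo(df: pd.DataFrame) -> pd.Series:
    """Share of delayed shipments per cargo type, highest first."""
    return (
        df.groupby('Cargo Type')['Is Delayed']
        .mean()
        .sort_values(ascending=False)
    )


# Toy data for illustration only, not the Nassau dataset
sample = pd.DataFrame({
    'Cargo Type': ['Perishable'] * 4 + ['Dry Bulk'] * 4,
    'Is Delayed': [True, True, False, False, True, False, False, False],
})

rates = delay_rate_by_cargo(sample)
ratio = rates['Perishable'] / rates['Dry Bulk']
print(rates)
print(f"Perishable vs dry bulk delay ratio: {ratio:.1f}x")
```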

DASHBOARD ARCHITECTURE

Interactive Filters

Built dynamic Streamlit sidebar controls — date range sliders, multi-select dropdowns for cargo type, ship type, and port — that push filter state directly into Plotly chart re-renders.
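The filter wiring might look roughly like the sketch below. It separates the pure pandas filtering from the Streamlit widgets so the logic can be checked without a running app. The column names 'Cargo Type' and 'Loading Port' and both function names are assumptions, and the ship-type control is left out for brevity.

```python
import pandas as pd


def apply_filters(df, start, end, cargo_types, ports):
    """Return rows inside the date range that match the selected values.

    An empty selection means "no restriction" for that control.
    """
    mask = df['Departure Date'].between(pd.Timestamp(start), pd.Timestamp(end))
    if cargo_types:
        mask &= df['Cargo Type'].isin(cargo_types)
    if ports:
        mask &= df['Loading Port'].isin(ports)
    return df[mask]


def render_sidebar(df):
    """Draw the sidebar controls and return the filtered DataFrame."""
    # Imported here so apply_filters can be used without Streamlit installed
    import streamlit as st

    first = df['Departure Date'].min().date()
    last = df['Departure Date'].max().date()
    start, end = st.sidebar.slider(
        'Departure date range', min_value=first, max_value=last,
        value=(first, last),
    )
    cargo = st.sidebar.multiselect('Cargo type', sorted(df['Cargo Type'].unique()))
    ports = st.sidebar.multiselect('Port', sorted(df['Loading Port'].unique()))
    return apply_filters(df, start, end, cargo, ports)


# Toy data for illustration only, not the Nassau dataset
sample = pd.DataFrame({
    'Departure Date': pd.to_datetime(['2023-01-10', '2023-02-10', '2023-03-10']),
    'Cargo Type': ['Bulk', 'Perishable', 'Bulk'],
    'Loading Port': ['Port A', 'Port B', 'Port A'],
})

filtered = apply_filters(sample, '2023-01-01', '2023-02-28', ['Bulk'], [])
print(filtered)
```

Because Streamlit reruns the script on every widget change, passing the result of `render_sidebar` to each chart is enough to keep the charts in sync with the filters.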

Sankey Route Flow

Engineered a Plotly Sankey diagram to visualize cargo flow between loading and discharge ports, making route concentration and freight pathways immediately interpretable at a glance.
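A Plotly Sankey trace needs integer node indices rather than port names, so most of the work is reshaping. The sketch below assumes 'Loading Port' and 'Discharge Port' columns and weights flows by 'Cargo Weight (tons)'; the column names, the function names, and the choice of weight are assumptions rather than details from the project.

```python
import pandas as pd


def build_sankey_inputs(df: pd.DataFrame) -> dict:
    """Aggregate port-to-port flows into the index lists a Sankey trace expects."""
    flows = df.groupby(
        ['Loading Port', 'Discharge Port'], as_index=False
    )['Cargo Weight (tons)'].sum()

    # Keep loading and discharge nodes separate so a port that does both
    # appears once on each side of the diagram
    sources = sorted(flows['Loading Port'].unique())
    targets = sorted(flows['Discharge Port'].unique())
    src_idx = {port: i for i, port in enumerate(sources)}
    tgt_idx = {port: i + len(sources) for i, port in enumerate(targets)}

    return {
        'labels': sources + targets,
        'source': flows['Loading Port'].map(src_idx).tolist(),
        'target': flows['Discharge Port'].map(tgt_idx).tolist(),
        'value': flows['Cargo Weight (tons)'].tolist(),
    }


def make_sankey(df: pd.DataFrame):
    """Build the Plotly figure from the aggregated flows."""
    # Imported here so build_sankey_inputs can be used without Plotly installed
    import plotly.graph_objects as go

    s = build_sankey_inputs(df)
    return go.Figure(go.Sankey(
        node=dict(label=s['labels']),
        link=dict(source=s['source'], target=s['target'], value=s['value']),
    ))


# Toy data for illustration only, not the Nassau dataset
sample = pd.DataFrame({
    'Loading Port': ['A', 'A', 'A', 'B'],
    'Discharge Port': ['X', 'X', 'Y', 'X'],
    'Cargo Weight (tons)': [10, 5, 20, 7],
})

inputs = build_sankey_inputs(sample)
print(inputs)
```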

CSV Export Pipeline

Implemented a one-click export of the filtered data using Streamlit's download button, which serializes the live-filtered DataFrame to an in-memory CSV buffer for zero-friction user downloads.
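The export can be sketched in a few lines. This variant encodes the CSV text to bytes rather than using an explicit buffer object, which is one of several ways to keep the file in memory. The function names and the output file name are hypothetical.

```python
import pandas as pd


def to_csv_bytes(df: pd.DataFrame) -> bytes:
    """Serialize a DataFrame to UTF-8 CSV bytes without writing to disk."""
    # to_csv() with no path returns the CSV as a string
    return df.to_csv(index=False).encode('utf-8')


def render_download(filtered: pd.DataFrame) -> None:
    """Show a download button for the currently filtered rows."""
    # Imported here so to_csv_bytes can be used without Streamlit installed
    import streamlit as st

    st.download_button(
        label='Download filtered data as CSV',
        data=to_csv_bytes(filtered),
        file_name='filtered_shipments.csv',  # hypothetical file name
        mime='text/csv',
    )


# Toy data for illustration only, not the Nassau dataset
sample = pd.DataFrame({'Route': ['A'], 'Freight Cost ($)': [100]})
csv_bytes = to_csv_bytes(sample)
print(csv_bytes)
```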