In Spotlight

In this chapter, selected aspects of the data generation process are explained in more detail and supported by visuals. To this end, some internal functions and methods are imported that are not part of the official interface.

[1]:
import datetime

import matplotlib.pyplot as plt
import pandas as pd

import conflowgen

Load some internal classes and functions that are not part of the regular API.

[2]:
from conflowgen.domain_models.container import Container
from conflowgen.flow_generator.truck_for_export_containers_manager import (
    TruckForExportContainersManager,
)
from conflowgen.flow_generator.truck_for_import_containers_manager import (
    TruckForImportContainersManager,
)
from conflowgen.tools.continuous_distribution import (
    multiply_discretized_probability_densities,
)

Set a style for matplotlib.

[3]:
plt.style.use("seaborn-v0_8-colorblind")  # named "seaborn-colorblind" before matplotlib 3.6

Initialize ConFlowGen.

[4]:
database_chooser = conflowgen.DatabaseChooser()
database_chooser.create_new_sqlite_database(":memory:")

Combining truck arrival and container dwell time distribution

It is a challenge to synthetically generate container flows that take both the truck arrival distribution and the container dwell time distribution into account. This is, however, necessary in two cases:

  • When a container is picked up by a truck

  • When a container is delivered by a truck

The approach chosen in ConFlowGen is presented in the following, first for the import and then for the export process.

Picking up a container by truck

When a container is delivered to the container terminal by a vessel and a truck is to be generated to pick up the container, two naive approaches exist. First, the truck arrival time might be drawn from the truck arrival distribution. This ensures, e.g., that no truck arrives on a Sunday. However, considering only the truck arrival distribution means that the container dwell time distribution is ignored. Second, the truck arrival time might be drawn from the container dwell time distribution. This ensures that the container dwell times are realistic, but the truck arrival patterns are ignored. Neither approach on its own respects both distributions.

Prepare the container that arrives at the terminal with a deep sea vessel and departs with a truck.

[5]:
container = Container.create(
    weight=20,
    delivered_by=conflowgen.ModeOfTransport.deep_sea_vessel,
    picked_up_by=conflowgen.ModeOfTransport.truck,
    picked_up_by_initial=conflowgen.ModeOfTransport.truck,
    length=conflowgen.ContainerLength.twenty_feet,
    storage_requirement=conflowgen.StorageRequirement.standard,
)
container_arrival_time = datetime.datetime.now().replace(second=0, microsecond=0)
container_arrival_time_hour = container_arrival_time.replace(
    minute=0
) + datetime.timedelta(
    hours=1
)  # turn 8:45 into 09:00

print(
    f"The container arrives at the terminal at {container_arrival_time.isoformat()} "
    f"which is counted as {container_arrival_time_hour.isoformat()}"
)
The container arrives at the terminal at 2022-10-23T18:20:00 which is counted as 2022-10-23T19:00:00
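Note that the cell above always adds one hour after dropping the minutes, so an arrival at exactly 18:00 would be counted as 19:00. Whether this is intended is not stated here. As a minimal sketch, a rounding that leaves full hours untouched could look as follows; `round_up_to_full_hour` is a hypothetical helper and not part of ConFlowGen.

```python
import datetime


def round_up_to_full_hour(moment: datetime.datetime) -> datetime.datetime:
    """Return `moment` if it lies on a full hour, otherwise the next full hour."""
    floored = moment.replace(minute=0, second=0, microsecond=0)
    if floored == moment:
        return moment  # already a full hour, nothing to round
    return floored + datetime.timedelta(hours=1)


print(round_up_to_full_hour(datetime.datetime(2022, 10, 23, 18, 20)).isoformat())
print(round_up_to_full_hour(datetime.datetime(2022, 10, 23, 18, 0)).isoformat())
```

The first call rounds 18:20 up to 19:00, the second call keeps 18:00 as it is.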

Load the two distributions that fit the container characteristics.

[6]:
manager = TruckForImportContainersManager()
manager.reload_distributions()
(
    container_dwell_time_distribution,
    truck_arrival_distribution,
) = manager._get_distributions(container)

print(container_dwell_time_distribution)
print(truck_arrival_distribution)
<ClippedLogNormal: avg=72.0h, min=3.0h, max=216.0h, var=3600.0h², sd=60.0h>
<WeeklyDistribution: size_of_time_window_in_hours=216h>

Then, the earliest truck time slot is determined, i.e., the first time slot in which the truck can arrive at the terminal. It is the container arrival time plus the minimum dwell time.

[7]:
earliest_truck_time_slot = container_arrival_time_hour + datetime.timedelta(
    hours=container_dwell_time_distribution.minimum
)
print(
    f"The earliest available truck time slot is {earliest_truck_time_slot.isoformat()}"
)
The earliest available truck time slot is 2022-10-23T22:00:00

Now the truck arrival distribution is converted to a distribution that reflects the probability that the container is picked up at a given time. While the truck arrival distribution only covers a work week, the derived distribution must cover the whole time range from the time the container has arrived at the terminal until the point that is determined as the maximum dwell time. This time range is often longer than a week.

[8]:
truck_arrival_distribution_slice = truck_arrival_distribution.get_distribution_slice(
    earliest_truck_time_slot
)

truck_arrival_distribution_slice_as_dates = {
    (container_arrival_time_hour + datetime.timedelta(hours=hours_from_now)): fraction
    * 100
    for hours_from_now, fraction in truck_arrival_distribution_slice.items()
}

df_truck_arrival_distribution = pd.Series(
    truck_arrival_distribution_slice_as_dates
).to_frame("Truck Arrival Distribution")

df_truck_arrival_distribution.plot(legend=False)
plt.ylabel("Probability (as percentage overall)")
plt.show()
[Figure: truck arrival distribution over the time window for the pickup; y-axis: probability (as percentage overall)]
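To illustrate how a weekly pattern can be stretched over a time range longer than a week, here is a minimal sketch that repeats an hourly weekly pattern from a given start time and rescales the result. The helper `slice_weekly_pattern` and the toy pattern are hypothetical; ConFlowGen's `get_distribution_slice` may work differently in detail.

```python
import datetime

HOURS_PER_WEEK = 7 * 24


def slice_weekly_pattern(weekly_fractions, start, window_in_hours):
    """Repeat an hourly weekly pattern over `window_in_hours`, beginning at `start`.

    weekly_fractions: 168 non-negative values, index 0 is Monday 00:00.
    Returns {hours_after_start: fraction}, rescaled so that the values sum to 1.
    """
    first_hour_of_week = start.weekday() * 24 + start.hour
    raw = [
        weekly_fractions[(first_hour_of_week + offset) % HOURS_PER_WEEK]
        for offset in range(window_in_hours)
    ]
    total = sum(raw)
    if total == 0:
        raise ValueError("No truck arrivals are possible in this time window")
    return {offset: value / total for offset, value in enumerate(raw)}


# Toy pattern (assumption): uniform arrivals Monday to Saturday, none on Sunday.
toy_week = [1.0] * (6 * 24) + [0.0] * 24

# 2022-10-23 is a Sunday, so the first two hours of the window get no arrivals.
window = slice_weekly_pattern(toy_week, datetime.datetime(2022, 10, 23, 22, 0), 216)
print(window[0], window[2])
```

The modulo operation wraps the index around at the end of the week, which is what allows the window to span 216 hours although the pattern itself only covers 168.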

With the truck arrival distribution prepared, the next step is the container dwell time distribution. It assigns to each candidate time slot the probability that the container is picked up in that slot.

[9]:
time_windows_for_truck_arrival = list(truck_arrival_distribution_slice.keys())
container_dwell_time_probabilities = (
    container_dwell_time_distribution.get_probabilities(time_windows_for_truck_arrival)
)

container_dwell_time_probabilities_as_dates = {
    earliest_truck_time_slot + datetime.timedelta(hours=hours_from_now): fraction * 100
    for hours_from_now, fraction in enumerate(container_dwell_time_probabilities)
}

df_container_dwell_time_distribution = pd.Series(
    container_dwell_time_probabilities_as_dates
).to_frame("Container Dwell Time Distribution")

df_container_dwell_time_distribution.plot(legend=False)
plt.ylabel("Probability (as percentage overall)")
plt.show()
[Figure: container dwell time distribution over the time window for the pickup; y-axis: probability (as percentage overall)]
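For readers who want to see the idea behind such a clipped distribution without the library, the following sketch evaluates a lognormal density at hourly steps, sets it to zero outside the minimum and maximum dwell time, and rescales the values. The parameters are taken from the printed output above (avg=72.0h, var=3600.0h², min=3.0h, max=216.0h). Deriving the lognormal parameters from the average and variance as done here is an assumption; `ClippedLogNormal` in ConFlowGen may be implemented differently.

```python
import math


def clipped_lognormal_probabilities(hours, average, variance, minimum, maximum):
    """Evaluate a lognormal density at each entry of `hours`, set it to zero
    outside the open interval (minimum, maximum), and rescale to a sum of 1."""
    # Parameters of the underlying normal distribution (assumed parametrization).
    sigma_squared = math.log(1 + variance / average**2)
    mu = math.log(average) - sigma_squared / 2
    sigma = math.sqrt(sigma_squared)

    def density(x):
        if not minimum < x < maximum:
            return 0.0  # clipped region
        return math.exp(-((math.log(x) - mu) ** 2) / (2 * sigma_squared)) / (
            x * sigma * math.sqrt(2 * math.pi)
        )

    densities = [density(x) for x in hours]
    total = sum(densities)
    return [d / total for d in densities]


probabilities = clipped_lognormal_probabilities(
    hours=range(0, 217), average=72.0, variance=3600.0, minimum=3.0, maximum=216.0
)
most_likely_hour = probabilities.index(max(probabilities))
print(most_likely_hour)
```

Because the lognormal distribution is right-skewed, the most likely dwell time in this sketch lies well below the average of 72 hours.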

In the last step, the two distributions are merged by multiplication.

[10]:
merged_distribution = multiply_discretized_probability_densities(
    list(truck_arrival_distribution_slice.values()), container_dwell_time_probabilities
)

merged_distribution_as_dates = {
    (
        earliest_truck_time_slot
        - datetime.timedelta(hours=container_dwell_time_distribution.minimum)
        + datetime.timedelta(hours=hours_from_now)
    ): fraction
    * 100
    for hours_from_now, fraction in enumerate(merged_distribution)
}

df_merged_distributions = pd.Series(merged_distribution_as_dates).to_frame(
    "Multiplication of Both Distributions"
)

df_merged_distributions.plot(legend=False)
plt.ylabel("Probability (as percentage overall)")
plt.show()
[Figure: multiplication of both distributions over the time window for the pickup; y-axis: probability (as percentage overall)]
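The merge step can be sketched in plain Python as an element-wise product followed by a rescaling. The function `multiply_and_normalize` below is a hypothetical stand-in for illustration; whether `multiply_discretized_probability_densities` rescales in exactly the same way is an assumption.

```python
def multiply_and_normalize(first, second):
    """Multiply two equally long discrete distributions element by element
    and rescale the products so that they sum to 1."""
    if len(first) != len(second):
        raise ValueError("Both distributions need the same number of time slots")
    products = [a * b for a, b in zip(first, second)]
    total = sum(products)
    if total == 0:
        raise ValueError("The two distributions do not overlap")
    return [p / total for p in products]


truck_arrivals = [0.0, 0.5, 0.5, 0.0]  # toy data: no trucks in the first and last slot
dwell_times = [0.1, 0.6, 0.2, 0.1]  # toy data: most containers leave in the second slot
merged = multiply_and_normalize(truck_arrivals, dwell_times)
print(merged)
```

A time slot keeps a non-zero probability only if both input distributions allow it. In the toy data, the first and last slot are ruled out by the truck arrival distribution, and the remaining probability mass is split according to the dwell time distribution.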

Let’s compare the merged distribution with the two distributions it was derived from.

[11]:
df_merged = pd.concat(
    [
        df_truck_arrival_distribution,
        df_container_dwell_time_distribution,
        df_merged_distributions,
    ],
    axis=1,
)

ax = df_merged[
    ["Container Dwell Time Distribution", "Truck Arrival Distribution"]
].plot(
    color={
        "Truck Arrival Distribution": "navy",
        "Container Dwell Time Distribution": "dimgray",
    },
    alpha=0.5,
    style="--",
)

plt.axvline(
    earliest_truck_time_slot
    + datetime.timedelta(hours=container_dwell_time_distribution.minimum),
    color="dimgray",
)
plt.axvline(
    earliest_truck_time_slot
    + datetime.timedelta(hours=container_dwell_time_distribution.maximum),
    color="dimgray",
)

plt.axvline(container_arrival_time, color="black")

df_merged[["Multiplication of Both Distributions"]].plot(ax=ax, alpha=1, color="k")
plt.show()
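[Figure: truck arrival distribution and container dwell time distribution (dashed) together with the multiplication of both distributions (solid black); vertical lines mark the container arrival time and the dwell time bounds]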

The multiplication of the two distributions leads to a new distribution that approximates both the container dwell time distribution and the truck arrival distribution.
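How a truck time slot is finally drawn from such a merged distribution is not shown in this chapter. One possible way, given here only as a sketch with toy numbers, is a weighted random choice over the time slots using `random.choices` from the standard library.

```python
import random

random.seed(1)  # fixed seed so that the sketch is reproducible

time_slots = [0, 1, 2, 3]  # toy data: hours after the earliest truck time slot
merged = [0.0, 0.75, 0.25, 0.0]  # toy data: a merged distribution that sums to 1

# Draw many truck arrivals to see how the weights shape the result.
draws = random.choices(time_slots, weights=merged, k=1000)
share_slot_1 = draws.count(1) / len(draws)
print(sorted(set(draws)), share_slot_1)
```

Time slots with a weight of zero are never drawn, so a truck generated this way respects the restrictions of both input distributions.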

Delivering a container by truck

When a container is delivered by truck, ConFlowGen first allocates the container on a vessel and only then decides on the truck arrival time. The process is thus very similar to the previous case, except that both distributions need to be reversed. This is because we look backwards: Given the chosen vessel, how many hours earlier has the truck most likely arrived?
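What looking backwards means for the dwell time distribution can be sketched with a small list. If the time slots are counted forward from the earliest truck time slot, slot 0 corresponds to the longest dwell time and the last slot to a dwell time of zero hours, so the probabilities are read in reverse order. The helper below is hypothetical and only illustrates the idea; the `reversed_distribution` option in ConFlowGen may be implemented differently.

```python
def reverse_dwell_time_probabilities(forward_probabilities):
    """Turn probabilities indexed by dwell time in hours into probabilities
    indexed by hours after the earliest truck time slot.

    Slot k then corresponds to a dwell time of len(forward_probabilities) - 1 - k hours.
    """
    return list(reversed(forward_probabilities))


# Toy data: dwell times of 0, 1, 2, and 3 hours before the vessel departs.
forward = [0.0, 0.5, 0.3, 0.2]
backward = reverse_dwell_time_probabilities(forward)
print(backward)
```

In the toy data, a dwell time of zero hours is impossible, so after the reversal the last slot, which coincides with the vessel departure, has a probability of zero.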

Prepare the container that departs from the terminal with a deep sea vessel.

[12]:
container = Container.create(
    weight=20,
    delivered_by=conflowgen.ModeOfTransport.truck,
    picked_up_by=conflowgen.ModeOfTransport.deep_sea_vessel,
    picked_up_by_initial=conflowgen.ModeOfTransport.deep_sea_vessel,
    length=conflowgen.ContainerLength.forty_feet,
    storage_requirement=conflowgen.StorageRequirement.standard,
)
container_departure_time = datetime.datetime.now().replace(second=0, microsecond=0)
container_departure_time_hour = container_departure_time.replace(minute=0)

print(
    f"The container departs from the terminal at {container_departure_time.isoformat()} "
    f"which is counted as {container_departure_time_hour.isoformat()}"
)
The container departs from the terminal at 2022-10-23T18:20:00 which is counted as 2022-10-23T18:00:00

Load the two distributions that fit the container characteristics.

[13]:
manager = TruckForExportContainersManager()
manager.reload_distributions()
(
    container_dwell_time_distribution,
    truck_arrival_distribution,
) = manager._get_distributions(container)

print(container_dwell_time_distribution)
print(truck_arrival_distribution)
<ClippedLogNormal: avg=156.0h, min=12.0h, max=468.0h, var=7800.0h², sd=88.3h>
<WeeklyDistribution: size_of_time_window_in_hours=468h>
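
Then, the earliest truck time slot is determined by going back the maximum dwell time from the departure of the container.

[14]: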
earliest_truck_time_slot = container_departure_time_hour - datetime.timedelta(
    hours=container_dwell_time_distribution.maximum
)

print(
    f"The earliest available truck time slot is {earliest_truck_time_slot.isoformat()}"
)
The earliest available truck time slot is 2022-10-04T06:00:00
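The date arithmetic can be checked by hand with the values printed above (departure counted as 2022-10-23T18:00:00, min=12.0h, max=468.0h). The name `latest_truck_time_slot` is introduced here only for illustration and does not appear in the cells of this chapter.

```python
import datetime

departure = datetime.datetime(2022, 10, 23, 18, 0)
minimum_dwell_time = datetime.timedelta(hours=12)
maximum_dwell_time = datetime.timedelta(hours=468)  # 19 days and 12 hours

# The longest dwell time gives the earliest slot, the shortest one the latest slot.
earliest_truck_time_slot = departure - maximum_dwell_time
latest_truck_time_slot = departure - minimum_dwell_time

print(earliest_truck_time_slot.isoformat())
print(latest_truck_time_slot.isoformat())
```

The first value reproduces the earliest available truck time slot shown above. The second value marks the point after which a delivery would violate the minimum dwell time.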

The truck arrival distribution is prepared like before.

[15]:
truck_arrival_distribution_slice = truck_arrival_distribution.get_distribution_slice(
    earliest_truck_time_slot
)

truck_arrival_distribution_slice_as_dates = {
    earliest_truck_time_slot + datetime.timedelta(hours=hours_from_now): fraction * 100
    for hours_from_now, fraction in truck_arrival_distribution_slice.items()
}

df_truck_arrival_distribution = pd.Series(
    truck_arrival_distribution_slice_as_dates
).to_frame("Truck Arrival Distribution")

df_truck_arrival_distribution.plot(legend=False)
plt.ylabel("Probability (as percentage overall)")
plt.show()
[Figure: truck arrival distribution over the time window for the delivery; y-axis: probability (as percentage overall)]

Likewise, the container dwell time distribution is prepared.

[16]:
time_windows_for_truck_arrival = list(truck_arrival_distribution_slice.keys())
container_dwell_time_probabilities = (
    container_dwell_time_distribution.get_probabilities(
        time_windows_for_truck_arrival, reversed_distribution=True
    )
)

container_dwell_time_probabilities_as_dates = {
    earliest_truck_time_slot + datetime.timedelta(hours=hours_from_now): fraction * 100
    for hours_from_now, fraction in enumerate(container_dwell_time_probabilities)
}

df_container_dwell_time_distribution = pd.Series(
    container_dwell_time_probabilities_as_dates
).to_frame("Container Dwell Time Distribution")

df_container_dwell_time_distribution.plot(legend=False)
plt.ylabel("Probability (as percentage overall)")
plt.show()
[Figure: reversed container dwell time distribution over the time window for the delivery; y-axis: probability (as percentage overall)]

In the last step, the two distributions are merged by multiplication.

[17]:
merged_distribution = multiply_discretized_probability_densities(
    list(truck_arrival_distribution_slice.values()), container_dwell_time_probabilities
)

merged_distribution_as_dates = {
    earliest_truck_time_slot + datetime.timedelta(hours=hours_from_now): fraction * 100
    for hours_from_now, fraction in enumerate(merged_distribution)
}

df_merged_distributions = pd.Series(merged_distribution_as_dates).to_frame(
    "Multiplication of Both Distributions"
)

df_merged_distributions.plot(legend=False)
plt.ylabel("Probability (as percentage overall)")
plt.show()
[Figure: multiplication of both distributions over the time window for the delivery; y-axis: probability (as percentage overall)]

Let’s compare the merged distribution with the two distributions it was derived from.

[18]:
df_merged = pd.concat(
    [
        df_truck_arrival_distribution,
        df_container_dwell_time_distribution,
        df_merged_distributions,
    ],
    axis=1,
)

ax = df_merged[
    ["Container Dwell Time Distribution", "Truck Arrival Distribution"]
].plot(
    color={
        "Truck Arrival Distribution": "navy",
        "Container Dwell Time Distribution": "dimgray",
    },
    alpha=0.5,
    style="--",
)

plt.axvline(
    earliest_truck_time_slot
    + datetime.timedelta(hours=container_dwell_time_distribution.minimum),
    color="dimgray",
)
plt.axvline(
    earliest_truck_time_slot
    + datetime.timedelta(hours=container_dwell_time_distribution.maximum),
    color="dimgray",
)

plt.axvline(container_departure_time, color="black")

df_merged[["Multiplication of Both Distributions"]].plot(ax=ax, alpha=1, color="k")

left, right = plt.xlim()
plt.xlim(right=right + datetime.timedelta(hours=15).total_seconds() / 3600)

plt.show()
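[Figure: truck arrival distribution and container dwell time distribution (dashed) together with the multiplication of both distributions (solid black); vertical lines mark the container departure time and the dwell time bounds]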

Further topics

If you have a topic in mind that should be presented step by step like the previous one, please open an issue at https://github.com/1kastner/conflowgen/issues or send an email to marvin.kastner@tuhh.de.