Optimizing Plots with a TSP Solver

Paul Butler – March 25, 2018

This post is also available as a Jupyter notebook. You can download it or run it on Binder.

There's something very rewarding about watching a pen plotter's dance as it traces out a graphic you've previously only seen on a screen. It can also be a stark reminder of how forgiving screen graphics are by comparison, as images that render in miliseconds on screen can take ages to plot on paper.

Each line in a plot takes time in two ways: it takes time to trace the line, and it takes time to move the pen into place from the last position. As a result of the latter, the drawing order of the lines can make a big difference to how long a plot takes. It's common to generate an image that can be sped up by an order of magnitude just by choosing a better drawing order.

Some folks in the plotting community have already been generous enough to share route optimizers: Anders Hoff's svgsort, Trammell Hudson's vecsort, Windell Oskay's AxiDraw patch, Romain Testuz's InkscapeOptimizePath.

Most of these use a “greedy” approach, in which the path is built by iteratively tracing the closest untraced path to the current position. (InkscapeOptimizePath is the exception: it uses an approach based on Eulerian cycles.) In this post I investigate whether we can do better than a greedy algorithm using a more sophisticated path solver.

Environment Setup

As an optimization package, we'll use Google's Optimization Tools. For the greedy approach, we'll use the rtree package. It's easiest to install through Conda since the libspatialindex dependency is installed automatically. Also, the svgpathtools package greatly simplifies parsing and writing paths from SVG files.

In [1]:
# Install dependencies
!conda install -q -y rtree 2&> /dev/null
!pip -q install ortools svgpathtools

Visualizing the Problem

I have an SVG file that I'd like to plot. Here's what it looks like:

In [2]:
from IPython.display import SVG
SVG('spiralthing.svg')
Out[2]:

Unfortunately, the drawing order is very inefficient. To show you what I mean, let's process the image in two ways:

  1. Colorizing the lines based on drawing order.
  2. Adding a line for each “pen transit” that is necessary to draw the lines in order.

We can use svg2paths to grab all the paths as a list. This function also returns a dictionary of attributes for each path, but we can ignore those.

In [3]:
import svgpathtools
paths, _ = svgpathtools.svg2paths('spiralthing.svg')

To colorize the lines, generate_color returns a color string from an HSV value. By using different values of hue between 0 and 1, we get a nice rainbow gradient. visualize_pen_transits takes a list of paths like svgpathtools produces and renders a visualization of the transits with the notebook.

In [4]:
import colorsys
import tempfile
import os

def generate_color(hue, saturation=1.0, value=1.0):
    rgb = colorsys.hsv_to_rgb(hue, saturation, value)
    return 'rgb({},{},{})'.format(*[int(x*255) for x in rgb])

def visualize_pen_transits(paths):
    # We will construct a new image by storing (path, attribute)
    # pairs in this list.
    parts = list()
    
    last_end = None
    for i, path in enumerate(paths):
        start = path.start
        end = path.end
        
        # Generate a color based on how far along in the plot we are.
        frac = i / (len(paths) - 1)
        color = generate_color(frac, 1.0, 1.0)

        if last_end is not None:
            # If this isn't our first path, add a line between the end of
            # the last path and the start of this one.
            parts.append((
                svgpathtools.Line(last_end, start),
                {
                    'stroke': 'black',
                    'fill': 'none',
                }
            ))
            
        last_end = end
        
        # Also draw a faded, colorized version of this path.
        parts.append((
            path,
            {
                'stroke': color,
                'fill': 'none',
                'opacity': '0.5'
            }
        ))
    
    # Write the SVG to a temporary file and load it back as an object that
    # will appear in the notebook.
    with tempfile.TemporaryDirectory() as svg_path:
        svg_file = os.path.join(svg_path, 'image.svg')
        new_paths, new_attrs = zip(*parts)
        svgpathtools.wsvg(new_paths, attributes=new_attrs, filename=svg_file)
        svg = SVG(svg_file)
    return svg

Let's see what the input image looks like when we visualize the transits:

In [5]:
visualize_pen_transits(paths)
Out[5]:

Ick! All those lines represent time the pen is just flying through the air instead of writing on the page. Surely we can do better.

Quantifying the Problem

Now we know what a bad solution looks like, but how do we measure it? One way is to simply add up the distance that the pen travels in the “pen up” (not drawing) position. The distance in the “pen down” position is the same no matter what order we draw the lines in, so we can just ignore it.

svgpathtools stores an (x, y) coordinate pair as the complex number x + y*j (in Python, j represents the imaginary component). This simplifies the Euclidean distance calculation:

In [6]:
def dist(p1, p2):
    return abs(p1 - p2)

Using this distance function, we can define the cost of a route to be the travel time between the end of each path and the start of the next one. A pen plotter starts and returns to an origin point, so to fully account for the cost we also need to add the distance to and from the origin.

In [7]:
def cost_of_route(path, origin=0.+0j):
    # Cost from the origin to the start of the first path
    cost = dist(origin, path[0].start)
    # Cost between the end of each path and the start of the next
    cost += sum(
        dist(path[i].end, path[i+1].start) for i in range(len(path) - 1)
    )
    # Cost to return back to the origin
    cost += dist(path[-1].end, origin)
    return cost

We can now calculate the cost of the route. This will give us a perspective of how much things improve (or don't improve; I'm not giving spoilers.)

In [8]:
initial_cost = cost_of_route(paths)
initial_cost
Out[8]:
24116.595176769544

Building a Graph

Now we have a cost function that we hope to reduce by re-ordering the paths. But before we jump into that, here's another thing to consider: in addition to re-ordering the paths, we can also reverse them. In order to take advantage of that fact, we need to keep track of both the order of the paths and whether or not each one is reversed.

Rather than storing the direction explicitly, I find it easier to represent the problem by creating a graph and adding nodes to the graph for each path and its reverse. A valid drawing order is then a cycle through the graph, starting and returning to the origin (which is also represented as a node in the graph), which visits exactly one node belonging to each (path, reversed path) pair.

To build this graph, we construct a list of nodes like this: [origin, path[0], path[0] reversed, path[1], path[1] reversed, ...], that is, self.endpoints[0] represents the origin, and then each pair of nodes that follows represents the two directions of one path. Excepting the origin (0), paths with an odd index correspond to unmodified paths from the original drawing, and paths with an even index correspond to their reversed versions.

In [9]:
class PathGraph:
    # The origin is always at index 0.
    ORIGIN = 0
    
    def __init__(self, paths, origin=0.+0j):
        """Constructs a PathGraph from the output of svgpathtools.svg2paths."""
        self.paths = paths
        # For any node i, endpoints[i] will be a pair containing that node's
        # start and end coordinates, respectively. For i==0 this represents
        # the origin.
        self.endpoints = [(origin, origin)]
        
        for path in paths:
            # For each path in the original list of paths,
            # create nodes for the path as well as its reverse.
            self.endpoints.append((path.start, path.end))
            self.endpoints.append((path.end, path.start))
    
    def get_path(self, i):
        """Returns the path corresponding to the node i."""
        index = (i - 1) // 2
        reverse = (i - 1) % 2
        path = self.paths[index]
        if reverse:
            return path.reversed()
        else:
            return path
    
    def cost(self, i, j):
        """Returns the distance between the end of path i
        and the start of path j."""
        return dist(self.endpoints[i][1], self.endpoints[j][0])
    
    def get_coordinates(self, i, end=False):
        """Returns the starting coordinates of node i as a pair,
        or the end coordinates iff end is True."""
        if end:
            endpoint = self.endpoints[i][1]
        else:
            endpoint = self.endpoints[i][0]
        return (endpoint.real, endpoint.imag)
    
    def iter_starts_with_index(self):
        """Returns a generator over (index, start coordinate) pairs,
        excluding the origin."""
        for i in range(1, len(self.endpoints)):
            yield i, self.get_coordinates(i)
    
    def get_disjoint(self, i):
        """For the node i, returns the index of the node associated with
        its path's opposite direction."""
        return ((i-1) ^ 1) + 1
    
    def iter_disjunctions(self):
        """Returns a generator over 2-element lists of indexes which must
        be mutually exclusive in a solution (i.e. pairs of nodes which represent
        the same path in opposite directions.)"""
        for i in range(1, len(self.endpoints), 2):
            yield [i, self.get_disjoint(i)]
    
    def num_nodes(self):
        """Returns the number of nodes in the graph (including the origin.)"""
        return len(self.endpoints)
In [10]:
path_graph = PathGraph(paths)

A Greedy Approach

Using this graph, we can implement a greedy solution. Here's how it works:

  1. Start at the origin.
  2. Move to the nearest node. Add the path represented by that node to the end of the solution.
  3. Remove both the node you are on, and the node corresponding to the opposite direction of the same path.
  4. If there are nodes remaining, return to 2.

The second step requires a bit of nuance: the naive approach of looping over every point to find the nearest can get computationally expensive. Instead, we'll use a spatial index called an R-tree which is good at quickly finding the nearest node to a point.

The PathIndex class below takes a PathGraph and turns it into a data structure that can be quickly queried to find the nearest point to a given coordinate.

In [11]:
import rtree

class PathIndex:
    def __init__(self, path_graph):
        self.idx = rtree.index.Index()
        self.path_graph = path_graph
        for index, coordinate in path_graph.iter_starts_with_index():
            self.idx.add(index, coordinate + coordinate)
    
    def get_nearest(self, coordinate):
        return next(self.idx.nearest(coordinate))
    
    def delete(self, index):
        coordinate = self.path_graph.get_coordinates(index)
        self.idx.delete(index, coordinate + coordinate)
    
    def delete_pair(self, index):
        self.delete(index)
        self.delete(self.path_graph.get_disjoint(index))

The greedy_walk function implements the logic mentioned earlier: we start at the origin, then find the nearest point and move there, deleting nodes along the way.

In [12]:
def greedy_walk(path_graph):
    path_index = PathIndex(path_graph)
    location = path_graph