felipeblassioli's blog

By felipeblassioli, 9 years ago, In English

DISCLAIMER: I really want to understand why this solution (which I think is quite reasonable) is wrong. If this is a nonsensical solution, please say so. I am not looking for a AC solution, I'd like to prove that this one is wrong, but i've been unable to do so..


Problem Statement

The statement is from timus 1588 — Jamaica:

Each two cities are to be connected by a road going along a straight line. One road may connect several cities if they lie on the same straight line. Jamaican economists are sure that the new network will minimize transportation expenses. In order to estimate the cost of the project, it is required to determine the total length of the roads to be constructed, and this is the job for Andrey.


Solution to be proven wrong

So, we have N points and we want the sum of a subset of the lines.

Consider all point combinations, this give all lines. A line is described by its slope and its point of intersection with the Y axis. The ideas is simply to find the biggest segment in each line.

Some code:

from fractions import Fraction # available since python 2.6 . Irreducible fractions, very slow
...
        for a,b in combinations(pts,2):
		dx,dy = a[0]-b[0], a[1]-b[1]
		dist = _dist(a,b)
                # Vertical, horizontal and default line
		if a[1] == b[1]: 
			line = (0, a[1])
		elif a[0] == b[0]: 
			line = (INF, a[0])
		else:
                        # irreducible fraction Fraction(8,2) = 4/1
                        # M = dy/dx and B = Y - MX
			M = Fraction(dy,dx)
                        # line is tuple of integers. No division is done to avoid precision errors.
			B = Fraction(dx*b[1] - (dy*b[0]), dx)
			line = ((M.numerator,M.denominator, (B.numerator,B.denominator))

                # Find biggest line
		if line not in lines:
		    lines[line] = dist
		elif lines[line] <= dist:
		    lines[line] = dist

Here's the example input: (correct answer is 412)

0 0
0 100
100 0
50 50

And a verbose output:

(A,B) DIST , where (A,B) describes segment AB

((0, 0), (100, 0)) 100.0
((0, 0), (50, 50)) 70.7106781187
((0, 0), (0, 100)) 100.0
((0, 100), (100, 0)) 141.421356237
412

What may be wrong

Why is it wrong? I couldn't find a test in which this algorithm fail.

  • Is the general idea Faulty? I'm quite 'preocupied' because the problem's discussion said that the "best algo" for this problem was O(N^2 log N) or O(N^2 log N^2) or something like that, but in my solution I don't see a O(log X) factor hah (i.e I could be overlooking some case). In fact, both log(N) solutions mentioned sort (sort by angle or sort by distance), but I didn`t find a reason to sort..

  • Is it a precision problem? Or a rounding problem? I've used python's Fraction to (try) avoid precision problems and math.round(X + 0.5) to print the answer. Is that not enough?

OBS: Also I'm aware of some O(N^3) AC solutions, such as the one I put at the bottom of this post. But I didn't find they interesting..


I think my solution is quite similar to one mentioned in the problem's discussion:

  1. Sort n^2 lines by distance.
  2. Iterate from largest to smallest.
  3. We have to add line to the answer, if we not have bigger line in set, covering this. We can easily check it with set.

Where did I go wrong?


EDIT: As the user @ffao said, my problem could be anywhere in my code, so here's the whole thing.

Maybe it's a precision problem while printing the answer (i.e floor( X + 0.5) not enought) I don't think so, I kind of tested with some 'round problematic answers'.

I really suspect that has got something todo with the general idea of the solution and that I'm overlooking some sort of case, unfortunately haha

Here`s the whole code:

def main():
	from sys import stdin
	from itertools import izip, combinations
	from math import sqrt, floor
	from fractions import Fraction

	tkns = iter(map(int,stdin.read().strip().split()))
	n = tkns.next()

	pts = [ (x,y) for x,y in izip(tkns, tkns) ]
	_dist = lambda A,B: sqrt((A[0]-B[0])*(A[0]-B[0]) + (A[1]-B[1])*(A[1]-B[1]))
	lines = {}
	INF = 10000005

        for a,b in combinations(pts,2):
		dx,dy = a[0]-b[0], a[1]-b[1]
		dist = _dist(a,b)
		if a[1] == b[1]: 
			line = (0, a[1])
		elif a[0] == b[0]: 
			line = (INF, a[0])
		else:
			M = Fraction(dy,dx)
			B = Fraction(dx*b[1] - (dy*b[0]), dx)
			line = ((M.numerator,M.denominator), (B.numerator,B.denominator))
			#M = float(dy)/dx
			#line = (M, b[1] - M*b[0])

		if line not in lines:
		    lines[line] = dist
		elif lines[line] <= dist:
		    lines[line] = dist

	print int(floor(sum(lines.values())+0.5))

Also, here's an AC solution which I think is O(n^3): (most timus1588 solutions I googled were analogous to this straightforward one)

#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

int x[303], y[303];
int dist(int i, int j) {
	return (x[i] - x[j]) * (x[i] - x[j]) + (y[i] - y[j]) * (y[i] - y[j]);
}

int cross(int i, int j, int k) {
	return (x[j] - x[i]) * (y[k] - y[i]) - (y[j] - y[i]) * (x[k] - x[i]);
}

map<vector<int>, int> ss;
int main() {
	int i, j, k, n;
	scanf("%d", &n);
	for(i = 0; i < n; i++) {
		scanf("%d %d", &x[i], &y[i]);
	}
	for(i = 0; i < n; i++)
		for(j = i + 1; j < n; j++) {
			vector<int> v;
			for(k = 0; k < n; k++)
				if(k == i || k == j || cross(i, j, k) == 0)
					v.push_back(k);
			ss[v] = max(ss[v], dist(i, j));
		}
	double tot = 0;
	for(auto &e : ss)
		tot += sqrt(e.second);
	printf("%.0f\n", tot);
}

Full text and comments »

  • Vote: I like it
  • +1
  • Vote: I do not like it

By felipeblassioli, history, 9 years ago, In English

I've written this in response to this comment when I was solving a different problem.

Summary: You got weight:

  • Lookup time (in graph traversals)
  • str to int conversion -> reading the graph
  • int to str conversion -> output the graph

Take this problem for example, it has an awfully big input and also a big output. (for big I mean ~10^5 ints)

We have more than 200000 numbers. (when N=10^5 and M=10^5)

I`ve used an EC2 small machine to do the timings and they were quite close to the timus judge's output. EC2 seems slower than timus.

Here's some timings for the input with max N and M:

Using a list for the graph (int indexed):

from sys import stdin
from itertools import izip, islice

tkns = iter(map(int, stdin.read().split()))
n,m = int(tkns.next()), int(tkns.next())
graph = [[] for i in xrange(n+1)]
for u,v in islice(izip(tkns,tkns),m):
        graph[u].append(v)
        graph[v].append(u)

In 10 runs the min,max time where: 0.726s and 0m0.814s

Using a dict for the graph (str indexed):

from sys import stdin
from itertools import izip, islice

tkns = iter(stdin.read().split())
n,m = int(tkns.next()), int(tkns.next())
graph = {str(i): [] for i in xrange(1,n+1)}
for u,v in islice(izip(tkns,tkns),m):
        graph[u].append(v)
        graph[v].append(u)

In 10 runs the min,max time where: 0.582s and 0m0.652s

The time limit for this problem was 1s, my solution took 0.717 and is currently the only python solution. The "same" solution in C took 0.072s.

Should we ALWAYS use dict based graphs?

NO, even though the access time for list is faster, if you traverse the graph a lot and you have many edges or vertices you probably should pay the price of 0.1-0.2s (which is not cheap). Here's a problem (timus1389-roadworks) which I couldn`t use a dict based graph to AC it. The solution is in the post: Maximal Matching in a Tree.

Keep in mind that if you have to cast int to str for output (in this problem you had to output lots of stuff) you have to pay this price too. (and it is not cheap)

Observations:

  • Casting str to int is not cheap
  • str to int and int to str DO NOT cost the same.

Related SO: http://stackoverflow.com/questions/11687183/more-efficient-to-convert-string-to-int-or-inverse

Full text and comments »

  • Vote: I like it
  • +3
  • Vote: I do not like it

By felipeblassioli, history, 9 years ago, In English

EDIT: Solved with iterative version and optmizations with 0.982s 28 424 KB. Recursive version throws runtime error at test 31. (non-exit code 0). I think it is segmentation fault

Lesson learned: Python stack frames are heavy, too much recursion is no option if you cant use import resource and get more machine resources.


I am trying to pass a timus problem in python, but I still get TLE in test 23. I tried looots of stuff, so do you guys have any idea?

The input is quite big, I've passed some problems in python with big (10^4, 10^5) inputs and its always quite a pain. Input has a maximum of 100000 lines. And it a graph that is a tree.

Example:

Maybe I should read the input in other way? Maybe I could print the dp solution in a better way? (is solve() really needed?)

Here's the code:

def main():
    from sys import stdin, setrecursionlimit, stdout
    from itertools import islice, izip
    from collections import deque

    setrecursionlimit(10**5+5)
    tkns = iter(stdin.read().split())
    n,m = int(tkns.next()), int(tkns.next()) 

    graph = {str(i): [] for i in xrange(1,n+1)}

    for u,v in izip(tkns, tkns):
        graph[u].append(v)
        graph[v].append(u)

    dp1 = {}
    dp2 = {}
    parent = {}

    visited = set()
    def dfs(v):
        visited.add(v)
        if len(graph[v]) == 0:
            dp1[v] = dp2[v] = 0
            return 

        a = 0
        tmp = deque()
        for c in graph[v]:
            if c not in visited:
                parent[c] = v
                dfs(c)
                a += max(dp1[c],dp2[c])
                tmp.append( 1 + dp2[c] - max(dp1[c],dp2[c]) )

        dp2[v] = a
        if len(tmp) > 0:
            dp1[v] = max( t + a for t in tmp )
        else:
            dp1[v] = 0

    w = stdout.write
    def solve(v):
        a = dp1[v]
        b = dp2[v]
        if b >= a:
            for c in graph[v]:
                if c != parent[v]:
                    solve(c)
        else:
            best = 0
            for c in graph[v]:
                if c != parent[v]:
                    t = dp2[c]
                    x = 1 + t + b - max(t,dp1[c])
                    if x >= best:
                        best = x
                        best_c = c
                    solve(c)
            w('%s %s\n' % (v,best_c))

    root = u
    parent[root] = None
    dfs(root)
    w('%d\n' % max(dp1[root],dp2[root]))
    solve(root)

main()

AC Solution (Iterative):

def main():
    from sys import stdin, stdout
    from itertools import izip
    from collections import deque

    tkns = iter(map(int,stdin.read().split()))
    n = tkns.next()
    m = tkns.next()

    graph = [[] for i in xrange(n+1)]

    for u,v in izip(tkns, tkns):
        graph[u].append(v)
        graph[v].append(u)

    dp1 = {}
    dp2 = {}
    best = {}
    cache = {}

    root = u
    reverse_topological_order = deque(maxlen=n)
    stk = deque([root],n)
    app = stk.append
    while stk:
        v = stk.pop()
        if len(graph[v]) == 0:
            dp1[v] = dp2[v] = 0
            best[v] = -1
            cache[v] = 0
        else:
            reverse_topological_order.appendleft(v)
            for c in graph[v]:
                graph[c].remove(v)
                app(c)

    for v in reverse_topological_order:
        dp2[v] = sum( cache[c] for c in graph[v] )
        a = dp2[v]
        M = 0
        for c in graph[v]:
            b = 1 + dp2[c] + a - cache[c] 
            if M <= b:
                M = b
                best_c = c

        dp1[v] = M
        if dp2[v] >= dp1[v]:
            best[v] = -1
            cache[v] = dp2[v]
        else:
            best[v] = best_c
            cache[v] = dp1[v]

    w = stdout.write
    stk = deque([root],n)
    w('%d\n' % cache[root])
    ret = []
    app = ret.append
    while stk:
        v = stk.pop()
        if best[v] != -1:
            app('%s %s\n' % (v,best[v]))
        stk.extend(graph[v])
    w(''.join(ret))

main()

Recursive Solution (RTE):

Got RTE probably because of segmentation fault

def main():
    from sys import stdin, setrecursionlimit, stdout, exit
    from itertools import islice, izip
    from collections import deque

    setrecursionlimit(10**5+1000)
    tkns = iter(map(int,stdin.read().split()))
    n = tkns.next()
    m = tkns.next()

    graph = [[] for i in xrange(n+1)]

    for u,v in islice(izip(tkns, tkns),m):
        graph[u].append(v)
        graph[v].append(u)

    dp1 = {}
    dp2 = {}
    best = {}
    cache = {}

    def dfs(v):
        if len(graph[v]) == 0:
            dp1[v] = dp2[v] = 0
            best[v] = -1
            cache[v] = 0
            return 

        a = 0
        for c in graph[v]:
            graph[c].remove(v)
            dfs(c)
            a += cache[c]

        dp2[v] = a
        M = 0
        for c in graph[v]:
            b = 1 + dp2[c] + a - cache[c] 
            if M <= b:
                M = b
                best_c = c

        dp1[v] = M
        if dp2[v] >= dp1[v]:
            cache[v] = dp2[v]
            best[v] = -1
        else:
            cache[v] = dp1[v]
            best[v] = best_c

    w = stdout.write
    ret = []
    app = ret.append
    def solve(v):
        for c in graph[v]:
            solve(c)
        if best[v] != -1:
            app('%s %s\n' % (v,best[v]))

    from random import randint
    root = randint(1,n)
    dfs(root)
    w('%d\n' % max(dp1[root],dp2[root]))
    solve(root)
    w(''.join(ret))

main()

Full text and comments »

  • Vote: I like it
  • +5
  • Vote: I do not like it

By felipeblassioli, history, 9 years ago, In English

Can we modify Floyd-Warshall and have dist[u][u] = shortest distance from u to itself?

Floyd-Warshall is initialized with dist[u][u] = 0 , can we do dist[u][u] = INF and run a default implementation of Floyd-Warshall hoping that dist[u][u] is the shortest distance from u to u? (i.e the minimum weight cycle)?

Does this works for both directed and undirected graphs?

I'm taking for a default implementation something like wikipedia:

let dist be a |V| × |V| array of minimum distances initialized to ∞ (infinity)
2 for each vertex v
3    dist[v][v] ← 0
4 for each edge (u,v)
5    dist[u][v] ← w(u,v)  // the weight of the edge (u,v)
6 for k from 1 to |V|
7    for i from 1 to |V|
8       for j from 1 to |V|
9          if dist[i][j] > dist[i][k] + dist[k][j] 
10             dist[i][j] ← dist[i][k] + dist[k][j]
11         end if

Apparently it works for directed graphs. Is there a way to modify Floyd-Warshall and find the minimum weight cycle?

The problem with this approach for undirected graphs is:

If your undirected graph is a tree, there are no cycles right? Suppose there is a edge (u,v) whose weight is dist[u][v] = dist[v][u] = 1. (remember that with the 'modification' dist[u][u] = INF and not zero anymore)

The at iteration i = j = u and k = v:

dist[i][j] > dist[i][k] + dist[k][j] => dist[u][u] > dist[u][v] + dist[v][u] => INF > 1 + 1

So dist[u][u] = 2. But the distance from u to itself cant be 2, because the graph has no cycles (it is a tree)!

Full text and comments »

  • Vote: I like it
  • +4
  • Vote: I do not like it