substring

2015-01-26

#出现多于一次的最长的子串
eg: “abcabcbcb” -> “abc”
eg: “aababa” -> “aba”
解法：为后缀建立Trie树，建树过程中，找到最长的。
代码：https://gist.github.com/senarukana/25b759f1ef1d28405dce

#重复次数最多的子串
找到重复次数最多的子串，如果相等，返回长度最长的
eg: “acbdacb” -> “acb”, 2
eg: “aaaa” -> “aa”,3
解法：为后缀建立Trie树，建树过程中，找到出现次数最多的。
代码： https://gist.github.com/senarukana/eea81781fdc10d211c1f

#重复压缩字符串

给一个字符串判断压缩后，它在给定的dictionary中是不是唯一的。
压缩方式，只保留首尾和中间子串的长度： wabc -> w2c
Dict{word, dog, dag} string: w2D > true; d1g > false

解法：使用空间26 26 n， n代表最长字符串的长度。

#找到回文字符组

List of words. Find pairs of words which form a palindrome on concatenation
INPUT: cigar, tragic, warrener, raw
OUTPUT: {cigar + tragic} = (cigartragic)
(warrener + raw) = (warrenerraw)

思路:哈希
使用哈希存储所有的words
对于每一个word，切割成前缀P和后缀S
如果P是回文，则P可以放在中间，此时检查S的逆序是否存在于哈希表中。
对应tragic，t作为前缀是回文，ragic的逆序是cigar，在哈希表中
如果S是回文，则S可以放在中间，此时检查P的逆序是否存在于哈希表中。
对应warrener，rener作为后缀是回文，war的逆序raw，在哈希表中。

def isPalindrome(word):
    for i in range(len(word)/2):
        if word[i] != word[-(i+1)]:
            return False
    return True

def palindromeWords(words):
    dict = set()
    result = []
    for word in words:
        dict.add(word[::-1])
    for word in words:
        for i in range(len(word)):
            prefix, suffix = word[:i+1], word[i+1:]
            if isPalindrome(prefix) and suffix in dict:
                result.append([suffix[::-1], word])
            if isPalindrome(suffix) and prefix in dict:
                result.append([word, prefix[::-1]])
    return result

时间复杂度O(n len(word) len(word))
空间复杂度O(n * len(word))

#重复最多的字串

given string S, judge whether it’s equal to a substring T which repeats K times. T is unknown, K is unknown
for example:
S = ‘abcabc’, return True, because T = ‘abc’, K = 2
S = ‘ab’, return False

思路：KMP，假定substring长度为l，kmp生成的next值，从l开始到n将会从0开始递增。

KMP next[i]表示字符串S[0..next[i]]与S[i-next[i]..i]相等

def kmpPrepare(word):
    n = len(word)
    next = [-1]*n
    next[0] = -1
    for i in range(1, n):
        j = next[i-1]+1
        while j > 0 and word[i] != word[j]:
            j = next[j-1]+1
        if j > 0:
            next[i] = j
        else:
            if word[0] == word[i]:
                next[i] = 0
            else:
                next[i] = -1
    return next

#消除连续的字符串

Given a string of letters from alphabet. Remove all pairs(2 consecutive same character) of characters which occur consecutively.And do it recursively on remaining string.
eg: acbbbcd -> ad
解法：双指针。维护j表示下次比较的起始位置，迭代i与j进行比较

如果s[i] != s[j]: 将j前移，并 s[j+1] = s[i]
如果s[j] == s[i]: 此时要删除以s[j]为起点的连续的元素，找到下一个使得s[i] != s[j]。完成后将j后移
代码：https://gist.github.com/senarukana/fe6e473cdfa46a080229