I would like to create alogritmo, which allows me to tell if a tree is included in another. Thanks to this site I managed to develop an algorithm that allows me to know binary trees, but I would like to summarize it.
When did Beyonce start becoming popular?
I did this with the following code: treeQuestion[0] :
[Tree('start_VB_ROOT', ['When_WRB_advmod', 'did_VBD_aux', 'Beyonce_NNP_nsubj', Tree('becoming_VBG_xcomp', ['popular_JJ_acomp']), '?_._punct'])] We can draw a tree like this:
>>>questionSpacy = spacy_nlp(question) >>>treeQuestion = nltk_spacy_tree(questionSpacy) >>>print(treeQuestion) When did Beyonce start becoming popular? [to_nltk_tree(sent.root).pretty_print() for sent in en_nlp(predicted.iloc[0,3]).sents] >>>treeQuestion = nltk_spacy_tree(questionSpacy) >>>print(treeQuestion) [Tree('start_VB_ROOT', ['When_WRB_advmod', 'did_VBD_aux', 'Beyonce_NNP_nsubj', Tree('becoming_VBG_xcomp', ['popular_JJ_acomp']), '?_._punct'])] start __________|___________ | | | | becoming | | | | | When did Beyonce ? popular And I would like to know if it is contained in the next tree and to what extent it is shown:
--- 1 --- born and raised in houston, texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of r&b girl-group destiny's child. [Tree('performed_VBD_ROOT', [Tree('born_VBN_advcl', ['and_CC_cc', Tree('raised_VBN_conj', [Tree('in_IN_prep', [Tree('houston_NN_pobj', [',_,_punct'])]), 'texas_NN_npadvmod'])]), ',_,_punct', 'she_PRP_nsubj', Tree('in_IN_prep', [Tree('competitions_NNS_pobj', ['various_JJ_amod', Tree('singing_NN_nmod', ['and_CC_cc', 'dancing_NN_conj'])])]), Tree('as_IN_prep', [Tree('child_NN_pobj', ['a_DT_det'])]), ',_,_punct', 'and_CC_cc', Tree('rose_VBD_conj', [Tree('to_IN_prep', ['fame_NN_pobj']), Tree('in_IN_prep', [Tree('1990s_NNS_pobj', ['the_DT_det', 'late_JJ_amod'])]), Tree('as_IN_prep', [Tree('singer_NN_pobj', ['lead_JJ_compound', Tree('of_IN_prep', ['r&b_NN_punct', Tree('child_NN_pobj', [Tree('destiny_NN_poss', [Tree('group_NN_compound', ['girl_NN_compound', '-_HYPH_punct']), "'s_POS_case"])])])])])]), '._._punct'])] performed ________________________________________________________|__________________________________________________ | | | | | | | | rose | | | | | | | | ___________________|_________ | | | | | | | | | | as | | | | | | | | | | | | | | | | | | | | | singer | | | | | | | | | | _________|_____ | | | | | born | | | | | of | | | | | ____|_____ | | | | | __________|_____ | | | | | | raised in | | | | | child | | | | | | _____|_______ | | | | | | | | | | | | | | in competitions as | in | | destiny | | | | | | | | _________|__________ | | | | | ___________|______ | | | | | | | houston | singing child to 1990s | | | group | | | | | | | | | __________|_______ | | ____|____ | | | ______|____ , she , and . and texas , various and dancing a fame the late lead r&b 's girl, ', _, _ punct', 'she_PRP_nsubj', Tree ( 'in_IN_prep', [Tree ( 'competitions_NNS_pobj', [ 'various_JJ_amod', Tree ( 'singing_NN_nmod', [ 'and_CC_cc', '--- 1 --- born and raised in houston, texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of r&b girl-group destiny's child. [Tree('performed_VBD_ROOT', [Tree('born_VBN_advcl', ['and_CC_cc', Tree('raised_VBN_conj', [Tree('in_IN_prep', [Tree('houston_NN_pobj', [',_,_punct'])]), 'texas_NN_npadvmod'])]), ',_,_punct', 'she_PRP_nsubj', Tree('in_IN_prep', [Tree('competitions_NNS_pobj', ['various_JJ_amod', Tree('singing_NN_nmod', ['and_CC_cc', 'dancing_NN_conj'])])]), Tree('as_IN_prep', [Tree('child_NN_pobj', ['a_DT_det'])]), ',_,_punct', 'and_CC_cc', Tree('rose_VBD_conj', [Tree('to_IN_prep', ['fame_NN_pobj']), Tree('in_IN_prep', [Tree('1990s_NNS_pobj', ['the_DT_det', 'late_JJ_amod'])]), Tree('as_IN_prep', [Tree('singer_NN_pobj', ['lead_JJ_compound', Tree('of_IN_prep', ['r&b_NN_punct', Tree('child_NN_pobj', [Tree('destiny_NN_poss', [Tree('group_NN_compound', ['girl_NN_compound', '-_HYPH_punct']), "'s_POS_case"])])])])])]), '._._punct'])] performed ________________________________________________________|__________________________________________________ | | | | | | | | rose | | | | | | | | ___________________|_________ | | | | | | | | | | as | | | | | | | | | | | | | | | | | | | | | singer | | | | | | | | | | _________|_____ | | | | | born | | | | | of | | | | | ____|_____ | | | | | __________|_____ | | | | | | raised in | | | | | child | | | | | | _____|_______ | | | | | | | | | | | | | | in competitions as | in | | destiny | | | | | | | | _________|__________ | | | | | ___________|______ | | | | | | | houston | singing child to 1990s | | | group | | | | | | | | | __________|_______ | | ____|____ | | | ______|____ , she , and . and texas , various and dancing a fame the late lead r&b 's girl
In fact, as you can see, we can find rose to fame , which is very similar to start becoming popular . As for half (3/6) of the tree, here we can say that 50% is shown.
Expected Result
For example:
>>># создание первого дерева >>>text = start becoming popular >>>textSpacy = spacy_nlp(text1) >>>treeText = nltk_spacy_tree(textSpacy) >>>t = WordTree(treeText[0]) >>># создание второго дерева >>>question = When did Beyonce start becoming popular? >>>questionSpacy = spacy_nlp(question) >>>treeQuestion = nltk_spacy_tree(questionSpacy) >>>q = WordTree(treeQuestion[0]) >>># сравнение деревьев >>>isSubtree(t,q) True My attempt
I found a way to find out when one phrase is contained in another. I leave the question open if you can tell me when one tree looks like another. In particular, synonyms or missing words . For example,
rose to fameforstart becoming popularstart being populartostart becoming popular
A class that converts a list into a tree.
class WordTree: '''Дерево для массива синтаксического анализа зависимости spaCy''' def __init__(self, tree, is_subtree=False): """ Construct a new 'WordTree' object. :param array: Массив, содержащий зависимость :param parent: Родитель массива if существует :return: ничего не возвращает """ self.parent = [] self.children = [] self.data = tree.label().split('_')[0] # первый элемент дерева # Мы также добавим синонимы. for subtree in tree: if type(subtree) == Tree: # Итерации по глубине поддерева. t = WordTree(subtree, True) t.parent=tree.label().split('_')[0] elif type(subtree) == str: surface_form = subtree.split('_')[0] self.children.append(surface_form) A function that checks if the tree is contained in another
def isSubtree(T,S): ''' функция сказать, если два дерева одинаковы ''' if S is None: return True if T is None: return False if areIdentical(T, S): return True return any(isSubtree(c, S) for c in T.children) def areIdentical(root1, root2): ''' функцию сказать, если два корня одинаковы ''' if root1 is None and root2 is None: return True if root1 is None or root2 is None: return False # Проверьте, являются ли данные обоих корней и их детей одинаковыми return (root1.data == root2.data and ((areIdentical(child1 , child2)) for child1, child2 in zip(root1.children, root2.children))) example
# создание первого дерева text = "start becoming popular" textSpacy = spacy_nlp(text) treeText = nltk_spacy_tree(textSpacy) t = WordTree(treeText[0]) # создание второго дерева question = "When did Beyonce start becoming popular?" questionSpacy = spacy_nlp(question) treeQuestion = nltk_spacy_tree(questionSpacy) q = WordTree(treeQuestion[0]) # сравнение деревьев isSubtree(t,q) Here we test two trees from two sentences, where one is part of the other. We then have True . It may be interesting to find a way to tell when it is partially contained in another.
Purpose of comparing a tree
This will be used to compare the question tree S with several T_i answer trees. If T_i is similar to S, then it is more likely to contain the answer.
Sis similar toTifTcontains elements ofSThis is perhaps the most naive approach, but perhaps a good start. - ThePassenger