I would like to create alogritmo, which allows me to tell if a tree is included in another. Thanks to this site I managed to develop an algorithm that allows me to know binary trees, but I would like to summarize it.

When did Beyonce start becoming popular?

I did this with the following code: treeQuestion[0] :

 [Tree('start_VB_ROOT', ['When_WRB_advmod', 'did_VBD_aux', 'Beyonce_NNP_nsubj', Tree('becoming_VBG_xcomp', ['popular_JJ_acomp']), '?_._punct'])] 

We can draw a tree like this:

 >>>questionSpacy = spacy_nlp(question) >>>treeQuestion = nltk_spacy_tree(questionSpacy) >>>print(treeQuestion) When did Beyonce start becoming popular? [to_nltk_tree(sent.root).pretty_print() for sent in en_nlp(predicted.iloc[0,3]).sents] >>>treeQuestion = nltk_spacy_tree(questionSpacy) >>>print(treeQuestion) [Tree('start_VB_ROOT', ['When_WRB_advmod', 'did_VBD_aux', 'Beyonce_NNP_nsubj', Tree('becoming_VBG_xcomp', ['popular_JJ_acomp']), '?_._punct'])] start __________|___________ | | | | becoming | | | | | When did Beyonce ? popular 

And I would like to know if it is contained in the next tree and to what extent it is shown:

 --- 1 --- born and raised in houston, texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of r&b girl-group destiny's child. [Tree('performed_VBD_ROOT', [Tree('born_VBN_advcl', ['and_CC_cc', Tree('raised_VBN_conj', [Tree('in_IN_prep', [Tree('houston_NN_pobj', [',_,_punct'])]), 'texas_NN_npadvmod'])]), ',_,_punct', 'she_PRP_nsubj', Tree('in_IN_prep', [Tree('competitions_NNS_pobj', ['various_JJ_amod', Tree('singing_NN_nmod', ['and_CC_cc', 'dancing_NN_conj'])])]), Tree('as_IN_prep', [Tree('child_NN_pobj', ['a_DT_det'])]), ',_,_punct', 'and_CC_cc', Tree('rose_VBD_conj', [Tree('to_IN_prep', ['fame_NN_pobj']), Tree('in_IN_prep', [Tree('1990s_NNS_pobj', ['the_DT_det', 'late_JJ_amod'])]), Tree('as_IN_prep', [Tree('singer_NN_pobj', ['lead_JJ_compound', Tree('of_IN_prep', ['r&b_NN_punct', Tree('child_NN_pobj', [Tree('destiny_NN_poss', [Tree('group_NN_compound', ['girl_NN_compound', '-_HYPH_punct']), "'s_POS_case"])])])])])]), '._._punct'])] performed ________________________________________________________|__________________________________________________ | | | | | | | | rose | | | | | | | | ___________________|_________ | | | | | | | | | | as | | | | | | | | | | | | | | | | | | | | | singer | | | | | | | | | | _________|_____ | | | | | born | | | | | of | | | | | ____|_____ | | | | | __________|_____ | | | | | | raised in | | | | | child | | | | | | _____|_______ | | | | | | | | | | | | | | in competitions as | in | | destiny | | | | | | | | _________|__________ | | | | | ___________|______ | | | | | | | houston | singing child to 1990s | | | group | | | | | | | | | __________|_______ | | ____|____ | | | ______|____ , she , and . and texas , various and dancing a fame the late lead r&b 's girl , ', _, _ punct', 'she_PRP_nsubj', Tree ( 'in_IN_prep', [Tree ( 'competitions_NNS_pobj', [ 'various_JJ_amod', Tree ( 'singing_NN_nmod', [ 'and_CC_cc', ' --- 1 --- born and raised in houston, texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of r&b girl-group destiny's child. [Tree('performed_VBD_ROOT', [Tree('born_VBN_advcl', ['and_CC_cc', Tree('raised_VBN_conj', [Tree('in_IN_prep', [Tree('houston_NN_pobj', [',_,_punct'])]), 'texas_NN_npadvmod'])]), ',_,_punct', 'she_PRP_nsubj', Tree('in_IN_prep', [Tree('competitions_NNS_pobj', ['various_JJ_amod', Tree('singing_NN_nmod', ['and_CC_cc', 'dancing_NN_conj'])])]), Tree('as_IN_prep', [Tree('child_NN_pobj', ['a_DT_det'])]), ',_,_punct', 'and_CC_cc', Tree('rose_VBD_conj', [Tree('to_IN_prep', ['fame_NN_pobj']), Tree('in_IN_prep', [Tree('1990s_NNS_pobj', ['the_DT_det', 'late_JJ_amod'])]), Tree('as_IN_prep', [Tree('singer_NN_pobj', ['lead_JJ_compound', Tree('of_IN_prep', ['r&b_NN_punct', Tree('child_NN_pobj', [Tree('destiny_NN_poss', [Tree('group_NN_compound', ['girl_NN_compound', '-_HYPH_punct']), "'s_POS_case"])])])])])]), '._._punct'])] performed ________________________________________________________|__________________________________________________ | | | | | | | | rose | | | | | | | | ___________________|_________ | | | | | | | | | | as | | | | | | | | | | | | | | | | | | | | | singer | | | | | | | | | | _________|_____ | | | | | born | | | | | of | | | | | ____|_____ | | | | | __________|_____ | | | | | | raised in | | | | | child | | | | | | _____|_______ | | | | | | | | | | | | | | in competitions as | in | | destiny | | | | | | | | _________|__________ | | | | | ___________|______ | | | | | | | houston | singing child to 1990s | | | group | | | | | | | | | __________|_______ | | ____|____ | | | ______|____ , she , and . and texas , various and dancing a fame the late lead r&b 's girl 

In fact, as you can see, we can find rose to fame , which is very similar to start becoming popular . As for half (3/6) of the tree, here we can say that 50% is shown.

Expected Result

For example:

 >>># создание первого дерева >>>text = start becoming popular >>>textSpacy = spacy_nlp(text1) >>>treeText = nltk_spacy_tree(textSpacy) >>>t = WordTree(treeText[0]) >>># создание второго дерева >>>question = When did Beyonce start becoming popular? >>>questionSpacy = spacy_nlp(question) >>>treeQuestion = nltk_spacy_tree(questionSpacy) >>>q = WordTree(treeQuestion[0]) >>># сравнение деревьев >>>isSubtree(t,q) True 

My attempt

I found a way to find out when one phrase is contained in another. I leave the question open if you can tell me when one tree looks like another. In particular, synonyms or missing words . For example,

  • rose to fame for start becoming popular
  • start being popular to start becoming popular

A class that converts a list into a tree.

 class WordTree: '''Дерево для массива синтаксического анализа зависимости spaCy''' def __init__(self, tree, is_subtree=False): """ Construct a new 'WordTree' object. :param array: Массив, содержащий зависимость :param parent: Родитель массива if существует :return: ничего не возвращает """ self.parent = [] self.children = [] self.data = tree.label().split('_')[0] # первый элемент дерева # Мы также добавим синонимы. for subtree in tree: if type(subtree) == Tree: # Итерации по глубине поддерева. t = WordTree(subtree, True) t.parent=tree.label().split('_')[0] elif type(subtree) == str: surface_form = subtree.split('_')[0] self.children.append(surface_form) 

A function that checks if the tree is contained in another

 def isSubtree(T,S): ''' функция сказать, если два дерева одинаковы ''' if S is None: return True if T is None: return False if areIdentical(T, S): return True return any(isSubtree(c, S) for c in T.children) def areIdentical(root1, root2): ''' функцию сказать, если два корня одинаковы ''' if root1 is None and root2 is None: return True if root1 is None or root2 is None: return False # Проверьте, являются ли данные обоих корней и их детей одинаковыми return (root1.data == root2.data and ((areIdentical(child1 , child2)) for child1, child2 in zip(root1.children, root2.children))) 

example

 # создание первого дерева text = "start becoming popular" textSpacy = spacy_nlp(text) treeText = nltk_spacy_tree(textSpacy) t = WordTree(treeText[0]) # создание второго дерева question = "When did Beyonce start becoming popular?" questionSpacy = spacy_nlp(question) treeQuestion = nltk_spacy_tree(questionSpacy) q = WordTree(treeQuestion[0]) # сравнение деревьев isSubtree(t,q) 

Here we test two trees from two sentences, where one is part of the other. We then have True . It may be interesting to find a way to tell when it is partially contained in another.

Purpose of comparing a tree

This will be used to compare the question tree S with several T_i answer trees. If T_i is similar to S, then it is more likely to contain the answer.

  • What is your problem? You want to develop an algorithm. What prevents you from doing this? - hedgehogues
  • @hedgehogues Yes, but I would like to summarize and get help. That's why I asked a question. Especially when words are missing or words are synonymous. - ThePassenger
  • @jfs Yes, that's true. This is a basic case, but also good. I renewed. If S NONE is hit, it is necessarily included. If T is NONE, I put FALSE, but this is discussed. - In connection with the question: What do you mean by "tree is included in another" I added a graphic representation of the trees from spaCy. At first it was a strict inclusion. Now I want to know how much tree S is represented in T, that is, even if it lacks words or has similar words. - ThePassenger
  • @ThePassenger is unclear what is meant by "Now I want to know how much tree S is represented in T, that is, even if it lacks words or has similar words." As I understand it, you want to know how much one tree is included in another. But here you need to specify what exactly you are behind this. Otherwise, the decision may be inadequate - hedgehogues
  • @hedgehogues Yes! This is exactly, and this is exactly the problem. I would say that S is similar to T if T contains elements of S This is perhaps the most naive approach, but perhaps a good start. - ThePassenger

0