There is an index.html file that links to three other files: one.html, two.html and three.html. The structure is extremely simple. one.html and three.html each contain a ul tag with the class list, while two.html contains no tag with that class.

There is also a get.php file that parses these pages, checks each one for the presence of the .list class, and reports whether the class exists or not. Here is the code:

    <?php
    header('Content-type: text/html; charset=UTF-8');
    $start = microtime(true);
    set_include_path(get_include_path().PATH_SEPARATOR.'library/');
    set_include_path(get_include_path().PATH_SEPARATOR.'phpQuery/');
    require('config.php');

    function __autoload( $className ) {
        require_once( "$className.php" );
    }

    echo "<br>".date('H:i:s')." Начинаем парсинг ";
    echo '<pre>';

    $page = file_get_contents('index.html');
    $document = phpQuery::newDocument($page);
    $links = [];
    foreach ($document->find('ul li a') as $link) {
        $links[] = pq($link)->attr('href');
    }
    print_r($links);

    foreach ($links as $sublink) {
        $pageText = new Curl();
        $pagenew = $pageText->get_page($sublink);
        $cat_page = phpQuery::newDocument($pagenew);
        $catlist = [];
        foreach ($cat_page as $cat_page) {
            if ($item = pq($cat_page)->find('ul.list a')) {
                echo "class is</br>";
            } else {
                echo "class not is</br>";
            }
        }
    }

The problem is that the script reports the class on all three pages, although it is absent from the second page. Please help me understand what is wrong here. This is the output:

 11:07:22 Начинаем парсинг Array ( [0] => one.html [1] => two.html [2] => three.html ) class is class is class is 

Here are the contents of the one.html and three.html pages.

    <!DOCTYPE html>
    <html lang="ru">
    <head>
        <meta charset="utf-8">
    </head>
    <body>
        <ul class="list">
            <li><a href="link.html">link</a></li>
        </ul>
    </body>
    </html>

and of the two.html page:

    <!DOCTYPE html>
    <html lang="ru">
    <head>
        <meta charset="utf-8">
    </head>
    <body>
        <p>нет класса</p>
    </body>
    </html>

  • PhpQuery has a function hasClass. It will help determine if the element has a class. - ilyaplot
  • And how do I apply it? I do not see this function in the docs. I tried if ($item = pq($cat_page)->find('ul')->hasClass('list')), but now it reports the class as absent everywhere - ZaurK
  • A similar question was asked here, and the answer should fit. stackoverflow.com/questions/6000743/… - ilyaplot
  • Thank you, but it did not work - ZaurK

1 answer

You should provide the pages one.html, two.html and three.html. The condition depends directly on the content of those pages, and guessing blindly is not productive: you may well have an li with this class somewhere. In any case, add var_dump($item); after the assignment to see what it selects. Most likely, find returns an empty result where there is no element, and that empty result is still not false.

Next, change the algorithm as follows:

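    // $node replaces the original loop variable, which overwrote $cat_page itself
    foreach ($cat_page as $node) {
        $item = pq($node)->find('ul.list a');
        var_dump($item); // inspect what find() actually returns
        if ($item) {
            echo "class is</br>";
        } else {
            echo "class not is</br>";
        }
    }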

Run the script and look at what ends up in $item. But first, go to the phpDoc comments in the phpQuery source and look up the find method:

    /**
     * Enter description here...
     *
     * @return phpQueryObject|QueryTemplatesSource|QueryTemplatesParse|QueryTemplatesSourceQuery
     */

So we see that it returns an object, and an object always evaluates to true in a condition, whether or not anything matched.
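To make the point concrete, here is a minimal sketch that does not use phpQuery at all. FakeResult is a hypothetical stand-in for whatever object find returns; the only assumption is that it is an ordinary PHP object wrapping a list of matched elements.

```php
<?php
// Hypothetical stand-in for a query result: an object wrapping matched elements.
// This is NOT phpQuery's class, only an illustration of PHP's truthiness rules.
final class FakeResult implements Countable
{
    /** @var array */
    private $elements;

    public function __construct(array $elements)
    {
        $this->elements = $elements;
    }

    public function count(): int
    {
        return count($this->elements);
    }
}

$empty = new FakeResult([]);

var_dump((bool) $empty);      // bool(true): the object is truthy although it holds nothing
var_dump(count($empty) > 0);  // bool(false): counting the elements gives the real answer
var_dump((bool) []);          // bool(false): a plain empty array, by contrast, is falsy
```

So an if ($item) check can only tell you that find returned an object, not that the object contains any matches.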

Next, look for a way to check the number of elements in this object. That leads us to the stack method:

    /**
     * Internal stack iterator.
     *
     * @access private
     */
    public function stack($nodeTypes = null) {
        if (!isset($nodeTypes))
            return $this->elements;
        if (!is_array($nodeTypes))
            $nodeTypes = array($nodeTypes);
        $return = array();
        foreach($this->elements as $node) {
            if (in_array($node->nodeType, $nodeTypes))
                $return[] = $node;
        }
        return $return;
    }

Great, now we modify our algorithm:

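    // $node replaces the original loop variable, which overwrote $cat_page itself
    foreach ($cat_page as $node) {
        $item = pq($node)->find('ul.list a');
        $arrayForCheck = $item->stack(); // array of matched DOM nodes
        if (count($arrayForCheck) > 0) {
            echo "class is</br>";
        } else {
            echo "class not is</br>";
        }
    }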
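If you want to try the same "count the matches" idea without phpQuery, here is a rough sketch using PHP's built-in DOMDocument and DOMXPath instead. This is a different technique from the one above, not phpQuery's API; the function name hasListLink and the two sample strings are made up for illustration, and the XPath expression is my approximation of the CSS selector ul.list a.

```php
<?php
// Returns true when the HTML contains at least one <a> inside a <ul>
// whose class attribute includes the token "list".
// hasListLink is a hypothetical helper name, not part of any library.
function hasListLink(string $html): bool
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Rough XPath equivalent of the CSS selector "ul.list a".
    $query = '//ul[contains(concat(" ", normalize-space(@class), " "), " list ")]//a';
    $nodes = $xpath->query($query);

    // Decide by the number of matched nodes, never by the truthiness of the result object.
    return $nodes !== false && $nodes->length > 0;
}

// Sample inputs modelled on one.html and two.html (shortened).
$withList = '<!DOCTYPE html><html><body><ul class="list">'
          . '<li><a href="link.html">link</a></li></ul></body></html>';
$withoutList = '<!DOCTYPE html><html><body><p>no class</p></body></html>';

var_dump(hasListLink($withList));    // bool(true)
var_dump(hasListLink($withoutList)); // bool(false)
```

The important part is the last line of the function: the decision is based on the node count, which is the same thing count($item->stack()) does in the phpQuery version.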

But you still need to look at the files themselves before drawing any conclusions. For the future, use the documentation in the source code, and write your own. And do not neglect simple debugging methods:

1. var_dump($var); die; dumps a variable and stops the script
2. var_dump(get_class($var)) gets the class name of an object
3. var_dump(get_class_methods(get_class($var))) gets the available methods of the class
4. and so on
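Here is a small sketch of those debugging calls in action. It uses the built-in ArrayObject as a stand-in for an unknown object, which is my choice for the example only; with phpQuery you would pass $item instead.

```php
<?php
// ArrayObject stands in for "some object we know nothing about".
$var = new ArrayObject([]);

// 2. Which class is this?
var_dump(get_class($var)); // string(11) "ArrayObject"

// 3. Which methods does it offer? Here we only check for one we care about.
$methods = get_class_methods(get_class($var));
var_dump(in_array('count', $methods, true)); // bool(true)

// Once a suitable method is found, use it instead of testing the object itself.
var_dump(count($var)); // int(0)
```

Listing the methods this way is how you would discover something like stack() or count() on an unfamiliar object without reading the whole source file.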

  • Thank you very much for the detailed analysis and comments, I will look into it. I have added the missing files. - ZaurK
  • @ZaurK this foreach($cat_page as $cat_page){ in your code is superfluous, because it simply overwrites $cat_page with its first element. - Naumov
  • Indeed... and without the foreach it works the same as before. I checked var_dump($item); an object is returned there. How can I check for the presence of the .list class on the page, so that different actions are taken depending on whether it is present? - ZaurK
  • @ZaurK what does var_dump($item->stack()) output? - Naumov
  • It displays an empty array: array(0) {} - ZaurK