Greetings

I need to build an adjacency matrix from data collected by a crawler bot. The data contains a list of visited pages, inlinks (links pointing to the page), outlinks (links from the page to other pages) and extra pages (for example, links without http).
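A minimal sketch of how the "extra" pages could be filtered out, assuming that "extra" means any link whose scheme is present but is not http/https (e.g. `mailto:`, `javascript:`). Relative links have an empty scheme, so they are kept for later resolution. The function name `is_extra` and the sample links are hypothetical.

```python
from urllib.parse import urlparse

def is_extra(link: str) -> bool:
    """Return True for links that should not go into the matrix.

    Assumption: an "extra" page is a link with a non-http(s) scheme.
    Relative links (empty scheme) are kept so they can be resolved later.
    """
    scheme = urlparse(link).scheme.lower()
    return scheme not in ("", "http", "https")

links = [
    "http://www.example.com/a.html",
    "mailto:someone@example.com",
    "../b.html",
    "javascript:void(0)",
]
kept = [link for link in links if not is_extra(link)]
print(kept)  # prints ['http://www.example.com/a.html', '../b.html']
```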

I need an algorithm that makes the program "read" this data, "skip" the extra pages, and write 1 where an inlink/outlink exists and 0 where it does not, producing a 500x500 binary matrix.
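One way to build such a binary matrix, sketched with NumPy under the assumption that the links are already available as (source, target) pairs: map each visited page to a row/column index, start from all zeros, and set 1 for every link between two visited pages. Inlinks need no separate pass, since an inlink to page p is an outlink of some other page and shows up in column p. The names `adjacency_matrix`, `pages` and `edges` are hypothetical; with 500 visited pages the result would be 500x500.

```python
import numpy as np

def adjacency_matrix(pages, edges):
    """Build a binary adjacency matrix.

    pages: list of visited page URLs; its order fixes row/column order.
    edges: iterable of (source, target) pairs.
    Entry [i, j] is 1 if pages[i] links to pages[j], else 0.
    Pairs whose endpoints are not both visited pages are skipped.
    """
    index = {page: i for i, page in enumerate(pages)}
    matrix = np.zeros((len(pages), len(pages)), dtype=np.int8)
    for source, target in edges:
        if source in index and target in index:
            matrix[index[source], index[target]] = 1  # duplicates stay 1
    return matrix

pages = ["A", "B", "C"]
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "B"), ("A", "X")]
print(adjacency_matrix(pages, edges))
# [[0 1 0]
#  [0 0 1]
#  [1 0 0]]
```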

Here is some possible pseudocode:

    for each visited page vp
        for each outlink of vp
            if link is relative
                resolve link
            if link points to a visited page
                write 1
            else if link is dangling
                ignore it
            else
                write 0
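The pseudocode above could be translated into Python roughly as follows. This is a sketch, assuming the crawl has been loaded into a dict mapping each visited URL to its raw outlinks (a hypothetical structure, not the layout of the spreadsheet). Relative links are resolved with `urllib.parse.urljoin`, fragments are dropped with `urldefrag`, and the "write 0" branch is covered by starting from a zero matrix. Other URL normalization (trailing slashes, `www.` prefixes) is not handled.

```python
from urllib.parse import urldefrag, urljoin, urlparse

import numpy as np

def build_matrix(crawl):
    """crawl: hypothetical dict {visited_url: [raw outlinks found on that page]}."""
    visited = list(crawl)
    index = {url: i for i, url in enumerate(visited)}
    # Every cell starts as 0, which covers the "else write 0" branch.
    matrix = np.zeros((len(visited), len(visited)), dtype=np.int8)
    for vp in visited:                          # for each visited page vp
        for link in crawl[vp]:                  # for each outlink of vp
            # Resolve relative links against vp; absolute links pass through.
            absolute, _fragment = urldefrag(urljoin(vp, link))
            if urlparse(absolute).scheme not in ("http", "https"):
                continue                        # extra page (mailto:, javascript:, ...)
            if absolute in index:               # link points to a visited page
                matrix[index[vp], index[absolute]] = 1
            # otherwise the link is dangling (target never visited): ignore it
    return matrix

crawl = {
    "http://example.com/": ["a.html", "mailto:x@example.com", "http://other.org/"],
    "http://example.com/a.html": ["/", "b.html#top"],
    "http://example.com/b.html": ["a.html"],
}
print(build_matrix(crawl))
# [[0 1 0]
#  [1 0 1]
#  [0 1 0]]
```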

Links to the resources and to another version of the pseudocode: https://moodle.bbk.ac.uk/pluginfile.php/666888/mod_resource/content/2/sewn_2016_labsheet_3_pseudocode.pdf http://www.dcs.bbk.ac.uk/~martin/sewn/ls3/sewn_2016_labsheet_3_full_crawl.xlsx

Can this task be implemented in Python, or is it better to use other tools such as R or MATLAB?

  • Yes, it can certainly be done in Python. - Ogurtsov
  • This is a trivial task for Pandas. But it's not quite clear how exactly you want to build your "adjacency matrix". What is the role of the "LinkNo" column in your Excel file? Does it need to be taken into account? Please edit the question and add two small (3-5 rows) but representative sample tables, along with the expected result for those tables. - MaxU
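A minimal Pandas sketch of the approach suggested in the comment, assuming the crawl has been reduced to an edge list with one row per link. The column names "From" and "To" and the sample data are hypothetical, not taken from the spreadsheet. `pd.crosstab` counts each (From, To) pair, `clip(upper=1)` turns counts into 0/1, and `reindex` adds rows and columns for visited pages that have no links.

```python
import pandas as pd

# Hypothetical edge list: one row per link found by the crawler.
links = pd.DataFrame({
    "From": ["A", "A", "B", "C", "C"],
    "To":   ["B", "B", "C", "A", "Z"],
})
visited = ["A", "B", "C"]

# Keep only links between visited pages (drops dangling links such as "Z").
links = links[links["From"].isin(visited) & links["To"].isin(visited)]

# Count (From, To) pairs, cap at 1, and force a full visited x visited grid.
matrix = (
    pd.crosstab(links["From"], links["To"])
    .clip(upper=1)
    .reindex(index=visited, columns=visited, fill_value=0)
)
print(matrix)
```

This ignores the "LinkNo" column, since its role is exactly what the comment asks about.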
