I have the HTML of a page and want to find the links in it that match the following criteria:
The match starts with <td class="player"><a href="/player/ and ends with the closing ", and I need whatever is between them. There will be 10 such results, and all of them should be saved. How can this be done? I only found how to save one.
Sample HTML (very large):

 b'b'<!DOCTYPE html>\n<html lang="en">\n <head>\n <meta charset="utf-8">\n <meta name="viewport" content="width=device-width, initial-scale=1" id="metaViewport">\n <meta property="fb:admins" content="1004164229">\n <meta property="fb:pages" content="249997999009">\n <meta property="fb:app_id" content="1460388157605817">\n <meta name="google-site-verification" content="DcypRFLQvgYQL5Acx7feoGWbblSsmKv6HpPI7mM_1uw">\n <link rel="apple-touch-icon" sizes="180x180" href="/img/static/favicon/apple-touch-icon.png">\n <link rel="icon" type="image/png" sizes="32x32" href="/img/static/favicon/favicon-32x32.png">\n <link rel="icon" type="image/png" sizes="16x16" href="/img/static/favicon/favicon-16x16.png">\n <link rel="manifest" href="/img/static/favicon/manifest.json">\n <link rel="mask-icon" href="/img/static/favicon/safari-pinned-tab.svg" color="#5bbad5">\n <meta name="theme-color" content="#ffffff">\n <link href="https://fonts.googleapis.com/css?family=Open+Sans:400,400i,700,700i|Oswald:700&amp;amp;subset=latin-ext" rel="stylesheet">\n <link rel="stylesheet" href="/vendor/font-awesome-4.7.0/css/font-awesome.min.css" type="text/css">\n <script type="text/javascript" src="https://cdn.ravenjs.com/3.15.0/raven.min.js"></script>\n <script type="text/javascript" src="/scripts/hltv-csstheme.js?hash=df73d0c197fafcc78aa1a2dd4f4737c7" data-day-css="b3f4be4a65cf62bb5b98ff6fb57100c9" data-night-css="cb563c7041bc4d4ca98a8d616dc7524a"></script>\n <script type="text/javascript" src="/scripts/hltv.js?hash=1230be3cc0ee223e10b6a4f52c7bd2ec"></script>\n <script type="text/javascript" src="https://notification-secure.hltv.org/hltvNotification.js?v2" async="async"></script>\n <script type="text/javascript" src="https://scorebot-secure.hltv.org/scorebotClientApi.js?v5" async="async"></script>\n <title>spray&apos;n&apos;pray vs. 
Impossible at Headshot Cup #2 | HLTV.org</title>\n <link href="/rss/news" rel="alternate" type="application/rss+xml">\n <meta name="description" content="Complete overview of the spray&apos;n&apos;pray vs. Impossible matchup at Headshot Cup #2!">\n <meta property="og:title" content="HLTV.org - The home of competitive Counter-Strike">\n <meta property="og:image" content="https://www.hltv.org/img/static/openGraphHltvLogo.png">\n <meta property="og:site_name" content="HLTV.org">\n </head>\n <body class="preload colsCustom1101" data-livescore-server-url="https://scorebot-secure.hltv.org">\n <div class="navbar">\n <div class="navcon"><a href="/" class="small-logo"><img alt="HLTV.org" src="/img/static/TopSmallLogo2x.png" class="small-logo-img"></a><a href="/" class="navnews">News</a><a href="/matches" class="navmatches">Matches</a><a href="/results" class="navresults">Results</a><a href="/events" class="navevents">Events</a>\n <div class="navburger navburger1"><i class="fa fa-bars" aria-hidden="true"></i></div>\n <div class="navbreakline1"></div>\n<a href="/stats" class="navstats">Stats</a><a href="/galleries" class="navgalleries">Galleries</a><a href="/ranking/teams" class="navranking smartphone-only">Rankings</a><a href="/forums" class="navforums">Forums</a>\n <div class="navburger navburger2"><i class="fa fa-bars" aria-hidden="true"></i></div>\n <div class="navbreakline2"></div>\n <div class="navsearch search-typeahead">\n <form action="/search?term="><input type="text" class="navsearchinput" name="query" data-topbar-search-url="/search?term=" placeholder="Search...">\n <div class="search-submit-hidden"><input type="submit" tabIndex="-1"></div>\n </form>\n <div class="navsearchborder"></div>\n<span class="navsearchicon"><i class="fa fa-search"></i></span></div>\n <div class="navborder"></div>\n <div class="navsignin" data-overlay-popup-button="" data-overlay-popup-content="overlay-popup-1462440830">Sign in</div>\n <div class="hidden">\n <div 
class="fixed-overlay-popup-content-con" id="overlay-popup-1462440830">\n <div class="fixed-overlay-popup-content">\n <div class="login-dialog standard-box" data-login-url="/login">\n <div class="logo"><img alt="HLTV.org" src="/img/static/TopSmallLogo2x.png" height="46px"></div>\n <form><input type="text" name="username" class="loginInput" required="required" placeholder="Username"><input type="password" name="password" class="loginInput" required="required" placeholder="Password">\n <div class="login-elm clearfix"><span class="remember-me left"><input type="checkbox" name="autologin" class="loginCheckbox" checked="checked"> Remember me</span><span class="forgot-link right" data-overlay-popup-content="overlay-popup-992863196">Forgot password</span></div>\n <div class="login-error"></div>\n<button type="submit" class="login-button button" name="login">Login</button></form>\n <hr class="login-elm">\n<a href="/signup" class="signup-button button">Sign up</a></div>\n </div>\n </div>\n </div>\n <div class="hidden">\n <div class="fixed-overlay-popup-content-con" id="overlay-popup-992863196">\n <div class="fixed-overlay-popup-content">\n <div class="forgot-password-dialog standard-box">\n <div>\n <div class="logo"><img alt="HLTV.org" src="/img/static/TopSmallLogo2x.png" height="46px"></div>\n <div id="forgot-password-username"><input type="text" name="username" class="loginInput" required="required" placeholder="Username"><span class="validation-error hidden"><i class=" fa fa-times" aria-hidden="true"></i><span class="message"></span></span></div>\n </div>\n <div>\n <div class="g-recaptcha" id="forgot-password-recaptcha"></div>\n<button type="button" class="recover-button button" data-forgot-password-location="/forgotpassword">Recover</button>\n <hr class="login-elm">\n<button type="button" class="back-button button" data-overlay-popup-button="" data-overlay-popup-content="overlay-popup-1462440830">Back</button></div>\n </div>\n </div>\n </div>\n </div>\n <div 
class="navborder"></div>\n <div class="navdown"><i class="fa fa-caret-down"></i>\n <div class="arrow"></div>\n <div class="arrow2"></div>\n </div>\n <div class="navpopup" id="popupsettings">\n <div class="nav-popup-header">Settings</div>\n <div class="nav-popup-elm"><span>Toggle nightmode</span><span class="right"><span class="toggleUserTheme userTheme-night" data-url="/profile/settings/changetheme?theme=night">On</span><span> / </span><span class="toggleUserTheme userTheme-day selected" data-url="/profile/settings/changetheme?theme=day">off</span></span></div>\n <div class="nav-popup-elm"><span>Timezone</span><span class="right">\n <form action=""><select class="timezoneSelector" data-timezone-update-on-select="1" id="timezoneSelector" name="timezone"></select></form>\n </span></div>\n <div class="nav-popup-elm desktop-mode-con"><span>Force desktop mode</span><span class="right"><span class="toggleDesktopMode desktopModeOn">On</span><span> / </span><span class="toggleDesktopMode desktopModeOff">off</span></span></div>\n </div>\n </div>\n </div>\n <div class="bgPadding">\n <div class="widthControl">\n <div class="logoCon"><a href="/">\n <div class="hltv-logo-container"></div>\n </a>\n <div class="" id="i0_middle"></div>\n <div class="" id="i0_right"></div>\n </div>\n <div class="colCon">\n <div class="contentCol">\n <div class="match-page">\n <div class="standard-box teamsBox">\n <div class="team"><img alt="Ukraine" src="https://static.hltv.org/images/bigflags/300x200/UA.png" class="team1 " title="Ukraine">\n <div class="team1-gradient"><a href="/team/7264/spraynpray"><img alt="spray&apos;n&apos;pray" src="https://static.hltv.org/images/team/logo/7264" class="logo" title="spray&apos;n&apos;pray">\n <div class="teamName">spray&apos;n&apos;pray</div>\n </a></div>\n </div>\n <div class="timeAndEvent">\n <div class="time" data-time-format="HH:mm" data-unix="1496768400000">19:00</div>\n <div class="date" data-time-format="do &apos;of&apos; MMMM Y" 
data-unix="1496768400000">6th of June 2017</div>\n <div class="event text-ellipsis"><a href="/events/2886/headshot-cup-2" title="Headshot Cup #2">Headshot Cup #2</a></div>\n <div class="text dummy-spacer">\xc2\xa0</div>\n <div class="countdown" data-time-countdown="LIVE" data-unix="1496768400000">1h : 27m : 36s</div>\n </div>\n <div class="team"><img alt="Russia" src="https://static.hltv.org/images/bigflags/300x200/RU.png" class="team2 " title="Russia">\n <div class="team2-gradient"><a href="/team/7835/impossible"><img alt="Impossible" src="https://static.hltv.org/images/team/logo/7835" class="logo" title="Impossible">\n <div class="teamName">Impossible</div>\n </a></div>\n </div>\n </div>\n <div class="section-spacer"></div>\n <div class="flexbox fix-half-width-margin maps">\n <div class="half-width "><span class="headline">Maps</span>\n <div class="standard-box veto-box">\n <div class="padding preformatted-text">Best of 3\n\n* Semi-final</div>\n </div>\n <div class="flexbox-column">\n <div class="mapholder">\n <div class="spacing ">\n <div class="map-name-holder"><img src="/img/static/maps/tba.png" class="minimap">\n <div class="mapname">TBA</div>\n </div>\n </div>\n </div>\n <div class="mapholder">\n <div class="spacing ">\n <div class="map-name-holder"><img src="/img/static/maps/tba.png" class="minimap">\n <div class="mapname">TBA</div>\n </div>\n </div>\n </div>\n <div class="mapholder">\n <div class="spacing optional">\n <div class="map-name-holder"><img src="/img/static/maps/tba.png" class="minimap">\n <div class="mapname">TBA</div>\n </div>\n </div>\n </div>\n </div>\n </div>\n <div class="half-width"><span class="headline">Watch</span>\n <div class="streams">\n <div class="stream-box " data-stream-embed="https://player.twitch.tv/?channel=binarydragons_4"><span class="flagAlign"><img alt="Russia" src="https://static.hltv.org/images/bigflags/30x20/RU.gif" class="stream-flag flag" title="Russia">Binary Dragons 4</span><span class="viewers 
left-right-padding">0</span></div>\n </div>\n </div>\n </div>\n <div class="section-spacer"></div>\n <div class="video-container hidden">\n <div class="standard-box videoWrapper"></div>\n <div class="section-spacer"></div>\n </div>\n <div class="flexbox fix-half-width-margin">\n <div class="three-quarter-width"><span class="headline">Betting</span>\n <div class="betting standard-box padding">\n <table class="table">\n <tr class="">\n <td class="provider-cell"></td>\n <td class="team-cell">spray&apos;n&apos;pray</td>\n <td class="team-cell"></td>\n <td class="team-cell">Impossible</td>\n </tr>\n <tr class="">\n <td class=""><a href="http://egbaffiliates.com/track?p=tables&amp;aff_id=52"><img src="https://static.hltv.org/images/egb.png" class="betting-logo"></a></td>\n <td class="odds-cell border-left"><a href="http://egbaffiliates.com/track?p=play/simple_bets&amp;aff_id=52&amp;anchor=282890">1.47</a></td>\n <td class="odds-cell border-left"><a href="http://egbaffiliates.com/track?p=play/simple_bets&amp;aff_id=52&amp;anchor=282890">-</a></td>\n <td class="odds-cell border-left"><a href="http://egbaffiliates.com/track?p=play/simple_bets&amp;aff_id=52&amp;anchor=282890">2.41</a></td>\n </tr>\n <tr class="">\n <td class="">\n </div>\n </div>\n <div class="quarter-width"><span class="headline">Pick a winner</span>\n <div class="standard-box pick-a-winner">\n <div class="flexbox-column">\n <div class="pick-a-winner-team team1 canvote" data-pick-a-winner-team="1" data-pick-a-winner-url="/matches/2311385/pickawinner">\n <div class="pick-a-winner-team-name">spray&apos;n&apos;pray</div>\n <div class="percentage">65.8%</div>\n <div class="pick-a-winner-team-bg"><img alt="spray&apos;n&apos;pray" src="https://static.hltv.org/images/team/logo/7264" class="logo" title="spray&apos;n&apos;pray"></div>\n </div>\n <div class="pick-a-winner-team team2 canvote" data-pick-a-winner-team="2" data-pick-a-winner-url="/matches/2311385/pickawinner">\n <div 
class="pick-a-winner-team-name">Impossible</div>\n <div class="percentage">34.2%</div>\n <div class="pick-a-winner-team-2-bg"><img alt="Impossible" src="https://static.hltv.org/images/team/logo/7835" class="logo" title="Impossible"></div>\n </div>\n </div>\n </div>\n </div>\n </div>\n <div class="section-spacer"></div>\n <div class="csgofastbetting"><iframe id="hltvBetWidget" src="https://hltv.gainskins.com/w3/match/hid/2311385/7264/7835/0cca454036cc79ac81bf35d3e6e1aa87?http://www.hltv.org/team1Name=spray%27n%27pray&amp;http://www.hltv.org/team2Name=Impossible&amp;http://www.hltv.org/startsAt=2017-06-06+19%3A00%3A00&amp;http://www.hltv.org/matchUrl=%2Fmatches%2F2311385%2Fspraynpray-vs-impossible-headshot-cup-2&amp;initialLoad=1&amp;autoResize=1" width="100%" height="347px" frameborder="none"></iframe></div>\n <div class="section-spacer"></div>\n <div class="rek gtSmartphone-only" id="matchpage_1"></div>\n <div class="lineups"><span class="headline">Lineups</span>\n <div class="">\n <div class="lineup standard-box">\n <div class="box-headline flex-align-center"><img alt="spray&apos;n&apos;pray" src="https://static.hltv.org/images/team/logo/7264" class="logo" title="spray&apos;n&apos;pray"><a href="/team/7264/spraynpray">spray&apos;n&apos;pray</a></div>\n <div class="players">\n <table class="table">\n <tr>\n <td class="player"><a href="/player/13899/la3euka">\n <div><img alt="Vladimir &apos;la3euka&apos; Shurygin" src="https://static.hltv.org/images/playerprofile/blankplayer.svg" class="player-photo" title="Vladimir &apos;la3euka&apos; Shurygin"></div>\n </a></td>\n <td class="player"><a href="/player/8368/jmqa">\n <div><img alt="Savelii &apos;jmqa&apos; Bragin" src="https://static.hltv.org/images/playerprofile/thumb/8368/400.jpeg?v=5" class="player-photo" title="Savelii &apos;jmqa&apos; Bragin"></div>\n </a></td>\n <td class="player"><a href="/player/9349/F1L1N">\n <div><img alt="Ivan &apos;F1L1N&apos; Semenets" 
src="https://static.hltv.org/images/playerprofile/thumb/9349/400.jpeg?v=2" class="player-photo" title="Ivan &apos;F1L1N&apos; Semenets"></div>\n </a></td>\n <td class="player"><a href="/player/7609/Tresh1k">\n <div><img alt="Bogdan &apos;Tresh1k&apos; Nakonechniy" src="https://static.hltv.org/images/playerprofile/thumb/7609/400.jpeg?v=2" class="player-photo" title="Bogdan &apos;Tresh1k&apos; Nakonechniy"></div>\n </a></td>\n <td class="player"><a href="/player/1866/Shara">\n <div><img alt="Oleksandr &apos;Shara&apos; Hordieyev" src="https://static.hltv.org/images/playerprofile/thumb/1866/400.jpeg?v=2" class="player-photo" title="Oleksandr &apos;Shara&apos; Hordieyev"></div>\n </a></td>\n </tr>\n <tr>\n <td class="player"><a href="/player/13899/la3euka">\n <div class="flagAlign"><img alt="Russia" src="https://static.hltv.org/images/bigflags/30x20/RU.gif" class="flag gtSmartphone-only" title="Russia">\n <div class="text-ellipsis">la3euka</div>\n </div>\n </a></td>\n <td class="player"><a href="/player/8368/jmqa">\n <div class="flagAlign"><img alt="Russia" src="https://static.hltv.org/images/bigflags/30x20/RU.gif" class="flag gtSmartphone-only" title="Russia">\n <div class="text-ellipsis">jmqa</div>\n </div>\n </a></td>\n <td class="player"><a href="/player/9349/F1L1N">\n <div class="flagAlign"><img alt="Ukraine" src="https://static.hltv.org/images/bigflags/30x20/UA.gif" class="flag gtSmartphone-only" title="Ukraine">\n <div class="text-ellipsis">F1L1N</div>\n </div>\n </a></td>\n <td class="player"><a href="/player/7609/Tresh1k">\n <div class="flagAlign"><img alt="Ukraine" src="https://static.hltv.org/images/bigflags/30x20/UA.gif" class="flag gtSmartphone-only" title="Ukraine">\n <div class="text-ellipsis">Tresh1k</div>\n </div>\n </a></td>\n <td class="player"><a href="/player/1866/Shara">\n <div class="flagAlign"><img alt="Ukraine" src="https://static.hltv.org/images/bigflags/30x20/UA.gif" class="flag gtSmartphone-only" title="Ukraine">\n <div 
class="text-ellipsis">Shara</div>\n </div>\n </a></td>\n </tr>\n </table>\n </div>\n </div>\n <div class="lineup standard-box">\n <div class="box-headline flex-align-center"><img alt="Impossible" src="https://static.hltv.org/images/team/logo/7835" class="logo" title="Impossible"><a href="/team/7835/impossible">Impossible</a></div>\n <div class="players">\n <table class="table">\n <tr>\n <td class="player"><a href="/player/8120/PLAZ">\n <div><img alt="Kiril &apos;PLAZ&apos; Sidorov" src="https://static.hltv.org/images/playerprofile/thumb/8120/400.jpeg?v=1" class="player-photo" title="Kiril &apos;PLAZ&apos; Sidorov"></div>\n </a></td>\n <td class="player"><a href="/player/9082/krecker">\n <div><img alt="Petr &apos;krecker&apos; Stepanov" src="https://static.hltv.org/images/playerprofile/thumb/9082/400.jpeg?v=1" class="player-photo" title="Petr &apos;krecker&apos; Stepanov"></div>\n </a></td>\n <td class="player"><a href="/player/7404/insom">\n <div><img alt="Igor &apos;insom&apos; Cherkasov" src="https://static.hltv.org/images/playerprofile/thumb/7404/400.jpeg?v=1" class="player-photo" title="Igor &apos;insom&apos; Cherkasov"></div>\n </a></td>\n <td class="player"><a href="/player/12015/AKIMOV">\n <div><img alt="Erik &apos;AKIMOV&apos; Akimov" src="https://static.hltv.org/images/playerprofile/blankplayer.svg" class="player-photo" title="Erik &apos;AKIMOV&apos; Akimov"></div>\n </a></td>\n <td class="player"><a href="/player/12016/svyat">\n <div><img alt="Svyatoslav &apos;svyat&apos; Dovbakh" src="https://static.hltv.org/images/playerprofile/blankplayer.svg" class="player-photo" title="Svyatoslav &apos;svyat&apos; Dovbakh"></div>\n </a></td>\n </tr>\n <tr>\n <td class="player"><a href="/player/8120/PLAZ">\n <div class="flagAlign"><img alt="Russia" src="https://static.hltv.org/images/bigflags/30x20/RU.gif" class="flag gtSmartphone-only" title="Russia">\n <div class="text-ellipsis">PLAZ</div>\n </div>\n </a></td>\n <td class="player"><a href="/player/9082/krecker">\n 
<div class="flagAlign"><img alt="Russia" src="https://static.hltv.org/images/bigflags/30x20/RU.gif" class="flag gtSmartphone-only" title="Russia">\n <div class="text-ellipsis">krecker</div>\n </div>\n </a></td>\n <td class="player"><a href="/player/7404/insom">\n <div class="flagAlign"><img alt="Russia" src="https://static.hltv.org/images/bigflags/30x20/RU.gif" class="flag gtSmartphone-only" title="Russia">\n <div class="text-ellipsis">insom</div>\n </div>\n </a></td>\n <td class="player"><a href="/player/12015/AKIMOV">\n <div class="flagAlign"><img alt="Russia" src="https://static.hltv.org/images/bigflags/30x20/RU.gif" class="flag gtSmartphone-only" title="Russia">\n <div class="text-ellipsis">AKIMOV</div>\n </div>\n </a></td>\n <td class="player"><a href="/player/12016/svyat">\n <div class="flagAlign"><img alt="Russia" src="https://static.hltv.org/images/bigflags/30x20/RU.gif" class="flag gtSmartphone-only" title="Russia">\n ' 
  • 3
    Use BeautifulSoup. - Wiktor Stribiżew
  • As I understand it, this thing turns unreadable code into readable code? But I don't need that. - Mr Lucky Tomas
  • 3
    BeautifulSoup is an HTML parser. What you need. - Wiktor Stribiżew
  • 1
    Add to the question an example with a piece of HTML and the data you need to pull out of it. To parse HTML you should use an HTML parser; regular expressions are not the best tool for parsing XML / HTML - gil9red
  • An example added. - Mr Lucky Tomas

3 answers

Perhaps the re.findall or re.finditer functions will help you?
findall is quite easy to use:

    import re

    # text is the page's HTML as a str
    for i in re.findall('<td class="player"><a href="/player/([A-Za-z0-9/]*)"', text):
        print(i)

and it will find the users you need:

    13899/la3euka
    8368/jmqa
    9349/F1L1N
    7609/Tresh1k
    1866/Shara
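re.finditer, mentioned at the top of this answer, takes the same pattern but yields match objects one at a time instead of building a list of strings. A minimal sketch, assuming a shortened two-cell excerpt in place of the full page (the `text` value here is made up for illustration):

```python
import re

# Hypothetical two-cell excerpt standing in for the full page.
text = (
    '<td class="player"><a href="/player/13899/la3euka">\n'
    '<td class="player"><a href="/player/8368/jmqa">\n'
)

pattern = re.compile(r'<td class="player"><a href="/player/([A-Za-z0-9/]*)"')

# finditer yields match objects lazily; group(1) is the captured part of each match.
players = [m.group(1) for m in pattern.finditer(text)]
print(players)  # ['13899/la3euka', '8368/jmqa']
```

The match object also exposes positions (m.start(), m.end()), which can be handy when you need to know where in the page each link was found.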

Note that re.findall with a str pattern needs a str to search, but you receive your HTML page as bytes, so decode it first (for example, with the bytes object's .decode('utf-8') method).
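A small sketch of the bytes-versus-str point, assuming the page is UTF-8 (the sample declares <meta charset="utf-8">). The `html_bytes` value is a hypothetical stand-in for the downloaded page:

```python
import re

# Hypothetical stand-in for the downloaded page: raw bytes, as a network read returns them.
html_bytes = b'<td class="player"><a href="/player/13899/la3euka">\n</a></td>'

# Decode to str so that a str pattern can be applied (UTF-8 is an assumption here).
text = html_bytes.decode('utf-8')

pattern = r'<td class="player"><a href="/player/([A-Za-z0-9/]*)"'
players = re.findall(pattern, text)
print(players)  # ['13899/la3euka']

# str(html_bytes) is not a substitute for decoding: it produces the literal
# "b'...'" representation with escaped newlines, like the dump in the question.
print(str(html_bytes).startswith("b'"))  # True
```

Calling str() on the bytes makes the TypeError go away, but the pattern then runs over the escaped representation rather than the real page text, so decoding is the cleaner fix.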

To get only the unique values, you can first convert the list to a set and then back to a list:

    find = re.findall('<td class="player"><a href="/player/([A-Za-z0-9/]*)"', text)
    print(list(set(find)))
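A set does not keep the order in which the players appear on the page. If the order matters, one possible variation (not part of the answer itself) is to remove duplicates with dict.fromkeys. The `text` below is a shortened, hypothetical excerpt in which each player appears twice, once for the photo row and once for the name row:

```python
import re

# Shortened, hypothetical excerpt: each player is linked twice on the page.
text = (
    '<td class="player"><a href="/player/13899/la3euka">'
    '<td class="player"><a href="/player/8368/jmqa">'
    '<td class="player"><a href="/player/13899/la3euka">'
    '<td class="player"><a href="/player/8368/jmqa">'
)

found = re.findall(r'<td class="player"><a href="/player/([A-Za-z0-9/]*)"', text)

# dict keys are unique and keep insertion order (Python 3.7+),
# so repeats are dropped without shuffling the players.
unique_players = list(dict.fromkeys(found))
print(unique_players)  # ['13899/la3euka', '8368/jmqa']
```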
  • Probably, but I don't understand how to use them. I get a TypeError: can't use a string pattern on a bytes-like object - Mr Lucky Tomas
  • @MrLuckyTomas what does google translate say? - Pavel Durmanov
  • I already figured out that str() had to be added; now it starts correctly but does not stop - Mr Lucky Tomas
  • How do I remove the duplicates? There are links on both the photo and the profile itself, so it should return 10 users, but it returns 20 - Mr Lucky Tomas
  • @MrLuckyTomas Use set(content) - Pavel Durmanov

Another option:

    In [43]: from bs4 import BeautifulSoup

    In [44]: soup = BeautifulSoup(html, 'html.parser')

    In [46]: links = set(s.find('a').get('href') for s in soup.find_all('td', {'class':['player']}))

    In [47]: links
    Out[47]:
    {'/player/12015/AKIMOV',
     '/player/12016/svyat',
     '/player/13899/la3euka',
     '/player/1866/Shara',
     '/player/7404/insom',
     '/player/7609/Tresh1k',
     '/player/8120/PLAZ',
     '/player/8368/jmqa',
     '/player/9082/krecker',
     '/player/9349/F1L1N'}
  • 1
    Inside set() you can drop the square brackets around the list :) - gil9red
  • @gil9red, indeed, thanks! :) - MaxU

Here is an example of parsing with BeautifulSoup; you can do the same with another library, for example lxml:

    # A string or byte array containing the HTML page
    html = ...

    from bs4 import BeautifulSoup
    root = BeautifulSoup(html, 'lxml')

    player_list = set()

    for a in root.select('.player > a[href]'):
        player = a['href'].replace('/player/', '')
        player_list.add(player)

    # Short form
    # player_list = set(a['href'].replace('/player/', '') for a in root.select('.player > a[href]'))

    print(player_list)

Console:

 {'9349/F1L1N', '7609/Tresh1k', '9082/krecker', '7404/insom', '1866/Shara', '13899/la3euka', '8120/PLAZ', '12015/AKIMOV', '12016/svyat', '8368/jmqa'}
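If installing a third-party parser is not an option, roughly the same extraction can be sketched with the standard library's html.parser module. This is a swapped-in technique, not what the answers above use. The class name `PlayerLinkParser` and the `sample` string are hypothetical, chosen only for illustration:

```python
from html.parser import HTMLParser


class PlayerLinkParser(HTMLParser):
    """Collects unique href values of <a> tags found inside <td class="player"> cells."""

    def __init__(self):
        super().__init__()
        self.in_player_cell = False  # True while inside a <td class="player">
        self.links = []              # unique hrefs, in page order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'td':
            classes = (attrs.get('class') or '').split()
            self.in_player_cell = 'player' in classes
        elif tag == 'a' and self.in_player_cell:
            href = attrs.get('href')
            if href and href not in self.links:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == 'td':
            self.in_player_cell = False


# Hypothetical excerpt: the same player linked twice, plus an unrelated team link.
sample = (
    '<tr><td class="player"><a href="/player/13899/la3euka"><div>photo</div></a></td>'
    '<td class="team-cell"><a href="/team/7264/spraynpray">team</a></td>'
    '<td class="player"><a href="/player/13899/la3euka"><div>la3euka</div></a></td></tr>'
)

parser = PlayerLinkParser()
parser.feed(sample)
print(parser.links)  # ['/player/13899/la3euka']
```

HTMLParser.feed expects a str, so a page received as bytes would still need to be decoded first.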