Good afternoon. I have this JSON:

var model = {"ALLSKUS":["84664020","07961015","84664113","84664116"],"NBR":"137127","PRICERANGE":"$186.99 - $189.99","GENDER_AGE":"Men's","PRICEADJUSTDATE":"","AVAILABLE_SIZES":[" 07.5"," 08.0"," 08.5"," 09.0"," 09.5"," 10.0"," 10.5"," 11.0"," 11.5"," 12.0"," 12.5"," 13.0","14.0","15.0"],"DISCOUNT_PERCENT":"15","isFieldTestable":false,"SORT":"152","HASCUSTOMPRODUCTTEMPLATE":false,"PR_LIST":"224.99","SPORTS":[{"ID":"3","NM":"Basketball"},{"ID":"39","NM":"Casual"}],"SIZECHART_CD":"S0584","HASSIZES":true,"PR_SALE":"189.99","LOCALIZATION":{},"MODELTEMPLATE":{"ISMODELTEMPLATEACTIVE":"N","MODELTEMPLATE_IMAGE":""},"ISCUSTOMPRODUCT":false,"INTRODUCTIONDATE":"","SKU":"84664020","ISINTANGIBLE":false,"PROD_TP":"Shoes","CUSTPROD_CD":"","NM":"Jordan Retro 6 - Men's","REVIEWS": 
  1. I look for this part:

     "AVAILABLE_SIZES":[" 07.5"," 08.0"," 08.5"," 09.0"," 09.5"," 10.0"," 10.5"," 11.0"," 11.5"," 12.0"," 12.5"," 13.0"," 14.0"," 15.0"] 

Then I strip out everything unnecessary.

  2. The output should be a table, table.csv:

     |size|size|size|size|size|size|size|size|size|size|
     |07.0|07.5|08.0|10.0|10.5|11.5|12.0|13.0|14.0|15|
  3. I write this to the CSV file.

I search for the data with a regular expression:

    import re
    import requests

    ad = requests.get('http://www.footlocker.com/product/model:132512/sku:A1781919/timberland-roll-top-mens/tan/tan/').text  # example link
    bb = re.findall(r'"AVAILABLE_SIZES":(.*)"DISCOUNT_PERCENT"', ad)

    out: ['[" 07.0"," 07.5"," 08.0"," 10.0"," 10.5"," 11.5"," 12.0"," 13.0"," 14.0"," 15.0"],']

Then I try to remove the extra data.

How do I remove the extra characters now? Python complains about replace. Is the error caused by a space in the malformed JSON output?

    bb = re.findall(r'"AVAILABLE_SIZES":(.*)],"DISCOUNT_PERCENT"', ad).str(var).replace('[', ' ')

    out: AttributeError: 'list' object has no attribute 'str'

Update:

    import json

    bb_strings = re.findall(r'var model = ({.*})', ad)
    bp = {}
    if bb_strings:
        bp = json.loads(bb_strings[0])

    out: {'ALLSKUS': ['A1781919', '6635A001', '6634A'], 'NBR': '132512', 'PRICERANGE': '$99.99 - $125.99', 'GENDER_AGE': "Men's", 'PRICEADJUSTDATE': '', 'AVAILABLE_SIZES': [' 07.0', ' 07.5', ' 08.0', ' 10.0', ' 10.5', ' 11.5', ' 12.0', ' 13.0', ' 14.0', ' 15.0'], 'DISCOUNT_PERCENT': '10', 'isFieldTestable': False, 'SORT': '1036', 'HASCUSTOMPRODUCTTEMPLATE': False, 'PR_LIST': '139.99', 'SPORTS': [{'ID': '31', 'NM': 'Snow'}, {'ID': '39', 'NM': 'Casual'}], 'SIZECHART_CD': 'S0629', 'HASSIZES': True, 'PR_SALE': '125.99', 'LOCALIZATION': {}, 'MODELTEMPLATE': {'ISMODELTEMPLATEACTIVE': 'N', 'MODELTEMPLATE_IMAGE': ''}, 'ISCUSTOMPRODUCT': False, 'INTRODUCTIONDATE': '', 'SKU': '6635A001', 'ISINTANGIBLE': False, 'PROD_TP': 'Shoes', 'CUSTPROD_CD': '', 'NM': "Timberland Roll-Top - Men's", 'REVIEWS': {'HASREVIEWS': True, 'TOTALREVIEWCOUNT': '17', 'WEIGHTEDAVERAGERATING': '4.82', 'WEIGHTEDAVERAGERECOMMENDED': '16'}, 'BRAND': 'Timberland', 'INET_COPY': 'A style unlike any other. The Timberland Roll Top Boot rolls down for a little built-in air conditioning and a whole lotta style. Premium, full-grain leather upper provides comfort, durability and abrasion resistance. Direct-attach seam construction promises lasting durability. Padded collar provides a comfortable fit around the ankle and keeps out debris. Rubber lug sole for traction and durability. Embossed Timberland tree logo on the side.'}

    for bl in bp['AVAILABLE_SIZES']:
        footlocker.append(('размер', bl))  # 'размер' means "size"

Everything works now. How do I get all the data written to the CSV file, and not just the first values?

Now I need to extract the data I need.

  • Apart from the fact that your Russian is poor, it would be good to edit the question so that it is at least technically readable (indentation, code formatting, etc.) - 0andriy
  • There is no JSON at the link you provided; it is a plain text page. Remove or extract data only after the JSON has been deserialized, that is, converted from text into an object tree. You should not work with JSON as text, especially with such a clumsy method. Specifically, in your case the extra comma is the very last character of the array text you captured. - m9_psy
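
The trailing comma mentioned in the comment above can be reproduced without downloading anything. This is a minimal sketch, assuming the page contains the array exactly as quoted in the question; `page_text` is a hypothetical stand-in for the downloaded text, and the tightened pattern is one possible fix rather than the only one:

```python
import json
import re

# Hypothetical stand-in for the downloaded page text (shortened from the question).
page_text = '"AVAILABLE_SIZES":[" 07.0"," 07.5"," 08.0"],"DISCOUNT_PERCENT":"10"'

# The original pattern captures everything up to "DISCOUNT_PERCENT",
# including the comma that follows the closing bracket.
captured = re.findall(r'"AVAILABLE_SIZES":(.*)"DISCOUNT_PERCENT"', page_text)[0]
print(captured)  # [" 07.0"," 07.5"," 08.0"],

# Matching only the bracketed array leaves valid JSON that can be deserialized.
array_text = re.search(r'"AVAILABLE_SIZES":(\[.*?\])', page_text).group(1)
sizes = [s.strip() for s in json.loads(array_text)]
print(sizes)  # ['07.0', '07.5', '08.0']
```

Once the array is a real Python list, the leading spaces can be stripped per element instead of with text replacements.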

1 answer

re.findall() returns a list (type list). A list has no str() method. Besides the common methods for (mutable) sequences, a list provides only the sort() method.

Experiment interactively in a REPL (ptpython, ipython): look at what re.findall() returns and which methods autocompletion shows for the list. The complete list of methods can be seen in the output of help(list).
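
A small sketch of that point, using a shortened copy of the text from the question (the `text` value is a stand-in for the page source, and the chained `replace` calls are only for illustration): string methods are available on the elements of the list, not on the list itself.

```python
import re

# Stand-in for the page source (shortened from the question).
text = '"AVAILABLE_SIZES":[" 07.0"," 07.5"],"DISCOUNT_PERCENT"'
matches = re.findall(r'"AVAILABLE_SIZES":(.*),"DISCOUNT_PERCENT"', text)

print(type(matches).__name__)   # list
print(hasattr(matches, 'str'))  # False

# String methods belong to the elements, so pick one element first.
first = matches[0]
cleaned = first.replace('[', '').replace(']', '').replace('"', '')
print(repr(cleaned))  # ' 07.0, 07.5'
```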

See "How to get information from the json string, which is specified in the Javascript code inside the html page, using python3.x?"
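
As a rough illustration of that approach applied to this question, the sketch below deserializes the `var model = {...}` object and writes every size into one CSV row. The `html` string is a shortened, hypothetical stand-in for the page source, the header text `size` is a translation of the header used in the question, and an in-memory buffer is used in place of the real file so the example runs anywhere:

```python
import csv
import io
import json
import re

# Hypothetical stand-in for the page source (shortened from the question's data).
html = 'var model = {"NBR":"132512","AVAILABLE_SIZES":[" 07.0"," 07.5"," 15.0"],"DISCOUNT_PERCENT":"10"};'

# Extract the object literal and deserialize it before touching the data.
match = re.search(r'var model = (\{.*\})', html)
model = json.loads(match.group(1)) if match else {}

sizes = [s.strip() for s in model.get('AVAILABLE_SIZES', [])]

# One header cell per size, then all sizes in a single data row.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['size'] * len(sizes))
writer.writerow(sizes)

csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, the buffer could be replaced with `open('table.csv', 'w', newline='', encoding='utf-8')`. Passing the whole list to a single `writerow` call is what puts all sizes in one row rather than only the first value.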

  • I have just updated the question: I now get JSON as output, and it only remains to remove the extra parts - fermeg
  • @fermeg if you have a new problem, ask a new question instead of editing the old one. - jfs