    import vk_api
    import pandas as pd
    import time
    import json

    vk_session = vk_api.VkApi('******', '*******')  # login and password
    vk_session.auth()
    vk = vk_session.get_api()


    def main():
        count_in = vk.groups.getMembers(group_id='******')
        count = count_in['count']
        print(count)
        offset = 0
        i = 0
        step = 1000
        for count in range(count_in['count'], 0, -step):
            y = vk.groups.getMembers(group_id='******', offset=i * step, fields='contacts')
            time.sleep(3)
            data = y
            df = pd.io.json.json_normalize(data['items'])
            df.to_csv(r'********.csv', index=False, mode='a', header=(i == 0), encoding='utf8')
            i += 1


    if __name__ == '__main__':
        main()

When I open the resulting CSV on Linux, everything is fine, but on Windows or Mac OS the encoding is broken. If I call df.to_excel instead, I get an error saying that to_excel does not accept the keyword argument mode.

What is this parameter responsible for in general, and is there an equivalent for Excel? What is the best way to solve this problem: write a separate converter? If so, how can that be done for a large number of CSV files? Or is it possible to export directly to an Excel .xlsx file, and how is that implemented?

The point is that on each loop iteration the data must be appended to the file rather than overwrite what is already there.

  • mode is the mode in which the file '********.csv' is opened. With mode='w' the file is overwritten with the new data; with 'a' the new data is appended to the end of the file (other mode values exist as well). - gil9red
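
To illustrate the comment above, here is a minimal sketch of writing several chunks into one CSV. The helper name write_chunks and the sample rows are hypothetical. Writing the first chunk with encoding='utf-8-sig' is an assumption about the cause of the Windows problem: a byte order mark at the start of the file is a common way to help Excel detect UTF-8.

```python
import os
import tempfile

import pandas as pd


def write_chunks(chunks, path):
    """Write DataFrames to one CSV: the first chunk creates the file, the rest append."""
    for i, chunk in enumerate(chunks):
        first = (i == 0)
        chunk.to_csv(
            path,
            index=False,
            mode="w" if first else "a",   # 'w' overwrites, 'a' appends to the end
            header=first,                 # column names only once, at the top
            # BOM only at the very start of the file (assumed fix for Excel)
            encoding="utf-8-sig" if first else "utf-8",
        )


# Hypothetical sample data standing in for two pages of group members
chunks = [
    pd.DataFrame({"id": [1, 2], "first_name": ["Анна", "Иван"]}),
    pd.DataFrame({"id": [3], "first_name": ["Ольга"]}),
]
csv_path = os.path.join(tempfile.mkdtemp(), "members.csv")
write_chunks(chunks, csv_path)
```

Because the first chunk is written with mode='w', rerunning the script starts a fresh file instead of appending to the output of a previous run.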

1 answer

The .to_excel() method cannot append data to an existing Excel file. This can be done using this function.
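
One possible way to append to an existing workbook is sketched below. It assumes pandas 1.4 or newer with openpyxl installed (needed for if_sheet_exists='overlay'), and that every chunk has the same columns in the same order. The helper name append_df_to_excel is hypothetical and is not necessarily the function referred to above.

```python
import os
import tempfile

import pandas as pd


def append_df_to_excel(path, df, sheet_name="Sheet1"):
    """Append df below the existing rows of sheet_name; create the file if missing."""
    if not os.path.exists(path):
        # First chunk: an ordinary write, including the header row
        df.to_excel(path, sheet_name=sheet_name, index=False)
        return
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        # max_row is the 1-based index of the last used row, which equals
        # the 0-based index of the first free row
        startrow = writer.sheets[sheet_name].max_row
        df.to_excel(writer, sheet_name=sheet_name, index=False,
                    header=False, startrow=startrow)


# Hypothetical sample data written in two steps
xlsx_path = os.path.join(tempfile.mkdtemp(), "resulting_file.xlsx")
append_df_to_excel(xlsx_path, pd.DataFrame({"id": [1, 2], "first_name": ["A", "B"]}))
append_df_to_excel(xlsx_path, pd.DataFrame({"id": [3], "first_name": ["C"]}))
result = pd.read_excel(xlsx_path)
```

Each append reloads and rewrites the whole workbook, so for many pages this is likely slower than appending to a CSV.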

You can also collect all the data in one list, convert it to a DataFrame and save it as an Excel file:

    step = 1000
    data = []
    # request the member list one page (step members) at a time
    for offset in range(0, count_in['count'], step):
        y = vk.groups.getMembers(group_id='your_public', offset=offset, fields='contacts')
        data.append(y['items'])
        time.sleep(3)

    df = pd.io.json.json_normalize(data)
    df.to_excel(r'resulting_file.xlsx', index=False)

UPDATE: judging by the published file, the data should be collected a little differently:

    step = 1000
    data = []
    # request the member list one page (step members) at a time
    for offset in range(0, count_in['count'], step):
        y = vk.groups.getMembers(group_id='your_public', offset=offset, fields='contacts')
        # equivalent to: data += y['items']
        data.extend(y['items'])
        time.sleep(3)

    # newer pandas versions expose this function as pd.json_normalize
    df = pd.io.json.json_normalize(data)
    df.to_excel(r'resulting_file.xlsx', index=False)

  • Traceback (most recent call last):
      File "/home/cloudsurfer/PycharmProjects/PyVkParser/vk_parsing_numbers_v3.py", line 35, in main
        df = pd.io.json.json_normalize(data)
      File "/home/cloudsurfer/PycharmProjects/PyVkParser/venv/lib/python3.5/site-packages/pandas/io/json/normalize.py", line 193, in json_normalize
        for x in compat.itervalues(y)] for y in data]):
    - CloudSurferCode
  • File "/home/cloudsurfer/PycharmProjects/PyVkParser/venv/lib/python3.5/site-packages/pandas/io/json/normalize.py", line 193, in json_normalize
        for x in compat.itervalues(y)] for y in data]):
      File "/home/cloudsurfer/PycharmProjects/PyVkParser/venv/lib/python3.5/site-packages/pandas/compat/__init__.py", line 211, in itervalues
        return iter(obj.values(**kw))
    AttributeError: 'list' object has no attribute 'values'
    - CloudSurferCode
  • @CloudSurferCode, could you post somewhere the value of data that triggers this error? - MaxU
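
The AttributeError in the traceback above is consistent with data being a list of lists rather than a list of dicts, which is what data.append(y['items']) produces. A small sketch of the difference follows; the pages list is a hypothetical stand-in for responses from vk.groups.getMembers.

```python
# Hypothetical stand-in for two pages returned by vk.groups.getMembers
pages = [
    {"count": 3, "items": [{"id": 1, "first_name": "A"}, {"id": 2, "first_name": "B"}]},
    {"count": 3, "items": [{"id": 3, "first_name": "C"}]},
]

nested = []  # built with append: one inner list per page
flat = []    # built with extend: one dict per member

for page in pages:
    nested.append(page["items"])
    flat.extend(page["items"])

print(type(nested[0]).__name__)  # list
print(type(flat[0]).__name__)    # dict
print(len(nested), len(flat))    # 2 3
```

json_normalize expects a dict or a list of dicts, so the flat list is the shape to pass to it.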