Good day. I'm learning Python and want to port some code from R. My question is: how convenient is it to compare data from an Excel table against criteria entered in the script, as the following R script does?

    a <- as.numeric(readline(prompt = "Enter the number of workers at your enterprise: 1 if fewer than 8.5 thousand people, 2 otherwise: "))
    b <- as.numeric(readline(prompt = "Enter your industry type: 2 if high-income, 1 otherwise: "))
    c <- as.numeric(readline(prompt = "Enter your enterprise's labor productivity in thousand rubles per person: "))
    d <- as.numeric(readline(prompt = "Enter the company's profitability in %: "))
    e <- as.numeric(readline(prompt = "Enter the company's growth rate in %: "))
    tryCatch(
      {
        localenv <- environment()
        # keep only the companies that match the entered criteria
        asde <- work_file[as.numeric(work_file$WORKER) == a &
                          as.numeric(work_file$OTRASL) == b &
                          (work_file$PROISVOD > c - 500 & work_file$PROISVOD < c + 500) &
                          (work_file$RENTAB > d / 100 - 0.2 & work_file$RENTAB < d / 100 + 0.2) &
                          (work_file$TEMP > e / 100 - 0.3 & work_file$TEMP < e / 100 + 0.3), ]
        xyz <- asde[, c("КОМП", "REAL")]
        # bar chart of sales volume for the matching companies
        saq <- ggplot(xyz, aes(x = factor(xyz[, 1]), y = xyz[, 2], fill = xyz[, 1]), environment = localenv) +
          geom_bar(stat = "identity") +
          xlab("Company name") +
          ylab("Sales volume, million rubles") +
          ggtitle("Data for similar companies") +
          theme(text = element_text(size = 12, face = "bold"),
                axis.text.x = element_text(size = 14, face = "bold"),
                axis.text.y = element_text(size = 14, face = "bold")) +
          scale_y_continuous(limit = c(0, max(xyz[, 2]) + 7000), breaks = seq(0, max(xyz[, 2]), by = 10000)) +
          geom_text(aes(label = xyz[, 2]), size = 5, face = "bold", vjust = 0)
        return(list(saq, asde))
      },
      error = function(cond) {
        print(paste("No companies with similar data were found. Check that the data were entered correctly or use the forecasting option"))
      },
      warning = function(cond) {
        print(paste("No companies with similar data were found. Check that the data were entered correctly or use the forecasting option"))
      })
    }
  • What exactly is the question? How to read data from an Excel file? Excel files come in different formats, so you need to specify which one. - m9_psy
  • This is quite easy to do in Pandas, and you may already be familiar with DataFrames from R. If you post a sample data set (as text or a link to a CSV / Excel / TSV file), a brief description of what you need to do, and an example of the resulting DataFrame, I could put together a working version in Python + Pandas. PS: plotting graphs with Pandas is also very easy. - MaxU
  • dropbox.com/s/nb1xfw5be5kqpzn/data.xlsx?dl=0 is the link to the data. Two things are needed: 1. An analysis of companies with comparable characteristics. 2. A forecast of the company's sales volume. In the first case (the part of the script shown above), the database is compared with the data entered by the user according to the following criteria: - PavelD
  • 1. They are in the same type of industry as the company whose data were entered (high / low income). 2. They have the same workforce size (more or fewer than 8.5 thousand people). 3. They have roughly the same labor productivity (allowed deviation of 500 thousand rubles / person). 4. Their profitability is at about the same level as in the entered data (a deviation of 20% is allowed). 5. Their growth rate is within plus / minus 30% of the entered company's. - PavelD
  • All of this is in the line: asde <- work_file[as.numeric(work_file$WORKER) == a & as.numeric(work_file$OTRASL) == b & (work_file$PROISVOD > c - 500 & work_file$PROISVOD < c + 500) & (work_file$RENTAB > d / 100 - 0.2 & work_file$RENTAB < d / 100 + 0.2) & (work_file$TEMP > e / 100 - 0.3 & work_file$TEMP < e / 100 + 0.3), ] - PavelD

1 answer

You can start with this option:

    import pandas as pd

    url = r'd:/download/data.xlsx'

    # read the Excel file into a Pandas DataFrame
    df = pd.read_excel(url)

    # these values should later come from text input,
    # or they can be read from another CSV/Excel file
    a = 1
    b = 1
    c = 1000
    d = 50
    e = 80

    # query ...
    qry = '''
    WORKER == @a & \
    OTRASL == @b & \
    PROISVOD > @c-500 & PROISVOD < @c+500 & \
    RENTAB > @d/100-0.2 & RENTAB < @d/100+0.2 & \
    TEMP > @e/100-0.3 & TEMP < @e/100+0.3 \
    '''

    print(df.query(qry))
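The hard-coded values a to e stand in for the interactive prompts of the R script. A minimal sketch of reading them from the keyboard follows; the function name `read_criteria`, the injectable `ask` parameter and the English prompt texts are illustrative assumptions, not part of the original answer.

```python
def read_criteria(ask=input):
    """Ask for the five filter values and return them as floats.

    `ask` defaults to the built-in input(); passing another callable
    makes the function usable without a keyboard (hypothetical helper).
    """
    prompts = [
        ("a", "Workforce size: 1 if under 8.5 thousand people, otherwise 2: "),
        ("b", "Industry type: 2 if high-income, otherwise 1: "),
        ("c", "Labor productivity, thousand rubles per person: "),
        ("d", "Profitability, %: "),
        ("e", "Growth rate, %: "),
    ]
    values = {}
    for name, prompt in prompts:
        while True:
            # accept a decimal comma as well as a decimal point
            raw = ask(prompt).strip().replace(",", ".")
            try:
                values[name] = float(raw)
                break
            except ValueError:
                print(f"{raw!r} is not a number, please try again.")
    return values


# Intended use (not executed here because it waits for keyboard input):
#   criteria = read_criteria()
#   a, b, c, d, e = (criteria[k] for k in "abcde")
#   print(df.query(qry))
```

Unpacking into plain variables keeps the `@a` ... `@e` references in the query string working unchanged.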

Output:

             КОМП  НОМЕР      REAL   TEMP  RENTAB  PROISVOD  OTRASL  WORKER
    25   Башнефть     18   36948.0  0.833   0.329     838.1       1       1
    112  Норильск      6  134617.0  0.878   0.394    1402.1       1       1
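The R script also reports when nothing matches and draws a bar chart of the matches. A rough Pandas counterpart might look like the sketch below. The function names `find_similar` and `plot_similar` and the tolerance parameters are assumptions made for illustration; the demo frame reuses the two rows from the output above plus one made-up row, and plotting requires matplotlib to be installed.

```python
import pandas as pd


def find_similar(df, a, b, c, d, e, tol_prod=500, tol_rent=0.2, tol_temp=0.3):
    """Return the rows comparable to the entered company.

    abs(x - center) < tol is the same strict window as
    center - tol < x < center + tol in the R filter.
    """
    mask = (
        (df["WORKER"] == a)
        & (df["OTRASL"] == b)
        & ((df["PROISVOD"] - c).abs() < tol_prod)
        & ((df["RENTAB"] - d / 100).abs() < tol_rent)
        & ((df["TEMP"] - e / 100).abs() < tol_temp)
    )
    return df[mask]


def plot_similar(found):
    """Bar chart of sales volume per company (needs matplotlib)."""
    ax = found.plot.bar(x="КОМП", y="REAL", legend=False, rot=0)
    ax.set_xlabel("Company name")
    ax.set_ylabel("Sales volume, million rubles")
    ax.set_title("Data for similar companies")
    return ax


# Demo frame: the first two rows come from the output above,
# the third row is made up so that the filter has something to reject.
demo = pd.DataFrame(
    {
        "КОМП": ["Башнефть", "Норильск", "Example Co"],
        "REAL": [36948.0, 134617.0, 50000.0],
        "TEMP": [0.833, 0.878, 0.100],
        "RENTAB": [0.329, 0.394, 0.050],
        "PROISVOD": [838.1, 1402.1, 3000.0],
        "OTRASL": [1, 1, 2],
        "WORKER": [1, 1, 2],
    }
)

found = find_similar(demo, a=1, b=1, c=1000, d=50, e=80)
if found.empty:
    print("No companies with similar data were found.")
else:
    print(found[["КОМП", "REAL"]])
    # plot_similar(found)  # uncomment when matplotlib is available
```

An empty result is an ordinary, empty DataFrame in Pandas rather than an error, so checking `found.empty` takes over the role of the `tryCatch` handlers.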
  • Oh, great. I had basically written something similar. I have another question, about the log regression: m <- lm(data = work_file, log(REAL) ~ RENTAB + OTRASL + WORKER + log(PROISVOD)). How would this best be expressed? - PavelD
  • @PawełDiulin, I don't know R, so I can't translate this line. As for packages, I advise you to look at sklearn (full name: scikit-learn). - MaxU
  • @PawełDiulin, the scipy.stats package is also worth a look. - MaxU
  • @PawełDiulin, your task hooked me, and I decided to dig into machine learning, at least superficially. Can you clarify your goal? Is it to predict REAL from ['RENTAB','OTRASL','WORKER','PROISVOD'], to visualize the data, or something else? And why did you choose this formula: log(REAL) ~ RENTAB + OTRASL + WORKER + log(PROISVOD)? - MaxU
  • Yes, the predicted value plus confidence limits for the forecast. The formula comes purely from econometric considerations, to avoid heteroscedasticity. - PavelD
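As a rough illustration of the regression discussed in these comments, the sketch below fits log(REAL) ~ RENTAB + OTRASL + WORKER + log(PROISVOD) by ordinary least squares with NumPy and uses the t distribution from scipy.stats for a prediction interval. The function names are hypothetical, the data are synthetic (generated only to make the example runnable) and OTRASL and WORKER are treated as plain numeric columns. The statsmodels formula interface (`statsmodels.formula.api.ols`) accepts an R-style formula and may be a closer analogue of `lm`; it is not used here.

```python
import numpy as np
from scipy import stats


def fit_log_model(real, rentab, otrasl, worker, proisvod):
    """OLS fit of log(REAL) on RENTAB, OTRASL, WORKER and log(PROISVOD)."""
    X = np.column_stack(
        [np.ones(len(real)), rentab, otrasl, worker, np.log(proisvod)]
    )
    y = np.log(real)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]          # residual degrees of freedom
    sigma2 = resid @ resid / dof       # residual variance estimate
    xtx_inv = np.linalg.inv(X.T @ X)
    return beta, sigma2, xtx_inv, dof


def predict_real(model, rentab, otrasl, worker, proisvod, level=0.95):
    """Forecast REAL for one new company with a prediction interval.

    The interval is built on the log scale and exponentiated, so the
    point value is a median-type forecast on the original scale.
    """
    beta, sigma2, xtx_inv, dof = model
    x0 = np.array([1.0, rentab, otrasl, worker, np.log(proisvod)])
    log_hat = x0 @ beta
    # standard error for a single new observation
    se = np.sqrt(sigma2 * (1.0 + x0 @ xtx_inv @ x0))
    t_crit = stats.t.ppf(0.5 + level / 2, dof)
    return tuple(np.exp([log_hat, log_hat - t_crit * se, log_hat + t_crit * se]))


# Synthetic data, for demonstration only (not the data from the question).
rng = np.random.default_rng(0)
n = 200
rentab = rng.uniform(0.0, 0.6, n)
otrasl = rng.integers(1, 3, n)         # values 1 or 2
worker = rng.integers(1, 3, n)         # values 1 or 2
proisvod = rng.uniform(300.0, 3000.0, n)
log_real = (5.0 + 1.5 * rentab + 0.3 * otrasl + 0.4 * worker
            + 0.8 * np.log(proisvod) + rng.normal(0.0, 0.2, n))
real = np.exp(log_real)

model = fit_log_model(real, rentab, otrasl, worker, proisvod)
point, lower, upper = predict_real(model, rentab=0.5, otrasl=1, worker=1, proisvod=1000)
print(f"forecast: {point:.0f}, 95% interval: [{lower:.0f}, {upper:.0f}]")
```

With the real data, the columns of the DataFrame (for example `df["REAL"]`, `df["RENTAB"]`) can be passed in place of the synthetic arrays.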