Good evening to all!

The problem is as follows:

1. I use the RODBC package. My attempt to insert a table into the database with sqlSave failed: it raised a "table not found in channel" error and uploaded nothing.
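For reference, the failing call is not shown in the post; presumably it was something along the lines of the sketch below (a hypothetical reconstruction). The "table not found in channel" error typically means RODBC could not locate the target table through the connection, e.g. append = TRUE with a table that does not exist yet, or a schema-qualified name the driver cannot resolve.

    # Hypothetical reconstruction of the failing sqlSave call (not in the
    # original post). With append = TRUE the table must already be visible
    # through the connection; with append = FALSE sqlSave creates it itself.
    library(RODBC)
    sqlSave(conn, crcard,
            tablename = "ChM_credit_cards",
            append    = TRUE,
            rownames  = FALSE,
            fast      = TRUE)  # one parameterized INSERT instead of row-by-row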

2. So I resorted to loops: I create an empty table in the database myself and fill it row by row with INSERT ... VALUES statements (code below). Everything would be fine, except for the speed: this approach runs at about 150 rows per minute, and I need to load 3 million rows.

This raises three questions:

  1. How do I fix the "table not found in channel" error from item 1?
  2. Are there ways to bring the speed of the loop in item 2 up to something reasonable? Why is it so slow, and how can it be fixed?
  3. Or is there a proven, working alternative to both methods?

    # the original snippet assumes RODBC is already attached
    library(RODBC)

    # the SQL Server ODBC driver recognizes "Database=", not "catalog=";
    # the original string used catalog=Analysis, which the driver most
    # likely ignores, leaving the session in the login's default database
    conn <- odbcDriverConnect(
      connection = "Driver={SQL Server};Server=servername.com;Database=Analysis;Uid=login;Pwd=parole")

    sqlQuery(conn, paste("use Analysis;",
                         "if object_id('Analysis.dbo.ChM_credit_cards') is not null",
                         "drop table dbo.ChM_credit_cards"))

 sqlQuery(conn, paste ("use Analysis; ", "create table Analysis.dbo.ChM_credit_cards (", "id int null, ", "external_id nvarchar(100) null, ", "card_name nvarchar(50) null, ", "user_id bigint null, ", "expiry_status nvarchar(20) null, ", "is_deleted smallint null, ", "expiry_date date null, ", "card_number nvarchar(50) null, ", "is_business smallint null, ", "created_at datetime null, ", "updated_at datetime null, ", "saved_email nvarchar(100) null, ", "holder_name nvarchar(100) null, ", "activated smallint null, ", "identifier_id bigint null)")) nr <- nrow(crcard) for (i in 1:nr) { id_crc <- crcard$id[i] external_id_crc <- crcard$external_id[i] card_name_crc <- crcard$name[i] user_id_crc <- crcard$user_id[i] expiry_status_crc <- crcard$expiry_status[i] is_deleted_crc <- crcard$is_deleted[i] expiry_date_crc <- as.character(crcard$expiry_date[i]) card_number_crc <- crcard$number[i] is_business_crc <- crcard$is_business[i] created_at_crc <- as.character(crcard$created_at[i]) updated_at_crc <- as.character(crcard$updated_at[i]) email_crc <- crcard$saved_email[i] holdername_crc <- crcard$holder_name[i] activated_crc <- crcard$activated[i] identifier_id_crc <- ifelse(is.na(crcard$identifier_id[i])==TRUE, 0, crcard$identifier_id[i]) sqlQuery(conn, paste ("use Analysis; ", "insert into Analysis.dbo.ChM_credit_cards ", "(id , external_id, card_name, ", "user_id, expiry_status, is_deleted, ", "expiry_date, card_number, is_business, ", "created_at, updated_at, saved_email, ", "holder_name, activated, identifier_id)", "values (", id_crc,", '", external_id_crc,"', '", card_name_crc, "', ", user_id_crc, ", '", expiry_status_crc, "', ", is_deleted_crc, ", '", expiry_date_crc, "', '", card_number_crc, "', ", is_business_crc, ", '", created_at_crc,"', '", updated_at_crc,"', '", email_crc, "', '", holdername_crc, "', ", activated_crc, ", ", identifier_id_crc, ")")) 

}
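As a side note (my addition, not part of the original question): a large share of the slowness comes from every sqlQuery call being a separate round trip that is committed individually. A minimal sketch of batching the commits with RODBC's transaction helpers, keeping the loop itself exactly as above:

    # Sketch: turn off autocommit so the per-row INSERTs are committed once,
    # not 3 million times; odbcSetAutoCommit() and odbcEndTran() are
    # standard RODBC functions.
    odbcSetAutoCommit(conn, autoCommit = FALSE)
    for (i in 1:nr) {
      # ... build and send the single-row INSERT exactly as in the loop above ...
    }
    odbcEndTran(conn, commit = TRUE)            # commit the whole batch at once
    odbcSetAutoCommit(conn, autoCommit = TRUE)  # restore the default behaviour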

  • To avoid inserting one row at a time at a turtle's pace, the resulting SQL query should insert the values of several rows in a single statement: insert into tableName (id, field1, field2) values (1, 'x1', 'y1'), (2, 'x2', 'y2'), ... while keeping the size of the generated query within reasonable limits. - teran
  • I tried concatenating several values (...) into one INSERT. The result did not impress me: at most I managed to insert 1000 rows per INSERT (apparently a limit on the query length, or on memory, or something else). That way the 3 million rows took about 4 hours to insert. - Tatyana Aulova
  • Yes, I once hit that length limit too. And from what source are you inserting those 3 million rows? I don't have much experience with data volumes like that, but disabling the database log might help; likewise, clearing a large table with truncate rather than delete is fast and unlogged. You don't seem to have any integrity-constraint checks, do you? - teran

1 Answer

On the second question.

A function to build the query string:

    sqlInsert <- function(.data, dbname, table) {
      stopifnot(is.data.frame(.data))
      stopifnot(is.character(dbname) && length(dbname) == 1L)
      stopifnot(is.character(table) && length(table) == 1L)
      # quote each value and collapse the vector into a comma-separated list
      collapse <- function(x) paste0("'", x, "'", collapse = ",")
      # column names must not be single-quoted in T-SQL; bracket-quote them
      cols <- paste0("[", names(.data), "]", collapse = ",")
      vals <- apply(.data, 1, function(row) paste0("(", collapse(row), ")"))
      vals <- paste(vals, collapse = ",")
      vals <- gsub("'NA'", "NULL", vals, fixed = TRUE)
      # INSERT INTO (the original had a typo, "INSERT INFO"), and no extra
      # parentheses around the already-parenthesized row tuples
      sql <- sprintf("USE %s; INSERT INTO %s.%s (%s) VALUES %s;",
                     dbname, dbname, table, cols, vals)
      return(sql)
    }

    sqlInsert(iris[1:5, ], "analytics", "iris")
    #> [1] "USE analytics; INSERT INTO analytics.iris ([Sepal.Length],[Sepal.Width],[Petal.Length],[Petal.Width],[Species]) VALUES ('5.1','3.5','1.4','0.2','setosa'),('4.9','3.0','1.4','0.2','setosa'),('4.7','3.2','1.3','0.2','setosa'),('4.6','3.1','1.5','0.2','setosa'),('5.0','3.6','1.4','0.2','setosa');"
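One caveat with this naive quoting (my note, not part of the original answer): any value that itself contains a single quote, say a holder_name like O'Brien, will break the generated SQL. Doubling the quotes first, e.g. gsub("'", "''", x, fixed = TRUE), is a cheap safeguard; note also that the 'NA' -> NULL substitution would mangle a legitimate string value 'NA'.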

A function to split the source data into chunks and send them to the server:

    sqlInsert <- function(con, .data, dbname, table, pagesize = 1000) {
      stopifnot(is.data.frame(.data))
      stopifnot(is.character(dbname) && length(dbname) == 1L)
      stopifnot(is.character(table) && length(table) == 1L)
      collapse <- function(x) paste0("'", x, "'", collapse = ",")
      n <- nrow(.data)
      # pages of at most `pagesize` rows; SQL Server accepts at most 1000
      # rows in a single VALUES list, hence the default (the original 10000
      # would be rejected by the server)
      pages <- split(seq_len(n), (seq_len(n) - 1L) %/% pagesize)
      cols <- paste0("[", names(.data), "]", collapse = ",")
      for (idx in pages) {
        vals <- apply(.data[idx, , drop = FALSE], 1,
                      function(row) paste0("(", collapse(row), ")"))
        vals <- paste(vals, collapse = ",")
        vals <- gsub("'NA'", "NULL", vals, fixed = TRUE)
        sql <- sprintf("USE %s; INSERT INTO %s.%s (%s) VALUES %s;",
                       dbname, dbname, table, cols, vals)
        sqlQuery(con, sql)
      }
      return(invisible(NULL))
    }
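A hypothetical call against the question's data, assuming the conn and crcard objects from above (the call itself is my illustration, not part of the original answer):

    # Load the 3M-row crcard data frame in pages of 1000 rows. Passing
    # "dbo.ChM_credit_cards" as `table` makes the generated statement use
    # the three-part name Analysis.dbo.ChM_credit_cards, as in the question.
    sqlInsert(conn, crcard, dbname = "Analysis", table = "dbo.ChM_credit_cards")

At roughly 1000 rows per statement this replaces 3 million single-row round trips with about 3000 queries.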