Hello, I recently ran into the problem of processing large Excel files (about 160 MB, more than 560,000 rows, more than 26 columns). In short, I am trying to load an Excel spreadsheet into a DataSet and process it. I use the following code:

    using (System.Data.OleDb.OleDbConnection conn = new System.Data.OleDb.OleDbConnection(connectionString))
    {
        conn.Open();
        System.Data.OleDb.OleDbCommand cmd = new System.Data.OleDb.OleDbCommand();
        cmd.Connection = conn;
        // Enumerate the sheets of the workbook
        DataTable dtSheet = conn.GetOleDbSchemaTable(System.Data.OleDb.OleDbSchemaGuid.Tables, null);
        string sheetName = String.Empty;
        try
        {
            foreach (DataRow dr in dtSheet.Rows)
            {
                sheetName = dr["TABLE_NAME"].ToString();
                // Get all rows from the sheet
                cmd.CommandText = "SELECT * FROM [" + sheetName + "]";
                DataTable dt = new DataTable();
                dt.TableName = sheetName;
                System.Data.OleDb.OleDbDataAdapter da = new System.Data.OleDb.OleDbDataAdapter(cmd);
                da.Fill(dt);
                ds.Tables.Add(dt); // ds is the DataSet declared elsewhere
            }
        }
        catch (Exception ex)
        {
            outputDataExcel(string.Format("Error filling the table. Error:{0}", ex.Message));
        }
    }

On the line da.Fill(dt) an error occurs: "Not enough system resources." Although the machine still has about 9 GB of free RAM, the process filling the table (da.Fill(dt)) consumes up to 2.3 GB of RAM and then fails.

Please help me solve this. I also tried importing the data through Microsoft.Office.Interop.Excel, but it works very slowly because of the cell-by-cell access to Excel.

Update

I tried to do this:

    DataTable dataTable = new DataTable();
    using (OleDbConnection connection = new OleDbConnection(connectionString))
    {
        OleDbCommand command = new OleDbCommand("SELECT * FROM [Лист1$]", connection);
        connection.Open();
        OleDbDataReader reader = command.ExecuteReader();
        try
        {
            dataTable.Load(reader);
        }
        catch (Exception ex)
        {
        }
        while (reader.Read())
        {
            Console.WriteLine(reader[0].ToString());
        }
        reader.Close();
    }

Got the same error on the line:

 OleDbDataReader reader = command.ExecuteReader(); 
  • Properly formatted code: - Jiraiya
  • You are loading all the tables into the DataSet at once. Load and process them one at a time. Or does even the first one fail to load? - Alexander Petrov
  • @Alexander Petrov There is only one table, and as I understand it, cmd.CommandText = "SELECT * FROM [" + sheetName + "]"; selects a single table. - Jiraiya
  • Yes, but they are still added in a loop. - Alexander Petrov
  • @Alexander Petrov And? It fails on the first table. - Jiraiya

2 answers

Since I asked, I will answer it myself. The error occurs because of the small address space allocated to the process on a 32-bit platform: a 32-bit process can address at most 2 GB by default, no matter how much physical RAM the machine has. The project was built in Visual Studio, which has an option (Debug -> Configuration Manager) to build the project for a 32-bit or a 64-bit platform (the default is Any CPU). In short, I built the project for a 64-bit platform and the error disappeared. P.S. There is one more subtlety: the project uses Microsoft.ACE.OLEDB.12.0 when working with COM objects (I could be wrong here). If you want to build the project for a 64-bit platform, you should have a 64-bit Microsoft Office installed, because it is Office that normally installs the Microsoft.ACE.OLEDB.12.0 driver.
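
As a side note, you can verify at run time which way the project was actually built; a minimal sketch using only standard .NET APIs:

    using System;

    class PlatformCheck
    {
        static void Main()
        {
            // True only when the process itself runs as 64-bit; an
            // "Any CPU / Prefer 32-bit" build prints False even on a 64-bit OS.
            Console.WriteLine("64-bit process: " + Environment.Is64BitProcess);
            Console.WriteLine("64-bit OS: " + Environment.Is64BitOperatingSystem);
            // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit one
            // (works on .NET versions before 4.0 as well).
            Console.WriteLine("IntPtr.Size: " + IntPtr.Size);
        }
    }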

  • Tell me, is this a solution to the question or an addition to it? - Nicolas Chabanovsky

Self-written code for streaming import of xlsx. It reads the Excel data out as text. Does not require non-standard libraries. Works on .NET 2.0. Pass sheet_no = 1 for the first sheet; sheets are numbered starting from 1.

    // Requires: using System; using System.Collections.Generic;
    // using System.IO; using System.Xml;
    public static List<string[]> import_xlsx(Stream file, int sheet_no)
    {
        XmlDocument page = new XmlDocument();
        List<string> sharedString = new List<string>();
        List<string> colums = new List<string>();
        List<string[]> data = new List<string[]>();
        string Query = "xl/worksheets/sheet" + sheet_no + ".xml";
        int ready = 0;
        // Walk the zip local file headers until both the shared-string
        // dictionary and the requested sheet have been read.
        while (ready < 2)
        {
            byte[] head = new byte[30];
            file.Read(head, 0, 30);
            if (head[0] != 'P') break; // zip header signature starts with 'P'
            int i = (head[27] + head[29]) * 256 + head[28]; // extra length (plus filename high byte)
            long paked = head[18] + ((head[19] + (head[20] + head[21] * 256) * 256) * 256); // compressed size
            byte[] nam = new byte[255];
            file.Read(nam, 0, head[26]); // entry name
            if (i != 0) file.Seek(i, SeekOrigin.Current);
            String aname = System.Text.Encoding.ASCII.GetString(nam, 0, head[26]);
            if ((aname == "xl/sharedStrings.xml") || (aname == Query) || (paked == 0))
            {
                long lastpos = file.Position;
                System.IO.Compression.DeflateStream deflate = new System.IO.Compression.DeflateStream(
                    file, System.IO.Compression.CompressionMode.Decompress);
                if (aname == "xl/sharedStrings.xml")
                {
                    XmlDocument xml = new XmlDocument();
                    xml.Load(deflate);
                    foreach (XmlNode node in xml.SelectNodes("/*/*/*"))
                        sharedString.Add(node.InnerText);
                    ready++;
                }
                else if (aname == Query)
                {
                    page.Load(deflate);
                    ready++;
                }
                file.Position = lastpos + paked;
            }
            else file.Seek(paked, SeekOrigin.Current);
        }
        int line = 1;
        foreach (XmlNode n in page.SelectNodes("/*/*/*"))
        {
            if (n.LocalName == "row")
                foreach (XmlNode nn in n.ChildNodes)
                {
                    string xpos = nn.Attributes["r"].Value; // cell reference, e.g. "B5"
                    int line2 = 0;
                    int v = -1;
                    int.TryParse(xpos.Substring(1), out line2);
                    if (line2 != line)
                    {
                        data.Add(colums.ToArray()); /* IMPORT HERE */
                        colums.Clear();
                        line = line2;
                    }
                    if (nn.FirstChild != null)
                        int.TryParse(nn.FirstChild.InnerText, out v);
                    // Shared strings are referenced by index; inline values are taken as-is.
                    colums.Add((nn.Attributes["t"] == null)
                        ? ((nn.FirstChild == null) ? "" : nn.FirstChild.InnerText)
                        : ((v < 0) ? "" : sharedString[v]));
                }
        }
        // Flush the last row after the loop.
        if (colums.Count != 0) data.Add(colums.ToArray()); /* IMPORT HERE */
        return data;
    }
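
A hypothetical usage example (the file name data.xlsx is a placeholder, not part of the original answer):

    // Usage sketch: read the first sheet into memory and dump it.
    using (FileStream fs = new FileStream("data.xlsx", FileMode.Open, FileAccess.Read))
    {
        List<string[]> rows = import_xlsx(fs, 1); // sheets are numbered from 1
        foreach (string[] row in rows)
            Console.WriteLine(string.Join(";", row));
    }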

To speed this up (for more than 10,000 records), I recommend rewriting it so that instead of data.Add (in two places) the data is sent directly to the DBMS (cmd.ExecuteNonQuery inside the loop), as sketched below. For volumes over 2 GB it is better to replace XmlDocument with XmlReader, but that is not so simple.
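
A rough sketch of that idea, assuming SQL Server, a made-up connection string, and a hypothetical table import_rows with a single text column; the insert would replace data.Add inside the loop:

    // Requires: using System.Data; using System.Data.SqlClient;
    // The connection string and the table "import_rows" are assumptions.
    using (SqlConnection conn = new SqlConnection("Data Source=.;Initial Catalog=test;Integrated Security=True"))
    {
        conn.Open();
        using (SqlCommand cmd = new SqlCommand("INSERT INTO import_rows (line) VALUES (@line)", conn))
        {
            SqlParameter p = cmd.Parameters.Add("@line", SqlDbType.NVarChar, -1); // -1 = NVARCHAR(MAX)
            // ...inside the import loop, instead of data.Add(colums.ToArray()):
            p.Value = string.Join(";", colums.ToArray());
            cmd.ExecuteNonQuery();
        }
    }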

An xlsx file is wrapped in a zip container, but it differs from an arbitrary zip in that the sizes of the inner "files" are already recorded in the local headers. By default xlsx entries are compressed with deflate; so far I have not come across one packed any other way. First you need the "dictionary" sharedStrings.xml, and after the dictionary you can read the sheet data from sheet1.xml.
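
For illustration only: the same container structure can be listed with the standard ZipArchive/ZipFile classes (available since .NET 4.5, so not usable under the .NET 2.0 constraint above):

    using System;
    using System.IO.Compression; // plus references to System.IO.Compression
                                 // and System.IO.Compression.FileSystem

    class XlsxStructure
    {
        static void Main()
        {
            // "data.xlsx" is a placeholder file name.
            using (ZipArchive zip = ZipFile.OpenRead("data.xlsx"))
            {
                foreach (ZipArchiveEntry entry in zip.Entries)
                    Console.WriteLine(entry.FullName + " (" + entry.CompressedLength + " bytes packed)");
                // A typical listing contains xl/sharedStrings.xml and
                // xl/worksheets/sheet1.xml, exactly the two entries
                // that import_xlsx above reads by hand.
            }
        }
    }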