I need to create folders and subfolders named after business entities, keeping the names as close as possible to the originals.

At the same time, operating system restrictions have to be taken into account, which adds hassle: Linux and macOS impose virtually none, while Windows has more than enough.

This is the code I ended up with; invalid characters are replaced with a dot.

    private static readonly string NormalizationPattern = string.Format(
        @"([{0}]*\.+$)|([{0}]+)",
        Regex.Escape(string.Concat(new string(Path.GetInvalidPathChars()), "?", "/", "*", "\"")));

    private static readonly string[] DosReservedNames =
    {
        "CON", "PRN", "AUX", "NUL",
        "COM0", "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
        "LPT0", "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
    };

    public static string NormalizePath(string name)
    {
        if (Environment.OSVersion.Platform == PlatformID.Unix || Environment.OSVersion.Platform == PlatformID.MacOSX)
            return name;

        const string replacement = ".";
        var matchesCount = Regex.Matches(name, @":\\").Count;
        string correctName;
        if (matchesCount > 0)
        {
            var regex = new Regex(@":", RegexOptions.RightToLeft);
            correctName = regex.Replace(name, replacement, regex.Matches(name).Count - matchesCount);
        }
        else
            correctName = name.Replace(":", replacement);

        var replace = Regex.Replace(correctName, NormalizationPattern, replacement);
        foreach (var reservedName in DosReservedNames)
        {
            var builder = new List<string>();
            foreach (var folder in replace.Split(Path.DirectorySeparatorChar))
            {
                var changedName = folder;
                if (string.Equals(folder, reservedName, StringComparison.InvariantCultureIgnoreCase))
                    changedName = replacement + reservedName;
                var value = reservedName + '.';
                if (folder.StartsWith(value, StringComparison.InvariantCultureIgnoreCase))
                    changedName = replacement + value + folder.Remove(0, value.Length);
                builder.Add(changedName);
            }
            replace = string.Join<string>(Path.DirectorySeparatorChar.ToString(), builder);
        }
        return replace.TrimEnd(' ', '.');
    }

The root folder is usually chosen in the system and already exists. All nesting levels below it are then created through normalization. That is why, for example, trailing dots and spaces are trimmed only at the very end of the path, not at each level. Maybe that is wrong and the name of each folder should be trimmed instead.

There are tests for it; the cases look roughly like this:

    [Test, Sequential]
    public void CheckNotAllowedNames([Values(
        "test"
        ,@"C:\somename\somename:name"
        ,@"usr\home\somename:name"
        ,@"start < > : "" / \ | ? * end"
        ,"\x15\x3D" // less than ASCII space
        ,"\x21\x3D" // HEX of !, valid
        ,"\x3F\x3D" // HEX of ?, not valid
        ,@"C:\somename\ trailing space "
        ,@"C:\somename\...trailing period..."
        ,@"C:\somename\CON"
        ,@"C:\somename\CON.txt"
        ,@"CON"
        ,@"C:\somename\con.txt\context"
        ,@"home\NUL.liza"
        ,@"home\ NUL.liza"
        ,@"C:\somename\..." // Bad name get the root folder, bug =_=
        ,@"root\..\sub"
        ,@"root\..\"
        ,@".\..\some?folder"
        ,@"root\.." // relative path trimmed, bug =_=
        )] string name,
        [Values(
        "test"
        ,@"C:\somename\somename.name"
        ,@"usr\home\somename.name"
        ,@"start . . . . . \ . . . end"
        ,".="
        ,"!="
        ,".="
        ,@"C:\somename\ trailing space"
        ,@"C:\somename\...trailing period"
        ,@"C:\somename\.CON"
        ,@"C:\somename\.CON.txt"
        ,@".CON"
        ,@"C:\somename\.CON.txt\context"
        ,@"home\.NUL.liza"
        ,@"home\ NUL.liza"
        ,@"C:\somename\"
        ,@"root\..\sub"
        ,@"root\..\"
        ,@".\..\some.folder"
        ,@"root\"
        )] string expected)
    {
        Assert.AreEqual(expected, NormalizePath(name));
    }

First of all, I would like someone to look this over and perhaps find errors I missed.

Second, am I reinventing the wheel, and is there a ready-made normalization somewhere? I searched long and hard but may have missed it, so I ended up building the whole thing myself.


UPD1: I found a problem with relative paths and have no idea how to solve it yet; I added tests that record the current behavior. The .NET API lets you request creation of root\folder\..... and returns the root folder. The MSDN documentation says that names with such dots can be created through the API, but that you should not do it, to avoid problems. So handling relative paths correctly is a separate question.

  • I usually replace forbidden characters with _ . Dots are a really awkward choice. I have not seen any ready-made implementations. - rdorn
  • @rdorn, I also started with underscores, but users complained that it looked ugly, so I switched to dots. There have been no real cases where these dots turn into relative paths, but underscores would indeed be simpler. I am still thinking about the best way to do it. - Monk
  • A crazy idea: find a dot-like character whose code differs from the standard dot; we are working in Unicode, after all. - rdorn
  • @rdorn I tried such dots, and overall they work great =) But here again one has to be careful with relative paths. That leads me to another thought: the path from the settings and the entity names from the business logic should probably be validated separately. - Monk
  • Well, if the root comes from the settings and everything below it from entity names, then yes, it is worth validating the two independently. For relative paths, you can add a check that automatic and manual conversion to an absolute path yield equivalent paths. In theory, that will reveal problems with dots. Certain characters could also be prohibited in entity names, but business logic does not always allow that. - rdorn
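
The check suggested in the last comment could look roughly like the sketch below. It is only an illustration: the class and method names are hypothetical, and the case-insensitive comparison is an assumption that suits Windows. The "manual" path is plain string concatenation, the "automatic" one goes through Path.GetFullPath, and any difference means the runtime rewrote the path (for example, resolved ".." segments).

```csharp
using System;
using System.IO;

public static class PathCheckSketch
{
    // Hypothetical helper, not part of the original code.
    // Returns true when root + relative resolves to exactly the path as written.
    public static bool ResolvesAsWritten(string root, string relative)
    {
        var fullRoot = Path.GetFullPath(root)
            .TrimEnd(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);

        // "Manual" conversion: concatenate without any normalization.
        var manual = fullRoot + Path.DirectorySeparatorChar + relative;

        // "Automatic" conversion: let the runtime normalize the combined path.
        var automatic = Path.GetFullPath(Path.Combine(fullRoot, relative));

        // Assumption: ignore case, as on Windows file systems.
        return string.Equals(manual, automatic, StringComparison.OrdinalIgnoreCase);
    }
}
```

A relative part such as Folder1\.. fails this check because the automatic conversion collapses it to the root. The check is deliberately strict: a rooted relative part or one written with the alternative separator is also reported as a mismatch.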

1 answer

In the end, I clearly separated the logic for the root folder specified in the settings from the logic for the folders of business objects.

Settings folders are validated simply:

    /// <summary>
    /// Checks the storage folder path specified in the settings.
    /// </summary>
    /// <param name="path">Path to the folder.</param>
    /// <returns>True if the path is fine.</returns>
    /// <remarks>The path must exist or be the default value. It must not end with relative path segments.</remarks>
    public static bool ValidateSettingPath(string path)
    {
        if (Equals(path, AppConfig.DownloadFolderName) || Equals(path, AppConfig.DownloadFolder))
            return true;

        if (path.TrimEnd(Path.PathSeparator, Path.AltDirectorySeparatorChar, Path.DirectorySeparatorChar, Path.VolumeSeparatorChar).EndsWith("."))
            return false;

        return Directory.Exists(GetAbsoluteFolderPath(path));
    }

The validation of folders from business objects is essentially the code from the question, but it turned out greatly simplified:

    private static readonly string NormalizationPattern = string.Format(
        @"([{0}]*\.+$)|([{0}]+)",
        Regex.Escape(string.Concat(new string(Path.GetInvalidPathChars()), "?", "/", "*", "\"", ":", "\\")));

    private static readonly string[] DosReservedNames =
    {
        "CON", "PRN", "AUX", "NUL",
        "COM0", "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
        "LPT0", "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
    };

    /// <summary>
    /// Cleans a name of invalid characters.
    /// </summary>
    /// <param name="name">Name.</param>
    /// <returns>The name without invalid characters.</returns>
    /// <remarks>The name must not contain separators; they will be treated as part of the name.</remarks>
    public static string RemoveInvalidCharsFromName(string name)
    {
        if (Environment.OSVersion.Platform == PlatformID.Unix || Environment.OSVersion.Platform == PlatformID.MacOSX)
            return name;

        const string replacement = ".";
        var folder = Regex.Replace(name, NormalizationPattern, replacement);
        foreach (var reservedName in DosReservedNames)
        {
            var reservedNameWithDot = reservedName + '.';
            if (string.Equals(folder, reservedName, StringComparison.InvariantCultureIgnoreCase))
                folder = replacement + reservedName;
            else if (folder.StartsWith(reservedNameWithDot, StringComparison.InvariantCultureIgnoreCase))
                folder = replacement + reservedNameWithDot + folder.Remove(0, reservedNameWithDot.Length);
        }

        // If the name consists entirely of dots and/or spaces, replace it with a constant.
        folder = folder.TrimEnd(' ', '.');
        if (string.IsNullOrWhiteSpace(folder))
            folder = "invalid name";
        return folder;
    }
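
The pattern above builds its character class from Path.GetInvalidPathChars(), and the .NET documentation notes that the returned array is not guaranteed to contain the complete set of invalid characters. A minimal sketch of one alternative, assuming an explicit list is acceptable: it hard-codes the characters Windows reserves in names (< > : " / \ | ? *) plus control characters below the space, and collapses each run into a single replacement, as the `[{0}]+` part of the pattern does. The class and method names are hypothetical, and the trailing-dot and reserved-name handling from the code above is intentionally left out.

```csharp
using System;
using System.Text;

public static class WindowsNameSketch
{
    // Characters that Windows reserves in file and directory names.
    private const string Forbidden = "<>:\"/\\|?*";

    // Hypothetical helper: replaces every run of forbidden or control
    // characters with a single replacement character.
    public static string ReplaceForbidden(string name, char replacement = '.')
    {
        var result = new StringBuilder(name.Length);
        var previousReplaced = false;
        foreach (var ch in name)
        {
            var invalid = ch < ' ' || Forbidden.IndexOf(ch) >= 0;
            if (!invalid)
                result.Append(ch);
            else if (!previousReplaced)
                result.Append(replacement);
            previousReplaced = invalid;
        }
        return result.ToString();
    }
}
```

Because the list is fixed, the result does not depend on the runtime or on the operating system the code happens to run on.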

The current implementation requires calling RemoveInvalidCharsFromName for each nesting level. That is, if the logic requires Folder1 / Subfolder2 / Subfolder3, each folder has to be "normalized" separately:

    var path = Path.Combine(settingFolder, RemoveInvalidCharsFromName("Folder1"));
    path = Path.Combine(path, RemoveInvalidCharsFromName("Subfolder2"));
    path = Path.Combine(path, RemoveInvalidCharsFromName("Subfolder3"));
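
The repeated calls can be folded into one helper. The sketch below is only an illustration: FolderPathBuilder and Build are hypothetical names, and the normalizer is passed in as a delegate so that RemoveInvalidCharsFromName (or any other cleaning function) can be plugged in.

```csharp
using System;
using System.IO;
using System.Linq;

public static class FolderPathBuilder
{
    // Hypothetical helper: normalizes every name on its own and appends it
    // to the root, one nesting level per name.
    public static string Build(string settingFolder, Func<string, string> normalize, params string[] names)
    {
        return names.Aggregate(
            settingFolder,
            (path, name) => Path.Combine(path, normalize(name)));
    }
}
```

With the method above it could be called as FolderPathBuilder.Build(settingFolder, RemoveInvalidCharsFromName, "Folder1", "Subfolder2", "Subfolder3"). Note that Path.Combine discards the accumulated path when a later argument is rooted, so the normalizer must never return a name that starts with a separator.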