I am writing a URL parser in C#. Task: print the top-N domains and the top-N paths. The input and output file paths are taken from the command line, and N is passed on the command line as an optional flag (how do I implement that correctly?).

It works, but I would like to know how you would write it better. That includes the flag handling too!

    class Program
    {
        static void Main(string[] args)
        {
            int N = 0;
            if (args[0] == "-n")
                if (!Int32.TryParse(args[1], out N))
                    throw new FormatException("N is not valid");

            string input = File.ReadAllText(args[2]);
            string pattern = @"(http://|https://)(?<domen>[\da-z\.-]+)/(?<path>[[\/\w \.-]*)";
            Regex regex = new Regex(pattern, RegexOptions.Multiline | RegexOptions.Compiled);
            MatchCollection matchCollection = regex.Matches(input);

            SortedDictionary<string, int> Domens = new SortedDictionary<string, int>();
            SortedDictionary<string, int> Paths = new SortedDictionary<string, int>();
            for (int i = 0; i < matchCollection.Count; i++)
            {
                if (Domens.ContainsKey(matchCollection[i].Groups["domen"].ToString()))
                    (Domens[matchCollection[i].Groups["domen"].ToString()])++;
                else
                    Domens.Add(matchCollection[i].Groups["domen"].ToString(), 1);

                if (Paths.ContainsKey(matchCollection[i].Groups["path"].ToString()))
                    (Paths[matchCollection[i].Groups["path"].ToString()])++;
                else
                    Paths.Add(matchCollection[i].Groups["path"].ToString(), 1);
            }

            // Domen and Path are two classes with two fields each (a string and
            // its occurrence count); they are implemented in another file
            List<Domen> SortedDomens = new List<Domen>();
            foreach (KeyValuePair<string, int> keyValue in Domens)
            {
                SortedDomens.Add(new logparser.Domen(keyValue.Key, keyValue.Value));
            }
            SortedDomens.Sort();

            List<Path> SortedPaths = new List<Path>();
            foreach (KeyValuePair<string, int> keyValue in Paths)
            {
                SortedPaths.Add(new logparser.Path(keyValue.Key, keyValue.Value));
            }
            SortedPaths.Sort();

            // Write the summary first, then the domain and path details
            // via a static generic class
            // Domen and Path both override ToString()
            using (System.IO.StreamWriter file = new System.IO.StreamWriter(args[3], true))
            {
                // Three separate arguments, one per placeholder
                file.WriteLine("total URLs: {0}, domains: {1}, paths: {2}",
                    matchCollection.Count, SortedDomens.Count, SortedPaths.Count);
                file.Close();
            }

            WriteToFile<Domen>.writetofile(SortedDomens, args[3], N);
            WriteToFile<Path>.writetofile(SortedPaths, args[3], N);
        }
    }

  • Did you know that C# has a wonderful class called Uri? It splits the string into the necessary parts by itself, and you can then easily pull them out. Regular expressions are evil... - EvgeniyZ
  • Yes, of course, but this is largely a learning exercise. Sorry, I did not mention that in the question. - zhuk
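
To illustrate the Uri suggestion from the comment, here is a minimal sketch of how one URL could be split into host and path without a regular expression. The class name `UriDemo`, the method `Split`, and the sample URL are made up for this example; restricting the scheme to http/https is my assumption, chosen to match the pattern in the question.

```csharp
using System;

public static class UriDemo
{
    // Returns (host, path) for an absolute http/https URL,
    // or null if the text cannot be parsed as one.
    public static (string Host, string Path)? Split(string text)
    {
        if (Uri.TryCreate(text, UriKind.Absolute, out var uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps))
        {
            return (uri.Host, uri.AbsolutePath);
        }
        return null;
    }

    public static void Main()
    {
        var parts = Split("https://example.com/docs/index.html?page=2");
        if (parts != null)
        {
            Console.WriteLine(parts.Value.Host); // example.com
            Console.WriteLine(parts.Value.Path); // /docs/index.html
        }
    }
}
```

Note that `AbsolutePath` keeps the leading slash and drops the query string, so the counts may differ slightly from what the regular expression in the question produces.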

1 answer

If you use a ready-made argument-parsing package (PowerArgs), it could look something like this:

    using PowerArgs;
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;

    namespace ConsoleApp30
    {
        public class Arguments
        {
            [ArgShortcut("n")]
            public int Top { get; set; }

            [ArgPosition(0)]
            [ArgRequired(PromptIfMissing = true)]
            public string InputFileName { get; set; }
        }

        class Program
        {
            static void Main(string[] args)
            {
                var arguments = Args.Parse<Arguments>(args);

                // One URL per line is expected in the input file
                var urls = File.ReadAllLines(arguments.InputFileName)
                    .Select(l => new Uri(l))
                    .ToList();

                PrintTopN(urls, "Host", u => u.Host, arguments.Top);
                PrintTopN(urls, "Path", u => u.AbsolutePath, arguments.Top);
            }

            private static void PrintTopN(List<Uri> urls, string byText, Func<Uri, string> selector, int top)
            {
                Console.WriteLine("By " + byText);

                // Group by the selected part and order by frequency, most common first
                var groups = urls
                    .GroupBy(selector)
                    .OrderByDescending(g => g.Count())
                    .Select(g => g.Key);

                // top == 0 (flag not given) means "print everything"
                if (top > 0)
                {
                    groups = groups.Take(top);
                }

                // Use File.WriteAllLines if the output should go to a file
                foreach (var g in groups)
                {
                    Console.WriteLine(g);
                }
            }
        }
    }

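
Since this is a learning task, here is also a rough sketch of handling the optional `-n` flag by hand, without a package. The names `Options` and `OptionsParser`, the usage text, and the rule that the flag may appear in any position are my own assumptions, not something from the question. A value of 0 means "no limit", matching the `top > 0` check above.

```csharp
using System;
using System.Collections.Generic;

public sealed class Options
{
    public int Top { get; set; }                  // 0 means "no limit"
    public string InputPath { get; set; } = "";
    public string OutputPath { get; set; } = "";
}

public static class OptionsParser
{
    // Accepts: [-n <count>] <input> <output>, with the flag in any position.
    public static Options Parse(string[] args)
    {
        var options = new Options();
        var positional = new List<string>();

        for (int i = 0; i < args.Length; i++)
        {
            if (args[i] == "-n")
            {
                // The flag must be followed by a non-negative integer.
                if (i + 1 >= args.Length
                    || !int.TryParse(args[i + 1], out int top)
                    || top < 0)
                {
                    throw new ArgumentException("-n expects a non-negative integer");
                }
                options.Top = top;
                i++; // skip the value that was just consumed
            }
            else
            {
                positional.Add(args[i]);
            }
        }

        // Everything that is not the flag must be exactly the two file paths.
        if (positional.Count != 2)
        {
            throw new ArgumentException("usage: [-n <count>] <input> <output>");
        }

        options.InputPath = positional[0];
        options.OutputPath = positional[1];
        return options;
    }
}
```

With this approach `Main` would call `OptionsParser.Parse(args)` once and no longer index into `args` directly, so leaving out `-n` does not shift the positions of the file paths.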
  • Thank you, I didn’t know about PowerArgs - zhuk