I am writing a URL parser in C#. Task: output the top-N domains and the top-N paths. The input and output file paths are taken from the command line, and N is also passed on the command line as an optional flag (how do I implement that correctly?).
It works, but I would like to know how you would write it better. And the flag too, yes!
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main(string[] args)
        {
            // Optional flag: "-n <N>" may precede the input and output paths.
            int N = 0;
            int offset = 0; // index of the first positional argument
            if (args.Length > 0 && args[0] == "-n")
            {
                if (args.Length < 2 || !Int32.TryParse(args[1], out N))
                    throw new FormatException("N is not valid");
                offset = 2;
            }
            if (args.Length < offset + 2)
                throw new ArgumentException("Expected: [-n N] <input> <output>");
            string inputPath = args[offset];
            string outputPath = args[offset + 1];

            string input = File.ReadAllText(inputPath);
            string pattern = @"(http://|https://)(?<domen>[\da-z\.-]+)/(?<path>[\/\w \.-]*)";
            Regex regex = new Regex(pattern, RegexOptions.Multiline | RegexOptions.Compiled);
            MatchCollection matchCollection = regex.Matches(input);

            // Count how often each domain and each path occurs.
            SortedDictionary<string, int> Domens = new SortedDictionary<string, int>();
            SortedDictionary<string, int> Paths = new SortedDictionary<string, int>();
            foreach (Match match in matchCollection)
            {
                string domen = match.Groups["domen"].Value;
                string path = match.Groups["path"].Value;
                int count;
                Domens.TryGetValue(domen, out count); // count stays 0 for a new key
                Domens[domen] = count + 1;
                Paths.TryGetValue(path, out count);
                Paths[path] = count + 1;
            }

            // Domen and Path are two classes holding a string and its occurrence count;
            // they are implemented in another file.
            List<logparser.Domen> SortedDomens = new List<logparser.Domen>();
            foreach (KeyValuePair<string, int> keyValue in Domens)
                SortedDomens.Add(new logparser.Domen(keyValue.Key, keyValue.Value));
            SortedDomens.Sort();

            List<logparser.Path> SortedPaths = new List<logparser.Path>();
            foreach (KeyValuePair<string, int> keyValue in Paths)
                SortedPaths.Add(new logparser.Path(keyValue.Key, keyValue.Value));
            SortedPaths.Sort();

            // Write the general information first, then the domain and path
            // information through a static generic class.
            // Domen and Path override .ToString().
            using (StreamWriter file = new StreamWriter(outputPath, true))
            {
                file.WriteLine("total URLs: {0}, domains: {1}, paths: {2}",
                    matchCollection.Count, SortedDomens.Count, SortedPaths.Count);
            }
            WriteToFile<logparser.Domen>.writetofile(SortedDomens, outputPath, N);
            WriteToFile<logparser.Path>.writetofile(SortedPaths, outputPath, N);
        }
    }
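For the optional flag, one possible approach is to scan the arguments once, consume `-n` together with its value wherever it appears, and treat everything else as positional. The sketch below is only an illustration: the `Options` and `ArgsParser` names, the default of 10, and the rule that N must be positive are all assumptions, not part of the code above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical holder for the parsed command line (not part of the original code).
public sealed class Options
{
    public int TopN;
    public string InputPath;
    public string OutputPath;
}

public static class ArgsParser
{
    // Assumed default used when "-n" is not given.
    public const int DefaultTopN = 10;

    // Accepts: [-n N] <input> <output>, with "-n N" allowed at any position.
    public static Options Parse(string[] args)
    {
        var options = new Options { TopN = DefaultTopN };
        var positional = new List<string>();

        for (int i = 0; i < args.Length; i++)
        {
            if (args[i] == "-n")
            {
                int n;
                // The flag must be followed by a positive integer.
                if (i + 1 >= args.Length || !int.TryParse(args[i + 1], out n) || n <= 0)
                    throw new ArgumentException("-n must be followed by a positive integer");
                options.TopN = n;
                i++; // skip the value that was just consumed
            }
            else
            {
                positional.Add(args[i]);
            }
        }

        if (positional.Count != 2)
            throw new ArgumentException("Usage: [-n N] <input> <output>");

        options.InputPath = positional[0];
        options.OutputPath = positional[1];
        return options;
    }
}
```

With this shape, `Main` would call `ArgsParser.Parse(args)` once and never index into `args` directly, so the same code path handles the case where the flag is present and the case where it is omitted.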
Why not use Uri? It splits the string into the necessary parts by itself, and you can then easily pull them out, whereas regular expressions are evil ... - EvgeniyZ
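A minimal sketch of what the comment seems to suggest, assuming each candidate URL has already been isolated from the input text (for example by splitting on whitespace, which is an assumption about the log format). The `UrlStats`, `TrySplit`, and `TopN` names are hypothetical. Note that `Uri.AbsolutePath` keeps the leading `/`, unlike the `path` group in the regular expression above, and `Uri.Host` is returned in lower case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UrlStats
{
    // Splits an absolute http/https URL into host and path using System.Uri.
    // Returns false for anything that is not a well-formed http(s) URL.
    public static bool TrySplit(string candidate, out string host, out string path)
    {
        host = null;
        path = null;
        Uri uri;
        if (!Uri.TryCreate(candidate, UriKind.Absolute, out uri))
            return false;
        if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps)
            return false;
        host = uri.Host;          // lower-cased host name
        path = uri.AbsolutePath;  // path without query string or fragment
        return true;
    }

    // Counts occurrences and returns the n most frequent values;
    // ties are broken by ordinal string order so the result is deterministic.
    public static List<KeyValuePair<string, int>> TopN(IEnumerable<string> values, int n)
    {
        return values
            .GroupBy(v => v, StringComparer.Ordinal)
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.Ordinal)
            .Take(n)
            .ToList();
    }
}
```

Collecting the hosts and paths into two lists and passing each to `TopN` would replace both the manual dictionary counting and the separate sort step, at the cost of relying on `Uri`'s own notion of what counts as a valid URL.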