End of sequence in IEnumerator <T>

Question

In my opinion, the bool MoveNext() method violates one of the principles of SOL I D because combines the method of moving to the next element and obtaining a sign of reaching the end of the sequence. This causes problems when the iterator is used not in the simplest case in one foreach loop, but as an argument in nested method calls. Those. when used in an external method, passed to an internal method that calls MoveNext() . It turns out that only the internal method will know about the end of the sequence, and the external method will not be able to find out about it, except in the returned parameter from the internal one. But do not pass the same bool from each method called! This will make the code just terrifying. It would be much more logical in IEnumerator to make another property of the end attribute. Moreover, such information is actually always available in the iterator class itself and is returned in the displacement method.

Example:

 public class Parser { void Parse(IEnumerator<char> enumerator) { // Допустим есть последовательности букв, разделенных пробелами enumerator.Reset(); while (enumerator.MoveNext()) { if (!char.IsWhiteSpace(enumerator.Current)) // пробелы пропускаем ParseWord(enumerator); // не понятно в каком положении последовательность // Далее должна быть обработка итератора и анализ enumerator.Current } } void ParseWord(IEnumerator<char> enumerator) { // читаем пока буквы while (char.IsLetter(enumerator.Current)) { // ... // если при переходе встретили конец последовательности, выходим if (!enumerator.MoveNext()) { break; // Отсюда начнутся проблемы, т.к.выйдя из метода, // теряется информация о конце последовательности. } } } }

As you can see, when exiting the ParseWord method, 2 situations are possible - either we are on the character following the word, or we have moved to the end of the sequence. The Parse method cannot call MoveNext (), because otherwise, it loses the character, and when accessing enumerator.Current risks an exception. One would think that by the exception it is possible to determine the end of the sequence, but this is not so. Firstly, an InvalidOperationException exception may occur for some other reason, and secondly, in the case of an array it will be, but in the case of a List<T> or string it will not.

I also want to note that in Java, as far as I understand, the situation is not particularly better. There is a combination of the transition method and getting the element itself:

Interface Iterator<E>
- boolean hasNext() - Returns true if the iteration has more elements. (In other words, returns true if you’re rather than a throwing an exception.)
- E next() - Returns the next element in the iteration. Throws: NoSuchElementException - iteration has no more elements.

Similarly, in the example above in the case of Java, we will acquire information about the end of the sequence, but we will lose the opportunity to analyze the element itself.

As a solution to this problem, I saw the creation of a template wrapper class for an arbitrary iterator that would catch and store the values returned from MoveNext() and then be provided as a property.

Ideally, I would like to have this:

 public interface IEnumerator { object Current { get; } bool MoveNext(); bool HasValue { get; } // то же что и MoveNext, но без перемещения void Reset(); }

But the question is whether there really is an architectural interface problem or maybe there is a solution that allows you to use an iterator in many nested methods without passing the value obtained from MoveNext() ?

@VladD: I apologize, but I did not quite understand about caching. I did not begin to understand for a long time, but having understood that most likely we are talking about different things, I decided to spend time on my example. More precisely, even 2 examples. 1st with extended enumerator and 2nd with standard. And what was the surprise that in the end they turned out to be almost the same - only 3 differences.

 // Внимание! // Допущение: вложенный энумератор должен управляться только логикой класса-обертки и не должен учавствовать в других операциях. // Допустим в интерфейсе итератора существует еще одно свойство public interface IEnumeratorEx<out T> : IEnumerator<T> { // Признак конца последовательности // Он же то, что возвращает MoveNext() // Он же, означающий наличие значения, которое можно получить bool HasValue { get; } } // Класс с расширенным энумератором (+HasValue) public class FilteringEnumeratorEx<T> : IEnumeratorEx<T> { IEnumeratorEx<T> wrapped; // Оригинальная последовательность, подлежащая фильтрации Func<T, bool> filter; // Фильтр public T Current { get { // Если вложенный итератор не пройден до конца, значит текущий элемент был найден, // отфильтрован и должен быть возвращен if (HasValue) return wrapped.Current; else throw new InvalidOperationException("Энумератор достиг конца. Значение не может быть получено."); } } object IEnumerator.Current { get { return Current; } } public bool HasValue { get { return wrapped.HasValue; } } public FilteringEnumeratorEx(IEnumeratorEx<T> wrapped, Func<T, bool> filter) { this.wrapped = wrapped; this.filter = filter; } public bool MoveNext() { if (!HasValue) return false; while (wrapped.MoveNext() && !filter(wrapped.Current)) ; return HasValue; } public void Reset() { wrapped.Reset(); } public void Dispose() { wrapped.Dispose(); } } // Класс со стандартным энумератором public class FilteringEnumerator<T> : IEnumerator<T> { IEnumerator<T> wrapped; // Оригинальная последовательность, подлежащая фильтрации Func<T, bool> filter; // Фильтр // Поскольку вложенный энумератор не имеет отдельного признака конца, кэшируем последнюю MoveNext() // Иными словами, изобретаем костыли // Хотя, раз уж есть такое полезное свойство, то почему бы его не выставить наружу, пусть пользуются. public bool HasValue { get; private set; } // 1 отличие public T Current { get { // Если вложенный итератор не пройден до конца, значит текущий элемент был найден, // отфильтрован и должен быть возвращен if (HasValue) return wrapped.Current; else throw new InvalidOperationException("Энумератор достиг конца. Значение не может быть получено."); } } object IEnumerator.Current { get { return Current; } } public FilteringEnumerator(IEnumerator<T> wrapped, Func<T, bool> filter) { this.wrapped = wrapped; this.filter = filter; HasValue = true; // -1 элемент всегда должен позволять шагнуть дальше // 2 отличие } public bool MoveNext() { if (!HasValue) // Если предыдущая MoveNext вернула false, значит ушли в конец return false; // - вернем признак конца. while ((HasValue = wrapped.MoveNext()) && !filter(wrapped.Current)) // 3 отличие ; return HasValue; } public void Reset() { wrapped.Reset(); } public void Dispose() { wrapped.Dispose(); } }

I can conclude that the lack of a HasValue property in the iterator being wrapped doesn’t really interfere with adding it to the wrapper. There is practically no difference in complexity.

I think it is worth explaining which of the principles is violated in my opinion.

Interface Segregation Principle (ISP). Clients should not depend on methods that they do not use. Many interfaces specifically designed for customers are better than one general-purpose interface. Too “thick” interfaces should be divided into smaller and more specific ones so that the clients of small interfaces know only about the methods they need in their work. As a result, when changing the interface method, clients that do not use this method should not change.

This principle, I think, is usually perceived regarding the interface as an interface in a programming language. But if you think about it, this principle is much more general and concerns the interaction interface, which is also any method, function, and even the whole system. It even seems to me that, in fact, it is still the same principle of a single duty (Single responsibility principle). The only responsibility should have not only the object, but also its interaction interface. At the same time, each method of this interface in turn should also have a sole responsibility. Any entity should be split as small as possible, but no more.

In the interface of interaction with the iterator there should be only those methods that relate to its work. To do this, you need to be able to receive items, sort through them (advance on them), as well as receive information about whether we have finished our search or not. Further crushing is no longer possible, because removing any of the features will make working with the interface impossible.

Now you need to design how to use these enumerator capabilities. Possible options:

 // Простой enumerator public interface ISimpleEnumerator { // получить текущий элемент object Current { set; } // продвинуться дальше void MoveNext(); // проверить наличие текущего элемента bool HasValue { get; } } // C# enumerator public interface ICSharpEnumerator { object Current { get; } // попытка переместиться на следующий элемент bool MoveNext(); } // Java enumerator public interface IJavaEnumerator { // остались ли еще впереди элементы bool HasNext { get; } object Next(); } // Super enumerator public interface ISuperEnumerator { // пытаемся получить следующий элемент // возвращает признак его получения и сам элемент bool MoveNext(out object current); } // Все тесты предполагают нахождение итератора в -1 позиции // т.е. перед первой попыткой чтения. // В начальном состоянии доступ к любому методу/свойству // кроме MoveNext некорректен и скорее всего должн вызывать исключение. public class TestClass { void SimpleEnumeratorTest(ISimpleEnumerator e) { for (;;) { e.MoveNext(); if (!e.HasValue) break; var x = e.Current; } } void CSharpEnumeratorTest(ICSharpEnumerator e) { while (e.MoveNext()) { var x = e.Current; } } void JavaEnumeratorTest(IJavaEnumerator e) { while (e.HasNext) { var x = e.Next(); } } void SuperEnumeratorTest(ISuperEnumerator e) { object x; while (e.MoveNext(out x)) { // use x } } }

As you can see, the possibilities of each of the 4 enumerators are the same, but the difference is in how the interaction interface is designed with each of them. If you follow the principle of common responsibility or the principle of separation of interfaces, then each of the methods should also be elementary, but no more. So These principles are satisfied by a simple enumerator of 3 methods / properties, and the ISuperEnumerator with only 1 method is the most disturbed.

At first, it may seem that only 1 method is very convenient, but in practice it will surely appear that a lot of problems will start to arise due to the fact that different data from the MoveNext method and so on will be required in different places of the algorithm you have to cache and drag them all the time. Within one method, this may not cause difficulties, and when the call stack is affected, then they will manifest themselves fully, since visibility of local variables will disappear, and it will be impossible to get them again from the interface. As a result, an adapter will be constructed that brings this super interface to the simplest interface. Actually it is a crutch.

If you pay attention to all kinds of APIs, then you can see that most often each interaction is made as small and elementary as possible. This is exactly where the principle of ISP is manifested (many interfaces specifically designed for customers are better than one general-purpose interface). It could also be written in the following form: many methods / functions specifically designed for clients are better than one general purpose method / function.

Although SRP and ISP look different, they actually have the same basic principle - crushing the system to achieve Low coupling and High cohesion, see GRASP. The basis is the goal to simplify the system by reducing the number of links in it.

I do not pretend to the truth, I just expressed my opinion.

unfortunately, comments are not transferred - the presence of the HasNext property will require the iterator (from all its implementations!) to get this property without extracting Next from the real source.
Which, in many cases, is difficult - for example, when retrieving data from the database.
At the same time, almost all existing code (clients of the current implementation) works fine without HasNext.
And you, for the sake of a single client (one case), offer to extend the basic interface, forcing all clients to depend on a thicker interface, which violates isp.
"At the expense of the ISP, I literally do not perceive the interface as an interface. An interface can be one method and anything. I crush it into smaller methods. Instead of the first more general method, I make 2 more specialized ones."
- and it would be worthwhile to perceive the interface precisely as a “public class interface” == a set of methods.
The fact is that your "special" perception of the interface distorts the principle of the ISP to the opposite.
And precisely because of this, when you apply it, you get the opposite effect - instead of getting separate, small, segregated interfaces - one big fat interface
Come to the local chat - chat.stackexchange.com/rooms/22462/stack-overflow-- - it is more suited to managing long discussions.
Your implementation of FilteringEnumeratorEx is incorrect.
Imagine that your filter is such that the resulting sequence is empty.
Then if you create FilteringEnumeratorEx and query HasNext , you get true , which is wrong.
The semantics of using the enumerator requires that MoveNext () be called first.
Prior to this, the enumerator position is logically up to the 1st element.
In this position, the Current and HasNext / HasValue values do not make sense.
In the case of FilteringEnumeratorEx, the calls and the responsibility for the reaction will be redirected to the nested enumerator.
False - definitely not right, because it is a sign of the end.
So considering the semantics of use, I consider that everything is ok.

Answer 1 · 2016-08-05T18:30:57

In ParseWord you access enumerator.Current without first calling enumerator.MoveNext - and, strictly speaking, the state of enumerator.Current is not defined. Those. ParseWord hopes that someone in advance, before him, brought the enumerator into a valid state.

According to the discussion in the comments:

You see the principle of ISP as "dividing methods into several". This is not quite the correct interpretation. The essence of the ISP method on the example of Enumerator.

There is a class that can
- move forward ( MoveNext ).
- return current element.
- predict whether there is another element in the sequence ( HasNext ).
There are two categories of code (clients) using this object:
- Clients A - they just need to step forward and read the current element.
- Customers B - they need to step forward, read the current element, and know if it is possible to read the next one.

So, the ISP principle states that this object should provide two separate (segregated) interfaces.

Interface A, which is provided to clients A - MoveNext + Current
Interface B, which is provided to clients B - MoveNext + Current + HasNext

This is done to ensure that clients A are not dependent on the interface, part of which they do not use. Literally, the definition of the principle:

Client should not be forced.

In HasNext to your example, clients A who do not use HasNext should not depend on the interface in which there is a HasNext method.

So, IEnumerator is an interface for category A customers. There is no standard interface for category B customers. Accordingly, there is no its implementation in standard enumerators.

You wrote the client code of category B. Substituting in it a class that implements only an interface of category A (he was not going to implement the interface for B!). Naturally, this causes some inconvenience . But this has nothing to do with the violation of the ISP - the problem is caused by the use of a particular class for other purposes.

Any code that uses IEnumerable should not make assumptions about the state of the sequence. Those. he must first call MoveNext , and then Current. In practice, this means that you cannot use the same Current element in several methods - which makes the IEnumerable in its pure form inapplicable during parsing.

You should make a wrapper on top of an IEnumerable that caches the result of the last call to MoveNext . This is how it is decided, for example, in the C # lexer .

It uses the SlidingTextWindow wrapper, which, if necessary, rewinds back:

 /// Keeps a sliding buffer over the SourceText of a file for the lexer. Also /// provides the lexer with the ability to keep track of a current "lexeme" /// by leaving a marker and advancing ahead the offset. The lexer can then /// decide to "keep" the lexeme by erasing the marker, or abandon the current /// lexeme by moving the offset back to the marker.

And the first thing that is declared in it is a sign of the end of the file. With a very detailed comment on exactly your problem:

 /// <summary> /// In many cases, eg PeekChar, we need the ability to indicate that there are /// no characters left and we have reached the end of the stream, or some other /// invalid or not present character was asked for. Due to perf concerns, things /// like nullable or out variables are not viable. Instead we need to choose a /// char value which can never be legal. /// /// In .NET, all characters are represented in 16 bits using the UTF-16 encoding. /// Fortunately for us, there are a variety of different bit patterns which /// are *not* legal UTF-16 characters. 0xffff (char.MaxValue) is one of these /// characters -- a legal Unicode code point, but not a legal UTF-16 bit pattern. /// </summary> public const char InvalidCharacter = char.MaxValue;

Those. they solved the problem by casting Current into a known invalid state + with checks of the form TextWindow.PeekChar == SlidingTextWindow.InvalidCharacter in the calling code.

It is not entirely clear from which it follows that each method must first make a step, and then read.
If we assume that one token (I don’t quite understand what exactly is meant by this, but I assume some sequence of characters) follows another, then having read one token, we will be at the beginning of another.
In my example, without reading up the space, we will not understand that the word is over.
analysis of the word (token) ends at the moment when we find ourselves on a "foreign" token.
coming out of the method of analyzing the token, we stand on the following.
Perhaps it would be worthwhile to clarify that for the public method, yes, the state of the iterator can be relatively any and ideally should be done first with Reset + MoveNext.
But if we are talking about private methods that are highly integrated with each other, then I think here most likely everything depends on the agreement that was adopted in the logic of the class.
@RomanEmroline is precisely because you need to read the characters and unwind back - pure IEnumerable does not suit you.
if you need something like IEnumerable + HasNext - then take it and write it yourself.
It is strange to expect from the general interface in the standard library something that is simply not there.
@RomanEmrolin briefly - SOLID violates not IEnumerable, but your class, which simultaneously deals with the analysis of tokens + tracking the state of the sequence.
Вам просто надо взять и вынести отслеживание/"получение признака достижения конца последовательности" в отдельный класс-обертку поверх IEnumerable.
вы ждете что IEnumerable будет и вычитывать элементы и отслеживать признак конца последовательности (что нарушит S). На самом деле IEnumerable просто вычитывает элементы. Все остальное - не его дело.
@РоманЕрмолин возможно. Осталось выяснить, какое же из них общепринятое, а какое - совсем не isp. Смотрите: oodesign.com/interface-segregation-principle.html . там есть пример применения. Был один интерфейс с двумя методами, стало два с одним. Instead of one fat interface many small interfaces are preferred based on groups of methods, each one serving one submodule.

Answer 2 · 2016-08-05T20:11:58

Хороший интерфейс берётся далеко не с потолка. Хороший интерфейс придумывается так, чтобы с ним было легко работать, и чтобы имплементация его не приводила к ошибкам. Нарушение любого из этих правил ведёт к бесполезности интерфейса.

Модификация интерфейса IEnumerator<T> , предлагаемая в вопросе, не выбрана разработчиками C# потому, что она намного хуже для имплементаторов, чем текущий интерфейс.

Дело в том, что предлагаемый вами метод HasHext требует мутации состояния , что приведёт к проблемам при попытке композиции нескольких последовательностей.

Пример: пусть энумератор обладает интерфейсом

 public interface IEnumerator<T> { T Current { get; } bool MoveNext(); bool HasNext { get; } void Reset(); }

Попробуем заимплементировать LINQ-оператор Where . Для HasNext нам придётся в реальности вычитывать элементы из последовательности и кэшировать их! Go.

Стандартное начало:

 public class FilteringEnumerator<T> { IEnumerator<T> wrapped; Func<T, bool> filter; public FilteringEnumerator(IEnumerator<T> wrapped, Func<T, bool> filter) { this.wrapped = wrapped; this.filter = filter; }

Посмотрим, как будет выглядеть MoveNext . Мы должны будем проверить, есть ли в внутренней последовательности значения, проходящие фильтр. Для этого нам придётся эти значения фактически прочитать, и закешировать, на случай, если будет вызван MoveNext . Теперь, если закешированное значение уже есть , нужна специальная логика.

  bool cacheOK = false; T cache; public bool HasNext { get { if (cacheOK) return true; while (wrapped.MoveNext()) { if (filter(wrapped.Current)) { cache = wrapped.Current; cacheOK = true; return true; } } return false; } }

Теперь Current . Мы не можем просто вернуть wrapped.Current , потому что внутренняя последовательность могла уже убежать вперёд. Поэтому нам нужно «материальное» свойство.

  public T Current { get; private set; }

Теперь MoveNext . Если у нас есть закешированное значение, то просто отдаём его. Иначе нам придётся искать дальше.

  bool MoveNext() { if (!cacheOK) { var result = HasNext; if (!result) return false; } // тут у нас есть значение в кеше, вытягиваем его Current = cache; cacheOK = false; return true; } void Reset() { throw new NotSupportedException(); } }

Правда, сложная логика?

Для сравнения, смотрите, что будет с обычным определением.

 public interface IEnumerator<T> { T Current { get; } bool MoveNext(); void Reset(); }

То же стандартное начало:

 public class FilteringEnumerator<T> { IEnumerator<T> wrapped; Func<T, bool> filter; public FilteringEnumerator(IEnumerator<T> wrapped, Func<T, bool> filter) { this.wrapped = wrapped; this.filter = filter; }

И имплементация:

  T Current => wrapped.Current; bool MoveNext() { while (wrapped.MoveNext()) if (filter(wrapped.Current)) return true; return false; } void Reset() { throw new NotSupportedException(); } }

See the difference?

Итераторы с lookahead практически никогда не нужны в реальности. В тех немногих случаях, когда они действительно нужны (например, имплементация токенизатора), несложно заимплементировать lookahead самостоятельно:

 class Lookehead<T> : IDisposable { IEnumerator<T> en; public Lookehead(IEnumerator<T> en) { this.en = en; MoveForward(); } public bool HasValue { get; private set; } public T Value { get; private set; } public void MoveForward() { HasValue = en.MoveNext(); if (HasValue) Value = en.Current; } public void Dispose() => en.Dispose(); }

При этом ваш код получается очень простым:

 public class Parser : IDisposable { public Parser(IEnumerable<char> seq) { lookahead = new Lookehead<char>(seq.GetEnumerator()); } Lookahead<char> lookahead; public void Dispose() => lookahead.Dispose(); void Parse() { while (lookahead.HasValue) { if (!char.IsWhiteSpace(lookahead.Value)) // пробелы пропускаем ParseWord(); // мы за концом слова или текст завершён else lookahead.MoveForward(); } } void ParseWord() { // предусловие: если мы тут, то lookahead.HasValue == true // читаем пока буквы while (lookahead.HasValue && char.IsLetter(lookahead.Value)) { // ... lookahead.MoveForward(); } // постусловие: если мы тут, то lookahead.HasValue == false или // lookahead.Value - не буква } }

На счет Lookehead и Parser: именно об этом я и говорю, что разделение метода MoveNext на 2: MoveForward и HasValue (название больше понравилось, чем HasNext, спс за подсказку) позволяет сделать код более понятным и легким. Метод Parse сразу стал работать и не зависеть от хитросплетений Move-ов во вложенных методах. Ну разве так не удобнее, чем с MoveNext? (риторический вопрос)
@РоманЕрмолин: В том-то и дело, что удобно только в этом специальном случае: написании парсера. В практически всех остальных случаях эта функциональность не нужна. И хуже того, наличие этой функциональности повсюду очень затруднило бы имплементацию. Поэтому её нигде и нету, а при случае написать обёртку, как вы видите, вопрос двух минут.
Для меня совсем не очевидно, что затруднило бы имплементацию. Сейчас энумератор кэширует только само значение, что позволяет вызвать Current, а если бы еще кэшировал признак перехода, то и свойство HasNext было бы доступно. Если вообще убрать какое-либо кэширование, то получаем bool MoveNext(out T current); Разница лишь в том на кого перекладывается ответственность за кэширование.
@РоманЕрмолин: Ну у меня ж как раз пример гораздо более сложной имплементации Where для интерфейса с HasHext ( FilteringEnumerator ). Посмотрите!
Только сейчас до меня дошло, что мы говорим о разных вещах! Я взял название HasNext из Java, но подразумевал не следующее, которое еще не хранится в Current, а всего лишь запомненное из MoveNext. Его можно назвать HasValue или LastMoveNext. Да то, что Вы реализовали довольно сложная штука и я бы назвал это предчтением. Даже мысли не было об этом :) Так что прошу пересмотреть свой взгляд на предлагаемую мной идею. Она заключается лишь в том, что кроме Current надо еще кэшировать и то, что возвращает MoveNext(). И только.

End of sequence in IEnumerator <T>

2 answers 2

More articles: