I have two pieces of news for you: good and bad.
I'll start with the bad: there are no complete, ready-made open-source solutions (at least none that I know of).
And the good news is that the existing free tools are enough to build the solution you need.
I'll tell you how I solved a similar problem; maybe my experience will help you.
Of all the obvious approaches, I chose to transmit the DOM tree from the client to the operator and then broadcast tree mutations and user actions. This approach lets you abstract away the complexity of the target site and keeps the solution as flexible and versatile as possible.
The whole task can be divided into several parts:
- Get the DOM tree on the client and serialize it
- Transfer it to the operator
- Restore the tree on the operator's side
- Pick up tree mutations from the client and transfer them to the operator
- (Optional) Send user events (scrolling, navigation, keystrokes, etc.) from the client to the operator
- (Optional) Transfer text selection from operator to client
1. Getting the DOM tree and serializing it
Getting the tree itself is a trivial task. You can use the standard JS DOM API for this. You can start from the root node:
var root = document;
By traversing the child nodes of this node and discarding potentially problematic or useless nodes (such as <script> and <object>), you get the subset of the tree that needs to be serialized so that the operator sees the same picture as the client.
And here the magic begins: the same markup produces different DOM object trees in different browsers. Whitespace between tags is the typical culprit: depending on the browser, a line feed between tags may or may not become an additional text node, and IE8 is the most notable offender here. This problem can be handled too, but I won't go into how, since the topic is quite extensive.
Once you have a subset of the DOM tree, you need to serialize it. If you store each node's type, name, and attributes (where they exist and the node type can have them), you get an ordinary JSON tree.
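To make the traversal and serialization concrete, here is a minimal sketch. The function name `serializeNode`, the skip list, and the output shape (`type`, `name`, `attributes`, `children`, `value`) are my own assumptions, not a fixed format, and it makes no attempt to normalize the browser differences mentioned above.

```javascript
// Node type constants from the DOM specification.
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;
var DOCUMENT_NODE = 9;

// Hypothetical skip list; extend it to suit the target site.
var SKIPPED_TAGS = { SCRIPT: true, OBJECT: true };

// Convert a DOM node (or any object with the same fields) into plain data.
// Returns null for nodes that should not be sent to the operator.
function serializeNode(node) {
  if (node.nodeType === TEXT_NODE) {
    return { type: TEXT_NODE, value: node.nodeValue };
  }
  if (node.nodeType !== ELEMENT_NODE && node.nodeType !== DOCUMENT_NODE) {
    return null; // comments, doctypes, etc. are dropped in this sketch
  }
  if (SKIPPED_TAGS[String(node.nodeName).toUpperCase()]) {
    return null;
  }
  var result = {
    type: node.nodeType,
    name: node.nodeName,
    attributes: {},
    children: []
  };
  var attrs = node.attributes || []; // document nodes have no attributes
  for (var i = 0; i < attrs.length; i++) {
    result.attributes[attrs[i].name] = attrs[i].value;
  }
  var kids = node.childNodes || [];
  for (var j = 0; j < kids.length; j++) {
    var child = serializeNode(kids[j]);
    if (child !== null) {
      result.children.push(child);
    }
  }
  return result;
}
```

In a browser you would call it as `JSON.stringify(serializeNode(document))`. Because it only reads `nodeType`, `nodeName`, `nodeValue`, `attributes`, and `childNodes`, it can also be exercised with plain objects outside a browser.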
2. Passing the serialized tree to the operator
There are many options. I would start with Socket.io; it will save you a lot of headaches in the following steps.
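As a rough sketch of the transport, assuming `socket` is an already connected Socket.io socket: the event name `dom:snapshot` and the message fields are hypothetical, and the server that relays messages from the client's socket to the operator's socket is not shown.

```javascript
// Client side: push a serialized tree to the server.
// `socket` is assumed to be a connected Socket.io socket;
// 'dom:snapshot' is a hypothetical event name.
function sendSnapshot(socket, sessionId, tree) {
  socket.emit('dom:snapshot', {
    session: sessionId,
    sentAt: Date.now(),
    tree: tree
  });
}

// Operator side: register a handler for the same event.
function onSnapshot(socket, handler) {
  socket.on('dom:snapshot', function (message) {
    handler(message.tree, message.session);
  });
}
```

Only `emit` and `on` are used, so the same two functions can carry the mutation and event messages from the later steps under different event names.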
3. Restore the tree on the operator's side
JS allows you to dynamically create empty iframes (without a src attribute) and work with their contents dynamically. Given a tree properly serialized to JSON, reconstructing the DOM object tree from it is no problem.
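A minimal sketch of the reconstruction, assuming the data has the shape `{ type, name, attributes, children }` for elements and `{ type, value }` for text nodes (a hypothetical format; match it to whatever your serializer produces). The name `restoreNode` is also my own.

```javascript
// Rebuild a DOM node from its plain-data form.
// `doc` is the document that will own the new nodes,
// for example the contentDocument of an empty iframe.
function restoreNode(data, doc) {
  if (data.type === 3) { // 3 is the DOM constant for text nodes
    return doc.createTextNode(data.value);
  }
  var element = doc.createElement(data.name);
  var attributes = data.attributes || {};
  for (var key in attributes) {
    if (Object.prototype.hasOwnProperty.call(attributes, key)) {
      element.setAttribute(key, attributes[key]);
    }
  }
  var children = data.children || [];
  for (var i = 0; i < children.length; i++) {
    element.appendChild(restoreNode(children[i], doc));
  }
  return element;
}

// Browser-only usage sketch (htmlTree is the data for the <html> element):
//   var frame = document.createElement('iframe');
//   document.body.appendChild(frame);
//   var doc = frame.contentDocument;
//   doc.replaceChild(restoreNode(htmlTree, doc), doc.documentElement);
```

Creating the nodes through the iframe's own document keeps the restored page isolated from the operator's UI. Some browsers may require waiting for the iframe's load event before touching its document.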
4. Pick up tree mutations from the client and transfer them to the operator
Not many people know this, but the DOM API has a thing called MutationObserver. With it, you can pick up tree mutations as they happen and transfer them to the operator (in real time, if you use Socket.io). Again, serializing and deserializing the changes is a separate task, but I trust you to solve it yourself :)
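Here is one possible sketch of that serialization. The message format and the names `nodePath`, `describeMutation`, and `watchMutations` are assumptions of mine. Nodes are addressed by their child-index path from the root, which only works if the operator's tree keeps the same child order and no nodes were filtered out before the target; a real implementation would more likely assign stable IDs to nodes.

```javascript
// Path of child indexes from `root` down to `node`, e.g. [1, 0].
// Returns null when `node` is not inside `root`.
function nodePath(node, root) {
  var path = [];
  while (node !== root) {
    var parent = node.parentNode;
    if (!parent) {
      return null;
    }
    path.unshift(Array.prototype.indexOf.call(parent.childNodes, node));
    node = parent;
  }
  return path;
}

// Reduce a MutationRecord to plain data (hypothetical message format).
function describeMutation(record, root) {
  var message = { kind: record.type, target: nodePath(record.target, root) };
  if (record.type === 'attributes') {
    message.name = record.attributeName;
    message.value = record.target.getAttribute(record.attributeName);
  } else if (record.type === 'characterData') {
    message.value = record.target.nodeValue;
  } else {
    // childList: added nodes would go through the tree serializer here
    message.added = record.addedNodes.length;
    message.removed = record.removedNodes.length;
  }
  return message;
}

// Browser only: watch everything under `root`, hand each batch to `send`.
function watchMutations(root, send) {
  var observer = new MutationObserver(function (records) {
    var batch = [];
    for (var i = 0; i < records.length; i++) {
      batch.push(describeMutation(records[i], root));
    }
    send(batch);
  });
  observer.observe(root, {
    childList: true,
    attributes: true,
    characterData: true,
    subtree: true
  });
  return observer;
}
```

Note that the paths are computed when the observer callback runs, so they describe the tree after the whole batch of mutations has been applied.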
5. (Optional) Send user events from client to operator
JS allows you to bind to arbitrary window events. Broadcasting these events is a trivial task, provided you can translate node coordinates between the two trees.
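For scrolling, for example, a sketch might look like this. The names `describeScroll` and `watchScroll` and the message shape are hypothetical, and sending the position as a fraction of the scrollable range is just one way to cope with the operator's viewport having a different size.

```javascript
// Scroll position as fractions of the scrollable range, so that it can be
// replayed in an operator viewport of a different size.
function describeScroll(win) {
  var el = win.document.documentElement;
  var maxX = el.scrollWidth - el.clientWidth;
  var maxY = el.scrollHeight - el.clientHeight;
  return {
    kind: 'scroll',
    x: maxX > 0 ? win.pageXOffset / maxX : 0,
    y: maxY > 0 ? win.pageYOffset / maxY : 0
  };
}

// Browser only: forward every scroll event through `send`.
function watchScroll(win, send) {
  win.addEventListener('scroll', function () {
    send(describeScroll(win));
  });
}
```

Clicks and keystrokes can follow the same pattern: reduce the event to plain data, address the target node in a way both sides understand, and send it over the same channel as the mutations.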
6. (Optional) Transfer text selection from operator to client
For cross-browser work with text selection, there is a library called Rangy. Getting and restoring a text selection with it is not difficult. The most interesting task is to correctly calculate the positions of the start and end of the selection within the nodes (remember that the operator and client DOM trees are similar but not equivalent, because of browser differences). But this topic is very extensive and depends heavily on the decisions you made in the preceding steps, so you will have to solve this problem yourself.
As you have probably gathered, this topic is very extensive and does not really fit the Q&A format promoted by StackOverflow. Nevertheless, the question is very interesting. If you have specific questions about the technologies mentioned above, do not hesitate to post them as separate questions.