The question does not name a specific site, so I will suggest a technique that works for any site.
The example handles only images referenced by the src attribute of img tags, but the same approach can be extended to images set through background-image. That is more involved, but doable. The JavaScript calls use jQuery for simplicity, assuming the target site already loads it. Plain JavaScript or another library works as well, provided that library is present on the site.
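To give an idea of what the background-image variant might look like, here is a minimal sketch of a script that could be evaluated in the loaded page. The helper names extractBackgroundUrls and collectBackgroundImageUrls are hypothetical and not part of the original solution. The sketch assumes the embedded Chromium supports ES2015 features such as Set and Array.from, and it only sees backgrounds that are visible through computed styles at the moment it runs.

```javascript
// Hypothetical helper: pulls every url(...) target out of a
// `background-image` value such as: url("a.png"), linear-gradient(...)
function extractBackgroundUrls(cssValue) {
  var urls = [];
  // Matches url("..."), url('...') and unquoted url(...).
  var pattern = /url\(\s*(?:"([^"]*)"|'([^']*)'|([^)\s]*))\s*\)/g;
  var match;
  while ((match = pattern.exec(cssValue || '')) !== null) {
    var url = match[1] || match[2] || match[3];
    if (url) {
      urls.push(url);
    }
  }
  return urls;
}

// Hypothetical helper: walks every element of a document and collects the
// distinct background image URLs found in its computed style.
// `doc` and `win` are normally `document` and `window`.
function collectBackgroundImageUrls(doc, win) {
  var found = new Set();
  Array.from(doc.querySelectorAll('*')).forEach(function (element) {
    var value = win.getComputedStyle(element).backgroundImage;
    extractBackgroundUrls(value).forEach(function (url) {
      found.add(url);
    });
  });
  return Array.from(found);
}
```

In the page, the call would be collectBackgroundImageUrls(document, window). Passing the document and window in as parameters is only a design choice that makes the functions easy to try outside a browser.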
Use the CefSharp library for such tasks.
What is it?
This is a managed wrapper around CEF (Chromium Embedded Framework). In other words, you get the power of Chromium under programmatic control.
Why choose CEF / CefSharp?
- You do not have to parse pages yourself (a difficult and thankless task that I strongly recommend avoiding).
- You can work with the page after it has loaded, that is, after its scripts have run.
- You can execute arbitrary JavaScript with the latest language features.
- You can issue AJAX requests from JavaScript and, in the success callback, raise events in the C# code with the AJAX result.
CefSharp Varieties
- CefSharp.WinForms
- CefSharp.Wpf
- CefSharp.OffScreen
The first two are for cases where you need to give users a browser control. They are conceptually similar to WebBrowser in Windows Forms, except that WebBrowser wraps IE rather than Chromium.
We do not need to show a browser to the user here, so we will use the CefSharp.OffScreen version.
Writing the code
Suppose we have a console application, although the application type is up to you.
Install version 51 of the CefSharp.OffScreen NuGet package:
Install-Package CefSharp.OffScreen -Version 51.0.0
On the C# side, every JavaScript array arrives as a List<object>, and the script result itself is wrapped in an object that holds a List<object>, string, bool, or int, depending on what the script returned. To get strongly typed results, create a small ConvertHelper class:
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class ConvertHelper
    {
        // Converts the object list returned for a JavaScript array into T[].
        public static T[] GetArrayFromObjectList<T>(object obj)
        {
            return ((IEnumerable<object>)obj)
                .Cast<T>()
                .ToArray();
        }

        // Converts the object list returned for a JavaScript array into List<T>.
        public static List<T> GetListFromObjectList<T>(object obj)
        {
            return ((IEnumerable<object>)obj)
                .Cast<T>()
                .ToList();
        }

        public static T ToTypedVariable<T>(object obj)
        {
            if (obj == null)
            {
                // Also safe when T is a value type.
                return default(T);
            }

            Type type = typeof(T);
            if (type.IsArray)
            {
                dynamic dynamicResult = typeof(ConvertHelper).GetMethod(nameof(GetArrayFromObjectList))
                    .MakeGenericMethod(type.GetElementType())
                    .Invoke(null, new[] { obj });
                return dynamicResult;
            }

            if (type.IsGenericType && type.GetGenericTypeDefinition() == typeof(List<>))
            {
                dynamic dynamicResult = typeof(ConvertHelper).GetMethod(nameof(GetListFromObjectList))
                    .MakeGenericMethod(type.GetGenericArguments().Single())
                    .Invoke(null, new[] { obj });
                return dynamicResult;
            }

            return (T)obj;
        }
    }
Create a CefSharpWrapper class:
    using System;
    using System.Threading;
    using System.Threading.Tasks;
    using CefSharp;
    using CefSharp.OffScreen;

    public sealed class CefSharpWrapper
    {
        private ChromiumWebBrowser _browser;

        public void InitializeBrowser()
        {
            CefSettings settings = new CefSettings();

            // Disable GPU in WPF and Offscreen until GPU issues have been resolved.
            settings.CefCommandLineArgs.Add("disable-gpu", "1");

            // Perform a dependency check to make sure all relevant resources are in our output directory.
            Cef.Initialize(settings, shutdownOnProcessExit: true, performDependencyCheck: true);

            _browser = new ChromiumWebBrowser();

            // Wait until the browser is initialized.
            AutoResetEvent waitHandle = new AutoResetEvent(false);
            EventHandler onBrowserInitialized = null;
            onBrowserInitialized = (sender, e) =>
            {
                _browser.BrowserInitialized -= onBrowserInitialized;
                waitHandle.Set();
            };
            _browser.BrowserInitialized += onBrowserInitialized;
            waitHandle.WaitOne();
        }

        public void ShutdownBrowser()
        {
            // Clean up Chromium objects. You need to call this in your application,
            // otherwise you will get a crash when closing.
            Cef.Shutdown();
        }

        public Task<T> GetResultAfterPageLoad<T>(string pageUrl, Func<Task<T>> onLoadCallback)
        {
            TaskCompletionSource<T> tcs = new TaskCompletionSource<T>();
            EventHandler<LoadingStateChangedEventArgs> onPageLoaded = null;

            // An event that is fired when the first page has finished loading.
            // It is raised on another thread.
            onPageLoaded = async (sender, e) =>
            {
                // This event is raised twice: once when loading starts
                // and a second time when it has finished.
                if (!e.IsLoading)
                {
                    // Remove the load event handler, because we only want one snapshot of the initial page.
                    _browser.LoadingStateChanged -= onPageLoaded;
                    try
                    {
                        tcs.TrySetResult(await onLoadCallback());
                    }
                    catch (Exception ex)
                    {
                        // Pass callback failures to the awaiting caller instead of losing them.
                        tcs.TrySetException(ex);
                    }
                }
            };

            _browser.LoadingStateChanged += onPageLoaded;
            _browser.Load(pageUrl);
            return tcs.Task;
        }

        public async Task<T> EvaluateJavascript<T>(string script)
        {
            JavascriptResponse javascriptResponse = await _browser.EvaluateScriptAsync(script);
            if (javascriptResponse.Success)
            {
                object scriptResult = javascriptResponse.Result;
                return ConvertHelper.ToTypedVariable<T>(scriptResult);
            }

            throw new ScriptException(javascriptResponse.Message);
        }
    }
Next we call our CefSharpWrapper class from the Main method.
    using System.IO;
    using System.Net;
    using System.Threading.Tasks;

    public class Program
    {
        private static void Main()
        {
            MainAsync().Wait();
        }

        private static async Task MainAsync()
        {
            CefSharpWrapper wrapper = new CefSharpWrapper();
            wrapper.InitializeBrowser();

            // Load the page, then collect the src of every img element.
            string[] imageUrls = await wrapper.GetResultAfterPageLoad("https://yandex.ru", async () =>
                await wrapper.EvaluateJavascript<string[]>(
                    "$('img').map((index, element) => $(element).prop('src')).toArray()"));

            string imageFolder = @"C:\Test";
            if (!Directory.Exists(imageFolder))
            {
                Directory.CreateDirectory(imageFolder);
            }

            using (WebClient client = new WebClient())
            {
                for (int i = 0; i < imageUrls.Length; i++)
                {
                    string imageUrl = imageUrls[i];
                    byte[] fileBytes = await client.DownloadDataTaskAsync(imageUrl);

                    // You could write an algorithm here that picks the correct file extension.
                    string imagePath = Path.Combine(imageFolder, i + ".jpg");
                    File.WriteAllBytes(imagePath, fileBytes);
                }
            }

            wrapper.ShutdownBrowser();
        }
    }
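If the target site does not load jQuery, the script passed to EvaluateJavascript can be written in plain JavaScript instead. The following is only a sketch: the function name collectImageUrls is hypothetical, and skipping empty and duplicate URLs is an addition that the jQuery expression above does not perform.

```javascript
// Hypothetical jQuery-free helper: returns the distinct, non-empty src
// values of all <img> elements in the given document.
function collectImageUrls(doc) {
  var seen = new Set();
  var urls = [];
  Array.from(doc.querySelectorAll('img')).forEach(function (img) {
    var src = img.src;
    if (src && !seen.has(src)) {
      seen.add(src);
      urls.push(src);
    }
  });
  return urls;
}
```

Inside the page, the function would be called as collectImageUrls(document). For a one-off script, an expression such as Array.from(document.querySelectorAll('img'), img => img.src) should yield the same list as the jQuery version, without the filtering.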
client.DownloadFileAsync(uri, "picture.jpg"); without waiting is not very correct. - VladD
background-image property - Vadim Ovchinnikov