I have already asked this question before:
Quote: I have been given a task: to pull data from a site (our corporate one, not on the Internet, though I don't think that matters). I don't have access to the database and was told I won't get it; if I want the data, I should parse it from the page. I am new to this in general; until now I have only worked with XML or with whole HTML pages that needed no password. Now I first have to go through authorization, then enter the ID of the person I need into the search and pull out the result. I have spent two days digging into how this can be implemented and found several possible options. I tried HttpClient, but it gives me the login page and goes no further. That is, I request the page that should follow the login, supplying the user name and password, but I am sent back to the page asking for the password. Please help.
There I was advised:
Quote: What type of authorization does the site use? If it is not BASIC, you most likely need to use the POST method, and it is even more likely that HTTPS will be involved. It would be good to put these questions to whoever set you this task. (25 Jun 1:39) a_gura
Don't forget about encoding/decoding the data in GET and POST requests, as well as when receiving data. Otherwise I agree with @a_gura: in most cases authorization requests are POST requests.
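The advice about encoding POST data can be illustrated with a short sketch. It is only an assumption about what such encoding might look like: the class `FormBodySketch`, the method `buildBody`, and the sample password are made up for the example, while the field names `login_name` and `login_password` are the ones used later in this post. The point it shows is that each name and value is URL-encoded exactly once before being joined with `&`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not taken from the book or from any library.
class FormBodySketch {

    // Builds an application/x-www-form-urlencoded body.
    // Every name and value is URL-encoded exactly once (requires Java 10+).
    static String buildBody(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8))
                .append('=')
                .append(URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("login_name", "d.aim");
        fields.put("login_password", "p@ss word"); // made-up sample value
        // prints: login_name=d.aim&login_password=p%40ss+word
        System.out.println(buildBody(fields));
    }
}
```

A body built this way would be written to the connection's output stream as-is; encoding a value a second time would turn `%40` into `%2540` and change what the server receives.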
I started reading Jeff Heaton's book "HTTP Programming Recipes for Java Bots" and worked through the examples, fully confident that this was what I needed. Following one of those examples, I wrote the code below.
I took CookieUtility and FormUtility from the book.
CookieUtility.java
    import java.net.*;
    import java.util.*;

    public class CookieUtility {
        // Map that holds all of the cookie values.
        private Map<String, String> map = new HashMap<String, String>();

        public Map<String, String> getMap() {
            return this.map;
        }

        /**
         * Load any cookies from the specified URLConnection
         * object. Cookies will be located by their Set-Cookie
         * headers. Any cookies that are found can be moved to a
         * new URLConnection class by calling saveCookies.
         *
         * @param http
         *            The URLConnection object to load the cookies from.
         */
        public void loadCookies(URLConnection http) {
            String str;
            int n = 1;
            do {
                str = http.getHeaderFieldKey(n);
                if ((str != null) && str.equalsIgnoreCase("Set-Cookie")) {
                    str = http.getHeaderField(n);
                    StringTokenizer tok = new StringTokenizer(str, "=");
                    String name = tok.nextToken();
                    String value = tok.nextToken();
                    this.map.put(name, value);
                }
                n++;
            } while (str != null);
        }

        /**
         * Once you have loaded cookies with loadCookies, you can
         * call saveCookies to copy these cookies to a new HTTP
         * request. This allows you to easily support cookies.
         *
         * @param http
         *            The URLConnection object to add cookies to.
         */
        public void saveCookies(URLConnection http) {
            StringBuilder str = new StringBuilder();
            Set<String> set = this.map.keySet();
            for (String key : set) {
                String value = this.map.get(key);
                if (str.length() > 0) {
                    str.append("; ");
                }
                str.append(key + "=" + value);
            }
            http.setRequestProperty("Cookie", str.toString());
        }
    }

FormUtility.java
    import java.io.*;
    import java.net.*;
    import java.util.*;

    public class FormUtility {
        // The charset to use for URL encoding. Should always be UTF-8.
        private final static String encode = "UTF-8";

        private static Random random = new Random();

        // Generate a boundary for a multipart form.
        public static String getBoundary() {
            return "---------------------------" + randomString()
                    + randomString() + randomString();
        }

        /**
         * Parse a URL query string. Return a map of all of the
         * name value pairs.
         *
         * @param form
         *            The query string to parse.
         * @return A map of name-value pairs.
         */
        public static Map<String, String> parse(String form) {
            Map<String, String> result = new HashMap<String, String>();
            StringTokenizer tok = new StringTokenizer(form, "&");
            while (tok.hasMoreTokens()) {
                String str = tok.nextToken();
                StringTokenizer tok2 = new StringTokenizer(str, "=");
                if (!tok2.hasMoreTokens()) {
                    continue;
                }
                String left = tok2.nextToken();
                if (!tok2.hasMoreTokens()) {
                    left = encode(left);
                    result.put(left, null);
                    continue;
                }
                String right = tok2.nextToken();
                right = encode(right);
                result.put(left, right);
            }
            return result;
        }

        private static String encode(String str) {
            try {
                return URLEncoder.encode(str, encode);
            } catch (UnsupportedEncodingException e) {
                return str;
            }
        }

        protected static String randomString() {
            return Long.toString(random.nextLong(), 36);
        }

        /*
         * The boundary used for a multipart post. This field is
         * null if this is not a multipart form and has a value if
         * this is a multipart form.
         */
        private String boundary;

        private OutputStream os;

        private boolean first;

        /**
         * Prepare to access either a regular, or multipart, form.
         *
         * @param os
         *            The stream to output to.
         * @param boundary
         *            The boundary to be used, or null if this is
         *            not a multipart form.
         */
        public FormUtility(OutputStream os, String boundary) {
            this.os = os;
            this.boundary = boundary;
        }

        // Add a file to a multipart form.
        public void add(String name, File file) throws IOException {
            if (this.boundary != null) {
                boundary();
                writeName(name);
                write("; filename=\"");
                write(file.getName());
                write("\"");
                newline();
                write("Content-Type: ");
                String type = URLConnection.guessContentTypeFromName(file.getName());
                if (type == null) {
                    type = "application/octet-stream";
                }
                writeln(type);
                newline();
                byte[] buf = new byte[8192];
                int nread;
                InputStream in = new FileInputStream(file);
                while ((nread = in.read(buf, 0, buf.length)) >= 0) {
                    this.os.write(buf, 0, nread);
                }
                newline();
            }
        }

        // Add a regular text field to either a regular or multipart form.
        public void add(String name, String value) throws IOException {
            if (this.boundary != null) {
                boundary();
                writeName(name);
                newline();
                newline();
                writeln(value);
            } else {
                if (!this.first) {
                    write("&");
                }
                write(encode(name));
                write("=");
                write(encode(value));
            }
            this.first = false;
        }

        // Complete the building of the form.
        public void complete() throws IOException {
            if (this.boundary != null) {
                boundary();
                writeln("--");
                this.os.flush();
            }
        }

        // Generate a multipart form boundary.
        private void boundary() throws IOException {
            write("--");
            write(this.boundary);
        }

        // Create a new line by displaying a carriage return and linefeed.
        private void newline() throws IOException {
            write("\r\n");
        }

        // Write the specified string, without a carriage return and line feed.
        private void write(String str) throws IOException {
            this.os.write(str.getBytes());
        }

        // Write the name element for a multipart post.
        private void writeName(String name) throws IOException {
            newline();
            write("Content-Disposition: form-data; name=\"");
            write(name);
            write("\"");
        }

        // Write a string, with a carriage return and linefeed.
        protected void writeln(String str) throws IOException {
            write(str);
            newline();
        }
    }

And here is my parser itself:
Parser.java
    package parser;

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLConnection;
    import java.net.URLEncoder;

    public class Parser {
        private CookieUtility cookies = new CookieUtility();

        // Post the login form and collect the cookies from the response.
        private boolean login(String username, String password) throws IOException {
            URL url = new URL("http://10.10.10.90/Account/Login.aspx?ReturnUrl=%2fdefault.aspx");
            HttpURLConnection http = (HttpURLConnection) url.openConnection();
            http.setInstanceFollowRedirects(false);
            http.setDoOutput(true);
            OutputStream os = http.getOutputStream();
            FormUtility form = new FormUtility(os, null);
            form.add("login_name", URLEncoder.encode(username));
            form.add("login_password", URLEncoder.encode(password));
            form.complete();
            http.getInputStream();
            cookies.loadCookies(http);
            return (cookies.getMap().containsKey("ASP.NET_SessionId"));
        }

        // Download the page at the given URL and return its contents as a string.
        public String downloadPage(URL url, int timeout) throws IOException {
            StringBuilder result = new StringBuilder();
            byte buffer[] = new byte[8192];
            URLConnection http = url.openConnection();
            http.setConnectTimeout(10000);
            InputStream s = http.getInputStream();
            int size = 0;
            do {
                size = s.read(buffer);
                if (size != -1)
                    result.append(new String(buffer, 0, size));
            } while (size != -1);
            return result.toString();
        }

        // Log in, then download and print the protected page.
        public void process(String username, String password) throws IOException {
            if (login(username, password)) {
                URL url = new URL("http://10.10.10.90/default.aspx");
                String buffer = downloadPage(url, 10000);
                System.out.println(buffer);
            } else {
                System.out.println("Authorization error..");
            }
        }

        public static void main(String args[]) {
            try {
                Parser p = new Parser();
                p.process("d.aim", "432545");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

Everything seems to be in place: I catch the cookies and send a POST request, yet I still get the page asking for authorization. Could it be that authorization on this site is implemented in JavaScript?
But judging by the book mentioned above, that should not be a problem. So where is my mistake? What did I do wrong? Or am I going about this the wrong way altogether? Please help me solve this; I have been fighting with this authorization for more than a week :(
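For reference, the cookie round trip that the code above relies on can be sketched without a network connection. This is a minimal illustration and not the book's implementation: the class `CookieHeaderSketch`, its method names, and the sample header values are made up for the example; only the cookie name `ASP.NET_SessionId` comes from the code above. It keeps just the `name=value` part of each `Set-Cookie` value (dropping attributes such as `path` or `HttpOnly`) and joins the stored cookies into the value of a `Cookie` request header.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only.
class CookieHeaderSketch {

    // Stores the name=value pair from one Set-Cookie header value,
    // ignoring everything after the first ';' (path, expires, HttpOnly, ...).
    static void store(Map<String, String> jar, String setCookieValue) {
        String pair = setCookieValue.split(";", 2)[0].trim();
        int eq = pair.indexOf('=');
        if (eq > 0) {
            jar.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
    }

    // Joins all stored cookies into the value of a Cookie request header.
    static String toCookieHeader(Map<String, String> jar) {
        StringBuilder header = new StringBuilder();
        for (Map.Entry<String, String> cookie : jar.entrySet()) {
            if (header.length() > 0) {
                header.append("; ");
            }
            header.append(cookie.getKey()).append('=').append(cookie.getValue());
        }
        return header.toString();
    }

    public static void main(String[] args) {
        Map<String, String> jar = new LinkedHashMap<>();
        // Made-up sample header values.
        store(jar, "ASP.NET_SessionId=abc123; path=/; HttpOnly");
        store(jar, "theme=dark; path=/");
        // prints: ASP.NET_SessionId=abc123; theme=dark
        System.out.println(toCookieHeader(jar));
    }
}
```

The resulting string is what a follow-up request would pass to `setRequestProperty("Cookie", ...)`, as `CookieUtility.saveCookies` does, so that the server can associate that request with the session.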