My Hadoop program initially ran in local mode, and now my goal is to run it in fully distributed mode.

To do this, the files that are read in the mapper and reducer functions must be accessible from every computer in the cluster, which is why I am asking this question. Since it is not known in advance on which computer the mapper function will run (the program logic uses only a single mapper, and the job will run with one mapper), the file that serves as input to the mapper function must also be accessible on the entire cluster.

In this regard, I have a question: is it possible to use HDFS files directly, that is, to copy the files from the Linux file system into HDFS in advance (which, I assume, makes them available on all computers of the cluster; please correct me if this is not so) and then read those files with the HDFS Java API from the mapper and reducer functions running on the cluster machines?

If the answer is yes, please give an example of copying a file from the Linux file system into HDFS and of reading that file in a Java program using the HDFS Java API.

    1 answer

    HDFS is inherently still a file system (albeit distributed). Accordingly, you can copy any files from the local file system into it.

     import org.apache.hadoop.conf.Configuration;
     import org.apache.hadoop.fs.FileSystem;
     import org.apache.hadoop.fs.Path;

     Configuration conf = new Configuration();
     FileSystem fs = FileSystem.get(conf);          // file system taken from the cluster configuration
     Path localPath = new Path("/home/user/file");  // file on the local Linux file system
     Path hdfsPath = new Path("/user/hadoop/file"); // destination path in HDFS
     fs.copyFromLocalFile(localPath, hdfsPath);
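    To read such a file back into a String (the second part of the question), open it with fs.open(), which returns an FSDataInputStream, and read it like any other InputStream. A minimal sketch, assuming the file at /user/hadoop/file from the example above is plain text:

     import java.io.BufferedReader;
     import java.io.InputStreamReader;
     import org.apache.hadoop.conf.Configuration;
     import org.apache.hadoop.fs.FSDataInputStream;
     import org.apache.hadoop.fs.FileSystem;
     import org.apache.hadoop.fs.Path;

     Configuration conf = new Configuration();
     FileSystem fs = FileSystem.get(conf);
     Path hdfsPath = new Path("/user/hadoop/file");     // same example path as above

     StringBuilder content = new StringBuilder();
     FSDataInputStream in = fs.open(hdfsPath);          // open (not create) when reading
     try {
         BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
         String line;
         while ((line = reader.readLine()) != null) {
             content.append(line).append('\n');
         }
     } finally {
         in.close();
     }
     String text = content.toString();                  // entire file contents as a Java String

    The same approach works inside a mapper or reducer, since FileSystem.get(conf) on any cluster node resolves to the same HDFS.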
    • @a_gura Firstly, I want to clarify: you probably meant to write FileSystem hdfs = FileSystem.get(conf) in the second line. Secondly, I wanted to ask whether HDFS files are available on all computers of the cluster. And thirdly, I also really need to know how to read the contents of an HDFS file and write that content into a Java String. - ivan89
    • @ivan89 No, I wrote what I meant. Yes, the files will be available. What do you mean by an HDFS file? - a_gura
    • @a_gura About the clarification: in Chuck Lam's book Hadoop in Action it was exactly FileSystem hdfs = FileSystem.get(conf); FileSystem local = FileSystem.get(conf); and that is why I asked. And about the HDFS file: in the same book I saw the lines Path hdfsFile = new Path(args[1]); FSDataOutputStream out = hdfs.create(hdfsFile);. As I understand it, to read the entire contents of an HDFS file you need to create a variable FSDataInputStream in = hdfs.open(hdfsFile), but I cannot figure out what to do next to read the HDFS file and write all its contents into a Java String. - ivan89
    • @ivan89 You already have an InputStream. Just read from it as you would from a regular file. See the Java I/O tutorial ( docs.oracle.com/javase/tutorial/essential/io ). - a_gura
    • @a_gura And is copying a file out of HDFS done with the function fs.copyToLocalFile(hdfsPath, localPath)? - ivan89
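    On the last question in the comments: FileSystem also has the reverse method, copyToLocalFile(Path src, Path dst), where the HDFS path is the source and the local path is the destination. A minimal sketch, reusing the example paths from the answer above:

     Configuration conf = new Configuration();
     FileSystem fs = FileSystem.get(conf);
     Path hdfsPath = new Path("/user/hadoop/file");   // source in HDFS
     Path localPath = new Path("/home/user/file");    // destination on the local file system
     fs.copyToLocalFile(hdfsPath, localPath);         // argument order: HDFS source first, local destination second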