The directory path was created in bash:

hdfs dfs -mkdir -p user/result/`date +%Y`/`date +%m`/`date +%d` 

The file is then put into the innermost (day) folder:

 hdfs dfs -put table.csv user/result/`date +%Y`/`date +%m`/`date +%d` 
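A small refinement of the commands above: calling date three separate times can, in rare cases, straddle midnight and produce a mixed path. The minimal sketch below computes the date once and reuses it. It assumes bash and the hdfs CLI on PATH, and the helper name upload_today is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build the whole date part of the path from a single `date` call,
# so year, month and day always come from the same moment.
# upload_today is a hypothetical helper name; paths are relative to the
# HDFS home directory, as in the commands above.
upload_today() {
    local file=$1
    local day
    day=$(date +%Y/%m/%d)            # e.g. 2015/11/03
    local dir="user/result/$day"
    hdfs dfs -mkdir -p "$dir" || return 1
    hdfs dfs -put "$file" "$dir/"
}
```

Usage: upload_today table.csv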

The question is how to specify the file path correctly in a Pig script when a new folder named after the day of the month (date +%d) is created every day, so the file has to be read from a different folder each day. The following did not work:

 A = LOAD user/result/`date +%Y`/`date +%m`/`date +%d`/table.csv as (f1:chararray); 
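For context on why that LOAD line fails: Pig does not execute shell backticks inside a LOAD path, and the path is expected as a quoted string. One common approach, sketched here under assumptions, is Pig parameter substitution: the shell computes the date and passes it with pig -param, and the script refers to it as $day. The file name load_table.pig, the parameter name day, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: let the shell compute the date and hand it to Pig through
# parameter substitution (pig -param NAME=VALUE, $NAME inside the script).
# load_table.pig, "day", write_pig_script and run_today are hypothetical.
write_pig_script() {
    # Quoted heredoc delimiter: $day is written literally, for Pig to expand.
    cat > "$1" <<'EOF'
A = LOAD 'user/result/$day/table.csv' AS (f1:chararray);
DUMP A;
EOF
}

run_today() {
    local script=${1:-load_table.pig}
    write_pig_script "$script"
    pig -param day="$(date +%Y/%m/%d)" "$script"
}
```

Usage: run_today (or run_today some_other.pig).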

    1 answer

    Usually in such cases a symlink is created that points to the current file or directory.

    Roughly like this:

     $ d=$(date +'%Y/%m/%d')
     $ mkdir -p $d
     $ ln -snf $d actual

    This produces the following layout:

     $ tree
     .
     ├── 2015
     │   └── 11
     │       └── 03
     └── actual -> 2015/11/03

     4 directories, 0 files

    Now the (arbitrary) name actual can be used to refer to the contents of the directory 2015/11/03.

    Tomorrow, after running the same commands again, the symlink actual will point to the directory 2015/11/04.
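The symlink scheme can be tried on a local filesystem with a sketch like the following (bash; point_actual is a hypothetical helper). Its comments also spell out the -s, -n and -f options that come up in the comments further down:

```shell
#!/usr/bin/env bash
# Sketch of the symlink scheme on a LOCAL filesystem only.
# point_actual BASE DAY is a hypothetical helper: it creates BASE/DAY and
# makes BASE/actual point to it (relative link target, as in the answer).
point_actual() {
    local base=$1 day=$2
    (
        cd "$base" || exit 1
        mkdir -p "$day"
        # -s: create a symbolic link
        # -f: replace an existing "actual"
        # -n: if "actual" is already a symlink to a directory, replace the
        #     link itself instead of creating a new link inside yesterday's dir
        ln -snf "$day" actual
    )
}
```

Usage: point_actual . "$(date +'%Y/%m/%d')"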


    Applied to your case, as I understand it, it would look something like this (I used the name last instead of actual):

     $ d=$(date +'%Y/%m/%d')
     $ hdfs dfs -mkdir -p user/result/$d
     $ ln -snf user/result/$d user/result/last
     $ hdfs dfs -put table.csv user/result/last/

    Update

    I read the comments on this answer: yes, HDFS has no symlinks.

    Then it probably makes sense to save the required path in a file, either only locally or (additionally) in HDFS, in the same place as the data. Something like this:

     $ d=$(date +'%Y/%m/%d')
     $ hdfs dfs -mkdir -p user/result/$d
     # save the "variable" part of the path to the local file last
     $ echo $d > last
     $ hdfs dfs -put table.csv user/result/$d/
     # save the local file last to HDFS (-f overwrites the previous day's copy)
     $ hdfs dfs -put -f last user/result/

    To retrieve the saved file, use get:

     $ hdfs dfs -get user/result/last last 

    Then read the contents of the file last into a variable:

     $ d=$(cat last) 

    which can then be used to build the path:

     $ hdfs dfs -get user/result/$d/table.csv table.csv 
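As a variation on the get/cat steps above, the marker can also be read directly with hdfs dfs -cat, which prints a file's contents to stdout, so no local copy is needed. This sketch checks that the marker looks like YYYY/MM/DD before building the path. The helper name latest_table_path is hypothetical; the marker location user/result/last follows the commands above.

```shell
#!/usr/bin/env bash
# Sketch: read the saved date marker straight from HDFS and build the path.
# latest_table_path is a hypothetical helper; user/result/last is the
# marker file written by the commands above.
latest_table_path() {
    local d
    d=$(hdfs dfs -cat user/result/last) || return 1
    # Refuse an empty or malformed marker instead of building a bad path.
    [[ $d =~ ^[0-9]{4}/[0-9]{2}/[0-9]{2}$ ]] || return 1
    printf 'user/result/%s/table.csv\n' "$d"
}
```

Usage: hdfs dfs -get "$(latest_table_path)" table.csv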
    • And what does the ln -snf command do? - sinedskid
    • ln creates symlinks and hard links. For the options used (-s, -n, -f), see man ln. - aleksandr barakin
    • Locally everything works, but in HDFS the folder "last", for example, is not created. - sinedskid
    • Added an update. - aleksandr barakin
    • I was thinking it would be easier to create a separate temporary folder, move the file there, and just delete it once processing is complete. I'm new to bash, and all this looks a bit confusing to me. - sinedskid
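The temporary-folder idea from the last comment could look roughly like this sketch (bash; run_with_staging, tmp_input and process.pig are hypothetical names). The Pig script then always loads the same fixed path, and the folder is removed after the job whether or not it succeeded:

```shell
#!/usr/bin/env bash
# Sketch: stage the file under a fixed HDFS path, run Pig, then clean up.
# run_with_staging, user/result/tmp_input and process.pig are hypothetical.
run_with_staging() {
    local file=$1 script=$2
    local stage=user/result/tmp_input
    hdfs dfs -mkdir -p "$stage" || return 1
    # -f overwrites a copy left behind by an earlier, interrupted run
    hdfs dfs -put -f "$file" "$stage/" || return 1
    pig "$script"          # e.g. the script LOADs 'user/result/tmp_input/table.csv'
    local status=$?        # keep Pig's exit status
    hdfs dfs -rm -r "$stage"   # clean up whether or not Pig succeeded
    return "$status"
}
```

Usage: run_with_staging table.csv process.pig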